Off-loading heavy process


Off-loading heavy process

Rob Sargent
My apologies if this is too vague to warrant consideration.

In the recent past I managed a naked port, a Selector, a
ThreadPoolExecutor and friends (and it worked well enough...) but a dear
and knowledgeable friend suggested embedding Tomcat and using HTTP.[3]

I have that working, one request at a time.

Now to get to "production level":  I have two distinct types of
requests which might be problematic:  one is rather large[1] and the
other rather prolific[2].  Both types are handled by the same servlet.
The data structure is identical, just much less data in case [2].

Is it advisable, practical to (re)establish a ThreadPoolExecutor, queue
etc. as a Tomcat-accessible "Resource" with JNDI lookup, and have my
servlet pass the work off to the Executor's queue?

Any pointers appreciated,

rjs


    [1] Comprised of a data block which has occasionally exceeded Java's
    maximum array length; gzipped streams helped a lot.  There are few of
    these requests (22 per project).  They last days and can "collide"
    in terms of when they finish, though by and large they naturally
    spread themselves out.

    [2] Given infinite EC2 capacity there would be tens of thousands of
    jobs started at once.  Realistic AWS capacity constraints limit this
    to hundreds of instances from a queue of thousands.  The duration of
    any instance varies from hours to days.  But the payload is simple,
    under 5K bytes.

    [3] I'm not entirely confident the original implementation handled
    all "requests" properly.




Re: Off-loading heavy process

Christopher Schultz-2
Rob,

On 12/9/20 23:58, Rob Sargent wrote:
> My apologies if this is too vague to warrant consideration.

It is vague, but we can always ask questions :)

> In the recent past I managed a naked port, a Selector, a
> ThreadPoolExecutor and friends (and it worked well enough...) but a dear
> and knowledgeable friend suggested embedding tomcat and using http.[3]

Managing that yourself can be a pain. The downside to using Tomcat is
that you have to use HTTP. But maybe that can work in your favor in
certain cases, especially if you have other options (e.g. upgrade from
HTTP to Websocket after connection.)

 > I have that working, one request at a time.

Great.

> Now to get to "production level":   I have two distinct types of
> requests which might be problematic:  one is rather large[1] and the
> other rather prolific[2].  Both types are handled by the same servlet.
> The data structure is identical, just much less in case[2].

With respect to your footnotes, you can pretty much ignore anything that
requires more than 1 host, so let's talk about the individual requests a
single instance of the server would expect. To confirm, you have some
requests that are huge and others that are ... not huge? The contents
don't really matter, honestly.

Tomcat can handle both huge and non-huge requests without a problem.
When you implement your servlet, you simply get an InputStream and do
whatever you want. Same thing with responses to the client: get an
OutputStream and write away.

> Is it advisable, practical to (re)establish a ThreadPoolExecutor, queue
> etc as a tomcat accessible "Resource" with JDNI lookup, and have my
> servlet pass the work off to the Executor's queue?

I don't really understand this at all. Are you asking how to mitigate a
self-inflicted DoS because you have so many incoming connections?

If you have enough hardware to satisfy the requests, and your usage
pattern is as you suggest, then you will mostly have one or two huge
requests in-flight at any time, and some large number of smaller
(faster?) requests also in-flight at the same time.

Again, no problem. You are constrained only by the resources you have
available:

1. Memory
2. Maximum connections
3. Maximum threads (in your executor / request-processor thread pool)

If you have data that doesn't fit into byte[Integer.MAX_VALUE] then maybe you don't
want to try to handle it all at once. That's an application design
decision and if gzipping helps you in the short term, then great. My
recommendation would be to look at ways of handling that request in a
streaming-fashion instead of buffering everything up in memory. The
overall performance of your application will likely improve because of
that change.

> [2] Given infinite EC2 capacity there would be tens of thousands of
> jobs started at once.  Realistic AWS capacity constraints limit this
> to hundreds of instances from a queue of thousands.  The duration of
> any instance varies from hours to days.  But the payload is simple,
> under 5K bytes.
If you are using AWS, then you can load-balance between any number of
back-end Tomcat instances. The lb just has to decide which back-end
instance to use. Sometimes lbs make bad decisions and you get all the
load on a single node. That's bad because (a) one node is overloaded and
(b) the other nodes are under-utilized. It's up to you to figure out how
to get your load distributed in an equitable way.

Back to the problem you are actually trying to solve.

Are these "small requests" in any way a problem? Do they arrive
frequently and are they handled quickly? If so, then you can probably
mostly just ignore them. It's the huge requests that are (likely) the
problem.

If you want to hand-off control of a request to another thread for
processing, there are a few ways to do that I can think of:

1. new Thread(new Runnable() { /* your stuff */ }).start();

This is bad for a few reasons:

1a. Unlimited threads created by remote clients? Bad.

1b. Call to Thread.start() returns immediately and the servlet's
execution ends. There is no way to reply to the client, and if you
haven't read all their input, Bad Things will happen in Tomcat. (Like,
your request and response objects will be re-used and you'll observe
mass-chaos).

2. sharedExecutor.submit(new Runnable() { /* your stuff */ });

This is bad for the same reason as 1b above, but it does not suffer from
1a. 1a is now replaced by:

2a. Unlimited jobs submitted by remote clients? Bad.

3. Use servlet async processing.

I think this is probably ideal for your use-case. When you go into
asynchronous mode, the request-processing thread is allowed to go back
and service other requests, so you get a more responsive server, at
least from your clients' perspectives.

The bad news is that asynchronous mode requires that you completely
change the way you think about communicating with a client. Instead of
reading the input until you are satisfied, performing your
business-logic, then writing to the client until you are done, you have
to subscribe to I/O events and handle them all appropriately. If you get
it wrong, you can make a mess of things.
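
For reference, a minimal sketch of what option 3 can look like on a
Servlet 3.0+ container. This hand-waves over the non-blocking
ReadListener/WriteListener side of things, and the servlet name, URL
pattern, and pool size are made up for illustration:

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/jobs", asyncSupported = true) // async must be enabled explicitly
public class AsyncJobServlet extends HttpServlet {

    // shared worker pool; size to taste
    private final ExecutorService sharedExecutor = Executors.newFixedThreadPool(10);

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Detach from the request-processing thread so it can serve other clients.
        AsyncContext ctx = req.startAsync();
        ctx.setTimeout(0); // disable the container's async timeout (use with care)

        sharedExecutor.submit(() -> {
            try {
                // long-running work goes here; read ctx.getRequest().getInputStream()
                ctx.getResponse().getWriter().println("done");
            } catch (IOException e) {
                // log it; fall through to complete()
            } finally {
                ctx.complete(); // hand the connection back to Tomcat
            }
        });
    }
}

The request and response stay valid until complete() is called, which is
what distinguishes this from options 1 and 2 above.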

It would help to understand the nature of what has to happen with the
data once it's received by your servlet. Does the processing take a long
time, or is it mostly the data-upload that takes forever? Is there a way
for the client to cut-up the work in any way? Are there other ways for
the client to get a response? Making an HTTP connection and then waiting
18 hours for a response is ... not how HTTP is supposed to work.

Long ago I worked on a product where we had to perform some long
operation on the server and client browsers would time-out. (This was a
web browser, so they have timeouts of like 60 seconds or so. They aren't
custom clients where you can just say "wait 18 hours for a response.").

Our solution was to accept the job via HTTP request and respond saying
"okay, not done yet". The expectation was that the client would them
poll to see if the job was done. We even provided a response saying "not
done yet, 50% done" or something like that.

On the server-side, we implemented it like this (roughly):

servlet {
   get {
     if(isNewJob(request)) {
       Job job = new Job();
       job.populate(request.getInputStream());
       if(job.isOkay()) {
         request.getSession().setAttribute("job", job);

         sharedThreadPool.submit(job);
       } else {
         return "500 Internal Server Error";
       }
     } else if(isJobCheck(request)) {
       Job job = (Job)request.getSession().getAttribute("job");

       if(job.isDone()) {
         return "200 Job all done";
       } else {
         return "200 Job running, " + job.getPercentDone() + " done.";
       }
     }
   }
}

And then the job would just do this:

Job {
   private volatile boolean done = false;

   public void run() {
     /* do long-running stuff */
     setDone(true);
   }

   public boolean isDone() { return done; }
   protected void setDone(boolean done) { this.done = done; }
   public int getPercentDone() { return whatever; }
}

The above is super pseudo-code and ignores a whole bunch of details
(e.g. multiple jobs per client, error handling, etc.). The client is
required to maintain session-state using either cookies or jsessionid
path parameters. (At least, that's how we implemented it.) Instead of
the session, you could store your jobs someplace accessible to all
requests such as the application (ServletContext) and give them all
unique (and difficult to guess) ids.
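
If you go the ServletContext route, a minimal sketch might look like the
following (the "jobs" attribute name and JobRegistryServlet are made up;
Job is the pseudo-class above):

import java.io.IOException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class JobRegistryServlet extends HttpServlet {

    @Override
    public void init() {
        // One registry for the whole application, visible to every request
        // (and to any other servlet that needs it).
        getServletContext().setAttribute("jobs", new ConcurrentHashMap<String, Job>());
    }

    @Override
    @SuppressWarnings("unchecked")
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        Map<String, Job> jobs =
            (Map<String, Job>) getServletContext().getAttribute("jobs");

        String id = req.getParameter("id");
        if (id == null) {
            // New job: register it under a unique, hard-to-guess id and hand
            // the id back so the client can poll with it later.
            id = UUID.randomUUID().toString();
            jobs.put(id, new Job());
            resp.getWriter().println(id);
        } else {
            // Status check: look the job up by id.
            Job job = jobs.get(id);
            resp.getWriter().println(job != null && job.isDone() ? "done" : "running");
        }
    }
}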

This allows the server to make progress even if the client times out, or
loses a network connection, or power, or whatever. It also frees up the
network connection used to submit the initial job so the server can accept
other connections. And you don't have to use servlet async and
re-write your whole process.

I don't know if any of this helps, but I think it will get you thinking
in a more HTTP-style way rather than a connection-oriented service like
the one you had originally implemented.

-chris



Re: Off-loading heavy process

Rob Sargent
Chris,
Thank you for the completeness.  I always miss on getting the level of detail right (too little, too much) in my postings.

> On Dec 11, 2020, at 11:31 AM, Christopher Schultz <[hidden email]> wrote:
>
> Rob,
>
> On 12/9/20 23:58, Rob Sargent wrote:
>> My apologies if this is too vague to warrant consideration.
>
> It is vague, but we can always ask questions :)
>
>> In the recent past I managed a naked port, a Selector, a ThreadPoolExecutor and friends (and it worked well enough...) but a dear and knowledgeable friend suggested embedding tomcat and using http.[3]
>
> Managing that yourself can be a pain. The downside to using Tomcat is that you have to use HTTP. But maybe that can work in your favor in certain cases, especially if you have other options (e.g. upgrade from HTTP to Websocket after connection.)
>
> > I have that working, one request at a time.
>
> Great.
>
>> Is it advisable, practical to (re)establish a ThreadPoolExecutor, queue etc as a tomcat accessible "Resource" with JDNI lookup, and have my servlet pass the work off to the Executor's queue?
>
> I don't really understand this at all. Are you asking how to mitigate a self-inflected DOS because you have so many incoming connections?
>
> If you have enough hardware to satisfy the requests, and your usage pattern is as you suggest, then you will mostly have one or two huge requests in-flight at any time, and some large number of smaller (faster?) requests also in-flight at the same time.
>
> Again, no problem. You are constrained only by the resources you have available:
>
> 1. Memory
> 2. Maximum connections
> 3. Maximum threads (in your executor / request-processor thread pool)
>

For the large request, the middleware (my impl, or the Tomcat layer) takes the payload from the client and writes “lots” of records in the database.  Do I want that save() call in the servlet, or should I queue it up for some other handler?  All on the same hardware, but that frees up the servlet.  In the small client (my self-made DoS), there’s only a handful of writes, but it's still faster to hand that memory to a queue and let the servlet go back to the storm.

That’s the thinking behind the question of accessing a ThreadPoolExecutor via JNDI.  I know my existing impl does queue jobs (so the load is greater than the capacity to handle requests).  I worry that without off-loading, Tomcat would just spin up more servlet threads and exhaust resources.  I can lose a client, but would rather not lose the server (that loses all clients...)


> If you have data that doesn't fix into byte[MAXINT] then maybe you don't want to try to handle it all at once. That's an application design decision and if gzipping helps you in the short term, then great. My recommendation would be to look at ways of handling that request in a streaming-fashion instead of buffering everything up in memory. The overall performance of your application will likely improve because of that change.

Re-working the structure of the payload (break it up) is an option, but not a pleasant one :)

>
>> [2] Given infinite EC2 capacity there would be tens of thousands of jobs started at once.  Realistic AWS capacity constraints limit this to hundreds of instances from a queue of thousands.  The duration of any instance varies from hours to days.  But the payload is simple, under 5K bytes.
> If you are using AWS, then you can load-balance between any number of back-end Tomcat instances. The lb just has to decide which back-end instance to use. Sometimes lbs make bad decisions and you get all the load on a single node. That's bad because (a) one node is overloaded and (b) the other nodes are under-utilized. It's up to you to figure out how to get your load distributed in an equitable way.
>
Not anxious to add more Tomcat instances.  I can manually throttle both types of requests for now.

> Back to the problem you are actually trying to solve.
>
> Are these "small requests" in any way a problem? Do they arrive frequently and are they handled quickly? If so, then you can probably mostly just ignore them. It's the huge requests that are (likely) the problem.
>


> If you want to hand-off control of a request to another thread for processing, there are a few ways to do that I can think of:
>
> 1. new Thread(new Runnable() { /* your stuff */ }).start();
>
> This is bad for a few of reasons:
>
> 1a. Unlimited threads created by remote clients? Bad.
>
> 1b. Call to Thread.start() returns immediately and the servlet's execution ends. There is no way to reply to the client's, and if you haven't read all their input, Bad Things will happen in Tomcat. (Like, your request and response objects will be re-used and you'll observe mass-chaos).
>
> 2. sharedExecutor.submit(new Runnable() { /* your stuff */ });
>
> This is bad for the same reason as 1b above, but it does not suffer from 1a. 1a is now replaced by:
>
> 2a. Unlimited jobs submitted by remote clients? Bad.
>
> 3. Use servlet async processing.
>
> I think this is probably ideal for your use-case. When you go into asynchronous mode, the request-processing thread is allowed to go back and service other requests, so you get a more responsive server, at least from your clients' perspectives.
>
> The bad news is that asynchronous mode requires that you completely change the way you think about communicating with a client. Instead of reading the input until you are satisfied, performing your business-logic, then writing to the client until you are done, you have to subscribe to I/O events and handle them all appropriately. If you get it wrong, you can make a mess of things.
>
Here’s where my ignorance will really shine.  Are you talking about HttpClient.sendAsync or are you talking about a Tomcat mode of operation?
The clients die as soon as they send the request.  I wait for an ok, but don’t really have to.  (Would be nice if I could take statusCode() != 200 and write to a file (as I used to) but I can lose a client or two, occasionally.  Release 2.1 :) )
There’s very little two-way communication:  AWS queue starts clients. Clients ask for data to work on, do a bunch of simulations, send results to the middleware; the middleware makes db calls.
Client does not know about db.


> It would help to understand the nature of what has to happen with the data once it's received by your servlet. Does the processing take a long time, or is it mostly the data-upload that takes forever? Is there a way for the client to cut-up the work in any way? Are there other ways for the client to get a response? Making an HTTP connection and then waiting 18 hours for a response is ... not how HTTP is supposed to work.
>
> Long ago I worked on a product where we had to perform some long operation on the server and client browsers would time-out. (This was a web browser, so they have timeouts of like 60 seconds or so. They aren't custom clients where you say just say "wait 18 hours for a response.").

Your “Job” example seems along the lines of get-it-off-the-servlet, which again points back to my current queue handler I think.

> This allows the server to make progress even if the client times out, or loses a network connection, or power, or whatever. It also allows the server to use that network connection used to submit the initial job for accepting other connections. And you don't have to use servlet async and re-write your whole process.
>
> I don't know if any of this helps, but I thik it will get you thinking in a more HTTP-style way rather than a connection-oriented service like the one you had originally implemented.
Helps a ton.  Very thankful for the indulgence.
I hope I’ve given useful responses.

Next up is SSL.  One of the reasons I must switch from my naked socket impl.


rjs




Re: Off-loading heavy process

Christopher Schultz-2
Rob,

On 12/11/20 15:00, Rob Sargent wrote:
 > [huge snip]
>
> Your “Job” example seems along the lines of get-it-off-the-servlet,
> which again points back to my current queue handler I think.

Yes, I think so. So let's get back to your original idea -- which I
think is a good one -- to use a shared queue to manage the jobs.

Just to be clear, the servlet is going to reply to the client ASAP by
saying "I have accepted this job and will do my best to complete it", or
it will return an error (see below), or it will refuse a connection (see
below). Sound okay so far?

> [My servlet] takes the payload from the client an writes “lots” of
> records in the database.  Do I want that save() call in the servlet
> or should I queue it up for some other handler. All on the same
> hardware, but that frees up the servlet.
If the client doesn't care about job status information, then
fire-and-forget clients is a reasonable methodology. You may find that
at some point, they will want to get some job-status information. You
could implement that, later. Version 2.2 maybe?

On the other hand, if you can process some of the request in a streaming
way, then you can be writing to your database before your client is done
sending the request payload. You can still do that with fire-and-forget,
but it requires some more careful handling of the streams and stuff like
that.

The one thing you cannot do is retain a reference to the request
(response, etc.) after your servlet's service() method ends. Well,
unless you go async but that's a whole different thing which doesn't
sound like what you want to do, now that I have more info.

Calling save() from the servlet would tie-up the request-processing
thread until the save completes. That's where you get your 18-hour
response times, which is not very HTTP-friendly.

Avoiding calling save() from the servlet requires that you fully-read
the request payload before queuing the save() call into a thread pool
bundled-up with your data. (Well, there are some tricks you could use
but they are a little dirty and may not buy you much.)

> In the small client (my self-made DOS), there’s only a handful of
> writes, but still faster to hand that memory to a queue and let the
> servlet go back to the storm.
I would make everything work the same way unless there is a compelling
reason to have different code paths.

> That’s the thinking behind the question of accessing a ThreadPoolExecutor via JDNI.  I know my existing impl does queue jobs so (so the load is greater than the capacity to handle requests).  I worry that without off-loading Tomcat would just spin up more servlet threads, exhaust resources.  I can lose a client, but would rather not lose the server (that looses all clients...)

Agreed: rejecting a single request is preferred over the service coming
down -- and all its in-flight jobs with it.

So I think you want something like this:

servlet {
   post {
     // Buffer all our input data
     long bufferSize = request.getContentLengthLong();
     if(bufferSize > Integer.MAX_VALUE || bufferSize < 0) {
       bufferSize = 8192; // Reasonable default?
     }
     ByteArrayOutputStream buffer = new ByteArrayOutputStream((int)bufferSize);

     InputStream in = request.getInputStream();
     int count;
     byte[] buf = new byte[8192];
     while(-1 != (count = in.read(buf))) {
         buffer.write(buf, 0, count);
     }

     // All data read: tell the client we are good to go
     Job job = new Job(buffer);
     try {
       sharedExecutor.submit(job); // Fire and forget

       response.setStatus(200); // Ok
     } catch (RejectedExecutionException ree) {
       response.setStatus(503); // Service Unavailable
     }
   }
}

Obviously, the job needs to know how to execute itself (making it
Runnable means you can use the various Executors Java provides). Also,
you need to decide what to do about creating the executor.

I used the ByteArrayOutputStream above to avoid the complexity of
re-scaling buffers in example code. If you have huge buffers and you
need to convert to byte[] at the end, then you are going to need 2x heap
space to do it. Yuck. Consider implementing the auto-re-sizing
byte-array yourself and avoiding ByteArrayOutputStream.
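
If that copy ever matters, one low-effort dodge -- a sketch, not the only
way -- is to subclass ByteArrayOutputStream instead of hand-rolling the
resizable array, since its internal buffer fields are protected:

import java.io.ByteArrayOutputStream;

// Sketch: expose the internal buffer instead of copying it with toByteArray().
// 'buf' and 'count' are protected fields of java.io.ByteArrayOutputStream, so a
// subclass can hand them out directly; callers must only read the first length()
// bytes, since the tail of rawBuffer() is unused capacity.
public class ExposedByteArrayOutputStream extends ByteArrayOutputStream {

    public ExposedByteArrayOutputStream(int initialSize) {
        super(initialSize);
    }

    /** The live internal buffer; only the first length() bytes are valid. */
    public byte[] rawBuffer() {
        return buf;
    }

    /** Number of valid bytes in rawBuffer(). */
    public int length() {
        return count;
    }
}

The Job would then work against rawBuffer()/length() rather than
toByteArray(), so the data is never duplicated on the heap.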

There isn't anything magic about JNDI. You could also put the thread
pool directly into your servlet:

servlet {
   ThreadPoolExecutor sharedExecutor;
   constructor() {
     sharedExecutor = new ThreadPoolExecutor(...);
   }
   ...
}

You get to choose the parameters for the thread pool executor. I think
you probably want to limit the number of jobs to something "reasonable".
You may even want to have separate executors for different kinds of jobs:

servlet {
   ThreadPoolExecutor smallJobExecutor;
   ThreadPoolExecutor bigJobExecutor;
   constructor() {
     smallJobExecutor = new ThreadPoolExecutor(10, 100);
     bigJobExecutor = new ThreadPoolExecutor(1, 5);
   }
   ...

     Job job = new Job(buffer);
     try {
       if(buffer.size() > SMALL_JOB_MAX_SIZE) {
         bigJobExecutor.submit(job); // Fire and forget
       } else {
         smallJobExecutor.submit(job); // Fire and forget
       }
   ...
}

This will limit you to 5 concurrent big jobs and 100 concurrent small
jobs. You could do the same thing with atomic counters or whatever, but
this way is pretty straightforward, too. It also means that there is
"always" come reserved capacity for big jobs even when you are facing a
huge number of small jobs. That is, the small jobs can't "starve" the
big jobs out of the server simply by submitting lots of small jobs.
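
As a concrete (and entirely assumed) example of constructing those two
pools: a bounded queue plus the default AbortPolicy is what makes
submit() actually throw the RejectedExecutionException the servlet above
turns into a 503. The convenience pools from Executors.newFixedThreadPool()
use an unbounded queue and never reject anything.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class JobExecutors {
    // Sizes are illustrative only. Threads beyond the core count are only
    // created once the bounded queue fills; when both the queue and the max
    // thread count are exhausted, submit() throws RejectedExecutionException.
    public static final ThreadPoolExecutor SMALL_JOBS = new ThreadPoolExecutor(
            10, 100, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(500),
            new ThreadPoolExecutor.AbortPolicy());

    public static final ThreadPoolExecutor BIG_JOBS = new ThreadPoolExecutor(
            1, 5, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(25),
            new ThreadPoolExecutor.AbortPolicy());
}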

If you want to put those executors into JNDI, you are welcome to do so,
but there is no particular reason to. If it's convenient to configure a
thread pool executor via some JNDI injection something-or-other, feel
free to use that.

But ultimately, you are just going to get a reference to the executor
and drop the job on it.

> Next up, is SSL.  One of the reason’s I must switch from my naked socket impl.

Nah, you can do TLS on a naked socket. But I think using Tomcat embedded
(or not) will save you the trouble of having to learn a whole lot and
write a lot of code.

TLS should be fairly easy to get going in Tomcat as long as you already
understand how to create a key+certificate.

-chris



Re: Off-loading heavy process

Rob Sargent
Chris,

This is _so_ helpful.

On 12/11/20 3:00 PM, Christopher Schultz wrote:

> Rob,
>
> On 12/11/20 15:00, Rob Sargent wrote:
> > [huge snip]
>>
>> Your “Job” example seems along the lines of get-it-off-the-servlet,
>> which again points back to my current queue handler I think.
>
> Yes, I think so. So let's get back to your original idea -- which I
> think is a good one -- to use a shared queue to manage the jobs.
>
> Just to be clear, the servlet is going to reply to the client ASAP by
> saying "I have accepted this job and will do my best to complete it",
> or it will return an error (see below), or it will refuse a connection
> (see below). Sound okay so far?
>
>> [My servlet] takes the payload from the client an writes “lots” of
>> records in the database.  Do I want that save() call in the servlet
>> or should I queue it up for some other handler. All on the same
>> hardware, but that frees up the servlet.
> If the client doesn't care about job status information, then
> fire-and-forget clients is a reasonable methodology. You may find that
> at some point, they will want to get some job-status information. You
> could implement that, later. Version 2.2 maybe?
>
Yeah, my clients are only visible through the AWS console currently. 
Any "progress/dashboard" won't show up 'til version 2.345


> On the other hand, if you can process some of the request in a
> streaming way, then you can be writing to your database before your
> client is done sending the request payload. You can still do that with
> fire-and-forget, but it requires some more careful handling of the
> streams and stuff like that.
>
> The one thing you cannot do is retain a reference to the request
> (response, etc.) after your servlet's service() method ends. Well,
> unless you go async but that's a whole different thing which doesn't
> sound like what you want to do, now that I have more info.
>
> Calling save() from the servlet would tie-up the request-processing
> thread until the save completes. That's where you get your 18-hour
> response times, which is not very HTTP-friendly.
Certainly don't want to pay for 18 EC2 hours of idle.

>
> Avoiding calling save() from the servlet requires that you fully-read
> the request payload before queuing the save() call into a thread pool
> bundled-up with your data. (Well, there are some tricks you could use
> but they are a little dirty and may not buy you much.)
>
>> In the small client (my self-made DOS), there’s only a handful of
>> writes, but still faster to hand that memory to a queue and let the
>> servlet go back to the storm.
> I would make everything work the same way unless there is a compelling
> reason to have different code paths.
>
The two payloads are impls of a base class. Jackson/ObjectMapper
unravels them to the concrete Type; then Type.save().
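
For reference, one way Jackson can do that subtype resolution -- purely a
sketch, with made-up class names standing in for AbstractPayload and its
two impls -- is the @JsonTypeInfo/@JsonSubTypes pair:

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

// The "kind" property in the JSON tells Jackson which concrete class to build,
// so mapper.readValue(in, Payload.class) hands back the right subtype.
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "kind")
@JsonSubTypes({
    @JsonSubTypes.Type(value = LargePayload.class, name = "large"),
    @JsonSubTypes.Type(value = SmallPayload.class, name = "small")
})
abstract class Payload {
    abstract void save();   // subclass-specific database writes
}

class LargePayload extends Payload {
    void save() { /* COPY into temp tables, then merge */ }
}

class SmallPayload extends Payload {
    void save() { /* a handful of inserts */ }
}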


>> That’s the thinking behind the question of accessing a
>> ThreadPoolExecutor via JDNI.  I know my existing impl does queue jobs
>> so (so the load is greater than the capacity to handle requests).  I
>> worry that without off-loading Tomcat would just spin up more servlet
>> threads, exhaust resources.  I can lose a client, but would rather
>> not lose the server (that looses all clients...)
>
> Agreed: rejecting a single request is preferred over the service
> coming down -- and all its in-flight jobs with it.
>
> So I think you want something like this:
>
> servlet {
>   post {
>     // Buffer all our input data
>     long bufferSize = request.getContentLengthLong();
>     if(bufferSize > Integer.MAX_VALUE || bufferSize < 0) {
>       bufferSize = 8192; // Reasonable default?
>     }
>     ByteArrayOutputStream buffer = new
> ByteArrayOutputStream((int)bufferSize);
>
>     int count;
>     byte[] buffer = new byte[8192];
>     while(-1 != (count = in.read(buf)) {
>         buffer.write(buf, 0, count);
>     }
>
>     // All data read: tell the client we are good to go
>     Job job = new Job(buffer);
>     try {
>       sharedExecutor.submit(job); // Fire and forget
>
>       response.setStatus(200); // Ok
>     } catch (RejectedExecutionException ree) {
>       response.setStatus(503); // Service Unavailable
>     }
>   }
> }
>
This is working:

       protected void doPost(HttpServletRequest req, HttpServletResponse
    resp) /*throws ServletException, IOException*/ {
         lookupHostAndPort();

         Connection conn = null;
         try {

           ObjectMapper jsonMapper = JsonMapper.builder().addModule(new
    JavaTimeModule()).build();
           jsonMapper.setSerializationInclusion(Include.NON_NULL);

           try {

             AbstractPayload payload =
    jsonMapper.readValue(req.getInputStream(), AbstractPayload.class);
             logger.error("received payload");
             String redoUrl =
    String.format("jdbc:postgresql://%s:%d/%s", getDbHost(),
    getDbPort(), getDbName(req));
            Connection copyConn = DriverManager.getConnection(redoUrl,
    getDbRole(req), getDbRole(req)+getExtension());
             payload.setConnection(copyConn);
             payload.write();
             //HERE THE CLIENT IS WAITING FOR THE SAVE.  Though there
    can be a lot of data, COPY is blindingly fast
             resp.setContentType("plain/text");
             resp.setStatus(200);
             resp.getOutputStream().write("SGS_OK".getBytes());
             resp.getOutputStream().flush();
             resp.getOutputStream().close();
           }
             //Client can do squat at this point.
           catch
    (com.fasterxml.jackson.databind.exc.MismatchedInputException mie) {
             logger.error("transform failed: " + mie.getMessage());
             resp.setContentType("plain/text");
             resp.setStatus(461);
             String emsg = "PAYLOAD NOT
    SAVED\n%s\n".format(mie.getMessage());
             resp.getOutputStream().write(emsg.getBytes());
             resp.getOutputStream().flush();
             resp.getOutputStream().close();
           }
         }
         catch (IOException | SQLException ioe) {
         etc }

> Obviously, the job needs to know how to execute itself (making it
> Runnable means you can use the various Executors Java provides). Also,
> you need to decide what to do about creating the executor.
>
> I used the ByteArrayOutputStream above to avoid the complexity of
> re-scaling buffers in example code. If you have huge buffers and you
> need to convert to byte[] at the end, then you are going to need 2x
> heap space to do it. Yuck. Consider implementing the auto-re-sizing
> byte-array yourself and avoiding ByteArrayOutputStream.
>
> There isn't anything magic about JNDI. You could also put the thread
> pool directly into your servvlet:
>
> servlet {
>   ThreadPoolExecutor sharedExecutor;
>   constructor() {
>     sharedExecutor = new ThreadPoolExecutor(...);
>   }
>   ...
> }
>
Yes, I see now that the single real instance of the servlet can master
the sharedExecutor.

I have reliable threadpool code at hand.  I don't need to separate the
job types:  In practice all the big ones are done first: they define the
small ones.  It's when I'm spectacularly successful and two (2)
investigators want to use the system ...

> If you want to put those executors into JNDI, you are welcome to do
> so, but there is no particular reason to. If it's convenient to
> configure a thread pool executor via some JNDI injection
> something-or-other, feel free to use that.
>
> But ultimately, you are just going to get a reference to the executor
> and drop the job on it.
>
>> Next up, is SSL.  One of the reason’s I must switch from my naked
>> socket impl.
>
> Nah, you can do TLS on a naked socket. But I think using Tomcat
> embedded (or not) will save you the trouble of having to learn a whole
> lot and write a lot of code.
>
No thanks.
> TLS should be fairly easy to get going in Tomcat as long as you
> already understand how to create a key+certificate.
>
I've made keys/certs in previous lives (not to say I understand them). 
I'm waiting to hear on whether or not I'll be able to self-sign etc. 
Talking to AWS Monday about security/HIPAA.

I'm sure I'll be back, but I think I can move forward.  Much appreciated.

rjs




Re: Off-loading heavy process

Christopher Schultz-2
Rob,

On 12/11/20 18:52, Rob Sargent wrote:

> Chris,
>
> This is _so_ helpful.
>
> On 12/11/20 3:00 PM, Christopher Schultz wrote:
>> Rob,
>>
>> On 12/11/20 15:00, Rob Sargent wrote:
>> > [huge snip]
>>>
>>> Your “Job” example seems along the lines of get-it-off-the-servlet,
>>> which again points back to my current queue handler I think.
>>
>> Yes, I think so. So let's get back to your original idea -- which I
>> think is a good one -- to use a shared queue to manage the jobs.
>>
>> Just to be clear, the servlet is going to reply to the client ASAP by
>> saying "I have accepted this job and will do my best to complete it",
>> or it will return an error (see below), or it will refuse a connection
>> (see below). Sound okay so far?
>>
>>> [My servlet] takes the payload from the client an writes “lots” of
>>> records in the database.  Do I want that save() call in the servlet
>>> or should I queue it up for some other handler. All on the same
>>> hardware, but that frees up the servlet.
>> If the client doesn't care about job status information, then
>> fire-and-forget clients is a reasonable methodology. You may find that
>> at some point, they will want to get some job-status information. You
>> could implement that, later. Version 2.2 maybe?
>>
> Yeah, my clients are only visible through the AWS console currently. Any
> "progress/dashboard" won't show up 'til version 2.345
>
>
>> On the other hand, if you can process some of the request in a
>> streaming way, then you can be writing to your database before your
>> client is done sending the request payload. You can still do that with
>> fire-and-forget, but it requires some more careful handling of the
>> streams and stuff like that.
>>
>> The one thing you cannot do is retain a reference to the request
>> (response, etc.) after your servlet's service() method ends. Well,
>> unless you go async but that's a whole different thing which doesn't
>> sound like what you want to do, now that I have more info.
>>
>> Calling save() from the servlet would tie-up the request-processing
>> thread until the save completes. That's where you get your 18-hour
>> response times, which is not very HTTP-friendly.
> Certainly don't want to pay for 18 EC2 hours of idle.

So your clients spin-up an EC2 instance just to send the request to your
server? That sounds odd.

>> Avoiding calling save() from the servlet requires that you fully-read
>> the request payload before queuing the save() call into a thread pool
>> bundled-up with your data. (Well, there are some tricks you could use
>> but they are a little dirty and may not buy you much.)
>>
>>> In the small client (my self-made DOS), there’s only a handful of
>>> writes, but still faster to hand that memory to a queue and let the
>>> servlet go back to the storm.
>> I would make everything work the same way unless there is a compelling
>> reason to have different code paths.
>>
> The two payloads are impls of an a base class. Jackson/ObjectMapper
> unravels them to Type. Type.save();

Okay, so it looks like Type.save() is what needs to be called in the
separate thread (well, submitted to a job scheduler; just get it off the
request-processing thread so you can return a 200 response to the client).

>>> That’s the thinking behind the question of accessing a
>>> ThreadPoolExecutor via JDNI.  I know my existing impl does queue jobs
>>> so (so the load is greater than the capacity to handle requests).  I
>>> worry that without off-loading Tomcat would just spin up more servlet
>>> threads, exhaust resources.  I can lose a client, but would rather
>>> not lose the server (that looses all clients...)
>>
>> Agreed: rejecting a single request is preferred over the service
>> coming down -- and all its in-flight jobs with it.
>>
>> So I think you want something like this:
>>
>> servlet {
>>   post {
>>     // Buffer all our input data
>>     long bufferSize = request.getContentLengthLong();
>>     if(bufferSize > Integer.MAX_VALUE || bufferSize < 0) {
>>       bufferSize = 8192; // Reasonable default?
>>     }
>>     ByteArrayOutputStream buffer = new
>> ByteArrayOutputStream((int)bufferSize);
>>
>>     int count;
>>     byte[] buffer = new byte[8192];
>>     while(-1 != (count = in.read(buf)) {
>>         buffer.write(buf, 0, count);
>>     }
>>
>>     // All data read: tell the client we are good to go
>>     Job job = new Job(buffer);
>>     try {
>>       sharedExecutor.submit(job); // Fire and forget
>>
>>       response.setStatus(200); // Ok
>>     } catch (RejectedExecutionException ree) {
>>       response.setStatus(503); // Service Unavailable
>>     }
>>   }
>> }
>>
> This is working:
>
>        protected void doPost(HttpServletRequest req, HttpServletResponse
>     resp) /*throws ServletException, IOException*/ {
>          lookupHostAndPort();
>
>          Connection conn = null;
>          try {
>
>            ObjectMapper jsonMapper = JsonMapper.builder().addModule(new
>     JavaTimeModule()).build();
>            jsonMapper.setSerializationInclusion(Include.NON_NULL);
>
>            try {
>
>              AbstractPayload payload =
>     jsonMapper.readValue(req.getInputStream(), AbstractPayload.class);
>              logger.error("received payload");
>              String redoUrl =
>     String.format("jdbc:postgresql://%s:%d/%s", getDbHost(),
>     getDbPort(), getDbName(req));
>             Connection copyConn = DriverManager.getConnection(redoUrl,
>     getDbRole(req), getDbRole(req)+getExtension());

So it's here you cannot pool the connections? What about:

     Context ctx = new InitialContext();

     DataSource ds = (DataSource)ctx.lookup("java:comp/env/jdbc/" + getJNDIName(req));

Then you can define your per-user connection pools in JNDI and get the
benefit of connection-pooling.
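
A rough sketch of the lookup side, with a made-up resource name: each
per-user pool would be declared as a <Resource type="javax.sql.DataSource"
.../> on the context, and with embedded Tomcat JNDI has to be switched on
(Tomcat#enableNaming()) before the resource is visible.

import java.sql.Connection;
import java.sql.SQLException;

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public final class Pools {

    // "jdbc/" + name must match the <Resource name="jdbc/..."> declaration.
    public static Connection connectionFor(String name)
            throws NamingException, SQLException {
        DataSource ds = (DataSource)
                new InitialContext().lookup("java:comp/env/jdbc/" + name);
        return ds.getConnection();   // borrowed from the pool; close() returns it
    }

    private Pools() {}
}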

>              payload.setConnection(copyConn);
>              payload.write();

Is the above call the one that takes hours?

>              //HERE THE CLIENT IS WAITING FOR THE SAVE.  Though there
>     can be a lot of data, COPY is blindingly fast

Maybe the payload.write() is not slow. Maybe? After this you don't do
anything else...

>              resp.setContentType("plain/text");
>              resp.setStatus(200);
>              resp.getOutputStream().write("SGS_OK".getBytes());
>              resp.getOutputStream().flush();
>              resp.getOutputStream().close();
>            }
>              //Client can do squat at this point.
>            catch
>     (com.fasterxml.jackson.databind.exc.MismatchedInputException mie) {
>              logger.error("transform failed: " + mie.getMessage());
>              resp.setContentType("plain/text");
>              resp.setStatus(461);
>              String emsg = "PAYLOAD NOT
>     SAVED\n%s\n".format(mie.getMessage());
>              resp.getOutputStream().write(emsg.getBytes());
>              resp.getOutputStream().flush();
>              resp.getOutputStream().close();
>            }
>          }
>          catch (IOException | SQLException ioe) {
>          etc }
>
>> Obviously, the job needs to know how to execute itself (making it
>> Runnable means you can use the various Executors Java provides). Also,
>> you need to decide what to do about creating the executor.
>>
>> I used the ByteArrayOutputStream above to avoid the complexity of
>> re-scaling buffers in example code. If you have huge buffers and you
>> need to convert to byte[] at the end, then you are going to need 2x
>> heap space to do it. Yuck. Consider implementing the auto-re-sizing
>> byte-array yourself and avoiding ByteArrayOutputStream.
>>
>> There isn't anything magic about JNDI. You could also put the thread
>> pool directly into your servvlet:
>>
>> servlet {
>>   ThreadPoolExecutor sharedExecutor;
>>   constructor() {
>>     sharedExecutor = new ThreadPoolExecutor(...);
>>   }
>>   ...
>> }
>>
> Yes, I see now that the single real instance of the servlet can master
> the sharedExcutor.
>
> I have reliable threadpool code at hand.  I don't need to separate the
> job types:  In practice all the big ones are done first: they define the
> small ones.  It's when I'm spectacularly successful and two (2)
> investigators want to use the system ...

Sounds good.

But I am still confused as to what is taking 18 hours. None of the calls
above look like they should take a long time, given your comments.

>> If you want to put those executors into JNDI, you are welcome to do
>> so, but there is no particular reason to. If it's convenient to
>> configure a thread pool executor via some JNDI injection
>> something-or-other, feel free to use that.
>>
>> But ultimately, you are just going to get a reference to the executor
>> and drop the job on it.
>>
>>> Next up, is SSL.  One of the reason’s I must switch from my naked
>>> socket impl.
>>
>> Nah, you can do TLS on a naked socket. But I think using Tomcat
>> embedded (or not) will save you the trouble of having to learn a whole
>> lot and write a lot of code.
>>
> No thanks.
>> TLS should be fairly easy to get going in Tomcat as long as you
>> already understand how to create a key+certificate.
>>
> I've made keys/certs in previous lives (not to say I understand them).
> I'm waiting to hear on whether or not I'll be able to self-sign etc.
> Talking to AWS Monday on the things security/HIPAA

AWS may tell you that simply using TLS at the load-balancer (which is
fall-off-a-log easy; they will even auto-renew certificates with an
AWS-signed CA) should be sufficient for your needs. You may not have to
configure Tomcat for TLS at all.

> I'm sure I'll be back, but I think I can move forward.  Much appreciated.

Any time.

-chris



Re: Off-loading heavy process

Rob Sargent


>>> Calling save() from the servlet would tie-up the request-processing thread until the save completes. That's where you get your 18-hour response times, which is not very HTTP-friendly.
>> Certainly don't want to pay for 18 EC2 hours of idle.
>
> So your clients spin-up an EC2 instance just to send the request to your server? That sounds odd.
>
Maybe more than you want to hear:  I fill an AWS Queue with job definitions.  Each job is run on a separate EC2 instance, pulls an id or two from the job def/command line and requests data from the database.  It uses that data to run simulations and sends the analysis of the simulations back to the database.  If I didn’t spin the work off to the ThreadPoolExec, the “large” version would have to wait for many, many records to be saved.  I avoid this.  (I actually had to look back to see where the “18 hours” came from...)
>>>
>> The two payloads are impls of an a base class. Jackson/ObjectMapper unravels them to Type. Type.save();
>
> Okay, so it looks like Type.save() is what needs to be called in the separate thread (well, submitted to a job scheduler; just get it off the request processing thread so you can return 200 response to the client).
>

Yes, I think I’m covered once I re-establish TPExec.

>>>> That’s the thinking behind the question of accessing a ThreadPoolExecutor via JDNI.  I know my existing impl does queue jobs so (so the load is greater than the capacity to handle requests).  I worry that without off-loading Tomcat would just spin up more servlet threads, exhaust resources.  I can lose a client, but would rather not lose the server (that looses all clients...)
>>>
>>> Agreed: rejecting a single request is preferred over the service coming down -- and all its in-flight jobs with it.
>>>
>>> So I think you want something like this:
>>>
>>> servlet {
>>>   post {
>>>     // Buffer all our input data
>>>     long bufferSize = request.getContentLengthLong();
>>>     if(bufferSize > Integer.MAX_VALUE || bufferSize < 0) {
>>>       bufferSize = 8192; // Reasonable default?
>>>     }
>>>     ByteArrayOutputStream buffer = new ByteArrayOutputStream((int)bufferSize);
>>>
>>>     int count;
>>>     byte[] buffer = new byte[8192];
>>>     while(-1 != (count = in.read(buf)) {
>>>         buffer.write(buf, 0, count);
>>>     }
>>>
>>>     // All data read: tell the client we are good to go
>>>     Job job = new Job(buffer);
>>>     try {
>>>       sharedExecutor.submit(job); // Fire and forget
>>>
>>>       response.setStatus(200); // Ok
>>>     } catch (RejectedExecutionException ree) {
>>>       response.setStatus(503); // Service Unavailable
>>>     }
>>>   }
>>> }
>>>
>> This is working:
>>       protected void doPost(HttpServletRequest req, HttpServletResponse
>>    resp) /*throws ServletException, IOException*/ {
>>         lookupHostAndPort();
>>         Connection conn = null;
>>         try {
>>           ObjectMapper jsonMapper = JsonMapper.builder().addModule(new
>>    JavaTimeModule()).build();
>>           jsonMapper.setSerializationInclusion(Include.NON_NULL);
>>           try {
>>             AbstractPayload payload =
>>    jsonMapper.readValue(req.getInputStream(), AbstractPayload.class);
>>             logger.error("received payload");
>>             String redoUrl =
>>    String.format("jdbc:postgresql://%s:%d/%s", getDbHost(),
>>    getDbPort(), getDbName(req));
>>            Connection copyConn = DriverManager.getConnection(redoUrl,
>>    getDbRole(req), getDbRole(req)+getExtension());
>
> So it's here you cannot pool the connections? What about:
>
>    Context ctx = new InitialContext();
>
>    DataSource ds = (DataSource)ctx.lookup("java:/comp/env/jdbc/" + getJNDIName(req));

I’ll see if I need this (If I’m never getting a pooled connection).  But JNDI is not a good place for the “second investigator’s name (et al)"
>
> Then you can define your per-user connection pools in JNDI and get the benefit of connection-pooling.
>
>>             payload.setConnection(copyConn);
>>             payload.write();
>
> Is the above call the one that takes hours?

The beginning of it for sure.  The COPY work happens pleasantly quickly but does need its own db connection.  Payload says thanks, then goes on to using the temp tables filled by COPY to write to the real tables.  This is the slow part, as we can be talking about millions of records inserted into/updating an indexed table. (This is done in 1/16ths. Don’t ask how.)

>
>>             //HERE THE CLIENT IS WAITING FOR THE SAVE.  Though there
>>    can be a lot of data, COPY is blindingly fast
>
> Maybe the payload.write() is not slow. Maybe? After this you don't do anything else...
>
>>             resp.setContentType("plain/text");
>>             resp.setStatus(200);
>>             resp.getOutputStream().write("SGS_OK".getBytes());
>>             resp.getOutputStream().flush();
>>             resp.getOutputStream().close();
>>           }
>>             //Client can do squat at this point.
>>           catch
>>    (com.fasterxml.jackson.databind.exc.MismatchedInputException mie) {
>>             logger.error("transform failed: " + mie.getMessage());
>>             resp.setContentType("plain/text");
>>             resp.setStatus(461);
>>             String emsg = "PAYLOAD NOT
>>    SAVED\n%s\n".format(mie.getMessage());
>>             resp.getOutputStream().write(emsg.getBytes());
>>             resp.getOutputStream().flush();
>>             resp.getOutputStream().close();
>>           }
>>         }
>>         catch (IOException | SQLException ioe) {
>>         etc }
>>> Obviously, the job needs to know how to execute itself (making it Runnable means you can use the various Executors Java provides). Also, you need to decide what to do about creating the executor.
>>>
>>> I used the ByteArrayOutputStream above to avoid the complexity of re-scaling buffers in example code. If you have huge buffers and you need to convert to byte[] at the end, then you are going to need 2x heap space to do it. Yuck. Consider implementing the auto-re-sizing byte-array yourself and avoiding ByteArrayOutputStream.
>>>
>>> There isn't anything magic about JNDI. You could also put the thread pool directly into your servvlet:
>>>
>>> servlet {
>>>   ThreadPoolExecutor sharedExecutor;
>>>   constructor() {
>>>     sharedExecutor = new ThreadPoolExecutor(...);
>>>   }
>>>   ...
>>> }
>>>
>> Yes, I see now that the single real instance of the servlet can master the sharedExcutor.
>> I have reliable threadpool code at hand.  I don't need to separate the job types:  In practice all the big ones are done first: they define the small ones.  It's when I'm spectacularly successful and two (2) investigators want to use the system ...
>
> Sounds good.
>
> But I am still confused as to what is taking 18 hours. None of the calls above look like they should take a long time, given your comments.

I think I've explained the slow part above.  TL;DR: DB writes are expensive

>
>>> If you want to put those executors into JNDI, you are welcome to do so, but there is no particular reason to. If it's convenient to configure a thread pool executor via some JNDI injection something-or-other, feel free to use that.
>>>
>>> But ultimately, you are just going to get a reference to the executor and drop the job on it.
>>>
>>>> Next up, is SSL.  One of the reason’s I must switch from my naked socket impl.
>>>
>>> Nah, you can do TLS on a naked socket. But I think using Tomcat embedded (or not) will save you the trouble of having to learn a whole lot and write a lot of code.
>>>
>> No thanks.
>>> TLS should be fairly easy to get going in Tomcat as long as you already understand how to create a key+certificate.
>>>
>> I've made keys/certs in previous lives (not to say I understand them). I'm waiting to hear on whether or not I'll be able to self-sign etc. Talking to AWS Monday on the things security/HIPAA
>
> AWS may tell you that simply using TLS at the load-balancer (which is fall-off-a-log easy; they will even auto-renew with an AWS-signed CA), which should be sufficient for your needs. You may not have to configure Tomcat for TLS at all.
>
I will definitely bring this up.  Thanks.
>> I'm sure I'll be back, but I think I can move forward.  Much appreciated.
>

> Any time.
>
> -chris
>
Same threat as before ;)
Thanks a ton,

rjs





Re: Off-loading heavy process

Christopher Schultz-2
Rob,

Apologies for the top-post, but at this point I think (a) you are
satisfied you are on the right track and (b) I have become more
confused. Given that (a) is much more important than (b), we can just
leave it at that.

:)

Feel free to come back for further clarifications or suggestions if you
need them.

Good luck,
-chris

On 12/14/20 11:32, Rob Sargent wrote:

> [previous message quoted in full; snipped]



Re: Off-loading heavy process

Rob Sargent
Oh, oh.  If you're confused, then likely I am and don't know it.  When I
finally realize the point of confusion, I'll come crawling back.  (Would
love to un-confuse you, but I think I've proven inadequate there.)

Cheers, and thanks for all your time and help.

rjs

On 12/14/20 3:31 PM, Christopher Schultz wrote:

> Rob,
>
> Apologies for the top-post, but at this point I think (a) you are
> satisfied you are on the right track and (b) I have become more
> confused. Given that (a) is much more important than (b), we can just
> leave it at that.
>
> :)
>
> Feel free to come back for further clarifications of suggestions if
> you need them.
>
> Good luck,
> -chris
>
