[logback-user] AsyncAppenderBase not flushing queue during JVM shutdown

Fri Mar 7 04:57:04 CET 2014

Hi David,

Pull request 177 <https://github.com/qos-ch/logback/pull/177> has a fairly
thorough implementation of a shutdown hook that handles flushing the queue
as discussed (a queue idle period used by the hook to decide that other
threads are done logging and a maximum runtime to prevent an excessive
delay during shutdown). I added a few unit tests to cover the queue idle
period, maximum runtime, and to verify worker and hook threads. I just
thought of one more unit test that should be added (to verify shutdown hook
installation and removal) - I'll add that tomorrow.

Regards,

Mike Reinhold

On Thu, Mar 6, 2014 at 2:16 PM, David Roussel
<nabble at diroussel.xsmail.com>wrote:

> Sounds good. Create a pull request and link back to it from this thread.
>
> On 6 Mar 2014, at 18:48, Michael Reinhold <mike at coursescheduler.io> wrote:
>
> Hi David, Chris,
>
> Thanks for the feedback. In the test case that I ran with the
> AsyncAppender, LoggerContext.stop did not wait for the async appender to
> finish flushing before exiting - the stop method returned immediately. I
> just retested this scenario to be sure. My test should generate exactly 135
> log events (also going to a local file appender, so I can confirm that
> entries are present or missing). Without the async appender I get all of
> the expected events (at much lower throughput); with the async appender
> (using the call to LoggerContext.stop prior to exiting) I get a variable
> number (generally less than 20 events). It's possible that the lower
> latency of flushing to local files results in the async appender being able
> to flush completely to disk before the JVM exits, whereas the latency to
> Loggly is just too high. Queue size probably also has an impact - I allow
> for a fairly large queue because the penalty of falling back to synchronous
> logging for Loggly is large and my logging tends to come in bursts.
>
> If stop() did wait for async appenders to flush, that would improve the
> situation. Although the use-case that involves shutdown hooks that want to
> log would still need to be addressed. The only way that I can see that does
> address allowing shutdown hooks to log is via a queue flushing shutdown
> hook (requiring a call to LoggerContext.stop won't help for shutdown hooks
> or applications that use JVM implicit shutdowns). My opinion is that a
> shutdown hook can satisfy both needs with minimal impact to other classes
> or semantics on how LoggerContext.stop functions. One hook per async
> appender would be simple to implement and would also allow for parallel
> flushing of async appender queues.
>
> A default shutdown timeout as you described makes sense to me and fits
> with what I was thinking for implementation of a shutdown hook.
>
> I could probably throw together a quick and dirty implementation tonight
> as a proof of concept. Thoughts?
>
> Regards,
>
> Mike Reinhold
>
>
>
> On Thu, Mar 6, 2014 at 1:01 PM, Chris Pratt <thechrispratt at gmail.com>wrote:
>
>> Another option might be to require calling the LoggerContext.stop when
>> using AsyncAppender's.  The stop method could signal to the AsyncAppenders
>> that they should drain their queue's and exit, which would allow the
>> application to shut down naturally (if the AsyncAppenders aren't started as
>> demon's)   (*Chris*)
>>
>>
>> On Thu, Mar 6, 2014 at 9:55 AM, David Roussel <
>> nabble at diroussel.xsmail.com> wrote:
>>
>>> Mike,
>>>
>>> I would expect the shutdown of logback to wait for async appenders to
>>> drain their queues before we can consider the shutdown down.
>>>
>>> I have used async appenders to provide faster logging when writing to
>>> local files. And that's what I expected happens.
>>>
>>> Of course there is the problem of waiting for too long. How about an
>>> async appender has a default shutdown timeout which can be overridden by
>>> config. So it can wait 2 seconds by default, but can be overridden to 30
>>> seconds for the loggly usecase.
>>>
>>> The simple solution is to put this all in AsyncAppender(Base), but if we
>>> want to allow multiple async appenders to shutdown in parallel, then the
>>> fix might involve more changes.
>>>
>>> David
>>>
>>> On 6 Mar 2014, at 12:51, Michael Reinhold <mike at coursescheduler.io>
>>> wrote:
>>>
>>> Hi David,
>>>
>>> I forgot to mention this in my original note and it is a good thought,
>>> but yes I have tried to cleanly shutdown Logback. Even when the
>>> LogbackContext stop method is invoked prior to JVM exit, the async queue is
>>> not flushed completely. When Logback is stopped cleanly, the async worker
>>> thread exits the "waiting for events in the queue" while loop and moves on
>>> to the "flush remaining elements from the queue" for loop (confirmed by the
>>> info message when debug is enabled and via breakpoint). Ultimately, because
>>> the thread is a daemon thread - this process is still interrupted by JVM
>>> shutdown. There is no guarantee that the async worker will get enough time
>>> to process all events.
>>>
>>> From a theoretical standpoint, stopping Logback properly does not solve
>>> flushing issues with the AsycAppender in a number of cases (it may be fair
>>> to argue that it should not attempt to cover all of these cases):
>>>
>>>    - Queue is relatively full or wrapped appender latency is high - the
>>>    async worker may not be able to finish processing the queued items between
>>>    when the LogbackContext.stop method is invoked and when the JVM actually
>>>    exits
>>>    - Implicit JVM exit upon end of last user thread - some applications
>>>    don't explicitly call System.exit but rather rely on implicit JVM exit. In
>>>    that case, where would you cleanly shutdown Logback? It can probably be
>>>    done, but would require an application redesign to avoid reliance on
>>>    implicit exit or some form of thread tracking thread.
>>>    - Shutdown hooks are used to clean up resources - some applications
>>>    have components or modules that need to clean up after themselves when the
>>>    JVM shuts down. Typically, shutdown hooks are used for this purpose,
>>>    however the AsyncAppender's worker thread is unable to process queued
>>>    events after shutdown has been initiated (while the shutdown hooks are
>>>    running). This also prevents shutdown hooks from being able to log events
>>>    - Signal handling on *nix systems - signals can be sent to the
>>>    application for a number of reasons, most often to request graceful
>>>    shutdown or to terminate something that is non-responsive. If the signal
>>>    corresponds to something that generally means "graceful shutdown" on the
>>>    host system, normal cleanup routines should execute in the application
>>>    (typically implemented as shutdown hooks, so maybe this is the same as the
>>>    previous one).
>>>
>>> From a practical standpoint, I am running into the first and the third
>>> scenarios. Even shutting down cleanly, the latency and queue depth can
>>> cause elements to be missed; additionally, I have shutdown hooks that clean
>>> up resources and expect the ability to log events and errors.
>>>
>>> My thought is that a modified AsyncAppenderBase implementation could
>>> (optionally) install a shutdown hook that continues watching the queue for
>>> new events. A configurable "queue idle time" could be used to tell the
>>> shutdown hook that it has probably caught all events and can safely
>>> shutdown (not foolproof, I'm still thinking about other ways of handling
>>> it). Additionally, the hook could have a configurable "max processing time"
>>> so that it doesn't keep the JVM alive for an undesirable amount of time
>>> (thinking in the context of automation software where shutdown requests
>>> typically have a max execution time before they are considered as "failed
>>> online").
>>>
>>> I hope this helps clarify the scenario I am considering and a proposed
>>> solution!
>>>
>>> Regards,
>>>
>>> Mike Reinhold
>>>
>>>
>>> On Wed, Mar 5, 2014 at 5:30 PM, David Roussel <
>>> nabble at diroussel.xsmail.com> wrote:
>>>
>>>> Did you try shutting down logback cleanly. Like this
>>>> http://stackoverflow.com/questions/3678755/do-i-need-to-flush-events-when-shutting-down-using-logback
>>>>
>>>> David
>>>>
>>>> On 5 Mar 2014, at 20:44, Michael Reinhold <mike at coursescheduler.io>
>>>> wrote:
>>>>
>>>> Hi Ceki,
>>>>
>>>> I am currently using the AsyncAppender in combination with the
>>>> LogglyAppender from the Logback extensions project. While working on a
>>>> number of aspects of my application, I found that I was not consistently
>>>> getting all of the log messages that I expected. In particular, when the
>>>> application shuts down immediately (or very shortly) after a burst of
>>>> logging activity, the tail of those log events is often not present in
>>>> Loggly. From a number of tests, this issue is not restricted to use with
>>>> the LogglyAppender, but is simply more evident because of the latency
>>>> involved.
>>>>
>>>> Looking through the source code for the AsyncAppenderBase, I saw the
>>>> following:
>>>>
>>>> You create the Async sender thread as a Daemon thread
>>>>
>>>>
>>>> addInfo("Setting discardingThreshold to " + discardingThreshold);
>>>>     worker.setDaemon(true);
>>>>     worker.setName("AsyncAppender-Worker-" + worker.getName());
>>>>     // make sure this instance is marked as "started" before staring
>>>> the worker Thread
>>>>     super.start();
>>>>     worker.start();
>>>>
>>>>
>>>> In the sender thread, if you determine that the parent thread has
>>>> stopped or the async sender thread has been interrupted, you allow the
>>>> worker thread to flush remaining log events in the queue.
>>>>
>>>>
>>>> while (parent.isStarted()) {
>>>>         try {
>>>>           E e = parent.blockingQueue.take();
>>>>           aai.appendLoopOnAppenders(e);
>>>>         } catch (InterruptedException ie) {
>>>>           break;
>>>>         }
>>>>       }
>>>>
>>>>       addInfo("Worker thread will flush remaining events before
>>>> exiting. ");
>>>>       for (E e : parent.blockingQueue) {
>>>>         aai.appendLoopOnAppenders(e);
>>>>       }
>>>>
>>>>       aai.detachAndStopAllAppenders();
>>>>
>>>>
>>>> From what I can tell, during JVM shutdown (for a standalone SE
>>>> instance, the same is probably not true for EE instances with application
>>>> servers) daemon threads may be terminated without allowing the the
>>>> AsyncAppenderBase to escape the while loop and proceed onto the queue flush
>>>> for loop.
>>>>
>>>> From Brian Goetz in Java Concurrency in Practice:
>>>> "When a thread exits, the JVM performs an inventory of running threads,
>>>> and if the only threads that are left are daemon threads, it initiates an
>>>> orderly shutdown. When the JVM halts, any remaining daemon threads are
>>>> abandoned finally blocks are not executed, stacks are not unwound the JVM
>>>> just exits. Daemon threads should be used sparingly few processing
>>>> activities can be safely abandoned at any time with no cleanup. In
>>>> particular, it is dangerous to use daemon threads for tasks that might
>>>> perform any sort of I/O. Daemon threads are best saved for "housekeeping"
>>>> tasks, such as a background thread that periodically removes expired
>>>> entries from an in-memory cache."
>>>>
>>>> To test this, I inserted a break point in the AsyncAppenderBase at the
>>>> call to addInfo and ran a variety of different scenarios. At no point in
>>>> time was I able to get the breakpoint to stop at the addInfo line.
>>>>
>>>> I don't think there are any clear cut solutions to this. Making the
>>>> worker thread a user thread instead of daemon thread has its own
>>>> implications, particularly that if all other user threads have exited the
>>>> async threads would keep the JVM instance alive (unless System.exit has
>>>> been called, in which case I believe that you will still have lost log
>>>> events even if the async processing thread is not a daemon). It might be
>>>> possible to create a shutdown hook that does the queue flushing for the
>>>> async worker - though you may need to consider the possibility of other
>>>> shutdown hooks wanting to log events as well.
>>>>
>>>> I'm currently mocking up a version of the AsyncAppenderBase and
>>>> AsyncAppender that installs a shutdown hook as described previously. I
>>>> think a "queue idle time" configuration might be the best way to handle
>>>> other shutdown hooks adding log events (aka - after processing whatever was
>>>> in the queue, if no new events are added within *x* ms then the
>>>> shutdown hook can assume nothing else will be adding log events and can
>>>> exit).  An alternative might be to have the shutdown hook query the
>>>> ThreadMXBean API to determine if other shutdown hooks are still executing
>>>> and possibly adding log events (though the threads that are expected to be
>>>> running during shutdown may be not only application specific but also JVM
>>>> implementation specific... I'm not sure).
>>>>
>>>> Let me know what you think. I'll let you know if I feel that my mockup
>>>> may be viable...
>>>>
>>>> Regards,
>>>>
>>>> Mike Reinhold
>>>>
>>>> _______________________________________________
>>>> Logback-user mailing list
>>>> Logback-user at qos.ch
>>>> http://mailman.qos.ch/mailman/listinfo/logback-user
>>>>
>>>>
>>>> _______________________________________________
>>>> Logback-user mailing list
>>>> Logback-user at qos.ch
>>>> http://mailman.qos.ch/mailman/listinfo/logback-user
>>>>
>>>
>>> _______________________________________________
>>> Logback-user mailing list
>>> Logback-user at qos.ch
>>> http://mailman.qos.ch/mailman/listinfo/logback-user
>>>
>>>
>>> _______________________________________________
>>> Logback-user mailing list
>>> Logback-user at qos.ch
>>> http://mailman.qos.ch/mailman/listinfo/logback-user
>>>
>>
>>
>> _______________________________________________
>> Logback-user mailing list
>> Logback-user at qos.ch
>> http://mailman.qos.ch/mailman/listinfo/logback-user
>>
>
> _______________________________________________
> Logback-user mailing list
> Logback-user at qos.ch
> http://mailman.qos.ch/mailman/listinfo/logback-user
>
>
>
> _______________________________________________
> Logback-user mailing list
> Logback-user at qos.ch
> http://mailman.qos.ch/mailman/listinfo/logback-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.qos.ch/pipermail/logback-user/attachments/20140306/c0400b4f/attachment-0001.html>