[logback-user] how to graceful shutdown logback with AsyncAppenders and without immediateFlush

Michael Reinhold mike at coursescheduler.io
Wed Mar 19 02:48:43 CET 2014


Hi Gena,

Thanks for the detailed response - I'll try to be as thorough in return!

I agree completely with you that a timeout of 0 (meaning wait for the
worker thread to join, regardless of the time it takes) makes a lot of
sense for local appenders, such as the RollingFileAppender example you
gave. On the other hand, when the appender has a high latency, perhaps
because the appender is sending the log events to a remote system
(database, cloud service, syslog daemon, etc), it may not be feasible for
the JVM to wait indefinitely for log events to flush. For instance, sending
log events to Loggly (http://www.loggly.com) can have latencies between 200
and 600ms. Depending on the queue depth, this could take a very long time
to flush completely. For some environments, it *may* be preferable to drop
log events over taking multiple minutes to flush the log queue. For other
environments, the log events may be important enough to stall the JVM
shutdown indefinitely. For this reason, we determined that it made the most
sense to make the join call timeout configurable, using the previously
hardcoded value as a default in order to keep the semantics the same. This
way, the application developer can determine what is correct for the
application by specifying an appropriate timeout value.

Of course any events posted to a logger after calling LoggerContext.stop
will be ignored - Logback only makes guarantees about events submitted to
loggers that have been started. Since LoggerContext.stop stops all loggers
and appenders, any subsequent log messages will not be appended to the
target, which makes perfect sense. The key aspect here is that
LoggerContext.stop should not be called unless no more log events are
expected. In some cases, the application may be able to call it right
before System.exit, but as discussed in other scenarios it may need to be
triggered via a shutdown hook.

I mis-understood your solution - I got the impression that you were
suggesting that the user code would have to change the Logback mode.

The shutdown hook scenario is something that I also run into. The scenarios
that you describe are completely valid! Any time an application has
background threads (user or daemon) or shutdown hooks that may log events,
there is the potential for lost log events due to timing / race conditions.

As you said, using a timeout delay for the shutdown hook clearly is
non-optimal for some scenarios (though it may be acceptable for others).
The upside to a delayed cleanup shutdown hook is that the user can profile
their application and determine a reasonable value for the delay parameter.
Obviously this is still not failsafe, but with appropriate understanding of
the work that the shutdown hook is attempting to do and the maximum amount
of time that the task should take, a delay cleanup hook *could* work within
the requirements of the application.

Obviously this does not solve all scenarios for all applications. As a
result, the ShutdownHookAction that I am adding to Joran will operate in a
manner similar to the AppenderAction in that it will utilize a class
parameter that defines which ShutdownHook implementation should be
instantiated. For instance, if the application can reasonably expect that
the other shutdown hooks will complete in 500 ms, a DelayedCleanupHook
could be used with a delay twice the expected runtime of the other hooks:

<shutdownHook class="ch.qos.logback.classic.spi.DelayingCleanupHook">
<delay>1000</delay>
</shutdownHook>

If the application controlled all of the other shutdown hooks (no
third-party shutdown hooks are utilized or need to log events), then a
custom ShutdownHook could be written that looks for some condition set by
the other shutdown hooks:

<shutdownHook class="com.foo.package.CustomCleanupHook">
</shutdownHook>

where the CustomCleanupHook checks against some globally accessible state
(say a boolean value) to determine if it is safe to stop the LoggerContext.

Because of this flexibility and extensibility, I don't believe that the
ShutdownHookAction would be useless in your scenario. Depending on your
application needs and specific scenario, you should be able to define a
ShutdownHook implementation that meets your requirements. Generally
speaking, it is difficult to define a "universally perfect" solution
because every application has different requirements and acceptable
tolerances.

Perhaps with more information about your specific scenario, we could come
up with a ShutdownHook design that covers your needs adequately and include
that in the pull request. There are other ShutdownHooks that I am
interested in developing for inclusion such as:

   - idle time based shutdown hook - when all Loggers have been idle (no
   new events appender for a configurable time period), then trigger
   LoggerContext.stop. This would require some level of Logback internal
   statistics in order to implement.
   - thread count shutdown hook - for applications where the number of
   active threads is deterministic, it may be acceptable to have the
   LoggerContext stop once all non-essential threads have stopped. Threads can
   be queried via the ThreadMXBean API, though the number and function of the
   JVM internal threads may vary by implementation, making this design
   difficult
   - synchronized variable state shutdown hook - if other shutdown hooks &
   threads can set a variable indicating their state, perhaps a semaphore or
   counting lock indicating the number of open threads, which will stop the
   LoggerContext when all of the locks are released. This would require
   co-operation of the other threads and shutdown hooks - limited ability to
   support Third-Party threads or hooks...

I hope this helps to explain how the ShutdownHook design is supposed to
actually work. I think that it should provide a foundation that will
support most scenarios. That being said, for some use-cases it may be
necessary to design additional flexibility into Logback - for instance, my
described idle time based shutdown hook would require some way of tracking
logger use statistics which could be a sizable modification.

Thank you again for your input! I look forward to hearing your thoughts on
the described framework.

Regards,

Mike Reinhold


On Tue, Mar 18, 2014 at 6:08 PM, Gena Makhomed <gmm at csdoc.com> wrote:

> On 18.03.2014 17:27, Michael Reinhold wrote:
>
>  In my opinion, the solution that Ceki and I have been working on is
>> somewhat simpler and probably easier to implement. The full conversation
>> can be viewed in the pull request comments, but basically we have
>> settled on a small alteration to AsyncAppenderBase in Logback-core that
>> makes the worker.join(long) call have a configurable timeout instead of
>> hardcoded to 1000ms. By doing this, LoggerContext.stop is now guaranteed
>> to flush as many queued events as possible within the specified timeout
>> - which can be tuned via the maxShutdownTime configuration element for
>> AsyncAppender. Additionally, since the maxShutdownTime configuration
>> element has the same semantics as Thread.join(long), using a value of 0
>> will cause the LoggerContext.stop method to wait for all queued events
>> to flush regardless of how long it takes.
>>
>
> If using AsyncAppender with RollingFileAppender on local file system -
> IMHO the best solution is to flush all events from AsyncAppender queue
> with call Thread.join(0) - without any possible information lost here.
> Queued log events is more valuable than JVM process shutdown speed.
> I can't find any reason to change this timeout from 0 to anything else.
>
> Ok, but this is partial solution - it only flush
> events which already in the AsyncAppender queue.
>
> After LoggerContext.stop() finished - all subsequent logger.info(...)
> calls will lost information and do not write messages into log file.
>
>
>  This solution is better in my
>> mind because it involves very few code changes and does not alter
>> shutdown semantics for any existing users (no additional calls to change
>> the Logback mode), while still making it possible for AsyncAppender to
>> fully flush its queue by adjusting the timeout.
>>
>
> In my proposed solution also not need any additional calls to change
> the Logback mode - all work will be done by Logback shutdown handler,
> transparently and completely invisible for all logback users.
>
>
>  The second piece of the design that we have been working on is how to
>> allow for flushing the queue when it is not possible to call
>> LoggerContext.stop prior to application exit (such as when an
>> application exits implicitly or from) or when shutdown hooks expect the
>> ability to log events. This is a little more complex, but per Ceki's
>> description in the pull request comments, adding a new action to Joran
>> to allow the installation of a shutdown hook that calls
>> LoggerContext.stop is pretty flexible.
>>
>
> "when shutdown hooks expect the ability to log events"
> - yes, this is my case.
>
> also, if System.exit() called manually - in this case any application
> thread expect the ability to log events - they don't know about exiting.
>
> in any case of shutdown hook, after calling LoggerContext.stop
> - all rest events will be lost.
>
> how to solve this problem/bug - I describe in previous message:
>
> do not call LoggerContext.stop at all, and just turn logback into
> failsafe mode, with AsyncAppender working in transparent sync mode,
> and with all FileAppenders turned immediateFlush into true.
>
> in attach - TestApplication.java with two shutdown hooks.
>
> imagine, what logback shutdown hook wait 2 seconds
> before call LoggerContext.stop
> and application shutdown hook work 10 seconds before completion.
>
> so - all events logged from application shutdown hook
> (and all other live application threads) will be lost.
>
> add many odd timeouts to logback is not optimal solution, IMHO,
> because it will be well known "race condition" and nonoptimal work:
>
> even if I force logback shutdown hook wait 15 seconds before
> LoggerContext.stop - this is not solve problem of events lost,
> because in rare cases application shutdown hook can work 20 seconds.
> - this is "race condition".
>
> also, if application shutdown hook complete work in 1 or 2 seconds -
> logback shutdown hook will force wait complete 15 seconds before
> application exit - this is nonoptimal work (unnecessary waste time).
>
> see http://www.qos.ch/shop/index
> "Quality Open Software [...] mission-critical enterprise software"
>
> possible workaround - call LoggerContext.stop() manually
> in own application shutdown hook after all work completed.
> but in this case, - ShutdownHookAction in joran will be useless.
>
> and this is partial workaround, it can help only in case
> of single application shutdown hook. if present multiple
> application/libraries shutdown hooks - we in trouble again.
>
> ===============================================================
>
> in my proposed solution - application exits as soon as possible,
> and without any lost events from all application shutdown hooks.
>
> event lost from (daemon) threads still possible,
> but this is unavoidable race condition between
> JVM successful exit and between (daemon) thread
> attempt to log any info.
>
> to minimize "deleterious consequences" all what we can
> do - is to forcibly set immediateFlush into true
> and call outputStream.flush() for all FileAppenders
> and forcibly turn AsyncAppenders into transparent sync mode.
>
> in this case is (daemon) threads attempt to log any info
> can cause partially written log file lines or not -
> I not investigate so deeply. Probably it can.
> In this case we lost only two log lines,
> this partially written, and first line
> of other process written to this log file in future.
>
> obvious workaround - always firstly call logger.info("")
> to write empty line to log file at every application startup.
> in this case we can lost only one partially written log file line.
>
> P.S.
>
> It is slightly uncomfortable to lost log events,
> so I prefer perfect or almost perfect solutions.
>
> right now I use such solution:
>
> public class LogbackConfigurator {
>
>     public void init() {
>
>         // ...
>
>         Runtime.getRuntime().addShutdownHook(new Thread(new
> LogbackShutdownHook(), "LogbackShutdownHook"));
>     }
>
>     private static class LogbackShutdownHook implements Runnable {
>
>         @Override
>         public void run() {
>             try {
>                 Thread.sleep(LOGBACK_STOP_DELAY_MILLIS);
>             } catch (InterruptedException ignored) {
>             }
>             LoggerContext loggerContext = (LoggerContext) LoggerFactory.
> getILoggerFactory();
>             loggerContext.stop();
>         }
>     }
> }
>
> but it is not perfect.
>
>  On Tue, Mar 18, 2014 at 6:37 AM, Gena Makhomed wrote:
>>
>>     Re: AsyncAppenderBase not flushing queue during JVM shutdown
>>
>>     Michael Reinhold,
>>     I can propose better solution of described problem:
>>
>>     we just need make two modes of work for logback:
>>
>>     1. normal mode, as it work all time after configuration finishing.
>>
>>     2. ready to shutdown mode, - fail safe mode, without messages lost.
>>
>>     switching from "normal mode" to "ready to shutdown mode"
>>     do via JVM shutdown hook, - just one shutdown hook for logback.
>>
>>     if logback turned from "normal mode" to "ready to shutdown mode",
>>     logback change own behavior in such way:
>>
>>     1. all appenders turn "immediateFlush = true",
>>     regardless of initial value in logback configuration,
>>     to prevent message lost during forthcoming JVM shutdown.
>>
>>     2. all AsyncAppenders turns into transparent and sync mode.
>>     if AsyncAppender work in this "transparent and sync mode",
>>     all behavior of
>>
>>     app -> AsyncAppender -> RollingFileAppender -> file
>>
>>     completly identical to behaviour of
>>
>>     app -> RollingFileAppender -> file
>>
>>     after switching from "normal mode" to "ready to shutdown mode",
>>     JVM shutdown hook finished work and now logback is ready
>>     to continue logging and also ready to forthcoming JVM shutdown.
>>
>>     all other components of system still can use logback
>>     for logging during JVM/application shutdown process.
>>
>>     here is only one unobvious nuance: how to transparently
>>     switch each logback AsyncAppender from async to sync mode
>>     without messages lost and without messages reordering in log.
>>
>>     but I am almost sure this is possible to do using java
>>
>>     without perceptible performance degrade for AsyncAppender
>>     when it work in normal (async) mode.
>>
>>     ==============================__==================
>>
>>
>>     IMHO, this is the most robust and most useful way to almost
>>     completly prevent any log messages lost during application and JVM
>>     shutdown.
>>
>>     LoggerContext.stop() not prevent messages lost,
>>     so it should not be called before JVM shutdown.
>>
>>     as I understand it mostly will be useful only for web applications,
>>     to be called in ServletContextListener.__contextDestroyed() callback.
>>
>>
>>     Let me know, what you think about this new proposed solution.
>>
>
> --
> Best regards,
>  Gena
>
> _______________________________________________
> Logback-user mailing list
> Logback-user at qos.ch
> http://mailman.qos.ch/mailman/listinfo/logback-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.qos.ch/pipermail/logback-user/attachments/20140318/73d43298/attachment-0001.html>


More information about the Logback-user mailing list