[logback-dev] [JIRA] Commented: (LBCLASSIC-158) Interleaved messages from multiple threads

Michael Olson (JIRA) noreply-jira at qos.ch
Mon Oct 5 21:12:44 CEST 2009


    [ http://jira.qos.ch/browse/LBCLASSIC-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11324#action_11324 ] 

Michael Olson commented on LBCLASSIC-158:
-----------------------------------------

Update: I was going through my logs this morning to see what kinds of errors were generated over the weekend.  I just found the first instance in my RollingFileAppender log of interleaved messages before the shutdown sequence is initiated; here is the log excerpt on pastebin:
http://logback.pastebin.com/f41292fea

And here it is again just included, since pastebin has been acting funny:
2009-10-03 06:09:06,464 [main]       DEBUG               Eventweb - IQ: 2       OQ: 0
2009-10-03 06:09:09,786 [Calais-02]  DEBUG        CalaisSubmitter - Caught ; nested exception is: 
        java.net.ConnectException: Connection timed out while submitting content of size 2,094; retry 10
2009-10-03 06:09:09,7862009-10-03 06:09:09,7862009-10-03 06:09:09,786 [ [ [Calais-02Calais-01Calais-03]]] WARN         CalaisSubmitter - Too many NON_FATAL;  backing off system for a minute (last: ; nested exception is: 
        java.net.ConnectException: Connection timed out on size: 2,094)
2009-10-03 06:09:09,7862009-10-03 06:09:09,7862009-10-03 06:09:09,786 [ [ [Calais-02Calais-01Calais-03]]] DEBUG        CalaisSubmitter - Caught ; nested exception is: 
        java.net.ConnectException: Connection timed out while submitting content of size 613; retry 6
2009-10-03 06:09:09,7862009-10-03 06:09:09,7862009-10-03 06:09:09,786 [ [ [Calais-02Calais-01Calais-03]]] DEBUG        CalaisSubmitter - Caught ; nested exception is: 
        java.net.ConnectException: Connection timed out while submitting content of size 1,071; retry 10
2009-10-03 06:09:09,786 [Calais-01]  WARN         CalaisSubmitter - Too many NON_FATAL;  backing off system for a minute (last: ; nested exception is: 
        java.net.ConnectException: Connection timed out on size: 1,071)
2009-10-03 06:09:09,786 [Calais-05]  DEBUG        CalaisSubmitter - Caught ; nested exception is: 
        java.net.ConnectException: Connection timed out while submitting content of size 213; retry 10
2009-10-03 06:09:09,786 [Calais-05]  WARN         CalaisSubmitter - Too many NON_FATAL;  backing off system for a minute (last: ; nested exception is: 
        java.net.ConnectException: Connection timed out on size: 213)
2009-10-03 06:09:16,465 [main]       DEBUG               Eventweb - IQ: 2       OQ: 0

This makes it clear that Thread.interrupt() is not actually the problem, as that function is never called until the shutdown sequence.

> Interleaved messages from multiple threads
> ------------------------------------------
>
>                 Key: LBCLASSIC-158
>                 URL: http://jira.qos.ch/browse/LBCLASSIC-158
>             Project: logback-classic
>          Issue Type: Bug
>          Components: appender
>    Affects Versions: 0.917
>         Environment: Mac OS X 10.5.8, Eclipse 20090920-1017, Java SE 6, Mac Pro with 2 processors with 4 cores each
>            Reporter: Michael Olson
>            Assignee: Logback dev list
>            Priority: Critical
>
> Note: I'm using pastebin so that the formatting is not messed with by JIRA and so that I can refer to line numbers.  My apologies if that makes it more difficult to examine my files.
> I've just moved to logback from log4j (which I was using only with the default configuration) using the slf4j wrappers.  I'm using a configuration which is a slightly changed version of one found in the documentation; the configuration can be found here:
> http://logback.pastebin.com/m1a6f2875
> These two log file excerpts are from that configuration:
> http://logback.pastebin.com/m5e08cb57 - note lines 3 and 4
> http://logback.pastebin.com/m7e575564 - note lines 2, 3, and 4
> Then this configuration, which only changes the class levels, is for the next log (I wanted to be precise):
> http://logback.pastebin.com/f6195004e
> This log file is from this morning:
> http://logback.pastebin.com/f42bdb83f
> Note lines 2 and 3.
> You'll notice, I'm sure, that these are all from the end of my log files; I have not observed this behavior elsewhere in my logs.  The observed problems have been during my shutdown procedure.  Specifically, the class PostItemSubmitter, with 4 threads, is waiting on input from CalaisSubmitter (it is trying to retrieve an object from a LinkedBlockingQueue with a get() operation).  In these particular logs, CalaisSubmitter has run into fatal errors (over SLA, whoops) and has put no data in the queue, so all four threads are currently blocked on their input queue.
> After all of the CalaisSubmitter threads are shutdown, the main thread tries to shut down the PostItemSubmitter threads.  It does this by calling a function in the thread's instance which sets a boolean value to false that controls the continued execution of the main program loop.  After it has set those variables, the main thread will check the size of the queue; if it is 0, which it is in this case, then it will interrupt any threads that have not stopped after one second.  It will do this immediately, so all four threads will receive Thread.interrupt() calls in sequence.
> When those threads are interrupted, they throw an InterruptedException.  The catch statement looks like this:
> 	catch (InterruptedException e) {
> 		if (running) logger.error("Interrupted getting item from queue");
> 		else logger.info("Shutdown aborted queue operation");
> 	}
> In at least two of these three logs, it seems clear that the problem lies in how the interrupted threads are using the static logger defined for their class, which looks like this:
> 	private static Logger logger = LoggerFactory.getLogger(PostItemSubmitter.class);
> In the other log, the interleaving comes with the CalaisSubmitter, but you can see at least one aborted queue operation, so the reason could be similar (that class is also interrupted on an empty queue and catches those interrupts in the same way).
> Let me know if you need any more information to help diagnose the problem.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.qos.ch/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


More information about the logback-dev mailing list