[logback-user] Logback becomes excruciatingly slow during rollover
Vesuvius Prime
vesuvius_prime at hotmail.com
Wed May 11 10:02:50 UTC 2016
Hi,
Thank you for the suggestion. This was the first thing I had suggested to the customer during the live session with them, but they said they didn't want to be a "testbed" for us, so they didn't do it. I have now asked our support guys to ask the customer again, and this time the customer agreed. After waiting a while for the logs to roll over, here's the feedback:
"Hello, Customer just provided an update. Log file is not rolling over, however it has grown to over 90mb, but the [product] has not slowed down."
So, there's an improvement -- at least now we're not becoming unusable due to excessive slowness. However, the log grows without a limit. The bottom line is that this is an improvement, but not a solution.
I would still love for somebody with deep knowledge of Logback (presumably its authors) to provide some feedback here. For example, if this is a known problem that has been solved in some version of Logback, we may go ahead and modify the Virgo server to use that version. Alternatively, if Logback is unreliable, we may have to switch to some other logging framework. I've been seeing various reports about rollover problems, even on this mailing list. We can't afford to lose customers because of logging. Besides, this creates a really bad image for Logback itself.
Thank you for reading this.
________________________________________
Date: Thu, 28 Apr 2016 09:26:32 -0400
From: Chris Hamilton <chamilton-dev at compucafe.com>
To: logback users list <logback-user at qos.ch>
Subject: Re: [logback-user] Logback becomes excruciatingly slow during rollover
Message-ID: <D1E6C509-E709-45E4-8381-D9751F4E477B at compucafe.com>
Content-Type: text/plain; charset="windows-1252"
I notice that you have defined the files to be rolled with a .zip extension.
We are using Logback 1.0.13 in our project and experienced issues with log rolling causing the JVM to effectively halt while the rolled log file was being compressed (we were using GZ). It appears that the rolling of the log is protected by a synchronized lock (makes sense) and the compression routine is called within that lock (not good). My only evidence of this is empirical because, as I said, everything else just stopped.
We resolved the problem by simply not using compression in the Logback configuration. This was done by removing the .gz extension (.zip in your case) from the FileNamePattern tag.
We then wrote a cron job to find all the uncompressed log files and compress them. This moves the compression of old log files from your application’s process to another [cron initiated] process that is not (as) mission critical.
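Applied to the appender quoted further down, the change would look roughly like this (just a sketch -- keep your own paths, window indexes and pattern; the only difference is that the FileNamePattern no longer ends in .zip, so Logback does not compress during rollover):

    <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
        <!-- no .zip/.gz suffix, so Logback rolls the file without compressing it -->
        <FileNamePattern>D:/logs/product_%i.log</FileNamePattern>
        <MinIndex>0</MinIndex>
        <MaxIndex>9</MaxIndex>
    </rollingPolicy>

The rolled files then stay as plain .log files, and an external job (cron or similar) can compress them on its own schedule, outside the application's process.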
You might want to give this a try...
> On Apr 26, 2016, at 3:13 PM, Vesuvius Prime <vesuvius_prime at hotmail.com> wrote:
>
> An update:
>
> The customer disabled Splunk and can still reproduce the problem. They also report that Process Explorer indicates that no other process holds a handle to the log files.
>
>
> From: Vesuvius Prime <vesuvius_prime at hotmail.com>
> Sent: Sunday, April 24, 2016 6:57 PM
> To: logback-user at qos.ch
> Subject: Logback becomes excruciatingly slow during rollover
>
> Hi all,
>
> We use Virgo server and it comes with Logback 1.0.7.v20121108-1250. One of the users of our product started complaining that after about a week of using our product, it was becoming so slow that it was basically unusable. We had a live session with the client and it turned out that the problem was in Logback. The client uses Windows. So far the problem has not been reported on Linux. Here's what I did to determine that Logback was the cause of the problem:
>
> 1. The appender is defined as:
>
> <appender name="LOG_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
> <file>D:/logs/product.log</file>
> <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
> <FileNamePattern>D:/logs/product_%i.log.zip</FileNamePattern>
> <MinIndex>0</MinIndex>
> <MaxIndex>9</MaxIndex>
> </rollingPolicy>
> <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
> <MaxFileSize>50MB</MaxFileSize>
> </triggeringPolicy>
> <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
> <Pattern>[%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}] [%-5level] %-28.28thread %X{sessionGuid} %-64.64logger{64} %X{medic.eventCode} %msg %ex%n</Pattern>
> </encoder>
> </appender>
>
> In the appender configuration above, I modified the directories and log names a little bit, so that the actual product name would not be visible.
>
> 2. All the rolled-over zip files were generated and present, and the main log file, product.log, had exceeded its maximum size of 50 MB and had reached 64 MB. We had something like:
>
> product.log <------- 64 MB
> product_0.log.zip
> product_1.log.zip
> product_2.log.zip
> product_3.log.zip
> product_4.log.zip
> product_5.log.zip
> product_6.log.zip
> product_7.log.zip
> product_8.log.zip
> product_9.log.zip
>
> 3. This was suspicious, so I examined the log file and saw many messages that were logged out of order, e.g.
>
> 09:55:43.676
> 09:55:42.836
> 09:55:46.165
> 09:55:44.583
> ...
>
> Furthermore, even the simplest operations, e.g. simply sending a redirect response back to the browser, were taking several seconds. (However, before sending the redirect, we were logging the URL to which we were redirecting, so I suspected that the logger was extremely slow.)
>
> 4. I moved the archived log files to a different directory and they got recreated *immediately*. I did this 7-8 times, each time with exactly the same result -- the .zip files got recreated immediately.
>
> 5. The main log, product.log, was locked by the JVM, so I couldn't move it or delete it. However, I opened it with a text editor, selected the whole content (Ctrl+A), deleted it (Del) and saved the file. As soon as I did that, the performance problems disappeared. The product began working normally. Also, the log file started growing at a normal speed again. Previously, it had been growing super-slowly.
>
>
> So, we told the users to do the following workaround when the preconditions from step 2 above were reached and the product became very slow:
>
> (1) Open the main log file, select all, delete, save.
> (2) Delete all archived .zip files
>
> This workaround has been working fine so far. However, we still don't know why Logback behaves like this. That's why I'm writing this email. Can someone, who knows the Logback source code, figure out what might be happening and what triggers it?
>
> Here are some additional details that we've been able to figure out:
>
> 1. When the users were experiencing the problem during the live session with them, I opened Process Explorer and saw that Splunk was holding a lock on the main log file -- product.log. I asked the users to kill the Splunk service and they did, but that didn't solve the problem. However, depending on the way Logback is written, it may turn out that it did indeed go crazy because of Splunk, but that *after* it goes crazy, killing the other process that holds a lock on the file doesn't help.
>
> 2. We managed to reproduce the same behavior in our company by letting two different appenders write to the same log file. Then, as soon as the preconditions from step 2 above were reached, everything became extremely slow, the logs couldn't roll over and the main log started growing beyond its specified limit of 50 MB. However, the users do *not* have such a configuration in which two appenders write to the same file. I'm only giving you information on how we managed to reproduce the slowness in case it might help you figure out what's wrong. The end result is the same -- Logback slows down to a crawl, but apparently the trigger in our users' environment is different and it's not related to appenders writing to the same file.
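>
> For illustration only, the reproduction setup was roughly of the following shape: two RollingFileAppenders pointing at the same active <file>. The appender names, archive patterns, level and encoder pattern below are made up for this sketch, not our actual configuration:
>
> <configuration>
>     <appender name="LOG_FILE_A" class="ch.qos.logback.core.rolling.RollingFileAppender">
>         <file>D:/logs/product.log</file>
>         <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
>             <FileNamePattern>D:/logs/product_a_%i.log.zip</FileNamePattern>
>             <MinIndex>0</MinIndex>
>             <MaxIndex>9</MaxIndex>
>         </rollingPolicy>
>         <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
>             <MaxFileSize>50MB</MaxFileSize>
>         </triggeringPolicy>
>         <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
>             <Pattern>%d [%-5level] %logger - %msg%n</Pattern>
>         </encoder>
>     </appender>
>     <!-- second appender: same policies, but it writes to the SAME active file -->
>     <appender name="LOG_FILE_B" class="ch.qos.logback.core.rolling.RollingFileAppender">
>         <file>D:/logs/product.log</file>
>         <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
>             <FileNamePattern>D:/logs/product_b_%i.log.zip</FileNamePattern>
>             <MinIndex>0</MinIndex>
>             <MaxIndex>9</MaxIndex>
>         </rollingPolicy>
>         <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
>             <MaxFileSize>50MB</MaxFileSize>
>         </triggeringPolicy>
>         <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
>             <Pattern>%d [%-5level] %logger - %msg%n</Pattern>
>         </encoder>
>     </appender>
>     <root level="INFO">
>         <appender-ref ref="LOG_FILE_A" />
>         <appender-ref ref="LOG_FILE_B" />
>     </root>
> </configuration>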
>
> 3. A thread dump while we were reproducing the problem (with two appenders writing to the same file) reveals the following:
>
> 3.1. One thread holds a lock (0x00000000dfa5d600):
>
> "Thread-48" #199 daemon prio=5 os_prio=0 tid=0x0000000024b61000 nid=0x772c runnable [0x000000002422e000]
> java.lang.Thread.State: RUNNABLE
> at java.util.zip.Deflater.deflateBytes(Native Method)
> at java.util.zip.Deflater.deflate(Deflater.java:442)
> - locked <0x00000000fdb8dcf0> (a java.util.zip.ZStreamRef)
> at java.util.zip.Deflater.deflate(Deflater.java:364)
> at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:251)
> at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
> at java.util.zip.ZipOutputStream.write(ZipOutputStream.java:331)
> - locked <0x00000000fdb8da00> (a java.util.zip.ZipOutputStream)
> at ch.qos.logback.core.rolling.helper.Compressor.zipCompress(Compressor.java:110)
> at ch.qos.logback.core.rolling.helper.Compressor.compress(Compressor.java:58)
> at ch.qos.logback.core.rolling.FixedWindowRollingPolicy.rollover(FixedWindowRollingPolicy.java:147)
> at ch.qos.logback.core.rolling.RollingFileAppender.rollover(RollingFileAppender.java:111)
> - locked <0x00000000dfa5d460> (a ch.qos.logback.core.spi.LogbackLock)
> at ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:144)
> - locked <0x00000000dfa5d600> (a ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy)
> at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
> at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
> at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
> at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:280)
> at ch.qos.logback.classic.Logger.callAppenders(Logger.java:267)
> at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:449)
> at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:403)
> at ch.qos.logback.classic.Logger.log(Logger.java:803)
> at org.apache.commons.logging.impl.SLF4JLocationAwareLog.warn(SLF4JLocationAwareLog.java:185)
> ...
>
> 3.2. A multitude of threads are blocked and waiting to acquire that lock, e.g.:
>
> "Thread-55" #206 daemon prio=5 os_prio=0 tid=0x000000001f50d000 nid=0x7288 waiting for monitor entry [0x00000000263ef000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:142)
> - waiting to lock <0x00000000dfa5d600> (a ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy)
> at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
> at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
> at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
> at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:280)
> at ch.qos.logback.classic.Logger.callAppenders(Logger.java:267)
> at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:449)
> at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:403)
> at ch.qos.logback.classic.Logger.log(Logger.java:803)
> at org.apache.commons.logging.impl.SLF4JLocationAwareLog.warn(SLF4JLocationAwareLog.java:185)
> ...
>
>
>
> I have attached the logging configuration from the Virgo server. It's called serviceability.xml. (That's the default configuration file name for the Virgo server.)
>
> I'm sorry the mail turned out so long. I hope this will not discourage you from reading it. I will be extremely grateful if you guys can help us figure out what makes Logback go crazy.