[logback-user] Unsuccessful delete in FixedWindowRollingPolicy causes disk trashing and cpu at 100%

Carl-Erik Kopseng carlerik at diffia.com
Wed Jun 23 06:06:40 CEST 2021


My problem is now down to finding out how logback (or the java app) can
hold *two* file handles for the same file.

I managed to verify that having the file open in another process, like
glogg, when the rollover happened was enough to trigger the issue. Using
Process Explorer from SysInternals or the Resource Monitor would show the
file handle being used by both processes and after closing the other
process. Once the other process was closed, another logging statement would
then again trigger the rollover and delete the file (*usually*):
[image: Skjermbilde 2021-06-23 kl. 00.56.54.png][image: Skjermbilde
2021-06-23 kl. 00.50.34.png]

The problem is that sometimes this situation arose:

[image: Skjermbilde 2021-06-23 kl. 00.15.44.png]

There are three file handles, of which two belong to the java process.
Killing the external process this time did nothing, as the java process
itself prevented itself from deleting the file. I am not sure how this can
happen? This is the real issue here. Either it is one thread holding two
references or two threads holding one each. Trying to find out how to see
which code/class/instance owns these file handles is a problem of its own
<https://stackoverflow.com/questions/68091703/find-out-which-class-instance-holds-a-reference-to-an-open-file>.
If you have any tips on how I can find out, please give me a tip - not
rocking the OQL :)

Seeing that I am able to verify the issue quite easily on the server
(lowering the file size for rollover to 50KB and opening the file in a log
viewer) makes me optimistic that I should be able to verify it locally, as
well. I think one fix for this issue is to simply try and truncate the file
on failing to delete, as writing obviously works. That would get rid of the
entire "recompress entire log file on each log statement" behavior it
currently displays, without any immediate downsides, AFAIK.


tir. 22. jun. 2021 kl. 17:28 skrev Carl-Erik Kopseng <carlerik at diffia.com>:

> Hi, for some weeks we have observed some really mysterious behavior when
> enabling one of our appenders. This appender was configured specially to
> use the `FixedWindowRollingPolicy`. What we observed, was that after some
> time of debug logging, the CPU would suddenly hit the roof and have at
> least one core fully pegged at 100%. Using Jprofiler we found the issue to
> be various cases of `logger.debug()` that would be used by this special
> appender. Disabling the appender immediately fixed the issue (as we have
> `scan=true`)
>
> I debugged this yesterday and found the answer partly in the logs and in
> the logback code. This is the line that basically shows the issue:
>
>    if (!file2zip.delete()) {
>         addStatus(new WarnStatus("Could not delete [" + nameOfFile2zip + "].", this));
>     }
>
>
> The logs were full of this and I also found references to this online. I
> documented the entire process and findings in this repo:
> https://github.com/fatso83/logback-bug. The repo also contains a minimal
> attempt at recreating what I saw on the Windows servers on my local
> machine.
>
> I was unable to reproduce this locally, which I kind of need in order to
> fix this bug. So, I was just wondering if any of you had any tips? What
> could cause the file to be "locked" / prevented from being deleted? Would
> it be enough if someone had a read-handle on the file, for instance a log
> reader like `glogg <https://glogg.bonnefon.org>`? I made sure to close my
> own programs when looking at this, but there are multiple users that look
> at these logs.
>
> There is a single JVM running this app.
>
> --
> Carl-Erik Kopseng
> Senior developer
> Diffia AS
>


-- 
Carl-Erik Kopseng
Senior developer
Diffia AS
m: +47 400 65 078 <+4740065078>
a: StartupLab, Gaustadalléen 21, 0349, Oslo, Norway
<https://maps.google.com/?q=Gaustadall%C3%A9en+21,+0349,+Oslo,+norway&entry=gmail&source=g>
<https://www.facebook.com/diffia/>
<https://www.linkedin.com/company/10602313/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.qos.ch/pipermail/logback-user/attachments/20210623/7686b396/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Skjermbilde 2021-06-23 kl. 00.15.44.png
Type: image/png
Size: 47713 bytes
Desc: not available
URL: <http://mailman.qos.ch/pipermail/logback-user/attachments/20210623/7686b396/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Skjermbilde 2021-06-23 kl. 00.56.54.png
Type: image/png
Size: 29248 bytes
Desc: not available
URL: <http://mailman.qos.ch/pipermail/logback-user/attachments/20210623/7686b396/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Skjermbilde 2021-06-23 kl. 00.50.34.png
Type: image/png
Size: 27878 bytes
Desc: not available
URL: <http://mailman.qos.ch/pipermail/logback-user/attachments/20210623/7686b396/attachment-0005.png>


More information about the logback-user mailing list