[logback-dev] About LBCORE-224...

Mon Oct 3 19:09:04 CEST 2011

On 30.09.2011, at 19:09, Holger Hoffstaette wrote:

> On Fri, 30 Sep 2011 18:36:54 +0200, Ceki Gülcü wrote:
> 
>> I don't see how anything a caller of logback, in this case zookeeper,
>> could do to cause the readLock to be unlocked before it is locked.
>> Actually, how could you trigger the observed behavior even if you wanted
>> to trigger it on purpose? Do you have any ideas?
> 
> The most common cause seems to be a swallowed inner exception. Several
> other findings seem to support this:
> 
> http://java.net/jira/browse/GLASSFISH-1881

They had a bug in their codebase that they subsequently fixed.

> http://lists.terracotta.org/pipermail/ehcache-list/2011-July/thread.html

They extended ReentrantReadWriteLock in a rather complex way and I'm pretty sure that they simply made a mistake somewhere.
See https://fisheye2.atlassian.com/browse/ehcache/tags/ehcache-core-2.4.2/src/main/java/net/sf/ehcache/store/compound/Segment.java?hb=true
I don't want to examine 909 lines of code to find the real reason of their issue.

> http://cs.oswego.edu/pipermail/concurrency-interest/2006-June/002738.html
> 
> See last one for a good example.

I can see the reason for that exception perfectly well. They just misused the locks.

> 
> Curiously it seems that this problem reared its head some time ago:
> http://qos.ch/pipermail/logback-dev/2009-February/003620.html
> though I think that was more about safe lock usage, which seems to be
> addressed now.
> 
> Maybe somehow there _is_ a silent exception happening between
> Logger.appendLoopOnAppenders() and AAI, since it also appears on the
> JIRA's stack trace? I'm not familiar enough with logback's internals to
> make more educated guesses right now.
> 
> The only thing I can immediately think of would be some weird recursive
> logging from a threaded appender, with bits of StackOverflow (maybe
> causing more logging..) thrown in.
> 
> -h
> 

I checked all the links above and they are all doing way more complex stuff.

The method in question is very, very simple:

  public int appendLoopOnAppenders(E e) {
    int size = 0;
    r.lock();
    try {
      for (Appender<E> appender : appenderList) {
        appender.doAppend(e);
        size++;
      }
    } finally {
      r.unlock();
    }
    return size;
  }

There is simply *no* way this exception should be able to happen.
r is a private final variable so, assuming no one is accessing that variable maliciously via reflection, locking is guaranteed to be symmetric. Usage of variable r in the rest of the class is also correct.

I'm just baffled by this issue. And I hate being baffled by something that is very obviously correct.
Yes, it's easy to make a mistake but there is just no apparent problem in the previous implementation of AppenderAttachableImpl. I triple-checked it.

This leaves only two options:
I either fundamentally misunderstand something about java concurrency (which I would be very eager to rectify!) or there is a serious bug in either the implementation of (at least unfair) ReentrantReadWriteLock and/or some Java VMs (which makes me highly nervous).

Joern.