[logback-dev] [JIRA] Commented: (LBCORE-97) Starvation on AppenderBase.doAppend

Fri May 29 17:45:10 CEST 2009

    [ http://jira.qos.ch/browse/LBCORE-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11127#action_11127 ] 

Joern Huxhorn commented on LBCORE-97:
-------------------------------------

I could try to provide a sample program that shows the general problem of starvation.
Would this have to be logback related or is it sufficient if I show that what the documenation says is correct?

Or are you talking about a real JUnit testcase that's using logback? Would this be worth anything if it fails on some systems and works on others? As always, assumptions about absolute program execution are pretty hard to make when it comes to multi-threaded applications.

The example attached to LBCORE-96 shows that the threads are "synchronized" with the patch in place, i.e. every pool gets the chance to log a message

2009-05-19 22:03:14,520 DEBUG [pool-1-thread-1] [EX1] 5936
2009-05-19 22:03:14,520 DEBUG [pool-3-thread-1] [MAIN] - [EX3] 5925
2009-05-19 22:03:14,520 DEBUG [pool-2-thread-1] [EX2] 5926
2009-05-19 22:03:14,520 DEBUG [pool-5-thread-1] [MAIN] - [EX5] 5900
2009-05-19 22:03:14,520 DEBUG [pool-4-thread-1] [MAIN] - [EX4] 5925

2009-05-19 22:03:14,521 DEBUG [pool-1-thread-1] [MAIN] - [EX1] 5937
2009-05-19 22:03:14,521 DEBUG [pool-3-thread-1] [EX3] 5926
2009-05-19 22:03:14,521 DEBUG [pool-2-thread-1] [MAIN] - [EX2] 5927
2009-05-19 22:03:14,521 DEBUG [pool-5-thread-1] [EX5] 5901
2009-05-19 22:03:14,521 DEBUG [pool-4-thread-1] [EX4] 5926

I inserted newlines to better show the point. It's always executing 1-3-2-5-4.

Without the patch, the threads are executed unevenly. This is solely caused by the non-fairness of synchronized as the rest of the code is untouched.

The only thing that I could do is to write some program that shows this more drastically, e.g. letting 10 threads run for some time in an endless loop, each containing a
synchronized(sameObject) block, counting the times they loop.
After a specific time, e.g. 10 sec, I could interrupt all threads and they would certainly have quite different counter values (the amount of difference would vary from system to system and VM to VM).

Running the same code with a fair ReentrantLock would generate results with (nearly, because of the interruption of the threads) identical numbers. That's the whole point of fair locks.

I had the exact same wrong assumption about fairness in a wait()/notifyAll() situation in Lilith. A thread with the purpose of removing received events from an intermittent cyclic buffer just wouldn't get the chance to remove the events. This was caused by updating the used JRE to Java 1.6. It took me a while to understand the problem and I was quite baffled.

> Starvation on AppenderBase.doAppend
> -----------------------------------
>
>                 Key: LBCORE-97
>                 URL: http://jira.qos.ch/browse/LBCORE-97
>             Project: logback-core
>          Issue Type: Bug
>          Components: Appender
>    Affects Versions: 0.9.15
>            Reporter: Joern Huxhorn
>            Assignee: Logback dev list
>            Priority: Critical
>         Attachments: AppenderBaseLock.patch
>
>
> The problem we are facing here is that several threads are trying to obtain the object monitor of the exact same resource, namely the Appender.
> The assumption that multiple threads waiting for ownership of a monitor would receive the monitor in the order that they tried to acquire it is simply incorrect. This is documented behavior.
> See the last paragraph of [1]:
> "Likewise, no assumptions should be made about the order in which threads are granted ownership of a monitor or the order in which threads wake in response to the notify or notifyAll method. An excellent reference for these topics is Chapter 9, "Threads," in Joshua Bloch's book Effective Java Programming Language Guide. "
> The documentation of Object.notifyAll() [2] states the following:
> "[..] The awakened threads will compete in the usual manner with any other threads that might be actively competing to synchronize on this object; for example, the awakened threads enjoy no reliable privilege or disadvantage in being the next thread to lock this object."
> The documentation in [5] also lists the following as a weak-spot of Built-in Synchronization in J2SE 1.4.x:
> "No way to alter the semantics of a lock, for example, with respect to reentrancy, read versus write protection, or fairness."
> In LBCORE-96 Ceki stated the following:
> "Logback is using the standard synchronization mechanism found in the JDK. You are saying that that mechanism is unsuitable which, truth be told, sounds quite suspicious."
> Yes, it's unsuitable in this situation because obtaining the object monitor is not guaranteed to be fair. It's not working in a "first come, first serve" manner. I assumed that, too, but it isn't the case. I had the exact same problem in Lilith some time ago because I made exactly this very same wrong assumption.
> Fairness of object monitor lock acquisition seems to be logical and a "good thing" but it's not specified that way, and for good reasons.
> Without fairness in place, the VM can optimize the execution of an application much better. A context switch is a costly operation for a CPU so performance is increased significantly if such a switch can be omitted.
> Concerning a test-case, this would be pretty hardcore to implement since it's 100% implementation dependent. One implementation *may* handle the locking of object monitors fairly while others don't.
> Therefore I'll try the following first:
> I assume I could convince you that object monitor acquisition (OMA) is not fair.
> If we take that for granted the following scenario should show my point:
> There are 4 Threads (e.g. 4 Threads handling concurrent webapp-requests) that have one single chokepoint, the appender.
> Since OMA isn't fair, it's possible that only some of those threads can work at all (see [3]). Exactly that is happening right now in our webapp running on a 4-core Solaris on JDK6.
> My next assumption is that we both agree that this isn't acceptable behavior.
> Logging should be handled "first come, first served" so that if Thread A is waiting to append an event and Thread B is waiting to append an event subsequently, the actual appending order should be A, then B.
> This is currently not the case. It *may* be the case but there is no guarantee of it.
> One could even argue that the logging system is working incorrectly (aside from the starvation problem) because events are appended in a different order than the actual execution of the logger call.
> The only way to prevent this is the introduction of fairness into the locking process. The way to do this is to use ReentrantLock [4].
> From the ReentrantLock javadoc:
> "The constructor for this class accepts an optional fairness parameter. When set true, under contention, locks favor granting access to the longest-waiting thread. Otherwise this lock does not guarantee any particular access order. Programs using fair locks accessed by many threads may display lower overall throughput (i.e., are slower; often much slower) than those using the default setting, but have smaller variances in times to obtain locks and guarantee lack of starvation."
> This is exactly what is absolutely necessary for a logging framework.
> Logging must *NOT* be able to introduce a starvation problem into a multi-threaded application! And it does exactly that right now.
> I doubt that the performance will decrease in a significant manner due to the use of a fair ReentrantLock but even if this would be the case it would be necessary anyway.
> Otherwise, appending simply isn't working correctly.
> [1] http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
> [2] http://java.sun.com/javase/6/docs/api/java/lang/Object.html#notifyAll()
> [3] http://en.wikipedia.org/wiki/Starvation_(computing)
> [4] http://java.sun.com/javase/6/docs/api/java/util/concurrent/locks/ReentrantLock.html
> [5] http://java.sun.com/developer/technicalArticles/J2SE/concurrency/

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.qos.ch/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira