[logback-dev] [JIRA] Commented: (LBCORE-97) Starvation on AppenderBase.doAppend

Sun May 31 15:11:34 CEST 2009

Hi Ralph.

On 31.05.2009, at 09:21, Ralph Goers wrote:

>
> On May 30, 2009, at 6:23 PM, Joern Huxhorn wrote:
>
>> Let's move this discussion over to the dev list to leave the bug  
>> alone...
>>
>> On 30.05.2009, at 20:52, Ralph Goers (JIRA) wrote:
>>
>>>
>>>  [ http://jira.qos.ch/browse/LBCORE-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11134#action_11134 
>>>  ]
>>>
>>> Ralph Goers commented on LBCORE-97:
>>> -----------------------------------
>>>
>>> Your notion that UnsynchronizedAppenderBase is "downright  
>>> dangerous" is complete nonsense. That is like saying multithreaded  
>>> programming is dangerous so don't do it.
>>>
>>
>> I'd call "complete nonsense" a bit harsh. It's only one step away  
>> from bullshit ;)
>
> Yes, I tried to think of a phrase that wasn't so harsh but came up  
> short. I probably wouldn't have replied quite that way if it was on  
> this list, but in a Jira issue where it will more easily be  
> referenced I felt it appropriate to word it a bit more strongly and  
> a bit less diplomatic.

No problem ;)

>
>
>>
>> What I would say, however, is that multi-threaded programming is  
>> more complex and dangerous than single-threaded, so use it with care.
>>
>>> The Appender that extends UnsynchronizedAppenderBase is called on  
>>> the same thread that called the logger method. All  
>>> UnsynchronizedAppenderBase does is leave whatever synchronization  
>>> is necessary up to the actual Appender. If the appender is going  
>>> to pass work of to some other thread then the work queue elements  
>>> need to be fully populated with data on the thread creating the  
>>> work elememnts, i.e. the same thread that called the logging  
>>> method. Trying to have the worker thread call methods on objects  
>>> in a logger event is a very poor design.
>>>
>>
>> I agree 100%.
>> The only thing that I don't like about UnsynchronizedAppenderBase  
>> is that it's more error prone because everything you mentioned in  
>> the last paragraph must be implemented by every developer that  
>> extends UnsynchronizedAB.
>>
>> I know all the above, you know it, Ceki knows it, but lots of  
>> people won't have enough know how to know it.
>>
>> Asynchronous appenders can be implemented using the synchronous  
>> AppenderBase. Just put the LoggingEvent => DataObject  
>> transformation in the doAppend call and handle all the rest in  
>> different threads.
>
> Well, yes. But that introduces needless inefficiency. The Appender  
> will be synchronized while adding the work to a queue. But the queue  
> must also be synchronized so that the worker thread can remove it  
> from the queue safely.
>
> The synchronization of AppenderBase is, IMO, a mistake. Log4j did it  
> and they have plenty of bugs reported about deadlocks. Ceki has  
> already had a couple here. They aren't actually caused by Logback of  
> course, but they often appear to be which is almost as bad as if  
> they were. Every Appender in Logback could be modified so that the  
> synchronization could be moved to a smaller granularity.  
> Unfortunately, some of the layout processing isn't thread-safe so  
> there isn't much point in trying to do that until that is changed.
>

The sledgehammer synchronization using synchronized on the instance  
itself is definitely a mistake as I've shown in the bug report.
A first (and absolutely necessary step!!) is to replace it with a fair  
ReentrantLock so Logback isn't responsible for starvation situations  
anymore.

>>
>>
>>> In many cases the only synchronization needed in an appender will  
>>> be in the JDK itself when it performs some kind of I/O.
>>>
>>> There have been many demonstrated examples where the  
>>> synchronization that is occuring within logback has caused  
>>> deadlocks because the application using logback was unaware that  
>>> the locking would occur. UnsynchronizedAppenderBase actually helps  
>>> with avoiding deadlocks since the locking moves down to where it  
>>> is actually needed, not at some higher level "just in case".
>>>
>>
>> I see your point.
>> You want to perform the LE => DO transformation without a lock and  
>> lock afterwards while performing the actual appending. That's a  
>> *very* good idea and would indeed have the potential to solve  
>> issues like http://www.nabble.com/Best-practices-to-avoid-deadlocks-while-logging-td23639031.html 
>>  ...
>>
>> The only real problem I see is the one I showed in the bug.
>>
>> It's absolutely crucial that appenders perform fair locking and  
>> most people (including Ceki and me) are/were absolutely unaware  
>> that synchronized can be very unfair - even though it *is*  
>> documented!
>> Leaving the synchronization to the appender implementation is  
>> therefore somewhat dangerous :p and can result in starvation if  
>> done wrong. And it's easy to do it wrong.
>
> I have news for you. In a multithreaded program you aren't going to  
> be able to insure that log records ever occur in a particular order  
> unless the application does the synchronization itself. For example,  
> it is highly likely you will see logging like:
> Thread 1 - about to do action 1
> Thread 2 - about to do action 1
> Thread 1 - action 1 performed
> Thread 3 - about to do action 1
> Thread 3 - action 1 performed
> Thread 2 - action 1 performed
>
> This is regardless of whatever locking is going on in Logback. And,  
> of course, there is no assurance that the order of the log entries  
> corresponds to the order in which the actions actually occurred.

That's not news but the whole point of multi-threaded programming ;)

>
>>
>>
>> It would probably be a good idea to write a base class that  
>> performs the LE => DO transformation (converting message arguments,  
>> transforming MDC to String, etc.) without a lock, handing the DO to  
>> a method that performs a fair lock which in turn executes an  
>> abstract method that performs the actual work?
>> I'm just brainstorming here... My primary aim is just to prevent a  
>> situation like the one in the bug in lots of appenders that extend  
>> UnsynchronizedAppenderBase.
>> We'd need dedicated Logback Logging(- and Access?)DOs in that case.
>>
> I'm not sure what "bug in lots of appenders that extend  
> UnsynchronizedAppenderBase" you are talking about. In reading your  
> opening remarks you discuss "fairness" in locking. But it doesn't  
> really matter.

Whoa, it doesn't really matter?? This is complete nonsense ;), read  
below.

> Thread 1 can call logger.info() and before the appender even gets a  
> chance to lock Thread 2 can call logger.info(). The CPU might give  
> conrol to Thread 2 and let it finish ahead of thread 1. Even if you  
> have multiple CPUs, as long as you have more threads hitting the  
> same logic than there are CPUs you will run into this issue. This is  
> just a fact of life when dealing with multithreaded applications. If  
> you look at a standard debug log with a fair amount of logging you  
> will see this all the time. It isn't readily apparent because the  
> granularity of the time is usually a millisecond, but occasionally  
> you will see log records that appear to be out of order.
>

I'm talking about the following situation in case of JDK 1.6 on multi- 
core systems:
Results for usingSynchronized:
runnables[0]: SynchronizedRunnable[counter=394, running=false]
runnables[1]: SynchronizedRunnable[counter=1, running=false]
runnables[2]: SynchronizedRunnable[counter=1, running=false]
runnables[3]: SynchronizedRunnable[counter=1, running=false]
runnables[4]: SynchronizedRunnable[counter=1, running=false]
runnables[5]: SynchronizedRunnable[counter=1, running=false]
runnables[6]: SynchronizedRunnable[counter=1, running=false]
runnables[7]: SynchronizedRunnable[counter=1, running=false]
runnables[8]: SynchronizedRunnable[counter=1, running=false]
runnables[9]: SynchronizedRunnable[counter=605, running=false]

This is the result of the app appended to http://jira.qos.ch/browse/LBCORE-97 
  in that case.
Please take a look at the code. All 10 threads are executing the same  
Runnable:
public static class SynchronizedRunnable
	implements Runnable
{
	private final Object lockObject;
	private int counter;
	private boolean running;

	public SynchronizedRunnable(Object lockObject)
	{
		this.lockObject=lockObject;
		this.counter=0;
		this.running=false;
	}

	public void run()
	{
		running=true;
		for(;;)
		{
			synchronized(lockObject)
			{
				counter++;
				try
				{
					Thread.sleep(10);
				}
				catch(InterruptedException ex)
				{
					break;
				}
			}
		}
		running=false;
	}

	public String toString()
	{
		return "SynchronizedRunnable[counter="+counter+", running="+running 
+"]";
	}
}

I'm simulating the AppenderBase situation here, lots of threads are  
trying to access the very same resource.
With Java 1.6 on multi-core systems, only <number of cores> threads  
will ever get the chance to obtain the object monitor. All other  
threads are starving.
And this is a valid simplification because it shows exactly what's  
happening in our webapp right now. We had to disable logging to get it  
back into a usable state.

Using a fair ReentrantLock fixes this. It makes sure that threads are  
handled fairly concerning the shared resource, i.e. they are served in  
a "first come, first served" order.

I assume you hadn't understood the point of LBCORE-97 until now... and  
please don't take that as an insult ;)
I wrote code under the assumption that synchronized would be at least  
sufficiently semi-fair, Ceki did in both LOG4J and Logback, and  
reality up until Java 1.6 on multi-core seemed to indicate that it's  
ok to assume that total fairness isn't necessary... but all this  
changed because synchronized is absolutely unfair now - and the  
documentation shows that the fault is on our side. We were simply  
working with wrong assumptions.

This just proves my point that locking should be handled by the  
logging system in a correct, i.e. fair, way. It is "downright  
dangerous" to assume that every person that implements an appender is  
aware of this situation and the problem will only show up on Java 1.6  
multi-core systems. This is a *very* big risk.

But I agree 100% that the transformation/evaluation of the logging  
event can and should be handled outside of whatever fair locking  
mechanism is used. This would solve some issues and seems very  
logical. We should definitely aim for this in the future... Ideally,  
AppenderBase would be changed to support this, IMHO.

It would probably be a better idea to use a

protected final ReentrantReadWriteLock readWriteLock=new  
ReentrantReadWriteLock(true);

fair ReentrantReadWriteLock in AppenderBase instead.
doAppend would then use the write lock while actual implementations  
would have the freedom of using either read or write lock, as needed.  
This would remove the need of synchronizing the worker queue you  
mentioned earlier. I'm not sure if we really need a ReadWriteLock,  
though, as removal from the queue would be a write, too. But *some*  
appender implementation could have a use case for read-only-access...

One way to handle transformation of the event outside the lock is what  
I suggested in http://jira.qos.ch/browse/LBCLASSIC-45.
Just dumb down LoggingEvent enough so that it already is detached from  
the rest of the system after it's created. But someone, and I honestly  
can't remember but I think it was you, said that it would be bad if  
the MDC was already transformed into Map<String,String> at that point...

I've just changed my patch to make lock protected instead of private.  
Seems to be a much better idea :p

And just to get my point straight once more:
A logging system MUST NOT be a point of starvation! It will naturally  
always be a chokepoint but it MUST NEVER be a point of starvation!

Regards,
Joern.

> Ralph
> _______________________________________________
> logback-dev mailing list
> logback-dev at qos.ch
> http://qos.ch/mailman/listinfo/logback-dev