[logback-dev] [JIRA] Updated: (LBCORE-63) Deadlock when running on multiple core processors

Tue May 19 18:04:10 CEST 2009

     [ http://jira.qos.ch/browse/LBCORE-63?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Huxhorn updated LBCORE-63:
--------------------------------

    Attachment: AppenderBaseLock.patch

Hi guys,

Our webapp running on a 4-core solaris is suffering* the same problem as described in http://jira.qos.ch/browse/LBCORE-63 and http://jira.qos.ch/browse/LBCORE-96 and I think I have an idea about the reason behind this.
Threading policy has changed from 1.5 to 1.6. I can't give you a reference for this, it's purely based on observation. In 1.5, a thread waiting for a lock was somewhat preferred by the VM over threads that would arrive at the wait() at a later time. In 1.6, however, a thread that releases synchronization is much more likely to immediately obtain the object lock again if it's in a consumer/producer loop.

for(;;)
{
	synchronized(obj)
	{
		// thread is likely to reobtain lock
	}
}

The same is the case for wait/notifyAll combinations. One can not assume that a longer waiting thread will likely obtain the lock after notifyAll.

This is documented in http://java.sun.com/javase/6/docs/api/java/lang/Object.html#notifyAll() so the changed behavior between 1.5 and 1.6 is perfectly legal.

My assumption is that the perceived deadlock, which is actually a congestion/starvation problem, occurs if one thread is logging "like crazy" while the other thread is sluggish as hell because he can't obtain the lock.

This should be solvable by using a fair ReentrantLock
(See http://java.sun.com/javase/6/docs/api/java/util/concurrent/locks/ReentrantLock.html#ReentrantLock(boolean) )

>From the javadoc:
"The constructor for this class accepts an optional fairness parameter. When set true, under contention, locks favor granting access to the longest-waiting thread. Otherwise this lock does not guarantee any particular access order. Programs using fair locks accessed by many threads may display lower overall throughput (i.e., are slower; often much slower) than those using the default setting, but have smaller variances in times to obtain locks and guarantee lack of starvation."

The guaranteed lack of starvation is what we are looking for, I think, This is something that pure synchronization simply can't provide.

I'd suggest to give the attached patch a try, I can't do so myself because I can't risk our webapp and I have no other chance to test this right now.

I'll crosspost this to both tickets cause I'm not sure which one is more "active" right now.

* The webapp shows that behavior in case of either console, file and/or lilith/socket appenders. It's just much more likely to happen with the lilith/socket appender in place since it takes significantly longer to send events over the wire than just writing them to disk or console.

> Deadlock when running on multiple core processors
> -------------------------------------------------
>
>                 Key: LBCORE-63
>                 URL: http://jira.qos.ch/browse/LBCORE-63
>             Project: logback-core
>          Issue Type: Sub-task
>    Affects Versions: 0.9.9
>         Environment: Operating System: Windows
> Platform: PC
>            Reporter: Toni Heimala
>            Assignee: Ceki Gulcu
>            Priority: Blocker
>         Attachments: AppenderBaseLock.patch, patch.txt, patch2.txt, patch3.txt
>
>
> When you run logging into same file from many threads on a system that has more than one physical processor (Dual Core for example), a deadlock will occur after a while. This can not be reproduced on HyperThreading processors. Here's an example program that will demonstrate the behavior:
> -----------------------------
> Main.java
> -----------------------------
> import java.util.Date;
> import java.util.concurrent.ScheduledThreadPoolExecutor;
> import java.util.concurrent.TimeUnit;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> import ch.qos.logback.classic.LoggerContext;
> import ch.qos.logback.classic.joran.JoranConfigurator;
> import ch.qos.logback.core.joran.spi.JoranException;
> public class Main extends Thread
> {
>     private final static String LOGGER_CONFIGURATION_FILE = "logger.xml";
>     private final Logger logger = LoggerFactory.getLogger(Main.class);
>     
>     private final long start;
>     
>     public Main()
>         throws JoranException
>     {
>         start = new Date().getTime();
>         LoggerContext lc = (LoggerContext)LoggerFactory.getILoggerFactory();
>         JoranConfigurator configurator = new JoranConfigurator();
>         lc.shutdownAndReset();
>         configurator.setContext(lc);
>         configurator.doConfigure(LOGGER_CONFIGURATION_FILE);
>     }
>     
>     public void start()
>     {
>         ScheduledThreadPoolExecutor ex1 = new ScheduledThreadPoolExecutor(1);
>         ScheduledThreadPoolExecutor ex2 = new ScheduledThreadPoolExecutor(1);
>         ScheduledThreadPoolExecutor ex3 = new ScheduledThreadPoolExecutor(1);
>         ScheduledThreadPoolExecutor ex4 = new ScheduledThreadPoolExecutor(1);
>         ScheduledThreadPoolExecutor ex5 = new ScheduledThreadPoolExecutor(1);
>         ex1.scheduleAtFixedRate(new Task("EX1"), 10, 10, TimeUnit.MICROSECONDS);
>         ex2.scheduleAtFixedRate(new Task("EX2"), 10, 10, TimeUnit.MICROSECONDS);
>         ex3.scheduleAtFixedRate(new Task("EX3"), 10, 10, TimeUnit.MICROSECONDS);
>         ex4.scheduleAtFixedRate(new Task("EX4"), 10, 10, TimeUnit.MICROSECONDS);
>         ex5.scheduleAtFixedRate(new Task("EX5"), 10, 10, TimeUnit.MICROSECONDS);
>         
>         super.start();
>     }
>     
>     public void run()
>     {
>         try
>         {
>             while(true)
>             {
>                 logger.debug("[MAIN] {}", new Date().getTime() - start);
>                 Thread.sleep(10);
>             }
>         }
>         catch (InterruptedException e)
>         {
>             logger.info("[MAIN]: Interrupted: {}", e.getMessage());
>         }
>     }
>     
>     public static void main(String[] args)
>     {
>         try
>         {
>             Main main = new Main();
>             main.start();
>         }
>         catch (JoranException e)
>         {
>             System.out.println("Failed to load application: " + e.getMessage());
>         }
>     }
> }
> -------------------------------
> Task.java
> -------------------------------
> import java.util.Date;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> public class Task implements Runnable
> {
>     private final Logger logger = LoggerFactory.getLogger(Task.class);
>     private final Logger logger_main = LoggerFactory.getLogger(Main.class);
>     private final String name;
>     private final long start;
>     
>     public Task(final String name)
>     {
>         this.name = name;
>         start = new Date().getTime();
>     }
>     public void run()
>     {
>         logger.debug("[{}] {}", name, new Date().getTime() - start);
>         logger_main.debug("[MAIN] - [{}] {}", name, new Date().getTime() - start);
>     }
> }

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.qos.ch/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira