[logback-dev] [JIRA] Commented: (LBCORE-168) Locking issues when prudent logging to a (windows) clustered storage device on owner-switch
noreply-jira at qos.ch
Fri Sep 17 11:11:51 CEST 2010
[ http://jira.qos.ch/browse/LBCORE-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11831#action_11831 ]
alex commented on LBCORE-168:
I am not a Windows expert at all, so I may explain it wrong, but the gist of it is: we have a SAN and two Windows servers. Each server can have "ownership" of the SAN (active/passive). This is configured using the "Cluster Administrator" that comes with Windows. If a node goes down, ownership is transferred to the other node (we can also switch ownership manually). During such a switch the network drive is inaccessible for ~10 seconds.
I do not know how Windows handles this internally, nor how the JVM handles such a case, but once the ownership is switched, output written to an OutputStream goes nowhere. I have not been able to detect such a case (using the FileChannel etc.), but I do not have extensive experience with the nio package.
As for the file locking, I just had a talk with someone more knowledgeable about Windows, and he says that the file lock is kept by the server that owns the drive at that time. Switching owner would mean that the new owner has no idea that there was a lock on the file, but the JVM would still have it in memory. Maybe this creates a situation where Java is unable to release the lock, so that from the JVM's perspective the file stays locked?
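To check from inside a JVM whether lock() on the log file hangs, without blocking the calling thread forever, something like the following might help. It is only a sketch: LockProbe, probe() and the Result values are made-up names, not logback API. It assumes the blocked lock() call can simply be abandoned on a daemon thread; whether the call reacts to the interrupt after an owner switch is something I cannot confirm.

```java
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical diagnostic helper; not part of logback.
class LockProbe {
    enum Result { ACQUIRED, TIMED_OUT, FAILED }

    // Tries to take an exclusive lock on the whole file, waiting at most timeoutMillis.
    static Result probe(Path file, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "lock-probe");
            t.setDaemon(true); // a stuck probe thread must not keep the JVM alive
            return t;
        });
        Future<Boolean> attempt = executor.submit(() -> {
            // Both the channel and the lock are released when this block exits.
            try (FileChannel channel = FileChannel.open(file,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock lock = channel.lock()) {
                return lock.isValid();
            }
        });
        try {
            return attempt.get(timeoutMillis, TimeUnit.MILLISECONDS)
                    ? Result.ACQUIRED : Result.FAILED;
        } catch (TimeoutException e) {
            attempt.cancel(true); // interrupts the probe thread; may or may not unblock lock()
            return Result.TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Result.FAILED;
        } catch (ExecutionException e) {
            // e.g. IOException, or OverlappingFileLockException if this JVM already holds the lock
            return Result.FAILED;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Calling LockProbe.probe(path, 5000) before and after an owner switch would show whether the result changes from ACQUIRED to TIMED_OUT. Note that FAILED is also returned when another thread in the same JVM holds the lock at that moment, so a single FAILED result does not prove much.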
> Locking issues when prudent logging to a (windows) clustered storage device on owner-switch
> Key: LBCORE-168
> URL: http://jira.qos.ch/browse/LBCORE-168
> Project: logback-core
> Issue Type: Bug
> Components: Appender
> Affects Versions: 0.9.24
> Environment: Windows Server 2003 Enterprise edition
> Reporter: alex
> Assignee: Logback dev list
> Priority: Blocker
> We have a clustered Windows environment (2 nodes) and each node logs to the same file, which is stored on a clustered storage device (prudent is set to true).
> Logging works great until we switch the owner of the storage device where the logs are being written to at which point one of two things happens:
> - one or both servers go into a sort of deadlock state, meaning the code that logs just hangs indefinitely. It seems the safeWrite() in FileAppender may be to blame because of the "lock()" statement (note that a JVM restart will fix this problem). I have been able to reproduce this locked state and verify that it is the lock() by trying to get a lock on the log file in the same JVM but without using logback code. This code will hang as well.
> - if there is no deadlock, any log written after the switch will just vanish into thin air (I have no idea where the outputstream points to when you start switching the owner)
> I have no idea how this can be fixed without resorting to a reload/reboot of our servers (which is not an option in production).
> We used to log synchronously, but then each thread that called logback would hang, resulting in a massive number of idle threads. We now log asynchronously (this was already an option in the system we built around it), which gives us one advantage: we never write concurrently to the same file, although different nodes may write serially.
> As a temporary workaround I am thinking of implementing a simple appender that opens/closes the output stream for each write (performance is less of an issue) and that does not implement locking.
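
The workaround mentioned at the end of the issue description (open and close the stream for every write, no file locking) could look roughly like this in plain Java. This is a sketch only: ReopeningFileWriter and writeLine() are hypothetical names, not logback classes, and a real appender would have to wrap something like this. It assumes, as the report states, that the nodes never write at the same moment, and it is untested against an actual owner switch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; not a logback class.
class ReopeningFileWriter {
    private final Path file;

    ReopeningFileWriter(Path file) {
        this.file = file;
    }

    // synchronized serialises writers inside this JVM only;
    // there is deliberately no FileLock, so other nodes are not coordinated.
    synchronized void writeLine(String line) {
        byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
        // Open in append mode, write, and close again on every call,
        // so no stream is kept open across an owner switch.
        try (OutputStream out = Files.newOutputStream(file,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(bytes);
        } catch (IOException e) {
            // Surface the failure to the caller instead of losing the line silently.
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is one open/close per log line, which the report says is acceptable here. Without a lock, lines from two nodes could interleave if they ever did write at the same time.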