[slf4j-user] Best practice for logging XML and other byte oriented formats with slf4j?
Joern Huxhorn
jhuxhorn at googlemail.com
Thu Nov 19 22:33:06 CET 2009
On 19.11.2009, at 19:06, Thorbjoern Ravn Andersen wrote:
> Hi.
>
> We have reached a situation where I basically want to log a data structure in order to be able to process it later.
>
> After a bit of pondering, I have concluded that the best approach for us to do this would be to use the XMLEncoder/XMLDecoder in Java 1.4+ and log the generated XML snippets.
>
> The issue I want to solve is that the XMLEncoder writes a UTF-8 encoded XML file to an OutputStream, i.e. a byte-oriented destination. To the best of my knowledge the slf4j backends all deal with Strings, i.e. character-oriented destinations, and the output files are written in the default encoding for the platform.
>
> The question now is: what is the best way to handle the OutputStream generated by XMLEncoder so that any Unicode characters inside survive encoding differences along the way? I will be using a custom layout anyway, so much can be done :) A human-readable transport format is preferred.
>
XMLEncoder will have a severe impact on performance; I've tested this extensively.
Have a look at http://sourceforge.net/apps/trac/lilith/wiki/SerializationPerformance
In my test cases, XMLEncoder serialized 300 events while a protobuf serializer managed to handle nearly 10,000!
I'd therefore suggest that you take a mixed approach: use protobuf to serialize the events to a file, and write an additional converter that turns those files into whatever XML output you need.
A discussion about such a topic was started here: http://marc.info/?l=logback-dev&m=124905434331308&w=2 but I completely forgot to file an RFE for it.
I've done just that now, thanks for the reminder!
http://jira.qos.ch/browse/LBCORE-128
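To illustrate the mixed approach suggested above, here is a minimal sketch of a byte-oriented event file that a separate converter could read back later. It is not protobuf and not an existing Logback feature: the class RecordFraming and its methods are hypothetical names, and the payload is just an opaque byte[] (in practice it would be whatever the binary serializer produces). It assumes Java 8 or later.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: stores each serialized event as a 4-byte length
// followed by the raw payload bytes, so no character encoding is involved.
final class RecordFraming {
    private RecordFraming() {}

    /** Appends one length-prefixed record to the stream. */
    static void writeRecord(OutputStream out, byte[] payload) {
        try {
            DataOutputStream data = new DataOutputStream(out);
            data.writeInt(payload.length); // big-endian length prefix
            data.write(payload);
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads records until the stream ends cleanly at a record boundary. */
    static List<byte[]> readRecords(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            List<byte[]> records = new ArrayList<>();
            while (true) {
                int length;
                try {
                    length = data.readInt();
                } catch (EOFException endOfStream) {
                    return records; // no further record starts here
                }
                byte[] payload = new byte[length];
                data.readFully(payload); // fails if a record is truncated
                records.add(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A converter would then call readRecords on the file, deserialize each payload, and emit XML (or anything else) only when it is actually needed.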
> My current thought is to use a ByteArrayOutputStream and generate a String by decoding it as UTF-8. The resulting string contains a <?xml ... encoding="UTF-8"?> declaration, which is stripped, resulting in an XML String containing Unicode chars (instead of encoded bytes). This can then be flattened to an ASCII version by converting all characters outside 32..127 to their numeric entity (&#1234;), and THAT can be safely logged. I guess :)
>
That would probably work but it would further decrease the serialization speed.
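For reference, the steps you describe could look roughly like the sketch below. It assumes Java 8 or later (StandardCharsets, String.codePoints), and the class AsciiXmlLog and its method names are made up for illustration. Two deliberate choices: it also escapes 127 (DEL), and it escapes line breaks, so each entry ends up on a single log line.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating: encode -> decode as UTF-8 ->
// strip the XML declaration -> flatten to printable ASCII.
final class AsciiXmlLog {
    private AsciiXmlLog() {}

    /** Encodes a bean with XMLEncoder and returns ASCII-only XML. */
    static String encode(Object bean) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(bytes);
        encoder.writeObject(bean);
        encoder.close(); // close() flushes and writes the closing tag
        String xml = new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        return toAscii(stripDeclaration(xml));
    }

    /** Removes a leading <?xml ...?> declaration and surrounding whitespace. */
    static String stripDeclaration(String xml) {
        String s = xml.trim();
        if (s.startsWith("<?xml")) {
            int end = s.indexOf("?>");
            if (end >= 0) {
                s = s.substring(end + 2);
            }
        }
        return s.trim();
    }

    /** Replaces every code point outside 32..126 with a numeric entity. */
    static String toAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // Iterate code points, not chars, so surrogate pairs become
        // a single entity instead of two invalid ones.
        s.codePoints().forEach(cp -> {
            if (cp >= 32 && cp < 127) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }
}
```

The declaration has to go because the result is no longer UTF-8 text with raw non-ASCII characters, and trimming afterwards avoids ending up with an entity for the line break in front of the root element.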
Logback (assuming you use Logback) should really support binary, i.e. byte-based, logfiles since this would really make a major performance difference. This should be discussed over at logback-dev, though.
> I'd appreciate comments on my thoughts, as this is a rather important intermediate step towards using log files to store information which can be used to simulate an external system when replaying an interesting session.
>
>
HTH,
Joern.