[slf4j-user] Best practice for logging XML and other byte oriented formats with slf4j?

Maarten Bosteels mbosteels.dns at gmail.com
Fri Nov 20 00:04:06 CET 2009


On Thu, Nov 19, 2009 at 10:33 PM, Joern Huxhorn <jhuxhorn at googlemail.com> wrote:

>
> On 19.11.2009, at 19:06, Thorbjoern Ravn Andersen wrote:
>
> > Hi.
> >
> > We have reached a situation where I basically want to log a data
> > structure in order to be able to process it later.
> >
> > After a bit of pondering, I have concluded that the best approach for us
> > to do this would be to use the XMLEncoder/XMLDecoder in Java 1.4+ and log
> > the generated XML snippets.
> >
> > The issue I want to solve is that the XMLEncoder writes a UTF-8 encoded
> > XML file to an OutputStream, i.e. a byte oriented destination.  To the best
> > of my knowledge the slf4j backends all deal with Strings, i.e. character
> > oriented destinations, and the output files are written in the default
> > encoding for the platform.
> >
> > The question now is, what is the best way to handle the OutputStream
> > generated by XMLEncoder so it will survive all attempts to mess up any
> > Unicode characters inside due to encoding differences on the way.  I will be
> > using a custom layout anyway so much can be done :)  A humanly readable
> > transport format will be preferred.
> >
>
> XMLEncoder will have a severe impact on performance, I've tested this
> extensively.
> Have a look at
> http://sourceforge.net/apps/trac/lilith/wiki/SerializationPerformance
> In my test cases, XMLEncoder serialized 300 events while a protobuf
> serializer managed to handle nearly 10,000!
> I'd therefore suggest that you take a mixed approach: use protobuf to
> serialize the events to a file, and write an additional converter that
> turns those files into whatever XML output you'd like, as needed.
>
> A discussion about such a topic was started here:
> http://marc.info/?l=logback-dev&m=124905434331308&w=2 but I completely
> forgot to file an RFE for it.
> I've done just that now, thanks for the reminder!
> http://jira.qos.ch/browse/LBCORE-128
>

I agree with Joern that XMLEncoder is not really suited when throughput is
important to you.


> > My current thought is to use a ByteArrayOutputStream and generate a
> > String using the UTF-8 decoding.  The resulting string contains a <?xml ...
> > encoding="UTF-8"?> which is stripped, resulting in an XML String containing
> > Unicode chars (instead of encoded bytes).
>

What is the difference between "Unicode chars" and "encoded bytes"?
Every Unicode code point has to be encoded somehow, no?  UTF-8 is one way to
encode a code point (and IMHO the encoding everyone should use).
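
For illustration, the ByteArrayOutputStream approach quoted above could look
roughly like the sketch below.  The names XmlSnippets, toXmlString and
fromXmlString are made up for this example, and it assumes that XMLEncoder
emits UTF-8 when given a plain OutputStream, as described earlier in the
thread.  It keeps the <?xml ...?> declaration instead of stripping it.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of slf4j or logback.
final class XmlSnippets {

    private XmlSnippets() {
    }

    /** Serializes a bean graph with XMLEncoder and decodes the bytes as UTF-8. */
    static String toXmlString(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(bytes);
        encoder.writeObject(value);
        // close() flushes the stream and writes the closing </java> tag.
        encoder.close();
        // Assumption: the encoder wrote UTF-8, so decode with the same charset.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Reverses toXmlString by re-encoding the String as UTF-8 for XMLDecoder. */
    static Object fromXmlString(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(bytes));
        try {
            return decoder.readObject();
        } finally {
            decoder.close();
        }
    }
}
```

Once decoded, the result is an ordinary Java String; which bytes end up in the
log file then depends only on the encoding the appender writes with.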


> > This can then be flattened to an ASCII version, by converting all
> > characters outside 32..127 to their numeric entity (&#1234;), and THAT can
> > be safely logged.  I guess :)
> >
>

If you want to use XML, then I really don't see the problem with leaving it
in UTF-8.
Especially since you state that "a humanly readable transport format will be
preferred."  I would prefer to see Σ instead of &#931;
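
For comparison, the ASCII flattening described in the quoted text could be
sketched as follows.  AsciiXml and toAscii are hypothetical names.  The sketch
treats 32..126 plus tab, LF and CR as safe, and it assumes that non-ASCII
characters only occur in text content or attribute values, since a numeric
character reference is not allowed inside an element or attribute name.

```java
// Hypothetical helper, not part of slf4j or logback.
final class AsciiXml {

    private AsciiXml() {
    }

    /** Replaces every code point outside printable ASCII with &#NNNN;. */
    static String toAscii(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        for (int i = 0; i < xml.length(); ) {
            // Iterate by code point so surrogate pairs become one reference.
            int cp = xml.codePointAt(i);
            i += Character.charCount(cp);
            boolean printableAscii = cp >= 32 && cp <= 126;
            boolean whitespace = cp == '\t' || cp == '\n' || cp == '\r';
            if (printableAscii || whitespace) {
                out.appendCodePoint(cp);
            } else {
                // Decimal numeric character reference, e.g. &#931; for U+03A3.
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }
}
```

Any XML parser resolves the references again when reading, so no separate
decoding step is needed, but the log itself becomes harder to read by eye.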

Of course, it should be possible to tell the XMLEncoder which encoding to
use (instead of using the default encoding of the platform).

regards,
Maarten


> That would probably work but it would further decrease the serialization
> speed.
> Logback (assuming you use Logback) should really support binary, i.e.
> byte-based, logfiles since this would really make a major performance
> difference. This should be discussed over at logback-dev, though.
>
> > I'd appreciate comments on my thoughts, as this is a rather important
> > intermediate step in us using log files to store information which can be
> > used to simulate an external system when replaying an interesting session.
> >
> >
>
> HTH,
> Joern.