[slf4j-user] Best practice for logging XML and other byte oriented formats with slf4j?

Fri Nov 20 19:44:17 CET 2009

Maarten Bosteels wrote:
>
>
>     XMLEncoder will have a severe impact on performance, I've tested
>     this extensively.
>     Have a look at
>     http://sourceforge.net/apps/trac/lilith/wiki/SerializationPerformance
>     In my testcases, XMLEncoder serialized 300 events while a protobuf
>     serializer managed to handle nearly 10.000!
>     I'd therefore suggest that you take a mixed approach: use
>     protobuf to serialize the events to a file, and write an
>     additional converter that turns those files into whatever XML
>     output you need.
>

I think I didn't catch on to that discussion when you had it. Probably 
because I didn't understand it enough from a brief skim :)

>     A discussion about such a topic was started here:
>     http://marc.info/?l=logback-dev&m=124905434331308&w=2 but I
>     completely forgot to file an RFE for it.
>     I've done just that now, thanks for the reminder!
>     http://jira.qos.ch/browse/LBCORE-128
>
>
> I agree with Joern that XMLEncoder is not really suited when 
> throughput is important to you.
For our purposes, these "log this complex object" calls happen rarely 
enough that we are willing to accept a penalty here to get a 
human-readable rendering.

>
>
>     > My current thinking is to use a ByteArrayOutputStream and
>     generate a String by decoding its bytes as UTF-8. The resulting
>     string contains a <?xml ... encoding="UTF-8"?> declaration, which
>     is stripped, resulting in an XML String containing Unicode chars
>     (instead of encoded bytes).
>
>
> What is the difference between "Unicode chars" and "encoded bytes" ?
I am talking about the internal representation as chars versus the 
encoded version, which is a stream of bytes (usually written raw to a 
file).
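
To make the chars-versus-bytes distinction concrete, here is a minimal 
sketch of the approach I described. The class and method names 
(XmlRender, toXml) are made up for illustration, and it assumes the 
one-argument XMLEncoder constructor writes UTF-8, as the declaration in 
its output states:

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class XmlRender {
    /** Renders a bean as XML text without the leading XML declaration. */
    public static String toXml(Object bean) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Assumption: this constructor emits UTF-8 encoded bytes.
        XMLEncoder encoder = new XMLEncoder(buffer);
        encoder.writeObject(bean);
        encoder.close(); // flushes and writes the closing </java> tag

        // Decode the bytes into chars: from here on it is a Unicode String.
        String xml = new String(buffer.toByteArray(), StandardCharsets.UTF_8);

        // Strip the <?xml ... ?> declaration. The encoding it names
        // describes the bytes, not the in-memory String.
        return xml.replaceFirst("^<\\?xml[^>]*\\?>\\s*", "");
    }
}
```

The returned String can then be passed as a normal argument to a logger 
call. Whatever byte encoding the appender uses when it writes the 
String out is a separate question.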

> Every unicode codepoint has to be encoded somehow, no ? UTF-8 is one 
> way to encode the codepoint (and imho the encoding everyone should use)
>
>     This can then be flattened to an ASCII version, by converting all
>     characters outside 32..127 to their numeric entity (&#1234;), and
>     THAT can be safely logged. I guess :)
>     >
>
>
> If you want to use XML, then I really don't see the problem with 
> leaving it in UTF-8 ?
There is absolutely no guarantee that the final destination of the log 
string will be able to handle UTF-8 encoded strings. How do UTF-8 
encoded strings end up looking when written using MacRoman under OS X?

> Especially since you state that "a humanly readable transport format 
> will be preferred." I would prefer to see Σ instead of &#931;
>
> Of course, it should be possible to tell the XMLEncoder which encoding 
> to use (instead of using the default encoding of the platform).
>
XMLEncoder does not expose the encoding publicly. Bah :)
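
For readers on Java 7 or later, XMLEncoder has a four-argument 
constructor that takes a charset name and a flag for the declaration. 
A small sketch, assuming such a JDK is available (the class name 
XmlRenderNoDeclaration is invented for the example):

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class XmlRenderNoDeclaration {
    /** Renders a bean as XML text, asking the encoder to omit the declaration. */
    public static String toXml(Object bean) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Arguments: stream, charset name, emit declaration?, indentation.
        // Requires Java 7 or later.
        XMLEncoder encoder = new XMLEncoder(buffer, "UTF-8", false, 0);
        encoder.writeObject(bean);
        encoder.close();
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With the declaration switched off there is nothing to strip afterwards.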

-- 
  Thorbjørn Ravn Andersen  "...plus... Tubular Bells!"