[logback-dev] Improving Joran's RuleStore performance
ceki
ceki at qos.ch
Tue Jun 18 13:18:21 CEST 2013
Hello all,
I am currently working on improving the performance of Joran rule
store. As described in [1], Joran interprets XML files by applying
rules while traversing the XML tree. A rule consists of a pattern and
an action.
While traversing the tree of XML elements, Joran asks its RuleStore
for the Action applicable for the current element.
Let assume that the RuleStore consists of just two rules
rule1: "configuration" -> ConfigurationAction
rule2: "configuration/statusListener" -> StatusListenerAction
here is a short config file:
<configuration>
<statusListener class="..." />
<logger .../>
</configuration>
for the <configuration> element, the rule 1 matches and
ConfigurationAction is applicable; for the <statusListener> element
nested within <configuration> rule 2 matches and StatusListenerAction
is applicable; for the <logger> element there is no matching rule.
Joran's RuleStore understands 3 types of patterns:
1) EXACT patterns, such as "configuration" and
"configuration/statusListener"
2) PREFIX patterns, such as configuration/appender/sift/*
3) SUFFIX patterns, such as "*/param", "*/if", "*if/then"
4) SUB-STRING patterns, such as "*/if/then/*" and "*/if/else/*"
Currently there are 27 rules using exact patterns, one rule using a
prefix pattern, four rules using suffix patterns and two rules using
sub-string patterns. The rules are defined in [2] and [3].
The current implementation of the RuleStore [4] is rather naive. It
iterates over the various patterns to see if there is a match.
Placing the various patterns as keys in a trie seems like a good way
of improving performance. BTW, Joran instances are not shared so that
support for concurrent access is not required.
Given that there are only a few rules, I was initially thinking of
using trie nodes with 256 links. Assuming an average key length of 20
and 30 rules, that would use about 600x256 node pointers bytes or
1.2MB of memory (assuming 8byte pointers). However, since many of the
patterns share the same prefixes, it makes sense to compress the trie,
as done in the Patricia trie.
Anyway, for exact and prefix patterns, the trie data structure seems
highly appropriate. For suffix patterns, a reverse trie could do the
job. For sub-strings patterns, I guess looping over the patterns would
be acceptable (there are only two sub-string patterns).
I should note that when the traversing the XML tree, often enough none
of the rules are applicable and Joran falls back on implicit
rules. Thus, in practice, much of the configuration work is done by
implicit actions which are triggered when the search for an explicit
rule fails.
Enough said.
Your comments, questions and ideas for improvement are most welcome,
[1] http://logback.qos.ch/manual/onJoran.html
[2] http://tinyurl.com/JoranConfiguratorBase
[3] http://tinyurl.com/JoranConfigurator
[4] http://tinyurl.com/SimpleRuleStore
--
Ceki
65% of statistics are made up on the spot
More information about the logback-dev
mailing list