[logback-dev] Improving Joran's RuleStore performance
Ceki Gülcü
ceki at qos.ch
Thu Jun 20 20:53:22 CEST 2013
I've done some work on a trie-based rule store for improving the
performance of Joran. The good news is that the initial version of the
trie performs 5x better than the list-based rule store. Moreover, the
performance of the trie could be significantly improved by using a
less brain-dead indexing scheme for child nodes.
As to the bad news, Joran's performance is impacted mostly by the XML
parser and the application of actions. Computations done by the
existing rule-store probably account for less than 20% of the overall
cost of invoking Joran.
You can get the code at [1]. My trie implementation should be easy to
understand for those familiar with tries.
[1] https://github.com/ceki/trie2
On 6/18/2013 1:18 PM, ceki wrote:
> Hello all,
>
> I am currently working on improving the performance of Joran rule
> store. As described in [1], Joran interprets XML files by applying
> rules while traversing the XML tree. A rule consists of a pattern and
> an action.
>
> While traversing the tree of XML elements, Joran asks its RuleStore
> for the Action applicable for the current element.
>
> Let assume that the RuleStore consists of just two rules
>
> rule1: "configuration" -> ConfigurationAction
> rule2: "configuration/statusListener" -> StatusListenerAction
>
> here is a short config file:
>
> <configuration>
> <statusListener class="..." />
> <logger .../>
> </configuration>
>
> for the <configuration> element, the rule 1 matches and
> ConfigurationAction is applicable; for the <statusListener> element
> nested within <configuration> rule 2 matches and StatusListenerAction
> is applicable; for the <logger> element there is no matching rule.
>
> Joran's RuleStore understands 3 types of patterns:
>
> 1) EXACT patterns, such as "configuration" and
> "configuration/statusListener"
>
> 2) PREFIX patterns, such as configuration/appender/sift/*
>
> 3) SUFFIX patterns, such as "*/param", "*/if", "*if/then"
>
> 4) SUB-STRING patterns, such as "*/if/then/*" and "*/if/else/*"
>
> Currently there are 27 rules using exact patterns, one rule using a
> prefix pattern, four rules using suffix patterns and two rules using
> sub-string patterns. The rules are defined in [2] and [3].
>
> The current implementation of the RuleStore [4] is rather naive. It
> iterates over the various patterns to see if there is a match.
>
> Placing the various patterns as keys in a trie seems like a good way
> of improving performance. BTW, Joran instances are not shared so that
> support for concurrent access is not required.
>
> Given that there are only a few rules, I was initially thinking of
> using trie nodes with 256 links. Assuming an average key length of 20
> and 30 rules, that would use about 600x256 node pointers bytes or
> 1.2MB of memory (assuming 8byte pointers). However, since many of the
> patterns share the same prefixes, it makes sense to compress the trie,
> as done in the Patricia trie.
>
> Anyway, for exact and prefix patterns, the trie data structure seems
> highly appropriate. For suffix patterns, a reverse trie could do the
> job. For sub-strings patterns, I guess looping over the patterns would
> be acceptable (there are only two sub-string patterns).
>
> I should note that when the traversing the XML tree, often enough none
> of the rules are applicable and Joran falls back on implicit
> rules. Thus, in practice, much of the configuration work is done by
> implicit actions which are triggered when the search for an explicit
> rule fails.
>
> Enough said.
>
> Your comments, questions and ideas for improvement are most welcome,
>
> [1] http://logback.qos.ch/manual/onJoran.html
> [2] http://tinyurl.com/JoranConfiguratorBase
> [3] http://tinyurl.com/JoranConfigurator
> [4] http://tinyurl.com/SimpleRuleStore
>
More information about the logback-dev
mailing list