[Mondrian] optimization of expression and named set caching
John V. Sichi
jsichi at gmail.com
Thu Jun 7 03:42:34 EDT 2007
In the recent eigenchanges below, Rushan has been implementing
optimizations in this area to avoid unnecessarily clearing these
query-scoped caches (the previous overconservative clearing was required
in order to toss invalid values computed as a result of aggregate-lookup
The optimization work went through a few iterations, but we think it's
now close to as good as it gets. Please review and let us know if you
see any chance of incorrect results due to the optimizations.
For named set caching, we now avoid ever caching bad values (by making
sure that the call to evaluateExp returns a good value), so there's no
longer any need to clear the named set cache.
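
To illustrate the named set rule, here is a minimal sketch. Everything in
it is hypothetical (NamedSetCacheSketch, missCount, evaluateNamedSet,
isCached); it is not Mondrian's actual code. It also assumes that a "good"
value can be recognized with the same miss-count check described for
expression caching, which may not be how the real change does it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are Mondrian's real API.
class NamedSetCacheSketch {
    // Stand-in for the aggregate miss counter: bumped whenever evaluation
    // asks for a cell that has not been loaded into the aggregate cache.
    int missCount = 0;

    private final Map<String, List<String>> namedSets = new HashMap<>();

    // Evaluates a named set and caches the result only if no aggregate
    // miss happened during evaluation. Bad values are returned to the
    // caller but never stored, so the cache never needs clearing.
    List<String> evaluateNamedSet(String name,
                                  Supplier<List<String>> evaluator) {
        List<String> cached = namedSets.get(name);
        if (cached != null) {
            return cached;
        }
        int before = missCount;
        List<String> result = evaluator.get();
        if (missCount == before) {
            namedSets.put(name, result);
        }
        return result;
    }

    boolean isCached(String name) {
        return namedSets.containsKey(name);
    }
}
```

In this sketch a dry-run evaluation that hits a miss leaves the cache
untouched, and the next evaluation (after the aggregates are loaded) is
the first one to be stored.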
For expression caching, we do cache bad values temporarily, but we clear
them (and only them), whereas before we were clearing all cached values
(both good and bad) indiscriminately. We detect whether a value is good
or bad by checking whether the aggregate miss count has increased
during the evaluation of the expression.
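
A rough sketch of that scheme follows. The names (ExpCacheSketch,
missCount, evaluations, clearBadValues) are made up for illustration and
are not taken from the Mondrian source; the only part that comes from the
description above is the idea of comparing the aggregate miss count before
and after evaluation and then dropping just the entries flagged as bad.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are Mondrian's real API.
class ExpCacheSketch {
    // Stand-in for the aggregate miss counter.
    int missCount = 0;
    // Counts real (non-cached) evaluations, for demonstration only.
    int evaluations = 0;

    private static final class Entry {
        final Object value;
        final boolean bad;

        Entry(Object value, boolean bad) {
            this.value = value;
            this.bad = bad;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    // Returns the cached value if present (good or bad). Otherwise
    // evaluates, flags the result as bad if the miss count grew during
    // evaluation, and caches it either way.
    Object evaluate(String key, Supplier<Object> evaluator) {
        Entry entry = cache.get(key);
        if (entry != null) {
            return entry.value;
        }
        int before = missCount;
        evaluations++;
        Object value = evaluator.get();
        cache.put(key, new Entry(value, missCount > before));
        return value;
    }

    // Drops only the entries flagged as bad; good entries survive.
    void clearBadValues() {
        cache.values().removeIf(entry -> entry.bad);
    }

    int size() {
        return cache.size();
    }
}
```

In the sketch, a repeated request for a bad value is served from the cache
rather than recomputed, and calling clearBadValues() afterwards removes
that entry while leaving the good ones in place.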
Why do we bother caching bad values at all? This may seem
counterintuitive, but it turns out that a bad value can be requested
over and over before the outermost expression evaluation is finally done
(and then the result gets discarded anyway because it was derived from
bad lower-level values). So, if the bad value is expensive to compute,
doing it over and over would be really...bad.
A concrete example is FILTER(foo,RANK(bar,ORDER(...)) < limit). RANK
caches the ORDER argument. The outermost expression (FILTER) is going
to require a lot of iteration. The ORDER is expensive to compute
(regardless of whether it returns the correct value).
Rushan, correct me if I got any of this wrong. The interaction between
the "dry-run" batching cell reader and query-scoped caches is certainly
On a side note, we've noted a potential cache correctness problem in
Clearing the aggregate cache should clear the schema's non-empty tuple
set cache as well.