[Mondrian] optimization of expression and named set caching

John V. Sichi jsichi at gmail.com
Thu Jun 7 03:42:34 EDT 2007


In the recent eigenchanges listed below, Rushan has been implementing 
optimizations in this area to avoid unnecessarily clearing these 
query-scoped caches.  (The previous, overly conservative clearing was 
required in order to discard invalid values computed as a result of 
aggregate-lookup misses.)

9416
9394
9369
9336

The optimization work went through a few iterations, but we think it's 
now close to as good as it gets.  Please review and let us know if you 
see any chance of incorrect results due to the optimizations.

For named set caching, we now avoid ever caching bad values (by making 
sure that the call to evaluateExp returns a good value).  So there's no 
longer any need to clear the named set cache.

For expression caching, we do cache bad values temporarily, but we clear 
them (and only them) where before we were clearing all cached values 
(both good and bad) indiscriminately.  We detect whether a value is good 
or bad by checking to see whether the aggregate miss count has increased 
during the evaluation of the expression.
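
To make the good/bad detection concrete, here is a minimal sketch in Java.  
It is not the actual Mondrian code: the class ExpressionCacheSketch and 
its methods (recordMiss, evaluate, clearBadValues, isCached) are 
hypothetical names invented for illustration.  It only shows the idea 
described above: snapshot the miss count before evaluating, compare 
afterwards, and remember which cached entries were computed during a miss 
so that only those get cleared.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are Mondrian's real classes.
class ExpressionCacheSketch {
    // Number of aggregate-lookup misses seen so far.
    private int missCount = 0;
    // Query-scoped cache of expression results, keyed by some expression id.
    private final Map<String, Object> cache = new HashMap<>();
    // Keys whose values were computed while at least one miss occurred.
    private final List<String> badKeys = new ArrayList<>();

    // Called by the (simulated) cell reader when an aggregate is not loaded.
    void recordMiss() {
        missCount++;
    }

    Object evaluate(String key, Supplier<Object> compute) {
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        int before = missCount;
        Object value = compute.get();
        // Cache the value either way, so repeated requests are cheap.
        cache.put(key, value);
        if (missCount > before) {
            // A miss happened during evaluation: the value is bad.
            badKeys.add(key);
        }
        return value;
    }

    // Clear only the bad values; good values stay cached.
    void clearBadValues() {
        for (String key : badKeys) {
            cache.remove(key);
        }
        badKeys.clear();
    }

    boolean isCached(String key) {
        return cache.containsKey(key);
    }
}
```

Note that in this sketch a nested evaluation that hits a miss also marks 
the enclosing expression as bad, since the enclosing call sees the same 
increase in the miss count.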

Why do we bother caching bad values at all?  This may seem 
counterintuitive, but it turns out that a bad value can be requested 
over and over before the outermost expression evaluation is finally done 
(and then the result gets discarded anyway because it was derived from 
bad lower-level values).  So, if the bad value is expensive to compute, 
doing it over and over would be really...bad.

A concrete example is FILTER(foo,RANK(bar,ORDER(...)) < limit).  RANK 
caches the ORDER argument.  The outermost expression (FILTER) is going 
to require a lot of iteration, and each iteration calls RANK.  The ORDER 
is expensive to compute (regardless of whether it returns the correct 
value), so without the cache it would be recomputed on every iteration.
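
The cost difference can be illustrated with a toy Java simulation.  Again, 
this is only a sketch under assumptions: RepeatedEvaluationSketch, 
expensiveOrder, cachedOrder and filter are hypothetical names, and the 
"ORDER" here is just a counter rather than a real sort.  It counts how 
many times the expensive inner expression is computed while the outer 
loop iterates, with and without caching the (possibly bad) result.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation; not Mondrian code.
class RepeatedEvaluationSketch {
    // How many times the expensive ORDER was actually computed.
    int orderEvaluations = 0;
    private final Map<String, Object> cache = new HashMap<>();

    // Stand-in for an expensive ORDER(...) whose result may be bad.
    Object expensiveOrder() {
        orderEvaluations++;
        return "sorted list (possibly bad)";
    }

    // Computes ORDER once, then serves the cached value, good or bad.
    Object cachedOrder() {
        return cache.computeIfAbsent("ORDER(...)", k -> expensiveOrder());
    }

    // Stand-in for FILTER: each iteration needs the ORDER result via RANK.
    int filter(int memberCount, boolean useCache) {
        for (int i = 0; i < memberCount; i++) {
            if (useCache) {
                cachedOrder();
            } else {
                expensiveOrder();
            }
        }
        return orderEvaluations;
    }
}
```

With memberCount iterations, the uncached variant computes the ORDER 
memberCount times, while the cached variant computes it once, even though 
the result may later be discarded as bad.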

Rushan, correct me if I got any of this wrong.  The interaction between 
the "dry-run" batching cell reader and query-scoped caches is certainly 
tricky.

On a side note, we've noted a potential cache correctness problem in 
this document:

http://docs.eigenbase.org/MondrianCacheOverview#Non-Empty_Tuple_Set_Cache

Clearing the aggregate cache should clear the schema's non-empty tuple 
set cache as well.

JVS


