thejoe at gmail.com
Tue Jan 18 17:32:45 EST 2011
After spending the past few months looking at Mondrian profiles, it
definitely seems like optimizing RolapEvaluator.push() would be a big
win. Both RolapEvaluator.push() and RolapEvaluator.setContext() show up
prominently, called from just about everywhere in the call
stack: thousands or millions of invocations per query execution.
We would just have to measure before and after to be sure that
push()-as-copy with no pop is actually slower than
push()-onto-stack-and-then-rollback; it seems like it should be, but you
never know with these types of changes. Even if the two end up being
approximately the same, though, if CellRequest construction can be
dramatically sped up by precomputing it in the evaluator, that part
sounds the most promising, since CellRequest construction and
lookup are some of the largest parts of the profiles I've been
looking at. Would this be a 3.3-timeframe change, or would it have to wait
until 4.0? (Tangentially related question: should I be working on the
parallel execution stuff in 3.3 or somewhere else?)
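
To make the comparison concrete, here is a rough sketch of how I read
the push()-onto-stack-and-then-rollback idea from the quoted message
below. Everything in it is an assumption for illustration: the class
ToyEvaluator, its String-per-hierarchy context, and the undo-log layout
are made up and are not Mondrian's RolapEvaluator. Only the shape
(push() returns an int savepoint, rollback(int) undoes back to it)
comes from the proposal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy evaluator; NOT Mondrian's RolapEvaluator. */
class ToyEvaluator {
    // Current context: one member name per hierarchy ordinal (assumed layout).
    private final String[] currentMembers;

    // Undo log stored as pairs: [ordinal, previous member, ordinal, ...].
    private final List<Object> undoLog = new ArrayList<Object>();

    ToyEvaluator(String[] initialMembers) {
        this.currentMembers = initialMembers.clone();
    }

    /** Returns a savepoint: the current length of the undo log. No copy is made. */
    int push() {
        return undoLog.size();
    }

    /** Changes the context and records how to undo the change. */
    void setContext(int ordinal, String member) {
        undoLog.add(Integer.valueOf(ordinal));
        undoLog.add(currentMembers[ordinal]);
        currentMembers[ordinal] = member;
    }

    /** Undoes every change made since the given savepoint, newest first. */
    void rollback(int savepoint) {
        for (int i = undoLog.size() - 2; i >= savepoint; i -= 2) {
            int ordinal = ((Integer) undoLog.get(i)).intValue();
            currentMembers[ordinal] = (String) undoLog.get(i + 1);
        }
        // Drop the undone entries so the log shrinks back to the savepoint.
        undoLog.subList(savepoint, undoLog.size()).clear();
    }

    String getContext(int ordinal) {
        return currentMembers[ordinal];
    }
}
```

Callers would presumably wrap the work between push() and rollback() in
try/finally so that an exception cannot leave the evaluator in the
modified state, which is the "more onerous and error-prone" part.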
On Sat, Jan 15, 2011 at 9:26 PM, Julian Hyde <jhyde at pentaho.com> wrote:
> I've noticed that RolapEvaluator.push() occurs a lot in
> computation-intensive queries, and it's not a cheap operation. Each
> evaluator contains 16 members, 5 of which are lists or arrays that need to
> be copied.
> So, I'm thinking of changing the evaluator API. When you want to save a
> context, you would no longer create a copy of the evaluator. Instead you
> would create a savepoint that would allow the evaluator to roll back to its
> previous state.
> RolapEvaluator.push() would no longer return a RolapEvaluator; it would
> return an int, the index on the stack of operations to roll back to.
> Operations that change the state of the evaluator would add a command to the
> stack to undo their effect. The stack would be a list of opcodes and
> Currently if you call RolapEvaluator.push() you don't need to call pop().
> The original evaluator is not modified, and when you finish using the new
> evaluator, the garbage collector will claim it eventually. After this
> change, every piece of code that calls push() will need to call
> RolapEvaluator.rollback(int savepoint). No question that this is more
> onerous and error-prone, but I think it's worth it.
> There is another potentially major benefit. Another cost that shows up
> prominently in profiling is the cost of creating a CellRequest. Right now,
> each time you request a cell value, you need to build a cell request from
> scratch. Converting evaluator context to cell request is quite involved: you
> need to figure out the key columns of each member, a bitmap of those column
> ids, and build a list of the values of those columns. Not something one
> wants to do millions of times per query. To amortize that cost, I'd like to
> store partially-created cell requests in the evaluator, which would help
> because consecutive cell requests are often very similar. But to do that,
> I'd have to add a lot more state to the evaluator, and that would make it
> even more expensive to copy an evaluator.
> With the proposed change to RolapEvaluator.push(), we would vastly reduce
> the number of times that we copy an evaluator, so we can build amortizing
> data structures that reduce the cost of requesting a cell from cache.
> Anyone have any gut instinct on whether this change would be useful?
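
For the CellRequest part of the quoted message, one way to picture a
"partially-created cell request" is a column bitmap plus value array
that is updated incrementally as the context changes, instead of being
rebuilt from the members on every cell lookup. The sketch below is only
that picture: PartialCellRequest, RequestKey, and the integer column
ordinals are hypothetical names, not Mondrian classes, and it ignores
how members map to key columns.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical incremental request state; NOT Mondrian's CellRequest. */
class PartialCellRequest {
    // Bitmap of constrained column ordinals, kept up to date on every change.
    private final BitSet constrainedColumns = new BitSet();
    // Value for each constrained column, indexed by column ordinal.
    private final Object[] columnValues;

    PartialCellRequest(int columnCount) {
        this.columnValues = new Object[columnCount];
    }

    /** Called when the context changes: touches only the affected column. */
    void constrain(int columnOrdinal, Object value) {
        constrainedColumns.set(columnOrdinal);
        columnValues[columnOrdinal] = value;
    }

    void unconstrain(int columnOrdinal) {
        constrainedColumns.clear(columnOrdinal);
        columnValues[columnOrdinal] = null;
    }

    /** Snapshots the current state into an immutable cache-lookup key. */
    RequestKey toKey() {
        return new RequestKey(
            (BitSet) constrainedColumns.clone(), columnValues.clone());
    }
}

/** Immutable key with value semantics, usable in a HashMap-based cache. */
final class RequestKey {
    private final BitSet columns;
    private final Object[] values;

    RequestKey(BitSet columns, Object[] values) {
        this.columns = columns;
        this.values = values;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RequestKey)) {
            return false;
        }
        RequestKey that = (RequestKey) o;
        return columns.equals(that.columns)
            && Arrays.equals(values, that.values);
    }

    @Override
    public int hashCode() {
        return 31 * columns.hashCode() + Arrays.hashCode(values);
    }
}
```

Since consecutive requests tend to differ in only one or two columns,
each lookup would pay for a couple of constrain() calls plus a snapshot.
Whether that snapshot is cheap enough is exactly the kind of thing that
would need measuring.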