jhyde at pentaho.com
Sun Jan 16 00:26:48 EST 2011
I've noticed that RolapEvaluator.push() is called a lot in
computation-intensive queries, and it's not a cheap operation. Each
evaluator contains 16 fields, 5 of which are lists or arrays that need to
be copied.
So, I'm thinking of changing the evaluator API. When you want to save a
context, you would no longer create a copy of the evaluator. Instead you
would create a savepoint that would allow the evaluator to roll back to its
previous state.
RolapEvaluator.push() would no longer return a RolapEvaluator; it would
return an int, the index on the stack of operations to roll back to.
Operations that change the state of the evaluator would add a command to the
stack to undo their effect. The stack would be a list of opcodes and their
arguments.
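A minimal sketch of the savepoint idea follows. It assumes a much-simplified evaluator whose whole context is one member ordinal per dimension slot. The class SavepointEvaluator, its setMember/getMember methods, and the parallel-array encoding of the undo stack are hypothetical illustrations, not Mondrian's RolapEvaluator API. In this sketch push() returns the current depth of the undo stack, each state change records an opcode plus the value to restore, and rollback(int) replays the undo records newest-first.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Mondrian's RolapEvaluator: the context is saved by
 * recording undo operations instead of copying the evaluator.
 */
class SavepointEvaluator {
    // Opcode for "restore the member in a slot"; a real evaluator would need more.
    private static final int OP_SET_MEMBER = 1;

    // Current member ordinal per dimension slot (stands in for the full context).
    private final int[] members;

    // Undo stack as parallel arrays: opcode, slot operand, and value to restore.
    private int[] opcodes = new int[16];
    private int[] slots = new int[16];
    private int[] oldValues = new int[16];
    private int size = 0;

    SavepointEvaluator(int slotCount) {
        members = new int[slotCount];
    }

    /** Saves the context cheaply: the savepoint is just the undo-stack depth. */
    int push() {
        return size;
    }

    /** Changes one member and records how to undo the change. */
    void setMember(int slot, int member) {
        if (size == opcodes.length) {
            int newCapacity = size * 2;
            opcodes = Arrays.copyOf(opcodes, newCapacity);
            slots = Arrays.copyOf(slots, newCapacity);
            oldValues = Arrays.copyOf(oldValues, newCapacity);
        }
        opcodes[size] = OP_SET_MEMBER;
        slots[size] = slot;
        oldValues[size] = members[slot];
        size++;
        members[slot] = member;
    }

    int getMember(int slot) {
        return members[slot];
    }

    /** Undoes every change made since the savepoint, newest first. */
    void rollback(int savepoint) {
        if (savepoint < 0 || savepoint > size) {
            throw new IllegalArgumentException("bad savepoint " + savepoint);
        }
        while (size > savepoint) {
            size--;
            switch (opcodes[size]) {
                case OP_SET_MEMBER:
                    members[slots[size]] = oldValues[size];
                    break;
                default:
                    throw new IllegalStateException("unknown opcode " + opcodes[size]);
            }
        }
    }
}
```

Here push() allocates nothing; the cost moves into the operations that change state, and is proportional to how much the context actually changes. Nested savepoints fall out naturally, since rolling back to an outer savepoint also undoes everything recorded after any inner one.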
Currently if you call RolapEvaluator.push() you don't need to call pop().
The original evaluator is not modified, and when you finish using the new
evaluator, the garbage collector will reclaim it eventually. After this
change, every piece of code that calls push() will also need to call
RolapEvaluator.rollback(int savepoint) when it is done. There's no question
that this is more onerous and error-prone, but I think it's worth it.
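Because a forgotten rollback() would silently leave the evaluator in the wrong context, one way to make the new contract less error-prone is to tie each savepoint to Java's try-with-resources statement, so the rollback runs even when an exception escapes. The SavepointSupport interface and the Savepoint and ToyEvaluator classes below are hypothetical names for illustration only; ToyEvaluator's "context" is just a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal contract for an evaluator with savepoints. */
interface SavepointSupport {
    int push();                   // returns the undo-stack depth to roll back to
    void rollback(int savepoint);
}

/** Ties a savepoint to try-with-resources so rollback always runs. */
final class Savepoint implements AutoCloseable {
    private final SavepointSupport evaluator;
    private final int savepoint;

    Savepoint(SavepointSupport evaluator) {
        this.evaluator = evaluator;
        this.savepoint = evaluator.push();
    }

    @Override
    public void close() {         // no checked exception, so callers need no catch
        evaluator.rollback(savepoint);
    }
}

/** Toy evaluator used only to demonstrate the pattern. */
final class ToyEvaluator implements SavepointSupport {
    final List<String> context = new ArrayList<>();

    public int push() {
        return context.size();
    }

    public void rollback(int savepoint) {
        while (context.size() > savepoint) {
            context.remove(context.size() - 1);
        }
    }
}
```

A caller would write `try (Savepoint sp = new Savepoint(evaluator)) { ... }` around the code that modifies the context, and the context is restored when the block exits, normally or by exception.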
There is another potentially major benefit. Another cost that shows up
prominently in profiling is the cost of creating a CellRequest. Right now,
each time you request a cell value, you need to build a cell request from
scratch. Converting the evaluator's context to a cell request is quite involved: you
need to figure out the key columns of each member, a bitmap of those column
ids, and build a list of the values of those columns. Not something one
wants to do millions of times per query. To amortize that cost, I'd like to
store partially-created cell requests in the evaluator, which would help
because consecutive cell requests are often very similar. But to do that,
I'd have to add a lot more state to the evaluator, and that would make it
even more expensive to copy an evaluator.
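To make the per-request cost concrete, here is a minimal sketch of the from-scratch conversion under simplifying assumptions: each context member is represented only by its list of (column id, value) pairs, a java.util.BitSet stands in for the bitmap of column ids, and values are ordered by column id. KeyColumn, CellRequest and CellRequestBuilder are illustrative names, not Mondrian's classes. Every call walks every member of the context, which is exactly the work one would rather not repeat millions of times per query.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of building a cell request from scratch on every call. */
final class CellRequestBuilder {
    /** One key column of a member: the column id and the value it constrains. */
    static final class KeyColumn {
        final int columnId;
        final Object value;

        KeyColumn(int columnId, Object value) {
            this.columnId = columnId;
            this.value = value;
        }
    }

    /** Bitmap of constrained column ids, plus their values in column-id order. */
    static final class CellRequest {
        final BitSet columnBitKey;
        final List<Object> values;

        CellRequest(BitSet columnBitKey, List<Object> values) {
            this.columnBitKey = columnBitKey;
            this.values = values;
        }
    }

    /** Walks every member of the context; each call repeats all of this work. */
    static CellRequest build(List<List<KeyColumn>> contextMembers) {
        // TreeMap keeps the columns sorted by id so the value list has a stable order.
        TreeMap<Integer, Object> valueByColumn = new TreeMap<>();
        for (List<KeyColumn> member : contextMembers) {
            for (KeyColumn key : member) {
                valueByColumn.put(key.columnId, key.value);
            }
        }
        BitSet bits = new BitSet();
        List<Object> values = new ArrayList<>(valueByColumn.size());
        for (Map.Entry<Integer, Object> entry : valueByColumn.entrySet()) {
            bits.set(entry.getKey());
            values.add(entry.getValue());
        }
        return new CellRequest(bits, values);
    }
}
```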
With the proposed change to RolapEvaluator.push(), we would vastly reduce
the number of times that we copy an evaluator, so we can build amortizing
data structures that reduce the cost of requesting a cell from cache.
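A minimal sketch of one such amortizing structure is below. It assumes each dimension slot's member resolves to its own key columns (no two slots share a column) and hides the expensive lookup behind a hypothetical KeyResolver callback. The cache resolves a slot's key columns once and reuses them until that slot's member changes, so two consecutive requests that differ in one member cost one lookup rather than one per slot. IncrementalRequestCache and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch: per-slot key columns are cached and re-resolved only
 * for slots whose member changed since the last request.
 */
final class IncrementalRequestCache {
    /** Stand-in for the expensive member-to-key-columns lookup. */
    interface KeyResolver {
        int[] columnIds(int slot, int member);
        Object[] columnValues(int slot, int member);
    }

    /** A built request: bitmap of column ids plus values, in slot order. */
    static final class Request {
        final BitSet columnBitKey;
        final List<Object> values;

        Request(BitSet columnBitKey, List<Object> values) {
            this.columnBitKey = columnBitKey;
            this.values = values;
        }
    }

    private final KeyResolver resolver;
    private final int[] memberBySlot;
    private final int[][] idsBySlot;       // null means "stale, resolve again"
    private final Object[][] valuesBySlot;
    int resolveCount = 0;                  // number of expensive lookups performed

    IncrementalRequestCache(int slotCount, KeyResolver resolver) {
        this.resolver = resolver;
        this.memberBySlot = new int[slotCount];   // 0 = assumed default member
        this.idsBySlot = new int[slotCount][];
        this.valuesBySlot = new Object[slotCount][];
    }

    /** Changes a slot's member, invalidating only that slot's cached columns. */
    void setMember(int slot, int member) {
        if (memberBySlot[slot] != member) {
            memberBySlot[slot] = member;
            idsBySlot[slot] = null;
            valuesBySlot[slot] = null;
        }
    }

    /** Builds a request, resolving only the slots that are stale. */
    Request build() {
        BitSet bits = new BitSet();
        List<Object> values = new ArrayList<>();
        for (int s = 0; s < memberBySlot.length; s++) {
            if (idsBySlot[s] == null) {
                idsBySlot[s] = resolver.columnIds(s, memberBySlot[s]);
                valuesBySlot[s] = resolver.columnValues(s, memberBySlot[s]);
                resolveCount++;
            }
            for (int id : idsBySlot[s]) {
                bits.set(id);
            }
            for (Object v : valuesBySlot[s]) {
                values.add(v);
            }
        }
        return new Request(bits, values);
    }
}
```

Under a savepoint scheme, a rollback would restore members through the same setter and so invalidate exactly the slots it touches. Under copy-on-push, this cache would have to be copied along with every evaluator, which is the extra copying cost described above.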
Does anyone have a gut instinct on whether this change would be useful?
More information about the Mondrian