[Mondrian] Changes to Mondrian's caching architecture

Julian Hyde jhyde at pentaho.com
Wed Jan 18 15:38:06 EST 2012


On Jan 18, 2012, at 8:43 AM, Joe Barnett wrote:

> -Have you measured the performance impact?  I'd imagine there's a
> (hopefully) negligible single-threaded overhead cost to the thread
> communication and better overall performance/throughput under
> concurrent load, but would be good to quantify that.  (I'll certainly
> do the same when I have time to try out the new code)

I haven't done extensive performance testing. Performance is about the same on the regression suite; I had expected a degradation there, because the suite runs lots of queries that each do a small amount of work. No one minds if a query that used to take (say) 8 milliseconds now takes 10. I'd expect to see a significant improvement in throughput (if not response time) on a moderately to heavily loaded multi-core system.

> -What is the lifecycle of the new "cache manager" and "worker"
> threads, and what objects are they tied to?  (Is the "cache manager"
> thread per connection/star/cube/global?)

Each Mondrian server instance (of which there is usually one per JVM, unless you play games) has one cache manager (i.e. an instance of SegmentCacheManager), and each cache manager has a pool of SQL worker threads and a pool of threads for executing cache requests. Both pools are managed by executor objects within the SegmentCacheManager.
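In outline, the arrangement looks like this (a sketch only; the field names and pool sizes are illustrative, not the actual code):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch of the arrangement described above; names are illustrative.
    class SegmentCacheManagerSketch {
        // Pool of threads that execute cache requests.
        final ExecutorService cacheExecutor = Executors.newFixedThreadPool(10);
        // Pool of SQL worker threads that load segments from the DBMS.
        final ExecutorService sqlExecutor = Executors.newFixedThreadPool(10);
    }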

I recall that you have advocated a pool for each query, shared between all tasks that need to be done in that query, whereas I would have a pool for each kind of task. I am open to factory-izing pools/thread-factories/executors. It will need to be simple to configure, especially for the 99% who use Mondrian out of the box.

If it can be done with a judicious choice of executors, I would also like to allow Mondrian to run in "embedded" mode, where there are no worker threads or async calls and everything gets done by the client thread.
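As a strawman (the SPI name and shape here are hypothetical, not committed code), a factory could hand out an executor per kind of task, and an embedded implementation could return one that runs every task synchronously on the calling thread:

    import java.util.concurrent.Executor;

    // Hypothetical SPI; nothing like this exists yet.
    interface ExecutorFactory {
        Executor create(String taskKind);
    }

    // Embedded mode: no worker threads; the client thread does all the work.
    class EmbeddedExecutorFactory implements ExecutorFactory {
        public Executor create(String taskKind) {
            return new Executor() {
                public void execute(Runnable task) {
                    task.run(); // synchronous, on the caller's thread
                }
            };
        }
    }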

> Also, how do these interact
> with/supersede the RolapResultShepherd implementation that Luc added
> in the last release?

It doesn't supersede RolapResultShepherd. The shepherd's hand-off between threads is necessary so that execution can be cancelled without killing or cancelling threads.
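In miniature (generic java.util.concurrent, not the shepherd's actual code): the client thread blocks on a Future while a worker runs the query, so cancelling interrupts the task without destroying the thread that runs it:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Generic illustration of the hand-off plus cancellation.
    class HandOffSketch {
        private final ExecutorService pool = Executors.newCachedThreadPool();

        void cancelableQuery() throws Exception {
            Future<String> result = pool.submit(new Callable<String>() {
                public String call() {
                    return "query result"; // stand-in for real execution
                }
            });
            // cancel(true) interrupts the task; the worker thread itself
            // survives and returns to the pool.
            result.cancel(true);
        }
    }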

>  Relatedly, are you still opposed to adding any
> hooks to set context for the connection around these thread/callable
> implementations, or can you think of a way that you might find
> something similar acceptable? (FWIW, my previous diff was actually
> wrong, and it was the callable created in RolapConnection#execute()
> that needed the wrapping, not the thread/pool creation itself)

I agree we need some way to share context.

Another use case I have heard of: keeping the log4j MDC (mapped diagnostic context: http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/MDC.html ) up to date as each thread switches to a different task.
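A wrapper along these lines would do it (a sketch only; log4j's MDC is per-thread, so the context captured on the submitting thread has to be re-established on the worker):

    import java.util.Hashtable;
    import java.util.Map;
    import org.apache.log4j.MDC;

    // Sketch: snapshot the submitter's MDC, restore it on the worker thread.
    class MdcPropagatingRunnable implements Runnable {
        private final Hashtable context =
            MDC.getContext() == null ? null : new Hashtable(MDC.getContext());
        private final Runnable delegate;

        MdcPropagatingRunnable(Runnable delegate) {
            this.delegate = delegate;
        }

        public void run() {
            if (context != null) {
                for (Object o : context.entrySet()) {
                    Map.Entry e = (Map.Entry) o;
                    MDC.put((String) e.getKey(), e.getValue());
                }
            }
            delegate.run();
        }
    }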

A lot of the context can be stashed in an Execution. The same Execution remains current as control moves from one thread to another, so hooks wouldn't be needed to move your context around. We could create a property->value map in each Execution to store that context.
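For example (hypothetical; Execution has no such map today), something like:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical addition to mondrian.server.Execution.
    class Execution {
        private final Map<String, Object> context =
            new HashMap<String, Object>();

        // Synchronized, since several threads may touch the same Execution.
        public synchronized void setProperty(String name, Object value) {
            context.put(name, value);
        }

        public synchronized Object getProperty(String name) {
            return context.get(name);
        }
    }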

If you still need hooks, please propose an SPI for them, and how they would be configured. It needs to be efficient; I want to keep the cost of switching threads as low as possible.
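As a starting point for discussion (the name and methods are purely hypothetical), I'd imagine an SPI of roughly this shape, invoked on each side of a thread switch:

    // Strawman SPI; names are hypothetical.
    interface ThreadContextHook {
        // Called on the submitting thread; returns state to carry across.
        Object capture();
        // Called on the worker thread before the task runs.
        void apply(Object state);
        // Called on the worker thread after the task completes.
        void clear();
    }

If no hook is registered, the switch should cost no more than a null check.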

> -It sounds like these changes apply only to the cell/segment cache,
> and not the member caches?  Are there any plans on implementing
> similar changes (both the Actor model cache, and the pluggable caching
> architecture) for the member caches (which I think are just in
> SmartMemberReader/MemberCacheHelper, if I remember right)?

It would make sense to do it, but it's not high on the list of priorities. As you guessed, we'd make the member cache pluggable and move to an actor model for the member cache manager at the same time.

In my opinion, the case for making the member cache pluggable is not as strong as the case for making the cell cache pluggable. Getting members from a DBMS is not as expensive as getting cells.

Julian


