[Mondrian] Multi-threading SQL execution
Bart.Pappyn at vandewiele.com
Tue Feb 13 05:14:54 EST 2007
As Michael already mentioned, option #2 could have trouble with […]
When running against a dynamic database, option #2 would make it even
more difficult to maintain cache integrity.
I think it would be better to have a central cache manager with
multiple threads that load the results (both aggregates and hierarchies),
running separately from the threads that execute MDX queries (which would
just make requests to the cache). Even so, there could still be problems
when multiple threads read the same cube data using different
transactions. In some cases you might give each cube (when using virtual
cubes) a single thread, but in most cases this would still lead […]
For example, while aggregate tables are being recalculated at night, one
SQL query may depend on a newer version of the data (an aggregate table
that has already been updated) while another query depends on an
aggregate table that has yet to be updated.
Database transaction support is not there yet, but parallel SQL execution
would make future work in that direction more difficult, perhaps impossible.
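The central cache manager idea can be sketched roughly as follows. This is only an illustration under assumptions: `CentralCacheManager`, `request` and the `loader` function are hypothetical names, not Mondrian classes, and the loader stands in for "run the SQL for this cache key". MDX query threads only ask for a key; the manager's own thread pool does the loading, and concurrent requests for the same key share a single load.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/**
 * Hypothetical sketch: a cache manager that owns its own loader threads.
 * Query threads call request(); SQL never runs on the caller's thread.
 */
public class CentralCacheManager<K, V> implements AutoCloseable {
    private final Map<K, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final ExecutorService loaders;
    private final Function<K, V> loader; // stands in for "execute SQL for this key"

    public CentralCacheManager(int loaderThreads, Function<K, V> loader) {
        this.loaders = Executors.newFixedThreadPool(loaderThreads);
        this.loader = loader;
    }

    /**
     * Returns the pending or completed load for the key. Concurrent requests
     * for the same key get the same future, so the SQL runs only once.
     * Note: in this sketch a failed load stays cached as a failed future.
     */
    public CompletableFuture<V> request(K key) {
        return cache.computeIfAbsent(
            key, k -> CompletableFuture.supplyAsync(() -> loader.apply(k), loaders));
    }

    @Override
    public void close() {
        loaders.shutdown();
    }

    public static void main(String[] args) {
        try (CentralCacheManager<String, Integer> mgr =
                 new CentralCacheManager<>(4, key -> key.length())) {
            CompletableFuture<Integer> a = mgr.request("[Product].[Beer]");
            CompletableFuture<Integer> b = mgr.request("[Product].[Beer]");
            System.out.println(a == b);   // true: the second request reuses the first load
            System.out.println(a.join()); // 16
        }
    }
}
```

Sharing one future per key addresses duplicate loading only; it does nothing about the transaction-consistency problem described above, since different keys can still be loaded at different points in time.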
At the moment I see a lot of movement in different directions:
scalability, multi-user access, integrity, dynamic databases,
huge results, small real-time results... But I notice that some
decisions don't cover the whole picture or are not compatible with each other.
I think it would be useful to analyse the whole picture and draw up a
roadmap supporting this vision. At the very least, each direction that is
not compatible with other uses of mondrian should be made configurable,
so that it is possible to configure how mondrian behaves either at cube
level or in mondrian.properties.
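One way to read "configurable at cube level or in mondrian.properties" is a lookup chain: a cube-level setting wins, then the global property, then a built-in default. The sketch below assumes that precedence; `BehaviourConfig` and the property name `mondrian.example.ParallelSql` are invented for illustration and are not real Mondrian settings.

```java
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch: resolve a behaviour switch per cube, falling back to a
 * global value (as if loaded from mondrian.properties), then to a default.
 */
public final class BehaviourConfig {
    private final Properties global;                              // global settings
    private final Map<String, Map<String, String>> cubeOverrides; // cube -> (property -> value)

    public BehaviourConfig(Properties global, Map<String, Map<String, String>> cubeOverrides) {
        this.global = global;
        this.cubeOverrides = cubeOverrides;
    }

    /** Cube-level value wins, then the global property, then the default. */
    public boolean getBoolean(String cube, String property, boolean defaultValue) {
        Map<String, String> perCube = cubeOverrides.get(cube);
        if (perCube != null && perCube.containsKey(property)) {
            return Boolean.parseBoolean(perCube.get(property));
        }
        String value = global.getProperty(property);
        return value != null ? Boolean.parseBoolean(value) : defaultValue;
    }

    public static void main(String[] args) {
        String prop = "mondrian.example.ParallelSql"; // hypothetical property name
        Properties global = new Properties();
        global.setProperty(prop, "true");
        BehaviourConfig config = new BehaviourConfig(
            global, Map.of("Sales", Map.of(prop, "false")));
        System.out.println(config.getBoolean("Sales", prop, false));     // false (cube override)
        System.out.println(config.getBoolean("Warehouse", prop, false)); // true (global)
    }
}
```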
From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Julian Hyde
Sent: Tuesday 13 February 2007 2:38
To: 'Mondrian developer mailing list'
Subject: RE: [Mondrian] Multi-threading SQL execution
Option #1 (GROUP BY ROLLUP / CUBE) is viable and useful. If DBMS vendors
differ in how they implement this support, let's stick to the letter of
the SQL:2003 standard.
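For reference, the standard form is `GROUP BY ROLLUP (c1, ..., cn)`. The sketch below just builds such a query string; the table and column names (`sales_flat`, `the_year`, `quarter`, `month_of_year`, `unit_sales`) are placeholders assuming a denormalized fact table, not Mondrian's actual SQL generation.

```java
import java.util.List;

/** Hypothetical sketch: build one SQL:2003-style GROUP BY ROLLUP query. */
public final class RollupSql {
    static String rollupQuery(String factTable, String measure, List<String> levels) {
        String cols = String.join(", ", levels);
        return "SELECT " + cols + ", SUM(" + measure + ") AS total"
            + " FROM " + factTable
            + " GROUP BY ROLLUP (" + cols + ")";
    }

    public static void main(String[] args) {
        // One query returning month, quarter, year and grand-total rows
        System.out.println(rollupQuery("sales_flat", "unit_sales",
            List.of("the_year", "quarter", "month_of_year")));
    }
}
```

In the result, subtotal rows carry NULL in the rolled-up columns; the standard `GROUPING(column)` function distinguishes those rows from genuine NULL values in the data.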
To implement option #1, someone will have to get their hands dirty
understanding how cell requests are turned into SQL queries. The hardest
part is to look at a collection of cell requests and figure out whether
they can be satisfied using the same query.
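A simplified version of that test: `ROLLUP (c1, ..., cn)` produces exactly the grouping sets (c1..cn), (c1..cn-1), ..., (c1), (), so a batch of requests can share one query if they read the same fact table with the same filter and each groups by a prefix of the longest column list. The `CellRequest` record and `RollupBatcher` class are hypothetical; real cell requests carry much more (measures, constrained member values) than this sketch models.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: can a batch of requests share one ROLLUP query? */
public final class RollupBatcher {
    record CellRequest(String factTable, String filter, List<String> groupBy) {}

    static boolean shareOneRollup(List<CellRequest> requests) {
        if (requests.isEmpty()) return false;
        // The longest grouping list defines the ROLLUP column order
        CellRequest longest = requests.stream()
            .max(Comparator.comparingInt(r -> r.groupBy().size()))
            .get();
        for (CellRequest r : requests) {
            boolean sameSource = r.factTable().equals(longest.factTable())
                && r.filter().equals(longest.filter());
            // Each request must group by a prefix of the longest list
            boolean isPrefix = longest.groupBy()
                .subList(0, r.groupBy().size())
                .equals(r.groupBy());
            if (!sameSource || !isPrefix) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        CellRequest byQuarter = new CellRequest("sales", "product = 'Beer'",
            List.of("the_year", "quarter"));
        CellRequest byYear = new CellRequest("sales", "product = 'Beer'",
            List.of("the_year"));
        CellRequest byMonth = new CellRequest("sales", "product = 'Beer'",
            List.of("the_year", "month_of_year"));
        System.out.println(shareOneRollup(List.of(byQuarter, byYear)));  // true
        System.out.println(shareOneRollup(List.of(byQuarter, byMonth))); // false
    }
}
```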
Is there a chance that a ROLLUP query will compute exponentially more
results than individual GROUP BY queries? If so, we will need to do a
cost:benefit analysis before issuing a ROLLUP query.
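A rough bound on result size helps frame that question. Assume the finest grouping set (all n grouping columns) is one that would have been queried anyway, and let R be the number of rows it returns. Every coarser grouping set is a projection of those rows, so it returns at most R rows:

```latex
% R = rows returned by GROUP BY c_1, ..., c_n (the finest grouping set)
\begin{aligned}
\mathrm{rows}\bigl(\mathrm{ROLLUP}(c_1,\dots,c_n)\bigr) &\le \sum_{k=0}^{n} R = (n+1)\,R \\
\mathrm{rows}\bigl(\mathrm{CUBE}(c_1,\dots,c_n)\bigr) &\le \sum_{S \subseteq \{c_1,\dots,c_n\}} R = 2^{n} R
\end{aligned}
```

So the extra output of a ROLLUP grows at most linearly with the number of levels, whereas the number of grouping sets in a CUBE grows exponentially. This bounds only result rows, not the work the database does to produce them, so a real cost:benefit check would still need estimates from the DBMS.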
Option #2 (parallel query execution) is also viable, and remains useful
even if option #1 is implemented, because certain queries, especially those
on virtual cubes, may generate SQL queries which are not rollups of each other.
Implementing option #2 requires a modest amount of coding, mainly
introducing a multi-threaded request queue, and a significant amount of
testing for threading issues.
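The "multi-threaded request queue" could be as small as a fixed thread pool fed with one task per SQL statement. In this sketch each `Callable` is a hypothetical stand-in for "execute one SQL statement and load the resulting segment"; `ParallelSqlQueue` and `runAll` are not Mondrian APIs. The threading risk lies mostly in whatever shared state (the cache, connection handling) those tasks touch, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of option #2: run independent SQL requests in parallel. */
public final class ParallelSqlQueue {
    static List<Integer> runAll(List<Callable<Integer>> sqlRequests, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll queues every task and blocks until all have finished;
            // futures come back in the same order as the requests
            List<Future<Integer>> futures = pool.invokeAll(sqlRequests);
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // a failed request surfaces as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> requests = List.of(
            () -> { Thread.sleep(100); return 1; },
            () -> { Thread.sleep(100); return 2; },
            () -> { Thread.sleep(100); return 3; });
        System.out.println(runAll(requests, 3)); // [1, 2, 3]
    }
}
```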
A third option is to support rollup within the cache. If mondrian notices
that there is a request for ([Time]..[Q1], ... [Q4],
[Product].[Beer]) and also a request for ([Time].,
[Product].[Beer]), then it should execute request #1 and then answer
request #2 by rolling up the results of request #1.
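A minimal sketch of that in-cache rollup, assuming the cache holds one value per child member (for example the four quarters for one product) and the measure is a sum. `CacheRollup` and `rollUpSum` are hypothetical names. The rollup is only valid when every child is cached and the aggregator can be re-aggregated (sum, count via summing counts, min, max); a distinct-count or average measure cannot be rolled up from child values alone.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: answer a parent cell from cached child cells. */
public final class CacheRollup {
    /** Returns null when some child is not cached, meaning SQL is still needed. */
    static Double rollUpSum(Map<String, Double> cachedChildren, List<String> children) {
        double total = 0.0;
        for (String child : children) {
            if (!cachedChildren.containsKey(child)) {
                return null; // incomplete: cannot answer from the cache
            }
            total += cachedChildren.get(child);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> quarters = List.of("Q1", "Q2", "Q3", "Q4");
        Map<String, Double> beerByQuarter =
            Map.of("Q1", 10.0, "Q2", 20.0, "Q3", 30.0, "Q4", 40.0);
        System.out.println(rollUpSum(beerByQuarter, quarters)); // 100.0
        System.out.println(rollUpSum(Map.of("Q1", 10.0), quarters)); // null
    }
}
```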
ALL of these options will benefit mondrian and each offers something
that the other two do not, so it's difficult to choose between them. My
instinct is that option #2 is slightly less work than option #1, but has
less benefit. Take your pick!