[Mondrian] Multi-threading SQL execution

Matt Campbell mkambol at gmail.com
Tue Feb 13 10:47:10 EST 2007


I agree, Bart: if we did implement Option #2, it would have to be
configurable.

I hadn't known until I read the thread Michael posted that using
java.lang.Thread is not a good idea in a servlet.  I'm still a bit fuzzy on
why that is, but it seems clear that Option #2 will require some serious
thought to do correctly.

For option #1, Julian asked if we might inadvertently pull back much more
data than we need, and suggested we do a cost/benefit analysis before
deciding to use ROLLUP or CUBE.  I think that's a very good point: blindly
doing a ROLLUP or CUBE in all cases could result in much worse performance,
particularly when a query involves several dimensions.
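
To make the cost concern concrete, here is a rough sketch of the kind of
estimate such an analysis might start from.  It assumes the worst case in
which every combination of column values occurs, and it relies only on the
standard behavior of the SQL operators: ROLLUP over n columns produces n+1
grouping sets (one per prefix of the column list, including the empty one),
while CUBE produces 2^n.  The class name, method names and cardinalities
are hypothetical and are not part of Mondrian.

```java
/** Worst-case row estimates; assumes every combination of column values occurs. */
final class RollupCostSketch {

    /** Plain GROUP BY on all columns: product of the column cardinalities. */
    static long groupByRows(long[] cardinalities) {
        long rows = 1;
        for (long c : cardinalities) {
            rows = Math.multiplyExact(rows, c);
        }
        return rows;
    }

    /** ROLLUP: one grouping set per prefix of the column list, plus the grand total. */
    static long rollupRows(long[] cardinalities) {
        long total = 1;   // the grand-total row (empty prefix)
        long prefix = 1;  // rows in the grouping set for the current prefix
        for (long c : cardinalities) {
            prefix = Math.multiplyExact(prefix, c);
            total = Math.addExact(total, prefix);
        }
        return total;
    }

    /** CUBE: every column is either one of its values or "all", giving 2^n grouping sets. */
    static long cubeRows(long[] cardinalities) {
        long rows = 1;
        for (long c : cardinalities) {
            rows = Math.multiplyExact(rows, Math.addExact(c, 1));
        }
        return rows;
    }

    public static void main(String[] args) {
        long[] cards = {4, 12, 50};  // hypothetical cardinalities of three columns
        System.out.println(groupByRows(cards));  // 2400
        System.out.println(rollupRows(cards));   // 2453
        System.out.println(cubeRows(cards));     // 3315
    }
}
```

With these made-up numbers ROLLUP adds little over the finest-grained
GROUP BY, while CUBE adds noticeably more, and the gap widens as columns
are added.  A real analysis would also have to weigh the cost of the extra
round trips that separate GROUP BY queries incur.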



On 2/13/07, Pappyn Bart <Bart.Pappyn at vandewiele.com> wrote:
>
> As Michael already mentioned, Option #2 could have trouble with database
> transactions.
> When running against a dynamic database, option #2 will make it even more
> difficult to maintain cache integrity.
>
> I think it would be better for a central cache manager to have multiple
> threads to load the results (both aggregates and hierarchies) and to run
> separately from the MDX query-executing threads, which would just make
> requests to the cache.  Even then, there could still be problems when
> multiple threads read the same cube data using different transactions.
> In some cases you might allow each cube (when using virtual cubes) to
> have a single thread, but in most cases this would still lead to
> problems.
>
> For example: when aggregate tables are recalculated at night, it is
> possible that one SQL query depends on a newer version (an aggregate
> table already updated) while another query depends on an aggregate table
> yet to be updated.
>
> Database transactions are not there yet, but option #2 would make future
> work in that direction more difficult, and perhaps impossible.
>
> I see a lot of movement in different directions at the moment: about
> scalability, multi-user access, integrity, dynamic databases,
> huge results, small real-time results...  But I notice that some decisions
> don't cover the whole idea or are not compatible with each other.
>
> I think it would be useful if there was an analysis made of the whole
> idea, with a roadmap supporting this vision.  Or at least, each direction
> that is not compatible with other usages of mondrian should be made
> configurable, so that it would be possible, at cube level or at
> mondrian.properties level, to configure how mondrian will behave.
>
> Bart
>  ------------------------------
> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
> *On Behalf Of *Julian Hyde
> *Sent:* Tuesday, 13 February 2007 2:38
> *To:* 'Mondrian developer mailing list'
> *Subject:* RE: [Mondrian] Multi-threading SQL execution
>
> *Option #1 (ROLLUP/CUBE BY)* is viable and useful. If there are
> differences between DBMS vendors in how they implement this support, let's
> stick to the letter of the SQL:2003 standard.
>
> To implement option #1, someone will have to get their hands dirty
> understanding how cell requests are turned into SQL queries. The hardest
> part is to look at a collection of cell requests and figure out whether they
> can be satisfied using the same query.
>
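
One narrow way to approach the "same query" test described above is
sketched here.  It assumes each cell request has already been reduced to
the set of columns it groups by, and it uses the fact that ROLLUP(a, b, c)
yields the grouping sets (a, b, c), (a, b), (a) and (): a batch of requests
can share one ROLLUP query when their column sets are nested inside one
another.  The class and method names are hypothetical, and this is not how
Mondrian's cell request code is actually structured.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class RollupBatchSketch {

    /**
     * Returns a column order for a single ROLLUP that covers every requested
     * grouping, or empty if the groupings are not nested inside one another.
     */
    static Optional<List<String>> rollupColumnOrder(List<Set<String>> requestedGroupings) {
        // Visit the smallest grouping first so that each one extends the previous.
        List<Set<String>> sorted = new ArrayList<>(requestedGroupings);
        sorted.sort(Comparator.comparingInt((Set<String> s) -> s.size()));

        LinkedHashSet<String> order = new LinkedHashSet<>();
        for (Set<String> grouping : sorted) {
            if (!grouping.containsAll(order)) {
                return Optional.empty();  // not a superset of the smaller groupings
            }
            // Append the columns this grouping adds, sorted for a stable result.
            List<String> added = new ArrayList<>(grouping);
            added.removeAll(order);
            Collections.sort(added);
            order.addAll(added);
        }
        return Optional.of(new ArrayList<>(order));
    }
}
```

For example, groupings {year}, {year, quarter} and {year, quarter, product}
give the order [year, quarter, product], whereas {year} and {product} give
an empty result and would need separate queries (or CUBE).  A real check
would also have to compare the fact table, measures and constraints of the
requests.
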
> Is there a chance that a ROLLUP query will compute exponentially more
> results than individual GROUP BY queries? If so, we will need to do a
> cost/benefit analysis before issuing a ROLLUP query.
>
> *Option #2 (parallel query execution)* is also viable, and is useful even
> if option #1 is implemented, because certain queries, especially those on
> virtual cubes, may generate queries which are not a rollup of each other.
>
> Implementing option #2 requires a modest amount of coding, mainly
> introducing a multi-threaded request queue, and a significant amount of
> testing for threading issues.
>
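
A minimal sketch of what a multi-threaded request queue could look like,
using java.util.concurrent.  The class name, the runAll method and the
Function standing in for "execute this SQL and load the result" are all
hypothetical; nothing here is Mondrian API.  It ignores the transaction and
servlet-container concerns raised elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelLoadSketch {

    /**
     * Runs one task per SQL statement on a fixed-size pool and returns the
     * results in the same order as the statements. The pool size is a
     * parameter so that the behavior stays configurable.
     */
    static <R> List<R> runAll(List<String> sqlStatements,
                              Function<String, R> execute,
                              int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String sql : sqlStatements) {
                tasks.add(() -> execute.apply(sql));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish; futures come back in task order.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a load task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A pool size of 1 makes the statements run one at a time, which gives a
simple way to switch the parallelism off.  A long-lived pool shared across
queries would avoid creating threads per batch, but where that pool lives
is exactly the part that needs thought in a servlet environment.
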
> A *third option is to support rollup within cache*. If mondrian notices
> that there is a request for ([Time].[1997].[Q1], ... [Q4], [Product].[Beer])
> and also a request for ([Time].[1997], [Product].[Beer]) then it should
> execute request #1 then answer request #2 by rolling up the results of
> request #1.
>
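
The third option can be illustrated with a toy cache.  This sketch assumes
the cache is a simple map from a time member name to a cell value for one
fixed product, and that the measure is additive (a sum or a count); a
measure such as distinct-count cannot be rolled up this way.  The names and
values are hypothetical and do not reflect Mondrian's cache design.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

final class CacheRollupSketch {

    /**
     * Answers a request for a parent member by summing its children's cached
     * cells. Returns empty if any child is missing, in which case the caller
     * would have to go to the database instead.
     */
    static OptionalDouble rollUp(Map<String, Double> cachedCells, List<String> children) {
        double total = 0;
        for (String child : children) {
            Double value = cachedCells.get(child);
            if (value == null) {
                return OptionalDouble.empty();
            }
            total += value;
        }
        return OptionalDouble.of(total);
    }
}
```

With cached cells for [1997].[Q1] through [1997].[Q4], passing those four
names as the children would answer the [1997] request without a second SQL
query.
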
> ALL of these options will benefit mondrian and each offers something that
> the other two do not, so it's difficult to choose between them. My instinct
> is that option #2 is slightly less work than option #1, but has less
> benefit. Take your pick!
>
> Julian