[Mondrian] Multi-threading SQL execution

Pappyn Bart Bart.Pappyn at vandewiele.com
Wed Feb 14 05:47:50 EST 2007


In fact, my project has in fact both cases, data changing at night and
data changing all the time.  

On the first - which I call archived data - there are aggregate tables
and cache enabled.

On the second - which I call hot data, I turned off aggregate tables and
cache. The data is cached
so frequently that maintaining cache or aggregate tables makes no sense.

Both are identical in layout and share all dimensions.  Both are merged
in one virtual cube.

But even if the system is only changing at a certain time (at night),
the period the data is changing, mondrian results are not reliable. So
atomic cache population becomes important.

Since the exception to the 95% rule is handled by cubes not maintaining
cache, then the solution falls apart in two tasks :

A) Cubes not maintaining cache should use one atomic operation for all
sql queries launched in that thread.  There can be running
   other queries in parallel using a different transaction, since that
cache is thread local (since my changes recently).

B) Cubes maintaining cache.  I think Julian's blog article tells the
direction here.  The only thing I need to add there, is
the fact I also like a plugin to be able to flush the cache in the same
way.  

-> A) is easy to implement since there is no possibility to corrupt
other sessions.

-> B) Can be more easy if mondrian knows when data has changed.  It is
still up to the designer of the database and the software filling the
database to make sure his changes to the database are atomic.  My plugin
I recently have added to mondrian can be useful.  The way I implemented
it, I created a table in the database, maintaining a list of tables and
the last time they have changed.  The modifications table is changed in
the same atomic operation of the actual table change.  Mondrian asks the
plugin if a table has changed.  If mondrian knows when the tables are
changed and - requirement - the tables are changed in an atomic
operation, than mondrian does not need long lasting transactions or
read-consistent views of the database.  It just needs to implement the
same behavior as A).  Mondrian asks each time before a query is started
what data has changed, it flushes cache, load agg thread local in an
atomic operation and when finished, it should check in results in global
cache (when other sessions are not using the cache any more).
Everything is already in place, only the transactions should be there.

But when the new cache control feature is used, then flushing of cache
is not in sync with mondrian queries, nor with the changing database,
which make things much more difficult.  I think it might be useful if
both principles are merged into one, using cache control in the plugin.

Telling mondrian, "the database is changing", requires the database
process being able to access the jvm where mondrian is running, which is
not always the case, and still, it would be not in sync. When the plugin
is used, it can be an in-between.  The process can communication using
special tables in the database, with mondrian.  Or the plugin can
communicate in other means, soap, whatever with the external process
responsible for modifying the database.  The plugin should be able to
use the same db transaction as the mdx query session.

I think forcing the plugin as a default way of implementing real time
olap is not a huge requirement, since the usage is already very
specific.  Most people will be satisfied using the cache control
implementation as it is now, since data changes at night and the system
is not very likely being used at that time.  Most users just want to
keep their application server up and running, and want means to flush
the cache.  This was already possible, but the cache control feature
reduces the cost of the operation.

Bart
-----Original Message-----
From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Julian Hyde
Sent: woensdag 14 februari 2007 3:17
To: mondrian at pentaho.org
Subject: RE: [Mondrian] Multi-threading SQL execution


> > I want to challenge Michael's assertion that "almost no
> JDBC driver will
> > allow parallel queries". It may be true, but if so, it will
> cost us a
> > lot of effort to do things another way, so I want to see
> some evidence. 
> > Michael, can you give an example of a JDBC driver which
> cannot execute
> > two statements in the same connection (and transaction context)?
> 
> http://www.mysql.com/news-and-events/newsletter/2003-04/a0000000154.ht
> ml
> 
>
http://download-east.oracle.com/docs/cd/A91202_01/901_doc/java.901/a9021
1/ti
ps.htm


Thanks John. That's good enough evidence.

Let me also challenge the assumption that in the typical mondrian
installation, the contents of the DBMS are dynamically changing. My
impression is that in 95% of cases, data is loaded nightly or weekly.
There are clearly exceptions - as Michael and Bart have both explained -
but we should not treat the exceptions as the norm.

Option #2 (parallel query execution) is viable and valuable to the
majority of the mondrian community, and we shouldn't be too hasty in
ruling it out.
Option #1 (ROLLUP) has limitations in its applicability too, because not
all databases support that syntax.

Julian

_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian

______________________________________________________________________
This email has been scanned by the Email Security System.
______________________________________________________________________



More information about the Mondrian mailing list