[Mondrian] Multithreading etc

michael bienstein mbienstein at yahoo.fr
Fri Mar 9 05:36:00 EST 2007


I sent this to the list but it bounced because I attached the code as a zip file.  How can I send code to the list without checking it in?  It is still orthogonal to the codebase.

Michael
----

Well, I have code that works for multi-threading infrastructure so I
would like to know if it is worth continuing with this or not.

As for ROLLUP/CUBE my thoughts are:
1) Either we keep the codebase simple by sticking to a standard
(SQL:2003), even if this standard is not yet widely implemented and
certain databases have better special features than others, or we allow
a per-database SQL generation system.  The argument for the second makes
sense only if the developer resources to write and maintain each
dialect come from the database vendor or their community.  Mondrian is
probably at a stage where such discussions can be undertaken with the
database vendors.
2) Architecturally this implies loading multiple Aggregations from one
SQL query.  That requires a rethink of the way the cell cache loading
is done, because at the moment Aggregations are loaded one at a time,
each in a synchronized block on the Aggregation.  Similar concerns have
to be dealt with for in-memory rollups.  I think that synchronized is
too forceful.  We need something more like a Lock from
java.util.concurrent.locks so we can do tryLock().  Look at the TxLock
idea I have in the code I'm attaching.
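
The difference between a synchronized block and tryLock() can be
sketched as follows.  This is only an illustration using ReentrantLock
from the JDK; AggregationSlot and its methods are hypothetical names,
not Mondrian classes, and this is not the TxLock from the attached code.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for one cached aggregation; not a Mondrian class.
class AggregationSlot {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile Object cells; // null until loaded

    /**
     * Tries to load the cells without ever blocking.
     * Returns true if the cells are loaded when this call returns,
     * false if another thread currently holds the lock.
     */
    boolean tryLoad(Supplier<Object> loader) {
        if (cells != null) {
            return true; // already loaded, nothing to do
        }
        if (!lock.tryLock()) {
            return false; // someone else is loading: fail quickly
        }
        try {
            if (cells == null) {
                cells = loader.get();
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    Object cells() {
        return cells;
    }
}
```

A caller that gets false back is free to do other work (for example
load a different aggregation) and come back later, which a synchronized
block does not allow.
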

As for multi-threading:
I have only written most of the base infrastructure, not the cell
loading.  Integrating it would require a significant amount of work in
Mondrian's code to pass all interaction with Mondrian through
TxSystem.runWithTx().

Basic concerns are:
  1)      Threads should be able to share data related to the request across the threads.
  2)      A Thread should be loaned to a request and returned in a way that is well-nigh fail-safe (i.e. the thread shouldn't keep running if the request fails in some way).
  3)      We should be able to decide, through a parameter of some sort, NOT to use threads at all.
  4)      The number of threads should be configurable.
  5)      The threading code should be independent of the rest of the code base.
  6)      We should be able to make use of custom thread pools or use managed thread pools from the application server.
  7)      Then there is a relatively minor issue with read-consistency for near-real-time data that turns out to be a real headache.  This can be handled either by using the transaction semantics of the underlying data store, or by qualifying all SQL requests and cache interactions with a timestamp and/or transaction id of some sort.  E.g. when an MDX request begins it asks the underlying data store for the id of the last completed transaction that modified data and keeps this in a request scope available to all threads.  Then it appends "changedTxId <= ${lastTxIdWhenFirstEntered}" to each WHERE clause.  If however we use the underlying data store's transactions then we must keep the JDBC Connection open for the duration of the request, reusing it on the same thread for each interaction with that data store.
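
The transaction-id variant of concern 7 can be sketched as below.  This
is a minimal illustration, not Mondrian's SQL generation: the
RequestSnapshot class, its select() method and the table and column
names in the usage are hypothetical.  Only the changedTxId predicate
comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request-scoped value: the id of the last completed
// transaction, read once when the MDX request begins and then shared
// by every thread working on that request.
final class RequestSnapshot {
    final long lastTxId;

    RequestSnapshot(long lastTxId) {
        this.lastTxId = lastTxId;
    }

    /**
     * Builds a simple SELECT whose WHERE clause always ends with the
     * snapshot predicate, so rows changed by later transactions are
     * never seen by this request.
     */
    String select(String columns, String table, List<String> conditions) {
        List<String> where = new ArrayList<>(conditions);
        where.add("changedTxId <= " + lastTxId);
        return "select " + columns + " from " + table
            + " where " + String.join(" and ", where);
    }
}
```

For example, a snapshot taken at transaction 42 turns the condition
"year = 2007" on a table named fact into
"select sum(sales) from fact where year = 2007 and changedTxId <= 42".
The cache would need the same id attached to whatever it stores.
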

Now, I think that the best way to take advantage of multiple threads in
the storage system is NOT to launch multiple SQL statements against
different aggregations of the same star schema, but rather to partition
the data.  That is, to segment the cell data (and maybe dimension data)
based on the values of certain columns.  For example, year<2007 and
year=2007 in two different partitions.  This can be introduced slowly
by simply making a RolapStar one Partition for the moment.  Having said
that, aggregation tables are also a type of Partition, and hitting two
of them at once should be quite easy.
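
A rough sketch of loading partitions in parallel, assuming each
partition's load is wrapped in a Callable and the partial results can
simply be summed.  PartitionedLoad is a hypothetical helper, not part
of Mondrian or of the attached code.  Passing a null executor stands
for the "do not use threads at all" setting, and the caller chooses the
pool, so a custom or application-server-managed pool fits as well.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper; none of these names exist in Mondrian.
final class PartitionedLoad {
    /**
     * Runs one load task per partition and sums the partial results.
     * A null executor means every task runs on the calling thread.
     */
    static long sum(List<Callable<Long>> partitionTasks,
                    ExecutorService executor) {
        long total = 0;
        try {
            if (executor == null) {
                for (Callable<Long> task : partitionTasks) {
                    total += task.call();
                }
            } else {
                // invokeAll waits until every task has completed.
                for (Future<Long> f : executor.invokeAll(partitionTasks)) {
                    total += f.get();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag
            throw new IllegalStateException("partition load interrupted", e);
        } catch (Exception e) {
            throw new IllegalStateException("partition load failed", e);
        }
        return total;
    }
}
```

With two tasks, one for year<2007 and one for year=2007, the same call
works with a fixed pool of two threads or with no pool at all, which
keeps the threading decision out of the code that defines the tasks.
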

So the design I am introducing has the following features:
1) A scope for "request" or "interaction" that is larger than the
Thread that begins it.  Since this is similar to a transaction I've
called it a Tx.  See the mondrian.tx package.  Each sub-system in
Mondrian can enlist a representation of itself in the Tx.
2) Break up the different tasks performed into Task objects that can
potentially be run in parallel.  Allow a set of Tasks to be tied to the
same Thread so that the same JDBC Connection can be used for all of
them for read-consistency and cleaned up at the end of the Tx.  This is
done declaratively so the implementation can be changed easily.  The
implementation can also ensure that the J2EE context is passed on to
separate threads (JNDI, context class loader etc).
3) A system of fail-quick locks at the Tx scope rather than just Thread scope.
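
To illustrate feature 3, here is a minimal sketch of a lock whose owner
is a request scope rather than a Thread.  RequestTx and TxScopedLock
are hypothetical names made up for this example; the actual Tx and
TxLock classes in the mondrian.tx package may look quite different.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical request scope; stands in for the Tx described above.
final class RequestTx {
    final String name;

    RequestTx(String name) {
        this.name = name;
    }
}

// A lock owned by a RequestTx instead of a Thread, so any thread
// working for the same request may enter. Sketch only.
final class TxScopedLock {
    private final AtomicReference<RequestTx> owner = new AtomicReference<>();

    /** Never blocks. True if tx holds the lock when this returns. */
    boolean tryLock(RequestTx tx) {
        Objects.requireNonNull(tx, "tx");
        return owner.get() == tx || owner.compareAndSet(null, tx);
    }

    /**
     * Releases the lock if tx owns it. Acquisitions are not counted,
     * so one unlock releases it however often tryLock succeeded.
     */
    boolean unlock(RequestTx tx) {
        Objects.requireNonNull(tx, "tx");
        return owner.compareAndSet(tx, null);
    }
}
```

Because ownership is checked against the request object, two threads
serving the same request both get true from tryLock(), while a thread
serving another request gets false immediately instead of waiting.
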

If this is worth pursuing as a design for the next version then good.  If not I'll stop now.

Michael



	

	
		