[Mondrian] Multithreading etc

Laurent Valdes valderama at gmail.com
Fri Mar 9 07:11:35 EST 2007


Hi,

As far as I'm concerned, it would be a very good idea to use threads
more extensively.
Would it be possible to provide a schematic of the Mondrian cache architecture?

Regarding Tasks and JDBC, is there any thread subscription code that allows
them to obtain data from a pool of objects?

Please forgive me, as I'm a newbie with Mondrian.
Have a good day!

Laurent.


2007/3/9, michael bienstein <mbienstein at yahoo.fr>:
>
> I sent this to the list but it was bounced because I attached the code in
> a zip file.  How do I send code to the list without checking it in, given
> that it is still orthogonal to the codebase?
>
> Michael
> ----
>
> Well, I have code that works for multi-threading infrastructure so I would
> like to know if it is worth continuing with this or not.
>
> As for ROLLUP/CUBE my thoughts are:
> 1) Either we keep the codebase simple by sticking to a standard (SQL:2003)
> even if this standard is not yet widely implemented and certain databases
> have better special features than others, or we allow a per-database SQL
> generation system.  The argument for the second makes sense only if the
> developer resources to write and maintain each dialect come from the
> database vendor or its community.  Mondrian is probably at a stage where
> such discussions can be undertaken with the database vendors.
> 2) Architecturally this implies loading multiple Aggregations from one SQL
> query.  That requires a rethink of the way the cell cache loading is done
> because at the moment Aggregations are loaded one at a time, each inside a
> synchronized block on the Aggregation.  Similar concerns have to be dealt
> with for in-memory rollups.  I think that synchronized is too forceful.  We
> need something more like a Lock from java.util.concurrent.locks so we can
> do tryLock().  Look at the TxLock idea I have in the code I'm attaching.
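
A minimal sketch of the tryLock() point above, using the standard
ReentrantLock. The AggregationLoader class and its methods are hypothetical
stand-ins; this is not the TxLock code from the attachment, which is not
shown in this thread. A contending thread gets an immediate "no" instead of
waiting, as it would on a synchronized block.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for an Aggregation guarded by an explicit lock. */
class AggregationLoader {
    private final ReentrantLock lock = new ReentrantLock();
    private int loads = 0; // only touched while holding the lock

    /**
     * Runs loadWork if the lock is free. Returns false at once when another
     * thread holds the lock, where a synchronized block would have waited.
     */
    boolean tryLoad(Runnable loadWork) {
        if (!lock.tryLock()) {
            return false; // caller may do other work and retry later
        }
        try {
            loadWork.run();
            loads++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** First element: the holder's result; second: the contender's result. */
    static boolean[] demo() {
        AggregationLoader loader = new AggregationLoader();
        boolean[] results = new boolean[2];
        results[0] = loader.tryLoad(() -> {
            // While this thread holds the lock, a second thread tries to load.
            Thread other = new Thread(() -> results[1] = loader.tryLoad(() -> { }));
            other.start();
            try {
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return results;
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("holder loaded: " + r[0] + ", contender loaded: " + r[1]);
    }
}
```

Lock also offers tryLock(long, TimeUnit) for a bounded wait, which may suit
callers that prefer to wait briefly before giving up.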
>
> As for multi-threading:
> I have only written most of the base infrastructure, not the cell
> loading.  To integrate would require a significant amount of work in
> Mondrian's code to pass all interaction with Mondrian through
> TxSystem.runWithTx().
>
> Basic concerns are:
> 1)      Threads should be able to share data related to the request across
> the threads.
> 2)      A Thread should be loaned to a request and returned in a way that
> is well-nigh fail-safe (i.e. the thread shouldn't keep running if the
> request fails in some way).
> 3)      We should be able to decide, via a parameter of some sort, NOT to
> use threads at all.
> 4)      The number of threads should be configurable.
> 5)      There should be an independence from the rest of the code base.
> 6)      We should be able to make use of custom thread pools or use
> managed thread pools from the application server.
> 7)      Then there is a relatively minor issue with read-consistency for
> near-real-time data that turns out to be a real headache.  This can be
> handled either by using the transaction semantics of the underlying data
> store *or* by modifying all SQL requests and cache interactions with a
> timestamp and/or transaction id of some sort.  E.g. when an MDX request
> begins it asks the underlying data store for the id of the last completed
> transaction that modified data and keeps this in a request scope available
> to all threads.  Then it appends
> "changedTxId <= ${lastTxIdWhenFirstEntered}" to each WHERE clause.  If,
> however, we use the underlying data store's transactions then we must keep
> the JDBC Connection open for the duration of the request, reusing it on the
> same thread for each interaction with that data store.
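
The transaction-id variant in point 7 could be sketched roughly as follows.
Only the changedTxId column name comes from the message above; the
RequestSnapshot class, its restrictWhere method, and the fact table in main
are assumptions made for illustration. The id is a numeric long, so
concatenating it into the SQL text is safe here; anything user-supplied
should go through bind parameters instead.

```java
/** Hypothetical request-scoped snapshot; names are illustrative only. */
final class RequestSnapshot {
    // Id of the last completed transaction, read once when the request begins.
    private final long lastTxId;

    RequestSnapshot(long lastTxId) {
        this.lastTxId = lastTxId;
    }

    /**
     * Returns the given WHERE condition restricted to rows visible when the
     * request began. An empty or null condition yields just the predicate.
     */
    String restrictWhere(String condition) {
        String predicate = "changedTxId <= " + lastTxId;
        if (condition == null || condition.trim().isEmpty()) {
            return predicate;
        }
        // Parenthesize so an OR inside the original condition keeps its meaning.
        return "(" + condition + ") AND " + predicate;
    }

    public static void main(String[] args) {
        RequestSnapshot snapshot = new RequestSnapshot(42L);
        String sql = "SELECT SUM(sales) FROM fact WHERE "
                + snapshot.restrictWhere("year = 2007");
        System.out.println(sql);
    }
}
```

Because the snapshot object is immutable, it can be shared by every thread
working on the same request without further synchronization.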
> Now, I think that the best way to take advantage of multiple threads in
> the storage system is NOT to launch multiple SQL queries against the same
> star schema for different aggregations, but rather to use *partitioning*
> of data.  That is, to segment the cell data (and maybe dimension data)
> based on the values of certain columns.  For example, year<2007 and
> year=2007 in two different partitions.  This can be introduced slowly by
> simply making a RolapStar one Partition for the moment.  Having said that,
> aggregation tables are also a type of Partition and hitting two of them at
> once should be quite easy.
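
To make the partitioning idea concrete, here is a small sketch that sums a
measure over the two example partitions (year<2007 and year=2007) in
parallel and then combines the partial results. It works on an in-memory
list so that it is self-contained; in practice each partition would be its
own SQL query. PartitionedSum, Row and the sample figures are invented for
the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

class PartitionedSum {
    /** One fact row: a year and a measure value. */
    static final class Row {
        final int year;
        final double sales;
        Row(int year, double sales) { this.year = year; this.sales = sales; }
    }

    /** Sums one partition; stands in for one SQL query against that partition. */
    static double sumPartition(List<Row> facts, IntPredicate yearFilter) {
        double total = 0;
        for (Row r : facts) {
            if (yearFilter.test(r.year)) total += r.sales;
        }
        return total;
    }

    /** Evaluates every partition on a pool of the given size and adds the results. */
    static double parallelSum(List<Row> facts, List<IntPredicate> partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (IntPredicate p : partitions) {
                futures.add(pool.submit(() -> sumPartition(facts, p)));
            }
            double total = 0;
            for (Future<Double> f : futures) total += f.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow(); // the workers never outlive the request
        }
    }

    static double demo() {
        List<Row> facts = Arrays.asList(
                new Row(2005, 10.0), new Row(2006, 20.0), new Row(2007, 5.0));
        List<IntPredicate> partitions = Arrays.<IntPredicate>asList(
                y -> y < 2007, y -> y == 2007);
        return parallelSum(facts, partitions, 2);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The partitions must be disjoint and together cover all rows, otherwise the
combined total double-counts or misses data.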
> So the design I am introducing has the following features:
> 1) A scope for "request" or "interaction" that is larger than the Thread
> that begins it.  Since this is similar to a transaction I've called it a
> Tx.  See the mondrian.tx package.  Each sub-system in Mondrian can enlist
> a representation of itself in the Tx.
> 2) Break up the different tasks performed into Task objects that can be
> run potentially in parallel.  Allow a set of Tasks to be tied to the same
> Thread so that the same JDBC Connection can be used for all of them for
> read-consistency and cleaned up at the end of the Tx.  This is done
> declaratively so the implementation can be changed easily.  The
> implementation can also ensure that the J2EE context is passed on to
> separate threads (JNDI, context class loader, etc.).
> 3) A system of fail-quick locks at the Tx scope rather than just Thread
> scope.
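
One possible reading of the Task design above, sketched with the standard
java.util.concurrent executors. TaskRunner and the affinity-key idea are
assumptions, not the mondrian.tx code from the attachment. A thread count of
0 disables threads entirely and runs tasks in the caller; tasks that share
an affinity key are pinned to one thread, so they could reuse a thread-bound
JDBC Connection.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

/** Hypothetical task runner; not part of Mondrian's API. */
final class TaskRunner implements AutoCloseable {
    private final ExecutorService pool; // null means "do not use threads"
    private final Map<String, ExecutorService> pinned = new ConcurrentHashMap<>();

    TaskRunner(int threadCount) {
        this.pool = threadCount > 0 ? Executors.newFixedThreadPool(threadCount) : null;
    }

    /** Submits a task; tasks with the same non-null affinityKey share one thread. */
    <T> Future<T> submit(String affinityKey, Callable<T> task) {
        if (pool == null) {
            FutureTask<T> inline = new FutureTask<>(task);
            inline.run(); // threads disabled: run in the calling thread
            return inline;
        }
        ExecutorService target = affinityKey == null
                ? pool
                : pinned.computeIfAbsent(affinityKey, k -> Executors.newSingleThreadExecutor());
        return target.submit(task);
    }

    /** Returns all threads at the end of the request scope. */
    @Override
    public void close() {
        if (pool != null) pool.shutdownNow();
        for (ExecutorService e : pinned.values()) e.shutdownNow();
    }

    /** True if two tasks with the same key ran on the same thread. */
    static boolean sameThreadForKey(int threadCount) {
        try (TaskRunner runner = new TaskRunner(threadCount)) {
            String a = runner.submit("conn", () -> Thread.currentThread().getName()).get();
            String b = runner.submit("conn", () -> Thread.currentThread().getName()).get();
            return a.equals(b);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if, with threads disabled, the task ran on the calling thread. */
    static boolean runsInCaller() {
        try (TaskRunner runner = new TaskRunner(0)) {
            Thread seen = runner.submit(null, () -> Thread.currentThread()).get();
            return seen == Thread.currentThread();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameThreadForKey(4) + " " + runsInCaller());
    }
}
```

A managed thread pool from an application server could presumably be
supplied in place of Executors.newFixedThreadPool, as long as it implements
ExecutorService.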
>
> If this is worth pursuing as a design for the next version then good.  If
> not I'll stop now.
>
> Michael
>


-- 
«While waiting for the grass to grow, the ox dies of hunger»
«Le boeuf» @<http://www.le-valdo.com>