[Mondrian] Multithreading etc

Wed Mar 21 08:41:48 EDT 2007

Would it be possible to create another SVN branch for the Mondrian
multithreaded version ?
Afterwards, it would be possible to merge it with the main branch ;)

Have a good day,

Laurent

2007/3/21, Pappyn Bart <Bart.Pappyn at vandewiele.com>:
>
>  Julian,
>
> The last few weeks, I have been busy thinking about ways to contribute a
> test suite that works as an insurance policy for me.  Since I can only
> contribute a small part of my time to mondrian, I cannot watch each change
> to ensure my application is still working.  All extra suggestions you have
> made are definitely very helpful.
>
> 1) Test suite :
>
> My application now has a schema of more than 5000 lines.  It uses many
> (almost all) features that are present in mondrian.  And it combines them,
> in a way I am not sure all of them are covered by existing regression
> tests.  It also does things that are not covered by the foodmart test case
> (like combining two cubes of identical layout in one virtual cube).
>
> I think I will try to do three things :
>
>    - Try to check against the foodmart database whether the features I
>    use are supported by a test, if not, add one to the standard regression test
>    suite.  This is a hard one, since most features are already tested, but not
>    in every combination with other features.  I noticed in the past, most things
>    that silently fail in my application (without triggering the regression
>    tests) are mostly due to combination of many facts (virtual cubes,
>    properties, user defined functions, cache turned on/off, complex format
>    expressions,...)
>    - Create a regression test suite that runs here, using my schema and
>    my database.  This in combination with continuous integration could alarm me
>    if anything is failing.
>    - Try to contribute a database and schema to test against a dynamic
>    database.  While the test will never be able to simulate a real dynamic
>    environment, I will try to change the database in between two queries.  I
>    will try out the change listener plug-in, in order to see if things are
>    flushing and results are correct.  Maybe a good place to test your new cache
>    control part.
>
>    I think I would implement it using a database that is copied before
>    the test (so the test starts with the same view each time) and doing jdbc
>    calls to fill the database.  If you have a better idea, please tell me.  I
>    am not sure what kind of database I should choose (access, ...).  I noticed
>    that the test suite contains something to load a DB from csv or xml, but I
>    am not sure how this is working and how I can modify the data on the fly?
>
> 2) QA partner
>
> Sounds like a good idea
>
> 3) Continuous integration
>
> Looking forward to seeing this.  I hope the setup environment for e.g.
> CruiseControl is made available, so I can use it to set up an environment
> to test my own cube and regression tests.
>
> 4) Dynamic database
>
> Indeed, this is very complex to test.  But one needs to start somewhere,
> so I will try to contribute a test environment (see 1)) that will behave the
> same way as my application does.
>
> And yes, there are many pitfalls, but due to a specific setup of the
> database most things are working for me, with some known issues that are
> acceptable in my kind of application.
>
> Most things I did for 2.3 are about making it possible to :
>
> * Let cubes that do not maintain a cache stay out of the global cache.
>
> * Cubes loading cache, will not interfere with concurrent threads (only
> check into global cache when all other threads are done).
>
> * Being able to flush both aggregate and member cache using a plug-in.
>
> Bart
>
>  ------------------------------
> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
> *On Behalf Of *Julian Hyde
> *Sent:* vrijdag 16 maart 2007 18:04
> *To:* 'Mondrian developer mailing list'
> *Subject:* RE: [Mondrian] Multithreading etc
>
>  If changes to mondrian are breaking your application, I sympathize. How
> can we prevent that from happening? Unless we restrict ourselves to trivial
> enhancements, then we obviously need to test the new functionality against
> existing apps, or at least tests which exercise existing apps' requirements.
>
> Ideally, these tests would be in the standard regression suite. But since
> some tests are too complicated to be in the standard suite, testing which
> cannot be done nightly at least has to be done once per release.
>
> We already have a process in place for much of this. For example:
>
>    - Developers ensure that code changes don't break the regression
>    suite.
>    - I run the regression suite nightly, in a wide set of
>    configurations, and let the developers know next day if things break.
>    - All code changes - bug fixes, enhancements and ad hoc features -
>    must be accompanied by a regression test which exercises the change. (That
>    means it should fail if the change is not present.)
>
> The extra things we need to do:
>
>    - If you are using mondrian in your apps, contribute test suites
>    which exercise your app's functionality. LucidEra have already done this
>    (ClearTestSuite) and Thomson-Medstat are working on it. Sure it's a lot of
>    work, but it's less work than taking a new release where things have stopped
>    working. It's an insurance policy.
>    - Would it help if we invented the notion of 'QA partner' companies?
>    We could run through the test suite of each QA partner before each release,
>    as a pre-condition for making that release.
>    - Write better tests for new features. Complex features require
>    complex tests.
>    - Set up continuous integration (E.g. CruiseControl) to ensure that
>    the code always builds and compiles. Thiyagu is already working on this.
>
> Bart, On point #3: The feature you are interested in, for working on top
> of dynamic databases, is VERY complex to test. We have spent a lot of time
> discussing how to implement this feature, but very little time designing a
> testing infrastructure. It should be little surprise that the feature is
> fragile at this point.
>
> Julian
>
>
>  ------------------------------
> *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
> *On Behalf Of *Pappyn Bart
> *Sent:* Tuesday, March 13, 2007 1:34 AM
> *To:* Mondrian developer mailing list
> *Subject:* RE: [Mondrian] Multithreading etc
>
>  Hi Michael,
>
>
>
> I don't see any problems for you to make changes to mondrian.  But I have
> some concerns:
>
>
>
> I think the changes you are about to make are quite huge and will have an
> impact of how mondrian will behave.  Since this is the first source
> contribution you are about to make, I urge you not to check anything into
> perforce before it is actually working and passing all regression tests.
>
>
>
> I think most developers of mondrian have ongoing projects that are using
> mondrian, and I think this is becoming more and more of an important issue.
>
>
>
> For me: it must be able to flush aggregates and member cache using the
> plug-in and cubes not maintaining cache should be able to load their own
> data, without messing with global cache.
>
>
>
> And since a dynamic database cannot easily be simulated in a regression
> test, I think if you are serious about tackling the read-consistency for
> near-real-time data, you need a realistic (dynamic) database to test
> against.  And the database must be large enough to be able to see realistic
> performance.  It is also advised to test with virtual cubes, cubes
> maintaining cache (with aggregate tables) in combination with cubes not
> maintaining cache (without aggregate tables), shared dimensions and so on...
>
>
>
> I released my project 4 weeks ago, not even using the latest version of
> perforce, since at a given point mondrian-head was completely broken for
> me.  While I know software always has some bugs that need to be patched,
> things that are not tested and are breaking mondrian should not be checked
> in.  All too often I had to sync with perforce to solve a bug and this ended
> up in a nightmare, spending most of my time finding out what change was
> causing mondrian to break.
>
>
>
> When mondrian 2.3 is released, it is most likely that there will
> be some 2.3.x version containing some patches.  I think it must be
> possible to make those patches without having to drag new huge features
> along.
>
>
>
> Bart
>
>  ------------------------------
>  *From:* mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
> *On Behalf Of *michael bienstein
> *Sent:* vrijdag 9 maart 2007 11:36
> *To:* Mondrian developer mailing list
> *Subject:* [Mondrian] Multithreading etc
>
>  I sent this to the list but it gets bounced because I attached the code
> in a zip file.  How do I send code through without checking it in because it
> is still orthogonal to the codebase?
>
> Michael
> ----
>
> Well, I have code that works for multi-threading infrastructure so I would
> like to know if it is worth continuing with this or not.
>
> As for ROLLUP/CUBE my thoughts are:
> 1) Either we keep the codebase simple by sticking to a standard (SQL2003)
> even if this standard is not yet implemented widely and certain databases
> have better special features than others, or we allow a per-database SQL
> generation system.  The argument for the second makes sense only if the
> developer resources to write and maintain each dialect come from the
> database vendor or their community.  Mondrian is probably at a stage that
> such discussions can be undertaken with the database vendors.
> 2) Architecturally this implies loading multiple Aggregations from one SQL
> query.  That requires a rethink of the way the cell cache loading is done
> because at the moment an Aggregation is loaded one at a time and in a
> synchronized block on the Aggregation.  Similar concerns have to be dealt
> with for in-memory rollups.  I think that synchronized is too forceful.  We
> need something more like a Lock from java.util.concurrent so we can do
> tryLock().  Look at the TxLock idea I have in the code I'm attaching.
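A minimal sketch of the tryLock() idea from point 2, assuming a plain java.util.concurrent ReentrantLock.  The GuardedAggregation class and its methods are hypothetical names for illustration; they are neither Mondrian classes nor the TxLock from the attachment, which is not shown in this thread.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a cached aggregation; not a Mondrian class.
class GuardedAggregation {
    private final ReentrantLock lock = new ReentrantLock();
    private int loadCount = 0;

    /**
     * Attempts the load without blocking. Returns false immediately if
     * another thread holds the lock, so the caller can do other work
     * instead of queueing up as it would on a synchronized block.
     */
    boolean tryLoad() {
        if (!lock.tryLock()) {
            return false;
        }
        try {
            loadCount++; // placeholder for the real SQL load
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Runs the action while holding the lock (simulates a long load). */
    void whileLocked(Runnable action) {
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }

    /** Number of loads performed so far, read under the lock. */
    int loadCount() {
        lock.lock();
        try {
            return loadCount;
        } finally {
            lock.unlock();
        }
    }
}
```

With synchronized, a second thread has no choice but to wait; with tryLock() it can fall back to loading a different aggregation or retry later.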
>
> As for multi-threading:
> I have only written most of the base infrastructure, not the cell
> loading.  To integrate would require a significant amount of work in
> Mondrian's code to pass all interaction with Mondrian through
> TxSystem.runWithTx().
>
> Basic concerns are:
> 1)      Threads should be able to share data related to the request across
> the threads.
> 2)      A Thread should be loaned to a request and returned in a way that
> is well-nigh fail-safe (i.e. the thread shouldn't keep running if the
> request fails in some way).
> 3)      We should be able, via a parameter of some sort, to decide NOT to
> use threads at all.
> 4)      The number of threads should be configurable.
> 5)      There should be an independence from the rest of the code base.
> 6)      We should be able to make use of custom thread pools or use
> managed thread pools from the application server.
> 7)      Then there is a relatively minor issue with read-consistency for
> near-real-time data that turns out to be a real head-ache.  This can be
> done by either: using the transaction semantics of the underlying data store
> *or* modifying all SQL requests and cache interactions with a timestamp
> and/or transaction id of some sort.  E.g. when an MDX request begins it
> asks the underlying data store for the id of the last completed transaction
> that modified data and keeps this in a request-scope available to all
> threads.  Then it appends "changedTxId <= ${lastTxIdWhenFirstEntered}" to
> each WHERE clause.  If however we use the underlying data store's
> transactions then we must keep open the JDBC Connection for the duration of
> the request reusing it on the same thread for each interaction with that
> data store.
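The timestamp/transaction-id variant in point 7 could look roughly like the sketch below.  Only the column name changedTxId comes from the message; the SnapshotQuery class, its build method and the sample table and column names are assumptions made for illustration.  The transaction id is captured once when the request starts and then added to every generated WHERE clause.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Mondrian's SQL generation.
final class SnapshotQuery {
    private SnapshotQuery() {
    }

    /**
     * Builds "selectFrom WHERE c1 AND c2 ... AND changedTxId <= lastTxId".
     * lastTxId is the id of the last completed transaction, read once at
     * the start of the MDX request and shared by all of its threads.
     */
    static String build(String selectFrom, List<String> conditions, long lastTxId) {
        List<String> all = new ArrayList<>(conditions);
        all.add("changedTxId <= " + lastTxId);
        return selectFrom + " WHERE " + String.join(" AND ", all);
    }
}
```

Every query issued for the same request then sees the same snapshot, whichever thread or JDBC connection runs it, provided each fact row carries the id of the transaction that last changed it.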
> Now, I think that the best way to take advantage of multiple threads in
> the storage system is NOT to launch multiple SQL queries on the same star
> schema for different aggregations, but rather to use *partitioning* of data.  That
> is to segment the cell data (and maybe dimension data) based on values of
> certain columns.  For example year<2007 and year=2007 in  two different
> partitions.  This can be introduced slowly by simply making a RolapStar
> one Partition for the moment.  Having said that aggregation tables are
> also a type of Partition and hitting two of them at once should be quite
> easy.
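As a rough illustration of the partitioning idea, the sketch below sums each partition (for example year<2007 and year=2007) on its own pool thread and then combines the partial results.  PartitionedSum is a hypothetical class, and the in-memory arrays stand in for the per-partition SQL queries; none of this is taken from the attached code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example; each long[] stands in for one partition's rows.
final class PartitionedSum {
    private PartitionedSum() {
    }

    /** Sums every partition as a separate task and adds the partial sums. */
    static long sum(List<long[]> partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (long[] partition : partitions) {
                tasks.add(() -> {
                    long total = 0;
                    for (long value : partition) {
                        total += value;
                    }
                    return total;
                });
            }
            long grandTotal = 0;
            // invokeAll blocks until all partition tasks have finished.
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                grandTotal += partial.get();
            }
            return grandTotal;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

This only works for measures whose partial results can be combined (sum, count, min, max); something like a distinct count would need a different merge step.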
> So the design I am introducing has the following features:
> 1) A scope for "request" or "interaction" that is larger than the Thread
> that begins it.  Since this is similar to a transaction I've called it a
> Tx.  See the mondrian.tx package.  Each sub-system in Mondrian can enlist
> a representation of itself in the Tx.
> 2) Break up the different tasks performed into Task objects that can be
> run potentially in parallel.  Allow a set of Tasks to be tied to the same
> Thread so that the same JDBC Connection can be used for all of them for
> read-consistency and cleaned up at the end of the Tx.  This is done
> declaratively so the implementation can be changed easily.  The
> implementation can also ensure that the J2EE context is passed onto separate
> threads (JNDI, context class loader etc).
> 3) A system of fail-quick locks at the Tx scope rather than just Thread
> scope.
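The request scope and Task ideas above might be sketched as follows.  RequestScope and runAll are hypothetical names, loosely modelled on the "Tx" description and not on the actual mondrian.tx code, which does not appear in this thread.  The sketch covers three of the basic concerns: data shared across threads, a configurable thread count where 0 means "run inline, no threads", and cleanup at the end of the request even on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical request scope that outlives the thread which started it.
final class RequestScope {
    private final Map<String, Object> attributes = new ConcurrentHashMap<>();

    void put(String key, Object value) {
        attributes.put(key, value);
    }

    Object get(String key) {
        return attributes.get(key);
    }

    /** Releases everything enlisted in this scope. */
    void close() {
        attributes.clear();
    }

    /**
     * Runs all tasks against the same scope and returns their results in
     * task order. threads <= 0 runs them inline on the calling thread.
     */
    static <T> List<T> runAll(RequestScope scope,
                              List<Function<RequestScope, T>> tasks,
                              int threads) {
        List<T> results = new ArrayList<>();
        try {
            if (threads <= 0) {
                for (Function<RequestScope, T> task : tasks) {
                    results.add(task.apply(scope));
                }
                return results;
            }
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                List<Future<T>> futures = new ArrayList<>();
                for (Function<RequestScope, T> task : tasks) {
                    Callable<T> call = () -> task.apply(scope);
                    futures.add(pool.submit(call));
                }
                for (Future<T> future : futures) {
                    results.add(future.get());
                }
                return results;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("request interrupted", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } finally {
                // Interrupts any still-running tasks if the request failed.
                pool.shutdownNow();
            }
        } finally {
            scope.close();
        }
    }
}
```

A real implementation would presumably accept an externally managed pool rather than creating one per request, so that an application server's thread pool can be used.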
>
> If this is worth pursuing as a design for the next version then good.  If
> not I'll stop now.
>
> Michael
>
> ______________________________________________________________________
> This email has been scanned by the Email Security System.
> ______________________________________________________________________
>
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
>

-- 
We are drowning in information, but starved for knowledge.
«Germain» @<http://www.le-valdo.com>