[Mondrian] Again on the non-aggregable measure

michael bienstein mbienstein at yahoo.fr
Fri Nov 23 12:05:19 EST 2007


Julian,

First off, I can see where you are coming from. Our major difference is that I have a project where I must make an existing application run faster, and my idea was to adapt Mondrian to fit *quickly*. You, quite understandably, want a much more general feature. If we can agree on how to handle this difference, the rest will follow easily.

BTW, the reason I want Mondrian is not the cool MDX features; it's the cache. The technology I currently use has to go to disk every time it reads rows, whereas Mondrian can keep them in memory. The other reason is that with Java servlets one OS process can handle many requests, whereas the current technology runs one process per request, and every one of them hits the disk every time! As a result I have an upper limit of about 50 concurrent users. Beyond that the OS struggles and the CPU sits at 100%, not doing computation but thrashing between jobs and giving the file system time to work out how to handle the load. I need more users in parallel, so I want to load the data into memory and serve 250 concurrent users from a single in-memory image.
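To make the cache argument concrete, here is a minimal Java sketch of "one memory image shared by many requests". It assumes a hypothetical `SharedCellCache` class, which is not Mondrian's real cache API; `readFromDisk` is a stand-in for a SQL query or file scan and returns a placeholder value. Because all request threads live in one process and share one map, each cell is read from disk at most once instead of once per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Mondrian's cache classes: one in-memory image
// shared by every request thread in a single process.
public class SharedCellCache {
    private final ConcurrentHashMap<String, Double> cells = new ConcurrentHashMap<>();
    private final AtomicInteger diskReads = new AtomicInteger();

    // Stand-in for the expensive read (SQL query or file scan).
    // Returns a placeholder value derived from the key.
    private Double readFromDisk(String cellKey) {
        diskReads.incrementAndGet();
        return (double) cellKey.length();
    }

    // computeIfAbsent applies the loader at most once per key, even when
    // many threads ask for the same cell at the same moment.
    public double get(String cellKey) {
        return cells.computeIfAbsent(cellKey, this::readFromDisk);
    }

    public int diskReads() {
        return diskReads.get();
    }

    public static void main(String[] args) throws Exception {
        SharedCellCache cache = new SharedCellCache();
        ExecutorService pool = Executors.newFixedThreadPool(50);
        List<Future<Double>> answers = new ArrayList<>();
        // 250 simulated users all asking for the same cell.
        for (int user = 0; user < 250; user++) {
            answers.add(pool.submit(() -> cache.get("North|2007-11")));
        }
        for (Future<Double> answer : answers) {
            answer.get();
        }
        pool.shutdown();
        System.out.println("disk reads: " + cache.diskReads()); // disk reads: 1
    }
}
```

With one process per request, the same 250 requests would each pay for their own read; here the disk is touched once and every later request is served from memory.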

So, on to your ideas:
We have two requirements that are both valid but fundamentally different. You want more control in the schema over how to aggregate measures that use out-of-the-ordinary aggregations, and you want to use this to leverage modern SQL to generate aggregate tables from the fact table. I.e. data in the aggregate tables is still *dependent* on the fact table; the tables just speed things up. I, on the other hand, want aggregate tables that are not derived from the fact table. It is essentially "write-back" data: in my case users enter this data in a separate application, and it is loaded by the nightly ETL job. I haven't talked about write-back because I don't need to go through Mondrian to get it, but that's essentially what it is. This data is independent because, explicitly, no rollup from the fact table to the aggregate table is possible. As you can see, these are very different requirements.
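The difference between the two kinds of measure can be sketched in Java. Everything here is a hypothetical illustration, not Mondrian code: the class and method names, the `store` level, the `region|month` key format and the sample numbers are all assumptions. Volume at Region*Month can always be recomputed by summing fact rows, so an aggregate table for it is only a cache; Budget is only ever looked up at the grain where users entered it, and a missing cell stays empty rather than being rolled up.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Mondrian code: contrasts a measure derived from
// the fact table with a write-back measure that has no rollup at all.
public class MeasureSemantics {

    // One fact-table row; "store" is the finer level below "region".
    static final class FactRow {
        final String region;
        final String store;
        final String month;
        final double volume;

        FactRow(String region, String store, String month, double volume) {
            this.region = region;
            this.store = store;
            this.month = month;
            this.volume = volume;
        }
    }

    // Aggregable measure: Volume at Region*Month is always recomputable by
    // summing fact rows, so an aggregate table only caches this result.
    static double volume(List<FactRow> facts, String region, String month) {
        return facts.stream()
                .filter(r -> r.region.equals(region) && r.month.equals(month))
                .mapToDouble(r -> r.volume)
                .sum();
    }

    // Write-back measure: Budget exists only at the grain where users entered
    // it. A missing cell stays empty; it is never summed up from finer rows.
    static Optional<Double> budget(Map<String, Double> budgetTable,
                                   String region, String month) {
        return Optional.ofNullable(budgetTable.get(region + "|" + month));
    }

    public static void main(String[] args) {
        List<FactRow> facts = List.of(
                new FactRow("North", "Lille", "2007-11", 40.0),
                new FactRow("North", "Rouen", "2007-11", 60.0),
                new FactRow("South", "Nice", "2007-11", 25.0));
        Map<String, Double> budgetTable = Map.of("North|2007-11", 120.0);

        System.out.println(volume(facts, "North", "2007-11"));       // 100.0
        System.out.println(budget(budgetTable, "North", "2007-11")); // Optional[120.0]
        System.out.println(budget(budgetTable, "South", "2007-11")); // Optional.empty
    }
}
```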

Now, it is technically possible to create aggregate tables that contain some measure columns calculated from the fact table alongside columns already loaded by the ETL job. It's ugly, though: you would have to alter the table created by the ETL job and append the pre-calculated data to it. Adding columns to tables created by the ETL makes me shiver. It's doable, but the risk is too high. I can see a disk-space benefit, since the level columns don't have to be duplicated across separate tables. But disk drives are cheap, and this is only for write-back data, which can't be huge because it is entered by employees, not by external systems. Optimizing for disk here seems like wasted effort, especially since, as memory gets cheaper, Mondrian will probably just read the tables into memory once and the disk space won't matter.

If you want to keep the schema metadata simple by putting non-aggregable measures in the same cube as the normal measures, then we should allow each measure to define its own aggregate table at each level of aggregation. E.g. "Volume" at Region*Month is in table "normalagg_region_month" while "Budget" at Region*Month is in table "budget_region_month". That's doable in the XML without making it too ugly, but is it worth it? I'm not sure it would be for you, and I know it won't be for my project. I therefore (re-)propose having a separate RolapStar just for pre-prepared write-back cell data. I am not considering more interesting write-back cases such as "The budget for Stationery in January is 1000, so the implied budget for pencils on 4 January is 1000/31 × the expected ratio of pencil cost to total stationery cost". These more complicated implications, which roll down rather than up, are completely out of my scope.
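The per-measure mapping described above could look roughly like the following Java sketch. The table names normalagg_region_month and budget_region_month come from the example; `sales_fact`, the `AggTableResolver` class and its methods are hypothetical. The point it illustrates is that an aggregable measure can fall back to the fact table when no aggregate exists for the requested levels, while a write-back measure such as Budget has no fallback at all.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of per-measure aggregate-table lookup, not Mondrian's
// schema API. "sales_fact" and all class/method names are assumptions.
public class AggTableResolver {

    private final Map<String, String> aggTables = new HashMap<>();
    private final Map<String, String> factTables = new HashMap<>();

    // Levels are expected in a fixed order (e.g. Region before Month).
    private static String key(String measure, List<String> levels) {
        return measure + "@" + String.join("*", levels);
    }

    void registerAggTable(String measure, List<String> levels, String table) {
        aggTables.put(key(measure, levels), table);
    }

    // Only aggregable measures get a fact-table fallback.
    void registerFactTable(String measure, String table) {
        factTables.put(measure, table);
    }

    Optional<String> resolve(String measure, List<String> levels) {
        String agg = aggTables.get(key(measure, levels));
        if (agg != null) {
            return Optional.of(agg);
        }
        return Optional.ofNullable(factTables.get(measure));
    }

    public static void main(String[] args) {
        AggTableResolver r = new AggTableResolver();
        r.registerAggTable("Volume", List.of("Region", "Month"), "normalagg_region_month");
        r.registerAggTable("Budget", List.of("Region", "Month"), "budget_region_month");
        r.registerFactTable("Volume", "sales_fact");

        System.out.println(r.resolve("Budget", List.of("Region", "Month"))); // Optional[budget_region_month]
        System.out.println(r.resolve("Volume", List.of("Store", "Day")));    // Optional[sales_fact]
        System.out.println(r.resolve("Budget", List.of("Store", "Day")));    // Optional.empty
    }
}
```

A separate RolapStar for write-back data would make the same distinction for the whole star at once rather than measure by measure.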

Hoping you agree to keep the two concepts separate,

Michael



