[Mondrian] Lots of Dimensions & Lots of Rows

Paul Ramsey pramsey at refractions.net
Wed Feb 7 12:57:08 EST 2007


Matt, thanks for the anecdote, that is amazing (and great news)! What 
kind of hardware do you have backing your system?

Julian, thanks for the detailed response, it will be interesting trying 
to fit our problem neatly into the OLAP space, but I see lots of 
advantages to doing so in terms of pre-existing functionality.  On the 
spatial side, by the time stuff hits the fact table, it is essentially 
non-spatial. All the spatial relationships (cell->region, 
cell->watershed) are expressed in the dimensions at that point.

Yours,

Paul

Matt Campbell wrote:
> To add some anecdotal info-- we've successfully run Mondrian with nearly 
> 700 dimensions using fact tables with 100s of millions of rows.  We've 
> noticed no performance degradation or instability from the number of 
> dimensions.
> 
> On 2/6/07, *Julian Hyde* <julianhyde at speakeasy.net 
> <mailto:julianhyde at speakeasy.net>> wrote:
> 
> 
> 
>      > My concern is that the number of dimensions on top of the number of
>      > facts will make the whole thing unworkable, since people are going to
>      > expect to query the thing through a *lot* of different paths.  As an
>      > added bonus, the deployment environment will be PostgreSQL.
> 
>     The number of dimensions is not a huge problem per se. If mondrian
>     operating
>     in a ROLAP mode (that is, generating a SQL query for each set of
>     cells) then
>     each dimension is a POTENTIAL thing to slice on but it's only the
>     dimensions
>     ACTUALLY sliced on which affect the performance of the SQL.
> 
>     If you create aggregate tables -- and you probably will need to, for
>     that
>     data volume -- a large number of dimensions becomes more of a problem -
>     because you will need a correspondingly large number of aggregate
>     tables.
> 
>     There may be some tricks you can use when designing your aggregate
>     tables.
>     If your DBMS supports special indexes for GIS (just the kind of
>     thing that
>     PostgreSQL does very well) you should try to design the agg tables
>     so that
>     those indexes get used.
> 
>     Also, if a lot of your queries are localized ( e.g. queries for data
>     within
>     10 km of a given town) index your fact table so that this data set
>     can be
>     readily retrieved.
> 
>     Databases -- mondrian included -- don't handle ranges as well as
>     they handle
>     discrete values. So, splitting spatial coordinates into the integral
>     and
>     fractional part (e.g. 34.56 N, 123.45 W becomes lat_whole=34
>     lat_fraction=.56 long_whole=-123 long_fraction=.45) is a trick which
>     might
>     tend to create the right number and kind of 'buckets' in mondrian's
>     workspace.
> 
>     There has been some research to extend mondrian for GIS
>     applications: see
>     "An open source and web based framework for geographic and
>     multidimensional
>     processing" (da Silva, Times, Salgado, 2006),
>     http://portal.acm.org/citation.cfm?id=1141292
> 
>     Julian
> 
>     _______________________________________________
>     Mondrian mailing list
>     Mondrian at pentaho.org <mailto:Mondrian at pentaho.org>
>     http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian


-- 

   Paul Ramsey
   Refractions Research
   http://www.refractions.net
   pramsey at refractions.net
   Phone: 250-383-3022
   Cell: 250-885-0632



More information about the Mondrian mailing list