[Mondrian] Lots of Dimensions & Lots of Rows

Julian Hyde julianhyde at speakeasy.net
Tue Feb 6 21:00:40 EST 2007



> My concern is that the number of dimensions on top of the number of 
> facts will make the whole thing unworkable, since people are going to 
> expect to query the thing through a *lot* of different paths.  As an 
> added bonus, the deployment environment will be PostgreSQL.

The number of dimensions is not a huge problem per se. If mondrian operating
in a ROLAP mode (that is, generating a SQL query for each set of cells) then
each dimension is a POTENTIAL thing to slice on but it's only the dimensions
ACTUALLY sliced on which affect the performance of the SQL.

If you create aggregate tables -- and you probably will need to, for that
data volume -- a large number of dimensions becomes more of a problem -
because you will need a correspondingly large number of aggregate tables.

There may be some tricks you can use when designing your aggregate tables.
If your DBMS supports special indexes for GIS (just the kind of thing that
PostgreSQL does very well) you should try to design the agg tables so that
those indexes get used.

Also, if a lot of your queries are localized (e.g. queries for data within
10 km of a given town) index your fact table so that this data set can be
readily retrieved.

Databases -- mondrian included -- don't handle ranges as well as they handle
discrete values. So, splitting spatial coordinates into the integral and
fractional part (e.g. 34.56 N, 123.45 W becomes lat_whole=34
lat_fraction=.56 long_whole=-123 long_fraction=.45) is a trick which might
tend to create the right number and kind of 'buckets' in mondrian's
workspace.

There has been some research to extend mondrian for GIS applications: see
"An open source and web based framework for geographic and multidimensional
processing" (da Silva, Times, Salgado, 2006),
http://portal.acm.org/citation.cfm?id=1141292

Julian




More information about the Mondrian mailing list