[Mondrian] Question on non-aggregable measures

Julian Hyde jhyde at pentaho.org
Fri Nov 23 05:58:43 EST 2007


 Michael Bienstein wrote:
I don't think it will work for my requirements because the data has to come
from aggregate tables and not the fact table at all.  

I hear you. We need a measure which may or may not exist in the fact table,
may or may not exist in any particular agg table.
A schema designer should optionally be able to specify that a
non-aggregatable measure to be stored in the fact table. In your case, just
don't specify a column when you define the measure. Mondrian should accept
that iff the aggregate function is 'none'.
The measure would exist only at those levels of aggregation for which there
is an aggregate table and that agg table has a column that maps (according
to the mapping rules) to that measure.
The measure would also exist if there was a rollup path along rollup-able
dimensions. (In your world, there are no rollup-able dimensions, but in my
more general world they exist and are very useful.)
In other cases it would be EMPTY.
By the way, storing measures in agg tables breaks the principle of data
independence <http://en.wikipedia.org/wiki/Data_independence> , so is not
recommended unless you have special requirements. If you store measures in
agg tables, the agg tables have to be set in stone, and you would not be
able to use an automated agg table designer.

Actually there's a case of when the fact table IS the lowest level of
aggregation already, but I don't think Mondrian deals with that (Count will
count the rows in fact tables but sum the count measure in aggregate tables
for example).   

I don't see the problem. There is no universal 'count' measure in MDX. The
only 'official' use for a count measure is when computing an average, and
AVG(x) always uses the local count from the same cube as x.
The metamodel is flexible, so you can roll your own count measure in
whatever way works for you. In a virtual cube, you can have several count
measures from several constituent cubes, and they will give different
answers. You're free to create a FACT_COUNT column in an agg table and
populate it with ones, or create a count measure in a fact table which is a
sum of a column.

However I think it is a good idea, if orthogonal.  The only disagreement I
have is the semi-additive measure system you've got.  MDX should be used to
define these sorts of things.  It's great if there is a mechanism to place
these cells into the cell cache (calculated cells aren't right now I think)
but don't use "last" like you have.  Once again all this is I think
orthogonal to my needs.

I'd like to hear from people who are using semi-additive measures (SAMs).
Are they doing it successfully using calculated members? I have a feeling
that it would be better if there were mechanisms to define SAMs as both
stored measures, and as calculated measures.
Improved caching would help calc measures - see my post/email about a week
ago on improved statement-level caching - but they are never going to be as
efficient as stored measures. Stored measures can be included in agg tables,
and we can let SQL do the heavy lifting of multiple layers of aggregation
(e.g. first sum along dimension x, then find the last value along the time
dimension, then sum along dimensions y and z). Modern SQL includes LAST as
an agg function, so we should leverage this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20071123/9e66869b/attachment.html 

More information about the Mondrian mailing list