[Mondrian] RE: Nativize set

Julian Hyde jhyde at pentaho.com
Thu Oct 1 19:32:44 EDT 2009


 



 Matt Campbell wrote: 

 

 [We talked about] the ability to combine the axes in order to get the best
possible benefit from native evaluation.  This would help in cases where you
have a very large dimension on rows, for example, and a constraint on
columns which will reduce that set significantly.   I think when we last
talked about this you had thought the mechanics of combining all axes in one
pass and then splitting them apart would be fairly straightforward.  Do you
have any specific thoughts about how that would be done?  We've been
thinking about it ourselves and think there could be some complications
around dealing with empty rows and columns. 

I'm only confident that it works when both axes have a NON EMPTY constraint.
In that case there won't be any empty rows or columns. (In the language of
SSAS, combining the axes creates an EXISTS constraint.)

 

At least one of the axes will need to be re-sorted after the result has been
decomposed, since you can only sort by one set of criteria at a time.

 

Even if there is no explicit Order function, some kind of sort still needs
to be applied, because MDX expressions are inherently sorted. (Unless they
explicitly include UnOrder.)
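
To make the decompose-and-re-sort step concrete, here is a rough sketch in Python (not Mondrian code). It assumes the combined native evaluation hands back (column member, row member) pairs for the non-empty cells, and that the natural order of each hierarchy is available as a plain list. The name split_axes and both of those inputs are made up for illustration.

```python
def split_axes(combined, col_order, row_order):
    """Split non-empty (column, row) pairs into two separately sorted axes.

    combined  -- (col_member, row_member) pairs, in whatever order the
                 combined evaluation returned them (assumed input shape)
    col_order -- column members in natural hierarchy order (assumed input)
    row_order -- row members in natural hierarchy order (assumed input)
    """
    pairs = list(combined)

    # Rank of each member in its hierarchy's natural order.
    col_rank = {member: i for i, member in enumerate(col_order)}
    row_rank = {member: i for i, member in enumerate(row_order)}

    # Decompose: project each side of the pairs and drop duplicates.
    cols = {col for col, _ in pairs}
    rows = {row for _, row in pairs}

    # Re-sort: the combined result can arrive in at most one useful order,
    # so here both axes are simply sorted again by natural rank.
    return (sorted(cols, key=col_rank.__getitem__),
            sorted(rows, key=row_rank.__getitem__))
```

If the combined result already arrives sorted by one axis, the sort for that axis could be skipped; the sketch sorts both to stay simple.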

 

Another approach would be to combine the axes into a crossjoin to compute a
set of 'non empty cells' (say in a hash table). Then re-evaluate one or both
axes from scratch, and probe into the set of cells. This approach works best
with axes that are small to medium-sized. For example, if you were
evaluating

 

SELECT NON EMPTY {Gender.Members}
         * {Measures.[Unit Sales], Measures.[Store Sales]} ON 0,
       NON EMPTY [Customer].Members ON 1
FROM [Sales]

 

you would first find all non-empty cells, sorted by the natural order of the
customers hierarchy, compute axis #0 (6 positions), project out customers to
create axis #1, and as you do that, keep track of which positions on axis #0
have a cell under them.
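
A minimal Python sketch of that single pass, under some assumptions: the non-empty cells arrive as (customer, axis-0 tuple) pairs already sorted by customer natural order, and axis #0 has already been computed as a list of tuples. The name project_axes and the input shapes are invented for illustration and are not Mondrian APIs.

```python
def project_axes(axis0, cells):
    """Build axis #1 and find the occupied axis #0 positions in one pass.

    axis0 -- axis #0 tuples in natural order, computed up front
    cells -- (customer, axis0_tuple) pairs for the non-empty cells,
             assumed to be sorted by customer natural order
    """
    position = {tup: i for i, tup in enumerate(axis0)}
    has_cell = [False] * len(axis0)
    axis1 = []

    for customer, col in cells:
        i = position.get(col)
        if i is None:
            continue  # cell does not fall on axis #0; ignore it
        has_cell[i] = True
        # Cells are sorted by customer, so a new customer always shows up
        # as a change from the last one appended.
        if not axis1 or axis1[-1] != customer:
            axis1.append(customer)

    # Axis #0 keeps its natural order, so no re-sort is needed; empty
    # positions are only filtered out (skip this if axis 0 is not NON EMPTY).
    non_empty_axis0 = [tup for tup, seen in zip(axis0, has_cell) if seen]
    return non_empty_axis0, axis1
```

For the query above, axis0 would hold the 6 (gender, measure) tuples, and has_cell plays the role of "which positions on axis #0 have a cell under them".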

 

This approach seems to be better because you compute customers and cells in
only one pass (rather than the current two passes), and you don't have to
re-sort axis #0 into natural order. Also, it can be applied to a wider set
of queries: e.g. if they didn't specify 'NON EMPTY' on axis 0.

 

Now that I've sketched out a couple of algorithms, can you give me some
queries you don't think the algorithms could handle, or could not handle
efficiently?

 

I've taken the liberty of Cc-ing mondrian-dev.

 

Julian
