[Mondrian] Many to Many Dimensions in Mondrian

Will Gorman wgorman at pentaho.com
Tue Dec 17 00:53:34 EST 2013


Hi folks,

I am beginning work on implementing Many to Many Dimension relationships for a customer (MONDRIAN-1572).  I've done an initial prototype that is functioning and would like to start a conversation with the Mondrian developer community on implementing this feature for production use.  Here's what I've done so far:


-        Based on the Mondrian 3 codeline, I've implemented a set of Mondrian MDX tests that demonstrate the correct behaviors of Many to Many functionality in a basic use case

-        For this prototype I used annotations to represent the metadata

-        The only class I needed to modify so far is AggregateFunDef.java - I added logic that would swap out a many to many dimension (in the test cases I used "Customer") with its bridge dimension (in the test case I used "Account").  This in turn will allow the query to calculate against the correct parent context vs. the child context which would include duplicates.

-        At the moment this functionality takes advantage of the native non-empty implementation, if this is disabled the many to many functionality in AggregateFunDef fails to behave correctly.

Here is a link to the prototype implementation for more details: https://github.com/wgorman/mondrian/commit/802ed27b2a6484c111e5ef9ab7bad27d82e50fb3

Now that I have a prototype, I wanted to get input from the developer community on next steps.  First I'd like to propose some schema enhancements to both Mondrian 3.x and 4.0 for this capability:

For Mondrian 4.0, there is a place in the MondrianSchema.xml commented out:

    <!--
    // TODO: Define ManyToManyDimensionLink extends DimensionLink.
    // "The dimension table is joined to
    // an intermediate fact table. The intermediate fact table is joined, in
    // turn, to an intermediate dimension table to which the fact table is
    // joined".
    -->

Defining a ManyToManyDimensionLink this way makes sense to me as well.

In Mondrian 3.x, my recommendation would be to keep it simple and add a new attribute "m2mCube" to DimensionUsage.  This would reference the many to many bridge cube for the dimension vs. providing a foreign key.  Either that or we could add a new Usage element like "ManyToManyDimensionUsage", I'm up for suggestions here.


Second, I wanted to hear what folks think about the approach I took within AggregateFunDef, or if there is a different way that folks propose thinking about many to many dimensions that I may not have thought of before.  One proposal from Matt Campbell is to implement this at the MDX layer, here are his thoughts:



The expression I used w/ my many-to-many test looked like this.  It checks for specific aggregation levels and determines the appropriate rollup based on that context.



          iif ([Dim Customer].CurrentMember IS [Dim Customer].[All Dim Customers],

               [Measures].[BASE_Amount],

               iif ([Dim Account].CurrentMember IS [Dim Account].[All Dim Accounts],

                  Aggregate(Filter([Dim Account].[Account].members,

                           ([Dim Date].[All Dim Dates], [Measures].[Factless Account Customer Count]) > 0),

                           ValidMeasure( Measures.[BASE_Amount] )),

                  iif ( ([Dim Date].[All Dim Dates], [Measures].[Factless Account Customer Count]) > 0,

                         ([Dim Customer].[All Dim Customers],Measures.[BASE_Amount]),

                        null)))

I really appreciate all feedback.  For folks not familiar with Many to Many Dimensions, here is a great document I recommend for some background:
http://www.sqlbi.com/wp-content/uploads/The_Many-to-Many_Revolution_2.0.pdf

Thanks!

Will

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20131216/33e66086/attachment.html 


More information about the Mondrian mailing list