[Mondrian] Many to Many Dimensions in Mondrian
Will Gorman
wgorman at pentaho.com
Tue Dec 17 00:53:34 EST 2013
Hi folks,
I am beginning work on implementing Many to Many Dimension relationships for a customer (MONDRIAN-1572). I've done an initial prototype that is functioning and would like to start a conversation with the Mondrian developer community on implementing this feature for production use. Here's what I've done so far:
- Based on the Mondrian 3 codeline, I've implemented a set of Mondrian MDX tests that demonstrate the correct behaviors of Many to Many functionality in a basic use case
- For this prototype I used annotations to represent the metadata
- The only class I needed to modify so far is AggregateFunDef.java - I added logic that would swap out a many to many dimension (in the test cases I used "Customer") with its bridge dimension (in the test case I used "Account"). This in turn will allow the query to calculate against the correct parent context vs. the child context which would include duplicates.
- At the moment this functionality takes advantage of the native non-empty implementation, if this is disabled the many to many functionality in AggregateFunDef fails to behave correctly.
Here is a link to the prototype implementation for more details: https://github.com/wgorman/mondrian/commit/802ed27b2a6484c111e5ef9ab7bad27d82e50fb3
Now that I have a prototype, I wanted to get input from the developer community on next steps. First I'd like to propose some schema enhancements to both Mondrian 3.x and 4.0 for this capability:
For Mondrian 4.0, there is a place in the MondrianSchema.xml commented out:
<!--
// TODO: Define ManyToManyDimensionLink extends DimensionLink.
// "The dimension table is joined to
// an intermediate fact table. The intermediate fact table is joined, in
// turn, to an intermediate dimension table to which the fact table is
// joined".
-->
Defining a ManyToManyDimensionLink this way makes sense to me as well.
In Mondrian 3.x, my recommendation would be to keep it simple and add a new attribute "m2mCube" to DimensionUsage. This would reference the many to many bridge cube for the dimension vs. providing a foreign key. Either that or we could add a new Usage element like "ManyToManyDimensionUsage", I'm up for suggestions here.
Second, I wanted to hear what folks think about the approach I took within AggregateFunDef, or if there is a different way that folks propose thinking about many to many dimensions that I may not have thought of before. One proposal from Matt Campbell is to implement this at the MDX layer, here are his thoughts:
The expression I used w/ my many-to-many test looked like this. It checks for specific aggregation levels and determines the appropriate rollup based on that context.
iif ([Dim Customer].CurrentMember IS [Dim Customer].[All Dim Customers],
[Measures].[BASE_Amount],
iif ([Dim Account].CurrentMember IS [Dim Account].[All Dim Accounts],
Aggregate(Filter([Dim Account].[Account].members,
([Dim Date].[All Dim Dates], [Measures].[Factless Account Customer Count]) > 0),
ValidMeasure( Measures.[BASE_Amount] )),
iif ( ([Dim Date].[All Dim Dates], [Measures].[Factless Account Customer Count]) > 0,
([Dim Customer].[All Dim Customers],Measures.[BASE_Amount]),
null)))
I really appreciate all feedback. For folks not familiar with Many to Many Dimensions, here is a great document I recommend for some background:
http://www.sqlbi.com/wp-content/uploads/The_Many-to-Many_Revolution_2.0.pdf
Thanks!
Will
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20131216/33e66086/attachment.html
More information about the Mondrian
mailing list