[Mondrian] Shared degenerate dimensions

Wright, Jeff jeff.s.wright at truvenhealth.com
Tue Oct 21 11:55:16 EDT 2014


Thinking out loud, wondering what you lose when you use degenerate dimensions instead of a standard star schema...


-          For M3 I think you lose the ability to participate in a multi-level hierarchy, not sure if that holds for M4.

-          You lose the ability to specify order via a different column than values. And maybe you don't really lose that, but it doesn't seem typical for a degenerate dimension to have >1 column associated to it.

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Wright, Jeff
Sent: Monday, October 20, 2014 12:56 PM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] Shared degenerate dimensions

Very interested in shared degenerate dimensions. Besides the BFT use case, I've seen examples in a conventional star schema, where users wanted to treat a degenerate dimension that occurred on >1 fact table as the same dimension.

I've seen both these cases:

1.       At least one fact table can be treated as "complete", listing all members of the shared dimension.

2.       The set of members is a union across all fact tables.

It would be nice if the mondrian schema could define that the members of a degenerate dimension could be obtained from:

-          Distinct values in the fact table

-          Distinct values across multiple fact tables

-          Lookup from a table of distinct values created by ETL

That last option could even help performance of member queries for non-shared degenerate dimensions.

--jeff

From: mondrian-bounces at pentaho.org<mailto:mondrian-bounces at pentaho.org> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Matt Campbell
Sent: Monday, October 20, 2014 12:16 PM
To: Mondrian developer mailing list
Subject: [Mondrian] Shared degenerate dimensions


Data models with "Big Flat Tables" instead of star schemas seem to be getting more common, driven in part by columnar databases where use of separate dimension tables has little or no performance/storage benefit.  Similarly, with document stores like mongodb it often makes sense to store dimensional data together with facts in the same collection.  Use of the Thin Kettle JDBC with Mondrian also results in BFT models, since Thin Kettle can only expose a single table with no ability to join.

This data model is far simpler, but loses important power and flexibility.  Dimensions in cubes built on a BFT are degenerate and are un-sharable in Mondrian.  But even with BFTs users will still want to combine data with overlapping but non-identical dimensionalities.  I.e. they still have motivations to use Virtual Cubes (or multiple measure groups in M4).  I'm wondering what options might be worth considering to support this use case.  Should there be a new dimension link type to allow sharing degenerate dimensions?  The idea of a shared degen dimension feels a bit like a CopyLink to me.  It also seems like there would still need to be a single "official" source for the full set of members (level.members needs to return a consistent set).

Curious if others have thoughts.



-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20141021/d834956f/attachment-0001.html 


More information about the Mondrian mailing list