[Mondrian] virtual cubes, non-joining dims, performance

Matt Campbell mcampbell at pentaho.com
Fri Aug 19 11:37:35 EDT 2016


Hey Jeff-

The purpose of the UNION query with native evaluation of virtual cubes is to make sure that all of the tuples with NON EMPTY data from any of the relevant base cubes are included.  So if a query crossjoins Products and Stores from the [Warehouse and Sales] cube, native eval needs to query both the sales and warehouse fact tables in order to get the full set of NON EMPTY tuples.  It does that with queries to each fact table UNION’d together (that logic has been in place for years).

Prior to the changes for 1599/2280, SqlTupleReader would retrieve those base cubes relevant to the measures in the query, and then pare the list of cubes down to only those that join to all the dimensions in the crossjoin.  If any measures in the query might change context, however, that approach could give incorrect results.  There are some examples in the 2 jira cases.

The changes for 1599/2280 introduced grouping of native tuple queries, so we now retrieve tuples grouped by common joining base cubes.  STR only does this if one or more measures are determined to shift context (with ValidMeasure or otherwise).  But this does mean that we’re now more likely to execute UNION queries, since we’re evaluating in groups that share common cubes.

I’m definitely interested in hearing thoughts on improving performance in this scenario.  The trick is assuring correctness, which requires knowing which tuples have non-empty data across all relevant fact tables, without executing more expensive SQL than necessary.

-matt

From: <mondrian-bounces at pentaho.org<mailto:mondrian-bounces at pentaho.org>> on behalf of "jeff.s.wright at truvenhealth.com<mailto:jeff.s.wright at truvenhealth.com>" <jeff.s.wright at truvenhealth.com<mailto:jeff.s.wright at truvenhealth.com>>
Reply-To: Mondrian list <mondrian at pentaho.org<mailto:mondrian at pentaho.org>>
Date: Friday, August 19, 2016 at 10:26 AM
To: Mondrian list <mondrian at pentaho.org<mailto:mondrian at pentaho.org>>
Subject: [Mondrian] virtual cubes, non-joining dims, performance

I think this is primarily to Matt, but sending to mailing list in case there’s other interest or input…

We’ve been testing some new-ish code in an effort to get correct behavior of non-joining dimensions, and there seems to be a somewhat steep performance cost to the fix. What we’re seeing is that recent versions of SqlTupleReader generate an expensive UNION query across the fact tables in a virtual cube.

I haven’t carefully sorted through the code our team has tested, but looking at the revision history on github, I suspect we’re testing with the checkin for issue #1, but not issue #2:

#1 - http://jira.pentaho.com/browse/MONDRIAN-1599 - Some queries against Virtual cubes can give wrong results when dimensions don't fully join
#2 - http://jira.pentaho.com/browse/MONDRIAN-2280 - ValidMeasure not taken into account when Analyzer uses NonEmptyCrossjoin as it returns no data

Is the expensive UNION query something you’ve seen as well?

--jeff
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20160819/f91f2d13/attachment-0001.html 


More information about the Mondrian mailing list