[Mondrian] performance/memory impact of slicers on segment load

Wright, Jeff jeff.s.wright at truvenhealth.com
Thu Aug 4 08:02:03 EDT 2016


I'm looking for a way to improve the performance and memory use where there are complex filters in the slicer.

Mondrian's default behavior is to "expand" the slicer. The dimensions referenced in the slicer are incorporated into the GROUP BY of the segment load sql, and Mondrian rolls up to the aggregation level of the MDX in memory. This behavior is a problem if:

* The cardinality of the slicer constraints is high. It's not hard to get to high cardinality (segmentLoad > 100,000 rows) if a user picks 4 or 5 dimensions to filter on, and the query already has some cardinality due to the other axes.

* If the MDX has TopCount(), it will never be evaluated natively for the segment load (because the segment load isn't at the aggregation level of the TopCount()).

I've noticed that the default evaluation approach I've just described doesn't happen for a distinct count measure (because Mondrian can't roll up results in memory). The difference in the logic seems to be in AggregateFunDef.AggregateCalc.aggregate(), where there are code branches to deal with distinct count (and average), vs other aggregation methods.

Has anybody else thought about this problem? A brute force approach would be to add a property to force the distinct count branch in AggregateFunDef.

Sample food mart queries with very simple slicers...

-- 1 filter, additive measure, segment load returns 93 rows of data for 3 rows of output
WITH SET [Time Filter] AS '[Time.Weekly].[Week].[10] : [Time.Weekly].[Week].[40]'
SELECT
NON EMPTY [Measures].[Unit Sales] ON 0,
NON EMPTY [Customers].[State Province].members on 1
FROM Sales
WHERE [Time Filter]

-- 1 Filter, Distinct Count measure, segment load returns 3 rows
WITH SET [Time Filter] AS
  '[Time.Weekly].[Week].[10] : [Time.Weekly].[Week].[40]'
SELECT
NON EMPTY {[Measures].[Customer Count]} ON 0,
NON EMPTY [Customers].[State Province].members on 1
FROM Sales
WHERE [Time Filter]

-- TopCount, additive measure, segment load returns 119 rows
WITH SET [Time Filter] AS '[Time.Weekly].[Week].[10] : [Time.Weekly].[Week].[40]'
SELECT
NON EMPTY [Measures].[Unit Sales] ON 0,
NON EMPTY TopCount([Store].[Store Name].members, 5, [Measures].[Unit Sales]) on 1
FROM Sales
WHERE [Time Filter]

-- TopCount, Distinct count measure, segment load returns 5 rows
WITH SET [Time Filter] AS '[Time.Weekly].[Week].[10] : [Time.Weekly].[Week].[40]'
SELECT
NON EMPTY [Measures].[Customer Count] ON 0,
NON EMPTY TopCount([Store].[Store Name].members, 5, [Measures].[Customer Count]) on 1
FROM Sales
WHERE [Time Filter]

--jeff
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20160804/18d40eb0/attachment.html 


More information about the Mondrian mailing list