[Mondrian] Visualizing expression evaluation / optimizing calculations

Eric McDermid mcdermid at stonecreek.com
Tue Feb 3 18:29:07 EST 2009


I'm working on optimizing the calc-intensive portions of some queries,  
and am encountering difficulty in understanding exactly what's going  
on in terms of expression evaluation.

First, a greatly simplified but concrete example.  Given a cube that  
looks something like this:

<Cube name="My_Cube">
...
           <Table name="my_fact"/>

	<Measure aggregator="sum" column="column_a" formatString="####.###"  
name="A" type="Numeric"/>
	<Measure aggregator="sum" column="column_b" formatString="####.###"  
name="B" type="Numeric"/>
	<CalculatedMember name="C" dimension="Measures" visible="true"  
type="Numeric">
		<Formula>IIF([Measures].[A] = 0, 0, 100 * ([Measures].[B]/[Measures]. 
[A]))</Formula>
		<CalculatedMemberProperty name="FORMAT_STRING" value="###.###" />
	</CalculatedMember>
...
</Cube>

Let's say I have a query that looks something like this:

WITH
MEMBER [Measures].[id]
     AS [XX].CurrentMember.Name
SET [Available]
     AS FILTER([XX].[Foo].Members, CAST([XX].[Foo].CurrentMember.Key  
AS NUMERIC) > 0 )
SET [Requested]
     AS FILTER([Available], 1=1 AND ([Measures].[C] < 5) AND  
([Measures].[B] > 500))
MEMBER [Measures].[total_available_count]
     AS Format(COUNT([Available]), "#####")
MEMBER [Measures].[total_result_count]
     AS Format(COUNT([Requested]), "#####")
MEMBER [KIs].[Aggregated]
     AS Aggregate([Requested])
SELECT
     {
         [Measures].[id],
         [Measures].[total_result_count],
         [Measures].[total_available_count],
         [Measures].[C],
         [Measures].[B],
         [Measures].[A]
     } ON COLUMNS,
     {
         [KIs].[Aggregated],
         [KIs].[Content],
         [KIs].[Unattributed],
         TAIL(
		HEAD(Order([Requested], [Measures].[B], BDESC), 20),
		IIF(20 > COUNT(Order([Requested], [Measures].[B], BDESC)),  
COUNT(Order([Requested], [Measures].[B], BDESC)) , 20) -0
	)
     } ON ROWS
FROM My_Cube WHERE...

I've noticed that swapping [Maasures].[C] < 5 and [Measures].[B] > 500  
in the SET [Requested] filter condition improves performance  
dramatically in most cases, though depending on the exact values  
chosen to compare against it can turn out to be a wash.

So, my questions fall into two categories:

First, is it reasonable to assume that the performance gain is a  
result of simply specifying the most restrictive subcondition in the  
filter first?  If so, is there a more appropriate way to help Mondrian  
figure out that certain filter conditions are more restrictive, or is  
order of conditions the only option?  Alternately, are there any  
generalized rules of thumb to follow, such as ensuring that conditions  
involving aggregate measures come before conditions involving  
calculated members?

Second, and more important in the "teach a man to fish" sense, what's  
the best way to get a sense for how this sort of thing breaks down  
internally for a given query?  I don't see any debug log output that  
seems appropriate, nor have I found quite the right place to drop in a  
breakpoint (I'm still looking, it's just taking forever).

All of this relates to Mondrian 2.4 running against MySql a the  
moment, but we do hope to get to 3.04 (or better, 3.1) soon.

  -- Eric




More information about the Mondrian mailing list