[Mondrian] Visualizing expression evaluation / optimizing calculations
Eric McDermid
mcdermid at stonecreek.com
Tue Feb 3 18:29:07 EST 2009
I'm working on optimizing the calc-intensive portions of some queries,
and am encountering difficulty in understanding exactly what's going
on in terms of expression evaluation.
First, a greatly simplified but concrete example. Given a cube that
looks something like this:
<Cube name="My_Cube">
...
<Table name="my_fact"/>
<Measure aggregator="sum" column="column_a" formatString="####.###"
name="A" type="Numeric"/>
<Measure aggregator="sum" column="column_b" formatString="####.###"
name="B" type="Numeric"/>
<CalculatedMember name="C" dimension="Measures" visible="true"
type="Numeric">
<Formula>IIF([Measures].[A] = 0, 0, 100 * ([Measures].[B]/[Measures].
[A]))</Formula>
<CalculatedMemberProperty name="FORMAT_STRING" value="###.###" />
</CalculatedMember>
...
</Cube>
Let's say I have a query that looks something like this:
WITH
MEMBER [Measures].[id]
AS [XX].CurrentMember.Name
SET [Available]
AS FILTER([XX].[Foo].Members, CAST([XX].[Foo].CurrentMember.Key
AS NUMERIC) > 0 )
SET [Requested]
AS FILTER([Available], 1=1 AND ([Measures].[C] < 5) AND
([Measures].[B] > 500))
MEMBER [Measures].[total_available_count]
AS Format(COUNT([Available]), "#####")
MEMBER [Measures].[total_result_count]
AS Format(COUNT([Requested]), "#####")
MEMBER [KIs].[Aggregated]
AS Aggregate([Requested])
SELECT
{
[Measures].[id],
[Measures].[total_result_count],
[Measures].[total_available_count],
[Measures].[C],
[Measures].[B],
[Measures].[A]
} ON COLUMNS,
{
[KIs].[Aggregated],
[KIs].[Content],
[KIs].[Unattributed],
TAIL(
HEAD(Order([Requested], [Measures].[B], BDESC), 20),
IIF(20 > COUNT(Order([Requested], [Measures].[B], BDESC)),
COUNT(Order([Requested], [Measures].[B], BDESC)) , 20) -0
)
} ON ROWS
FROM My_Cube WHERE...
I've noticed that swapping [Maasures].[C] < 5 and [Measures].[B] > 500
in the SET [Requested] filter condition improves performance
dramatically in most cases, though depending on the exact values
chosen to compare against it can turn out to be a wash.
So, my questions fall into two categories:
First, is it reasonable to assume that the performance gain is a
result of simply specifying the most restrictive subcondition in the
filter first? If so, is there a more appropriate way to help Mondrian
figure out that certain filter conditions are more restrictive, or is
order of conditions the only option? Alternately, are there any
generalized rules of thumb to follow, such as ensuring that conditions
involving aggregate measures come before conditions involving
calculated members?
Second, and more important in the "teach a man to fish" sense, what's
the best way to get a sense for how this sort of thing breaks down
internally for a given query? I don't see any debug log output that
seems appropriate, nor have I found quite the right place to drop in a
breakpoint (I'm still looking, it's just taking forever).
All of this relates to Mondrian 2.4 running against MySql a the
moment, but we do hope to get to 3.04 (or better, 3.1) soon.
-- Eric
More information about the Mondrian
mailing list