[Mondrian] RolapResult/RolapEvaluator evaluation flawed - checkin 8771

Richard Emberson remberson at edgedynamics.com
Thu Feb 22 15:50:32 EST 2007


This is with regards to checkin 8771.

For the past year or so with customer datasets there
were issues with non-empty crossjoin optimization.
I always thought it had something to do with how the
crossjoin optimization algorithm worked and I've spent
eons working on it. The problem is more fundamental -
the RolapResult/RolapEvaluator evaluation algorithm is flawed.
Any Axis evaluation that involves evaluation a Measure as
part of determining which Member to include in the Axis might
generate incorrect results if the Cube has one or more
Non-Normal hierarchies.  A Hierarchy is termed 'Non-Normal'
(a term I use) if either it has hasAll==false or if it
hasAll==true but the default Member is not the All-Member.

The new set of junit tests in the class mondrian.rolap.RolapResultTest
with the CSV (comma-separated-value) dataset tables in
RolapResultTest.csv demonstrate the issue using the "filter'
function in a query.
In the dataset there are two Dimension tables, D1 and D2,
and two FactTables, FT1 adn FT2.
Dimension 1, D1, has 4 Members of the first Level:
'a', 'b', 'c' and 'd'.
Dimension 2, D2, has 4 Members of the first Level:
'x', 'y', 'z' and 'w'.
In FactTable 1 (FT1) there are values for:
D1=='a' and D2=='x';
D1=='b' and D2=='y'; and
D1=='c' and D2=='z'.
In FactTable 2 (FT2) there are values for:
D1=='a' and D2=='x';
D1=='b' and D2=='y';
D1=='c' and D2=='z';
D1=='d' and any member of D2 and
any member of D1 and D2=='w'.
These relationships can be seen in the RolapResultTest.csv
dataset definition file.

There are three Cubes defined in RolapResultTest.java.
Each Cube defines two private Dimensions, D1 and D2 (the
same names as their tables), and a single Measure with
name "Value" and column 'value'.
The Dimensions each have a single Hierarchy with a
single Level. Between the three Cubes, the first
difference is how the Dimensions' hierarchies are
defined: hasAll either true or false and what
is the defaultMember. The other difference is
what FactTable is used.

The first Cube is called FTAll and uses FactTable 1 (FT1) and
hasAll==true with no defaultMember specified (so the
defaultMember is the All-Member) for both Dimensions.

The second Cube is FT1 which uses FactTable 1 (FT1)
and hasAll==false where for Hierarchy D1 the
defaultMember is '[D1].[d]' and
for Hierarchy D2 the defaultMember is '[D2].[w]'.
Note that for FT1, both these defaultMember have
no [Measure].[Value] values - there is no data.

The last Cube is FT2 and uses FactTable 2 (FT2) with
hasAll==false for both hierarchies and
for Hierarchy D1 the defaultMember is '[D1].[d]' and
for Hierarchy D2 the defaultMember is '[D2].[w]'.
In this case, both defaultMember's have
[Measure].[Value] values for all of the Members of the
other Hierarchy.

Really, these are very simple tables and cubes definitions.

The test query is:

select
     filter({[D1].[a],[D1].[b],[D1].[c]}, [Measures].[Value] > 0)
     ON COLUMNS,
     {[D2].[x],[D2].[y],[D2].[z]}
     ON ROWS
from <CUBE_NAME>

where <CUBE_NAME> is replaced with FTAll, FT1 and FT2.
For all cubes one expects that the result should be:

   Axis #0:
   {}
   Axis #1:
   {[D1].[a]}
   {[D1].[b]}
   {[D1].[c]}
   Axis #2:
   {[D2].[x]}
   {[D2].[y]}
   {[D2].[z]}
   Row #0: 5
   Row #0:
   Row #0:
   Row #1:
   Row #1: 10
   Row #1:
   Row #2:
   Row #2:
   Row #2: 15

(where for the FTAll cube the Member names have an additional
all member, e.g., instead of "[D1].[a]" one has "[D1].[All D1s].[a]").

But it is not so, the FT1 cube yields:

   Axis #0:
   {}
   Axis #1:
   Axis #2:
   {[D2].[x]}
   {[D2].[y]}
   {[D2].[z]}

Why?

To understand the flaw, one must understand how the
RolapResult/RolapEvaluator evaluates. When the RolapEvaluator is
created it creates an array of all Members associated with the
Cube - this is the evaluation Context (which is a single cell or a
block of cells defined by the Members of the array). These Members
are the 'default' Members of each of the Cube's Hierarchies.  The
RolapResult evaluation algorithm then evaluates each axis one at
a time (starting with the slicer if it exists), until all data has
been loaded - until all Members for that axis has been determined.
How does it evaluate a Measure?
It takes the current evaluation Context and see if
there is any data in the current cell/block-of-cells.

If you have a Time Dimension where the default member is, say,
2010 and there is no data for that year in your dataset, then
you will never get anything back even if on one axis you
explicitly use different year for which there is data.

For the FTAll cube, both Dimensions have All-member which
are the default Members.  These All-Members are then
the ones used to determine if a particular Member has
data. For example, in the above MDX, the D1 Dimension
member [D1].[a] is evaluate in the Context where the
D2 member is the All-Member. For all D2 there is data,
so [D1].[a] evaluates to have data and as a Member it
is part of the generated columns axis.

For the FT2 cube, where the Dimensions have hasAll==false,
but where the default Members have data for all Members
of the other Dimension. In this case, the MDX yields the
same as for FTAll - data is found, for example, for the
[D1].[a], [D2].[x] cell.

Lastly, for the FT1 where the Dimensions have hasAll==false
but the default Members have no data. In this case,
member [D1].[a] is evaluate in the Context of the [D2].[w]
member for which there is no data. Thus, none of the
D1 Members are added to the columns axis.
This is wrong - the flaw.
It should evaluate just like the other.

If you care if you customers get the correct data, what
should you do?

In the short term, with the current Mondrian build, make all
your Hierarchies 'Normal' (hasAll==true and the default Member
is the All-Member).

How to fix Mondrian?
One option might be, if a Hierarchy
is 'Non-Normal', then, under the covers, Mondrian make it
'Normal' by creating an implicit All-Member which is
also the default Member. This implicit All-Member is used
only for the Mondrian's evaluation phase.
I have not tried this nor do not know if it is the correct
approach. It might generate false positives, a Member
is include because there is data when evaluated with the
other Hierarchies' All-Member, but there might not
actually be data for those Members of the other Hierarchies
that are explicitly named in other axes.

Best might be to always evaluate in the Context of the Members
that are explicitly named in the other axes. But what are
they? What order might they be applied? How can you know them
before the other axes are themselves evaluated?
So, evaluate first with all Normal Hierarchies All-Members
(including any implicit generated). Then, having determined what
Members appear in which axes, somehow use them to re-evaluate??
Somehow, one must evaluate the axes all at the same time or some
facsimile of that action.

Richard

-- 
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.



More information about the Mondrian mailing list