[Mondrian] RE: Mondrian Bug 1961163 NON EMPTY and Filter() breaking aggregate calculations

Julian Hyde jhyde at pentaho.com
Wed Oct 15 13:52:48 EDT 2008

Will do. Many thanks for the fix!


From: Andreas_Voss at tonbeller.com [mailto:Andreas_Voss at tonbeller.com] 
Sent: Wednesday, October 15, 2008 9:35 AM
To: jhyde at pentaho.com; Mondrian developer mailing list
Subject: Mondrian Bug 1961163 NON EMPTY and Filter() breaking aggregate

Hi all, 

I have analyzed and fixed that bug. It turned out that I'd consider this a
'heavy' bug because it has the potential to return invalid results for many
NON EMPTY queries. Because I did not manage to upload a file to the bug
tracker :-( I send the patches here attached. Description of the fix
follows. Would be glad if someone (Juilan?) could review this. 



This bug consists of three bugs (I think). I have attached a patch that
fixes these bugs (hopefully). The patch is against a mondrian 2.x version
(probably the latest JDK-1.4 version). Together with the description below
it should be easy to apply the fix against the head version of Mondrian. 

Description of the fixes: 

1. SqlContextConstraint#isValidContext() 

We can only restrict the result if we are in NON EMPTY mode. So I added 

        // we can restrict the result in NON EMPTY mode only 
        if (!context.isNonEmpty()) { 
            return false; 

2. RolapEvaluator#getExpResultCacheKey() 

The RolapEvaluator caches results that have been computed for an expression.
The cache key consideres the members of those dimensions that the expression
depends on. In NON EMPTY mode however, the result may depend on any
dimension. So I added: 

     * Creates a key which uniquely identifes an expression and its 
     * context. The context includes members of dimensions which the 
     * expression is dependent upon. 
    private Object getExpResultCacheKey(ExpCacheDescriptor descriptor) { 
        List key = new ArrayList(); 
        // in NON EMPTY mode the result depends on everything, e.g. 
        // "NON EMPTY [Customer].[Name].members" may return different
        // for 1997-01 and 1997-02 
        if (nonEmpty) { 
            for (int i = 0; i < currentMembers.length; i++) 
            return key; 


3. check all invocations of ListCalc#evaluateList() 

This is the tricky one. The problem is that for a NON EMPTY axis expression
the expression and all of its sub expressions are evaluated in NON EMPTY
mode. Thats wrong. Lets look at the query excerpt 

NON EMPTY Filter([Store].[Store Name].Members, ([Measures].[AvgRevenue] <
[Measures].[Store Sales])) 

Here the Filter() is evaluated in NON EMPTY context. Its OK to evaluate the
set expression [Store].[Store Name].Members in NON EMPTY mode too, because
the empty stores are removed from the result later anyway. But its not ok to
evaluate the condition in NON EMPTY mode. 

During evaluation the Filter() function places one Store after the other
into the context and evaluates the condition. Here the condition contains
the expression 

Avg([Store].[Store Name].Members, [Measures].[Store Sales]) 

If here the [Store].[Store Name].Members argument is evaluated in NON EMPTY
mode and there is a particular store in the context, then the sql optimizer
will return only that store - no need to return them all. Thats wrong in the
context of AVG, we want to compute the average over all stores, so the
Filter() has to cancel the NON EMPTY mode when evaluating the condition. 

In general, the NON EMPTY mode has to be cancelled when certain sub
expressions are evaluated. I found it difficult to decide for which sub
expressions the NON EMPTY mode has to be cancelled. A rule of thumb may be: 

a) when a function returns a set, then its input set argument may be
evaluated in NON EMPTY MODE (e.g. Filter, TopCount, CrossJoin) 
b) all expressions, that do not return a set may not evaluate their set
arguments  in NON EMPTY mode (e.g. Aggregate, Avg) 

c) non-set evaluation does not care (e.g. boolean, number, member, tuple) 

But there are exceptions, e.g. Tail(<Set>). Here the set argument has to be
evaluated with NON EMPTY cancelled - (I think). 

I added to RolapEvaluator 

    public Evaluator push(boolean nonEmpty) { 
        final RolapEvaluator evaluator = _push(); 
        return evaluator; 

Then I checked all invocations of ListCalc#evaluateList(). For example in
AvgFunDef I changed 

            public Object evaluate(Evaluator evaluator) { 
                List memberList = listCalc.evaluateList(evaluator); 
                return avg(evaluator.push(), memberList, calc); 


            public Object evaluate(Evaluator evaluator) { 
                List memberList =
                return avg(evaluator.push(false), memberList, calc); 

so the AVG function always evaluates its arguments with NON EMPTY cancelled.


Andreas Voss
Tel: +49 6251 7000 - 354
Fax: +49 6251 7000 - 140
 <mailto:av at tonbeller.com> av at tonbeller.com 

Werner-von-Siemens-Str. 2
D-64625 Bensheim 


Register Court: District Court Darmstadt
Registration: HRB 21474
Managing Board: Rutger Hetzler (CEO), Sebastian Hetzler, Torsten Mayer
Chairman of the Supervisory Board: Rüdiger Brand


This message is for the designated recipient only and may contain
privileged, proprietary, or otherwise private information. If you have
received it in error, please notify the sender immediately and delete the
original. Any unauthorised copying or dissemination of this message is
strictly prohibited.

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese E-Mail. Das unerlaubte Kopieren sowie die Weitergabe
dieser E-Mail ist nicht gestattet.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20081015/c0a03fc9/attachment.html 

More information about the Mondrian mailing list