[Mondrian] Unexpected behaviour while using a TopCount in a Named Set

Tue Jul 10 05:32:44 EDT 2012

Hi again,

I have been looking at olap4j logs and analysing the SQL queries which 
are being generated, and found the following query which is a bit 
suspicious, since it seems that its selecting only the customers for 
[Time].[1997].[Q4].[10].

SELECT `time_by_day`.`the_year` AS `c0`,
        `time_by_day`.`quarter` AS `c1`,
        `time_by_day`.`month_of_year` AS `c2`,
        `customer`.`customer_id` AS `c3`,
         sum(`sales_fact_1997`.`unit_sales`) AS `m0`
FROM `time_by_day` AS `time_by_day`,
      `sales_fact_1997` AS `sales_fact_1997`,
      `customer` AS `customer`
WHERE `sales_fact_1997`.`time_id` = `time_by_day`.`time_id`
   AND `time_by_day`.`the_year` = 1997
   AND `time_by_day`.`quarter` = 'Q4'
   AND `time_by_day`.`month_of_year` = 10
   AND `sales_fact_1997`.`customer_id` = `customer`.`customer_id`
GROUP BY `time_by_day`.`the_year`,
          `time_by_day`.`quarter`,
          `time_by_day`.`month_of_year`,
          `customer`.`customer_id`

According to the log, this query is used to evaluate the Named set Top 
Count, and evaluates to this, which are the top 5 customers for 
[Time].[1997].[Q4].[10]

10:02:51,571 DEBUG [ResultBase] mondrian.rolap.RolapNamedSetEvaluator at 38e43ba7: Named set Top Count evaluated to:
[Customers].[USA].[WA].[Olympia].[Patricia Gervasi]
[Customers].[USA].[WA].[Bremerton].[Richard Vansdal]
[Customers].[USA].[WA].[Puyallup].[Tawnya Ancheta]
[Customers].[USA].[WA].[Spokane].[Edna Woodson]
[Customers].[USA].[WA].[Spokane].[Ida Rodriguez]

Later, on other SQL query, the IDs of these customers are used to make 
the final result. Thus, if the intermediate results are adulterated, 
since they only consider [Time].[1997].[Q4].[10], the final results will 
not be correct.

Is this what is generating wrong results?

Cheers,
Pedro

On 07/09/2012 06:28 PM, Pedro Salgueiro wrote:
> We have just tried with mondrian 3.4.5 and 3.4.0-SNAPSHOT, and the 
> results are the same.
> About the 'NON EMPTY', we removed it from both queries and the 
> behaviour is the same.
>
> We are  open to more suggestions.
>
> Pedro
>
> On 07/09/2012 06:13 PM, Julian Hyde wrote:
>> I don't know whether it would make any difference with your 
>> particular data set, but for the record, the queries are not 
>> equivalent. According to the MDX spec, named sets are evaluated after 
>> the slicer axis but before other axes, therefore the second query 
>> doesn't inherit the 'NON EMPTY'.
>>
>> It's worth trying the latest 3.4 code, as Pedro suggests.
>>
>> Julian
>>
>>
>> On Jul 9, 2012, at 9:56 AM, Pedro Alves wrote:
>>
>>> Did you try the latest code in 3.4 branch? We fixed some issues with
>>> top count very recently
>>>
>>> On Mon 09 Jul 2012 05:54:26 PM WEST, Pedro Salgueiro wrote:
>>>> Hi!
>>>>
>>>> I am using a Named Set with a TopCount, which is then used in a Select
>>>> which has a set of time members in the WHERE clause.  For some reason,
>>>> I am getting an unexpected behaviour: The TopCount statement doesn't
>>>> seems to be affected by the WHERE clause. I tried to replicate the
>>>> situation while using the Foodmart and managed to get the same
>>>> behaviour, so I guess that we can exclude any problem with the cube we
>>>> are using.
>>>>
>>>> To better explain the problem, this is the same query, but not using
>>>> the set:
>>>>
>>>>     SELECT
>>>>     NON EMPTY TopCount([Customers].[Name].Members, 5, 
>>>> [Measures].[Unit Sales]) * [Measures].[Unit Sales] on 0
>>>>     FROM [Sales]
>>>>     WHERE [Time].[1997].[Q1].[1]:[Time].[1997].[Q4].[10]
>>>>
>>>> This query produces the following results:
>>>>
>>>>     USA
>>>>     WA
>>>>     Spokane
>>>>     Mary Francis Benigar James Horvat Matt Bellah Ida Rodriguez 
>>>> Kristin Miller
>>>>     Unit Sales           Unit Sales   Unit Sales  Unit Sales    
>>>> Unit Sales
>>>>     ==================== ============ =========== ============= 
>>>> ==============
>>>>                       422          369         363 344            323
>>>>
>>>>
>>>> This query is equivalent to the following one, which moves the
>>>> TopCount to the set [TopCount]:
>>>>
>>>>     WITH
>>>>     SET [Top Count]
>>>>     AS { TopCount([Customers].[Name].Members, 5, [Measures].[Unit 
>>>> Sales]) }
>>>>
>>>>     SELECT
>>>>     NON EMPTY [Top Count] * [Measures].[Unit Sales] on 0
>>>>     FROM [Sales]
>>>>     WHERE [Time].[1997].[Q1].[1]:[Time].[1997].[Q4].[10]
>>>>
>>>>
>>>> Although this query is equivalent to the first one, it produces
>>>> different results:
>>>>
>>>>     USA
>>>>     WA
>>>>     Olympia          Bremerton       Puyallup       Spokane
>>>>     Patricia Gervasi Richard Vansdal Tawnya Ancheta Edna Woodson 
>>>> Ida Rodriguez
>>>>     Unit Sales       Unit Sales      Unit Sales     Unit Sales   
>>>> Unit Sales
>>>>     ================ =============== ============== ============ 
>>>> =============
>>>>                   226             156 220          243           344
>>>>
>>>> The funny thing, is that this behaviour is only seen in some cases,
>>>> which is probably related to the data in the cube.
>>>>
>>>> Is this the behaviour to be expected from the the last query? If so,
>>>> it doesn’t makes any sense, since the statement in the WHERE clause
>>>> should be propagated to the named sets, thus resulting in the top 5
>>>> customers for the given time interval.
>>>>
>>>> Could this be a mondrian bug?
>>>>
>>>> Has anyone encountered this problem before?
>>>>
>>>> Best regards,
>>>> Pedro
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Mondrian mailing list
>>>> Mondrian at pentaho.org
>>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>> _______________________________________________
>>> Mondrian mailing list
>>> Mondrian at pentaho.org
>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>
>