[Mondrian] Unexpected behaviour while using a TopCount in a Named Set
Pedro Salgueiro
pedro.salgueiro at inductiva.pt
Tue Jul 10 05:32:44 EDT 2012
Hi again,
I have been looking at olap4j logs and analysing the SQL queries which
are being generated, and found the following query which is a bit
suspicious, since it seems that its selecting only the customers for
[Time].[1997].[Q4].[10].
SELECT `time_by_day`.`the_year` AS `c0`,
`time_by_day`.`quarter` AS `c1`,
`time_by_day`.`month_of_year` AS `c2`,
`customer`.`customer_id` AS `c3`,
sum(`sales_fact_1997`.`unit_sales`) AS `m0`
FROM `time_by_day` AS `time_by_day`,
`sales_fact_1997` AS `sales_fact_1997`,
`customer` AS `customer`
WHERE `sales_fact_1997`.`time_id` = `time_by_day`.`time_id`
AND `time_by_day`.`the_year` = 1997
AND `time_by_day`.`quarter` = 'Q4'
AND `time_by_day`.`month_of_year` = 10
AND `sales_fact_1997`.`customer_id` = `customer`.`customer_id`
GROUP BY `time_by_day`.`the_year`,
`time_by_day`.`quarter`,
`time_by_day`.`month_of_year`,
`customer`.`customer_id`
According to the log, this query is used to evaluate the Named set Top
Count, and evaluates to this, which are the top 5 customers for
[Time].[1997].[Q4].[10]
10:02:51,571 DEBUG [ResultBase] mondrian.rolap.RolapNamedSetEvaluator at 38e43ba7: Named set Top Count evaluated to:
[Customers].[USA].[WA].[Olympia].[Patricia Gervasi]
[Customers].[USA].[WA].[Bremerton].[Richard Vansdal]
[Customers].[USA].[WA].[Puyallup].[Tawnya Ancheta]
[Customers].[USA].[WA].[Spokane].[Edna Woodson]
[Customers].[USA].[WA].[Spokane].[Ida Rodriguez]
Later, on other SQL query, the IDs of these customers are used to make
the final result. Thus, if the intermediate results are adulterated,
since they only consider [Time].[1997].[Q4].[10], the final results will
not be correct.
Is this what is generating wrong results?
Cheers,
Pedro
On 07/09/2012 06:28 PM, Pedro Salgueiro wrote:
> We have just tried with mondrian 3.4.5 and 3.4.0-SNAPSHOT, and the
> results are the same.
> About the 'NON EMPTY', we removed it from both queries and the
> behaviour is the same.
>
> We are open to more suggestions.
>
> Pedro
>
> On 07/09/2012 06:13 PM, Julian Hyde wrote:
>> I don't know whether it would make any difference with your
>> particular data set, but for the record, the queries are not
>> equivalent. According to the MDX spec, named sets are evaluated after
>> the slicer axis but before other axes, therefore the second query
>> doesn't inherit the 'NON EMPTY'.
>>
>> It's worth trying the latest 3.4 code, as Pedro suggests.
>>
>> Julian
>>
>>
>> On Jul 9, 2012, at 9:56 AM, Pedro Alves wrote:
>>
>>> Did you try the latest code in 3.4 branch? We fixed some issues with
>>> top count very recently
>>>
>>> On Mon 09 Jul 2012 05:54:26 PM WEST, Pedro Salgueiro wrote:
>>>> Hi!
>>>>
>>>> I am using a Named Set with a TopCount, which is then used in a Select
>>>> which has a set of time members in the WHERE clause. For some reason,
>>>> I am getting an unexpected behaviour: The TopCount statement doesn't
>>>> seems to be affected by the WHERE clause. I tried to replicate the
>>>> situation while using the Foodmart and managed to get the same
>>>> behaviour, so I guess that we can exclude any problem with the cube we
>>>> are using.
>>>>
>>>> To better explain the problem, this is the same query, but not using
>>>> the set:
>>>>
>>>> SELECT
>>>> NON EMPTY TopCount([Customers].[Name].Members, 5,
>>>> [Measures].[Unit Sales]) * [Measures].[Unit Sales] on 0
>>>> FROM [Sales]
>>>> WHERE [Time].[1997].[Q1].[1]:[Time].[1997].[Q4].[10]
>>>>
>>>> This query produces the following results:
>>>>
>>>> USA
>>>> WA
>>>> Spokane
>>>> Mary Francis Benigar James Horvat Matt Bellah Ida Rodriguez
>>>> Kristin Miller
>>>> Unit Sales Unit Sales Unit Sales Unit Sales
>>>> Unit Sales
>>>> ==================== ============ =========== =============
>>>> ==============
>>>> 422 369 363 344 323
>>>>
>>>>
>>>> This query is equivalent to the following one, which moves the
>>>> TopCount to the set [TopCount]:
>>>>
>>>> WITH
>>>> SET [Top Count]
>>>> AS { TopCount([Customers].[Name].Members, 5, [Measures].[Unit
>>>> Sales]) }
>>>>
>>>> SELECT
>>>> NON EMPTY [Top Count] * [Measures].[Unit Sales] on 0
>>>> FROM [Sales]
>>>> WHERE [Time].[1997].[Q1].[1]:[Time].[1997].[Q4].[10]
>>>>
>>>>
>>>> Although this query is equivalent to the first one, it produces
>>>> different results:
>>>>
>>>> USA
>>>> WA
>>>> Olympia Bremerton Puyallup Spokane
>>>> Patricia Gervasi Richard Vansdal Tawnya Ancheta Edna Woodson
>>>> Ida Rodriguez
>>>> Unit Sales Unit Sales Unit Sales Unit Sales
>>>> Unit Sales
>>>> ================ =============== ============== ============
>>>> =============
>>>> 226 156 220 243 344
>>>>
>>>> The funny thing, is that this behaviour is only seen in some cases,
>>>> which is probably related to the data in the cube.
>>>>
>>>> Is this the behaviour to be expected from the the last query? If so,
>>>> it doesn’t makes any sense, since the statement in the WHERE clause
>>>> should be propagated to the named sets, thus resulting in the top 5
>>>> customers for the given time interval.
>>>>
>>>> Could this be a mondrian bug?
>>>>
>>>> Has anyone encountered this problem before?
>>>>
>>>> Best regards,
>>>> Pedro
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Mondrian mailing list
>>>> Mondrian at pentaho.org
>>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>> _______________________________________________
>>> Mondrian mailing list
>>> Mondrian at pentaho.org
>>> http://lists.pentaho.org/mailman/listinfo/mondrian
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>
>
More information about the Mondrian
mailing list