[Mondrian] Non Empty not respected with aggregated tables when sets are used

Diethard Steiner diethard.steiner at gmail.com
Fri Oct 3 13:19:40 EDT 2014


Nope, they are gone now.

But in any case, I would say that the result set should be the same no matter if I set highCardinality to true or false, meaning NON EMPTY should be respected in both cases. I can create a jira case for this if you want.

On 3 Oct 2014, at 18:16, Matt Campbell <mcampbell at pentaho.com> wrote:

> 
> Do you still see the large number of segment load queries (one per month)?
> 
> 
> -----Original Message-----
> From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Diethard Steiner
> Sent: Friday, October 03, 2014 12:40 PM
> To: Mondrian developer mailing list
> Subject: Re: [Mondrian] Non Empty not respected with aggregated tables when sets are used
> 
> Hi Matt,
> 
> Thanks for the reply!  
> 
> Now this is very interesting: setting highCardinality to false yields the expected result. Thanks a lot for this hint!
> 
> Should I log a jira case for this? 
> 
> The Mondrian version used is: 3.6.5
> Saiku: biserver plugin, Paul's fork (http://ci.analytical-labs.com/job/pstoellberger-saiku3/lastSuccessfulBuild/artifact/saiku-bi-platform-plugin/target/saiku-plugin-3.0-PSTOELLBERGER-SNAPSHOT.zip)
> 
> I have one global date dimension defined, and then in the individual cubes I reference it like this:
> <DimensionUsage source="Date" name="Start Date" caption="Start Date" visible="true" foreignKey="startdate_tk" highCardinality="false"/>
> 
> In another other cubes, which do not have agg tables, I did not see the original problem.
> 
> Cheers,
> Diethard
> 
> 
> On 3 Oct 2014, at 16:53, Matt Campbell <mcampbell at pentaho.com> wrote:
> 
>> 
>> Hi Diethard, greetings from rainy Michigan.
>> 
>> I notice your <Dimension> below is named [Date], but the MDX shows [Start Date].  There also seems to be an inconsistency in the logged output.  Is there more than one Date dimension in your model, or did it just get renamed somewhere along the way?
>> 
>> I'm surprised both that you see empty rows for the first query, and also that you see a segment load query per Month.  I would expect a single segment load query against the agg table.
>> 
>> Which Saiku build are you seeing this with?  I'll see which version of Mondrian it's using and try to reproduce.
>> 
>> Out of curiosity, I wonder if you would see the same behavior if you set highCardinality="false" on the dim.
>> 
>> -matt
>> 
>> -----Original Message-----
>> From: mondrian-bounces at pentaho.org 
>> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Diethard Steiner
>> Sent: Friday, October 03, 2014 10:44 AM
>> To: Mondrian developer mailing list
>> Subject: [Mondrian] Non Empty not respected with aggregated tables 
>> when sets are used
>> 
>> Good afternoon everyone from a sunny London! 
>> 
>> I came across an interesting problem today:
>> 
>> There seems to be an issue in regards to handling NON Empty in one of the latest Saiku builds with Mondrian 3.x (well, I haven't tested this with anything else yet). The problem manifests itself only when using aggregated tables.
>> 
>> The Schema:
>> 
>> Global Date Dim:
>> 
>>   <Dimension name="Date" type="TimeDimension" visible="true" highCardinality="true">
>>       <Hierarchy name="Date" visible="true" hasAll="true" allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk">
>>           <Table name="dim_date" schema="common"/>
>>           <Level name="Date" column="date_digits" type="Date" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" visible="true" />
>>       </Hierarchy> 
>>       <Hierarchy name="Monthly Calendar" caption="Monthly Calendar" visible="true" hasAll="true" allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk">
>>           <Table name="dim_date" schema="common"/>
>>           <Level 
>>               name="Quarter" 
>>               column="year_quarter_int"
>>               type="Integer"  
>>               captionColumn="year_quarter"
>>               nameColumn="quarter_number"
>>               ordinalColumn="year_quarter_int"  
>>               uniqueMembers="false" levelType="TimeQuarters" hideMemberIf="Never" visible="true" />
>>           <Level name="Month" column="year_month_int" type="Integer" nameColumn="year_month" ordinalColumn="year_month_int" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never" visible="true" />
>>           <Level name="Day of the Month" column="date_tk" type="Integer" nameColumn="day_in_month" ordinalColumn="date_tk" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" visible="true" />
>>       </Hierarchy>	
>>       <Hierarchy name="Weekly Calendar" caption="Weekly Calendar" visible="true" hasAll="true" allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk">
>>           <Table name="dim_date" schema="common"/>
>>           <Level name="Year" column="year_week_based" type="Integer" uniqueMembers="false" levelType="TimeYears" hideMemberIf="Never" visible="true" />
>>           <Level name="Week" column="year_week_int" type="String" nameColumn="year_week"  ordinalColumn="year_week_int" uniqueMembers="false" levelType="TimeWeeks" hideMemberIf="Never" visible="true" />
>>           <Level name="Day of the Week" column="date_tk" type="Integer" nameColumn="day_in_week" ordinalColumn="date_tk" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" visible="true" />
>>       </Hierarchy>        
>>   </Dimension>
>> 
>> The Cube has the monthly aggr table referenced like this:
>> 
>>      <AggName name="agg_monthly_fact_cdr" ignorecase="true">
>>         <AggFactCount column="fact_count">
>>         </AggFactCount>
>>         <AggForeignKey factColumn="service_tk" aggColumn="service_tk">
>>         </AggForeignKey>
>> 	[ ... many more ... ]
>>         <AggMeasure column="calls" name="[Measures].[CDR Count]">
>>         </AggMeasure>
>> 	[ ... many more ... ]
>>         <AggLevel column="startdate_year" name="[Start Date.Monthly Calendar].[Year]" collapsed="true"></AggLevel>
>>         <AggLevel column="startdate_quarter_number" name="[Start Date.Monthly Calendar].[Quarter]" collapsed="true"></AggLevel><!- holds year_quarter_int value ->
>>         <AggLevel column="startdate_month_number" name="[Start Date.Monthly Calendar].[Month]" collapsed="true"></AggLevel><!- holds year_month_int value ->
>>     </AggName>
>> 
>> EXAMPLE
>> ========
>> 
>> Both cases below should in theory return the same result set - the query was slightly changed for each case.
>> 
>> CASE 1
>> -----------
>> 
>> Saiku creates following MDX query:
>> WITH
>> SET [~ROWS] AS
>>   {[Start Date.Monthly Calendar].[Month].Members} SELECT NON EMPTY 
>> {[Measures].[CDR Count]} ON COLUMNS, NON EMPTY [~ROWS] ON ROWS FROM 
>> [CDR]
>> 
>> 
>> Mondrian log:
>> 
>> 2014-10-03 15:13:45,894 DEBUG [mondrian.mdx] 4745: with set [~ROWS] as '{[Start Date.Monthly Calendar].[Month].Members}'
>> select NON EMPTY {[Measures].[CDR Count]} ON COLUMNS,  NON EMPTY 
>> [~ROWS] ON ROWS from [CDR]
>> 
>> 2014-10-03 15:13:45,939 DEBUG [mondrian.sql] 325: 
>> HighCardSqlTupleReader.readTuples [[Date.Monthly Calendar].[Month]]: 
>> executing sql [select "dim_date"."year4" as "c0", 
>> "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as 
>> "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" 
>> as "c4", "dim_date"."year_month" as "c5" from "common"."dim_date" as 
>> "dim_date" group by "dim_date"."year4", "dim_date"."year_quarter_int", 
>> "dim_date"."year_quarter", "dim_date"."quarter_number", 
>> "dim_date"."year_month_int", "dim_date"."year_month" order by 
>> "dim_date"."year4" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC 
>> NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
>> 2014-10-03 15:13:46,044 DEBUG [mondrian.sql] 325: , exec 104 ms
>> 2014-10-03 15:13:46,053 DEBUG [mondrian.sql] 325: , exec+fetch 114 ms, 
>> 241 rows
>> 2014-10-03 15:13:46,096 DEBUG [mondrian.sql] 326: 
>> HighCardSqlTupleReader.readTuples [[Date.Monthly Calendar].[Month]]: 
>> executing sql [select "dim_date"."year4" as "c0", 
>> "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as 
>> "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" 
>> as "c4", "dim_date"."year_month" as "c5" from "common"."dim_date" as 
>> "dim_date" group by "dim_date"."year4", "dim_date"."year_quarter_int", 
>> "dim_date"."year_quarter", "dim_date"."quarter_number", 
>> "dim_date"."year_month_int", "dim_date"."year_month" order by 
>> "dim_date"."year4" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC 
>> NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
>> 2014-10-03 15:13:46,153 DEBUG [mondrian.sql] 326: , exec 57 ms
>> 2014-10-03 15:13:46,162 DEBUG [mondrian.sql] 326: , exec+fetch 66 ms, 
>> 241 rows
>> 2014-10-03 15:13:46,206 DEBUG [mondrian.sql] 327: 
>> SqlStatisticsProvider.getColumnCardinality: executing sql [select 
>> count(distinct "year4") from "common"."dim_date"]
>> 2014-10-03 15:13:46,251 DEBUG [mondrian.sql] 327: , exec 45 ms
>> 2014-10-03 15:13:46,251 DEBUG [mondrian.sql] 327: , exec+fetch 45 ms, 
>> 1 rows
>> 2014-10-03 15:13:46,296 DEBUG [mondrian.sql] 328: 
>> SqlStatisticsProvider.getColumnCardinality: executing sql [select 
>> count(distinct "year_quarter_int") from "common"."dim_date"]
>> 2014-10-03 15:13:46,345 DEBUG [mondrian.sql] 328: , exec 48 ms
>> 2014-10-03 15:13:46,345 DEBUG [mondrian.sql] 328: , exec+fetch 49 ms, 
>> 1 rows
>> 2014-10-03 15:13:46,393 DEBUG [mondrian.sql] 329: 
>> SqlStatisticsProvider.getColumnCardinality: executing sql [select 
>> count(distinct "year_month_int") from "common"."dim_date"]
>> 2014-10-03 15:13:46,445 DEBUG [mondrian.sql] 329: , exec 52 ms
>> 2014-10-03 15:13:46,445 DEBUG [mondrian.sql] 329: , exec+fetch 52 ms, 
>> 1 rows
>> 2014-10-03 15:13:46,490 DEBUG [mondrian.sql] 330: Segment.load: 
>> executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", 
>> "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", 
>> "agg_monthly_fact_cdr"."startdate_month_number" as "c2", 
>> sum("agg_monthly_fact_cdr"."calls") as "m0" from 
>> "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where 
>> "agg_monthly_fact_cdr"."startdate_year" = 2005 and 
>> "agg_monthly_fact_cdr"."startdate_quarter_number" = 200501 and 
>> "agg_monthly_fact_cdr"."startdate_month_number" = 200501 group by 
>> "agg_monthly_fact_cdr"."startdate_year", 
>> "agg_monthly_fact_cdr"."startdate_quarter_number", 
>> "agg_monthly_fact_cdr"."startdate_month_number"]
>> 
>> many more ... last one loops through all year-quarter-month 
>> combinations from the date table
>> 
>> The strategy seems to be:
>> 1) Get year - quarter - month combinations from dim date table [Not 
>> too sure why this is executed twice]
>> 2) Get distinct year, quarter, month values from dim date table
>> 3) Get measures from agg table for every year-quarter-month 
>> combination from the dim date table (so no matter if there are record 
>> for this period in the agg table or not)
>> 
>> Results are showing everything (so EMPTY and NON EMPTY).
>> 
>> 
>> CASE 2
>> -----------
>> 
>> 
>> Then I manually changed the Saiku MDX query to the following:
>> 
>> SELECT
>> NON EMPTY {[Measures].[CDR Count]} ON COLUMNS, NON EMPTY  {[Start 
>> Date.Monthly Calendar].[Month].Members} ON ROWS FROM [CDR]
>> 
>> The SQL Mondrian creates is more to what I would expect:
>> 
>> 2014-10-03 14:58:27,352 DEBUG [mondrian.mdx] 3002: select NON EMPTY 
>> {[Measures].[CDR Count]} ON COLUMNS,  NON EMPTY {[Start Date.Monthly 
>> Calendar].[Month].Members} ON ROWS from [CDR]
>> 
>> 2014-10-03 14:58:27,403 DEBUG [mondrian.sql] 286: 
>> HighCardSqlTupleReader.readTuples [[Start Date.Monthly 
>> Calendar].[Month]]: executing sql [select 
>> "agg_monthly_fact_cdr"."startdate_year" as "c0", 
>> "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as 
>> "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" 
>> as "c4", "dim_date"."year_month" as "c5" from 
>> "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr", 
>> "common"."dim_date" as "dim_date" where "dim_date"."year_quarter_int" 
>> = "agg_monthly_fact_cdr"."startdate_quarter_number" and 
>> "dim_date"."year_month_int" = 
>> "agg_monthly_fact_cdr"."startdate_month_number" group by 
>> "agg_monthly_fact_cdr"."startdate_year", 
>> "dim_date"."year_quarter_int", "dim_date"."year_quarter", 
>> "dim_date"."quarter_number", "dim_date"."year_month_int", 
>> "dim_date"."year_month" order by 
>> "agg_monthly_fact_cdr"."startdate_year" ASC NULLS LAST, 
>> "dim_date"."year_quarter_int" ASC NULLS LAST, 
>> "dim_date"."year_month_int" ASC NULLS LAST]
>> 2014-10-03 14:58:43,759 DEBUG [mondrian.sql] 286: , exec 16355 ms
>> 2014-10-03 14:58:43,759 DEBUG [mondrian.sql] 286: , exec+fetch 16356 
>> ms, 10 rows
>> 2014-10-03 14:58:43,802 DEBUG [mondrian.sql] 287: 
>> HighCardSqlTupleReader.readTuples [[Start Date.Monthly 
>> Calendar].[Month]]: executing sql [select 
>> "agg_monthly_fact_cdr"."startdate_year" as "c0", 
>> "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as 
>> "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" 
>> as "c4", "dim_date"."year_month" as "c5" from 
>> "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr", 
>> "common"."dim_date" as "dim_date" where "dim_date"."year_quarter_int" 
>> = "agg_monthly_fact_cdr"."startdate_quarter_number" and 
>> "dim_date"."year_month_int" = 
>> "agg_monthly_fact_cdr"."startdate_month_number" group by 
>> "agg_monthly_fact_cdr"."startdate_year", 
>> "dim_date"."year_quarter_int", "dim_date"."year_quarter", 
>> "dim_date"."quarter_number", "dim_date"."year_month_int", 
>> "dim_date"."year_month" order by 
>> "agg_monthly_fact_cdr"."startdate_year" ASC NULLS LAST, 
>> "dim_date"."year_quarter_int" ASC NULLS LAST, 
>> "dim_date"."year_month_int" ASC NULLS LAST]
>> 2014-10-03 14:58:59,080 DEBUG [mondrian.sql] 287: , exec 15276 ms
>> 2014-10-03 14:58:59,080 DEBUG [mondrian.sql] 287: , exec+fetch 15277 
>> ms, 10 rows
>> 2014-10-03 14:58:59,125 DEBUG [mondrian.sql] 288: 
>> SqlStatisticsProvider.getColumnCardinality: executing sql [select 
>> count(distinct "year4") from "common"."dim_date"]
>> 2014-10-03 14:58:59,174 DEBUG [mondrian.sql] 288: , exec 48 ms
>> 2014-10-03 14:58:59,174 DEBUG [mondrian.sql] 288: , exec+fetch 49 ms, 
>> 1 rows
>> 2014-10-03 14:58:59,219 DEBUG [mondrian.sql] 289: 
>> SqlStatisticsProvider.getColumnCardinality: executing sql [select 
>> count(distinct "year_quarter_int") from "common"."dim_date"]
>> 2014-10-03 14:58:59,269 DEBUG [mondrian.sql] 289: , exec 50 ms
>> 2014-10-03 14:58:59,269 DEBUG [mondrian.sql] 289: , exec+fetch 50 ms, 
>> 1 rows
>> 2014-10-03 14:58:59,315 DEBUG [mondrian.sql] 290: 
>> SqlStatisticsProvider.getColumnCardinality: executing sql [select 
>> count(distinct "year_month_int") from "common"."dim_date"]
>> 2014-10-03 14:58:59,362 DEBUG [mondrian.sql] 290: , exec 47 ms
>> 2014-10-03 14:58:59,363 DEBUG [mondrian.sql] 290: , exec+fetch 48 ms, 
>> 1 rows
>> 2014-10-03 14:58:59,408 DEBUG [mondrian.sql] 291: Segment.load: 
>> executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", 
>> "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", 
>> "agg_monthly_fact_cdr"."startdate_month_number" as "c2", 
>> sum("agg_monthly_fact_cdr"."calls") as "m0" from 
>> "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where 
>> "agg_monthly_fact_cdr"."startdate_year" = 2013 and 
>> "agg_monthly_fact_cdr"."startdate_quarter_number" = 201304 and 
>> "agg_monthly_fact_cdr"."startdate_month_number" = 201311 group by 
>> "agg_monthly_fact_cdr"."startdate_year", 
>> "agg_monthly_fact_cdr"."startdate_quarter_number", 
>> "agg_monthly_fact_cdr"."startdate_month_number"]
>> 2014-10-03 14:58:59,458 DEBUG [mondrian.sql] 291: , exec 49 ms
>> 2014-10-03 14:58:59,459 DEBUG [mondrian.sql] 291: , exec+fetch 50 ms, 
>> 1 rows
>> 2014-10-03 14:58:59,459 DEBUG [mondrian.mdx] 3002: exec: 32107 ms
>> 2014-10-03 14:58:59,505 DEBUG [mondrian.sql] 292: Segment.load: 
>> executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", 
>> "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", 
>> "agg_monthly_fact_cdr"."startdate_month_number" as "c2", 
>> sum("agg_monthly_fact_cdr"."calls") as "m0" from 
>> "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where 
>> "agg_monthly_fact_cdr"."startdate_year" = 2014 and 
>> "agg_monthly_fact_cdr"."startdate_quarter_number" = 201401 and 
>> "agg_monthly_fact_cdr"."startdate_month_number" = 201402 group by 
>> "agg_monthly_fact_cdr"."startdate_year", 
>> "agg_monthly_fact_cdr"."startdate_quarter_number", 
>> "agg_monthly_fact_cdr"."startdate_month_number"]
>> 
>> So here the strategy seems to be:
>> 1) Get the year-quarter-month combinations from AGG table (which do 
>> have a measure) [not quite sure though why this is done twice]
>> 2) Get the unique year, quarter and month values from the dim date 
>> table
>> 3) Loop over the agg table for each year-quarter-month combination and 
>> get the measure
>> 
>> And the result shows all NON EMPTY members - which is what I want.
>> 
>> So, why is this happening? Both queries should in theory return the same results, no?
>> 
>> Cheers,
>> Diddy
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian



More information about the Mondrian mailing list