[Mondrian] Non Empty not respected with aggregated tables when sets are used

Diethard Steiner diethard.steiner at gmail.com
Fri Oct 3 13:11:47 EDT 2014


… in addition: as I was checking the other dim date references now in my olap schema I realised that for all the other cubes highCardinality was already set to false. So this problem might actually not be related to the agg tables (as originally stated).


On 3 Oct 2014, at 16:53, Matt Campbell <mcampbell at pentaho.com> wrote:

> 
> Hi Diethard, greetings from rainy Michigan.
> 
> I notice your <Dimension> below is named [Date], but the MDX shows [Start Date].  There also seems to be an inconsistency in the logged output.  Is there more than one Date dimension in your model, or did it just get renamed somewhere along the way?
> 
> I'm surprised both that you see empty rows for the first query, and also that you see a segment load query per Month.  I would expect a single segment load query against the agg table.
> 
> Which Saiku build are you seeing this with?  I'll see which version of Mondrian it's using and try to reproduce.
> 
> Out of curiosity, I wonder if you would see the same behavior if you set highCardinality="false" on the dim.
> 
> -matt
> 
> -----Original Message-----
> From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On Behalf Of Diethard Steiner
> Sent: Friday, October 03, 2014 10:44 AM
> To: Mondrian developer mailing list
> Subject: [Mondrian] Non Empty not respected with aggregated tables when sets are used
> 
> Good afternoon everyone from a sunny London! 
> 
> I came across an interesting problem today:
> 
> There seems to be an issue in regards to handling NON Empty in one of the latest Saiku builds with Mondrian 3.x (well, I haven't tested this with anything else yet). The problem manifests itself only when using aggregated tables.
> 
> The Schema:
> 
> Global Date Dim:
> 
>    <Dimension name="Date" type="TimeDimension" visible="true" highCardinality="true">
>        <Hierarchy name="Date" visible="true" hasAll="true" allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk">
>            <Table name="dim_date" schema="common"/>
>            <Level name="Date" column="date_digits" type="Date" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" visible="true" />
>        </Hierarchy> 
>        <Hierarchy name="Monthly Calendar" caption="Monthly Calendar" visible="true" hasAll="true" allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk">
>            <Table name="dim_date" schema="common"/>
>            <Level 
>                name="Quarter" 
>                column="year_quarter_int"
>                type="Integer"  
>                captionColumn="year_quarter"
>                nameColumn="quarter_number"
>                ordinalColumn="year_quarter_int"  
>                uniqueMembers="false" levelType="TimeQuarters" hideMemberIf="Never" visible="true" />
>            <Level name="Month" column="year_month_int" type="Integer" nameColumn="year_month" ordinalColumn="year_month_int" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never" visible="true" />
>            <Level name="Day of the Month" column="date_tk" type="Integer" nameColumn="day_in_month" ordinalColumn="date_tk" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" visible="true" />
>        </Hierarchy>	
>        <Hierarchy name="Weekly Calendar" caption="Weekly Calendar" visible="true" hasAll="true" allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk">
>            <Table name="dim_date" schema="common"/>
>            <Level name="Year" column="year_week_based" type="Integer" uniqueMembers="false" levelType="TimeYears" hideMemberIf="Never" visible="true" />
>            <Level name="Week" column="year_week_int" type="String" nameColumn="year_week"  ordinalColumn="year_week_int" uniqueMembers="false" levelType="TimeWeeks" hideMemberIf="Never" visible="true" />
>            <Level name="Day of the Week" column="date_tk" type="Integer" nameColumn="day_in_week" ordinalColumn="date_tk" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" visible="true" />
>        </Hierarchy>        
>    </Dimension>
> 
> The Cube has the monthly aggr table referenced like this:
> 
>       <AggName name="agg_monthly_fact_cdr" ignorecase="true">
>          <AggFactCount column="fact_count">
>          </AggFactCount>
>          <AggForeignKey factColumn="service_tk" aggColumn="service_tk">
>          </AggForeignKey>
> 	[ ... many more ... ]
>          <AggMeasure column="calls" name="[Measures].[CDR Count]">
>          </AggMeasure>
> 	[ ... many more ... ]
>          <AggLevel column="startdate_year" name="[Start Date.Monthly Calendar].[Year]" collapsed="true"></AggLevel>
>          <AggLevel column="startdate_quarter_number" name="[Start Date.Monthly Calendar].[Quarter]" collapsed="true"></AggLevel><!- holds year_quarter_int value ->
>          <AggLevel column="startdate_month_number" name="[Start Date.Monthly Calendar].[Month]" collapsed="true"></AggLevel><!- holds year_month_int value ->
>      </AggName>
> 
> EXAMPLE
> ========
> 
> Both cases below should in theory return the same result set - the query was slightly changed for each case.
> 
> CASE 1
> -----------
> 
> Saiku creates following MDX query:
> WITH
> SET [~ROWS] AS
>    {[Start Date.Monthly Calendar].[Month].Members} SELECT NON EMPTY {[Measures].[CDR Count]} ON COLUMNS, NON EMPTY [~ROWS] ON ROWS FROM [CDR]
> 
> 
> Mondrian log:
> 
> 2014-10-03 15:13:45,894 DEBUG [mondrian.mdx] 4745: with set [~ROWS] as '{[Start Date.Monthly Calendar].[Month].Members}'
> select NON EMPTY {[Measures].[CDR Count]} ON COLUMNS,
>  NON EMPTY [~ROWS] ON ROWS
> from [CDR]
> 
> 2014-10-03 15:13:45,939 DEBUG [mondrian.sql] 325: HighCardSqlTupleReader.readTuples [[Date.Monthly Calendar].[Month]]: executing sql [select "dim_date"."year4" as "c0", "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" as "c4", "dim_date"."year_month" as "c5" from "common"."dim_date" as "dim_date" group by "dim_date"."year4", "dim_date"."year_quarter_int", "dim_date"."year_quarter", "dim_date"."quarter_number", "dim_date"."year_month_int", "dim_date"."year_month" order by "dim_date"."year4" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
> 2014-10-03 15:13:46,044 DEBUG [mondrian.sql] 325: , exec 104 ms
> 2014-10-03 15:13:46,053 DEBUG [mondrian.sql] 325: , exec+fetch 114 ms, 241 rows
> 2014-10-03 15:13:46,096 DEBUG [mondrian.sql] 326: HighCardSqlTupleReader.readTuples [[Date.Monthly Calendar].[Month]]: executing sql [select "dim_date"."year4" as "c0", "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" as "c4", "dim_date"."year_month" as "c5" from "common"."dim_date" as "dim_date" group by "dim_date"."year4", "dim_date"."year_quarter_int", "dim_date"."year_quarter", "dim_date"."quarter_number", "dim_date"."year_month_int", "dim_date"."year_month" order by "dim_date"."year4" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
> 2014-10-03 15:13:46,153 DEBUG [mondrian.sql] 326: , exec 57 ms
> 2014-10-03 15:13:46,162 DEBUG [mondrian.sql] 326: , exec+fetch 66 ms, 241 rows
> 2014-10-03 15:13:46,206 DEBUG [mondrian.sql] 327: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year4") from "common"."dim_date"]
> 2014-10-03 15:13:46,251 DEBUG [mondrian.sql] 327: , exec 45 ms
> 2014-10-03 15:13:46,251 DEBUG [mondrian.sql] 327: , exec+fetch 45 ms, 1 rows
> 2014-10-03 15:13:46,296 DEBUG [mondrian.sql] 328: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year_quarter_int") from "common"."dim_date"]
> 2014-10-03 15:13:46,345 DEBUG [mondrian.sql] 328: , exec 48 ms
> 2014-10-03 15:13:46,345 DEBUG [mondrian.sql] 328: , exec+fetch 49 ms, 1 rows
> 2014-10-03 15:13:46,393 DEBUG [mondrian.sql] 329: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year_month_int") from "common"."dim_date"]
> 2014-10-03 15:13:46,445 DEBUG [mondrian.sql] 329: , exec 52 ms
> 2014-10-03 15:13:46,445 DEBUG [mondrian.sql] 329: , exec+fetch 52 ms, 1 rows
> 2014-10-03 15:13:46,490 DEBUG [mondrian.sql] 330: Segment.load: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", "agg_monthly_fact_cdr"."startdate_month_number" as "c2", sum("agg_monthly_fact_cdr"."calls") as "m0" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where "agg_monthly_fact_cdr"."startdate_year" = 2005 and "agg_monthly_fact_cdr"."startdate_quarter_number" = 200501 and "agg_monthly_fact_cdr"."startdate_month_number" = 200501 group by "agg_monthly_fact_cdr"."startdate_year", "agg_monthly_fact_cdr"."startdate_quarter_number", "agg_monthly_fact_cdr"."startdate_month_number"]
> 
> many more ... last one loops through all year-quarter-month combinations from the date table
> 
> The strategy seems to be:
> 1) Get year - quarter - month combinations from dim date table [Not too sure why this is executed twice]
> 2) Get distinct year, quarter, month values from dim date table
> 3) Get measures from agg table for every year-quarter-month combination from the dim date table (so no matter if there are record for this period in the agg table or not)
> 
> Results are showing everything (so EMPTY and NON EMPTY).
> 
> 
> CASE 2
> -----------
> 
> 
> Then I manually changed the Saiku MDX query to the following:
> 
> SELECT
> NON EMPTY {[Measures].[CDR Count]} ON COLUMNS, NON EMPTY  {[Start Date.Monthly Calendar].[Month].Members} ON ROWS FROM [CDR]
> 
> The SQL Mondrian creates is more to what I would expect:
> 
> 2014-10-03 14:58:27,352 DEBUG [mondrian.mdx] 3002: select NON EMPTY {[Measures].[CDR Count]} ON COLUMNS,
>  NON EMPTY {[Start Date.Monthly Calendar].[Month].Members} ON ROWS from [CDR]
> 
> 2014-10-03 14:58:27,403 DEBUG [mondrian.sql] 286: HighCardSqlTupleReader.readTuples [[Start Date.Monthly Calendar].[Month]]: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" as "c4", "dim_date"."year_month" as "c5" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr", "common"."dim_date" as "dim_date" where "dim_date"."year_quarter_int" = "agg_monthly_fact_cdr"."startdate_quarter_number" and "dim_date"."year_month_int" = "agg_monthly_fact_cdr"."startdate_month_number" group by "agg_monthly_fact_cdr"."startdate_year", "dim_date"."year_quarter_int", "dim_date"."year_quarter", "dim_date"."quarter_number", "dim_date"."year_month_int", "dim_date"."year_month" order by "agg_monthly_fact_cdr"."startdate_year" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
> 2014-10-03 14:58:43,759 DEBUG [mondrian.sql] 286: , exec 16355 ms
> 2014-10-03 14:58:43,759 DEBUG [mondrian.sql] 286: , exec+fetch 16356 ms, 10 rows
> 2014-10-03 14:58:43,802 DEBUG [mondrian.sql] 287: HighCardSqlTupleReader.readTuples [[Start Date.Monthly Calendar].[Month]]: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" as "c4", "dim_date"."year_month" as "c5" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr", "common"."dim_date" as "dim_date" where "dim_date"."year_quarter_int" = "agg_monthly_fact_cdr"."startdate_quarter_number" and "dim_date"."year_month_int" = "agg_monthly_fact_cdr"."startdate_month_number" group by "agg_monthly_fact_cdr"."startdate_year", "dim_date"."year_quarter_int", "dim_date"."year_quarter", "dim_date"."quarter_number", "dim_date"."year_month_int", "dim_date"."year_month" order by "agg_monthly_fact_cdr"."startdate_year" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
> 2014-10-03 14:58:59,080 DEBUG [mondrian.sql] 287: , exec 15276 ms
> 2014-10-03 14:58:59,080 DEBUG [mondrian.sql] 287: , exec+fetch 15277 ms, 10 rows
> 2014-10-03 14:58:59,125 DEBUG [mondrian.sql] 288: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year4") from "common"."dim_date"]
> 2014-10-03 14:58:59,174 DEBUG [mondrian.sql] 288: , exec 48 ms
> 2014-10-03 14:58:59,174 DEBUG [mondrian.sql] 288: , exec+fetch 49 ms, 1 rows
> 2014-10-03 14:58:59,219 DEBUG [mondrian.sql] 289: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year_quarter_int") from "common"."dim_date"]
> 2014-10-03 14:58:59,269 DEBUG [mondrian.sql] 289: , exec 50 ms
> 2014-10-03 14:58:59,269 DEBUG [mondrian.sql] 289: , exec+fetch 50 ms, 1 rows
> 2014-10-03 14:58:59,315 DEBUG [mondrian.sql] 290: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year_month_int") from "common"."dim_date"]
> 2014-10-03 14:58:59,362 DEBUG [mondrian.sql] 290: , exec 47 ms
> 2014-10-03 14:58:59,363 DEBUG [mondrian.sql] 290: , exec+fetch 48 ms, 1 rows
> 2014-10-03 14:58:59,408 DEBUG [mondrian.sql] 291: Segment.load: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", "agg_monthly_fact_cdr"."startdate_month_number" as "c2", sum("agg_monthly_fact_cdr"."calls") as "m0" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where "agg_monthly_fact_cdr"."startdate_year" = 2013 and "agg_monthly_fact_cdr"."startdate_quarter_number" = 201304 and "agg_monthly_fact_cdr"."startdate_month_number" = 201311 group by "agg_monthly_fact_cdr"."startdate_year", "agg_monthly_fact_cdr"."startdate_quarter_number", "agg_monthly_fact_cdr"."startdate_month_number"]
> 2014-10-03 14:58:59,458 DEBUG [mondrian.sql] 291: , exec 49 ms
> 2014-10-03 14:58:59,459 DEBUG [mondrian.sql] 291: , exec+fetch 50 ms, 1 rows
> 2014-10-03 14:58:59,459 DEBUG [mondrian.mdx] 3002: exec: 32107 ms
> 2014-10-03 14:58:59,505 DEBUG [mondrian.sql] 292: Segment.load: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", "agg_monthly_fact_cdr"."startdate_month_number" as "c2", sum("agg_monthly_fact_cdr"."calls") as "m0" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where "agg_monthly_fact_cdr"."startdate_year" = 2014 and "agg_monthly_fact_cdr"."startdate_quarter_number" = 201401 and "agg_monthly_fact_cdr"."startdate_month_number" = 201402 group by "agg_monthly_fact_cdr"."startdate_year", "agg_monthly_fact_cdr"."startdate_quarter_number", "agg_monthly_fact_cdr"."startdate_month_number"]
> 
> So here the strategy seems to be:
> 1) Get the year-quarter-month combinations from AGG table (which do have a measure) [not quite sure though why this is done twice]
> 2) Get the unique year, quarter and month values from the dim date table
> 3) Loop over the agg table for each year-quarter-month combination and get the measure
> 
> And the result shows all NON EMPTY members - which is what I want.
> 
> So, why is this happening? Both queries should in theory return the same results, no?
> 
> Cheers,
> Diddy
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian



More information about the Mondrian mailing list