[Mondrian] Non Empty not respected with aggregated tables when sets are used

Diethard Steiner diethard.steiner at gmail.com
Fri Oct 3 10:44:26 EDT 2014


Good afternoon everyone from a sunny London! 

I came across an interesting problem today:

There seems to be an issue in regards to handling NON Empty in one of the latest Saiku builds with Mondrian 3.x (well, I haven’t tested this with anything else yet). The problem manifests itself only when using aggregated tables.

The Schema:

Global Date Dim:

    <Dimension name="Date" type="TimeDimension" visible="true" highCardinality="true">
        <Hierarchy name="Date" visible="true" hasAll="true" allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk">
            <Table name="dim_date" schema="common"/>
            <Level name="Date" column="date_digits" type="Date" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" visible="true" />
        </Hierarchy> 
        <Hierarchy name="Monthly Calendar" caption="Monthly Calendar" visible="true" hasAll="true" allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk">
            <Table name="dim_date" schema="common"/>
            <Level 
                name="Quarter" 
                column="year_quarter_int"
                type="Integer"  
                captionColumn="year_quarter"
                nameColumn="quarter_number"
                ordinalColumn="year_quarter_int"  
                uniqueMembers="false" levelType="TimeQuarters" hideMemberIf="Never" visible="true" />
            <Level name="Month" column="year_month_int" type="Integer" nameColumn="year_month" ordinalColumn="year_month_int" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never" visible="true" />
            <Level name="Day of the Month" column="date_tk" type="Integer" nameColumn="day_in_month" ordinalColumn="date_tk" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" visible="true" />
        </Hierarchy>	
        <Hierarchy name="Weekly Calendar" caption="Weekly Calendar" visible="true" hasAll="true" allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk">
            <Table name="dim_date" schema="common"/>
            <Level name="Year" column="year_week_based" type="Integer" uniqueMembers="false" levelType="TimeYears" hideMemberIf="Never" visible="true" />
            <Level name="Week" column="year_week_int" type="String" nameColumn="year_week"  ordinalColumn="year_week_int" uniqueMembers="false" levelType="TimeWeeks" hideMemberIf="Never" visible="true" />
            <Level name="Day of the Week" column="date_tk" type="Integer" nameColumn="day_in_week" ordinalColumn="date_tk" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" visible="true" />
        </Hierarchy>        
    </Dimension>

The Cube has the monthly aggr table referenced like this:

       <AggName name="agg_monthly_fact_cdr" ignorecase="true">
          <AggFactCount column="fact_count">
          </AggFactCount>
          <AggForeignKey factColumn="service_tk" aggColumn="service_tk">
          </AggForeignKey>
	[ … many more … ]
          <AggMeasure column="calls" name="[Measures].[CDR Count]">
          </AggMeasure>
	[ … many more … ]
          <AggLevel column="startdate_year" name="[Start Date.Monthly Calendar].[Year]" collapsed="true"></AggLevel>
          <AggLevel column="startdate_quarter_number" name="[Start Date.Monthly Calendar].[Quarter]" collapsed="true"></AggLevel><!— holds year_quarter_int value —>
          <AggLevel column="startdate_month_number" name="[Start Date.Monthly Calendar].[Month]" collapsed="true"></AggLevel><!— holds year_month_int value —>
      </AggName>

EXAMPLE
========

Both cases below should in theory return the same result set - the query was slightly changed for each case.

CASE 1
-----------

Saiku creates following MDX query:
WITH
SET [~ROWS] AS
    {[Start Date.Monthly Calendar].[Month].Members}
SELECT
NON EMPTY {[Measures].[CDR Count]} ON COLUMNS,
NON EMPTY [~ROWS] ON ROWS
FROM [CDR]


Mondrian log:

2014-10-03 15:13:45,894 DEBUG [mondrian.mdx] 4745: with set [~ROWS] as '{[Start Date.Monthly Calendar].[Month].Members}'
select NON EMPTY {[Measures].[CDR Count]} ON COLUMNS,
  NON EMPTY [~ROWS] ON ROWS
from [CDR]

2014-10-03 15:13:45,939 DEBUG [mondrian.sql] 325: HighCardSqlTupleReader.readTuples [[Date.Monthly Calendar].[Month]]: executing sql [select "dim_date"."year4" as "c0", "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" as "c4", "dim_date"."year_month" as "c5" from "common"."dim_date" as "dim_date" group by "dim_date"."year4", "dim_date"."year_quarter_int", "dim_date"."year_quarter", "dim_date"."quarter_number", "dim_date"."year_month_int", "dim_date"."year_month" order by "dim_date"."year4" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
2014-10-03 15:13:46,044 DEBUG [mondrian.sql] 325: , exec 104 ms
2014-10-03 15:13:46,053 DEBUG [mondrian.sql] 325: , exec+fetch 114 ms, 241 rows
2014-10-03 15:13:46,096 DEBUG [mondrian.sql] 326: HighCardSqlTupleReader.readTuples [[Date.Monthly Calendar].[Month]]: executing sql [select "dim_date"."year4" as "c0", "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" as "c4", "dim_date"."year_month" as "c5" from "common"."dim_date" as "dim_date" group by "dim_date"."year4", "dim_date"."year_quarter_int", "dim_date"."year_quarter", "dim_date"."quarter_number", "dim_date"."year_month_int", "dim_date"."year_month" order by "dim_date"."year4" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
2014-10-03 15:13:46,153 DEBUG [mondrian.sql] 326: , exec 57 ms
2014-10-03 15:13:46,162 DEBUG [mondrian.sql] 326: , exec+fetch 66 ms, 241 rows
2014-10-03 15:13:46,206 DEBUG [mondrian.sql] 327: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year4") from "common"."dim_date"]
2014-10-03 15:13:46,251 DEBUG [mondrian.sql] 327: , exec 45 ms
2014-10-03 15:13:46,251 DEBUG [mondrian.sql] 327: , exec+fetch 45 ms, 1 rows
2014-10-03 15:13:46,296 DEBUG [mondrian.sql] 328: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year_quarter_int") from "common"."dim_date"]
2014-10-03 15:13:46,345 DEBUG [mondrian.sql] 328: , exec 48 ms
2014-10-03 15:13:46,345 DEBUG [mondrian.sql] 328: , exec+fetch 49 ms, 1 rows
2014-10-03 15:13:46,393 DEBUG [mondrian.sql] 329: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year_month_int") from "common"."dim_date"]
2014-10-03 15:13:46,445 DEBUG [mondrian.sql] 329: , exec 52 ms
2014-10-03 15:13:46,445 DEBUG [mondrian.sql] 329: , exec+fetch 52 ms, 1 rows
2014-10-03 15:13:46,490 DEBUG [mondrian.sql] 330: Segment.load: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", "agg_monthly_fact_cdr"."startdate_month_number" as "c2", sum("agg_monthly_fact_cdr"."calls") as "m0" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where "agg_monthly_fact_cdr"."startdate_year" = 2005 and "agg_monthly_fact_cdr"."startdate_quarter_number" = 200501 and "agg_monthly_fact_cdr"."startdate_month_number" = 200501 group by "agg_monthly_fact_cdr"."startdate_year", "agg_monthly_fact_cdr"."startdate_quarter_number", "agg_monthly_fact_cdr"."startdate_month_number”]

many more … last one loops through all year-quarter-month combinations from the date table

The strategy seems to be:
1) Get year - quarter - month combinations from dim date table [Not too sure why this is executed twice]
2) Get distinct year, quarter, month values from dim date table
3) Get measures from agg table for every year-quarter-month combination from the dim date table (so no matter if there are record for this period in the agg table or not)

Results are showing everything (so EMPTY and NON EMPTY).


CASE 2
-----------


Then I manually changed the Saiku MDX query to the following:

SELECT
NON EMPTY {[Measures].[CDR Count]} ON COLUMNS,
NON EMPTY  {[Start Date.Monthly Calendar].[Month].Members} ON ROWS
FROM [CDR]

The SQL Mondrian creates is more to what I would expect:

2014-10-03 14:58:27,352 DEBUG [mondrian.mdx] 3002: select NON EMPTY {[Measures].[CDR Count]} ON COLUMNS,
  NON EMPTY {[Start Date.Monthly Calendar].[Month].Members} ON ROWS
from [CDR]

2014-10-03 14:58:27,403 DEBUG [mondrian.sql] 286: HighCardSqlTupleReader.readTuples [[Start Date.Monthly Calendar].[Month]]: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" as "c4", "dim_date"."year_month" as "c5" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr", "common"."dim_date" as "dim_date" where "dim_date"."year_quarter_int" = "agg_monthly_fact_cdr"."startdate_quarter_number" and "dim_date"."year_month_int" = "agg_monthly_fact_cdr"."startdate_month_number" group by "agg_monthly_fact_cdr"."startdate_year", "dim_date"."year_quarter_int", "dim_date"."year_quarter", "dim_date"."quarter_number", "dim_date"."year_month_int", "dim_date"."year_month" order by "agg_monthly_fact_cdr"."startdate_year" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
2014-10-03 14:58:43,759 DEBUG [mondrian.sql] 286: , exec 16355 ms
2014-10-03 14:58:43,759 DEBUG [mondrian.sql] 286: , exec+fetch 16356 ms, 10 rows
2014-10-03 14:58:43,802 DEBUG [mondrian.sql] 287: HighCardSqlTupleReader.readTuples [[Start Date.Monthly Calendar].[Month]]: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int" as "c4", "dim_date"."year_month" as "c5" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr", "common"."dim_date" as "dim_date" where "dim_date"."year_quarter_int" = "agg_monthly_fact_cdr"."startdate_quarter_number" and "dim_date"."year_month_int" = "agg_monthly_fact_cdr"."startdate_month_number" group by "agg_monthly_fact_cdr"."startdate_year", "dim_date"."year_quarter_int", "dim_date"."year_quarter", "dim_date"."quarter_number", "dim_date"."year_month_int", "dim_date"."year_month" order by "agg_monthly_fact_cdr"."startdate_year" ASC NULLS LAST, "dim_date"."year_quarter_int" ASC NULLS LAST, "dim_date"."year_month_int" ASC NULLS LAST]
2014-10-03 14:58:59,080 DEBUG [mondrian.sql] 287: , exec 15276 ms
2014-10-03 14:58:59,080 DEBUG [mondrian.sql] 287: , exec+fetch 15277 ms, 10 rows
2014-10-03 14:58:59,125 DEBUG [mondrian.sql] 288: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year4") from "common"."dim_date"]
2014-10-03 14:58:59,174 DEBUG [mondrian.sql] 288: , exec 48 ms
2014-10-03 14:58:59,174 DEBUG [mondrian.sql] 288: , exec+fetch 49 ms, 1 rows
2014-10-03 14:58:59,219 DEBUG [mondrian.sql] 289: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year_quarter_int") from "common"."dim_date"]
2014-10-03 14:58:59,269 DEBUG [mondrian.sql] 289: , exec 50 ms
2014-10-03 14:58:59,269 DEBUG [mondrian.sql] 289: , exec+fetch 50 ms, 1 rows
2014-10-03 14:58:59,315 DEBUG [mondrian.sql] 290: SqlStatisticsProvider.getColumnCardinality: executing sql [select count(distinct "year_month_int") from "common"."dim_date"]
2014-10-03 14:58:59,362 DEBUG [mondrian.sql] 290: , exec 47 ms
2014-10-03 14:58:59,363 DEBUG [mondrian.sql] 290: , exec+fetch 48 ms, 1 rows
2014-10-03 14:58:59,408 DEBUG [mondrian.sql] 291: Segment.load: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", "agg_monthly_fact_cdr"."startdate_month_number" as "c2", sum("agg_monthly_fact_cdr"."calls") as "m0" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where "agg_monthly_fact_cdr"."startdate_year" = 2013 and "agg_monthly_fact_cdr"."startdate_quarter_number" = 201304 and "agg_monthly_fact_cdr"."startdate_month_number" = 201311 group by "agg_monthly_fact_cdr"."startdate_year", "agg_monthly_fact_cdr"."startdate_quarter_number", "agg_monthly_fact_cdr"."startdate_month_number"]
2014-10-03 14:58:59,458 DEBUG [mondrian.sql] 291: , exec 49 ms
2014-10-03 14:58:59,459 DEBUG [mondrian.sql] 291: , exec+fetch 50 ms, 1 rows
2014-10-03 14:58:59,459 DEBUG [mondrian.mdx] 3002: exec: 32107 ms
2014-10-03 14:58:59,505 DEBUG [mondrian.sql] 292: Segment.load: executing sql [select "agg_monthly_fact_cdr"."startdate_year" as "c0", "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", "agg_monthly_fact_cdr"."startdate_month_number" as "c2", sum("agg_monthly_fact_cdr"."calls") as "m0" from "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where "agg_monthly_fact_cdr"."startdate_year" = 2014 and "agg_monthly_fact_cdr"."startdate_quarter_number" = 201401 and "agg_monthly_fact_cdr"."startdate_month_number" = 201402 group by "agg_monthly_fact_cdr"."startdate_year", "agg_monthly_fact_cdr"."startdate_quarter_number", "agg_monthly_fact_cdr"."startdate_month_number"]

So here the strategy seems to be:
1) Get the year-quarter-month combinations from AGG table (which do have a measure) [not quite sure though why this is done twice]
2) Get the unique year, quarter and month values from the dim date table
3) Loop over the agg table for each year-quarter-month combination and get the measure

And the result shows all NON EMPTY members - which is what I want.

So, why is this happening? Both queries should in theory return the same results, no?

Cheers,
Diddy


More information about the Mondrian mailing list