[Mondrian] looking for help regarding SQL generated by Mondrian

Julian Hyde julianhyde at speakeasy.net
Mon Jun 25 19:45:57 EDT 2007


 


  _____  

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Reidmiller, Chuck
Sent: Monday, June 25, 2007 8:05 AM
To: mondrian at pentaho.org
Subject: [Mondrian] looking for help regarding SQL generated by Mondrian



Hi all,

 

I am looking for help to try to improve the performance of some of our
test queries by tweaking the SQL that Mondrian generates.  We have an
inventory test database with 10 million rows in the fact table that we
run a series of queries against.  Some of the queries, such as this one:

 

select {[Cost Type].[current], [Cost Type].[planned]} *
{Tail([Time].[Year].Members,3)} ON COLUMNS,

         NON EMPTY Crossjoin({[Planner].CurrentMember.Children},
{[Part].[All Parts].[M].Children}) ON ROWS

         from Inventory

         where ([Measures].[Total Cost]);

 

have experienced a dramatic performance degradation after 2.2.2.  The
above query took approximately 2 minutes to execute against a PostgreSQL
database in 2.2.2, and now takes over 12 minutes with the current
development code base that I built on 6/22.  We cancelled it after about
30 minutes when running it with 2.3.2.  The SQL that is generated is
very different between the 3 versions of Mondrian, and voluminous with
2.3.2, but at the moment I am particularly interested in this SQL from
the current development code:

 

1) SqlTupleReader.readTuples [[Time].[Year]]: executing sql [select
"dates"."inv_year" as "c0" from "tow"."dates" as "dates" group by
"dates"."inv_year" order by "dates"."inv_year" ASC], exec 281 ms,
exec+fetch 297 ms, 1 rows

2) SqlTupleReader.readTuples [[Cost Type].[CT]]: executing sql [select
"costtype"."cost_type" as "c0" from "tow"."costtype" as "costtype" group
by "costtype"."cost_type" order by "costtype"."cost_type" ASC], exec 16
ms, exec+fetch 16 ms, 1 rows

3) SqlTupleReader.readTuples [[Time].[Year]]: executing sql [select
"time_by_day"."the_year" as "c0" from "time_by_day" as "time_by_day"
group by "time_by_day"."the_year" order by "time_by_day"."the_year"
ASC], exec 32 ms, exec+fetch 32 ms, 1 rows

4) SqlTupleReader.readTuples [[Time].[Year]]: executing sql [select
"time_by_day"."the_year" as "c0" from "time_by_day" as "time_by_day"
group by "time_by_day"."the_year" order by "time_by_day"."the_year"
ASC], exec 0 ms, exec+fetch 0 ms, 1 rows

5) SqlMemberSource.getMemberChildren: executing sql [select
"store"."store_country" as "c0" from "store" as "store" where
UPPER("store"."store_country") = UPPER('USA') group by
"store"."store_country" order by "store"."store_country" ASC], exec 47
ms, exec+fetch 47 ms, 1 rows

6) SqlMemberSource.getMemberChildren: executing sql [select
"store"."store_state" as "c0" from "store" as "store" where
"store"."store_country" = 'USA' and UPPER("store"."store_state") =
UPPER('CA') group by "store"."store_state" order by
"store"."store_state" ASC], exec 16 ms, exec+fetch 16 ms, 1 rows

7) SqlMemberSource.getMemberChildren: executing sql [select
"store"."store_city" as "c0" from "store" as "store" where
"store"."store_state" = 'CA' and UPPER("store"."store_city") =
UPPER('Los Angeles') group by "store"."store_city" order by
"store"."store_city" ASC], exec 0 ms, exec+fetch 0 ms, 1 rows

8) SqlMemberSource.getMemberChildren: executing sql [select
"customer"."country" as "c0" from "customer" as "customer" where
UPPER("customer"."country") = UPPER('USA') group by "customer"."country"
order by "customer"."country" ASC], exec 110 ms, exec+fetch 110 ms, 1
rows

9) SqlMemberSource.getMemberChildren: executing sql [select
"customer"."state_province" as "c0" from "customer" as "customer" where
"customer"."country" = 'USA' and UPPER("customer"."state_province") =
UPPER('CA') group by "customer"."state_province" order by
"customer"."state_province" ASC], exec 0 ms, exec+fetch 0 ms, 1 rows

10) SqlMemberSource.getMemberChildren: executing sql [select
"customer"."city" as "c0" from "customer" as "customer" where
"customer"."state_province" = 'CA' and UPPER("customer"."city") =
UPPER('Los Angeles') group by "customer"."city" order by
"customer"."city" ASC], exec 16 ms, exec+fetch 16 ms, 1 rows

11) SqlTupleReader.readTuples [[Cost Type].[CT]]: executing sql [select
"costtype"."cost_type" as "c0" from "tow"."costtype" as "costtype" group
by "costtype"."cost_type" order by "costtype"."cost_type" ASC], exec 0
ms, exec+fetch 0 ms, 1 rows

12) SqlMemberSource.getMemberChildren: executing sql [select
"part"."part_type" as "c0" from "tow"."part" as "part" where
UPPER("part"."part_type") = UPPER('M') group by "part"."part_type" order
by "part"."part_type" ASC], exec 47 ms, exec+fetch 47 ms, 1 rows

13) SqlTupleReader.readTuples [[Time].[Year]]: executing sql [select
"dates"."inv_year" as "c0" from "tow"."dates" as "dates" group by
"dates"."inv_year" order by "dates"."inv_year" ASC], exec 15 ms,
exec+fetch 15 ms, 1 rows

14) SqlMemberSource.getMemberChildren: executing sql [select
"planner"."planner_id" as "c0" from "tow"."planner" as "planner",
"tow"."invfacts" as "invfacts" where "invfacts"."planner_k" =
"planner"."planner_k" group by "planner"."planner_id" order by
"planner"."planner_id" ASC], exec 61798 ms, exec+fetch 61798 ms, 13 rows

15) SqlMemberSource.getMemberChildren: executing sql [select
"part"."part_nbr" as "c0" from "tow"."part" as "part", "tow"."invfacts"
as "invfacts" where "part"."part_type" = 'M' and "invfacts"."part_k" =
"part"."part_k" group by "part"."part_nbr" order by "part"."part_nbr"
ASC], exec 61766 ms, exec+fetch 61766 ms, 14 rows

16) RolapStar.Column.getCardinality: executing sql [select
count(distinct "dates"."inv_year") as "c0" from "tow"."dates" as
"dates"], exec 156 ms, exec+fetch 156 ms, 1 rows

17) RolapStar.Column.getCardinality: executing sql [select
count(distinct "part"."part_nbr") as "c0" from "tow"."part" as "part"],
exec 16 ms, exec+fetch 16 ms, 1 rows

18) RolapStar.Column.getCardinality: executing sql [select
count(distinct "costtype"."cost_type") as "c0" from "tow"."costtype" as
"costtype"], exec 16 ms, exec+fetch 16 ms, 1 rows

19) Segment.load: executing sql [select "dates"."inv_year" as "c0",
"part"."part_nbr" as "c1", "costtype"."cost_type" as "c2",
sum("invfacts"."a_total_cost") as "m0" from "tow"."dates" as "dates",
"tow"."invfacts" as "invfacts", "tow"."part" as "part", "tow"."costtype"
as "costtype" where "invfacts"."date_k" = "dates"."date_k" and
"invfacts"."part_k" = "part"."part_k" and "part"."part_nbr" in ('AXL-1',
'BED-1', 'CM15', 'DG-4', 'GEARBOX-3', 'HD-HOIST', 'HGB-3', 'HOOK',
'L-3', 'RG-5', 'SERVICE-KIT', 'SPR-4', 'WGN-2', 'WHL-1') and
"invfacts"."costtype_k" = "costtype"."costtype_k" group by
"dates"."inv_year", "part"."part_nbr", "costtype"."cost_type"], exec
61970 ms, exec+fetch 62251 ms, 4368 rows

20) RolapStar.Column.getCardinality: executing sql [select
count(distinct "planner"."planner_id") as "c0" from "tow"."planner" as
"planner"], exec 78 ms, exec+fetch 78 ms, 1 rows

21) Segment.load: executing sql [select "dates"."inv_year" as "c0",
"planner"."planner_id" as "c1", "costtype"."cost_type" as "c2",
sum("invfacts"."a_total_cost") as "m0" from "tow"."dates" as "dates",
"tow"."invfacts" as "invfacts", "tow"."planner" as "planner",
"tow"."costtype" as "costtype" where "invfacts"."date_k" =
"dates"."date_k" and "invfacts"."planner_k" = "planner"."planner_k" and
"invfacts"."costtype_k" = "costtype"."costtype_k" group by
"dates"."inv_year", "planner"."planner_id", "costtype"."cost_type"],
exec 417396 ms, exec+fetch 417490 ms, 4056 rows

22) SqlMemberSource.getMemberChildren: executing sql [select
"planner"."planner_id" as "c0" from "tow"."planner" as "planner",
"tow"."invfacts" as "invfacts" where "invfacts"."planner_k" =
"planner"."planner_k" group by "planner"."planner_id" order by
"planner"."planner_id" ASC], exec 61845 ms, exec+fetch 61845 ms, 13 rows

23) SqlMemberSource.getMemberChildren: executing sql [select
"part"."part_nbr" as "c0" from "tow"."part" as "part", "tow"."invfacts"
as "invfacts" where "part"."part_type" = 'M' and "invfacts"."part_k" =
"part"."part_k" group by "part"."part_nbr" order by "part"."part_nbr"
ASC], exec 61657 ms, exec+fetch 61657 ms, 14 rows

24) SqlMemberSource.getMemberChildren: executing sql [select
"dates"."inv_month" as "c0" from "tow"."dates" as "dates" where
"dates"."inv_year" = 2005 group by "dates"."inv_month" order by
"dates"."inv_month" ASC], exec 172 ms, exec+fetch 172 ms, 12 rows

25) SqlTupleReader.readTuples [[Time].[Week]]: executing sql [select
"dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2" from "tow"."dates" as "dates" where
"dates"."inv_month" in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) and
"dates"."inv_year" = 2005 group by "dates"."inv_year",
"dates"."inv_month", "dates"."inv_week" order by "dates"."inv_year" ASC,
"dates"."inv_month" ASC, "dates"."inv_week" ASC], exec 32 ms, exec+fetch
32 ms, 1 rows

26) SqlTupleReader.readTuples [[Time].[Day]]: executing sql [select
"dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2", "dates"."inv_date" as "c3" from
"tow"."dates" as "dates" where "dates"."inv_week" in (1, 2, 3, 4, 5, 5,
6, 7, 8, 9, 9, 10, 11, 12, 13, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
22, 23, 24, 25, 26, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 35, 36, 37,
38, 39, 40, 41, 42, 43, 44, 44, 45, 46, 47, 48, 48, 49, 50, 51, 52) and
"dates"."inv_month" in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) and
"dates"."inv_year" = 2005 group by "dates"."inv_year",
"dates"."inv_month", "dates"."inv_week", "dates"."inv_date" order by
"dates"."inv_year" ASC, "dates"."inv_month" ASC, "dates"."inv_week" ASC,
"dates"."inv_date" ASC], exec 109 ms, exec+fetch 156 ms, 1 rows

27) SqlMemberSource.getMemberChildren: executing sql [select
"dates"."inv_month" as "c0" from "tow"."dates" as "dates" where
"dates"."inv_year" = 2006 group by "dates"."inv_month" order by
"dates"."inv_month" ASC], exec 0 ms, exec+fetch 0 ms, 5 rows

28) SqlTupleReader.readTuples [[Time].[Week]]: executing sql [select
"dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2" from "tow"."dates" as "dates" where
"dates"."inv_month" in (1, 2, 3, 4, 5) and "dates"."inv_year" = 2006
group by "dates"."inv_year", "dates"."inv_month", "dates"."inv_week"
order by "dates"."inv_year" ASC, "dates"."inv_month" ASC,
"dates"."inv_week" ASC], exec 0 ms, exec+fetch 0 ms, 1 rows

29) SqlTupleReader.readTuples [[Time].[Day]]: executing sql [select
"dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2", "dates"."inv_date" as "c3" from
"tow"."dates" as "dates" where "dates"."inv_week" in (1, 2, 3, 4, 5, 5,
6, 7, 8, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20) and
"dates"."inv_month" in (1, 2, 3, 4, 5) and "dates"."inv_year" = 2006
group by "dates"."inv_year", "dates"."inv_month", "dates"."inv_week",
"dates"."inv_date" order by "dates"."inv_year" ASC, "dates"."inv_month"
ASC, "dates"."inv_week" ASC, "dates"."inv_date" ASC], exec 0 ms,
exec+fetch 16 ms, 1 rows

30) SqlMemberSource.getMemberChildren: executing sql [select
"dates"."inv_month" as "c0" from "tow"."dates" as "dates" where
"dates"."inv_year" = 2004 group by "dates"."inv_month" order by
"dates"."inv_month" ASC], exec 16 ms, exec+fetch 16 ms, 12 rows

31) SqlTupleReader.readTuples [[Time].[Week]]: executing sql [select
"dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2" from "tow"."dates" as "dates" where
"dates"."inv_month" in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) and
"dates"."inv_year" = 2004 group by "dates"."inv_year",
"dates"."inv_month", "dates"."inv_week" order by "dates"."inv_year" ASC,
"dates"."inv_month" ASC, "dates"."inv_week" ASC], exec 0 ms, exec+fetch
0 ms, 1 rows

32) SqlTupleReader.readTuples [[Time].[Day]]: executing sql [select
"dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2", "dates"."inv_date" as "c3" from
"tow"."dates" as "dates" where "dates"."inv_week" in (1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 23,
24, 25, 26, 27, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 37, 38, 39,
40, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 49, 50, 51, 52, 53) and
"dates"."inv_month" in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) and
"dates"."inv_year" = 2004 group by "dates"."inv_year",
"dates"."inv_month", "dates"."inv_week", "dates"."inv_date" order by
"dates"."inv_year" ASC, "dates"."inv_month" ASC, "dates"."inv_week" ASC,
"dates"."inv_date" ASC], exec 0 ms, exec+fetch 47 ms, 1 rows

33) RolapStar.Column.getCardinality: executing sql [select
count(distinct "dates"."inv_date") as "c0" from "tow"."dates" as
"dates"], exec 47 ms, exec+fetch 47 ms, 1 rows

34) Segment.load: executing sql [select "dates"."inv_date" as "c0",
"part"."part_nbr" as "c1", "planner"."planner_id" as "c2",
"costtype"."cost_type" as "c3", sum("invfacts"."a_total_cost") as "m0"
from "tow"."dates" as "dates", "tow"."invfacts" as "invfacts",
"tow"."part" as "part", "tow"."planner" as "planner", "tow"."costtype"
as "costtype" where "invfacts"."date_k" = "dates"."date_k" and
"dates"."inv_date" in ('2004-12-31', '2005-12-30', '2006-05-16') and
"invfacts"."part_k" = "part"."part_k" and "part"."part_nbr" in ('AXL-1',
'BED-1', 'CM15', 'DG-4', 'GEARBOX-3', 'HD-HOIST', 'HGB-3', 'HOOK',
'L-3', 'RG-5', 'SERVICE-KIT', 'SPR-4', 'WGN-2', 'WHL-1') and
"invfacts"."planner_k" = "planner"."planner_k" and
("planner"."planner_id" in ('0002', '0003', '0004', '0008', '03', '12',
'2776', 'B1', 'B100', 'B200', 'P1', 'P100') or "planner"."planner_id" is
null) and "invfacts"."costtype_k" = "costtype"."costtype_k" group by
"dates"."inv_date", "part"."part_nbr", "planner"."planner_id",
"costtype"."cost_type"], exec 3281 ms, exec+fetch 3297 ms, 123 rows

 

I have numbered the output lines for ease of reference.  The select in
line 21 takes almost 7 minutes to complete.  Since the MDX query
specifies 2 cost types and 3 years of interest, would it be possible to
enhance Mondrian to generate this select with predicates for inv_year
and cost_type such as:

 

Segment.load: executing sql [select "dates"."inv_year" as "c0",
"planner"."planner_id" as "c1", "costtype"."cost_type" as "c2",
sum("invfacts"."a_total_cost") as "m0" from "tow"."dates" as "dates",
"tow"."invfacts" as "invfacts", "tow"."planner" as "planner",
"tow"."costtype" as "costtype" where "invfacts"."date_k" =
"dates"."date_k" and "dates"."inv_year" in ('2004', '2005', '2006') and
"invfacts"."planner_k" = "planner"."planner_k" and
"invfacts"."costtype_k" = "costtype"."costtype_k" and
"costtype"."cost_type" in ('current', 'planned') group by
"dates"."inv_year", "planner"."planner_id", "costtype"."cost_type"

 

Would that query execute much faster? Mondrian drops the kinds of
constraints you have added, on the assumption that neighboring data has
a reasonable chance of being accessed in a future query; and that if you
are accessing 80% of a table you are doing a full scan, and therefore
you might as well access 100% of the table.

 

Also. it seems like that query could really benefit from an aggregate
table. 4,056 rows should take ~1s to load. I don't know whether you had
one in mondrian-2.2.

 

If so, I'd appreciate some guidance on where in the code to focus my
effort.  Also, I am curious about the duplication of some SQL.  Line 4
is a repeat of line 3.  Line 22 is a repeat of line 14, and line 23 is a
repeat of line 15.  All 4 of these latter queries run about 1 minute, so
it would be a significant performance boost if this duplication were
avoided. 

 

I have no idea why query 4 is a duplicate of query 3. Worth some
investigation in the debugger.

 

I notice that the latter queries  14, 15, 22, 23 contain references to
the fact table, which is a clue that non-empty filtering is being
applied. The query is trying to find children of a given member which
are non-empty in a given context, as opposed to the children of a given
member, and this makes it more difficult to cache the results of the
query. Unfortunately, the context being generated into the query is
empty (i.e. the query is being executed with 'all' for all members) and
so the SQL doesn't contain any additional predicates. The inclusion of
the fact table may still be having a filtering effect, so it would be
incorrect to put the results of the query into the cache and say these
are the ONLY children of that member.

 

I recall that there is a scheme to cache results of non-empty (so-called
'native') queries. I don't know why that scheme isn't working here.

 

Julian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20070625/8005ca4b/attachment.html 


More information about the Mondrian mailing list