[Mondrian] looking for help regarding SQL generated by Mondrian

Reidmiller, Chuck creidmiller at cincom.com
Tue Jun 26 17:15:19 EDT 2007


Julian,

 

I have a bit more information that may be of interest.  I found the spot
where constraints are dropped toward the end of
Aggregation.optimizeConstraints and commented out this line of the
method:

 

            newPredicates[j] = new LiteralStarPredicate(columns[j],
true);

 

I ran the MDX test query again and the slow SQL statement from line 21
performs much better, per this extract from the debug file:

 

Segment.load: executing sql [select "dates"."inv_year" as "c0",
"planner"."planner_id" as "c1", "costtype"."cost_type" as "c2",
sum("invfacts"."a_total_cost") as "m0" from "tow"."dates" as "dates",
"tow"."invfacts" as "invfacts", "tow"."planner" as "planner",
"tow"."costtype" as "costtype" where "invfacts"."date_k" =
"dates"."date_k" and "dates"."inv_year" in (1903, 1904, 1905, 1906,
1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918,
1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930,
1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942,
1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954,
1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966,
1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978,
1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990,
1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
2003, 2004, 2005, 2006) and "invfacts"."planner_k" =
"planner"."planner_k" and ("planner"."planner_id" in ('0002', '0003',
'0004', '0008', '03', '12', '2776', 'B1', 'B100', 'B200', 'P1', 'P100')
or "planner"."planner_id" is null) and "invfacts"."costtype_k" =
"costtype"."costtype_k" and ("costtype"."cost_type" in ('current',
'planned', 'standard') or "costtype"."cost_type" is null) group by
"dates"."inv_year", "planner"."planner_id", "costtype"."cost_type"],
exec 62173 ms, exec+fetch 62267 ms, 4056 rows

 

The optimizeConstraints method discards the predicate because it was
initialized with all 104 possible years and is therefore considered
unnecessary. Now I am wondering if there is a way to initialize or
update the predicate to reflect the 3 target years from the COLUMNS axis
of the MDX query.  Any suggestions would be most welcome.

 

Thanks, Chuck

 

________________________________

From: Reidmiller, Chuck 
Sent: Tuesday, June 26, 2007 10:44 AM
To: 'Mondrian developer mailing list'
Subject: RE: [Mondrian] looking for help regarding SQL generated by
Mondrian

 

Julian,

 

Thanks for the response.  I appreciate your assistance.  The addition of
the 2 predicates has a big impact, dropping the execution time for the
SQL from line 21 to just over 1 minute.  The performance boost is
primarily due to the predicate for inv_year, since the execution time is
also just over 1 minute with only the <and "dates"."inv_year" in
('2004', '2005', '2006'")> predicate.  If I run the modified SQL with
only the <and "costtype"."cost_type" in ('current', 'planned)>
predicate, the execution time is still much improved at about 2 minutes,
20 seconds.  Do you know where Mondrian is dropping these kind of
constraints? 

 

There are only 4 rows in the costtype table, but the dates table is much
larger at 26,882 rows.  The range of years in the dates table runs from
1903-2006, which admittedly is not very "real world".  We are not using
aggregate tables in this test database, and the tests were run against
the same database on the same Linux machine for each release of
Mondrian.

 

The below test MDX query runs in about 44 seconds if I remove NON EMPTY
for the rows. 

 

Regards, Chuck  

 

________________________________

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Julian Hyde
Sent: Monday, June 25, 2007 7:46 PM
To: 'Mondrian developer mailing list'
Subject: RE: [Mondrian] looking for help regarding SQL generated by
Mondrian

 

 

	 

	
________________________________


	From: mondrian-bounces at pentaho.org
[mailto:mondrian-bounces at pentaho.org] On Behalf Of Reidmiller, Chuck
	Sent: Monday, June 25, 2007 8:05 AM
	To: mondrian at pentaho.org
	Subject: [Mondrian] looking for help regarding SQL generated by
Mondrian

	Hi all,

	 

	I am looking for help to try to improve the performance of some
of our test queries by tweaking the SQL that Mondrian generates.  We
have an inventory test database with 10 million rows in the fact table
that we run a series of queries against.  Some of the queries, such as
this one:

	 

	select {[Cost Type].[current], [Cost Type].[planned]} *
{Tail([Time].[Year].Members,3)} ON COLUMNS,

	         NON EMPTY Crossjoin({[Planner].CurrentMember.Children},
{[Part].[All Parts].[M].Children}) ON ROWS

	         from Inventory

	         where ([Measures].[Total Cost]);

	 

	have experienced a dramatic performance degradation after 2.2.2.
The above query took approximately 2 minutes to execute against a
PostgreSQL database in 2.2.2, and now takes over 12 minutes with the
current development code base that I built on 6/22.  We cancelled it
after about 30 minutes when running it with 2.3.2.  The SQL that is
generated is very different between the 3 versions of Mondrian, and
voluminous with 2.3.2, but at the moment I am particularly interested in
this SQL from the current development code:

	 

	1) SqlTupleReader.readTuples [[Time].[Year]]: executing sql
[select "dates"."inv_year" as "c0" from "tow"."dates" as "dates" group
by "dates"."inv_year" order by "dates"."inv_year" ASC], exec 281 ms,
exec+fetch 297 ms, 1 rows

	2) SqlTupleReader.readTuples [[Cost Type].[CT]]: executing sql
[select "costtype"."cost_type" as "c0" from "tow"."costtype" as
"costtype" group by "costtype"."cost_type" order by
"costtype"."cost_type" ASC], exec 16 ms, exec+fetch 16 ms, 1 rows

	3) SqlTupleReader.readTuples [[Time].[Year]]: executing sql
[select "time_by_day"."the_year" as "c0" from "time_by_day" as
"time_by_day" group by "time_by_day"."the_year" order by
"time_by_day"."the_year" ASC], exec 32 ms, exec+fetch 32 ms, 1 rows

	4) SqlTupleReader.readTuples [[Time].[Year]]: executing sql
[select "time_by_day"."the_year" as "c0" from "time_by_day" as
"time_by_day" group by "time_by_day"."the_year" order by
"time_by_day"."the_year" ASC], exec 0 ms, exec+fetch 0 ms, 1 rows

	5) SqlMemberSource.getMemberChildren: executing sql [select
"store"."store_country" as "c0" from "store" as "store" where
UPPER("store"."store_country") = UPPER('USA') group by
"store"."store_country" order by "store"."store_country" ASC], exec 47
ms, exec+fetch 47 ms, 1 rows

	6) SqlMemberSource.getMemberChildren: executing sql [select
"store"."store_state" as "c0" from "store" as "store" where
"store"."store_country" = 'USA' and UPPER("store"."store_state") =
UPPER('CA') group by "store"."store_state" order by
"store"."store_state" ASC], exec 16 ms, exec+fetch 16 ms, 1 rows

	7) SqlMemberSource.getMemberChildren: executing sql [select
"store"."store_city" as "c0" from "store" as "store" where
"store"."store_state" = 'CA' and UPPER("store"."store_city") =
UPPER('Los Angeles') group by "store"."store_city" order by
"store"."store_city" ASC], exec 0 ms, exec+fetch 0 ms, 1 rows

	8) SqlMemberSource.getMemberChildren: executing sql [select
"customer"."country" as "c0" from "customer" as "customer" where
UPPER("customer"."country") = UPPER('USA') group by "customer"."country"
order by "customer"."country" ASC], exec 110 ms, exec+fetch 110 ms, 1
rows

	9) SqlMemberSource.getMemberChildren: executing sql [select
"customer"."state_province" as "c0" from "customer" as "customer" where
"customer"."country" = 'USA' and UPPER("customer"."state_province") =
UPPER('CA') group by "customer"."state_province" order by
"customer"."state_province" ASC], exec 0 ms, exec+fetch 0 ms, 1 rows

	10) SqlMemberSource.getMemberChildren: executing sql [select
"customer"."city" as "c0" from "customer" as "customer" where
"customer"."state_province" = 'CA' and UPPER("customer"."city") =
UPPER('Los Angeles') group by "customer"."city" order by
"customer"."city" ASC], exec 16 ms, exec+fetch 16 ms, 1 rows

	11) SqlTupleReader.readTuples [[Cost Type].[CT]]: executing sql
[select "costtype"."cost_type" as "c0" from "tow"."costtype" as
"costtype" group by "costtype"."cost_type" order by
"costtype"."cost_type" ASC], exec 0 ms, exec+fetch 0 ms, 1 rows

	12) SqlMemberSource.getMemberChildren: executing sql [select
"part"."part_type" as "c0" from "tow"."part" as "part" where
UPPER("part"."part_type") = UPPER('M') group by "part"."part_type" order
by "part"."part_type" ASC], exec 47 ms, exec+fetch 47 ms, 1 rows

	13) SqlTupleReader.readTuples [[Time].[Year]]: executing sql
[select "dates"."inv_year" as "c0" from "tow"."dates" as "dates" group
by "dates"."inv_year" order by "dates"."inv_year" ASC], exec 15 ms,
exec+fetch 15 ms, 1 rows

	14) SqlMemberSource.getMemberChildren: executing sql [select
"planner"."planner_id" as "c0" from "tow"."planner" as "planner",
"tow"."invfacts" as "invfacts" where "invfacts"."planner_k" =
"planner"."planner_k" group by "planner"."planner_id" order by
"planner"."planner_id" ASC], exec 61798 ms, exec+fetch 61798 ms, 13 rows

	15) SqlMemberSource.getMemberChildren: executing sql [select
"part"."part_nbr" as "c0" from "tow"."part" as "part", "tow"."invfacts"
as "invfacts" where "part"."part_type" = 'M' and "invfacts"."part_k" =
"part"."part_k" group by "part"."part_nbr" order by "part"."part_nbr"
ASC], exec 61766 ms, exec+fetch 61766 ms, 14 rows

	16) RolapStar.Column.getCardinality: executing sql [select
count(distinct "dates"."inv_year") as "c0" from "tow"."dates" as
"dates"], exec 156 ms, exec+fetch 156 ms, 1 rows

	17) RolapStar.Column.getCardinality: executing sql [select
count(distinct "part"."part_nbr") as "c0" from "tow"."part" as "part"],
exec 16 ms, exec+fetch 16 ms, 1 rows

	18) RolapStar.Column.getCardinality: executing sql [select
count(distinct "costtype"."cost_type") as "c0" from "tow"."costtype" as
"costtype"], exec 16 ms, exec+fetch 16 ms, 1 rows

	19) Segment.load: executing sql [select "dates"."inv_year" as
"c0", "part"."part_nbr" as "c1", "costtype"."cost_type" as "c2",
sum("invfacts"."a_total_cost") as "m0" from "tow"."dates" as "dates",
"tow"."invfacts" as "invfacts", "tow"."part" as "part", "tow"."costtype"
as "costtype" where "invfacts"."date_k" = "dates"."date_k" and
"invfacts"."part_k" = "part"."part_k" and "part"."part_nbr" in ('AXL-1',
'BED-1', 'CM15', 'DG-4', 'GEARBOX-3', 'HD-HOIST', 'HGB-3', 'HOOK',
'L-3', 'RG-5', 'SERVICE-KIT', 'SPR-4', 'WGN-2', 'WHL-1') and
"invfacts"."costtype_k" = "costtype"."costtype_k" group by
"dates"."inv_year", "part"."part_nbr", "costtype"."cost_type"], exec
61970 ms, exec+fetch 62251 ms, 4368 rows

	20) RolapStar.Column.getCardinality: executing sql [select
count(distinct "planner"."planner_id") as "c0" from "tow"."planner" as
"planner"], exec 78 ms, exec+fetch 78 ms, 1 rows

	21) Segment.load: executing sql [select "dates"."inv_year" as
"c0", "planner"."planner_id" as "c1", "costtype"."cost_type" as "c2",
sum("invfacts"."a_total_cost") as "m0" from "tow"."dates" as "dates",
"tow"."invfacts" as "invfacts", "tow"."planner" as "planner",
"tow"."costtype" as "costtype" where "invfacts"."date_k" =
"dates"."date_k" and "invfacts"."planner_k" = "planner"."planner_k" and
"invfacts"."costtype_k" = "costtype"."costtype_k" group by
"dates"."inv_year", "planner"."planner_id", "costtype"."cost_type"],
exec 417396 ms, exec+fetch 417490 ms, 4056 rows

	22) SqlMemberSource.getMemberChildren: executing sql [select
"planner"."planner_id" as "c0" from "tow"."planner" as "planner",
"tow"."invfacts" as "invfacts" where "invfacts"."planner_k" =
"planner"."planner_k" group by "planner"."planner_id" order by
"planner"."planner_id" ASC], exec 61845 ms, exec+fetch 61845 ms, 13 rows

	23) SqlMemberSource.getMemberChildren: executing sql [select
"part"."part_nbr" as "c0" from "tow"."part" as "part", "tow"."invfacts"
as "invfacts" where "part"."part_type" = 'M' and "invfacts"."part_k" =
"part"."part_k" group by "part"."part_nbr" order by "part"."part_nbr"
ASC], exec 61657 ms, exec+fetch 61657 ms, 14 rows

	24) SqlMemberSource.getMemberChildren: executing sql [select
"dates"."inv_month" as "c0" from "tow"."dates" as "dates" where
"dates"."inv_year" = 2005 group by "dates"."inv_month" order by
"dates"."inv_month" ASC], exec 172 ms, exec+fetch 172 ms, 12 rows

	25) SqlTupleReader.readTuples [[Time].[Week]]: executing sql
[select "dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2" from "tow"."dates" as "dates" where
"dates"."inv_month" in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) and
"dates"."inv_year" = 2005 group by "dates"."inv_year",
"dates"."inv_month", "dates"."inv_week" order by "dates"."inv_year" ASC,
"dates"."inv_month" ASC, "dates"."inv_week" ASC], exec 32 ms, exec+fetch
32 ms, 1 rows

	26) SqlTupleReader.readTuples [[Time].[Day]]: executing sql
[select "dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2", "dates"."inv_date" as "c3" from
"tow"."dates" as "dates" where "dates"."inv_week" in (1, 2, 3, 4, 5, 5,
6, 7, 8, 9, 9, 10, 11, 12, 13, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
22, 23, 24, 25, 26, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 35, 36, 37,
38, 39, 40, 41, 42, 43, 44, 44, 45, 46, 47, 48, 48, 49, 50, 51, 52) and
"dates"."inv_month" in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) and
"dates"."inv_year" = 2005 group by "dates"."inv_year",
"dates"."inv_month", "dates"."inv_week", "dates"."inv_date" order by
"dates"."inv_year" ASC, "dates"."inv_month" ASC, "dates"."inv_week" ASC,
"dates"."inv_date" ASC], exec 109 ms, exec+fetch 156 ms, 1 rows

	27) SqlMemberSource.getMemberChildren: executing sql [select
"dates"."inv_month" as "c0" from "tow"."dates" as "dates" where
"dates"."inv_year" = 2006 group by "dates"."inv_month" order by
"dates"."inv_month" ASC], exec 0 ms, exec+fetch 0 ms, 5 rows

	28) SqlTupleReader.readTuples [[Time].[Week]]: executing sql
[select "dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2" from "tow"."dates" as "dates" where
"dates"."inv_month" in (1, 2, 3, 4, 5) and "dates"."inv_year" = 2006
group by "dates"."inv_year", "dates"."inv_month", "dates"."inv_week"
order by "dates"."inv_year" ASC, "dates"."inv_month" ASC,
"dates"."inv_week" ASC], exec 0 ms, exec+fetch 0 ms, 1 rows

	29) SqlTupleReader.readTuples [[Time].[Day]]: executing sql
[select "dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2", "dates"."inv_date" as "c3" from
"tow"."dates" as "dates" where "dates"."inv_week" in (1, 2, 3, 4, 5, 5,
6, 7, 8, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20) and
"dates"."inv_month" in (1, 2, 3, 4, 5) and "dates"."inv_year" = 2006
group by "dates"."inv_year", "dates"."inv_month", "dates"."inv_week",
"dates"."inv_date" order by "dates"."inv_year" ASC, "dates"."inv_month"
ASC, "dates"."inv_week" ASC, "dates"."inv_date" ASC], exec 0 ms,
exec+fetch 16 ms, 1 rows

	30) SqlMemberSource.getMemberChildren: executing sql [select
"dates"."inv_month" as "c0" from "tow"."dates" as "dates" where
"dates"."inv_year" = 2004 group by "dates"."inv_month" order by
"dates"."inv_month" ASC], exec 16 ms, exec+fetch 16 ms, 12 rows

	31) SqlTupleReader.readTuples [[Time].[Week]]: executing sql
[select "dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2" from "tow"."dates" as "dates" where
"dates"."inv_month" in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) and
"dates"."inv_year" = 2004 group by "dates"."inv_year",
"dates"."inv_month", "dates"."inv_week" order by "dates"."inv_year" ASC,
"dates"."inv_month" ASC, "dates"."inv_week" ASC], exec 0 ms, exec+fetch
0 ms, 1 rows

	32) SqlTupleReader.readTuples [[Time].[Day]]: executing sql
[select "dates"."inv_year" as "c0", "dates"."inv_month" as "c1",
"dates"."inv_week" as "c2", "dates"."inv_date" as "c3" from
"tow"."dates" as "dates" where "dates"."inv_week" in (1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 23,
24, 25, 26, 27, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 37, 38, 39,
40, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 49, 50, 51, 52, 53) and
"dates"."inv_month" in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) and
"dates"."inv_year" = 2004 group by "dates"."inv_year",
"dates"."inv_month", "dates"."inv_week", "dates"."inv_date" order by
"dates"."inv_year" ASC, "dates"."inv_month" ASC, "dates"."inv_week" ASC,
"dates"."inv_date" ASC], exec 0 ms, exec+fetch 47 ms, 1 rows

	33) RolapStar.Column.getCardinality: executing sql [select
count(distinct "dates"."inv_date") as "c0" from "tow"."dates" as
"dates"], exec 47 ms, exec+fetch 47 ms, 1 rows

	34) Segment.load: executing sql [select "dates"."inv_date" as
"c0", "part"."part_nbr" as "c1", "planner"."planner_id" as "c2",
"costtype"."cost_type" as "c3", sum("invfacts"."a_total_cost") as "m0"
from "tow"."dates" as "dates", "tow"."invfacts" as "invfacts",
"tow"."part" as "part", "tow"."planner" as "planner", "tow"."costtype"
as "costtype" where "invfacts"."date_k" = "dates"."date_k" and
"dates"."inv_date" in ('2004-12-31', '2005-12-30', '2006-05-16') and
"invfacts"."part_k" = "part"."part_k" and "part"."part_nbr" in ('AXL-1',
'BED-1', 'CM15', 'DG-4', 'GEARBOX-3', 'HD-HOIST', 'HGB-3', 'HOOK',
'L-3', 'RG-5', 'SERVICE-KIT', 'SPR-4', 'WGN-2', 'WHL-1') and
"invfacts"."planner_k" = "planner"."planner_k" and
("planner"."planner_id" in ('0002', '0003', '0004', '0008', '03', '12',
'2776', 'B1', 'B100', 'B200', 'P1', 'P100') or "planner"."planner_id" is
null) and "invfacts"."costtype_k" = "costtype"."costtype_k" group by
"dates"."inv_date", "part"."part_nbr", "planner"."planner_id",
"costtype"."cost_type"], exec 3281 ms, exec+fetch 3297 ms, 123 rows

	 

	I have numbered the output lines for ease of reference.  The
select in line 21 takes almost 7 minutes to complete.  Since the MDX
query specifies 2 cost types and 3 years of interest, would it be
possible to enhance Mondrian to generate this select with predicates for
inv_year and cost_type such as:

	 

	Segment.load: executing sql [select "dates"."inv_year" as "c0",
"planner"."planner_id" as "c1", "costtype"."cost_type" as "c2",
sum("invfacts"."a_total_cost") as "m0" from "tow"."dates" as "dates",
"tow"."invfacts" as "invfacts", "tow"."planner" as "planner",
"tow"."costtype" as "costtype" where "invfacts"."date_k" =
"dates"."date_k" and "dates"."inv_year" in ('2004', '2005', '2006') and
"invfacts"."planner_k" = "planner"."planner_k" and
"invfacts"."costtype_k" = "costtype"."costtype_k" and
"costtype"."cost_type" in ('current', 'planned') group by
"dates"."inv_year", "planner"."planner_id", "costtype"."cost_type"

	 

Would that query execute much faster? Mondrian drops the kinds of
constraints you have added, on the assumption that neighboring data has
a reasonable chance of being accessed in a future query; and that if you
are accessing 80% of a table you are doing a full scan, and therefore
you might as well access 100% of the table.

 

Also. it seems like that query could really benefit from an aggregate
table. 4,056 rows should take ~1s to load. I don't know whether you had
one in mondrian-2.2.

	 

	If so, I'd appreciate some guidance on where in the code to
focus my effort.  Also, I am curious about the duplication of some SQL.
Line 4 is a repeat of line 3.  Line 22 is a repeat of line 14, and line
23 is a repeat of line 15.  All 4 of these latter queries run about 1
minute, so it would be a significant performance boost if this
duplication were avoided. 

	 

I have no idea why query 4 is a duplicate of query 3. Worth some
investigation in the debugger.

 

I notice that the latter queries  14, 15, 22, 23 contain references to
the fact table, which is a clue that non-empty filtering is being
applied. The query is trying to find children of a given member which
are non-empty in a given context, as opposed to the children of a given
member, and this makes it more difficult to cache the results of the
query. Unfortunately, the context being generated into the query is
empty (i.e. the query is being executed with 'all' for all members) and
so the SQL doesn't contain any additional predicates. The inclusion of
the fact table may still be having a filtering effect, so it would be
incorrect to put the results of the query into the cache and say these
are the ONLY children of that member.

 

I recall that there is a scheme to cache results of non-empty (so-called
'native') queries. I don't know why that scheme isn't working here.

 

Julian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20070626/ba6a294d/attachment.html 


More information about the Mondrian mailing list