[Mondrian] RE: TPC-H

Julian Hyde jhyde at pentaho.org
Tue Mar 25 18:19:35 EDT 2008


Sherman,
 
Haven't heard anything about this for a while. The 3 main things I need are
a tpch data generator/loader, a mondrian schema based on the tpch schema,
and a set of mdx queries. Any progress on any of these?
 
I want to make a start on benchmarking, so if you don't have something to
check in soon I'm going to have to start building them myself.

Julian


  _____  

From: Julian Hyde [mailto:jhyde at pentaho.org] 
Sent: Saturday, December 22, 2007 12:06 PM
To: 'Mondrian developer mailing list'
Subject: RE: TPC-H


It would be very exciting an useful to include TPC-H alongside FoodMart as a
test dataset. First, it will give use testing coverage on a different kind
of schema; second, it will allow us to do some performance benchmarks,
because the dataset can easily be scaled up to larger data sizes. Lastly, it
could be the basis for performance regression tests (i.e. tests which detect
that we have gotten slower).
 
I'd like to replicate the foodmart pattern as far as possible:

*	

	demo/Tpch.xml schema file
*	

	testsrc/.../MondrianTpchLoader.java loader program
*	

	instructions in install.html describing how to load the data
*	

	tests in the junit regression suite which run if the database is
present
*	

	aggregate tables, created by the loader program and populated by a
SQL script (compare testsrc/main/mondrian/test/loader/insert.sql)

I think these steps could be done quite easily.
 
There are a few differences:

*	

	We don't require the TPC-H data set to be present in order for the
regression suite to run; if it is not present, tests will recuse themselves
(and trivially succeed)
*	

	The TPC-H benchmark allows the database to be created at different
scales (e.g. one with 100MB, one with 1GB). The tests should know which
scale of data they are running against. Some tests might be able to run on
any scale of the data, other tests might only run on one particular scale,
and recuse themselves (trivially succeed) if the desired data set is not
present.
*	

	Sherman proposes a variant to the TPC-H schema where the line_items
table is more like a fact table. I don't want to mess with the 'official'
table structure, so why not create an additional table LINE_ITEM_FACT? The
mondrian TPC-H schema could have cubes based on each line item table, to
compare.

The trickiest thing will be passing in the connect string to the test suite.
A lot of the mondrian test suite assumes that there is just one connection,
the connection to foodmart. It is reasonable to assume that tpch, if it
exists, is in the same database as foodmart, but we could not assume that it
is in the same schema. The foodmart connect string is variously read from
mondrian.properties, or read from build.properties and passed in when ant
invokes junit, or pieced together from other properties.
 
Julian



  _____  

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Sherman Wood
Sent: Friday, December 21, 2007 5:57 PM
To: jhyde at pentaho.org; 'Agustin Campos'
Cc: mondrian at pentaho.org
Subject: RE: [Mondrian] NON EMPTY + exception



I’ll test it out against HEAD and raise a bug if needed.

 

Yeah, we talked about doing a scalability test using TPC-H a while ago, and
now I am getting some help with that from a partner who has a large
environment – machines, disk, database.

 

I am just developing the schema now. I had few issues with the
snowflakey-ness, not the least of which is that it is a transactional model
not optimized in the way that you would like for Mondrian. I had a few
issues with Mondrian dealing with snowflakes – it was not clear how to do
joins of joins, getting the aliases right etc.

 

With the schema almost together, I am working on simulating the TPC-H test
queries in MDX, most of which are easy.

 

I could contribute this back, but how best to do it? It would make sense to
have an example TPC-H database in Mondrian in the same way as we have
Foodmart today, and a test suite against that.

 

I am also looking at using a few simple transforms of the TPC-H database
that make it more suitable for Mondrian – less snowflakey, making the
linetiem table a real fact table by adding keys on it and adding a few
indexes. These additions could also be included with a separate schema.

 

 

Sherman


  _____  


From: Julian Hyde [mailto:jhyde at pentaho.org] 
Sent: Friday, December 21, 2007 6:05 PM
To: 'Sherman Wood'; 'Agustin Campos'
Cc: mondrian at pentaho.org
Subject: RE: [Mondrian] NON EMPTY + exception

 

Sherman/Agustin,

 

Can one of you please log a bug for this? I know I just removed the
getAlias() method that Sherman referred to, but I don't think I fixed the
bug. I'll need a testcase to make sure.

 

Sherman,

 

I've been trying to persuade someone to contribute a TPC-H test suite for a
long time, and it seems like you have one working. Can you contribute it? It
would serve as a good test of mondrian's scalability and (as this case
shows) would improve coverage on a very-snowflakey schema.

 

Julian

 


  _____  


From: Sherman Wood [mailto:swood at jaspersoft.com] 
Sent: Thursday, December 20, 2007 6:47 PM
To: jhyde at pentaho.org; 'Mondrian developer mailing list'; 'Agustin Campos'
Subject: RE: [Mondrian] NON EMPTY + exception

I have this issue too.

 

>From the TPC-H data set, I have a dimension:

 

            <Dimension type="StandardDimension" name="Customer">

                        <Hierarchy name="Customer" hasAll="true"
allMemberName="All Customers" primaryKey="o_orderkey"
primaryKeyTable="orders">

                                    <Join leftKey="o_custkey"
rightAlias="customer" rightKey="c_custkey">

                                                <Table name="orders">

                                                </Table>

                                                <Join leftAlias="customer"
leftKey="c_nationkey" rightAlias="c_nation" rightKey="n_nationkey">

                                                            <Table
name="customer">

                                                            </Table>

                                                            <Join
leftAlias="c_nation" leftKey="n_regionkey" rightAlias="c_region"
rightKey="r_regionkey">

 
<Table name="nation" alias="c_nation">

 
</Table>

 
<Table name="region" alias="c_region">

 
</Table>

                                                            </Join>

                                                </Join>

                                    </Join>

                                    <Level name="Region" table="c_region"
column="r_name" type="String" uniqueMembers="true" levelType="Regular"
hideMemberIf="Never">

                                    </Level>

                                    <Level name="Nation" table="c_nation"
column="n_name" type="String" uniqueMembers="true" levelType="Regular"
hideMemberIf="Never">

                                    </Level>

                                    <Level name="Customer" table="customer"
column="c_name" type="String" uniqueMembers="false" levelType="Regular"
hideMemberIf="Never">

                                    </Level>

                        </Hierarchy>

            </Dimension>

 

With the MDX:

 

select {[Measures].[Price]} on columns,

non empty crossjoin([Commit Date].[1998].Children, [Customer].[All
Customers].[AMERICA].[UNITED STATES].Children) ON rows

from Orders

 

I get:

 

Caused by: mondrian.olap.MondrianException: Mondrian Error:Internal error:
Populating member cache with members for [[Commit Date].[Month],
[Customer.Customer].[Customer]]

      at
mondrian.resource.MondrianResource$_Def0.ex(MondrianResource.java:755)

      at mondrian.olap.Util.newInternal(Util.java:1103)

      at mondrian.olap.Util.newError(Util.java:1119)

      at
mondrian.rolap.SqlTupleReader.prepareTuples(SqlTupleReader.java:415)

      at mondrian.rolap.SqlTupleReader.readTuples(SqlTupleReader.java:441)

      at
mondrian.rolap.RolapNativeSet$SetEvaluator.executeList(RolapNativeSet.java:2
37)

      at
mondrian.rolap.RolapNativeSet$SetEvaluator.executeIterable(RolapNativeSet.ja
va:159)

      at
mondrian.rolap.RolapNativeSet$SetEvaluator.execute(RolapNativeSet.java:142)

      at
mondrian.olap.fun.CrossJoinFunDef$BaseIterCalc.evaluateIterable(CrossJoinFun
Def.java:241)

      at
mondrian.calc.impl.AbstractIterCalc.evaluate(AbstractIterCalc.java:53)

      at mondrian.rolap.RolapResult.executeAxis(RolapResult.java:715)

      at mondrian.rolap.RolapResult.evalLoad(RolapResult.java:587)

      at mondrian.rolap.RolapResult.loadMembers(RolapResult.java:565)

      at mondrian.rolap.RolapResult.<init>(RolapResult.java:251)

      at mondrian.rolap.RolapConnection.execute(RolapConnection.java:414)

      ... 27 more

Caused by: mondrian.olap.MondrianException: Mondrian Error:Internal error:
join does not have alias

      at
mondrian.resource.MondrianResource$_Def0.ex(MondrianResource.java:752)

      at mondrian.olap.Util.newInternal(Util.java:1096)

      at mondrian.olap.MondrianDef$Join.getAlias(MondrianDef.java:2176)

      at
mondrian.rolap.RolapHierarchy.rewriteRelationWithAliases(RolapHierarchy.java
:635)

      at
mondrian.rolap.RolapHierarchy.relationSubset(RolapHierarchy.java:554)

      at
mondrian.rolap.RolapHierarchy.lookupRelationSubset(RolapHierarchy.java:591)

      at
mondrian.rolap.RolapHierarchy.relationSubset(RolapHierarchy.java:548)

      at mondrian.rolap.RolapHierarchy.addToFrom(RolapHierarchy.java:486)

      at
mondrian.rolap.SqlTupleReader.addLevelMemberSql(SqlTupleReader.java:747)

      at
mondrian.rolap.SqlTupleReader.generateSelectForLevels(SqlTupleReader.java:66
6)

      at
mondrian.rolap.SqlTupleReader.makeLevelMembersSql(SqlTupleReader.java:633)

      at
mondrian.rolap.SqlTupleReader.prepareTuples(SqlTupleReader.java:329)

      ... 38 more

 

This is happening at the point where we are going from the lowest level join
(nation, region) up to the next join with customer.

 

mondrian.native.nonempty.enable=false does not help.

 

 

The structure in Mondrian HEAD is different, so now it does not make a call
to MondrianDef$Join.getAlias. The problem is there in Mondrian 2.3.2.

 

A fix I found was to change Mondrian.xml in the <Code> for the Join element:

 

            public String getAlias() {

                //throw Util.newInternal("join does not have alias");

                  return getLeftAlias();

            }

 

Change this line, and recompile the Mondrian.jar.

 

Sherman


  _____  


From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Julian Hyde
Sent: Wednesday, December 12, 2007 7:10 PM
To: 'Agustin Campos'
Cc: 'Mondrian developer mailing list'
Subject: RE: [Mondrian] NON EMPTY + exception

 

An error stack would help. I would like to see where
MondrianDef.Join.getAlias() is being called from.

 

I am guessing you have a snowflake schema?

 

Also, you should be able to work around this problem by setting
mondrian.native.nonempty.enable=false in mondrian.properties.

 

Julian

 


  _____  


From: Agustin Campos [mailto:aguscampos at gmail.com] 
Sent: Wednesday, December 12, 2007 12:04 AM
To: jhyde at pentaho.org; Mondrian developer mailing list
Subject: Re: [Mondrian] NON EMPTY + exception

all <join ...> has their own alias (tables' names on database) 
<join leftAlias... rightAlias... >
   < table name="table1" />
  <join ...>
       ...
      ....
   </join> 
</join>

All <join > has their own alias (left and right). 
<Table ....> doesn't.

Thanks for your answer, but, do you have any more ideas?



2007/12/12, Julian Hyde <jhyde at pentaho.org>: 

Heed the error message "join does not have alias" - check all <Join>
elements in your schema and make sure that specify leftAlias and rightAlias,
to make the join unambiguous.

 


  _____  


From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Agustin Campos
Sent: Tuesday, December 11, 2007 11:15 PM
To: Mondrian developer mailing list
Subject: [Mondrian] NON EMPTY + exception

Hello all:

I am having a little problem with my cube. It has 3 dimensions.

The case is that, when I ask something like this:

select NON EMPTY Crossjoin(Hierarchize(Union({[Dim1].[All]},
[Dim1].[All].Children)), {[Dim2].[All]}) ON COLUMNS, 
  NON EMPTY {[Dim3].[All]} ON ROWS
from [MyCube]

and then, I expand Dim1 I get an exception like this:

SqlMemberSource.getMemberChildren: executing sql [ SQL query ], exec 46 ms,
exec+fetch 46 ms, 7 rows
com.tonbeller.jpivot.olap.model.OlapException:
mondrian.olap.MondrianException: Mondrian Error:Internal error: Error while
executing query: MDX query which should work 

...
...
...
...

Caused by: mondrian.olap.MondrianException: Mondrian Error:Internal error:
join does not have alias
...
...
...

Do you have any idea about what could cause this problem? I know that I have
not sent the complete log but, I hope this could help to troubleshoot the
problem. 

I remind: It ONLY happends when I ask for non empty cells. It DOES WORK when
they are allowd.


Thank you a lot:

Agustín


_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian




-- 
Agustín Campos Muñoz
http://www.acampos.net

Email: aguscamposENgmailPUNTOcom 
           acamposENiiesPUNTOes
MSN Messenger: acampozENhotmailPUNTOcom
Skype: acamposnet 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20080325/cece5d52/attachment.html 


More information about the Mondrian mailing list