[Mondrian] RE: TPC-H

Sherman Wood swood at jaspersoft.com
Sun Dec 23 14:21:49 EST 2007

Good feedback.

Because of the different load factors, it sounds like the
MondrianTpchLoader should at some level use the DBGEN algorithm to
generate the data, rather than have the data as a fixed set stored in
Perforce. I will not get to that day one, and instead do a set of
conversions to ‘Mondrianize’ an existing TPC-H database.

I think this will come in a few stages.

1.	Documentation, schemas and queries working with a TPC-H data set
2.	Test suite + the TpchLoader Mondrianizing an existing TPC-H data
3.	DBGEN algorithm based TpchLoader

I’ll be working on the tests with our partner during January, and should
be able to get to the 1st step after that.



From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Julian Hyde
Sent: Sunday, December 23, 2007 7:06 AM
To: 'Mondrian developer mailing list'
Subject: [Mondrian] RE: TPC-H

It would be very exciting an useful to include TPC-H alongside FoodMart as
a test dataset. First, it will give use testing coverage on a different
kind of schema; second, it will allow us to do some performance
benchmarks, because the dataset can easily be scaled up to larger data
sizes. Lastly, it could be the basis for performance regression tests
(i.e. tests which detect that we have gotten slower).

I'd like to replicate the foodmart pattern as far as possible:

*	demo/Tpch.xml schema file
*	testsrc/.../MondrianTpchLoader.java loader program
*	instructions in install.html describing how to load the data
*	tests in the junit regression suite which run if the database is
*	aggregate tables, created by the loader program and populated by a
SQL script (compare testsrc/main/mondrian/test/loader/insert.sql)

I think these steps could be done quite easily.

There are a few differences:

*	We don't require the TPC-H data set to be present in order for the
regression suite to run; if it is not present, tests will recuse
themselves (and trivially succeed)
*	The TPC-H benchmark allows the database to be created at different
scales (e.g. one with 100MB, one with 1GB). The tests should know which
scale of data they are running against. Some tests might be able to run on
any scale of the data, other tests might only run on one particular scale,
and recuse themselves (trivially succeed) if the desired data set is not
*	Sherman proposes a variant to the TPC-H schema where the
line_items table is more like a fact table. I don't want to mess with the
'official' table structure, so why not create an additional table
LINE_ITEM_FACT? The mondrian TPC-H schema could have cubes based on each
line item table, to compare.

The trickiest thing will be passing in the connect string to the test
suite. A lot of the mondrian test suite assumes that there is just one
connection, the connection to foodmart. It is reasonable to assume that
tpch, if it exists, is in the same database as foodmart, but we could not
assume that it is in the same schema. The foodmart connect string is
variously read from mondrian.properties, or read from build.properties and
passed in when ant invokes junit, or pieced together from other



From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Sherman Wood
Sent: Friday, December 21, 2007 5:57 PM
To: jhyde at pentaho.org; 'Agustin Campos'
Cc: mondrian at pentaho.org
Subject: RE: [Mondrian] NON EMPTY + exception

I’ll test it out against HEAD and raise a bug if needed.

Yeah, we talked about doing a scalability test using TPC-H a while ago,
and now I am getting some help with that from a partner who has a large
environment – machines, disk, database.

I am just developing the schema now. I had few issues with the
snowflakey-ness, not the least of which is that it is a transactional
model not optimized in the way that you would like for Mondrian. I had a
few issues with Mondrian dealing with snowflakes – it was not clear how to
do joins of joins, getting the aliases right etc.

With the schema almost together, I am working on simulating the TPC-H test
queries in MDX, most of which are easy.

I could contribute this back, but how best to do it? It would make sense
to have an example TPC-H database in Mondrian in the same way as we have
Foodmart today, and a test suite against that.

I am also looking at using a few simple transforms of the TPC-H database
that make it more suitable for Mondrian – less snowflakey, making the
linetiem table a real fact table by adding keys on it and adding a few
indexes. These additions could also be included with a separate schema.



From: Julian Hyde [mailto:jhyde at pentaho.org]
Sent: Friday, December 21, 2007 6:05 PM
To: 'Sherman Wood'; 'Agustin Campos'
Cc: mondrian at pentaho.org
Subject: RE: [Mondrian] NON EMPTY + exception


Can one of you please log a bug for this? I know I just removed the
getAlias() method that Sherman referred to, but I don't think I fixed the
bug. I'll need a testcase to make sure.


I've been trying to persuade someone to contribute a TPC-H test suite for
a long time, and it seems like you have one working. Can you contribute
it? It would serve as a good test of mondrian's scalability and (as this
case shows) would improve coverage on a very-snowflakey schema.



From: Sherman Wood [mailto:swood at jaspersoft.com]
Sent: Thursday, December 20, 2007 6:47 PM
To: jhyde at pentaho.org; 'Mondrian developer mailing list'; 'Agustin Campos'
Subject: RE: [Mondrian] NON EMPTY + exception

I have this issue too.

>From the TPC-H data set, I have a dimension:

            <Dimension type="StandardDimension" name="Customer">

                        <Hierarchy name="Customer" hasAll="true"
allMemberName="All Customers" primaryKey="o_orderkey"

                                    <Join leftKey="o_custkey"
rightAlias="customer" rightKey="c_custkey">

                                                <Table name="orders">


                                                <Join leftAlias="customer"
leftKey="c_nationkey" rightAlias="c_nation" rightKey="n_nationkey">



leftAlias="c_nation" leftKey="n_regionkey" rightAlias="c_region"

<Table name="nation" alias="c_nation">


<Table name="region" alias="c_region">





                                    <Level name="Region" table="c_region"
column="r_name" type="String" uniqueMembers="true" levelType="Regular"


                                    <Level name="Nation" table="c_nation"
column="n_name" type="String" uniqueMembers="true" levelType="Regular"


                                    <Level name="Customer"
table="customer" column="c_name" type="String" uniqueMembers="false"
levelType="Regular" hideMemberIf="Never">




With the MDX:

select {[Measures].[Price]} on columns,

non empty crossjoin([Commit Date].[1998].Children, [Customer].[All
Customers].[AMERICA].[UNITED STATES].Children) ON rows

from Orders

I get:

Caused by: mondrian.olap.MondrianException: Mondrian Error:Internal error:
Populating member cache with members for [[Commit Date].[Month],


      at mondrian.olap.Util.newInternal(Util.java:1103)

      at mondrian.olap.Util.newError(Util.java:1119)


      at mondrian.rolap.SqlTupleReader.readTuples(SqlTupleReader.java:441)






      at mondrian.rolap.RolapResult.executeAxis(RolapResult.java:715)

      at mondrian.rolap.RolapResult.evalLoad(RolapResult.java:587)

      at mondrian.rolap.RolapResult.loadMembers(RolapResult.java:565)

      at mondrian.rolap.RolapResult.<init>(RolapResult.java:251)

      at mondrian.rolap.RolapConnection.execute(RolapConnection.java:414)

      ... 27 more

Caused by: mondrian.olap.MondrianException: Mondrian Error:Internal error:
join does not have alias


      at mondrian.olap.Util.newInternal(Util.java:1096)

      at mondrian.olap.MondrianDef$Join.getAlias(MondrianDef.java:2176)





      at mondrian.rolap.RolapHierarchy.addToFrom(RolapHierarchy.java:486)





      ... 38 more

This is happening at the point where we are going from the lowest level
join (nation, region) up to the next join with customer.

mondrian.native.nonempty.enable=false does not help.

The structure in Mondrian HEAD is different, so now it does not make a
call to MondrianDef$Join.getAlias. The problem is there in Mondrian 2.3.2.

A fix I found was to change Mondrian.xml in the <Code> for the Join

            public String getAlias() {

                //throw Util.newInternal("join does not have alias");

                  return getLeftAlias();


Change this line, and recompile the Mondrian.jar.



From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Julian Hyde
Sent: Wednesday, December 12, 2007 7:10 PM
To: 'Agustin Campos'
Cc: 'Mondrian developer mailing list'
Subject: RE: [Mondrian] NON EMPTY + exception

An error stack would help. I would like to see where
MondrianDef.Join.getAlias() is being called from.

I am guessing you have a snowflake schema?

Also, you should be able to work around this problem by setting
mondrian.native.nonempty.enable=false in mondrian.properties.



From: Agustin Campos [mailto:aguscampos at gmail.com]
Sent: Wednesday, December 12, 2007 12:04 AM
To: jhyde at pentaho.org; Mondrian developer mailing list
Subject: Re: [Mondrian] NON EMPTY + exception

all <join ...> has their own alias (tables' names on database)
<join leftAlias... rightAlias... >
   < table name="table1" />
  <join ...>

All <join > has their own alias (left and right).
<Table ....> doesn't.

Thanks for your answer, but, do you have any more ideas?

2007/12/12, Julian Hyde <jhyde at pentaho.org>:

Heed the error message "join does not have alias" - check all <Join>
elements in your schema and make sure that specify leftAlias and
rightAlias, to make the join unambiguous.


From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Agustin Campos
Sent: Tuesday, December 11, 2007 11:15 PM
To: Mondrian developer mailing list
Subject: [Mondrian] NON EMPTY + exception

Hello all:

I am having a little problem with my cube. It has 3 dimensions.

The case is that, when I ask something like this:

select NON EMPTY Crossjoin(Hierarchize(Union({[Dim1].[All]},
[Dim1].[All].Children)), {[Dim2].[All]}) ON COLUMNS,
  NON EMPTY {[Dim3].[All]} ON ROWS
from [MyCube]

and then, I expand Dim1 I get an exception like this:

SqlMemberSource.getMemberChildren: executing sql [ SQL query ], exec 46
ms, exec+fetch 46 ms, 7 rows
mondrian.olap.MondrianException: Mondrian Error:Internal error: Error
while executing query: MDX query which should work


Caused by: mondrian.olap.MondrianException: Mondrian Error:Internal error:
join does not have alias

Do you have any idea about what could cause this problem? I know that I
have not sent the complete log but, I hope this could help to troubleshoot
the problem.

I remind: It ONLY happends when I ask for non empty cells. It DOES WORK
when they are allowd.

Thank you a lot:


Mondrian mailing list
Mondrian at pentaho.org

Agustín Campos Muñoz

Email: aguscamposENgmailPUNTOcom
MSN Messenger: acampozENhotmailPUNTOcom
Skype: acamposnet

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20071223/d5eb05fb/attachment.html 

More information about the Mondrian mailing list