[Mondrian] RE: TPC-H

Julian Hyde jhyde at pentaho.org
Wed Dec 26 02:18:08 EST 2007


> > John Sichi wrote:
> > For (2), what scale were you thinking about checking in?  I'd like to
> > keep massive files out of the Perforce depot.  Instead, we could host
> > large files at sourceforge.net and have the ant scripts automatically
> > download and unpack them.

> Sherman Wood wrote:
>
> For completeness, a small set would be checked in - Foodmart size? The
> main aim would be to have the Loader adding indexes, 
> aggregate tables etc
> to an existing TPC-H database.

I hope we never have to create large files for people to download.

Short term, it's ok if a small set is checked into perforce - no larger than
FoodMartCreateData.zip, which is 4.8MB compressed.

Longer term, when we allow the benchmark to scale to larger data sets, we
will obviously need to include a java version of dbgen, at which time so we
can probably obsolete the small data set. I don't particularly mind whether
the loader program generates temporary files or direct loads into the dbms -
I suspect it will do both eventually.

John suggested creating a benchmark directory, but we already have the demo
directory with subdirectories for specific databases access and derby, so
let's use that.

Julian

> -----Original Message-----
> From: mondrian-bounces at pentaho.org 
> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Sherman Wood
> Sent: Tuesday, December 25, 2007 11:25 AM
> To: 'Mondrian developer mailing list'
> Subject: RE: [Mondrian] RE: TPC-H
> 
> 
> 
> 
> Sherman
> 
> -----Original Message-----
> From: mondrian-bounces at pentaho.org 
> [mailto:mondrian-bounces at pentaho.org]
> On Behalf Of John V. Sichi
> Sent: Tuesday, December 25, 2007 10:51 AM
> To: Mondrian developer mailing list
> Subject: Re: [Mondrian] RE: TPC-H
> 
> 
> JVS
> 
> Sherman Wood wrote:
> > Good feedback.
> >
> >
> >
> > Because of the different load factors, it sounds like the
> > MondrianTpchLoader should at some level use the DBGEN algorithm to
> > generate the data, rather than have the data as a fixed set 
> stored in
> > Perforce. I will not get to that day one, and instead do a set of
> > conversions to ‘Mondrianize’ an existing TPC-H database.
> >
> >
> >
> > I think this will come in a few stages.
> >
> >    1. Documentation, schemas and queries working with a 
> TPC-H data set
> >    2. Test suite + the TpchLoader Mondrianizing an existing 
> TPC-H data
> set
> >    3. DBGEN algorithm based TpchLoader
> >
> >
> >
> > I’ll be working on the tests with our partner during 
> January, and should
> 
> > be able to get to the 1^st step after that.
> >
> >
> >
> >
> >
> > Sherman
> >
> > 
> --------------------------------------------------------------
> ----------
> >
> > *From:* mondrian-bounces at pentaho.org
> > [mailto:mondrian-bounces at pentaho.org] *On Behalf Of *Julian Hyde
> > *Sent:* Sunday, December 23, 2007 7:06 AM
> > *To:* 'Mondrian developer mailing list'
> > *Subject:* [Mondrian] RE: TPC-H
> >
> >
> >
> > It would be very exciting an useful to include TPC-H 
> alongside FoodMart
> > as a test dataset. First, it will give use testing coverage on a
> > different kind of schema; second, it will allow us to do some
> > performance benchmarks, because the dataset can easily be 
> scaled up to
> > larger data sizes. Lastly, it could be the basis for performance
> > regression tests (i.e. tests which detect that we have 
> gotten slower).
> >
> >
> >
> > I'd like to replicate the foodmart pattern as far as possible:
> >
> >     * demo/Tpch.xml schema file
> >     * testsrc/.../MondrianTpchLoader.java loader program
> >     * instructions in install.html describing how to load the data
> >     * tests in the junit regression suite which run if the 
> database is
> >       present
> >     * aggregate tables, created by the loader program and 
> populated by a
> >       SQL script (compare 
> testsrc/main/mondrian/test/loader/insert.sql)
> >
> > I think these steps could be done quite easily.
> >
> >
> >
> > There are a few differences:
> >
> >     * We don't require the TPC-H data set to be present in 
> order for the
> >       regression suite to run; if it is not present, tests 
> will recuse
> >       themselves (and trivially succeed)
> >     * The TPC-H benchmark allows the database to be created 
> at different
> >       scales (e.g. one with 100MB, one with 1GB). The tests 
> should know
> >       which scale of data they are running against. Some 
> tests might be
> >       able to run on any scale of the data, other tests 
> might only run
> >       on one particular scale, and recuse themselves 
> (trivially succeed)
> >       if the desired data set is not present.
> >     * Sherman proposes a variant to the TPC-H schema where the
> >       line_items table is more like a fact table. I don't 
> want to mess
> >       with the 'official' table structure, so why not create an
> >       additional table LINE_ITEM_FACT? The mondrian TPC-H 
> schema could
> >       have cubes based on each line item table, to compare.
> >
> > The trickiest thing will be passing in the connect string 
> to the test
> > suite. A lot of the mondrian test suite assumes that there 
> is just one
> > connection, the connection to foodmart. It is reasonable to 
> assume that
> > tpch, if it exists, is in the same database as foodmart, 
> but we could
> > not assume that it is in the same schema. The foodmart 
> connect string is
> 
> > variously read from mondrian.properties, or read from 
> build.properties
> > and passed in when ant invokes junit, or pieced together from other
> > properties.
> >
> >
> >
> > Julian
> >
> >
> >
> >
> --------------------------------------------------------------
> ----------
> >
> >     *From:* mondrian-bounces at pentaho.org
> >     [mailto:mondrian-bounces at pentaho.org] *On Behalf Of 
> *Sherman Wood
> >     *Sent:* Friday, December 21, 2007 5:57 PM
> >     *To:* jhyde at pentaho.org; 'Agustin Campos'
> >     *Cc:* mondrian at pentaho.org
> >     *Subject:* RE: [Mondrian] NON EMPTY + exception
> >
> >     I’ll test it out against HEAD and raise a bug if needed.
> >
> >
> >
> >     Yeah, we talked about doing a scalability test using 
> TPC-H a while
> >     ago, and now I am getting some help with that from a 
> partner who has
> >     a large environment – machines, disk, database.
> >
> >
> >
> >     I am just developing the schema now. I had few issues with the
> >     snowflakey-ness, not the least of which is that it is a
> >     transactional model not optimized in the way that you 
> would like for
> >     Mondrian. I had a few issues with Mondrian dealing with 
> snowflakes –
> >     it was not clear how to do joins of joins, getting the 
> aliases right
> >     etc.
> >
> >
> >
> >     With the schema almost together, I am working on simulating the
> >     TPC-H test queries in MDX, most of which are easy.
> >
> >
> >
> >     I could contribute this back, but how best to do it? It 
> would make
> >     sense to have an example TPC-H database in Mondrian in 
> the same way
> >     as we have Foodmart today, and a test suite against that.
> >
> >
> >
> >     I am also looking at using a few simple transforms of the TPC-H
> >     database that make it more suitable for Mondrian – less 
> snowflakey,
> >     making the linetiem table a real fact table by adding 
> keys on it and
> >     adding a few indexes. These additions could also be 
> included with a
> >     separate schema.
> >
> >
> >
> >
> >
> >     Sherman
> >
> >
> --------------------------------------------------------------
> ----------
> >
> >     *From:* Julian Hyde [mailto:jhyde at pentaho.org]
> >     *Sent:* Friday, December 21, 2007 6:05 PM
> >     *To:* 'Sherman Wood'; 'Agustin Campos'
> >     *Cc:* mondrian at pentaho.org
> >     *Subject:* RE: [Mondrian] NON EMPTY + exception
> >
> >
> >
> >     Sherman/Agustin,
> >
> >
> >
> >     Can one of you please log a bug for this? I know I just 
> removed the
> >     getAlias() method that Sherman referred to, but I don't think I
> >     fixed the bug. I'll need a testcase to make sure.
> >
> >
> >
> >     Sherman,
> >
> >
> >
> >     I've been trying to persuade someone to contribute a TPC-H test
> >     suite for a long time, and it seems like you have one 
> working. Can
> >     you contribute it? It would serve as a good test of mondrian's
> >     scalability and (as this case shows) would improve coverage on a
> >     very-snowflakey schema.
> >
> >
> >
> >     Julian
> >
> >
> >
> >
> --------------------------------------------------------------
> ----------
> >
> >         *From:* Sherman Wood [mailto:swood at jaspersoft.com]
> >         *Sent:* Thursday, December 20, 2007 6:47 PM
> >         *To:* jhyde at pentaho.org; 'Mondrian developer mailing list';
> >         'Agustin Campos'
> >         *Subject:* RE: [Mondrian] NON EMPTY + exception
> >
> >         I have this issue too.
> >
> >
> >
> >          From the TPC-H data set, I have a dimension:
> >
> >
> >
> >                     <Dimension type="StandardDimension" 
> name="Customer">
> >
> >                                 <Hierarchy name="Customer" 
> hasAll="true"
> >         allMemberName="All Customers" primaryKey="o_orderkey"
> >         primaryKeyTable="orders">
> >
> >                                             <Join 
> leftKey="o_custkey"
> >         rightAlias="customer" rightKey="c_custkey">
> >
> >                                                         <Table
> >         name="orders">
> >
> >                                                         </Table>
> >
> >                                                         <Join
> >         leftAlias="customer" leftKey="c_nationkey" 
> rightAlias="c_nation"
> >         rightKey="n_nationkey">
> >
> >
> >         <Table name="customer">
> >
> >
> </Table>
> >
> >
> >         <Join leftAlias="c_nation" leftKey="n_regionkey"
> >         rightAlias="c_region" rightKey="r_regionkey">
> >
> >
> 
> >         <Table name="nation" alias="c_nation">
> >
> >
> 
> >         </Table>
> >
> >
> 
> >         <Table name="region" alias="c_region">
> >
> >
> 
> >         </Table>
> >
> >
> </Join>
> >
> >                                                         </Join>
> >
> >                                             </Join>
> >
> >                                             <Level name="Region"
> >         table="c_region" column="r_name" type="String"
> >         uniqueMembers="true" levelType="Regular" 
> hideMemberIf="Never">
> >
> >                                             </Level>
> >
> >                                             <Level name="Nation"
> >         table="c_nation" column="n_name" type="String"
> >         uniqueMembers="true" levelType="Regular" 
> hideMemberIf="Never">
> >
> >                                             </Level>
> >
> >                                             <Level name="Customer"
> >         table="customer" column="c_name" type="String"
> >         uniqueMembers="false" levelType="Regular" 
> hideMemberIf="Never">
> >
> >                                             </Level>
> >
> >                                 </Hierarchy>
> >
> >                     </Dimension>
> >
> >
> >
> >         With the MDX:
> >
> >
> >
> >         select {[Measures].[Price]} on columns,
> >
> >         non empty crossjoin([Commit Date].[1998].Children,
> >         [Customer].[All Customers].[AMERICA].[UNITED 
> STATES].Children)
> >         ON rows
> >
> >         from Orders
> >
> >
> >
> >         I get:
> >
> >
> >
> >         Caused by: _mondrian.olap.MondrianException_: Mondrian
> >         Error:Internal error: Populating member cache with 
> members for
> >         [[Commit Date].[Month], [Customer.Customer].[Customer]]
> >
> >               at
> >
> mondrian.resource.MondrianResource$_Def0.ex(_MondrianResource.
> java:755_)
> >
> >               at mondrian.olap.Util.newInternal(_Util.java:1103_)
> >
> >               at mondrian.olap.Util.newError(_Util.java:1119_)
> >
> >               at
> >
> mondrian.rolap.SqlTupleReader.prepareTuples(_SqlTupleReader.java:415_)
> >
> >               at
> >
> mondrian.rolap.SqlTupleReader.readTuples(_SqlTupleReader.java:441_)
> >
> >               at
> >
> mondrian.rolap.RolapNativeSet$SetEvaluator.executeList(_RolapN
> ativeSet.jav
> a:237_)
> >
> >               at
> >
> mondrian.rolap.RolapNativeSet$SetEvaluator.executeIterable(_Ro
> lapNativeSet
> .java:159_)
> >
> >               at
> >
> mondrian.rolap.RolapNativeSet$SetEvaluator.execute(_RolapNativ
> eSet.java:14
> 2_)
> >
> >               at
> >
> mondrian.olap.fun.CrossJoinFunDef$BaseIterCalc.evaluateIterabl
> e(_CrossJoin
> FunDef.java:241_)
> >
> >               at
> >
> mondrian.calc.impl.AbstractIterCalc.evaluate(_AbstractIterCalc
> .java:53_)
> >
> >               at
> >         
> mondrian.rolap.RolapResult.executeAxis(_RolapResult.java:715_)
> >
> >               at
> mondrian.rolap.RolapResult.evalLoad(_RolapResult.java:587_)
> >
> >               at
> >         
> mondrian.rolap.RolapResult.loadMembers(_RolapResult.java:565_)
> >
> >               at
> mondrian.rolap.RolapResult.<init>(_RolapResult.java:251_)
> >
> >               at
> >
> mondrian.rolap.RolapConnection.execute(_RolapConnection.java:414_)
> >
> >               ... 27 more
> >
> >         Caused by: _mondrian.olap.MondrianException_: Mondrian
> >         Error:Internal error: join does not have alias
> >
> >               at
> >
> mondrian.resource.MondrianResource$_Def0.ex(_MondrianResource.
> java:752_)
> >
> >               at mondrian.olap.Util.newInternal(_Util.java:1096_)
> >
> >               at
> >         
> mondrian.olap.MondrianDef$Join.getAlias(_MondrianDef.java:2176_)
> >
> >               at
> >
> mondrian.rolap.RolapHierarchy.rewriteRelationWithAliases(_Rola
> pHierarchy.j
> ava:635_)
> >
> >               at
> >
> mondrian.rolap.RolapHierarchy.relationSubset(_RolapHierarchy.j
> ava:554_)
> >
> >               at
> >
> mondrian.rolap.RolapHierarchy.lookupRelationSubset(_RolapHiera
> rchy.java:59
> 1_)
> >
> >               at
> >
> mondrian.rolap.RolapHierarchy.relationSubset(_RolapHierarchy.j
> ava:548_)
> >
> >               at
> >
> mondrian.rolap.RolapHierarchy.addToFrom(_RolapHierarchy.java:486_)
> >
> >               at
> >
> mondrian.rolap.SqlTupleReader.addLevelMemberSql(_SqlTupleReade
> r.java:747_)
> >
> >               at
> >
> mondrian.rolap.SqlTupleReader.generateSelectForLevels(_SqlTupl
> eReader.java
> :666_)
> >
> >               at
> >
> mondrian.rolap.SqlTupleReader.makeLevelMembersSql(_SqlTupleRea
> der.java:633
> _)
> >
> >               at
> >
> mondrian.rolap.SqlTupleReader.prepareTuples(_SqlTupleReader.java:329_)
> >
> >               ... 38 more
> >
> >
> >
> >         This is happening at the point where we are going from the
> >         lowest level join (nation, region) up to the next join with
> >         customer.
> >
> >
> >
> >         mondrian.native.nonempty.enable=false does not help.
> >
> >
> >
> >
> >
> >         The structure in Mondrian HEAD is different, so now 
> it does not
> >         make a call to MondrianDef$Join.getAlias. The 
> problem is there
> >         in Mondrian 2.3.2.
> >
> >
> >
> >         A fix I found was to change Mondrian.xml in the 
> <Code> for the
> >         Join element:
> >
> >
> >
> >                     public String getAlias() {
> >
> >                         //throw Util.newInternal("join does not have
> >         alias");
> >
> >                           return getLeftAlias();
> >
> >                     }
> >
> >
> >
> >         Change this line, and recompile the Mondrian.jar.
> >
> >
> >
> >         Sherman
> >
> >
> --------------------------------------------------------------
> ----------
> >
> >         *From:* mondrian-bounces at pentaho.org
> >         [mailto:mondrian-bounces at pentaho.org] *On Behalf Of 
> *Julian Hyde
> >         *Sent:* Wednesday, December 12, 2007 7:10 PM
> >         *To:* 'Agustin Campos'
> >         *Cc:* 'Mondrian developer mailing list'
> >         *Subject:* RE: [Mondrian] NON EMPTY + exception
> >
> >
> >
> >         An error stack would help. I would like to see where
> >         MondrianDef.Join.getAlias() is being called from.
> >
> >
> >
> >         I am guessing you have a snowflake schema?
> >
> >
> >
> >         Also, you should be able to work around this 
> problem by setting
> >         mondrian.native.nonempty.enable=false in 
> mondrian.properties.
> >
> >
> >
> >         Julian
> >
> >
> >
> >
> --------------------------------------------------------------
> ----------
> >
> >             *From:* Agustin Campos [mailto:aguscampos at gmail.com]
> >             *Sent:* Wednesday, December 12, 2007 12:04 AM
> >             *To:* jhyde at pentaho.org; Mondrian developer mailing list
> >             *Subject:* Re: [Mondrian] NON EMPTY + exception
> >
> >             all <join ...> has their own alias (tables' names on
> database)
> >             <join leftAlias... rightAlias... >
> >                < table name="table1" />
> >               <join ...>
> >                    ...
> >                   ....
> >                </join>
> >             </join>
> >
> >             All <join > has their own alias (left and right).
> >             <Table ....> doesn't.
> >
> >             Thanks for your answer, but, do you have any more ideas?
> >
> >             2007/12/12, Julian Hyde <jhyde at pentaho.org
> >             <mailto:jhyde at pentaho.org>>:
> >
> >             Heed the error message "join does not have 
> alias" - check
> >             all <Join> elements in your schema and make sure that
> >             specify leftAlias and rightAlias, to make the join
> unambiguous.
> >
> >
> >
> >
> --------------------------------------------------------------
> ----------
> >
> >                 *From:* mondrian-bounces at pentaho.org
> >                 <mailto:mondrian-bounces at pentaho.org>
> >                 [mailto:mondrian-bounces at pentaho.org
> >                 <mailto:mondrian-bounces at pentaho.org>] *On Behalf Of
> >                 *Agustin Campos
> >                 *Sent:* Tuesday, December 11, 2007 11:15 PM
> >                 *To:* Mondrian developer mailing list
> >                 *Subject:* [Mondrian] NON EMPTY + exception
> >
> >                 Hello all:
> >
> >                 I am having a little problem with my cube. It has 3
> >                 dimensions.
> >
> >                 The case is that, when I ask something like this:
> >
> >                 select NON EMPTY
> >                 Crossjoin(Hierarchize(Union({[Dim1].[All]},
> >                 [Dim1].[All].Children)), {[Dim2].[All]}) ON COLUMNS,
> >                   NON EMPTY {[Dim3].[All]} ON ROWS
> >                 from [MyCube]
> >
> >                 and then, I expand Dim1 I get an exception 
> like this:
> >
> >                 SqlMemberSource.getMemberChildren: 
> executing sql [/ SQL
> >                 query/ ], exec 46 ms, exec+fetch 46 ms, 7 rows
> >                 com.tonbeller.jpivot.olap.model.OlapException:
> >                 mondrian.olap.MondrianException: Mondrian 
> Error:Internal
> >                 error: Error while executing query: /MDX query which
> >                 should work /
> >
> >                 ...
> >                 ...
> >                 ...
> >                 ...
> >
> >                 Caused by: mondrian.olap.MondrianException: Mondrian
> >                 Error:Internal error: join does not have alias
> >                 ...
> >                 ...
> >                 ...
> >
> >                 Do you have any idea about what could cause this
> >                 problem? I know that I have not sent the 
> complete log
> >                 but, I hope this could help to troubleshoot 
> the problem.
> >
> >                 I remind: It ONLY happends when I ask for non empty
> >                 cells. It DOES WORK when they are allowd.
> >
> >
> >                 Thank you a lot:
> >
> >                 Agustín
> >
> >
> >             _______________________________________________
> >             Mondrian mailing list
> >             Mondrian at pentaho.org <mailto:Mondrian at pentaho.org>
> >             http://lists.pentaho.org/mailman/listinfo/mondrian
> >
> >
> >
> >
> >             --
> >             Agustín Campos Muñoz
> >             http://www.acampos.net
> >
> >             Email: aguscamposENgmailPUNTOcom
> >                        acamposENiiesPUNTOes
> >             MSN Messenger: acampozENhotmailPUNTOcom
> >             Skype: acamposnet
> >
> >
> > 
> --------------------------------------------------------------
> ----------
> >
> > _______________________________________________
> > Mondrian mailing list
> > Mondrian at pentaho.org
> > http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> 




More information about the Mondrian mailing list