[Mondrian] RE: TPC-H

John V. Sichi jsichi at gmail.com
Mon Dec 24 18:51:29 EST 2007


For (2), what scale were you thinking about checking in?  I'd like to 
keep massive files out of the Perforce depot.  Instead, we could host 
large files at sourceforge.net and have the ant scripts automatically 
download and unpack them.

JVS

Sherman Wood wrote:
> Good feedback.
> 
>  
> 
> Because of the different load factors, it sounds like the 
> MondrianTpchLoader should at some level use the DBGEN algorithm to 
> generate the data, rather than have the data as a fixed set stored in 
> Perforce. I will not get to that day one, and instead do a set of 
> conversions to ‘Mondrianize’ an existing TPC-H database.
> 
>  
> 
> I think this will come in a few stages.
> 
>    1. Documentation, schemas and queries working with a TPC-H data set
>    2. Test suite + the TpchLoader Mondrianizing an existing TPC-H data set
>    3. DBGEN algorithm based TpchLoader
> 
>  
> 
> I’ll be working on the tests with our partner during January, and should 
> be able to get to the 1^st step after that.
> 
>  
> 
>  
> 
> Sherman
> 
> ------------------------------------------------------------------------
> 
> *From:* mondrian-bounces at pentaho.org 
> [mailto:mondrian-bounces at pentaho.org] *On Behalf Of *Julian Hyde
> *Sent:* Sunday, December 23, 2007 7:06 AM
> *To:* 'Mondrian developer mailing list'
> *Subject:* [Mondrian] RE: TPC-H
> 
>  
> 
> It would be very exciting an useful to include TPC-H alongside FoodMart 
> as a test dataset. First, it will give use testing coverage on a 
> different kind of schema; second, it will allow us to do some 
> performance benchmarks, because the dataset can easily be scaled up to 
> larger data sizes. Lastly, it could be the basis for performance 
> regression tests (i.e. tests which detect that we have gotten slower).
> 
>  
> 
> I'd like to replicate the foodmart pattern as far as possible:
> 
>     * demo/Tpch.xml schema file
>     * testsrc/.../MondrianTpchLoader.java loader program
>     * instructions in install.html describing how to load the data
>     * tests in the junit regression suite which run if the database is
>       present
>     * aggregate tables, created by the loader program and populated by a
>       SQL script (compare testsrc/main/mondrian/test/loader/insert.sql)
> 
> I think these steps could be done quite easily.
> 
>  
> 
> There are a few differences:
> 
>     * We don't require the TPC-H data set to be present in order for the
>       regression suite to run; if it is not present, tests will recuse
>       themselves (and trivially succeed)
>     * The TPC-H benchmark allows the database to be created at different
>       scales (e.g. one with 100MB, one with 1GB). The tests should know
>       which scale of data they are running against. Some tests might be
>       able to run on any scale of the data, other tests might only run
>       on one particular scale, and recuse themselves (trivially succeed)
>       if the desired data set is not present.
>     * Sherman proposes a variant to the TPC-H schema where the
>       line_items table is more like a fact table. I don't want to mess
>       with the 'official' table structure, so why not create an
>       additional table LINE_ITEM_FACT? The mondrian TPC-H schema could
>       have cubes based on each line item table, to compare.
> 
> The trickiest thing will be passing in the connect string to the test 
> suite. A lot of the mondrian test suite assumes that there is just one 
> connection, the connection to foodmart. It is reasonable to assume that 
> tpch, if it exists, is in the same database as foodmart, but we could 
> not assume that it is in the same schema. The foodmart connect string is 
> variously read from mondrian.properties, or read from build.properties 
> and passed in when ant invokes junit, or pieced together from other 
> properties.
> 
>  
> 
> Julian
> 
>      
> 
>     ------------------------------------------------------------------------
> 
>     *From:* mondrian-bounces at pentaho.org
>     [mailto:mondrian-bounces at pentaho.org] *On Behalf Of *Sherman Wood
>     *Sent:* Friday, December 21, 2007 5:57 PM
>     *To:* jhyde at pentaho.org; 'Agustin Campos'
>     *Cc:* mondrian at pentaho.org
>     *Subject:* RE: [Mondrian] NON EMPTY + exception
> 
>     I’ll test it out against HEAD and raise a bug if needed.
> 
>      
> 
>     Yeah, we talked about doing a scalability test using TPC-H a while
>     ago, and now I am getting some help with that from a partner who has
>     a large environment – machines, disk, database.
> 
>      
> 
>     I am just developing the schema now. I had few issues with the
>     snowflakey-ness, not the least of which is that it is a
>     transactional model not optimized in the way that you would like for
>     Mondrian. I had a few issues with Mondrian dealing with snowflakes –
>     it was not clear how to do joins of joins, getting the aliases right
>     etc.
> 
>      
> 
>     With the schema almost together, I am working on simulating the
>     TPC-H test queries in MDX, most of which are easy.
> 
>      
> 
>     I could contribute this back, but how best to do it? It would make
>     sense to have an example TPC-H database in Mondrian in the same way
>     as we have Foodmart today, and a test suite against that.
> 
>      
> 
>     I am also looking at using a few simple transforms of the TPC-H
>     database that make it more suitable for Mondrian – less snowflakey,
>     making the linetiem table a real fact table by adding keys on it and
>     adding a few indexes. These additions could also be included with a
>     separate schema.
> 
>      
> 
>      
> 
>     Sherman
> 
>     ------------------------------------------------------------------------
> 
>     *From:* Julian Hyde [mailto:jhyde at pentaho.org]
>     *Sent:* Friday, December 21, 2007 6:05 PM
>     *To:* 'Sherman Wood'; 'Agustin Campos'
>     *Cc:* mondrian at pentaho.org
>     *Subject:* RE: [Mondrian] NON EMPTY + exception
> 
>      
> 
>     Sherman/Agustin,
> 
>      
> 
>     Can one of you please log a bug for this? I know I just removed the
>     getAlias() method that Sherman referred to, but I don't think I
>     fixed the bug. I'll need a testcase to make sure.
> 
>      
> 
>     Sherman,
> 
>      
> 
>     I've been trying to persuade someone to contribute a TPC-H test
>     suite for a long time, and it seems like you have one working. Can
>     you contribute it? It would serve as a good test of mondrian's
>     scalability and (as this case shows) would improve coverage on a
>     very-snowflakey schema.
> 
>      
> 
>     Julian
> 
>          
> 
>         ------------------------------------------------------------------------
> 
>         *From:* Sherman Wood [mailto:swood at jaspersoft.com]
>         *Sent:* Thursday, December 20, 2007 6:47 PM
>         *To:* jhyde at pentaho.org; 'Mondrian developer mailing list';
>         'Agustin Campos'
>         *Subject:* RE: [Mondrian] NON EMPTY + exception
> 
>         I have this issue too.
> 
>          
> 
>          From the TPC-H data set, I have a dimension:
> 
>          
> 
>                     <Dimension type="StandardDimension" name="Customer">
> 
>                                 <Hierarchy name="Customer" hasAll="true"
>         allMemberName="All Customers" primaryKey="o_orderkey"
>         primaryKeyTable="orders">
> 
>                                             <Join leftKey="o_custkey"
>         rightAlias="customer" rightKey="c_custkey">
> 
>                                                         <Table
>         name="orders">
> 
>                                                         </Table>
> 
>                                                         <Join
>         leftAlias="customer" leftKey="c_nationkey" rightAlias="c_nation"
>         rightKey="n_nationkey">
> 
>                                                                    
>         <Table name="customer">
> 
>                                                                     </Table>
> 
>                                                                    
>         <Join leftAlias="c_nation" leftKey="n_regionkey"
>         rightAlias="c_region" rightKey="r_regionkey">
> 
>                                                                                
>         <Table name="nation" alias="c_nation">
> 
>                                                                                
>         </Table>
> 
>                                                                                
>         <Table name="region" alias="c_region">
> 
>                                                                                
>         </Table>
> 
>                                                                     </Join>
> 
>                                                         </Join>
> 
>                                             </Join>
> 
>                                             <Level name="Region"
>         table="c_region" column="r_name" type="String"
>         uniqueMembers="true" levelType="Regular" hideMemberIf="Never">
> 
>                                             </Level>
> 
>                                             <Level name="Nation"
>         table="c_nation" column="n_name" type="String"
>         uniqueMembers="true" levelType="Regular" hideMemberIf="Never">
> 
>                                             </Level>
> 
>                                             <Level name="Customer"
>         table="customer" column="c_name" type="String"
>         uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
> 
>                                             </Level>
> 
>                                 </Hierarchy>
> 
>                     </Dimension>
> 
>          
> 
>         With the MDX:
> 
>          
> 
>         select {[Measures].[Price]} on columns,
> 
>         non empty crossjoin([Commit Date].[1998].Children,
>         [Customer].[All Customers].[AMERICA].[UNITED STATES].Children)
>         ON rows
> 
>         from Orders
> 
>          
> 
>         I get:
> 
>          
> 
>         Caused by: _mondrian.olap.MondrianException_: Mondrian
>         Error:Internal error: Populating member cache with members for
>         [[Commit Date].[Month], [Customer.Customer].[Customer]]
> 
>               at
>         mondrian.resource.MondrianResource$_Def0.ex(_MondrianResource.java:755_)
> 
>               at mondrian.olap.Util.newInternal(_Util.java:1103_)
> 
>               at mondrian.olap.Util.newError(_Util.java:1119_)
> 
>               at
>         mondrian.rolap.SqlTupleReader.prepareTuples(_SqlTupleReader.java:415_)
> 
>               at
>         mondrian.rolap.SqlTupleReader.readTuples(_SqlTupleReader.java:441_)
> 
>               at
>         mondrian.rolap.RolapNativeSet$SetEvaluator.executeList(_RolapNativeSet.java:237_)
> 
>               at
>         mondrian.rolap.RolapNativeSet$SetEvaluator.executeIterable(_RolapNativeSet.java:159_)
> 
>               at
>         mondrian.rolap.RolapNativeSet$SetEvaluator.execute(_RolapNativeSet.java:142_)
> 
>               at
>         mondrian.olap.fun.CrossJoinFunDef$BaseIterCalc.evaluateIterable(_CrossJoinFunDef.java:241_)
> 
>               at
>         mondrian.calc.impl.AbstractIterCalc.evaluate(_AbstractIterCalc.java:53_)
> 
>               at
>         mondrian.rolap.RolapResult.executeAxis(_RolapResult.java:715_)
> 
>               at mondrian.rolap.RolapResult.evalLoad(_RolapResult.java:587_)
> 
>               at
>         mondrian.rolap.RolapResult.loadMembers(_RolapResult.java:565_)
> 
>               at mondrian.rolap.RolapResult.<init>(_RolapResult.java:251_)
> 
>               at
>         mondrian.rolap.RolapConnection.execute(_RolapConnection.java:414_)
> 
>               ... 27 more
> 
>         Caused by: _mondrian.olap.MondrianException_: Mondrian
>         Error:Internal error: join does not have alias
> 
>               at
>         mondrian.resource.MondrianResource$_Def0.ex(_MondrianResource.java:752_)
> 
>               at mondrian.olap.Util.newInternal(_Util.java:1096_)
> 
>               at
>         mondrian.olap.MondrianDef$Join.getAlias(_MondrianDef.java:2176_)
> 
>               at
>         mondrian.rolap.RolapHierarchy.rewriteRelationWithAliases(_RolapHierarchy.java:635_)
> 
>               at
>         mondrian.rolap.RolapHierarchy.relationSubset(_RolapHierarchy.java:554_)
> 
>               at
>         mondrian.rolap.RolapHierarchy.lookupRelationSubset(_RolapHierarchy.java:591_)
> 
>               at
>         mondrian.rolap.RolapHierarchy.relationSubset(_RolapHierarchy.java:548_)
> 
>               at
>         mondrian.rolap.RolapHierarchy.addToFrom(_RolapHierarchy.java:486_)
> 
>               at
>         mondrian.rolap.SqlTupleReader.addLevelMemberSql(_SqlTupleReader.java:747_)
> 
>               at
>         mondrian.rolap.SqlTupleReader.generateSelectForLevels(_SqlTupleReader.java:666_)
> 
>               at
>         mondrian.rolap.SqlTupleReader.makeLevelMembersSql(_SqlTupleReader.java:633_)
> 
>               at
>         mondrian.rolap.SqlTupleReader.prepareTuples(_SqlTupleReader.java:329_)
> 
>               ... 38 more
> 
>          
> 
>         This is happening at the point where we are going from the
>         lowest level join (nation, region) up to the next join with
>         customer.
> 
>          
> 
>         mondrian.native.nonempty.enable=false does not help.
> 
>          
> 
>          
> 
>         The structure in Mondrian HEAD is different, so now it does not
>         make a call to MondrianDef$Join.getAlias. The problem is there
>         in Mondrian 2.3.2.
> 
>          
> 
>         A fix I found was to change Mondrian.xml in the <Code> for the
>         Join element:
> 
>          
> 
>                     public String getAlias() {
> 
>                         //throw Util.newInternal("join does not have
>         alias");
> 
>                           return getLeftAlias();
> 
>                     }
> 
>          
> 
>         Change this line, and recompile the Mondrian.jar.
> 
>          
> 
>         Sherman
> 
>         ------------------------------------------------------------------------
> 
>         *From:* mondrian-bounces at pentaho.org
>         [mailto:mondrian-bounces at pentaho.org] *On Behalf Of *Julian Hyde
>         *Sent:* Wednesday, December 12, 2007 7:10 PM
>         *To:* 'Agustin Campos'
>         *Cc:* 'Mondrian developer mailing list'
>         *Subject:* RE: [Mondrian] NON EMPTY + exception
> 
>          
> 
>         An error stack would help. I would like to see where
>         MondrianDef.Join.getAlias() is being called from.
> 
>          
> 
>         I am guessing you have a snowflake schema?
> 
>          
> 
>         Also, you should be able to work around this problem by setting
>         mondrian.native.nonempty.enable=false in mondrian.properties.
> 
>          
> 
>         Julian
> 
>              
> 
>             ------------------------------------------------------------------------
> 
>             *From:* Agustin Campos [mailto:aguscampos at gmail.com]
>             *Sent:* Wednesday, December 12, 2007 12:04 AM
>             *To:* jhyde at pentaho.org; Mondrian developer mailing list
>             *Subject:* Re: [Mondrian] NON EMPTY + exception
> 
>             all <join ...> has their own alias (tables' names on database)
>             <join leftAlias... rightAlias... >
>                < table name="table1" />
>               <join ...>
>                    ...
>                   ....
>                </join>
>             </join>
> 
>             All <join > has their own alias (left and right).
>             <Table ....> doesn't.
> 
>             Thanks for your answer, but, do you have any more ideas?
> 
>             2007/12/12, Julian Hyde <jhyde at pentaho.org
>             <mailto:jhyde at pentaho.org>>:
> 
>             Heed the error message "join does not have alias" - check
>             all <Join> elements in your schema and make sure that
>             specify leftAlias and rightAlias, to make the join unambiguous.
> 
>                  
> 
>                 ------------------------------------------------------------------------
> 
>                 *From:* mondrian-bounces at pentaho.org
>                 <mailto:mondrian-bounces at pentaho.org>
>                 [mailto:mondrian-bounces at pentaho.org
>                 <mailto:mondrian-bounces at pentaho.org>] *On Behalf Of
>                 *Agustin Campos
>                 *Sent:* Tuesday, December 11, 2007 11:15 PM
>                 *To:* Mondrian developer mailing list
>                 *Subject:* [Mondrian] NON EMPTY + exception
> 
>                 Hello all:
> 
>                 I am having a little problem with my cube. It has 3
>                 dimensions.
> 
>                 The case is that, when I ask something like this:
> 
>                 select NON EMPTY
>                 Crossjoin(Hierarchize(Union({[Dim1].[All]},
>                 [Dim1].[All].Children)), {[Dim2].[All]}) ON COLUMNS,
>                   NON EMPTY {[Dim3].[All]} ON ROWS
>                 from [MyCube]
> 
>                 and then, I expand Dim1 I get an exception like this:
> 
>                 SqlMemberSource.getMemberChildren: executing sql [/ SQL
>                 query/ ], exec 46 ms, exec+fetch 46 ms, 7 rows
>                 com.tonbeller.jpivot.olap.model.OlapException:
>                 mondrian.olap.MondrianException: Mondrian Error:Internal
>                 error: Error while executing query: /MDX query which
>                 should work /
> 
>                 ...
>                 ...
>                 ...
>                 ...
> 
>                 Caused by: mondrian.olap.MondrianException: Mondrian
>                 Error:Internal error: join does not have alias
>                 ...
>                 ...
>                 ...
> 
>                 Do you have any idea about what could cause this
>                 problem? I know that I have not sent the complete log
>                 but, I hope this could help to troubleshoot the problem.
> 
>                 I remind: It ONLY happends when I ask for non empty
>                 cells. It DOES WORK when they are allowd.
> 
> 
>                 Thank you a lot:
> 
>                 Agustín
> 
> 
>             _______________________________________________
>             Mondrian mailing list
>             Mondrian at pentaho.org <mailto:Mondrian at pentaho.org>
>             http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> 
> 
> 
>             -- 
>             Agustín Campos Muñoz
>             http://www.acampos.net
> 
>             Email: aguscamposENgmailPUNTOcom
>                        acamposENiiesPUNTOes
>             MSN Messenger: acampozENhotmailPUNTOcom
>             Skype: acamposnet
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian




More information about the Mondrian mailing list