[Mondrian] RE: TPC-H
Julian Hyde
jhyde at pentaho.org
Wed Dec 26 02:18:08 EST 2007
> > John Sichi wrote:
> > For (2), what scale were you thinking about checking in? I'd like to
> > keep massive files out of the Perforce depot. Instead, we could host
> > large files at sourceforge.net and have the ant scripts automatically
> > download and unpack them.
> Sherman Wood wrote:
>
> For completeness, a small set would be checked in - Foodmart size? The
> main aim would be to have the Loader adding indexes,
> aggregate tables etc
> to an existing TPC-H database.
I hope we never have to create large files for people to download.
Short term, it's ok if a small set is checked into perforce - no larger than
FoodMartCreateData.zip, which is 4.8MB compressed.
Longer term, when we allow the benchmark to scale to larger data sets, we
will obviously need to include a java version of dbgen, at which time so we
can probably obsolete the small data set. I don't particularly mind whether
the loader program generates temporary files or direct loads into the dbms -
I suspect it will do both eventually.
John suggested creating a benchmark directory, but we already have the demo
directory with subdirectories for specific databases access and derby, so
let's use that.
Julian
> -----Original Message-----
> From: mondrian-bounces at pentaho.org
> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Sherman Wood
> Sent: Tuesday, December 25, 2007 11:25 AM
> To: 'Mondrian developer mailing list'
> Subject: RE: [Mondrian] RE: TPC-H
>
>
>
>
> Sherman
>
> -----Original Message-----
> From: mondrian-bounces at pentaho.org
> [mailto:mondrian-bounces at pentaho.org]
> On Behalf Of John V. Sichi
> Sent: Tuesday, December 25, 2007 10:51 AM
> To: Mondrian developer mailing list
> Subject: Re: [Mondrian] RE: TPC-H
>
>
> JVS
>
> Sherman Wood wrote:
> > Good feedback.
> >
> >
> >
> > Because of the different load factors, it sounds like the
> > MondrianTpchLoader should at some level use the DBGEN algorithm to
> > generate the data, rather than have the data as a fixed set
> stored in
> > Perforce. I will not get to that day one, and instead do a set of
> > conversions to Mondrianize an existing TPC-H database.
> >
> >
> >
> > I think this will come in a few stages.
> >
> > 1. Documentation, schemas and queries working with a
> TPC-H data set
> > 2. Test suite + the TpchLoader Mondrianizing an existing
> TPC-H data
> set
> > 3. DBGEN algorithm based TpchLoader
> >
> >
> >
> > Ill be working on the tests with our partner during
> January, and should
>
> > be able to get to the 1^st step after that.
> >
> >
> >
> >
> >
> > Sherman
> >
> >
> --------------------------------------------------------------
> ----------
> >
> > *From:* mondrian-bounces at pentaho.org
> > [mailto:mondrian-bounces at pentaho.org] *On Behalf Of *Julian Hyde
> > *Sent:* Sunday, December 23, 2007 7:06 AM
> > *To:* 'Mondrian developer mailing list'
> > *Subject:* [Mondrian] RE: TPC-H
> >
> >
> >
> > It would be very exciting an useful to include TPC-H
> alongside FoodMart
> > as a test dataset. First, it will give use testing coverage on a
> > different kind of schema; second, it will allow us to do some
> > performance benchmarks, because the dataset can easily be
> scaled up to
> > larger data sizes. Lastly, it could be the basis for performance
> > regression tests (i.e. tests which detect that we have
> gotten slower).
> >
> >
> >
> > I'd like to replicate the foodmart pattern as far as possible:
> >
> > * demo/Tpch.xml schema file
> > * testsrc/.../MondrianTpchLoader.java loader program
> > * instructions in install.html describing how to load the data
> > * tests in the junit regression suite which run if the
> database is
> > present
> > * aggregate tables, created by the loader program and
> populated by a
> > SQL script (compare
> testsrc/main/mondrian/test/loader/insert.sql)
> >
> > I think these steps could be done quite easily.
> >
> >
> >
> > There are a few differences:
> >
> > * We don't require the TPC-H data set to be present in
> order for the
> > regression suite to run; if it is not present, tests
> will recuse
> > themselves (and trivially succeed)
> > * The TPC-H benchmark allows the database to be created
> at different
> > scales (e.g. one with 100MB, one with 1GB). The tests
> should know
> > which scale of data they are running against. Some
> tests might be
> > able to run on any scale of the data, other tests
> might only run
> > on one particular scale, and recuse themselves
> (trivially succeed)
> > if the desired data set is not present.
> > * Sherman proposes a variant to the TPC-H schema where the
> > line_items table is more like a fact table. I don't
> want to mess
> > with the 'official' table structure, so why not create an
> > additional table LINE_ITEM_FACT? The mondrian TPC-H
> schema could
> > have cubes based on each line item table, to compare.
> >
> > The trickiest thing will be passing in the connect string
> to the test
> > suite. A lot of the mondrian test suite assumes that there
> is just one
> > connection, the connection to foodmart. It is reasonable to
> assume that
> > tpch, if it exists, is in the same database as foodmart,
> but we could
> > not assume that it is in the same schema. The foodmart
> connect string is
>
> > variously read from mondrian.properties, or read from
> build.properties
> > and passed in when ant invokes junit, or pieced together from other
> > properties.
> >
> >
> >
> > Julian
> >
> >
> >
> >
> --------------------------------------------------------------
> ----------
> >
> > *From:* mondrian-bounces at pentaho.org
> > [mailto:mondrian-bounces at pentaho.org] *On Behalf Of
> *Sherman Wood
> > *Sent:* Friday, December 21, 2007 5:57 PM
> > *To:* jhyde at pentaho.org; 'Agustin Campos'
> > *Cc:* mondrian at pentaho.org
> > *Subject:* RE: [Mondrian] NON EMPTY + exception
> >
> > Ill test it out against HEAD and raise a bug if needed.
> >
> >
> >
> > Yeah, we talked about doing a scalability test using
> TPC-H a while
> > ago, and now I am getting some help with that from a
> partner who has
> > a large environment machines, disk, database.
> >
> >
> >
> > I am just developing the schema now. I had few issues with the
> > snowflakey-ness, not the least of which is that it is a
> > transactional model not optimized in the way that you
> would like for
> > Mondrian. I had a few issues with Mondrian dealing with
> snowflakes
> > it was not clear how to do joins of joins, getting the
> aliases right
> > etc.
> >
> >
> >
> > With the schema almost together, I am working on simulating the
> > TPC-H test queries in MDX, most of which are easy.
> >
> >
> >
> > I could contribute this back, but how best to do it? It
> would make
> > sense to have an example TPC-H database in Mondrian in
> the same way
> > as we have Foodmart today, and a test suite against that.
> >
> >
> >
> > I am also looking at using a few simple transforms of the TPC-H
> > database that make it more suitable for Mondrian less
> snowflakey,
> > making the linetiem table a real fact table by adding
> keys on it and
> > adding a few indexes. These additions could also be
> included with a
> > separate schema.
> >
> >
> >
> >
> >
> > Sherman
> >
> >
> --------------------------------------------------------------
> ----------
> >
> > *From:* Julian Hyde [mailto:jhyde at pentaho.org]
> > *Sent:* Friday, December 21, 2007 6:05 PM
> > *To:* 'Sherman Wood'; 'Agustin Campos'
> > *Cc:* mondrian at pentaho.org
> > *Subject:* RE: [Mondrian] NON EMPTY + exception
> >
> >
> >
> > Sherman/Agustin,
> >
> >
> >
> > Can one of you please log a bug for this? I know I just
> removed the
> > getAlias() method that Sherman referred to, but I don't think I
> > fixed the bug. I'll need a testcase to make sure.
> >
> >
> >
> > Sherman,
> >
> >
> >
> > I've been trying to persuade someone to contribute a TPC-H test
> > suite for a long time, and it seems like you have one
> working. Can
> > you contribute it? It would serve as a good test of mondrian's
> > scalability and (as this case shows) would improve coverage on a
> > very-snowflakey schema.
> >
> >
> >
> > Julian
> >
> >
> >
> >
> --------------------------------------------------------------
> ----------
> >
> > *From:* Sherman Wood [mailto:swood at jaspersoft.com]
> > *Sent:* Thursday, December 20, 2007 6:47 PM
> > *To:* jhyde at pentaho.org; 'Mondrian developer mailing list';
> > 'Agustin Campos'
> > *Subject:* RE: [Mondrian] NON EMPTY + exception
> >
> > I have this issue too.
> >
> >
> >
> > From the TPC-H data set, I have a dimension:
> >
> >
> >
> > <Dimension type="StandardDimension"
> name="Customer">
> >
> > <Hierarchy name="Customer"
> hasAll="true"
> > allMemberName="All Customers" primaryKey="o_orderkey"
> > primaryKeyTable="orders">
> >
> > <Join
> leftKey="o_custkey"
> > rightAlias="customer" rightKey="c_custkey">
> >
> > <Table
> > name="orders">
> >
> > </Table>
> >
> > <Join
> > leftAlias="customer" leftKey="c_nationkey"
> rightAlias="c_nation"
> > rightKey="n_nationkey">
> >
> >
> > <Table name="customer">
> >
> >
> </Table>
> >
> >
> > <Join leftAlias="c_nation" leftKey="n_regionkey"
> > rightAlias="c_region" rightKey="r_regionkey">
> >
> >
>
> > <Table name="nation" alias="c_nation">
> >
> >
>
> > </Table>
> >
> >
>
> > <Table name="region" alias="c_region">
> >
> >
>
> > </Table>
> >
> >
> </Join>
> >
> > </Join>
> >
> > </Join>
> >
> > <Level name="Region"
> > table="c_region" column="r_name" type="String"
> > uniqueMembers="true" levelType="Regular"
> hideMemberIf="Never">
> >
> > </Level>
> >
> > <Level name="Nation"
> > table="c_nation" column="n_name" type="String"
> > uniqueMembers="true" levelType="Regular"
> hideMemberIf="Never">
> >
> > </Level>
> >
> > <Level name="Customer"
> > table="customer" column="c_name" type="String"
> > uniqueMembers="false" levelType="Regular"
> hideMemberIf="Never">
> >
> > </Level>
> >
> > </Hierarchy>
> >
> > </Dimension>
> >
> >
> >
> > With the MDX:
> >
> >
> >
> > select {[Measures].[Price]} on columns,
> >
> > non empty crossjoin([Commit Date].[1998].Children,
> > [Customer].[All Customers].[AMERICA].[UNITED
> STATES].Children)
> > ON rows
> >
> > from Orders
> >
> >
> >
> > I get:
> >
> >
> >
> > Caused by: _mondrian.olap.MondrianException_: Mondrian
> > Error:Internal error: Populating member cache with
> members for
> > [[Commit Date].[Month], [Customer.Customer].[Customer]]
> >
> > at
> >
> mondrian.resource.MondrianResource$_Def0.ex(_MondrianResource.
> java:755_)
> >
> > at mondrian.olap.Util.newInternal(_Util.java:1103_)
> >
> > at mondrian.olap.Util.newError(_Util.java:1119_)
> >
> > at
> >
> mondrian.rolap.SqlTupleReader.prepareTuples(_SqlTupleReader.java:415_)
> >
> > at
> >
> mondrian.rolap.SqlTupleReader.readTuples(_SqlTupleReader.java:441_)
> >
> > at
> >
> mondrian.rolap.RolapNativeSet$SetEvaluator.executeList(_RolapN
> ativeSet.jav
> a:237_)
> >
> > at
> >
> mondrian.rolap.RolapNativeSet$SetEvaluator.executeIterable(_Ro
> lapNativeSet
> .java:159_)
> >
> > at
> >
> mondrian.rolap.RolapNativeSet$SetEvaluator.execute(_RolapNativ
> eSet.java:14
> 2_)
> >
> > at
> >
> mondrian.olap.fun.CrossJoinFunDef$BaseIterCalc.evaluateIterabl
> e(_CrossJoin
> FunDef.java:241_)
> >
> > at
> >
> mondrian.calc.impl.AbstractIterCalc.evaluate(_AbstractIterCalc
> .java:53_)
> >
> > at
> >
> mondrian.rolap.RolapResult.executeAxis(_RolapResult.java:715_)
> >
> > at
> mondrian.rolap.RolapResult.evalLoad(_RolapResult.java:587_)
> >
> > at
> >
> mondrian.rolap.RolapResult.loadMembers(_RolapResult.java:565_)
> >
> > at
> mondrian.rolap.RolapResult.<init>(_RolapResult.java:251_)
> >
> > at
> >
> mondrian.rolap.RolapConnection.execute(_RolapConnection.java:414_)
> >
> > ... 27 more
> >
> > Caused by: _mondrian.olap.MondrianException_: Mondrian
> > Error:Internal error: join does not have alias
> >
> > at
> >
> mondrian.resource.MondrianResource$_Def0.ex(_MondrianResource.
> java:752_)
> >
> > at mondrian.olap.Util.newInternal(_Util.java:1096_)
> >
> > at
> >
> mondrian.olap.MondrianDef$Join.getAlias(_MondrianDef.java:2176_)
> >
> > at
> >
> mondrian.rolap.RolapHierarchy.rewriteRelationWithAliases(_Rola
> pHierarchy.j
> ava:635_)
> >
> > at
> >
> mondrian.rolap.RolapHierarchy.relationSubset(_RolapHierarchy.j
> ava:554_)
> >
> > at
> >
> mondrian.rolap.RolapHierarchy.lookupRelationSubset(_RolapHiera
> rchy.java:59
> 1_)
> >
> > at
> >
> mondrian.rolap.RolapHierarchy.relationSubset(_RolapHierarchy.j
> ava:548_)
> >
> > at
> >
> mondrian.rolap.RolapHierarchy.addToFrom(_RolapHierarchy.java:486_)
> >
> > at
> >
> mondrian.rolap.SqlTupleReader.addLevelMemberSql(_SqlTupleReade
> r.java:747_)
> >
> > at
> >
> mondrian.rolap.SqlTupleReader.generateSelectForLevels(_SqlTupl
> eReader.java
> :666_)
> >
> > at
> >
> mondrian.rolap.SqlTupleReader.makeLevelMembersSql(_SqlTupleRea
> der.java:633
> _)
> >
> > at
> >
> mondrian.rolap.SqlTupleReader.prepareTuples(_SqlTupleReader.java:329_)
> >
> > ... 38 more
> >
> >
> >
> > This is happening at the point where we are going from the
> > lowest level join (nation, region) up to the next join with
> > customer.
> >
> >
> >
> > mondrian.native.nonempty.enable=false does not help.
> >
> >
> >
> >
> >
> > The structure in Mondrian HEAD is different, so now
> it does not
> > make a call to MondrianDef$Join.getAlias. The
> problem is there
> > in Mondrian 2.3.2.
> >
> >
> >
> > A fix I found was to change Mondrian.xml in the
> <Code> for the
> > Join element:
> >
> >
> >
> > public String getAlias() {
> >
> > //throw Util.newInternal("join does not have
> > alias");
> >
> > return getLeftAlias();
> >
> > }
> >
> >
> >
> > Change this line, and recompile the Mondrian.jar.
> >
> >
> >
> > Sherman
> >
> >
> --------------------------------------------------------------
> ----------
> >
> > *From:* mondrian-bounces at pentaho.org
> > [mailto:mondrian-bounces at pentaho.org] *On Behalf Of
> *Julian Hyde
> > *Sent:* Wednesday, December 12, 2007 7:10 PM
> > *To:* 'Agustin Campos'
> > *Cc:* 'Mondrian developer mailing list'
> > *Subject:* RE: [Mondrian] NON EMPTY + exception
> >
> >
> >
> > An error stack would help. I would like to see where
> > MondrianDef.Join.getAlias() is being called from.
> >
> >
> >
> > I am guessing you have a snowflake schema?
> >
> >
> >
> > Also, you should be able to work around this
> problem by setting
> > mondrian.native.nonempty.enable=false in
> mondrian.properties.
> >
> >
> >
> > Julian
> >
> >
> >
> >
> --------------------------------------------------------------
> ----------
> >
> > *From:* Agustin Campos [mailto:aguscampos at gmail.com]
> > *Sent:* Wednesday, December 12, 2007 12:04 AM
> > *To:* jhyde at pentaho.org; Mondrian developer mailing list
> > *Subject:* Re: [Mondrian] NON EMPTY + exception
> >
> > all <join ...> has their own alias (tables' names on
> database)
> > <join leftAlias... rightAlias... >
> > < table name="table1" />
> > <join ...>
> > ...
> > ....
> > </join>
> > </join>
> >
> > All <join > has their own alias (left and right).
> > <Table ....> doesn't.
> >
> > Thanks for your answer, but, do you have any more ideas?
> >
> > 2007/12/12, Julian Hyde <jhyde at pentaho.org
> > <mailto:jhyde at pentaho.org>>:
> >
> > Heed the error message "join does not have
> alias" - check
> > all <Join> elements in your schema and make sure that
> > specify leftAlias and rightAlias, to make the join
> unambiguous.
> >
> >
> >
> >
> --------------------------------------------------------------
> ----------
> >
> > *From:* mondrian-bounces at pentaho.org
> > <mailto:mondrian-bounces at pentaho.org>
> > [mailto:mondrian-bounces at pentaho.org
> > <mailto:mondrian-bounces at pentaho.org>] *On Behalf Of
> > *Agustin Campos
> > *Sent:* Tuesday, December 11, 2007 11:15 PM
> > *To:* Mondrian developer mailing list
> > *Subject:* [Mondrian] NON EMPTY + exception
> >
> > Hello all:
> >
> > I am having a little problem with my cube. It has 3
> > dimensions.
> >
> > The case is that, when I ask something like this:
> >
> > select NON EMPTY
> > Crossjoin(Hierarchize(Union({[Dim1].[All]},
> > [Dim1].[All].Children)), {[Dim2].[All]}) ON COLUMNS,
> > NON EMPTY {[Dim3].[All]} ON ROWS
> > from [MyCube]
> >
> > and then, I expand Dim1 I get an exception
> like this:
> >
> > SqlMemberSource.getMemberChildren:
> executing sql [/ SQL
> > query/ ], exec 46 ms, exec+fetch 46 ms, 7 rows
> > com.tonbeller.jpivot.olap.model.OlapException:
> > mondrian.olap.MondrianException: Mondrian
> Error:Internal
> > error: Error while executing query: /MDX query which
> > should work /
> >
> > ...
> > ...
> > ...
> > ...
> >
> > Caused by: mondrian.olap.MondrianException: Mondrian
> > Error:Internal error: join does not have alias
> > ...
> > ...
> > ...
> >
> > Do you have any idea about what could cause this
> > problem? I know that I have not sent the
> complete log
> > but, I hope this could help to troubleshoot
> the problem.
> >
> > I remind: It ONLY happends when I ask for non empty
> > cells. It DOES WORK when they are allowd.
> >
> >
> > Thank you a lot:
> >
> > Agustín
> >
> >
> > _______________________________________________
> > Mondrian mailing list
> > Mondrian at pentaho.org <mailto:Mondrian at pentaho.org>
> > http://lists.pentaho.org/mailman/listinfo/mondrian
> >
> >
> >
> >
> > --
> > Agustín Campos Muñoz
> > http://www.acampos.net
> >
> > Email: aguscamposENgmailPUNTOcom
> > acamposENiiesPUNTOes
> > MSN Messenger: acampozENhotmailPUNTOcom
> > Skype: acamposnet
> >
> >
> >
> --------------------------------------------------------------
> ----------
> >
> > _______________________________________________
> > Mondrian mailing list
> > Mondrian at pentaho.org
> > http://lists.pentaho.org/mailman/listinfo/mondrian
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
> _______________________________________________
> Mondrian mailing list
> Mondrian at pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
>
More information about the Mondrian
mailing list