[Mondrian] Mondrian Performance Test Harness
jeff.s.wright at thomsonreuters.com
Mon Aug 23 15:37:00 EDT 2010
>As far as I can tell TPC-DS is not being actively used. Others have
>discussed the issue of which benchmark to use. The most
>popular seems to be the Star Schema benchmark. See e.g.
>obright-infinidb-and-luciddb/, which has a
>discussion of the merits of various benchmarks. I'm fairly sure that
>someone has created a mondrian schema for the Star Schema benchmark.
Good references, I've seen those. SSB is a snowflake model with a single
fact table. I'd prefer to be able to test Virtual cubes. I came down on
the side of TPC-DS because it seemed more meaty as a data model and
there was code to support it, even if it wasn't finished or actively
used as a TPC benchmark.
But I'm still open to ideas, and it could turn out that the code doesn't
really work for TPC-DS.
>A scalable database generator is essential
Unfortunately both SSB and TPC-DS use data generators written in C, but
we may be able to take on porting that to Java as scope for this
semester or a follow-on project.
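For scoping that port: dsdgen-style generators are essentially scale-factor-driven row emitters. A minimal sketch of the idea (in Python for brevity; the actual port would be Java, and the table names and base row counts here are made up, not taken from TPC-DS or SSB):

```python
import csv
import random

# Hypothetical base row counts. Real dsdgen scales each table by its
# own rule, but linear scaling captures the basic shape.
BASE_ROWS = {"customer": 1_000, "store_sales": 10_000}

def generate(table, scale_factor, path, seed=0):
    """Emit pipe-delimited rows for `table` at the given scale factor."""
    rng = random.Random(seed)  # deterministic output, repeatable loads
    n = BASE_ROWS[table] * scale_factor
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        for i in range(n):
            writer.writerow([i, rng.randint(1, 100)])
    return n
```

The hard part of a real port isn't this loop, it's reproducing the reference tools' data distributions and cross-table key relationships so that query selectivities match the published benchmark.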
>The other thing that I would like is the ability to do performance
>regression tests.
Agreed. In addition to the points you raise, I'm also interested in the
question of how to get a multi-user throughput test that doesn't
degenerate into responding to all queries out of cache. Both of these
benchmarks have some provision for parameterized queries. I would be
curious to see some experiments on whether randomly parameterized
queries would return consistent throughput measurements.
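One cheap way to run that experiment: expand a parameterized MDX template with randomly chosen member keys, so successive requests don't all resolve from the same cache entries. A sketch using FoodMart-style names (the dimension members here are placeholders, not a proposed query set):

```python
import random

# Parameterized MDX template; the {year}/{quarter} slots are filled per
# request so repeated runs vary the slicer instead of replaying one query.
TEMPLATE = ("SELECT {{[Measures].[Unit Sales]}} ON COLUMNS "
            "FROM [Sales] WHERE ([Time].[{year}].[Q{quarter}])")

def random_query(rng):
    return TEMPLATE.format(year=rng.choice([1997, 1998]),
                           quarter=rng.randint(1, 4))
```

Driving the workload from a fixed seed gives a repeatable query mix, which matters when comparing throughput numbers across builds.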
From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Julian Hyde
Sent: Monday, August 23, 2010 2:35 PM
To: 'Mondrian developer mailing list'
Subject: RE: [Mondrian] Mondrian Performance Test Harness
Thanks for raising this. It's something we have needed for a long time.
See my comments inline.
I wanted to describe an idea for creating a Mondrian Performance
Test Harness and see if there was any input from the community. This is
more than "maybe I'll get around to this" - the company I work for has a
relationship with the CS department of a nearby university, and I've
arranged to use this as a senior design project for a student team this
semester.
Here's the concept... Because we use Mondrian in a performance
sensitive application (is there some other kind?), I wanted to figure
out a way to have better regression test coverage of performance and
throughput. What I have in mind is:
* A new test database besides FoodMart with a scalable data
generator (able to generate different size databases).
Agreed. A scalable database generator is essential. FoodMart is too
small for serious performance testing, and no one wants to download a
10GB database. Ergo, we need a generator.
* A Mondrian schema for this database.
* A set of MDX queries that exercise the engine.
* A JMeter test script to send these queries as XMLA requests as
a single or multi-threaded workload.
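For reference, the JMeter HTTP sampler would POST a SOAP body along these lines to Mondrian's XMLA servlet (the catalog name, MDX, and endpoint are placeholders; the Execute/Command/Statement shape is the standard XMLA request form):

```python
# Builds the XMLA Execute envelope a load driver would POST to the XMLA
# endpoint (e.g. /mondrian/xmla). Catalog and MDX are caller-supplied.
XMLA_EXECUTE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command><Statement>{mdx}</Statement></Command>
      <Properties><PropertyList>
        <Catalog>{catalog}</Catalog>
        <Format>Multidimensional</Format>
      </PropertyList></Properties>
    </Execute>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

def xmla_request(mdx, catalog):
    return XMLA_EXECUTE.format(mdx=mdx, catalog=catalog)
```

In JMeter this template would live in the HTTP Request sampler body, with the MDX supplied per thread from a CSV or a parameterization function.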
I was thinking of using TPC-DS
(http://www.tpc.org/tpcds/default.asp) as the database and data
generator. I can't tell if this benchmark is still being worked on, but
there's a download with a set of tools. The data model contains multiple
fact tables, so it should be possible to exercise virtual cubes, which
is important to me.
As far as I can tell TPC-DS is not being actively used. Others have
discussed the issue of which benchmark to use. The most popular seems to be
the Star Schema benchmark. See e.g.
obright-infinidb-and-luciddb/> , which has a discussion of the merits of
various benchmarks. I'm fairly sure that someone has created a mondrian
schema for the Star Schema benchmark.
I would like to include this performance suite in mondrian's
distribution as an optional set of tests. That implies that you should
provide a fairly easy way to load the data set onto any database and
instructions for how to set up the test harness.
The other thing that I would like is the ability to do performance
regression tests. That is, run the same tests regularly on the source
code, and detect when a developer makes a change that damages
performance. Test running times contain random noise -- a test might run
slower one day for a variety of reasons such as cosmic rays striking the
hard disk drive -- and so the test infrastructure should only report
degradations when several runs of the test have been significantly
slower. This would entail keeping a database of historic performance
stats and computing, say, the standard deviation of each number.
A framework for performance regression testing -- say as an extension to
junit -- could be a nice research/open source project for someone.