[Mondrian] Mondrian Performance Test Harness
jeff.s.wright at thomsonreuters.com
Mon Aug 23 15:37:00 EDT 2010
>As far as I can tell TPC-DS is not being actively used. Others have
>discussed the issue of which benchmark to use. The most
>popular seems to be the Star Schema benchmark. See e.g.
>obright-infinidb-and-luciddb/, which has a
>discussion of the merits of various benchmarks. I'm fairly sure that
>someone has created a mondrian schema for the Star Schema benchmark.
Good references, I've seen those. SSB is a snowflake model with a single
fact table. I'd prefer to be able to test Virtual cubes. I came down on
the side of TPC-DS because it seemed more meaty as a data model and
there was code to support it, even if it wasn't finished or actively
used as a TPC benchmark.
But I'm still open to ideas, and it could turn out that the code doesn't
really work for TPC-DS.
>A scalable database generator is essential
Unfortunately both SSB and TPC-DS use data generators written in C, but
we may be able to take on porting that to Java as scope for this
semester or a follow-on project.
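For scoping that port: dsdgen-style generators are essentially scale-factor-driven row emitters. A minimal sketch of the idea (in Python for brevity; the actual port would be Java, and the table names and base row counts here are made up, not taken from TPC-DS or SSB):

```python
import csv
import random

# Hypothetical base row counts. Real dsdgen scales each table by its
# own rule, but linear scaling captures the basic shape.
BASE_ROWS = {"customer": 1_000, "store_sales": 10_000}

def generate(table, scale_factor, path, seed=0):
    """Emit pipe-delimited rows for `table` at the given scale factor."""
    rng = random.Random(seed)  # deterministic output, repeatable loads
    n = BASE_ROWS[table] * scale_factor
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        for i in range(n):
            writer.writerow([i, rng.randint(1, 100)])
    return n
```

The hard part of a real port isn't this loop, it's reproducing the reference tools' data distributions and cross-table key relationships so that query selectivities match the published benchmark.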
>The other thing that I would like is the ability to do performance
>regression tests.
Agreed. In addition to the points you raise, I'm also interested in the
question of how to get a multi-user throughput test that doesn't
degenerate into responding to all queries out of cache. Both of these
benchmarks have some provision for parameterized queries. I would be
curious to see some experiments on whether randomly parameterized
queries would return consistent throughput measurements.
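One cheap way to run that experiment: expand a parameterized MDX template with randomly chosen member keys, so successive requests don't all resolve from the same cache entries. A sketch using FoodMart-style names (the dimension members here are placeholders, not a proposed query set):

```python
import random

# Parameterized MDX template; the {year}/{quarter} slots are filled per
# request so repeated runs vary the slicer instead of replaying one query.
TEMPLATE = ("SELECT {{[Measures].[Unit Sales]}} ON COLUMNS "
            "FROM [Sales] WHERE ([Time].[{year}].[Q{quarter}])")

def random_query(rng):
    return TEMPLATE.format(year=rng.choice([1997, 1998]),
                           quarter=rng.randint(1, 4))
```

Driving the workload from a fixed seed gives a repeatable query mix, which matters when comparing throughput numbers across builds.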
From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Julian Hyde
Sent: Monday, August 23, 2010 2:35 PM
To: 'Mondrian developer mailing list'
Subject: RE: [Mondrian] Mondrian Performance Test Harness
Thanks for raising this. It's something we have needed for a long time.
See my comments inline.
I wanted to describe an idea for creating a Mondrian Performance
Test Harness and see if there was any input from the community. This is
more than "maybe I'll get around to this" - the company I work for has a
relationship with the CS department of a nearby university, and I've
arranged to use this as a senior design project for a student team this
semester.
Here's the concept... Because we use Mondrian in a performance
sensitive application (is there some other kind?), I wanted to figure
out a way to have better regression test coverage of performance and
throughput. What I have in mind is:
* A new test database besides FoodMart with a scalable data
generator (able to generate different size databases).
Agreed. A scalable database generator is essential. FoodMart is too
small for serious performance testing, and no one wants to download a
10GB database. Ergo, we need a generator.
* A Mondrian schema for this database.
* A set of MDX queries that exercise the engine.
* A JMeter test script to send these queries as XMLA requests as
a single or multi-threaded workload.
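For reference, the JMeter HTTP sampler would POST a SOAP body along these lines to Mondrian's XMLA servlet (the catalog name, MDX, and endpoint are placeholders; the Execute/Command/Statement shape is the standard XMLA request form):

```python
# Builds the XMLA Execute envelope a load driver would POST to the XMLA
# endpoint (e.g. /mondrian/xmla). Catalog and MDX are caller-supplied.
XMLA_EXECUTE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command><Statement>{mdx}</Statement></Command>
      <Properties><PropertyList>
        <Catalog>{catalog}</Catalog>
        <Format>Multidimensional</Format>
      </PropertyList></Properties>
    </Execute>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

def xmla_request(mdx, catalog):
    return XMLA_EXECUTE.format(mdx=mdx, catalog=catalog)
```

In JMeter this template would live in the HTTP Request sampler body, with the MDX supplied per thread from a CSV or a parameterization function.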
I was thinking of using TPC-DS
(http://www.tpc.org/tpcds/default.asp) as the database and data
generator. I can't tell if this benchmark is still being worked on, but
there's a download with a set of tools. The data model contains multiple
fact tables, so it should be possible to exercise virtual cubes, which
is important to me.
As far as I can tell TPC-DS is not being actively used. Others have
discussed the issue of which benchmark to use. The most popular seems to be
the Star Schema benchmark. See e.g.
obright-infinidb-and-luciddb/> , which has a discussion of the merits of
various benchmarks. I'm fairly sure that someone has created a mondrian
schema for the Star Schema benchmark.
I would like to include this performance suite in mondrian's
distribution as an optional set of tests. That implies that you should
provide a fairly easy way to load the data set onto any database and
instructions for how to set up the test harness.
The other thing that I would like is the ability to do performance
regression tests. That is, run the same tests regularly on the source
code, and detect when a developer makes a change that damages
performance. Test running times contain random noise -- a test might run
slower one day for a variety of reasons such as cosmic rays striking the
hard disk drive -- and so the test infrastructure should only report
degradations when several runs of the test have been significantly
slower. This would entail keeping a database of historic performance
stats and computing, say, the standard deviation of each number.
A framework for performance regression testing -- say as an extension to
junit -- could be a nice research/open source project for someone.