[Mondrian] Mondrian Performance Test Harness

Julian Hyde jhyde at pentaho.com
Mon Aug 23 14:35:29 EDT 2010

Thanks for raising this. It's something we have needed for a long time.
See my comments inline.

 Jeff wrote: 

I wanted to describe an idea for creating a Mondrian Performance Test
Harness and see if there was any input from the community. This is more than
"maybe I'll get around to this" - the company I work for has a relationship
with the CS department of a nearby university, and I've arranged to use this
as a senior design project for a student team this semester.


Here's the concept... Because we use Mondrian in a performance sensitive
application (is there some other kind?), I wanted to figure out a way to
have better regression test coverage of performance and throughput. What I
have in mind is:


* A new test database besides FoodMart with a scalable data generator (able
to generate different size databases). 

Agreed. A scalable database generator is essential. FoodMart is too small
for serious performance testing, and no one wants to download a 10GB
database. Ergo, we need a generator.

* A Mondrian schema for this database.

* A set of MDX queries that exercise the engine.

* A JMeter test script to send these queries as XMLA requests as a single or
multi-threaded workload. 
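
As an illustration of what "scalable" could mean for the data generator, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the function name generate_sales_rows, the column layout, and the row counts are hypothetical and do not come from TPC-DS or any existing Mondrian tool. The point is only that a fixed seed plus a scale factor gives a reproducible data set of any size, so nobody has to download one.

```python
import random


def generate_sales_rows(scale_factor, rows_per_unit=1000, seed=42):
    """Yield synthetic fact rows; the row count grows linearly with scale_factor.

    Hypothetical layout: (row_id, customer_id, product_id, quantity, price).
    A fixed seed makes every run at a given scale factor reproducible.
    """
    rng = random.Random(seed)
    n_rows = scale_factor * rows_per_unit
    n_customers = max(1, scale_factor * 100)  # dimension grows with the facts
    n_products = 500                          # dimension of fixed size
    for row_id in range(n_rows):
        yield (
            row_id,
            rng.randrange(n_customers),
            rng.randrange(n_products),
            rng.randint(1, 10),
            round(rng.uniform(0.5, 100.0), 2),
        )


if __name__ == "__main__":
    # Show the first few rows of the smallest data set.
    for row in list(generate_sales_rows(1))[:3]:
        print(row)
```

A real generator would write these rows out as CSV files or bulk inserts for the target database, and would need several fact tables to exercise virtual cubes.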

I was thinking of using TPC-DS (http://www.tpc.org/tpcds/default.asp) as the
database and data generator. I can't tell if this benchmark is still being
worked on, but there's a download with a set of tools. The data model
contains multiple fact tables, so it should be possible to exercise virtual
cubes, which is important to me. 

As far as I can tell, TPC-DS is not being actively used. Others have
discussed the issue of which benchmark to use. The most popular seems to be
the Star Schema benchmark. See e.g.
ht-infinidb-and-luciddb/, which has a discussion of the merits of various
benchmarks. I'm fairly sure that someone has created a Mondrian schema for
the Star Schema benchmark.


I would like to include this performance suite in Mondrian's distribution as
an optional set of tests. That implies that you should provide a fairly easy
way to load the data set onto any database and instructions for how to set
up the test harness.


The other thing that I would like is the ability to do performance
regression tests. That is, run the same tests regularly on the source code,
and detect when a developer makes a change that damages performance. Test
running times contain random noise -- a test might run slower one day for a
variety of reasons such as cosmic rays striking the hard disk drive -- and
so the test infrastructure would need to report a degradation only when
several runs of the test have been significantly slower. This would entail
keeping a database of historic performance stats, and computing, say, the
mean and standard deviation of each number.
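
To make the idea concrete, here is a minimal sketch in Python of one way such a check could work. It is an assumption, not an existing framework: the function name is_regression, the three-sigma threshold, and the "three slow runs in a row" rule are all hypothetical choices. The history list stands in for the database of past timings.

```python
import statistics


def is_regression(history, recent, threshold_sigmas=3.0, min_slow_runs=3):
    """Return True only if the last min_slow_runs timings all exceed
    mean(history) + threshold_sigmas * stdev(history).

    history: past running times (seconds) for one test, the baseline.
    recent:  running times from the latest runs, oldest first.
    """
    if len(history) < 2 or len(recent) < min_slow_runs:
        return False  # not enough data to judge
    limit = statistics.mean(history) + threshold_sigmas * statistics.stdev(history)
    # A single slow run is treated as noise; require a streak.
    return all(t > limit for t in recent[-min_slow_runs:])


if __name__ == "__main__":
    history = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
    print(is_regression(history, [10.4, 14.9, 10.1]))  # False: one noisy run
    print(is_regression(history, [14.8, 15.1, 14.9]))  # True: consistently slower
```

Requiring a streak of slow runs trades detection speed for fewer false alarms. The thresholds would need tuning against real timing data.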


A framework for performance regression testing -- say, as an extension to
JUnit -- could be a nice research/open source project for someone.



