[Mondrian] SSB Schema

Julian Hyde jhyde at pentaho.com
Thu Dec 2 20:51:07 EST 2010


A few points:
 
1. Before we get it working in Hudson, I'd like to just get it working. If
we document the process of setting up, it can be done as a manual test.
 
2. Making it an automated process will be a different challenge. We will
need to cope with natural variance in the results. The variance will be
greater if we run on a VM, especially one with other tenants, but there will
always be variance. We can discuss approaches to dealing with variance; one
approach that springs to mind is Bollinger bands (report whenever a result,
or a short-term moving average, moves 2 or 3 standard deviations from a
longer-term moving average). We can discuss others. I would also be inclined
to do it under the auspices of a project independent of Mondrian. Maybe an
extension to JUnit for performance regression testing.
 
3. MySQL and Oracle are good dual platforms. I like the idea of using Oracle
as the main database. Not politically correct in the open source world, but
its performance is broadly representative of other databases (whereas MySQL
has some weak points for BI queries). We have Oracle available in
Pentaho's Hudson environment.
 
4. The smallest size sounds good. (If we document the process, per 1, it
would be easy to run with other sizes.)
 
5. We can manage without a Java generator. The task of generating data sets and
loading them only has to be done once, and it can be a manual process.
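
To make the Bollinger band idea in point 2 concrete, here is a minimal sketch in Python. It is only an illustration, not an existing Mondrian or JUnit facility: the function name bollinger_alerts, the window sizes, and the sample timings are all assumptions made up for the example. It flags a run when the short-term moving average of the timings moves more than num_std standard deviations away from the mean of the preceding long-term window.

```python
from statistics import mean, stdev


def bollinger_alerts(timings, short_window=3, long_window=20, num_std=2.0):
    """Return the indices of runs whose short-term average leaves the band.

    timings      -- query run times, oldest first (hypothetical input)
    short_window -- number of most recent runs averaged together
    long_window  -- number of earlier runs that define the band
    num_std      -- band half-width in standard deviations (2 or 3)
    """
    if long_window < 2 or not 1 <= short_window <= long_window:
        raise ValueError("need long_window >= 2 and 1 <= short_window <= long_window")

    alerts = []
    for i in range(long_window, len(timings)):
        # Long-term window: the long_window runs before run i.
        history = timings[i - long_window:i]
        center = mean(history)
        half_width = num_std * stdev(history)

        # Short-term moving average ending at (and including) run i.
        recent = mean(timings[i - short_window + 1:i + 1])

        if abs(recent - center) > half_width:
            alerts.append(i)
    return alerts


# Twenty stable runs around 100 ms, one normal run, then one slow run.
sample = [100.0, 102.0, 98.0, 101.0, 99.0] * 4 + [100.0, 150.0]
print(bollinger_alerts(sample))  # prints [21]
```

Because the band is computed from recent history, it widens on a noisy shared VM and narrows on a quiet machine, so no fixed threshold has to be chosen up front. A gradual slowdown spread over many runs would shift the band along with it, which is one reason to look at other approaches too.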
 
Julian


  _____  

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of jeff.s.wright at thomsonreuters.com
Sent: Wednesday, December 01, 2010 12:31 PM
To: mondrian at pentaho.org
Subject: RE: [Mondrian] SSB Schema



I think it's useful to discuss what it would take to set this up for the
Mondrian Hudson server.

 

I'm not familiar with the continuous integration environment other than the
Hudson emails I see posted to the mailing list. Here are my assumptions and
guesses, please correct and add as needed:

 

* I assume this is a Linux environment, maybe even virtual.

* I assume this is a shared environment, meaning there are other
loads on the hardware, and that a performance regression test would have to
have some way of self-calibrating or at least have tolerances.

* I assume that an open source database is preferred for testing. I
had the students work with MySQL. This is a little less than ideal from my
selfish point of view. One performance tweak that's important to us is
grouping sets. I know that is available for Oracle; I was assuming not for
MySQL.

* We should agree to a target database size. The smallest TPC-DS
database size is 1GB. The data model includes at least one dimension with
more than 1M rows. The students have been working with the smallest size. I was hoping
to kick the tires some with their final test setup and see if that is indeed
big enough to get interesting query performance. I suspect it is.

 

I'm actually trying to see if I can arrange with my company and the
university to sponsor another semester of work on the performance test
harness. Most of the fall semester was consumed with the mechanics of the
technology stack: MySQL, Mondrian, JMeter, MDX. I think another semester's
work could take this to an actual regression test - convert results to a
pass/fail.
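
As a rough illustration of what "convert results to a pass/fail" with self-calibration and tolerances could look like, here is a minimal Python sketch. Nothing in it comes from the students' harness: calibrate, verdict, the reference task, and the 25% tolerance are hypothetical names and values chosen for the example. The idea is to time a fixed reference task on the current machine, express each query time as a multiple of that, and compare the multiple against a stored baseline.

```python
import time


def calibrate(reference_task, repeats=5):
    """Return the best-of-N wall-clock time (seconds) of a fixed reference task."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        reference_task()
        best = min(best, time.perf_counter() - start)
    return best


def verdict(query_seconds, calibration_seconds, baseline_ratio, tolerance=0.25):
    """Return "PASS" if the normalized query time stays within tolerance.

    baseline_ratio is the query/calibration ratio recorded on a known-good
    build (an assumed stored value); tolerance is the allowed relative
    slowdown before the test fails.
    """
    ratio = query_seconds / calibration_seconds
    limit = baseline_ratio * (1.0 + tolerance)
    return "PASS" if ratio <= limit else "FAIL"


# A query that took 4.0 s on a machine whose reference task took 2.0 s is
# treated the same as 2.0 s on a machine whose reference task took 1.0 s.
print(verdict(4.0, 2.0, baseline_ratio=2.0))
print(verdict(3.0, 1.0, baseline_ratio=2.0))
```

A CPU-bound reference task only corrects for processor speed and load; it would not capture contention on the database server or disk, so the tolerance still has to absorb that part of the variance.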

 

Btw, the data generator generated some discussion before. The TPC-DS data
generator is written in C. The students spent some time investigating an
automated Java port, but hit dead ends. It looks like that would be a manual
effort. I'm not really considering that right now.

 

--jeff

 

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Luc Boudreau
Sent: Tuesday, November 30, 2010 9:09 AM
To: Mondrian developer mailing list
Subject: Re: [Mondrian] SSB Schema

 

Hi Jeff,

I saw your emails but figured I'd reach out at large. I'm very happy to hear
that your project went forward. Would your students like to contribute the
test suite back to the Mondrian project? I could serve as a contact for them
so we can get this thing done once their work is finished.

Cheers!

Luc



On Tue, Nov 30, 2010 at 9:01 AM, <jeff.s.wright at thomsonreuters.com> wrote:

I've been working with some CS students at NCSU to create a Mondrian schema
for the TPC-DS database for use in performance testing. We chose TPC-DS over
SSB because it's a larger data model that enables us to test virtual cubes.
They're also creating a set of test queries and a JMeter test script to
execute them via XMLA.

 

They're scheduled to be done with their project on Dec 10.

 

--Jeff Wright

 

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org] On
Behalf Of Luc Boudreau
Sent: Tuesday, November 30, 2010 8:54 AM
To: Mondrian developer mailing list
Subject: [Mondrian] SSB Schema

 

Hi everyone,

We are looking at adding a performance benchmark into Mondrian's test suite.
Is there anyone who wrote a Mondrian schema for the Star Schema Benchmark
(SSB)? If so, would you share it with the community?

Thanks and please pass the word around!


_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian

 
