[Mondrian] SSB Schema

jeff.s.wright at thomsonreuters.com
Fri Dec 3 08:37:21 EST 2010


>1. Before we get it working in Hudson, I'd like to just get it working.
If we document the process of setting up, it can be done as a manual
test.

 

A documented procedure is part of the students' assignment.

 

>2. Making it an automated process will be a different challenge.

 

I like the idea of a moving average and a tolerance. It would be
interesting to try to monitor system load and use that to calibrate the
pass/fail criteria, but I suspect that gets into the realm of PhD
research when you start looking at distributed environments and VMs. 
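
As a rough sketch of how that tolerance might work (along the lines of
the Bollinger-band idea Julian describes below), the check could compare
the latest timing against a moving average of recent runs plus a few
standard deviations. The window contents and the k threshold here are
placeholders, not tuned values:

    import java.util.List;

    public class RegressionCheck {
        // Sketch only: flag a regression when the latest run drifts more
        // than k standard deviations above the moving average of the
        // supplied history window. Window size and k are illustrative.
        static boolean regressed(List<Double> history, double latest, double k) {
            double mean = 0;
            for (double d : history) mean += d;
            mean /= history.size();
            double var = 0;
            for (double d : history) var += (d - mean) * (d - mean);
            double stdDev = Math.sqrt(var / history.size());
            // Fail only on slowdowns; getting faster is not a regression.
            return latest > mean + k * stdDev;
        }
    }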

 

>3. MySQL and Oracle are good dual platforms. I like the idea of using
Oracle as the main database. Not politically correct in the open source
world, but its performance is broadly representative of other databases
(whereas MySQL has some weak points for BI queries).

 

Delighted to hear that, because I've seen issues with MySQL too.

 

A few more ideas:

 

6. None of the students had prior knowledge of MDX, so the initial set
of queries is going to be simplistic. I'd like to see this ultimately
based on a directory of queries, similar to the Mondrian JUnit tests.
That would give us a chance to develop queries that exercise features
that are important to us (like native evaluation, large schemas, large
result sets, virtual cubes, grouping sets) and others could contribute
queries in the areas where they have interest.
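
To make that concrete, a runner along these lines could walk a query
directory and time each statement. This is only a sketch; the connect
string, schema file, and "queries" directory are placeholders for whatever
the real test setup uses, and it assumes execution through olap4j:

    import java.io.File;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Scanner;
    import org.olap4j.CellSet;
    import org.olap4j.OlapConnection;
    import org.olap4j.OlapStatement;

    public class MdxDirectoryRunner {
        public static void main(String[] args) throws Exception {
            // Assumes the Mondrian olap4j driver is on the classpath; the
            // connect string and catalog are placeholders.
            Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
            Connection jdbc = DriverManager.getConnection(
                "jdbc:mondrian:Jdbc=jdbc:oracle:thin:@//dbhost:1521/tpcds;"
                + "Catalog=file:tpcds.mondrian.xml;");
            OlapConnection olap = jdbc.unwrap(OlapConnection.class);

            // One .mdx file per query; contributors just drop new files in.
            for (File f : new File("queries").listFiles()) {
                if (!f.getName().endsWith(".mdx")) {
                    continue;
                }
                String mdx = new Scanner(f).useDelimiter("\\Z").next();
                long start = System.currentTimeMillis();
                OlapStatement stmt = olap.createStatement();
                CellSet cells = stmt.executeOlapQuery(mdx);
                System.out.println(f.getName() + ": "
                    + (System.currentTimeMillis() - start) + " ms, "
                    + cells.getAxes().size() + " axes");
                stmt.close();
            }
            olap.close();
        }
    }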

 

7. I'd like to see both a single-user query performance test and a
multi-user throughput test. Multi-user testing has some interesting
problems because of caching. The brute-force solution is to do multi-user
testing with the cache off, but there might be other approaches that
enable a mix of cached/non-cached queries. I'm thinking about
parameterizing queries with random values, or randomizing the order of
queries in different threads. It would be interesting to see how
consistent (or not) throughput measurements would be in those cases.
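
For the randomized-order case, the skeleton below is roughly what I have
in mind: each worker thread shuffles its own copy of the query list, so
the threads hit the cache in different patterns. runQuery() is just a
stand-in for whatever executes one MDX query (a JMeter sampler, olap4j,
or an XMLA client):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicLong;

    public class ThroughputSketch {
        static final AtomicLong completed = new AtomicLong();

        static Runnable worker(final List<String> queries) {
            return new Runnable() {
                public void run() {
                    // Each thread runs the same queries in its own random order.
                    List<String> shuffled = new ArrayList<String>(queries);
                    Collections.shuffle(shuffled);
                    for (String mdx : shuffled) {
                        runQuery(mdx);  // stand-in for the real executor
                        completed.incrementAndGet();
                    }
                }
            };
        }

        static void runQuery(String mdx) {
            // Placeholder: execute the query via XMLA or olap4j here.
        }
    }

Throughput is then the completed count divided by wall-clock time;
comparing that number across identical runs would show how repeatable the
measurement really is.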

 

--jeff

 

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Julian Hyde
Sent: Thursday, December 02, 2010 8:51 PM
To: 'Mondrian developer mailing list'
Subject: RE: [Mondrian] SSB Schema

 

A few points:

 

1. Before we get it working in Hudson, I'd like to just get it working.
If we document the process of setting up, it can be done as a manual
test.

 

2. Making it an automated process will be a different challenge. We will
need to cope with natural variance in the results. The variance will be
greater if we run on a VM, especially one with other tenants, but there
will always be variance. We can discuss approaches to dealing with
variance; one approach that springs to mind is Bollinger bands (report
whenever a result, or a short-term moving average, moves 2 or 3 standard
deviations from a longer-term moving average). We can discuss others. I
would also be inclined to do it under the auspices of a project
independent of Mondrian, maybe an extension to JUnit for performance
regression testing.

 

3. MySQL and Oracle are good dual platforms. I like the idea of using
Oracle as the main database. Not politically correct in the open source
world, but its performance is broadly representative of other databases
(whereas MySQL has some weak points for BI queries). We have Oracle
available in Pentaho's Hudson environment.

 

4. The smallest size sounds good. (If we document the process, per 1, it
would be easy to run with other sizes.)

 

5. We can manage without a Java generator. The task of generating data
sets and loading them only has to be done once, and it can be a manual
process.

 

Julian

	 

________________________________

	From: mondrian-bounces at pentaho.org
[mailto:mondrian-bounces at pentaho.org] On Behalf Of
jeff.s.wright at thomsonreuters.com
	Sent: Wednesday, December 01, 2010 12:31 PM
	To: mondrian at pentaho.org
	Subject: RE: [Mondrian] SSB Schema

	I think it's useful to discuss what it would take to set this up
for the Mondrian Hudson server.

	 

	I'm not familiar with the continuous integration environment
other than the Hudson emails I see posted to the mailing list. Here are
my assumptions and guesses; please correct and add as needed:

	 

	*         I assume this is a Linux environment, maybe even
virtual.

	*         I assume this is a shared environment, meaning there
are other loads on the hardware, and that a performance regression test
would have to have some way of self-calibrating or at least have
tolerances.

	*         I assume that an open source database is preferred for
testing. I had the students work with MySQL. This is a little less than
ideal from my selfish point of view. One performance tweak that's
important to us is grouping sets; I know that is available in Oracle, and
I was assuming it is not in MySQL (see the sketch after this list).

	*         We should agree to a target database size. The
smallest TPC-DS database size is 1GB. The data model includes at least
one dimension with > 1M rows. The students have been working with the
smallest size. I was hoping to kick the tires some with their final test
setup and see if that is indeed big enough to get interesting query
performance. I suspect it is. 
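
	To make the grouping sets point concrete, the snippet below shows
the kind of multi-level aggregate a database can compute in one statement
when it supports GROUPING SETS. It is illustrative only; the column names
follow TPC-DS conventions but the query is not from the actual test suite:

	    // Illustrative only: one round trip computes several aggregation
	    // levels at once when the database supports GROUPING SETS (Oracle
	    // does; MySQL currently offers only ROLLUP).
	    static void groupingSetsExample(java.sql.Connection conn)
	            throws java.sql.SQLException {
	        java.sql.Statement stmt = conn.createStatement();
	        java.sql.ResultSet rs = stmt.executeQuery(
	            "select d_year, i_category, sum(ss_net_paid)"
	            + " from store_sales"
	            + " join date_dim on ss_sold_date_sk = d_date_sk"
	            + " join item on ss_item_sk = i_item_sk"
	            + " group by grouping sets ((d_year, i_category), (d_year), ())");
	        while (rs.next()) {
	            // Each row belongs to one of the three aggregation levels.
	        }
	        stmt.close();
	    }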

	 

	I'm actually trying to see if I can arrange with my company and
the university to sponsor another semester of work on the performance
test harness. Most of the fall semester was consumed with the mechanics
of the technology stack: MySQL, Mondrian, JMeter, MDX. I think another
semester's work could take this to an actual regression test, converting
results to a pass/fail.

	 

	Btw, the data generator generated some discussion before. The
TPC-DS data generator is written in C. The students spent some time
investigating an automated Java port, but hit dead ends. It looks like
that would be a manual effort. I'm not really considering that right
now.

	 

	--jeff

	 

	From: mondrian-bounces at pentaho.org
[mailto:mondrian-bounces at pentaho.org] On Behalf Of Luc Boudreau
	Sent: Tuesday, November 30, 2010 9:09 AM
	To: Mondrian developer mailing list
	Subject: Re: [Mondrian] SSB Schema

	 

	Hi Jeff,
	
	I saw your emails but figured I'd reach out at large. I'm very
happy to hear that your project went forward. Would your students like
to contribute the test suite back to the Mondrian project? I could serve
as a contact for them so we can get this thing done once their work is
finished.
	
	Cheers!
	
	Luc

	On Tue, Nov 30, 2010 at 9:01 AM,
<jeff.s.wright at thomsonreuters.com> wrote:

	I've been working with some CS students at NCSU to create a
Mondrian schema for the TPC-DS database for use in performance testing.
We chose TPC-DS over SSB because it's a larger data model that enables
us to test virtual cubes. They're also creating a set of test queries
and a JMeter test script to execute them via XMLA.

	 

	They're scheduled to be done with their project on Dec 10.

	 

	--Jeff Wright

	 

	From: mondrian-bounces at pentaho.org
[mailto:mondrian-bounces at pentaho.org] On Behalf Of Luc Boudreau
	Sent: Tuesday, November 30, 2010 8:54 AM
	To: Mondrian developer mailing list
	Subject: [Mondrian] SSB Schema

	 

	Hi everyone,
	
	We are looking at adding a performance benchmark into Mondrian's
test suite. Is there anyone who wrote a Mondrian schema for the Star
Schema Benchmark (SSB)? If so, would you share it with the community?
	
	Thanks and please pass the word around!

	
	_______________________________________________
	Mondrian mailing list
	Mondrian at pentaho.org
	http://lists.pentaho.org/mailman/listinfo/mondrian

	 
