[Mondrian] All Members vs Aggregate

Julian Hyde julianhyde at speakeasy.net
Tue May 8 14:44:44 EDT 2007


Thiyagu,
 
The details are there, but some of the larger design issues still need
to be solved. In short, we need a top-down design. Here are a few ideas.
 
To start with, I would like to see a description of the design that a
typical DBA would understand. 'Here is a simple MDX statement, and here
is the SQL with GROUPING SETS it generates'.
 
A slightly more complex case. Consider the case where we load CA, OR,
and all cities in WA. What will the SQL look like for that query? I
don't think it's possible to constrain some grouping sets without
constraining others, so we would end up loading all cities in CA and OR.
That might be problem; maybe we can generate SQL to avoid that, or just
not merge the batches together.
 
What happens if a query requests batches A, B, and C, and B is already
being loaded?
 
What happens if a query requests batches A, B, C, D, and B and D are
better satisfied using aggregate tables? Suppose further that B and D
can be satisfied using the same aggregate table; should we generate
GROUPING SETS syntax against aggregate tables?
 
If a query only has one GROUPING SET, will be use the GROUPING SET
syntax, or just the regular GROUP BY?
 
Concurrency control is a huge issue to be designed. Previously, load
would lock an Aggregate. That is not longer feasible, because load will
affect more than one Aggregate. The POC code continues to lock the first

 
Several places in the code take an array of Aggregations. I think that
they should take an array of Segments instead. This is valid because - I
think - the Segments in a given load will all belong to different
Aggregations. This array of Segments could be refactored into a class,
which could hold some of the newly-static methods.
 
You have added tests in CmdRunnerTest and FastBatchingCellReaderTest. I
think the MDX-level tests should be in BasicQueryTest - you are not
testing new functionality in CmdRunner, after all.
FastBatchingCellReaderTest could be useful; I look forward to seeing it
filled out. If I were writing this feature, I would be adding tests very
similar to those in TestAggregationManager
 
Specific code reviews.
 
AggQuerySpec:

*	

	Use collections (e.g. lists) for modern code. They are
definitely clearer than 2D arrays. For example, 'SqlQuery.ClauseList[]
getGroupingSets(RolapStar.Column[][] columns, SqlQuery sqlQuery)' would
certainly benefit from this.

Aggregation:

*	

	The method
    public synchronized void load(
            RolapStar.Column[] columns,
            RolapStar.Measure[] measures,
            StarColumnPredicate[] predicates,
                     RolapAggregationManager.PinSet pinnedSegments,
                     List<Aggregation> grouping
    )
is effectively static now you have added the 'grouping' parameter. Which
means that the 'synchronized' keyword is now dubious: if you're loading
segments A, B, and C it is not valid to just lock A.
*	

	Check the formatting for parameters in this method and other new
methods in this file.
*	

	setColumns: Making columns mutable seems to be asking for
trouble. I note that setColumns is called from
FastBatchingCellReader.merge, but what if the aggregation concerned has
existing segments? This needs to be re-thought.
*	

	A lot of Aggregation methods have become static. I don't like
static methods. My hunch is that we can refactor out a new class which
represents a collection of segments being loaded at the same time.
*	

	getSegment is poorly named since it returns more than one
segment

AggregationManager:

*	

	loadAggregation has a parameter constrainedColumnsBitKey, and
uses it to call Rolapstar.lookupOrCreateAggregation. That is no longer
appropriate: the aggregations in the grouping will each have a different
bit key. This is a symptom of a larger problem: the aggregations in the
grouping should be treated symmetrically, and there should be no
'primary' aggregation to call 'load' against.
*	

	'spec.generateSqlQuery(grouping)' : shouldn't the grouping be
part of the QuerySpec?

Segment:

*	

	exec() and getObjects() should return Object[][] not Object[].
Even better, return List<Object[]>.
*	

	getObjects: Why does this throw QueryCanceledException? There
are many reasons why ResultSet.next() might throw.

DrillThroughQuerySpec:

*	

	should assert that groupingSet array is empty

CmdRunnerTest:

*	

	move tests out of CmdRunnerTest into BasicQueryTest or similar.
You're not really testing functionality in CmdRunner.

SqlQuery:

*	

	
	Rather than making ClauseList public, I suggest you create a
specific inner class GroupingSetClause. Some of the other helper methods
could be moved to this class.
*	

	SqlQuery.groupingSets should accept StringBuilder rather than
return String.
*	

	'grouping set' makes terminology confusing - not your fault -
but I suggest that you rename 'setGroupingSet(ClauseList[]
groupingSets)' to 'setGroupingSetList(List<ClauseList> groupingSetList)
*	Grouping set names are identifiers. They need to be quoted
according to the dialect.

General:

*	

	use StringBuilder rather than StringBuffer for all new code
*	

	use foreach-style 'for' loops wherever possible
*	

	javadoc

Julian


  _____  

From: mondrian-bounces at pentaho.org [mailto:mondrian-bounces at pentaho.org]
On Behalf Of Thiyagu Palanisamy
Sent: Tuesday, May 08, 2007 7:30 AM
To: mondrian at pentaho.org
Subject: Re: [Mondrian] All Members vs Aggregate



Hello, 

For implementing Sql with Grouping Set we need to change the way the
following classes interact during loading of Cell request data. 

1. fastBatchingCellReader.loadAggregations() -> batch.loadAggregation() 
2. batch.loadAggregation() -> aggregationManager.loadAggregation() 
3. aggregationManager.loadAggregation() -> aggregation.load() 
4. aggregation.load() -> Segment.load() 

Using SQL with Grouping Set function needs to be decided at the Batch
level, once we identify the batches that can be grouped then we need to
tag them together and pass this information down to Segment.load() to
fire single SQL for multiple batches. 

Segment.load() static method needs information about all the segment
instances which are going to be queried together along with some
additional information from Aggregation/Batch class. 

So we need to put together all the segment instances from multiple
grouped batches and pass it to Segment.load(). 

We are thinking of the following approach to achieve this, 

Instead of aggregation.load() calling Segment.load() with its Segments
list, aggregation.load() need to return Segment list of each batch to
the FBCR(or class representing Grouped Batch). 
Then FBCR will group the segments of grouped batches and make a call to
Segment.load static method to fire single sql with grouping sets and
populate all those Segments. 

We are close to finishing a working prototype based on the above
approach. 
Hence would like to get your early feedback to ensure that we are on
right track. 

Code posted at http://forums.pentaho.org/showthread.php?t=53644 is POC
for logic of grouping the Batch and extracting information from a
grouping set sql to multiple segments of multiple batches. 

Thanks, 
Thiyagu 

  


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20070508/ed22c601/attachment.html 


More information about the Mondrian mailing list