[Mondrian] RE: change 9710: aggregating count-distinct over compound cells

Julian Hyde julianhyde at speakeasy.net
Mon Oct 15 13:52:36 EDT 2007


> Rushan Chen wrote:
>
> I was also looking at the caching problem you referred to. I was
> thinking of including in the aggregate cache key a "CellContext" to
> represent the compound member, so the new key will consist of
> ColumnBitKeys and CellContext. Usually CellContext is empty,
> except for aggregates such as count-distinct that are computed for
> compound members and cannot be rolled up from the single cells
> computed for the members that the compound member contains.
> 
> Here is an example of the aggregation cache content storing the
> product and store dimensions. Aggregations are represented by
> AggregationKeys (which will include the CellContext, shown in {}),
> and Cells are represented by the CellKeys.
> 
> Aggregation                            Cell
>
> (store, {[beer], [wine].[Bordeaux]})   (south)                -- aggregate for store "south", over "beer" and "Bordeaux wine"
>                                        (east)                 -- aggregate for store "east", over "beer" and "Bordeaux wine"
>
> (store, beverage group)                (east, beer)           -- aggregate for store "east" and beverage group "beer"
>                                        (west, beer)           -- aggregate for store "west" and beverage group "beer"
>                                        (west, wine)           -- aggregate for store "west" and beverage group "wine"
>
> (store, beverage type)                 (west, Bordeaux wine)  -- aggregate for store "west" and beverage type "Bordeaux wine"
> 
> CellRequest will have to include a similar field too, to distinguish
> the CellContext from the members that constrain the cell. If the
> CellContext and constraining columns match, cells can be accumulated
> along the constraining column and loaded in a single SQL statement.
> 
> Some implications of this idea:
> (1) Aggregate cache lookup and loading will need to be changed quite
> extensively.
> (2) The aggregate cache is effectively segmented into areas that do
> not have a CellContext and areas that do. (Is this what you are
> trying to avoid?)
> (3) CellContext contains sets of compound members, and has a less
> compact representation than columnBitKey. The key portion of the
> aggregate cache will be a lot larger, and the lookup will be slower;
> however, once the matching Aggregation is located, the actual cell
> can be located using just the constraining column values. Also, the
> cell keys stored in the Segment will be the same as if there were no
> compound members.
> 
> Like the original single cell loading scheme, this new one will
> likely take some effort to make work. So I'd really appreciate
> comments on the design, or suggestions for a simpler approach.

I thought of the 'CellContext' approach, and realised it had the same
problem - the entire caching infrastructure needs to be aware of it, and
it's easy to forget about it. Given that compound members are rarely used,
we'd have problems finding all of the bugs. 
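
To make the concern concrete, here is a minimal Java sketch of a
CellContext-style key. The names CellContextKeySketch, AggregationKey and
bits are hypothetical and are not Mondrian's actual classes; java.util.BitSet
stands in for the column bitmap, and column 0 is assumed to be "store". The
point is that key equality depends on both parts, so every piece of code that
builds a key has to remember to supply the context:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; these are not Mondrian's actual classes.
final class CellContextKeySketch {

    /** Cache key made of the constrained-column bitmap plus the compound-member context. */
    static final class AggregationKey {
        final BitSet columnBitKey;
        final Set<String> cellContext; // empty for ordinary (non-compound) cells

        AggregationKey(BitSet columnBitKey, Set<String> cellContext) {
            this.columnBitKey = (BitSet) columnBitKey.clone();
            this.cellContext = new TreeSet<String>(cellContext);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof AggregationKey)) {
                return false;
            }
            AggregationKey that = (AggregationKey) o;
            return columnBitKey.equals(that.columnBitKey)
                && cellContext.equals(that.cellContext);
        }

        @Override
        public int hashCode() {
            return Objects.hash(columnBitKey, cellContext);
        }
    }

    /** Builds a bitmap with the given column ordinals set. */
    static BitSet bits(int... columns) {
        BitSet key = new BitSet();
        for (int column : columns) {
            key.set(column);
        }
        return key;
    }

    public static void main(String[] args) {
        Map<AggregationKey, String> cache = new HashMap<AggregationKey, String>();
        Set<String> compound = new TreeSet<String>();
        compound.add("[beer]");
        compound.add("[wine].[Bordeaux]");
        // Column 0 is assumed to be "store".
        cache.put(new AggregationKey(bits(0), compound), "compound aggregation");

        // Same bitmap, but the caller left out the context: no entry is found.
        System.out.println(cache.get(new AggregationKey(bits(0), new TreeSet<String>())));
        // Same bitmap and same context: the entry is found.
        System.out.println(cache.get(new AggregationKey(bits(0), compound)));
    }
}
```

Any code path that compares or hashes only the bitmap would treat the
compound and ordinary aggregations as the same entry, which is the kind of
bug that is hard to find when compound members are rare.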

I suggested the solution in my previous email - a special column value which
lives in the regular bitmap. A client who is looking for [Beer and wine]
looks in the cache for

  (product_special: 'Beer and wine',
   product_type: null)

Note that 'Beer' and 'Wine' are values of the product_type column, but the
constraint on product_special makes them superfluous. This column would be
part of the regular bitmap, so could not easily be ignored. The cache even
obeys proper rollup semantics. The most difficult problem to overcome is
that these columns need to be added on the fly when someone adds a calc
member of a previously unused dimensionality.
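
As a rough illustration of the special-column idea, the Java sketch below
assigns bitmap positions to columns on first use, so a column such as
product_special can be added on the fly. The names SpecialColumnSketch,
bitFor and bitKey are hypothetical, java.util.BitSet stands in for the
regular bitmap, and treating a null value as "not constrained" is an
assumption based on the example above:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; this is not Mondrian's actual implementation.
final class SpecialColumnSketch {
    // Column name -> bit position, in order of first use.
    private final Map<String, Integer> columnBits = new LinkedHashMap<String, Integer>();

    /** Returns the bit for a column, registering the column on first use. */
    int bitFor(String column) {
        Integer bit = columnBits.get(column);
        if (bit == null) {
            bit = columnBits.size();
            columnBits.put(column, bit);
        }
        return bit;
    }

    /** Builds the bitmap of constrained columns; null values are assumed to mean "unconstrained". */
    BitSet bitKey(Map<String, String> constraints) {
        BitSet key = new BitSet();
        for (Map.Entry<String, String> e : constraints.entrySet()) {
            if (e.getValue() != null) {
                key.set(bitFor(e.getKey()));
            }
        }
        return key;
    }

    public static void main(String[] args) {
        SpecialColumnSketch columns = new SpecialColumnSketch();
        columns.bitFor("product_type"); // regular columns known up front
        columns.bitFor("store");

        Map<String, String> compound = new LinkedHashMap<String, String>();
        compound.put("product_special", "Beer and wine");
        compound.put("product_type", null);

        Map<String, String> ordinary = new LinkedHashMap<String, String>();
        ordinary.put("product_type", "Beer");

        // The two requests differ in the bitmap itself, with no extra context field.
        System.out.println(columns.bitKey(compound));
        System.out.println(columns.bitKey(ordinary));
    }
}
```

Because the compound request sets a different bit than an ordinary
product_type request, existing code that keys on the bitmap already tells the
two apart.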

Julian



