[Mondrian] Aggregate Tables Not Being Used

Nikolas Everett nik9000 at gmail.com
Tue Jan 13 10:15:49 EST 2009


I'm experimenting with Mondrian to see if it will fit our reporting needs.
I've set up a small fact table backed into Postgres 8.3 with 241116 rows.
Thats about 1 day's worth of data for this fact table.  I'm definitely going
to need aggregate tables to make this sensible.  I won't have a problem
making the tables and have a great server to host the production system, but
I need to be sure that I don't hit the actual fact table when I don't want
to.  Is there some utility that I can use to plan my queries to see what
tables they'll hit?

Also, I'm having trouble getting aggregate tables to work at all.  I'd be
super appreciative if someone could help me with that.

This is an abbreviated cube:

>   <Cube name="Signals">
>     <Table name="signalreportdata">
>       <AggName name="signalreportdata_hour">
>         <AggFactCount column="fact_count"/>
>         <AggMeasure name="[Measures].[Count]" column="count"/>
>         <AggForeignKey factColumn="hour_id" aggColumn="hour_id"/>
>       </AggName>
>     </Table>
>     <DimensionUsage name="Request" source="Request"
> foreignKey="request_id"/>
>     <Dimension name="Response">
>       <Hierarchy hasAll="true" allMemberName="All Responses">
>         <Level name="Response" column="response" uniqueMambers="false"/>
>       </Hierarchy>
>     </Dimension>
>     <DimensionUsage name="OutPeer" source="Peer" foreignKey="outpeer_id"
> usagePrefix="Out"/>
>     <DimensionUsage name="InPeer" source="Peer" foreignKey="inpeer_id"
> usagePrefix="In"/>
>     <DimensionUsage name="Timestamp" source="Hour" foreignKey="hour_id"/>
>     <DimensionUsage name="File" source="File" foreignKey="file_id"/>
>     <Measure name="Count" column="count" aggregator="sum"
> datatype="Integer"/>
>   </Cube>
>

The fact table look like:

> CREATE TABLE signalreportdata
> (
>   id bigint NOT NULL,
>   "version" bigint NOT NULL,
>   count bigint NOT NULL,
>   response integer NOT NULL,
>   file_id bigint NOT NULL,
>   hour_id bigint NOT NULL,
>   inpeer_id bigint NOT NULL,
>   outpeer_id bigint NOT NULL,
>   request_id bigint NOT NULL,
>   CONSTRAINT signalreportdata_pkey PRIMARY KEY (id),
>   CONSTRAINT fkfe865be64c75fae7 FOREIGN KEY (hour_id)
>       REFERENCES "hour" (id) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT fkfe865be6548c3162 FOREIGN KEY (inpeer_id)
>       REFERENCES peer (id) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT fkfe865be6936dccef FOREIGN KEY (file_id)
>       REFERENCES signalfile (id) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT fkfe865be69f63f48d FOREIGN KEY (request_id)
>       REFERENCES request (id) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT fkfe865be6a2e07f19 FOREIGN KEY (outpeer_id)
>       REFERENCES peer (id) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (OIDS=FALSE);
>

The aggregate table looks like:

> CREATE TABLE signalreportdata_hour
> (
>   hour_id bigint,
>   count numeric
> )
> WITH (OIDS=FALSE);
>

I brought this online by dropping the mondrian-embedded war into jetty and
removing FoodMart and adding my schema.  I also hacked up the JSPs to point
to my schema and dropped the postgres jdbc driver in the WEB-INF/lib
directory.

I've covinced mondrian to be super verbose about its logging.  I think this
is the relevant bit:

> 443  [1473444918 at qtp0-7] DEBUG mondrian.rolap.aggmatcher.ExplicitRules  -
> ExplicitRules.Group:
>   name=signalreportdata
>   TableDefs: [
>     ExplicitRules.NameTableDef:
>       id=0
>       ignoreCase=true
>       Levels: [
>       ]
>       Measures: [
>         Measure:
>           name=[Measures].[Count]
>           column=count
>       ]
>       name=signalreportdata_hour
>   ]
>
>
> 446  [1473444918 at qtp0-7] DEBUG mondrian.rolap.aggmatcher.AggTableManager
> -
> RolapStar:
>   Table:
>     alias=signalreportdata
>     relation=signalreportdata
>     Columns:
>       Response (1): "signalreportdata"."response"
>       Count (7): sum("signalreportdata"."count")
>     Table:
>       alias=request
>       relation=request
>       Columns:
>         Request (0): "request"."name"
>       Condition:
>         left="signalreportdata"."request_id"
>         right="request"."id"
>     Table:
>       alias=peer
>       relation=peer
>       Columns:
>         Hostname (2): "peer"."hostname"
>       Condition:
>         left="signalreportdata"."outpeer_id"
>         right="peer"."id"
>     Table:
>       alias=peer_1
>       relation=peer
>       Columns:
>         Hostname (3): "peer_1"."hostname"
>       Condition:
>         left="signalreportdata"."inpeer_id"
>         right="peer_1"."id"
>     Table:
>       alias=hour
>       relation=hour
>       Columns:
>         Timestamp (4): "hour"."timestamp"
>       Condition:
>         left="signalreportdata"."hour_id"
>         right="hour"."id"
>     Table:
>       alias=signalfile_view
>       relation=signalfile_view
>       Columns:
>         Server (5): "signalfile_view"."server"
>         File (6): "signalfile_view"."filename"
>       Condition:
>         left="signalreportdata"."file_id"
>         right="signalfile_view"."id"
>
>
> 446  [1473444918 at qtp0-7] DEBUG mondrian.rolap.RolapSchema  - Pool.get:
> create schema
> "file:/home/nik/Desktop/jetty/jetty-6.1.14/webapps/mondrian/WEB-INF/queries/Signals.xml"
> 451  [1473444918 at qtp0-7] DEBUG mondrian.rolap.RolapConnection  -
>
> select
>   [Measures].[Count] on columns,
>   {([Timestamp].[All Timestamps], [InPeer].[All InPeers], [OutPeer].[All
> OutPeers], [Request].[All Requests], [Response].[All Responses], [File].[All
> Files])} ON rows
> from Signals
>
>

The query above gets evaluated run as

select sum("signalreportdata"."count") as "m0" from "signalreportdata" as
"signalreportdata"

when it probably ought to have been
select sum("signalreportdata_hour"."count") as "m0" from
"signalreportdata_hour" as "signalreportdata_hour"

Its not like there is a big speed difference now, but there will be when my
primary fact table has over one hundred million rows.

Thanks so much your help, good list.

Nik Everett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20090113/615b6708/attachment.html 


More information about the Mondrian mailing list