[Mondrian] duplicate foreign keys in foodmart

Etienne Dube etdube at gmail.com
Fri Sep 25 10:33:59 EDT 2009


Hi Mondrian developers,

I posted this a few days ago in the Pentaho forums but I haven't got an 
answer yet. Maybe one of you could enlighten me on that.

------------------------------
While toying around with the foodmart DB, I noticed that some of the 
facts in the sales_fact_1997 and sales_fact_1998 tables have duplicate 
foreign keys. For example, if I run the following SQL query to retrieve 
rows for which there is more than one occurrence of a combination of 
values for the foreign keys:

select a.* from sales_fact_1997 as a,
(select product_id, time_id, customer_id, promotion_id, store_id, count(*)
from sales_fact_1997
group by product_id, time_id, customer_id, promotion_id, store_id
having count(*) > 1) as b
where a.product_id = b.product_id and a.time_id = b.time_id and 
a.customer_id = b.customer_id
and a.promotion_id = b.promotion_id and a.store_id = b.store_id;

this yields these rows (sorry for the ugly formatting):

product_id    time_id    customer_id    promotion_id    store_id   
 store_sales    store_cost    unit_sales
----------    -------    -----------    ------------    --------   
 -----------    ----------    ----------
290            488        1690        0                16           
 5.1000        2.2440        3.0000
290            488        1690        0                16           
 5.1000        1.8870        3.0000
485            692        1850        1235            16           
 6.5700        2.3652        3.0000
485            692        1850        1235            16           
 8.7600        4.2924        4.0000
800            725        2244        0                17           
 9.7800        4.1076        3.0000
800            725        2244        0                17           
 6.5200        2.3472        2.0000
1262        381        3386        1845            24           
 2.2800        1.0716        3.0000
1262        381        3386        1845            24           
 3.0400        1.2160        4.0000
1539        482        3491        1220            16           
 14.3200        6.7304        4.0000
1539        482        3491        1220            16           
 10.7400        4.4034        3.0000
1381        655        4903        0                3           
 6.5100        2.5389        3.0000
1381        655        4903        0                3           
 4.3400        2.0832        2.0000
42            630        5026        0                17           
 11.2800        3.7224        4.0000
42            630        5026        0                17           
 5.6400        2.3124        2.0000
883            466        9535        824                11           
 8.7000        3.1320        3.0000
883            466        9535        824                11           
 14.5000        4.4950        5.0000

Notice the rows with duplicate foreign keys values, but having different 
values in the measures columns. I thought that usually (according to 
Kimball's Data Warehouse Toolkit), foreign keys in a fact table should 
form a composite primary key. Of course in the case of that table, this 
doesn't hold true and creation of a primary key fails due to the 
duplicate values.

Is that a normal and sane design choice, or is there something wrong or 
unusual with this fact table? Will Mondrian do the right thing, i.e. 
aggregate the measure values, when there are duplicate facts (w/ same keys)?

Thanks,

Etienne




More information about the Mondrian mailing list