[Mondrian] duplicate foreign keys in foodmart
Etienne Dube
etdube at gmail.com
Fri Sep 25 10:33:59 EDT 2009
Hi Mondrian developers,
I posted this a few days ago in the Pentaho forums but I haven't got an
answer yet. Maybe one of you could enlighten me on that.
------------------------------
While toying around with the foodmart DB, I noticed that some of the
facts in the sales_fact_1997 and sales_fact_1998 tables have duplicate
foreign keys. For example, if I run the following SQL query to retrieve
rows for which there is more than one occurrence of a combination of
values for the foreign keys:
select a.* from sales_fact_1997 as a,
(select product_id, time_id, customer_id, promotion_id, store_id, count(*)
from sales_fact_1997
group by product_id, time_id, customer_id, promotion_id, store_id
having count(*) > 1) as b
where a.product_id = b.product_id and a.time_id = b.time_id and
a.customer_id = b.customer_id
and a.promotion_id = b.promotion_id and a.store_id = b.store_id;
this yields these rows (sorry for the ugly formatting):
product_id time_id customer_id promotion_id store_id
store_sales store_cost unit_sales
---------- ------- ----------- ------------ --------
----------- ---------- ----------
290 488 1690 0 16
5.1000 2.2440 3.0000
290 488 1690 0 16
5.1000 1.8870 3.0000
485 692 1850 1235 16
6.5700 2.3652 3.0000
485 692 1850 1235 16
8.7600 4.2924 4.0000
800 725 2244 0 17
9.7800 4.1076 3.0000
800 725 2244 0 17
6.5200 2.3472 2.0000
1262 381 3386 1845 24
2.2800 1.0716 3.0000
1262 381 3386 1845 24
3.0400 1.2160 4.0000
1539 482 3491 1220 16
14.3200 6.7304 4.0000
1539 482 3491 1220 16
10.7400 4.4034 3.0000
1381 655 4903 0 3
6.5100 2.5389 3.0000
1381 655 4903 0 3
4.3400 2.0832 2.0000
42 630 5026 0 17
11.2800 3.7224 4.0000
42 630 5026 0 17
5.6400 2.3124 2.0000
883 466 9535 824 11
8.7000 3.1320 3.0000
883 466 9535 824 11
14.5000 4.4950 5.0000
Notice the rows with duplicate foreign keys values, but having different
values in the measures columns. I thought that usually (according to
Kimball's Data Warehouse Toolkit), foreign keys in a fact table should
form a composite primary key. Of course in the case of that table, this
doesn't hold true and creation of a primary key fails due to the
duplicate values.
Is that a normal and sane design choice, or is there something wrong or
unusual with this fact table? Will Mondrian do the right thing, i.e.
aggregate the measure values, when there are duplicate facts (w/ same keys)?
Thanks,
Etienne
More information about the Mondrian
mailing list