[Mondrian] Spot the mistake
tom at wamonline.org.uk
tom at wamonline.org.uk
Fri Mar 12 07:03:44 EST 2010
BODY { font-family:Arial, Helvetica, sans-serif;font-size:12px;
Take 2(Just closed the tab down on my first email, not clever)
Thanks for the reply Julian I've been begging for a sensible
explanation on IRC but have failed to get anything that I actually
understand:
My dimension looks like this:
CONTENT_TK VERSION CONTENT_ID CONTENT_TITLE
ISRC_UPC_CODE ARTIST_NAME
========== ======= ==========
============= ============= ===========
264 1 373
Ghostbusters 1521
Theme
8664 2 373
Ghostbusters 1521
Theme
2911 1 3143
Ghostbusters Theme- 49236982 Theme
5664 2 3143
Ghostbusters 49236982
Theme
5910 3 3143
Ghostbusters N/A
Theme
So whilst it looks a bit random, this is actually correct because we
have 2 ghostbusters themes by different content providers, but I don't
want the content provider information in this hierarchy.
Also if they are going to title the piece of content the same, I
can't really prevent that either.
So when I run this:
select NON EMPTY {[Measures].[Content Purchases]} ON COLUMNS, NON
EMPTY Hierarchize(Union(Union({[Content].[All Content]},
[Content].[All Content].Children), [Content].[All
Content].[Theme].Children)) ON ROWSfrom [Purchases]
The member lookup returns this:
CONTENT_ID CONTENT_TITLE CONTENT_TITLE==========
============= ============= 373
Ghostbusters Ghostbusters 3143
Ghostbusters Ghostbusters
So we clearly have 2 distinct members, but when the query is
executed:
`D_CONTENT`.`CONTENT_ID` in ('318', '324', '325', '326', '336',
'350', '354', '356', '373', '374', '381', '393', '395', '397',
'403', '406', '410', '414', '424', '427', '434', '436', '442',
'454', '455', '456', '462', '466', '475', '476', '480', '482',
'487', '495', '503', '510', '532', '540', '541', '3013', '3014',
'3015', '3025', '3026', '3027', '3028', '3029', '3030', '3031',
'3032', '3033', '3034', '3035', '3036', '3037', '3038', '3039',
'3040', '3041', '3042', '3043', '3044', '3045', '3046', '3047',
'3048', '3049', '3050', '3051', '3052', '3054', '3055', '3056',
'3057', '3058', '3059', '3060', '3061', '3062', '3063', '3064',
'3065', '3066', '3067', '3068', '3070', '3071', '3073', '3074',
'3075', '3076', '3077', '3078', '3079', '3080', '3081', '3082',
'3083', '3084', '3085', '3086', '3087', '3088', '3089', '3090',
'3092', '3096', '3116', '3139', '3140', '3141', '3142', '3144',
'3145', '3147', '3148', '3156', '3157', '3176', '3427', '3445',
'3538', '3664', '5780', '5781', '5782', '5819', '5820', '5822',
'5823')
The latter is no longer in the query, which is one of the bits I'm
failing to understand.
Then we move on to the working query, which is bascially just a
trimmed down version of query 1:
select {[Measures].[Content Purchases]} ON COLUMNS,
[Content].[All Content].[Theme].Children ON ROWS
from [Purchases]
When this does its member look up it returns this:
COnTENT_ID CONTENT_TITLE CONTENT_ID ISRC_CODE
========== ============= ========== =========
373 Ghostbusters 373
1521
3143 Ghostbusters Theme- 3143
49236982
3143 Ghostbusters 3143
49236982
3143 Ghostbusters 3143
N/A
And then the member is included in the lookup and returns what I
would call the correct result.
Now, both you and on IRC is was mentioned that my ETL may be at
fault, but I don't see quite what I could do to alter the fact,
people have to change the content_title, I can't prevent that, also
in the cube I have to use CONTENT_TITLE as a nameColumn, because
obviously CONTENT_ID makes little sense, and as I mentioned I can't
use it on a captionColumn where it seems to make the difference.
Should the lookup not use CONTENT_ID as the unique id and then run
with whatever else on the nameColumn.
Either way, if it grouped everything in 1/2 or 3 Ghostbusters
members, that would be preferential to it just not fetching 4000
purchases.
Cheers
Tom
On Fri 12/03/10 7:01 AM , "Julian Hyde" jhyde at pentaho.com sent:
This might not be it, but here's something to check anyway.
Make sure that your caption, name etc. are functionally dependent on
the level key. For example, consider the following fictitious
dimension table for a Product dimension with levels Product and
Manufacturer: MFR_ID MFR_NAME PROD_ID PROD_NAME ====== ========
======= ========= 1 Sony 100 Walkman 1 Hitachi 101
Playstation 2 Toyota 102 Prius 2 Toyota 103
Camry Mondrian assumes that MFR_NAME is functionally dependent on
MFR_ID, but it isn't for MFR_ID = 1. Depending on whether the query
selects MFR_NAME, you'll either get 2 or 3 distinct manufacturers.
This applies to attributes of levels (name, caption, ordinal, and
parent if it is a parent-child level) and any member properties of
that level. You might think this is lame, but for efficiency,
Mondrian gets all the properties of a member using 'SELECT DISTINCT'
(or SELECT ... GROUP BY, same thing). Basically, Mondrian delivers
good performance by trusting that you have done your job properly at
ETL time. By the way, there is a similar trade-off where Mondrian
assumes that every foreign key from the fact table (or indeed between
tables in a snowflake dimension) has referential integrity. Schema
workbench ought to have a validate mode that checks this stuff. This
is the kind of thing we might put into a future release of Enterprise
Edition. Julian
-------------------------
From: mondrian-bounces at pentaho.org
[mailto:mondrian-bounces at pentaho.org] On Behalf Of Ati Rosselet
Sent: Thursday, March 11, 2010 7:27 AM
To: tom at wamonline.org.uk; Mondrian developer mailing list
Subject: Re: Re: [Mondrian] Spot the mistake
try posting your cube definition as well.. that might help
someone spot a problem...
On Thu, Mar 11, 2010 at 4:19 PM, wrote:
Ok here's my analysis
If I remove the nameColumn definition from that row as I said
earlier jpivot then returns the results I would expect.
If I then add CONTENT_TITLE to captionColumn then it returns
the 3 records in the resultset.
Which is fine in JPivot..... but PRD doesn't support
captionColumn.
This must also be surely quite a serious flaw as people's
resultset's could be correct for 90% of queries and then just
randomly drop stuff silently?
Cheers
Tom
On Thu 11/03/10 2:26 PM , tom at wamonline.org.uk [2] sent:
Try it to the list this time:
Changing uniquemembers has no affect
:)
Tom
On Thu 11/03/10 1:24 PM , Ati Rosselet ati.rosselet at gmail.com
[3] sent:
hmm... maybe you have unique set to true for the
name or caption columns?
On Thu, Mar 11, 2010 at 1:41 PM, wrote:
Ok then,
If I remove the nameColumn and captionColumn from the
Content Dimension Track Name level, so I'm left with bare
content id's the records reappear.
Bug?
On Thu 11/03/10 12:29 PM , tom at wamonline.org.uk [4] sent:
Alright
guys I'm having some weird issues with my scd and
mondrian:
Basically in our dimension we have 3 records with the
same artist name, same track name, but differing
content_id's.
If I do this mdx query:
select {[Measures].[Content Purchases]} ON COLUMNS,
[Content].[All Content].[Theme].Children ON
ROWS
from [Purchases]
It returns 3 different with records with the various
sales against them this is the correct result, as there
are 3 different dimension entries for the Ghostbusters
Theme tune.
But if for example I run:
select NON EMPTY {[Measures].[Content Purchases]} ON
COLUMNS,
NON EMPTY Hierarchize(Union(Union({[Content].[All
Content]}, [Content].[All Content].Children), [Content].[All
Content].[Theme].Children)) ON ROWS
from [Purchases]
Which is mdx generated by jpivot, it only returns the
first member from the content dimension, even though
when it runs the sql queries to retrieve members the
other ghostbuster records are returned in the initial
member search, if that makes sense.
This is the SQL generated for the member lookup:
select `D_CONTENT`.`CONTENT_ID` as `c0`,
`D_CONTENT`.`CONTENT_TITLE` as `c1`,
`D_CONTENT`.`CONTENT_TITLE` as `c2` from `D_CONTENT` as
`D_CONTENT`, `F_CONTENT_PURCHASES` as `F_CONTENT_PURCHASES` where
`F_CONTENT_PURCHASES`.`CONTENT_TK` = `D_CONTENT`.`CONTENT_TK`
and `D_CONTENT`.`ARTIST_NAME` = 'Theme' group by
`D_CONTENT`.`CONTENT_ID`, `D_CONTENT`.`CONTENT_TITLE` order by
ISNULL(`D_CONTENT`.`CONTENT_ID`), `D_CONTENT`.`CONTENT_ID`
ASC
This is the SQL generated for looking up the records in
the fact:
select `D_CONTENT`.`ARTIST_NAME` as `c0`,
`D_CONTENT`.`CONTENT_ID` as `c1`,
count(`F_CONTENT_PURCHASES`.`TRANSACTION_ID`) as `m0` from
`D_CONTENT` as `D_CONTENT`, `F_CONTENT_PURCHASES` as
`F_CONTENT_PURCHASES` where `F_CONTENT_PURCHASES`.`CONTENT_TK` =
`D_CONTENT`.`CONTENT_TK` and `D_CONTENT`.`ARTIST_NAME` =
'Theme' and `D_CONTENT`.`CONTENT_ID` in ('318', '324',
'325', '326', '336', '350', '354', '356', '373', '374',
'381', '393', '395', '397', '403', '406', '410', '414',
'424', '427', '434', '436', '442', '454', '455', '456',
'462', '466', '475', '476', '480', '482', '487', '495',
'503', '510', '532', '540', '541', '3013', '3014',
'3015', '3025', '3026', '3027', '3028', '3029', '3030', '3031',
'3032', '3033', '3034', '3035', '3036', '3037', '3038', '3039',
'3040', '3041', '3042', '3043', '3044', '3045', '3046',
'3047', '3048', '3049', '3050', '3051', '3052', '3054',
'3055', '3056', '3057', '3058', '3059', '3060', '3061',
'3062', '3063', '3064', '3065', '3066', '3067', '3068',
'3070', '3071', '3073', '3074', '3075', '3076', '3077',
'3078', '3079', '3080', '3081', '3082', '3083', '3084',
'3085', '3086', '3087', '3088', '3089', '3090', '3092',
'3096', '3116', '3139', '3140', '3141', '3142', '3144',
'3145', '3147', '3148', '3156', '3157', '3176', '3427', '3445',
'3538', '3664', '5780', '5781', '5782', '5819', '5820', '5822',
'5823') group by `D_CONTENT`.`ARTIST_NAME`,
`D_CONTENT`.`CONTENT_ID`]
The content id for the missing records obviously isn't
included in the Where.. IN().
What procedural error have I missed?
Cheers
Tom
_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian [5]
_______________________________________________
Mondrian mailing list
Mondrian at pentaho.org [6]
http://lists.pentaho.org/mailman/listinfo/mondrian [7]
Links:
------
[1] mailto:tom at wamonline.org.uk
[2] mailto:tom at wamonline.org.uk
[3] mailto:ati.rosselet at gmail.com
[4] mailto:tom at wamonline.org.uk
[5] http://lists.pentaho.org/mailman/listinfo/mondrian
[6] mailto:Mondrian at pentaho.org
[7] http://lists.pentaho.org/mailman/listinfo/mondrian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20100312/09754d60/attachment.html
More information about the Mondrian
mailing list