[Mondrian] Spot the mistake

tom at wamonline.org.uk tom at wamonline.org.uk
Fri Mar 12 07:03:44 EST 2010


  BODY { font-family:Arial, Helvetica, sans-serif;font-size:12px; 
 Take 2(Just closed the tab down on my first email, not clever)
 Thanks for the reply Julian I've been begging for a sensible
explanation on IRC but have failed to get anything that I actually
understand:
 My dimension looks like this:
 CONTENT_TK      VERSION          CONTENT_ID      CONTENT_TITLE  
ISRC_UPC_CODE  ARTIST_NAME
   ==========         =======          ==========       
=============        =============    ===========
           264                         1                       373   
                  Ghostbusters                   1521                 
       Theme
          8664                        2                      373     
                 Ghostbusters                   1521                  
      Theme
          2911                        1                    3143      
                Ghostbusters Theme-   49236982                 Theme
          5664                        2                    3143      
                Ghostbusters                  49236982               
Theme
          5910                        3                     3143     
                Ghostbusters                    N/A                   
      Theme
 So whilst it looks a bit random, this is actually correct because we
have 2 ghostbusters themes by different content providers, but I don't
want the content provider information in this hierarchy.
 Also if they are going to title the piece of content the same, I
can't really prevent that either.
 So when I run this:
 select NON EMPTY {[Measures].[Content  Purchases]} ON COLUMNS,  NON
EMPTY  Hierarchize(Union(Union({[Content].[All Content]},
[Content].[All  Content].Children), [Content].[All
Content].[Theme].Children)) ON ROWSfrom [Purchases]
 The member lookup returns this:
  CONTENT_ID      CONTENT_TITLE    CONTENT_TITLE==========        
=============       =============      373                    
Ghostbusters          Ghostbusters     3143                  
Ghostbusters           Ghostbusters
 So we clearly have 2 distinct members, but when the query is
executed:
  `D_CONTENT`.`CONTENT_ID` in ('318', '324', '325', '326', '336',
'350',  '354', '356', '373', '374', '381', '393', '395', '397',
'403', '406',  '410', '414', '424', '427', '434', '436', '442',
'454', '455', '456',  '462', '466', '475', '476', '480', '482',
'487', '495', '503', '510',  '532', '540', '541', '3013', '3014',
'3015', '3025', '3026', '3027',  '3028', '3029', '3030', '3031',
'3032', '3033', '3034', '3035', '3036',  '3037', '3038', '3039',
'3040', '3041', '3042', '3043', '3044', '3045',  '3046', '3047',
'3048', '3049', '3050', '3051', '3052', '3054', '3055',  '3056',
'3057', '3058', '3059', '3060', '3061', '3062', '3063', '3064', 
'3065', '3066', '3067', '3068', '3070', '3071', '3073', '3074',
'3075',  '3076', '3077', '3078', '3079', '3080', '3081', '3082',
'3083', '3084',  '3085', '3086', '3087', '3088', '3089', '3090',
'3092', '3096', '3116',  '3139', '3140', '3141', '3142', '3144',
'3145', '3147', '3148', '3156',  '3157', '3176', '3427', '3445',
'3538', '3664', '5780', '5781', '5782',  '5819', '5820', '5822',
'5823')
 The latter is no longer in the query, which is one of the bits I'm
failing to understand.
 Then we move on to the working query, which is bascially just a
trimmed down version of query 1:
 select {[Measures].[Content Purchases]} ON COLUMNS,
    [Content].[All Content].[Theme].Children ON ROWS
 from [Purchases]
 When this does its member look up it returns this:
 COnTENT_ID    CONTENT_TITLE        CONTENT_ID    ISRC_CODE
 ==========    =============              ==========    =========
         373                 Ghostbusters                      373   
               1521
         3143              Ghostbusters Theme-      3143             
49236982        
          3143             Ghostbusters                     3143     
        49236982         
          3143             Ghostbusters                      3143    
               N/A 
 And then the member is included in the lookup and returns what I
would call the correct result.
 Now, both you and on IRC is was mentioned that my ETL may be at
fault, but I don't see quite what I could do to alter the fact,
people have to change the content_title, I can't prevent that, also
in the cube I have to use CONTENT_TITLE as a nameColumn, because
obviously CONTENT_ID makes little sense, and as I mentioned I can't
use it on a captionColumn where it seems to make the difference.
Should the lookup not use CONTENT_ID as the unique id and then run
with whatever else on the nameColumn.
 Either way, if it grouped everything in 1/2 or 3 Ghostbusters
members, that would be preferential to it just not fetching 4000
purchases.
 Cheers
 Tom
 On Fri 12/03/10  7:01 AM , "Julian Hyde" jhyde at pentaho.com sent:
    This might not be it, but here's something to check  anyway.  
Make sure that your caption, name etc. are functionally  dependent on
the level key. For example, consider the following fictitious 
dimension table for a Product dimension with levels Product and 
Manufacturer:   MFR_ID MFR_NAME PROD_ID PROD_NAME ====== ========
======= ========= 1      Sony     100     Walkman 1      Hitachi  101
     Playstation 2      Toyota    102     Prius 2      Toyota    103  
  Camry   Mondrian assumes that MFR_NAME is functionally dependent on 
MFR_ID, but it isn't for MFR_ID = 1. Depending on whether the query
selects  MFR_NAME, you'll either get 2 or 3 distinct manufacturers.  
This applies to attributes of levels (name, caption, ordinal,  and
parent if it is a parent-child level) and any member properties of
that  level.   You might think this is lame, but for efficiency,
Mondrian  gets all the properties of a member using 'SELECT DISTINCT'
(or SELECT ... GROUP  BY, same thing). Basically, Mondrian delivers
good performance by trusting that  you have done your job properly at
ETL time.   By the way, there is a similar trade-off where Mondrian 
assumes that every foreign key from the fact table (or indeed between
tables in  a snowflake dimension) has referential integrity.   Schema
workbench ought to have a validate mode that checks  this stuff. This
is the kind of thing we might put into a future release of  Enterprise
Edition.   Julian    
-------------------------
   From: mondrian-bounces at pentaho.org   
[mailto:mondrian-bounces at pentaho.org] On Behalf Of Ati    Rosselet
 Sent: Thursday, March 11, 2010 7:27 AM
 To:    tom at wamonline.org.uk; Mondrian developer mailing list
 Subject: Re:    Re: [Mondrian] Spot the mistake
    try posting your cube definition as well.. that might help
someone    spot a problem...
   On Thu, Mar 11, 2010 at 4:19 PM,     wrote:
   Ok here's my analysis
 If I remove the nameColumn      definition from that row as I said
earlier jpivot then returns the results I      would expect.
 If I then add CONTENT_TITLE to captionColumn then it      returns
the 3 records in the resultset.
 Which is fine in JPivot.....      but PRD doesn't support
captionColumn.
 This must also be surely quite      a serious flaw as people's
resultset's could be correct for 90% of queries      and then just
randomly drop stuff      silently?
 Cheers
 Tom
 On      Thu 11/03/10 2:26 PM , tom at wamonline.org.uk [2] sent:
 Try        it to the list this time: 
 Changing uniquemembers has no affect        
 :)
 Tom
 On Thu        11/03/10 1:24 PM , Ati Rosselet ati.rosselet at gmail.com
[3] sent:
        hmm...          maybe you have unique set to true for the
name or caption          columns?
         On Thu, Mar 11, 2010 at 1:41 PM,  wrote:
 Ok then,
 If I remove the nameColumn and            captionColumn from the
Content Dimension Track Name level, so I'm left            with bare
content id's the records reappear.            
 Bug?
 On Thu            11/03/10 12:29 PM , tom at wamonline.org.uk [4] sent:
                                             Alright             
guys I'm having some weird issues with my scd and             
mondrian:
 Basically in our dimension we have 3 records with              the
same artist name, same track name, but differing             
content_id's.
 If I do this mdx query:
 select              {[Measures].[Content Purchases]} ON COLUMNS,    
           [Content].[All Content].[Theme].Children ON             
ROWS
 from [Purchases]
 It returns 3 different with              records with the various
sales against them this is the correct              result, as there
are 3 different dimension entries for the              Ghostbusters
Theme tune.
 But if for example I run:
 select NON EMPTY {[Measures].[Content Purchases]} ON             
COLUMNS,
   NON EMPTY Hierarchize(Union(Union({[Content].[All             
Content]}, [Content].[All Content].Children), [Content].[All         
    Content].[Theme].Children)) ON ROWS
 from [Purchases]
 Which              is mdx generated by jpivot, it only returns the
first member from              the content dimension, even though
when it runs the sql queries to              retrieve members the
other ghostbuster records are returned in the              initial
member search, if that makes sense.
 This is the              SQL generated for the member lookup:
 select              `D_CONTENT`.`CONTENT_ID` as `c0`,
`D_CONTENT`.`CONTENT_TITLE` as              `c1`,
`D_CONTENT`.`CONTENT_TITLE` as `c2` from `D_CONTENT` as             
`D_CONTENT`, `F_CONTENT_PURCHASES` as `F_CONTENT_PURCHASES` where    
         `F_CONTENT_PURCHASES`.`CONTENT_TK` = `D_CONTENT`.`CONTENT_TK`
and              `D_CONTENT`.`ARTIST_NAME` = 'Theme' group by         
    `D_CONTENT`.`CONTENT_ID`, `D_CONTENT`.`CONTENT_TITLE` order by    
         ISNULL(`D_CONTENT`.`CONTENT_ID`), `D_CONTENT`.`CONTENT_ID`   
          ASC
 This is the SQL generated for looking up the records in             
the fact:
 select `D_CONTENT`.`ARTIST_NAME` as `c0`,             
`D_CONTENT`.`CONTENT_ID` as `c1`,             
count(`F_CONTENT_PURCHASES`.`TRANSACTION_ID`) as `m0` from           
  `D_CONTENT` as `D_CONTENT`, `F_CONTENT_PURCHASES` as             
`F_CONTENT_PURCHASES` where `F_CONTENT_PURCHASES`.`CONTENT_TK` =     
        `D_CONTENT`.`CONTENT_TK` and `D_CONTENT`.`ARTIST_NAME` =
'Theme' and              `D_CONTENT`.`CONTENT_ID` in ('318', '324',
'325', '326', '336',              '350', '354', '356', '373', '374',
'381', '393', '395', '397',              '403', '406', '410', '414',
'424', '427', '434', '436', '442',              '454', '455', '456',
'462', '466', '475', '476', '480', '482',              '487', '495',
'503', '510', '532', '540', '541', '3013', '3014',             
'3015', '3025', '3026', '3027', '3028', '3029', '3030', '3031',      
       '3032', '3033', '3034', '3035', '3036', '3037', '3038', '3039',
             '3040', '3041', '3042', '3043', '3044', '3045', '3046',
'3047',              '3048', '3049', '3050', '3051', '3052', '3054',
'3055', '3056',              '3057', '3058', '3059', '3060', '3061',
'3062', '3063', '3064',              '3065', '3066', '3067', '3068',
'3070', '3071', '3073', '3074',              '3075', '3076', '3077',
'3078', '3079', '3080', '3081', '3082',              '3083', '3084',
'3085', '3086', '3087', '3088', '3089', '3090',              '3092',
'3096', '3116', '3139', '3140', '3141', '3142', '3144',             
'3145', '3147', '3148', '3156', '3157', '3176', '3427', '3445',      
       '3538', '3664', '5780', '5781', '5782', '5819', '5820', '5822',
             '5823') group by `D_CONTENT`.`ARTIST_NAME`,             
`D_CONTENT`.`CONTENT_ID`]
 The content id for the missing              records obviously isn't
included in the Where.. IN().
 What              procedural error have I              missed?
 Cheers
 Tom
 _______________________________________________
 Mondrian            mailing list
 Mondrian at pentaho.org
 http://lists.pentaho.org/mailman/listinfo/mondrian [5]
 _______________________________________________
 Mondrian      mailing list
 Mondrian at pentaho.org [6]
 http://lists.pentaho.org/mailman/listinfo/mondrian [7]


Links:
------
[1] mailto:tom at wamonline.org.uk
[2] mailto:tom at wamonline.org.uk
[3] mailto:ati.rosselet at gmail.com
[4] mailto:tom at wamonline.org.uk
[5] http://lists.pentaho.org/mailman/listinfo/mondrian
[6] mailto:Mondrian at pentaho.org
[7] http://lists.pentaho.org/mailman/listinfo/mondrian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pentaho.org/pipermail/mondrian/attachments/20100312/09754d60/attachment.html 


More information about the Mondrian mailing list