[Mondrian] Integrating data from non-JDBC data sources

Torsten Schlabach tschlabach at gmx.net
Mon Aug 17 11:36:43 EDT 2009


Hi Julian!

> You don't want to access every row in your fact table,
> then join to the time dimension, filter out those in year 2008 and
> aggregate at the quarter level. You want the database to join, filter
> and aggregate simultaneously.

Of course this is the ideal case, but the world isn't always ideal. And
your data source isn't always where you wish it was.

Simply put, for my problem, I have two alternatives:

1) Do the classic ETL thing, i.e. use Kettle (pardon: PDI) to get
everything into one nice database

+ It will be faster.
- I don't have live data.

2) Access data directly from different sources

+ I will have live data.
+ I don't need to create a separate database to hold the analytic data.
+ I can live with one less tool (Kettle / PDI)
- It will be slower.

But practically speaking, I expect some 10,000 records in my non-JDBC
datasource, and the datasource would even allow filtering, for example,
by year. I don't think that processing the couple of thousand records
for one year in memory should cost too much performance, and it would
just be convenient.
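
To make this concrete, here is a minimal sketch in plain Java of the kind
of in-memory processing I mean. It uses no Mondrian API at all: the names
FactRecord and totalsByQuarter and the sample figures are made up for
illustration, and it assumes the datasource hands over records that
already carry a year and a quarter, as you suggested.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class InMemoryFactSketch {

    // Hypothetical pre-joined fact record: the time attributes travel with
    // each row instead of living in a separate dimension table.
    static final class FactRecord {
        private final int year;
        private final int quarter;
        private final double amount;

        FactRecord(int year, int quarter, double amount) {
            this.year = year;
            this.quarter = quarter;
            this.amount = amount;
        }

        int year() { return year; }
        int quarter() { return quarter; }
        double amount() { return amount; }
    }

    // Filter to one year, then sum the measure per quarter, all in memory.
    // The TreeMap keeps the quarters in ascending order.
    static Map<Integer, Double> totalsByQuarter(List<FactRecord> records, int year) {
        return records.stream()
                .filter(r -> r.year() == year)
                .collect(Collectors.groupingBy(
                        FactRecord::quarter,
                        TreeMap::new,
                        Collectors.summingDouble(FactRecord::amount)));
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the non-JDBC datasource.
        List<FactRecord> records = Arrays.asList(
                new FactRecord(2007, 1, 10.0),
                new FactRecord(2008, 1, 100.0),
                new FactRecord(2008, 1, 50.0),
                new FactRecord(2008, 2, 75.0));

        System.out.println(totalsByQuarter(records, 2008));
    }
}
```

If the datasource can filter by year itself, the filter step would move
into the query and only the aggregation would remain on the Java side.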

If I were to run some experiments, could you give me some hints about
which classes to look at in the Mondrian source code?

Regards,
Torsten


Julian Hyde wrote:
> In the early days we discussed custom member readers, that would allow you
> to programmatically create a dimension that is not backed by a JDBC table.
> In principle you could also create a custom cell reader, to replace a JDBC
> fact table.
> 
> The problem is, a lot of the power comes from joining fact tables to
> dimension tables. You don't want to access every row in your fact table,
> then join to the time dimension, filter out those in year 2008 and aggregate
> at the quarter level. You want the database to join, filter and aggregate
> simultaneously.
> 
> I guess if you could present a fact table and all of its attributes
> pre-joined -- so each record in the fact table would have a year and a
> quarter -- then this API for non-JDBC fact data would be workable.
> 
> I don't want to put you off. I'd like to continue this discussion, since it
> would give us the opportunity to use a distributed cache or to delegate work
> to one or many slave servers.
> 
> Julian
> 
>> -----Original Message-----
>> From: mondrian-bounces at pentaho.org 
>> [mailto:mondrian-bounces at pentaho.org] On Behalf Of Torsten Schlabach
>> Sent: Friday, August 07, 2009 9:24 AM
>> To: mondrian at pentaho.org
>> Subject: [Mondrian] Integrating data from non-JDBC data sources
>>
>> Dear list!
>>
>> I am sure this question has been asked several times already, 
>> but I was unable to find any usable answers.
>>
>> My goal is to make Mondrian access a fact table from a data 
>> source which isn't JDBC. Is there any code available yet for 
>> that? Should this be easy to implement or are there any 
>> design decisions in Mondrian / PDI which would make this a 
>> larger endeavor?
>>
>> Any pointers to information which I have missed are very welcome.
>>
>> Regards,
>> Torsten
>> _______________________________________________
>> Mondrian mailing list
>> Mondrian at pentaho.org
>> http://lists.pentaho.org/mailman/listinfo/mondrian
>>
>>