<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<STYLE type=text/css><!-- DIV {margin:0px;} --></STYLE>
<META content="MSHTML 6.00.6000.16546" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2>I am really nervous that the feature you are proposing will
be perfect for you but not very useful to anyone else. </FONT></SPAN><SPAN
class=345231113-25112007><FONT face=Verdana color=#000080 size=2>I cannot afford
to add yet another mechanism to Mondrian unless it delivers at least one
major feature. Measures with custom aggregation paths would be such a feature,
and writeback would be another.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2>Writeback tables would, I think, have to be able to contain
cells at different levels of aggregation. If a measure were 'writeback enabled',
Mondrian would look for values in the writeback table before trying to read from
the fact table or aggregate tables.</FONT></SPAN></DIV>
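<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2>To make that lookup order concrete, here is a rough sketch
in plain Java. It is only an illustration under assumptions: CellSource,
WritebackTable, FactOrAggTables and readCell are made-up names, not Mondrian
classes, and the cell keys and values are invented. A real implementation would
presumably go through SQL and the cache rather than hard-coded
values.</FONT></SPAN></DIV>
<PRE>
```java
// Illustrative sketch only: none of these types exist in Mondrian.
class WritebackLookupSketch {

    /** A source of cell values; read returns null when the cell is absent. */
    interface CellSource {
        Double read(String measure, String cell);
    }

    /** Stand-in for a writeback table holding cells at mixed aggregation levels. */
    static class WritebackTable implements CellSource {
        public Double read(String measure, String cell) {
            if (!measure.equals("Budget")) {
                return null;
            }
            if (cell.equals("Region=West,Month=Jan")) {
                return 1000.0;   // invented value, written at Region*Month level
            }
            if (cell.equals("Region=West,Year=2007")) {
                return 11500.0;  // invented value, written at Region*Year level
            }
            return null;
        }
    }

    /** Stand-in for the usual fact table / aggregate table path. */
    static class FactOrAggTables implements CellSource {
        public Double read(String measure, String cell) {
            if (measure.equals("Volume")) {
                if (cell.equals("Region=West,Month=Jan")) {
                    return 42.0; // invented value
                }
            }
            return null;
        }
    }

    /** Consults the writeback table first, but only for writeback-enabled measures. */
    static Double readCell(boolean writebackEnabled, CellSource writeback,
                           CellSource factOrAgg, String measure, String cell) {
        if (writebackEnabled) {
            Double written = writeback.read(measure, cell);
            if (written != null) {
                return written;
            }
        }
        // No writeback value (or measure not enabled): use the normal path.
        return factOrAgg.read(measure, cell);
    }

    public static void main(String[] args) {
        CellSource wb = new WritebackTable();
        CellSource fact = new FactOrAggTables();
        System.out.println(readCell(true, wb, fact, "Budget", "Region=West,Month=Jan"));
        System.out.println(readCell(true, wb, fact, "Volume", "Region=West,Month=Jan"));
    }
}
```
</PRE>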
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2>I have already described how I think we could support
measures with custom aggregators. The system designer could choose whether the
measure would exist in the fact table, or only be in aggregate tables with the
required column name.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2>I agree that it would be awkward to have ETL data mixed in
with writeback data. But remember that aggregate tables are a mechanism, not a
process. You can design aggregate tables which contain only writeback data, and
you can design other aggregate tables which are populated by an ETL
process.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2>I think that the aggregator="none" mechanism I proposed
would achieve the effect that you want, but it would be a bit more work. If I
help you implement this feature, are you prepared to do it this
way?</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=345231113-25112007><FONT face=Verdana
color=#000080 size=2>Julian</FONT></SPAN></DIV><BR>
<BLOCKQUOTE
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000080 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> mondrian-bounces@pentaho.org
[mailto:mondrian-bounces@pentaho.org] <B>On Behalf Of </B>michael
bienstein<BR><B>Sent:</B> Friday, November 23, 2007 5:05 PM<BR><B>To:</B>
Mondrian developer mailing list<BR><B>Subject:</B> [Mondrian] Again on the
non-aggregable measure<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: arial,helvetica,sans-serif">
<DIV>Julian,<BR><BR>First off, I can see where you are coming from. Our
major difference is that I have a project where I have to make an
existing application run faster, and my idea was simply to adapt Mondrian
to fit *quickly*. You, completely understandably, want a much more
general feature. If we can work out how to deal with this difference,
the rest will flow easily.<BR><BR>BTW, the reason I want Mondrian is not for
the cool MDX stuff, it's the cache. The technology I use currently has
to go to disk each time to read the rows. Mondrian can keep it in
memory. The other reason is that Java servlets allow one process in the
OS to handle multiple requests, whereas the current technology has one process
per request, and each one hits the disk each time! As a result I have an upper
limit of about 50 concurrent users before the OS has trouble: the CPU is at
100%, not doing computation but thrashing between the jobs and giving time to
the file system to work out how to handle the load. I need more users in
parallel, so I want to load the data into memory and run it off one memory
image for 250 users in parallel.<BR><BR>So on your
ideas:<BR>We have two different requirements that are both valid but are
essentially different. You want to have more control in the schema about
how to aggregate measures which use out-of-the-ordinary aggregations and use
this to leverage modern SQL to generate aggregate tables based on the fact
table. I.e. data in the aggregate tables is still *dependent* on the
fact table - they just speed up performance. I on the other hand want
aggregate tables that are not derived from the fact table. It is
essentially "write-back" data. In my example the users write this data
back in a separate application and the data is rolled into the nightly ETL
job. I haven't talked about write-back because I don't need to go
through Mondrian to get this, but that's essentially what it is. This
data is still independent because there is explicitly no rollup possible from
a fact table to an aggregation table. As you can see these are very
different requirements.<BR><BR>Now it is technically possible to create
aggregate tables that contain some measure columns that were calculated from
the fact table and also some columns already there from the ETL job.
It's ugly, though. You would have to alter the table created by the ETL
job to add columns for the pre-calculated data. Adding columns to tables that
were created by the ETL makes me shiver. It's doable but you risk too
much. I can see a benefit in terms of disk space, in that the level
columns don't have to be duplicated in separate tables. But disk drives are
cheap, and write-back data can't be huge because it is created only by
employees, not external systems, so I think optimizing for disk is a bit of
wasted effort. Besides, the cheaper memory gets, the more likely Mondrian
is to just read the tables into memory once, and then the disk space won't
count.<BR><BR>If you want to
keep the schema metadata simple by putting non-aggregable measures into the
same cube as the normal measures then we should allow for measures to define
different aggregate tables at each level of aggregation. E.g. "Volume"
at Region*Month is in table "normalagg_region_month" while "Budget" at
Region*Month is in table "budget_region_month". That's doable in the XML
without making it too terrible, but after that, is it worth it? I'm not
sure whether it would be worth it for you; I know that for my project it
won't be. I therefore (re-)propose just having a whole RolapStar for
pre-prepared write-back cell data. I am not considering more interesting
write-back cases such as "Budget for January for Stationery is 1000 so the
implied budget for pencils on 4 January is 1000/31/#expected ratio of pencils
cost to total stationery". These sorts of more complicated implications,
which roll down rather than up, are completely out of my scope.<BR><BR>Hoping
you agree to keep the two concepts
separate,<BR><BR>Michael<BR></DIV></DIV><BR>
</BLOCKQUOTE></BODY></HTML>