<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<STYLE type=text/css><!-- DIV {margin:0px;} --></STYLE>
<META content="MSHTML 6.00.6000.16414" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><FONT face=Verdana color=#000080
size=2></FONT> </DIV><FONT face=Verdana color=#000080 size=2></FONT><BR>
<BLOCKQUOTE
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000080 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> mondrian-bounces@pentaho.org
[mailto:mondrian-bounces@pentaho.org] <B>On Behalf Of </B>michael
bienstein<BR><B>Sent:</B> Friday, March 09, 2007 2:36 AM<BR><B>To:</B>
Mondrian developer mailing list<BR><B>Subject:</B> [Mondrian] Multithreading
etc<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">
<DIV>I sent this to the list but it was bounced because I attached the code
in a zip file.  How do I send code through without checking it in, given that
it is still orthogonal to the codebase?</DIV></DIV></BLOCKQUOTE>
<DIV><SPAN class=210092408-10032007><FONT face=Verdana color=#000080 size=2>Have
you tried attaching a zip file to a forum thread?</FONT> </SPAN></DIV>
<DIV><SPAN class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=210092408-10032007><FONT face=Verdana color=#000080
size=2>Alternatively, you could send to mondrian-devel. That list still works,
although it's not much used anymore.</FONT></SPAN></DIV>
<BLOCKQUOTE
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000080 2px solid; MARGIN-RIGHT: 0px"><FONT
face=Verdana color=#000080 size=2></FONT>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><BR>Well,
I have code that works for multi-threading infrastructure so I would like to
know if it is worth continuing with this or not.<BR><BR>As for ROLLUP/CUBE my
thoughts are:<BR>1) Either we keep the codebase simple by sticking to a
standard (SQL2003) even if this standard is not yet implemented widely and
certain databases have better special features than others, or we allow a
per-database SQL generation system. The argument for the second makes
sense only if the developer resources to write and maintain each dialect come
from the database vendor or its community.  Mondrian is probably at a
stage where such discussions can be undertaken with the database vendors.<SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2> </FONT></SPAN></DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN> </DIV></BLOCKQUOTE>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080 size=2>I've been
talking with Matt Campbell (mkambol) about how this could be implemented.
Apparently Oracle, DB2 and Teradata (the main platforms of interest to Matt)
implement the "GROUP BY GROUPING SETS" construct which we will need, with the
same syntax.</FONT></SPAN></DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN> </DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080 size=2>Grouping sets
are good because they allow us to specify exactly which groups we want the DBMS
to return. If we had used the ROLLUP construct, we would have had to write logic
in Mondrian to figure out which aggregations could be grouped together in the
same query. But with GROUPING SETS, the DBMS can figure out which
aggregations can be computed by rolling up others.</FONT></SPAN></DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN> </DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080 size=2><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080 size=2>We will also
need the GROUPING function.</FONT></SPAN></FONT></SPAN></DIV>
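<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">To
make the idea concrete, here is a rough sketch in Java of how a single query
could request several groupings at once. The class name, table and columns
(GroupingSetsSketch, sales_fact, the_year, quarter, unit_sales) are made up for
illustration and are not Mondrian code; only GROUP BY GROUPING SETS and the
GROUPING function come from the discussion above.</DIV>
<PRE>
```java
// Hypothetical helper, not part of Mondrian: builds one SQL statement that
// returns several aggregation levels at once via GROUP BY GROUPING SETS.
class GroupingSetsSketch {
    static String buildQuery(String table, String measure,
                             String[] columns, String[][] groupingSets) {
        StringBuilder sql = new StringBuilder("SELECT ");
        for (String column : columns) {
            sql.append(column).append(", ");
        }
        // GROUPING(col) is 1 in rows where col has been rolled up, so the
        // reader can tell a rolled-up row from a genuine NULL value.
        for (String column : columns) {
            sql.append("GROUPING(").append(column)
               .append(") AS g_").append(column).append(", ");
        }
        sql.append("SUM(").append(measure).append(") AS total");
        sql.append(" FROM ").append(table);
        sql.append(" GROUP BY GROUPING SETS (");
        String separator = "";
        for (String[] set : groupingSets) {
            // An empty set renders as "()", i.e. the grand total.
            sql.append(separator).append("(")
               .append(String.join(", ", set)).append(")");
            separator = ", ";
        }
        sql.append(")");
        return sql.toString();
    }

    public static void main(String[] args) {
        String[] columns = {"the_year", "quarter"};
        String[][] sets = {{"the_year", "quarter"}, {"the_year"}, {}};
        System.out.println(
            buildQuery("sales_fact", "unit_sales", columns, sets));
    }
}
```
</PRE>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">Each
inner array names exactly one group we want back, which is the property that
makes GROUPING SETS easier to generate than ROLLUP. Each returned row would then
be routed to the right aggregation by inspecting the g_ columns.</DIV>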
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN> </DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080 size=2>Since these
three databases support what we need, I am inclined to stick to the standard. I
haven't checked whether other databases support this syntax, but I am hopeful
that they do, or soon will.</FONT></SPAN></DIV>
<BLOCKQUOTE
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000080 2px solid; MARGIN-RIGHT: 0px">
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN> </DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007> </SPAN><BR>2) Architecturally this implies
loading multiple Aggregations from one SQL query. That requires a
rethink of the way the cell cache loading is done because at the moment
Aggregations are loaded one at a time, each in a synchronized block on the
Aggregation. Similar concerns have to be dealt with for in-memory
rollups. I think that synchronized is too forceful. We need
something more like a Lock from java.util.concurrent so we can do
tryLock(). Look at the TxLock idea I have in the code I'm
attaching.<SPAN class=210092408-10032007><FONT face=Verdana color=#000080
size=2> </FONT></SPAN></DIV></BLOCKQUOTE>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080 size=2>Yes, this issue
came to light in our design discussions also.</FONT></SPAN></DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN> </DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080 size=2>I look forward
to reading your code, but it occurs to me that we can leverage
</FONT></SPAN><SPAN class=210092408-10032007><FONT face=Verdana color=#000080
size=2>aggregations' state of 'ready' or 'loading'. We could
upgrade this to a lock, so another thread can wait for a loading aggregation to
become ready.</FONT></SPAN></DIV>
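<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">A
minimal sketch of what upgrading the 'loading' / 'ready' state could look like,
assuming a stand-in class called LoadableAggregation (hypothetical, not the
real Aggregation class). It uses an AtomicBoolean and a CountDownLatch from
java.util.concurrent rather than a Lock, because neither is owned by a
particular thread; the first method fails quickly in the spirit of
tryLock().</DIV>
<PRE>
```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an aggregation with a 'loading' / 'ready' state.
class LoadableAggregation {
    private final AtomicBoolean claimed = new AtomicBoolean(false);
    private final CountDownLatch ready = new CountDownLatch(1);
    private volatile String cells;

    // Fail-quick: exactly one caller gets true and becomes the loader;
    // everyone else gets false immediately instead of blocking.
    boolean tryClaimLoad() {
        return claimed.compareAndSet(false, true);
    }

    // Called once by the loader: stores the data and moves to 'ready',
    // releasing every thread that is waiting in awaitCells().
    void finishLoad(String loadedCells) {
        cells = loadedCells;
        ready.countDown();
    }

    // Waits for the 'ready' state. Returns null if the timeout expires
    // or the waiting thread is interrupted.
    String awaitCells(long timeout, TimeUnit unit) {
        try {
            return ready.await(timeout, unit) ? cells : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        LoadableAggregation agg = new LoadableAggregation();
        if (agg.tryClaimLoad()) {
            // Stand-in for running the SQL on another thread.
            new Thread(() -> agg.finishLoad("cells from SQL")).start();
        }
        System.out.println(agg.awaitCells(5, TimeUnit.SECONDS));
    }
}
```
</PRE>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">A
thread that loses the claim can either go and do other work or call
awaitCells() to wait for the loader to finish. The sketch leaves out error
handling for a loader that fails before calling finishLoad().</DIV>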
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN> </DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080 size=2>Synchronized
will still need to be used, and carefully, to ensure that no thread ever sees
the system in an inconsistent state.</FONT></SPAN></DIV>
<BLOCKQUOTE
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000080 2px solid; MARGIN-RIGHT: 0px">
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007> </SPAN><BR><BR>As for multi-threading:<BR>I
have written most of the base infrastructure so far, but not the cell loading. 
Integrating it would require a significant amount of work in Mondrian's code to
pass all interaction with Mondrian through TxSystem.runWithTx(). 
<BR><BR>Basic concerns are:<BR></DIV>
<DIV class=MsoNormal
style="FONT-SIZE: 12pt; MARGIN-LEFT: 36pt; TEXT-INDENT: -18pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN><SPAN>1)<SPAN
style="FONT-WEIGHT: normal; FONT-SIZE: 7pt; LINE-HEIGHT: normal; FONT-STYLE: normal; FONT-VARIANT: normal; font-size-adjust: none; font-stretch: normal">
</SPAN></SPAN></SPAN><SPAN>Threads working on the same request should be able
to share data related to that request.</SPAN></DIV>
<DIV class=MsoNormal
style="FONT-SIZE: 12pt; MARGIN-LEFT: 36pt; TEXT-INDENT: -18pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN><SPAN>2)<SPAN
style="FONT-WEIGHT: normal; FONT-SIZE: 7pt; LINE-HEIGHT: normal; FONT-STYLE: normal; FONT-VARIANT: normal; font-size-adjust: none; font-stretch: normal">
</SPAN></SPAN></SPAN><SPAN>A Thread should be loaned to a request and returned
in a way that is well-nigh fail-safe (i.e. the thread shouldn’t keep running
if the request fails in some way).</SPAN></DIV>
<DIV class=MsoNormal
style="FONT-SIZE: 12pt; MARGIN-LEFT: 36pt; TEXT-INDENT: -18pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN><SPAN>3)<SPAN
style="FONT-WEIGHT: normal; FONT-SIZE: 7pt; LINE-HEIGHT: normal; FONT-STYLE: normal; FONT-VARIANT: normal; font-size-adjust: none; font-stretch: normal">
</SPAN></SPAN></SPAN><SPAN>A configuration parameter of some sort should let us
decide NOT to use threads at all.</SPAN></DIV>
<DIV class=MsoNormal
style="FONT-SIZE: 12pt; MARGIN-LEFT: 36pt; TEXT-INDENT: -18pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN><SPAN>4)<SPAN
style="FONT-WEIGHT: normal; FONT-SIZE: 7pt; LINE-HEIGHT: normal; FONT-STYLE: normal; FONT-VARIANT: normal; font-size-adjust: none; font-stretch: normal">
</SPAN></SPAN></SPAN><SPAN>The number of threads should be
configurable.</SPAN></DIV>
<DIV class=MsoNormal
style="FONT-SIZE: 12pt; MARGIN-LEFT: 36pt; TEXT-INDENT: -18pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN><SPAN>5)<SPAN
style="FONT-WEIGHT: normal; FONT-SIZE: 7pt; LINE-HEIGHT: normal; FONT-STYLE: normal; FONT-VARIANT: normal; font-size-adjust: none; font-stretch: normal">
</SPAN></SPAN></SPAN><SPAN>The threading infrastructure should be independent
of the rest of the code base.</SPAN></DIV>
<DIV class=MsoNormal
style="FONT-SIZE: 12pt; MARGIN-LEFT: 36pt; TEXT-INDENT: -18pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN><SPAN>6)<SPAN
style="FONT-WEIGHT: normal; FONT-SIZE: 7pt; LINE-HEIGHT: normal; FONT-STYLE: normal; FONT-VARIANT: normal; font-size-adjust: none; font-stretch: normal">
</SPAN></SPAN></SPAN><SPAN>We should be able to make use of custom thread
pools or use managed thread pools from the application server.</SPAN></DIV>
<DIV class=MsoNormal
style="FONT-SIZE: 12pt; MARGIN-LEFT: 36pt; TEXT-INDENT: -18pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN><SPAN>7)<SPAN
style="FONT-WEIGHT: normal; FONT-SIZE: 7pt; LINE-HEIGHT: normal; FONT-STYLE: normal; FONT-VARIANT: normal; font-size-adjust: none; font-stretch: normal">
</SPAN></SPAN></SPAN><SPAN>Then there is a relatively minor issue with
read-consistency for near-real-time data that turns out to be a real
headache. This can be handled either by using the
transaction semantics of the underlying data store <U>or</U> by qualifying all SQL
requests and cache interactions with a timestamp and/or transaction id of some
sort. E.g. when an MDX request begins, it asks the
underlying data store for the id of the last completed transaction that
modified data and keeps this in a request scope available to all
threads. Then it appends “changedTxId <=
${lastTxIdWhenFirstEntered}” to each WHERE clause. If,
however, we use the underlying data store’s transactions, then we must keep
the JDBC Connection open for the duration of the request, reusing it on the same
thread for each interaction with that data store.</SPAN></DIV>
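<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">As
a rough illustration of the second option in point 7, the sketch below appends
the transaction-id predicate to a WHERE clause. The class and method names
(SnapshotPredicateSketch, whereClause) are hypothetical; the column name
changedTxId is taken from the text, and it assumes that every queried table
carries such a column.</DIV>
<PRE>
```java
// Hypothetical helper: pins each SQL statement of one MDX request to the
// transaction id that was observed when the request began.
class SnapshotPredicateSketch {
    // Returns " WHERE c1 AND c2 ... AND changedTxId <= N", where N is the
    // id of the last completed transaction seen at the start of the request.
    static String whereClause(String[] conditions,
                              long lastTxIdWhenFirstEntered) {
        StringBuilder clause = new StringBuilder(" WHERE ");
        for (String condition : conditions) {
            clause.append(condition).append(" AND ");
        }
        clause.append("changedTxId <= ").append(lastTxIdWhenFirstEntered);
        return clause.toString();
    }

    public static void main(String[] args) {
        String[] conditions = {"the_year = 2007"};
        System.out.println(
            "SELECT SUM(unit_sales) FROM sales_fact"
                + whereClause(conditions, 42L));
    }
}
```
</PRE>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">Because
the id is read once and then reused by every thread of the request, all
statements see the same snapshot without holding a JDBC Connection open.</DIV>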
<DIV class=MsoNormal
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN>Now,
I think that the best way to take advantage of multiple threads in the storage
system is NOT to launch multiple SQL statements against the same star schema
for different aggregations, but rather to use <U>partitioning</U> of data.
That is, to segment the cell data (and maybe dimension data) based on the
values of certain columns. For example, year < 2007 and
year = 2007 would be in two different partitions.
This can be introduced gradually by simply treating a RolapStar as a single
Partition for the moment. Having said that, aggregation
tables are also a type of Partition, and hitting two of them at once should be
quite easy.</SPAN></DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">So
the design I am introducing has the following features:<BR>1) A scope for
"request" or "interaction" that is larger than the Thread that begins
it. Since this is similar to a transaction I've called it a Tx.
See the mondrian.tx package. Each sub-system in Mondrian can enlist a
representation of itself in the Tx.<BR>2) Break up the different tasks
performed into Task objects that can be run potentially in parallel.
Allow a set of Tasks to be tied to the same Thread so that the same JDBC
Connection can be used for all of them for read-consistency and cleaned up at
the end of the Tx. This is done declaratively so the implementation can
be changed easily. The implementation can also ensure that the J2EE
context is passed on to separate threads (JNDI, context class loader 
etc).<BR>3) A system of fail-quick locks at the Tx scope rather than just
Thread scope. <BR><BR>If this is worth pursuing as a design for the next 
version then good. If not I'll stop now.<SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2> </FONT></SPAN></DIV>
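<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">For
readers without the attachment, the following is a minimal sketch of a request
scope that outlives the thread that started it and runs tasks either inline or
on a pool. RequestScope, cellsLoaded and runAll are hypothetical names and do
not reflect the actual mondrian.tx classes; creating a pool per call is a
simplification, and a real version would presumably borrow threads from a
shared or application-server-managed pool.</DIV>
<PRE>
```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical request scope, not the mondrian.tx API from the attachment.
class RequestScope {
    // State shared by every task of the request, whichever thread runs it.
    final AtomicInteger cellsLoaded = new AtomicInteger();

    // Runs all tasks and returns once each has finished. A threadCount of 0
    // means "do not use threads": the tasks run on the calling thread.
    void runAll(Runnable[] tasks, int threadCount) {
        if (threadCount == 0) {
            for (Runnable task : tasks) {
                task.run();
            }
            return;
        }
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        CountDownLatch done = new CountDownLatch(tasks.length);
        try {
            for (Runnable task : tasks) {
                pool.execute(() -> {
                    try {
                        task.run();
                    } finally {
                        done.countDown(); // count failed tasks as finished too
                    }
                });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Threads are given back even if the request is interrupted.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        RequestScope scope = new RequestScope();
        Runnable load = () -> scope.cellsLoaded.incrementAndGet();
        scope.runAll(new Runnable[] {load, load, load}, 2);
        System.out.println(scope.cellsLoaded.get());
    }
}
```
</PRE>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">The
sketch covers the configurable thread count and the no-threads switch, but not
tying a group of tasks to one thread for a shared JDBC Connection, nor the
Tx-scoped locks.</DIV>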
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN> </DIV></BLOCKQUOTE>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080 size=2>This definitely
sounds plausible... I'd like to read through your code before I answer in
detail.</FONT></SPAN></DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2></FONT></SPAN><SPAN class=210092408-10032007><FONT face=Verdana
color=#000080 size=2></FONT></SPAN> </DIV>
<DIV
style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><SPAN
class=210092408-10032007><FONT face=Verdana color=#000080
size=2>Julian</FONT></SPAN></DIV></BODY></HTML>