[Mondrian] Mondrian Roles and Cell Level Access - A Post Mortem

Richard Emberson remberson at edgedynamics.com
Mon May 14 10:15:01 EDT 2007


Mondrian Roles and Cell Level Access - A Post Mortem


Current Mondrian allows one to permission a Hierarchy's leaf level
members. Queries run at the leaf level will then only display
those permissioned members. Queries at higher levels, that is, at
aggregate levels, with such a permissioning, the results do not
reflect the leaf level permissioning.
Rather, all leafs are aggregated into the generated results.
For some, this is the behavior that is desired, but, for others,
this is viewed as a security hole - a user should only see what
they are permissioned to see even at higher, aggregate levels.
I have just such a customer requirement.

Last month I attempted to patch Mondrian to address this requirement.
A further requirement is that a given dimension can have up to 1
million leaf members.
While these million leaf members will not have the same parent member,
one could have tens of thousands of leaf members that do have the same
parent. What is important about this requirement is that it
pretty much says that one can not enumerate the permitted member names
in some SQL clause - there are just too many.
Most customers do not have such large dimensions nor do they have
so many leaf level siblings - FoodMart certainly does not, but it can occur
(I have such a customer).

A further requirement is that performance be comparable to a 
non-permissioned
system. This implies that aggregate tables still be used in some
manner and that the in-memory aggregate cache also be used.

The last requirement was that the modifications to Mondrian be small - I
did not want to re-write whole parts of the system.

The Requirements:

     1) Aggregate results include only data from permissioned members.
     2) Support large number of permissioned members.
     3) Performance using aggregate tables and in-memory aggregate cache.
     4) Point modifications to Mondrian only.

What I ended up with was a version of Mondrian that fulfilled all the
requirement but the requirement dealing with aggregate tables.
I actually had an earlier version that even allowed the use
aggregate tables but some had issues with the approach I took,
so I backed it out.

Changes to Mondrian code:

Role.CellLevel
     First, I extented the Role interface with a new interface called
     Role.CellLevel with the following additional methods:
         int getId();
         boolean hasAggregateTables();
         boolean hasAllPermissions();
         void initialize(Schema s, DataSource dataSource);

     In the initialize() method in the Role.CellLevel implementation class,
     for each Hierarchy in the schema that was permissioned,
     walked down the Hierarchy's Levels determining the Access value per 
Member
     for the current user (role).
     This approach was taken because our member-role permissions were
     stored in the database for leaf-level members only and the permissions
     of intermediate level members had to be inferred from that of their
     child members.

     Also, the Role.CellLevel interface has the following inner (but static)
     class:
         public class Context {
             public static final ThreadLocal<Role> Current =
                 new ThreadLocal<Role>();
         }
     This was used to pass the Role down the Mondrian execution stack.
RolapConnection
     In RolapConnection's setRole() method, if the Role is an instance 
of the
     Role.CellLevel, then its initialize() method is called allowing one to
     initialize the Role from the Schema and DataSource.

     In the RolapConnection's execute() method, the RolapConnection's 
current
     Role was registered with the current thread by calling:
         Role.CellLevel.Context.Current.set(getRole());

RolapStar aggregate cache
     Role-based aggregate caching in RolapStar was implemented by
     changing all of the caches from using a BitKey as the template Map key
     to using an object which contains both the BitKey and the thread's
     current Role.
     No other changes were required to all that RolapStar aggregate
     caching code!!!

Sql generation and schema definition
     In the schema definition of a Hierarchy, I added a new element,
     "PermissionExpr" (same base type as "MemberExpr") that contained
     a where-clause predicate to be used for determining the access
     permission of a given role for that Hierarchy. For example, the
     Customer Hierarchy might have the template predicate (for mysql):

     `FACT_TABLE_NAME`.`FACT_TABLE_FOREIGN_KEY` in (
         select `customer_permissions`.`customer_id`
         from `customer_permissions` as `customer_permissions`
         where `customer_permissions`.`role_id` = ROLE_ID )

     Where the key word FACT_TABLE_NAME and FACT_TABLE_FOREIGN_KEY
     are replaced with the query's Cube's fact table and the
     foreign key to the Hierarchy table and the key word ROLE_ID
     is replaced with the current user's role id value prior to
     SQL execution.

     In the low-level classes where SQL is generated,
         mondrian.rolap.agg.AggregationManager
         mondrian.rolap.agg.SegmentArrayQuerySpec
         mondrian.rolap.agg.AbstractQuerySpec
     I made modification so that when the current thread's Role was of type
     Role.CellLevel, then, for each Hierarchy in the query that had
     the schema template where-clause predicate (in the PermissionExpr 
element),
     the predicate's key words were replaced and it was added to the
     queries generated SQL.

Aggregate Tables
     For aggregate tables, I added a new column, called "role_id" to
     all aggregates, created three roles with permission tables for
     the Product and Customer dimesions and modified the db loading code
     so that it created aggregate table rollups for not only the default,
     full permission, role; but also the other three partially
     permissioned roles.

     In the class
         mondrian.rolap.agg.AggQuerySpec
     the where-clause predicate
         role_id = ROLE_ID
     where ROLE_ID is replaces with the current user's role id was added
     to the generated SQL.

Loading Mondrian
     When loading the Mondrian dataset (MondrianFoodMartLoader) the
     two test permission tables were created (product_permissions and
     customer_permissions), three test user roles created with entries
     in the permission tables and aggregate table generated that
     included the normal roleup with default role id -1, as well as
     roleups for the three test user roles.

It all worked just fine. Aggregate tables were used as needed and the
in-memory cache, cached data per role.

The fact that I had created aggregate table entries per role id
was deemed unacceptable, so I backed out
all of the runtime and db load code associated with the "role_id" column
in aggregate tables and in the AggregationManager. Now, if the Role
was a Role.CellLevel, I forced the use of the base fact table
with role id base permission where-clause expressions - aggregate tables
were no longer used if the user had a role id.

This also worked but was slower. Trying to figure out how to generate
auto-magically the correct constraints based upon current Role's accesses
permissions within the RolapStar code (which may be the correct way
to do it) was too hard to do in the two days remaining before
code freeze - so I punted.

I should mention that in the JPivot MondrianModel initialize() method
I take the MondrianModel's new Role instance variable and if not null,
set the Connection's Role to it. It is the JPivot MondrianModel
that our JPivot/Mondrian container has access to.

A final thought, the approach I took to creating and using
aggregate tables (adding a role_id column, rolling up per role id,
and generating SQL with the where clause predicate "role_id = ROLE_ID")
was deemed "not olap-like". Of course, it certainly is rolap-like since
rolap involves using (leveraging) a relation database to generate olap 
results.
Except for the fact that creating all of the additional rollups, one per
role, for the aggregate table - which is a true negative. I view the
distinction as being orthodoxy versus practicality. It would certainly be
good if the role-base constrains could be added to the generated SQL
by the RolapStar (AggStar) code, but if you have a situation where
there are hundred of thousands of leaf members and, say, a given
role is permissioned to access fifty thousand of them (and these leaf
level members can not be summarized as access to their parent(s) members
- a parent member has 100,000 leaf members but only 50,000 are
permissioned for the role in question) then one can not generate
SQL that explicitly list the permitted leaf members; it just does
not scale. Rather if one took a rolap approach and inserted the
PermissionExpr(s) SQL into the generated aggregate table query SQL,
the database would be doing the work and one could have large numbers
of permissioned leaf members for a given role.

Richard


-- 
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.



More information about the Mondrian mailing list