Tackling Expiration Dates in Pentaho

Posted by admin on December 16, 2010  /   Posted in BI and Custom Development, Data Best Practices

The title may be a little misleading because this article encompasses a wider scope than just Expiration dates.  But I deliberately use that title because Expiration date is probably the most commonly found case of date-triggered state (or status) changes.  Of which this article is about.

Almost all business entities have a state associated with them at any given time.  Sometime these states are explicitly recorded via some kinds of Status field (column) for instance, sometimes they are implicit.  But all of them undergoes transitions from state-to-state that are governed by some business rules.

This concept is extremely important in the world of BI where transactional information are processed further to give out meanings — or even worse, to make decisions.  And in this article, we’ll see one way to represent it.

Pre-requisite: This article requires some familiarity with MDX as it is being utilized in the Pentaho BI stack (Mondrian OLAP engine, to be exact).

The Schema

Every MDX queries are run against the schema, so let’s talk about some ground rules that are not always easy to find online:

  1. Refresh this string of terminology in your mind: [Dimension.Hierarchy].[Level].<member>.<member-property>.  This is the terminology and hierarchy of a dimension in Mondrian-flavored MDX (which would be the ones we use in this article).
  2. A schema may have more than one Date/Time dimensions.  I didn’t know this, and had to find out about it through necessity.  This actually is quite a useful feature.
  3. A schema may have multiple dimensions that shares one single database table.  Sometimes you *have to have* these in order to display different columns of the table in the result using CrossJoin().  SIDE NOTE: I had an interesting conversation regarding this with Roland Bouman, who regards this as a peculiar limitation that is not imposed by any technical reasons.
  4. Members of two Time dimensions can be used as a condition specifier in functions that accept it.  For example: Filter( [TimeDim1].[Level].Members, [TimeDim].CurrentMember.Name > CurrentDateMember([TimeDim2], “<some-formatting>”).Lag(30).Name)
    IMPORTANT: Notice that the comparison is made between the ‘name’ property of the two Time dimension members.  You can’t seem to make the comparison at the member level directly.

The Problem At Hand

Let’s say your Customers have different states that they transition from and to based on some business rules.  For example customers that hasn’t ordered any gadgets from your company for more than two years will be given INACTIVE status — which by the way, is a red flag for your sales force that it’s time to go after them again.

Let’s say you want to show this in your salesperson’s Pentaho dashboard, so they can be reminded of this fact everyday they log into your system.

One Possible Solution

One way to tackle this is to create an ETL script that will attach a date column to the Customer dimension table.  Why the customer dimension? why not on the fact table itself?  Two reasons:

  1. Customer dimension may be used by more than one fact tables, if we can find out when a certain customer switches status using the dimension, we only have to do this once.
  2. Customer dimension has the correct granularity.  The status changes for a customer, not at the orders level, not at any other levels.  So the dimension is it.

After we attach the status change date information onto the Customer dimension, we then can use it in an MDX query.  Which makes sense if we are talking about any reports, cubes, or dashboards.

How do we use it? consider the following query:

select NON EMPTY {[Measures].[Sales]} ON COLUMNS,

NON EMPTY Crossjoin([Customer].[Name].Members, Filter([Status.Switch].[Date].Members, [Status].CurrentMember.Name > CurrentDateMember([Date.Timeline], “[“”Date.Timeline””].[“”Date””].[yyyymmdd]”).Lag(60.0).Name) ON ROWS

from [Sales]

Things to notice in the above MDX query:

  • The goal is to display customer names whose status has changed in the past 60 days.
  • The date when the customer status changes is recorded in the Customer dimension table, and defined in the schema as [Status] dimension, the hierarchy name is ‘Switch’, and the level is [Date], which is encoded as a string in YYYYMMDD format.  NOTE: I chose this format because it can be sorted ordinally.  That’s why it works for <, > or = conditions.
  • The powerful Filter() is used to evaluate the condition.
  • I defined two dimensions: [Customer] and [Status] that are keyed against the same customer dimension table.  This way I can use the [Status] dimension in the Filter() and use the [Customer] dimension to display the name of the customer.
  • It is very important to specify the correct Level in the first parameter to the Filter function, otherwise you’ll never get a match.  Read this sentence again, it took me almost one full day to figure this out.
  • I use the equally powerful CurrentDateMember() to get the member of yet another dimension [Date] that has the same format as the [Status] member.  To get to know this very useful function, read this primer by Diethard Steiner.
  • Then I use the .Name property of the two “YYYYMMDD” members in the condition.  This Member.Name facility is extremely powerful and flexible, but only after you understand what is going on.

The Result

Goal accomplished, now I have a list customers whose status has changed in the past 60-days.  A very useful list to be shown to a salesperson.  Of course I still have to wire this query into a dashboard or report, but that’s beyond the scope of this article.

And now you have a full know-how to tackle various reporting and query needs associated with date-sensitive state changes.  Again, Expiration date is just one of the most common instance of this problem.  In reality, there are tons of other scenarios that will fit this one.

Post a Comment

We serve businesses of any type and size
Contact Us Today