Business Owners, Are You Taking Care of Your Data?

Posted under BI and Data Warehousing articles on March 19th, 2012

by Will Gunadi

Background

Regardless of the size of your business, once it has been running for a while, you will have on hand what all businesses accumulate over time.  No, we are not talking about profits or debts; we are talking about data.

Simply by operating, businesses gather various types of data.  More importantly, business data is the lifeblood of every business decision.  The higher the data quality, the better the decisions you can make.

The problem is that when data is not sufficiently reviewed and monitored, it starts to lose its integrity and accuracy, resulting in misleading indicators and measurements.  These in turn produce two of the most damaging factors for a business: lost opportunities and hidden costs.

How does business data go bad?

Over time, any computerized business system will start to accumulate bad data.  This is nothing mysterious; in fact, as a savvy business owner, you should expect it.  Here are some of the ways bad data creeps into a system:

  • Bad data entered by users that is not caught by the existing validation rules
  • Policy or business rule changes
  • Program bugs introduced while implementing those policy changes
  • Structural changes to external data (e.g. tax rates, zip codes, area codes, ISO specifications)

In essence, your computer systems change as your business changes due to modifications to internal policies or to external regulations.  It is inevitable that during these modifications, bad data is entered and processed into the systems.  Sure, we could minimize the negative impacts with good software quality testing practices, but we would not be able to catch them all.

The only effective way to combat bad data caused by these unavoidable changes is to have a system in place that lets you see the data comprehensively, so you can locate bad data early, before it becomes a real problem.

Meet OLTP and OLAP systems


In this day and age, Online Transaction Processing (OLTP) systems have become the norm, even for small to medium businesses.  Most people nowadays rely on some kind of computer-based ERP (Enterprise Resource Planning) system to run their businesses, maybe a consolidated email/calendar package, an accounting package, some combinations of order entry, invoicing, shipment, warehousing  modules, all of which are backed by a database system.  These are the typical components of an OLTP system.

This is where most small and medium businesses stop, which is fine apart from the fact that it leaves you with only one way to view your business data, and a non-intuitive one at that.  This makes data quality maintenance harder than it should be.

An OLAP (Online Analytical Processing) system is designed to provide you with an alternate, and much more effective, way to view and analyze your business data.

Large corporations and big businesses spend billions of dollars every year to develop, utilize, and maintain OLAP systems to complement their OLTP ones.  They know that without a constant effort to monitor and measure business data, it can get out of hand very quickly.

Unfortunately, for most small to medium businesses, the value of setting up an OLAP system has not yet become clear. Why? Partly a mindset issue.  To understand the value, you have to:

  1. Realize that there are two distinct approaches to handling the same business data: one is geared toward data entry and processing (OLTP), the other toward data analysis and decision-making (OLAP).
  2. Accept that analyzing your own data is not a waste of time and resources; rather, it is a vital part of not only running your business but, more importantly, improving it.

Adding insult to injury, implementing a typical OLAP system tends to be a long-term, expensive, and serious commitment.  While there is absolutely nothing wrong with a long-term and serious commitment to improving data quality, the expensive part prevents a lot of businesses from even trying to set up and use OLAP systems.  Which is a pity.

Why can’t I just use one system?

The difference between OLTP and OLAP starts with the underlying structure that holds your business data.  The main purpose of an OLTP system is to capture your business data in real time: order entries, invoice generation, shipment records, warehouse inventory movements, accounting journal entries, commission calculations, sales tax, etc.

A good OLTP system has a solid data model that dictates how the above data is stored and processed.  Unfortunately, this very strength makes the data extremely efficient for computers to process but hard for humans to read, at least not without a lot of lookups and remembering bits and pieces of information scattered all over the place.  Again, that is something computers are designed to deal with.
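To make the "lookups" point concrete, here is a minimal sketch in Python using the built-in sqlite3 module.  The three tables, their columns, and the sample rows are all made up for illustration; a real OLTP schema will be far larger.  The first query shows what the system stores (rows of IDs), the second shows what a person would actually want to review.

```python
import sqlite3

# Hypothetical, minimal OLTP-style schema: orders reference customers and
# products by ID only, which is efficient for the system but opaque to a reader.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER,
                            product_id INTEGER, quantity INTEGER);
    INSERT INTO customers VALUES (1, 'Acme Corp'), (2, 'Globex');
    INSERT INTO products  VALUES (10, 'Widget', 2.50), (11, 'Gadget', 10.00);
    INSERT INTO orders    VALUES (100, 1, 10, 4), (101, 2, 11, 1), (102, 1, 11, 2);
""")

# What the OLTP system stores: rows of IDs and quantities.
raw_rows = conn.execute("SELECT * FROM orders ORDER BY id").fetchall()

# What a person wants to review: names and totals, resolved through lookups
# (joins) and aggregated per customer.
readable_rows = conn.execute("""
    SELECT c.name, SUM(o.quantity * p.price) AS total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products  p ON p.id = o.product_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(raw_rows)
print(readable_rows)
```

An OLAP system does this kind of resolving and aggregating for you, ahead of time and across all of your data, so that nobody has to write the joins by hand each time a question comes up.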

Data quality decreases when no one reviews the business data.  Any statistician would tell you that the accuracy of any record is proportional (up to a certain point) to the number of reviews that the record receives.

The same principle applies here.  An OLAP system is designed to let you review your business data in a manner that is effective for us (humans) to analyze, rather than for computers.  However, because it is a computerized system, it also provides the automation and number-crunching facilities that we can use to generate meaningful reports.

Okay, so what can OLAP systems do for my business?

In a nutshell, a fully operational OLAP system should help you to:

  1. Discover hidden information→
    • Your business data resides everywhere, not just in your OLTP system database.  And it’s not always obvious how the pieces relate to each other.
    • Discovering hidden data may give you insight into aspects of your business that need attention.
  2. Identify and fix data integrity problems →
    • Keep your business data accurate and up-to-date
    • Verify data relationships over long periods of time
  3. Plan for data expansion, archiving, and storage →
    • As your business grows, so will the data you accumulate
    • Fulfill requirements to keep complete records (example: Legal or Tax auditing purposes)
    • Staging area for server movements, upgrades, and failure recovery
  4. View your data history, trends, and movement with improved clarity and accessibility →
    • Discover new opportunities revealed in existing data
    • Keep track of aggregate data which is not stored by the OLTP system
    • Forecasting based on trends
    • Better decision-making ability

Can my business afford one?

As mentioned above, you should view an OLAP system as a tool to grow your business.  It is the natural extension of the OLTP system that you already use.

The good news is that with open source OLAP systems such as Pentaho BI Suite, you now have a good alternative to high-cost systems from SAS, Oracle, Microsoft, or IBM.  Open source systems are freely available to install, with no license fees.

Is it any good?  As with OLTP systems, it depends on the implementation and, consequently, the implementer.  A well-implemented Pentaho system should be a great fit for small to medium businesses, not only because of the zero entry cost, but also because of the complete set of tools that are customizable down to the source-code level.

Of course, nothing prevents big businesses from using Pentaho, but with a bigger budget comes the option to use the other systems.

Conclusion

If your goal is to run a healthy business, the importance of data quality surrounding it can no longer be dismissed as an overhead.  A well-planned implementation of an OLAP system should give you easy access to information that may be hidden in your OLTP systems.

Large corporations have known this for a long time.  For them, OLAP systems are not merely considered; they are a prominent part of their plans and budgets.  Today, with the advent of open source systems such as Pentaho BI Suite, the benefits of OLAP systems have been made available to a wider range of business sizes.

There is not a single reason for not considering one.  Really.


Re-mapping Keyboard Keys

Posted under Linux articles on February 18th, 2012

In Linux, it’s easy to remap one or more keys on the keyboard.  In this article we’ll see how.

For remapping certain keys you need two tools: xev and xmodmap.

Open a terminal window and run xev.  It is now active and waiting for you to press a key.  Press the key whose behavior you want to change, e.g. PgUp.

Look at the output in the terminal where you started xev.  You should see something like the following:

state 0x10, keycode 110 (keysym 0xff55, Prior), same_screen YES,

In this example, Prior is the name of the action (the keysym) that the key is currently assigned to.  Remember the number after “keycode”; that’s the internal number for the key.

Now press another key, e.g. PgDown, which should output:

state 0x10, keycode 115 (keysym 0xff56, Next), same_screen YES,

Remember the keycode number and the action name (Next in this case).  Now let’s say you want to swap these two keys.  Here we use the second program: xmodmap.

xmodmap -e "keycode 110 = Next"

This changes the action for the key with keycode 110 to “Next”.  To complete the swap, do the same for the other key (here, assign Prior to keycode 115).  Repeat this for every keycode that you have noted, applying whatever actions you wish.
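If you have several keys to remap, reading keycodes off the xev output by eye gets tedious.  Here is a small Python sketch that pulls the keycode and keysym name out of an xev line and builds the pair of xmodmap commands needed for a swap.  The function names are my own, and the regular expression assumes the xev line format shown above.

```python
import re

# Matches the relevant part of an xev key event line, e.g.
# "state 0x10, keycode 110 (keysym 0xff55, Prior), same_screen YES,"
XEV_KEY = re.compile(r"keycode (\d+) \(keysym 0x[0-9a-fA-F]+, (\w+)\)")


def parse_xev_line(line):
    """Return (keycode, keysym_name) from an xev output line, or None."""
    match = XEV_KEY.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def swap_commands(first, second):
    """Build the two xmodmap commands that exchange the keysyms of two keys."""
    (code_a, sym_a), (code_b, sym_b) = first, second
    return [
        f'xmodmap -e "keycode {code_a} = {sym_b}"',
        f'xmodmap -e "keycode {code_b} = {sym_a}"',
    ]


pgup = parse_xev_line("state 0x10, keycode 110 (keysym 0xff55, Prior), same_screen YES,")
pgdn = parse_xev_line("state 0x10, keycode 115 (keysym 0xff56, Next), same_screen YES,")
for command in swap_commands(pgup, pgdn):
    print(command)
```

The script only prints the commands; review them and run them yourself in the terminal.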

NOTE: These changes apply to the current X session only and won’t survive a reboot.  To make them permanent, record all of the key mappings into a file called .Xmodmap in your home directory using this command:

xmodmap -pke > ~/.Xmodmap

Then enable this mapping from your .xinitrc (create one inside your home directory if you don’t have one already).  Put this command in it:

xmodmap ~/.Xmodmap

You’re done.  The new key mapping will now be reactivated every time an X session is initialized for this user.


Android hacking the Linux-style

Posted under Linux articles on April 19th, 2011

Download adb by googling “adb linux download”.
It’s a tar file; untar it somewhere and make the adb binary executable.

If, like me, you are running 64-bit Linux, you have to manually install the 32-bit libraries that adb requires.
For example:

apt-get install libc6-i386
apt-get install lib32stdc++6
apt-get install lib32ncurses5

TIP: use ldd /path/to/adb to see the list of the actual libraries required. Try to do this in Windows :)

Enabling the phone as a USB device.
Become UNIX root:

$vi /etc/udev/rules.d/50-android.rules

SUBSYSTEM=="usb|usb_device", ATTR{idVendor}=="0bb4", MODE="0660", GROUP="plugdev"
SUBSYSTEM=="usb|usb_device", ATTR{idVendor}=="0bb4", ATTR{idProduct}=="0c02", SYMLINK+="android_adb"
SUBSYSTEM=="usb|usb_device", ATTR{idVendor}=="0bb4", ATTR{idProduct}=="0c01", SYMLINK+="android_fastboot"


NOTE: The hexadecimal numbers above (0bb4:0c02) are the two-part ID of the phone when you use the lsusb command to list attached usb devices. It’s easy to recognize because the manufacturer and/or phone model should be listed along with the output.
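If you would rather not copy the IDs by hand, the following Python sketch extracts the vendor and product IDs from one line of lsusb output and formats a rule in the same shape as the ones above.  The sample line (including the "Example Vendor Example Phone" description) and the function names are made up for illustration; check the result against your own lsusb output before saving it.

```python
import re

# lsusb prints one line per device, with the ID as "vendor:product" in hex.
LSUSB_LINE = re.compile(r"ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)")


def parse_lsusb_line(line):
    """Return (vendor_id, product_id, description) or None if no ID is found."""
    match = LSUSB_LINE.search(line)
    if match is None:
        return None
    vendor, product, description = match.groups()
    return vendor.lower(), product.lower(), description.strip()


def udev_rule(vendor, product, symlink):
    """Format a udev rule that creates a symlink for the given device."""
    return (
        f'SUBSYSTEM=="usb|usb_device", ATTR{{idVendor}}=="{vendor}", '
        f'ATTR{{idProduct}}=="{product}", SYMLINK+="{symlink}"'
    )


# Illustrative sample line, not real output from any particular phone.
sample = "Bus 001 Device 004: ID 0bb4:0c02 Example Vendor Example Phone"
vendor, product, description = parse_lsusb_line(sample)
print(description)
print(udev_rule(vendor, product, "android_adb"))
```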


$chmod a+r /etc/udev/rules.d/50-android.rules
$/path/to/adb devices  <-- this should tell you that the phone has been found
$/path/to/adb shell    <-- this will give you an actual UNIX shell

Now you can use adb to interact with your phone at system level, for example:

Backing up your sdcard is as simple as this:

./adb pull /sdcard/ /tmp/sdcard

If your phone has been rooted, you can do the following:

Always start by getting a root shell on the phone:

user@pc$ adb shell
$ su -
#

To remount /system read-write (adb remount):

# mount -o remount,rw /dev/block/mtdblock3 /system

To uninstall an application (adb uninstall):

# rm /system/app/PackageYouNoLongerWant.apk
# pm uninstall package.you.no.longer.want

To put a file somewhere other than the SD card (adb push):
First put the file on the SD card as usual, then move it from the root shell:

# mv /sdcard/file /where/you/want/to/put/the/file


User-sensitive Dashboard in Pentaho

Posted under BI and Data Warehousing articles on January 6th, 2011

This time I’d like to address a common task that most of us will encounter at one time or another when we deal with this Business Intelligence (BI) stuff.

The task: Creating info dashboard(s) that present relevant information based on who the user is.

Implicit in that task is the need to 1) present information for that user and 2) present *only* information for that user.  A good example would be a dashboard for a salesperson.  It should contain information about contacts and customers as it pertains to that salesperson.  No more, no less.

Depending on how you choose to build your dashboard, the method to accomplish the task differs a bit.  Now you may ask, what are the ways to create dashboards in Pentaho?  As of this writing, there are at least two common ways: CDF and CDE.  CDF is the mainstay method and has been around for a while; CDE is a new interactive tool produced and contributed by the talents at Webdetails.pt (that’s a web site in Portugal).

NOTE: To read on, it’s best to familiarize yourself with one of the building blocks of Pentaho, the xaction.  In a nutshell, xaction is an XML-based declarative mini-language that allows us to fetch data either from a data source or from a Mondrian cube, and output the data someplace else (the session object is a good example).

The Question of Identity

To be able to tailor the content of the dashboard with user-sensitive information, we have to get the currently logged in user from somewhere.  From within Pentaho, you can get this information (and more) by tapping into built-in objects provided by Spring Security which is the backbone of the Pentaho authentication system.  The next question is how do we access this information?

The Road Map

Before we answer the previous question, let’s lay out the steps to accomplish this task.  First, we need to decide if we need to do some lookup to map the username (the name users use to log in) into a people-friendly name, for example username=jdoe into ‘John Doe’.  If your Mondrian schema requires this (as mine does), there is an extra step that you need to do.  Otherwise, skip the next section, or read it anyway since it can also be useful in other scenarios.

Seeding the Session

While your login username will already be contained inside the ’security’ object, we need a place to put the people-friendly name.  A good one is inside the user session, which will retain this information as long as the session is still valid.  And fortunately for us, there is already a hook to put some xaction in Pentaho.  Let me introduce you to pentaho-solutions/system/sessionStartupAction.xml.

This file allows us to insert an xaction that is guaranteed to be executed when a user logs in.

Insert the following snippet:

<bean>
  <property name="sessionType" value="org.pentaho.platform.web.http.session.PentahoHttpSession"/>
  <property name="actionPath" value="Analysis/rules/salesman-username-to-name.xaction"/>
  <property name="actionOutputScope" value="session"/>
</bean>

Explanation: We want the Pentaho biserver to execute the actions defined in salesman-username-to-name.xaction, which is located in the rules directory of the solution called Analysis.  The third property states that the output of the xaction will be accessible in the ‘session’ object.  This bean definition represents one action; if you need two, define two beans.

Next, we need to write the .xaction file.  By the way, the best way to learn the mini-language in which you script these files is to study the good, working examples under pentaho-solutions/bi-developers.

Here, following the example, I created a directory called ‘rules’ under my main solution path (which itself is right under the pentaho-solutions directory).

I will highlight the important parts (pentaho-solutions/Analysis/rules/salesman-username-to-name.xaction):

<inputs>
  <user type="string">
    <sources>
      <security>principalName</security>
    </sources>
  </user>
</inputs>

Explanation: This section defines what will be available in the .xaction as the variable ‘user’, which is seeded with the current value of Spring Security’s principalName.  The tag surrounding principalName (here, <security>) defines the scope in which the input is looked up.

<outputs>
  <userFullName type="string">
    <destinations>
      <session>fullName</session>
    </destinations>
  </userFullName>
</outputs>

Explanation: This section tells us that at the end of the execution of this .xaction, the session object will contain an attribute called ‘fullName’ whose value will be determined by the section below.

<component-name>SQLLookupRule</component-name>
<action-type>fetch full name</action-type>
<action-inputs>
  <user type="string"/>
</action-inputs>
<action-outputs>
  <query-result type="string" mapping="rsFullName"/>
</action-outputs>
<component-definition>
  <query>SELECT name FROM some_transactional_table WHERE user_login_id='{user}'</query>
  <live>true</live>
  <jndi>JNDIDataSource</jndi>
</component-definition>

Explanation: This code defines a SQL query that accepts an input called ‘user’ and passes it into the query, whose result set will be available to the rest of the .xaction as ‘rsFullName’.

<component-name>JavascriptRule</component-name>
<action-type>Extract</action-type>
<action-inputs>
  <rsFullName type="result-set"/>
</action-inputs>
<action-outputs>
  <userFullName type="string"/>
</action-outputs>
<component-definition>
  <script>
    // Take the first column of the first row of the SQL result set
    var userFullName = rsFullName.getValueAt( 0, 0 );
    // Coerce the value to a JavaScript string
    userFullName = userFullName + '';
  </script>
</component-definition>

Explanation: Unlike the previous code, which is a SQL query, this part shows that an .xaction can also contain JavaScript code that acts on the available variables.  The one above simply fetches the value out of the SQL query result and assigns it to a declared variable, which has to have the same name as the one declared in the <outputs> section above.

There you have it.  Because we invoke this .xaction file at the startup of a user session, any dashboard created can use the values initialized by it.

Using the Session variables in Dashboards

Now we are ready to create the user-sensitive dashboards.  I am going to illustrate this using CDE which has a clean separation between layout, behavior and data source.

Using CDA (yet another project from Webdetails.pt) to manage the data source, we can simply use the session variables initialized by the above .xaction within the queries that make up the CDA contents.

If you are using MDX, it will look something like this:

select {[Measures].[Total Month Sales]} ON COLUMNS,
{[Region].[All Regions]} ON ROWS
from [Sales]
where [Salesperson].[${fullName}]
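To see what happens to the ${fullName} placeholder, here is a rough Python illustration of the substitution.  It is not how CDA itself is implemented; it just uses Python's string.Template (which happens to share the ${name} syntax) with a plain dictionary standing in for the Pentaho session.  The escape_member_name helper is my own addition: in MDX a closing bracket inside a bracketed name is written as two closing brackets, so the sketch doubles it before substituting.

```python
from string import Template

# The same query as above, with ${fullName} left as a placeholder.
MDX_TEMPLATE = Template(
    "select {[Measures].[Total Month Sales]} ON COLUMNS,\n"
    "{[Region].[All Regions]} ON ROWS\n"
    "from [Sales]\n"
    "where [Salesperson].[${fullName}]"
)


def escape_member_name(name):
    """Double any closing bracket so the name stays inside its [ ] delimiters."""
    return name.replace("]", "]]")


# Stand-in for the user session seeded by the startup .xaction.
session = {"fullName": "John Doe"}

query = MDX_TEMPLATE.substitute(fullName=escape_member_name(session["fullName"]))
print(query)
```

Each logged-in user therefore ends up running the same query text with only the salesperson member swapped in.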

The resulting dashboard will dutifully display information that is relevant to the logged-in user.  It’s pretty impressive when you think about it.


Tackling Expiration Dates in Pentaho

Posted under BI and Data Warehousing, Dev Best Practices articles on December 16th, 2010

The title may be a little misleading because this article covers a wider scope than just expiration dates.  I deliberately chose the title because an expiration date is probably the most commonly found case of a date-triggered state (or status) change, which is what this article is about.

Almost all business entities have a state associated with them at any given time.  Sometimes these states are explicitly recorded, for instance via some kind of Status field (column); sometimes they are implicit.  But all of them undergo state-to-state transitions that are governed by business rules.

This concept is extremely important in the world of BI, where transactional information is processed further to extract meaning or, even worse, to make decisions.  In this article, we’ll see one way to represent it.

Pre-requisite: This article requires some familiarity with MDX as it is being utilized in the Pentaho BI stack (Mondrian OLAP engine, to be exact).

The Schema

Every MDX query is run against a schema, so let’s talk about some ground rules that are not always easy to find online:

  1. Refresh this string of terminology in your mind: [Dimension.Hierarchy].[Level].<member>.<member-property>.  This is the terminology and hierarchy of a dimension in Mondrian-flavored MDX (the flavor we use in this article).
  2. A schema may have more than one Date/Time dimension.  I didn’t know this, and had to find out through necessity.  It is actually quite a useful feature.
  3. A schema may have multiple dimensions that share one single database table.  Sometimes you *have to have* these in order to display different columns of the table in the result using CrossJoin().  SIDE NOTE: I had an interesting conversation about this with Roland Bouman, who regards it as a peculiar limitation that is not imposed by any technical reason.
  4. Members of two Time dimensions can be used in a condition in functions that accept one.  For example: Filter( [TimeDim1].[Level].Members, [TimeDim1].CurrentMember.Name > CurrentDateMember([TimeDim2], "<some-formatting>").Lag(30).Name)
    IMPORTANT: Notice that the comparison is made between the ‘name’ property of the two Time dimension members.  You can’t seem to make the comparison at the member level directly.

The Problem At Hand

Let’s say your customers have different states that they transition between based on some business rules.  For example, customers that haven’t ordered any gadgets from your company for more than two years will be given INACTIVE status (which, by the way, is a red flag for your sales force that it’s time to go after them again).

Let’s say you want to show this in your salesperson’s Pentaho dashboard, so they are reminded of it every day they log into your system.

One Possible Solution

One way to tackle this is to create an ETL script that attaches a date column to the Customer dimension table.  Why the Customer dimension?  Why not the fact table itself?  Two reasons:

  1. The Customer dimension may be used by more than one fact table.  If we can find out when a certain customer switches status using the dimension, we only have to do this once.
  2. The Customer dimension has the correct granularity.  The status changes for a customer, not at the order level or at any other level.  So the dimension is it.

After we attach the status change date to the Customer dimension, we can use it in an MDX query, which is what we need whether we are talking about reports, cubes, or dashboards.

How do we use it?  Consider the following query:

select NON EMPTY {[Measures].[Sales]} ON COLUMNS,
NON EMPTY Crossjoin([Customer].[Name].Members, Filter([Status.Switch].[Date].Members, [Status].CurrentMember.Name > CurrentDateMember([Date.Timeline], "[""Date.Timeline""]\.[""Date""]\.[yyyymmdd]").Lag(60.0).Name)) ON ROWS
from [Sales]

Things to notice in the above MDX query:

  • The goal is to display customer names whose status has changed in the past 60 days.
  • The date when the customer status changes is recorded in the Customer dimension table, and defined in the schema as the [Status] dimension; the hierarchy name is ‘Switch’, and the level is [Date], which is encoded as a string in YYYYMMDD format.  NOTE: I chose this format because its string order matches chronological order.  That’s why it works for <, > or = conditions.
  • The powerful Filter() is used to evaluate the condition.
  • I defined two dimensions: [Customer] and [Status] that are keyed against the same customer dimension table.  This way I can use the [Status] dimension in the Filter() and use the [Customer] dimension to display the name of the customer.
  • It is very important to specify the correct Level in the first parameter to the Filter function, otherwise you’ll never get a match.  Read this sentence again; it took me almost one full day to figure this out.
  • I use the equally powerful CurrentDateMember() to get the member of yet another dimension [Date] that has the same format as the [Status] member.  To get to know this very useful function, read this primer by Diethard Steiner.
  • Then I use the .Name property of the two “YYYYMMDD” members in the condition.  This Member.Name facility is extremely powerful and flexible, but only after you understand what is going on.

The Result

Goal accomplished: now I have a list of customers whose status has changed in the past 60 days.  A very useful list to show to a salesperson.  Of course I still have to wire this query into a dashboard or report, but that’s beyond the scope of this article.

And now you have the know-how to tackle various reporting and query needs associated with date-sensitive state changes.  Again, an expiration date is just one of the most common instances of this problem.  In reality, there are tons of other scenarios that fit this pattern.


How can I download these gazillion links?

Posted under Dev Best Practices, Linux articles on October 6th, 2010

Don’t you hate it when you find yourself needing to download hundreds of files, by clicking on each HTML link, one-at-a-time?

For example, some publishers decide that it is cute to chop their big document/manual/manuscript/research paper/etc. into one PDF file per chapter.  Not too cute for those of us who want to save them all, is it?

Or your ace graphic artist in the Philippines refused to learn Zip and has instead given you access to the directory containing hundreds of images that you need for tomorrow’s demo to the client.

I bet one of your first thoughts is to view the source of the web page and copy out all of the links, right?  That would work, but it still involves hunting and pecking for the links amidst the hairy HTML/CSS/JS code in the page.

Sure it’s easy if you are a UNIX command line hacker who can spit out the exact grep/awk/perl script to automate this in a heartbeat.  But in this journal we want an easier approach that most of us sophisticated (read: lazy) geeks would want to use.

Well, here is *one* easy way to solve this problem:

Use Firefox (version 3.6 at the time of this writing) and install a nifty add-on called Link Gopher.  After installation and a restart, down at the bottom right corner of the browser there will appear a small word called Links (next to the window resizer).

Go to the web page that lists all the links.

Right-click that Links word, and select Extract All Links.  Voila!  All of the links now appear in a neat, clean web page.  Simply copy and paste all the links that you want to download into a text file, and run this command in a terminal:

cat links_list.txt | xargs -n 1 -P 4 wget

This command will pipe the content of links_list.txt (that is the name of the file you saved the links into, by the way) into xargs, which in turn will fire off 4 parallel processes of ‘wget’ command, each being handed one of the links (NOTE: That -P parameter is pretty slick).
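If you are on a machine without xargs or wget, roughly the same thing can be sketched in Python using only the standard library.  This is an approximation of the command above, not a replacement for wget (no retries, no progress output); the function names are my own, and the file name for each download is simply taken from the last part of the URL path.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_for(url):
    """Pick a local file name from the last segment of the URL path."""
    name = os.path.basename(urlparse(url).path)
    return name or "index.html"


def fetch(url):
    """Download one URL into the current directory and return the file name."""
    target = filename_for(url)
    urlretrieve(url, target)
    return target


def read_links(path):
    """Read one URL per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def download_all(urls, workers=4, fetch_one=fetch):
    """Run fetch_one over all URLs, at most `workers` at a time (like -P 4)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, urls))
```

To use it, call download_all(read_links("links_list.txt")) from the directory where you want the files to land.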

That’s it!  Now all you need to do is wait.  The output of the wget processes downloading each link is also fun to watch.

When all the processes have finished, you’ll be left with all the files you need, and a text file containing the list of the files in URL format.  Now that’s pretty easy, wouldn’t you say?


SSO With Pentaho Community Edition

Posted under BI and Data Warehousing articles on September 3rd, 2010

Introduction

Pentaho is an amazing system.  Built upon countless man-hours from all over the world, it is a testament to the effectiveness of the open-source SDLC paradigm.

But here’s the rub: those of us who are Community Edition users (and for various reasons cannot use the Enterprise Edition) are left on our own when it comes to the more “advanced” features … such as SSO integration.

After messing with this for the last week, with a lot of help from my colleague and from one of the most useful and fun (yes, fun!) online user communities that I’ve dealt with (the #pentaho irc channel), I finally cracked the proverbial nut.

So with my boss’ blessing, I decided to document what I had to do to make this work in the spirit of giving back to the community.  Plus with the rising awareness of the benefits of BI even for small to medium corporations, I have no doubt that this information would be useful for someone somewhere.

Due to variance in SSO setup, I am not implying that the way I set it up will work for yours. That’s all for the obligatory mini-DISCLAIMER.

The Need

If your organization does not have Single Sign-On implemented for your enterprise applications, then this writeup is irrelevant.  The fact is, SSO is a useful, productivity-boosting feature for users (and developers too); while it is almost always a major pain to set up, the payback is usually worth the hassle.

In this writeup, my scenario revolves around trying to fit Pentaho (Community Edition), which I’ll refer to as PCE from here on, into an existing SSO implementation.

Version Information:
Pentaho Business Intelligence Server – 3.6.0-stable
Microsoft IIS – version 7.0.6000

Now, for those who are familiar with setting up an SSO system, the next question will be a basic one:

Which SSO implementation did you use?

The SSO Setup

The one that we set up at work uses Microsoft Active Directory to authenticate users coming from the website.

While there is some documentation on the Pentaho Wiki on plugging Pentaho into SiteMinder or CAS, less can be found when you search for Microsoft Active Directory.  Which is a shame because, despite being a Linux guy myself, I have to admit that when configured correctly, their implementation of SSO works fairly well from the users’ perspective.

With MSAD, once it has authenticated you via the usual login screen, it pushes the information through the AJP protocol (1.3) to the Tomcat server that hosts the Pentaho biserver-ce.

For the sake of brevity and clarity, we won’t be discussing how to set up AJP to work with IIS.  Suffice it to say that it uses an ISAPI filter extension called isapi_redirect.dll to accomplish this.

The first thing to do is to modify conf/workers.properties and conf/uriworkermap.properties.  These files work by routing on URL patterns.  They should be self-explanatory to modify; contact me if you need more info.

Gotcha(TM) #1: On my Ubuntu development server, the Tomcat AJP listener somehow wasn’t listening for requests coming in via tcp (that’s TCP for IPv4 clients); rather, it waited on tcp6.  This is not obvious either, especially when you use netstat.  Netstat will show tcp6 whether Tomcat is listening on tcp *and* tcp6 OR just on tcp6.

So now what to do?  After searching for answers online, I came across a tip to specify the actual address on which the server listens.  Somehow this forces Tomcat to listen on both tcp and tcp6 on port 8009 (the AJP port).  To be specific: add an attribute address="your.server.ip.address" to the <Connector> tag that configures port 8009 in Tomcat’s server.xml.
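Since netstat cannot tell the two cases apart, one way to check is to simply try an IPv4 connection to the port.  The Python sketch below does just that; the function name is my own, and a successful connection only shows that something accepts IPv4 connections on that port, not that it speaks AJP.

```python
import socket


def accepts_ipv4(host, port, timeout=2.0):
    """Return True if a TCP connection over IPv4 to host:port succeeds."""
    # AF_INET forces IPv4, so a listener bound only to tcp6 will not be
    # reached through this socket.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
        except OSError:
            # Covers refused connections, timeouts, and unresolvable hosts.
            return False
        return True
```

Run accepts_ipv4("your.server.ip.address", 8009) from the IIS side of the network, before and after changing server.xml, to see whether the address attribute made a difference.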

After being sidetracked by this, I was finally able to get AJP requests from IIS through to Tomcat, which in turn dutifully routes them to /pentaho where the PCE lives.

The Switch

At this point, all the PCE can do is throw a fit, because it does not know what to do with the AJP request coming from IIS (i.e. the user), and it has no idea that there is authenticated user information within the request.

So we need to initiate the switch from the pre-installed JDBC-based authentication/authorization to the one based on LDAP, of which the MS Active Directory is an implementation.

To do this, you can follow the information in the link that I’m about to give you.  But come back here after you read it, because while it lists the steps, it does not give you a clue about what those modifications are really for.  Well, unless you are an LDAP and Spring Security expert.

Here’s the link.

In summary, here are the list of files you need to touch and modify:

Under the biserver-ce/pentaho-solutions/system directory:

  • pentaho-spring-beans.xml – the big switch, this is where you tell Pentaho to use LDAP instead of JDBC authentication/authorization system.
  • applicationContext-security-ldap.properties – this file is basically the center of the modification; we will talk about it in depth in the next section.
  • applicationContext-spring-security.xml – this is where the ACL (Access Control List) is set up at the URL level.  Search for <property name="objectDefinitionSource">.  Scarily, the actual URL patterns and permitted roles are defined within a hardcoded CDATA block (!!).  What’s wrong with another .properties file, guys?  All you need to do here is to substitute the default Pentaho roles such as Admin, Authenticated, etc. with the new ones from LDAP.
  • applicationContext-spring-security-ldap.xml – this is where the majority of the values in the above .properties file are being used.  As far as I recall, I didn’t change this file at all, which is always a good thing.
  • applicationContext-pentaho-security-ldap.xml – this file contains the two queries that populate the Pentaho UI when we assign permissions to Users or Roles.  See “The Exception” section below.
  • pentaho.xml – this file governs who can do what to the solutions in the repositories.  All you need to do in this file is replace the default roles with the ones you define in LDAP.  IMPORTANT: Anytime you modify the default settings in this file, always drop these tables from the hibernate database: pro_acls_list and pro_files (in that order), then restart the biserver.  This will rebuild the two tables with the new default permissions for the solutions.

Under the biserver-ce/pentaho-solutions/system/data-access-plugin directory:

  • settings.xml – this file governs who can do what to the defined data sources (you know, the all-important source of data for your ad-hoc reports and cubes).  All you need to do in this file is replace the default roles with the ones you define in LDAP.  A #pentaho irc channel community member (pstoellberger) helped me out on this one.  Without his quick source sleuthing, there’s no telling how many more hours I’d have spent figuring this out.

Under the biserver-ce/tomcat/webapps/pentaho/WEB-INF directory:

  • web.xml – this file configures the pentaho web application.  Since we are plugging Pentaho into an existing enterprise application, we need to configure it to reflect this.  All you need to do is to make sure that the base-url parameter in this section of the file is properly defined:

    <context-param>
      <param-name>base-url</param-name>
      <param-value>http://www.yourwebsite.entry.point.com/pentaho</param-value>
    </context-param>

Now let’s talk about some of these modifications in more depth.

The Property File

applicationContext-security-ldap.properties to be exact.  This is the only property file that you need to modify for this purpose.

The values in this file are used in three Spring-Security bean definition files.  To clarify, Pentaho uses Spring-Security to implement its authentication/authorization layer.  A wise decision that pays off rather handsomely, as you will see later in this article.

Let’s walk through these values:

  • contextSource.providerUrl – this should contain your ldap: URL.  Between you and your sysadmin, this should be a piece of cake to get.  Example: ldap://ldapsrv.acme.com:389
  • contextSource.userDn – set the value to the distinguishedName (DN) of the read-only user that you use for accessing the LDAP tree.  Example: CN=LDAP Searchdog,CN=Users,DC=acme,DC=com
  • contextSource.password – the password of the read-only user above

This section takes care of the basic connection to the LDAP server.  As expected, it is used in various places.

Next, we’ll fill out info for single user search (for authentication):

  • userSearch.searchBase – this value points to the root of the LDAP tree where you want the search to commence from. Example: DC=acme,DC=com
  • userSearch.searchFilter – this is the LDAP attribute that will be matched against the supplied parameter (typically the user name).  Example: (sAMAccountName={0})  <– the {0} is where the parameter would be substituted.
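To make the {0} placeholder concrete, here is a small Python sketch of the kind of substitution the search filter goes through.  This only illustrates the idea; it is not the code Pentaho or Spring-Security actually runs.  The function names and the user name jdoe are made up for the example, and the escaping follows the LDAP filter rules in RFC 4515.

```python
def escape_ldap_filter_value(value: str) -> str:
    """Escape the characters that are special inside an LDAP search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_user_filter(template: str, username: str) -> str:
    """Put the escaped user name where the {0} placeholder sits."""
    return template.replace("{0}", escape_ldap_filter_value(username))


# Hypothetical user name, with the filter template from the example above.
print(build_user_filter("(sAMAccountName={0})", "jdoe"))
```

The escaping step matters because the user name comes from outside; without it, a name containing * or parentheses would change the meaning of the filter.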

Next, we’ll specify how to fetch the roles of a given user (setup on LDAP):

  • populator.convertToUpperCase – when this value is ‘true’, the roles coming from LDAP will be converted into all upper case.  Not sure what this buys us, but it’s important to be consistent.  Don’t set this to true and then forget to capitalize the roles wherever they are defined.
  • populator.groupRoleAttribute – which LDAP attribute holds the roles. Example: cn
  • populator.groupSearchBase – same as the userSearch.searchBase above
  • populator.groupSearchFilter – specifies the condition for the search, that is, using the username to get the roles he/she is associated with
  • populator.rolePrefix – a prefix to put in front of each role name, if you need one.  I haven’t found out why I would need one.
  • populator.searchSubtree – another boolean value that indicates whether to search into the LDAP subtrees or not.
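The interplay between convertToUpperCase and rolePrefix is easier to see in code.  The Python sketch below mimics how the final role name is put together from the LDAP group name, based on my understanding of Spring-Security’s default LDAP authorities populator; treat it as an illustration rather than the actual implementation.  The function name and the group name “BI Admins” are hypothetical.

```python
def to_granted_role(group_name: str,
                    role_prefix: str = "",
                    convert_to_upper_case: bool = False) -> str:
    """Derive the role name the application sees from an LDAP group name."""
    # Upper-casing is applied to the group name first, then the prefix is added.
    role = group_name.upper() if convert_to_upper_case else group_name
    return role_prefix + role


# Hypothetical LDAP group, shown with two different settings.
print(to_granted_role("BI Admins"))
print(to_granted_role("BI Admins", role_prefix="ROLE_", convert_to_upper_case=True))
```

Whatever string comes out of this step is the one that has to appear, character for character, in every configuration file that refers to the role.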

Lastly, we give the proper info for searching available roles in LDAP.  This is an important query that will actually populate the Pentaho UI where we select Roles to assign permissions to certain Reports or Cubes (or ‘Solutions’ if we use Pentaho’s lingo).

  • allAuthoritiesSearch.roleAttribute – which LDAP attribute holds the value for the roles.  Example: cn
  • allAuthoritiesSearch.searchBase – where you want the search to begin.  IMPORTANT: the way my LDAP server is organized, when this property is set to the root of the tree (DC=acme,DC=com), the subsequent pentaho code failed to populate the UI control that allows us to select these roles.  Only when I specified a subtree that has only the roles would this work.  Example: OU=Some Subgroup,DC=acme,DC=com
  • allAuthoritiesSearch.searchFilter – this is the criterion shared by all the roles we want to pull from the LDAP server.  Example: (objectClass=group)
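Since a missing or misspelled key in this file is easy to overlook, a quick sanity check can save some hair-pulling.  The Python sketch below is a simplified checker, not a Pentaho tool: it parses plain key=value lines (it does not handle the escaping that real Java .properties files allow) and reports which of the keys discussed above are missing or empty.  The sample uses the example values from this article, plus a placeholder password; populator.rolePrefix is left out of the check because it may legitimately be empty.

```python
REQUIRED_KEYS = [
    "contextSource.providerUrl", "contextSource.userDn", "contextSource.password",
    "userSearch.searchBase", "userSearch.searchFilter",
    "populator.convertToUpperCase", "populator.groupRoleAttribute",
    "populator.groupSearchBase", "populator.groupSearchFilter",
    "populator.searchSubtree",
    "allAuthoritiesSearch.roleAttribute", "allAuthoritiesSearch.searchBase",
    "allAuthoritiesSearch.searchFilter",
]

# Example values from the article; the password is a placeholder.
SAMPLE = """\
contextSource.providerUrl=ldap://ldapsrv.acme.com:389
contextSource.userDn=CN=LDAP Searchdog,CN=Users,DC=acme,DC=com
contextSource.password=changeme
userSearch.searchBase=DC=acme,DC=com
userSearch.searchFilter=(sAMAccountName={0})
allAuthoritiesSearch.roleAttribute=cn
allAuthoritiesSearch.searchBase=OU=Some Subgroup,DC=acme,DC=com
allAuthoritiesSearch.searchFilter=(objectClass=group)
"""


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, since values such as DNs contain '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]


print(missing_keys(parse_properties(SAMPLE)))
```

Run against the sample, the check lists the populator.* keys, since the sample deliberately leaves that section out.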

The Exception

One LDAP query that you may want to disable is the allUsernamesSearch.  This query is defined in one of the modified xml files: applicationContext-pentaho-security-ldap.xml.

The reason it is a good idea to disable this is just common security/access control practice: you do not assign permissions at the user level; you define permissions in association with roles instead.

So let’s disable the query.  The way to do it is to make sure that the definition of the Spring bean points to a class that has been programmed to do nothing.  It will look something like this:

<bean id="allUsernamesSearch" class="org.pentaho.platform.plugin.services.security.userrole.ldap.search.NoOpLdapSearch" />

The effect is that when the UI that allows the admin to assign permissions is displayed, the users selection box is empty, thanks to the NoOpLdapSearch class defined above.  This means you can’t assign permissions to an individual user.  In 99% of the cases, this is what you want.

The Usage

The last step, after all the configuration above, is to actually use the roles defined in LDAP in the appropriate places.

‘Consistency’ is the keyword here, once you have defined a set of new roles in MS Active Directory to be used with Pentaho, then you *must* substitute default Pentaho roles (Admin, Authenticated, etc.) in the aforementioned configuration files with the appropriate new roles.

I don’t see the point of belaboring this, as the application would be unique to your own authentication/authorization needs.  Just be aware that a single typo will bring the system to a halt.  Using some kind of version control is highly recommended when modifying these files.

The most unexpected and quite amazing fact in this whole thing is that Spring-Security automatically handles the authenticated user information that was sent from IIS to Tomcat without any intervention on my side.

Lesser security libraries would probably require some property tweaking or custom-written filters to do this seemingly trivial but important step.  To me, this demonstrates the maturity of Spring as one of the few Java frameworks that is truly enterprise-worthy.

The Loose Ends

Some miscellaneous random bits of info that would have saved me some time and effort had I known them before I started on this task:

  • The log file for Pentaho is located in: biserver-ce/tomcat/bin/pentaho.log
  • To find out about problems with your ISAPI Filters, view the log files located where the extension .dll file is.  In my case it’s called isapi_redirect.log
  • Turn the log4j.xml logging level for spring-security to INFO or even DEBUG to follow what’s happening if the modifications do not seem to take effect.  This is quite obvious, but when you’re busy pulling your hair out, it’s easy to forget.
  • Don’t forget to turn it back to WARN or ERROR when your modifications *do* work.
  • Oh, and the Pentaho Administrator Console is useless once you have switched to LDAP; it is only configured to work with the JDBC user/role management.

Lost Windows 7 password

Posted under Microsoft Windows articles on July 15th, 2010

Ever feel the anguish of having something you need or want so close, yet you can’t get to it?

Today I did. I forgot the password to the only user on my three-week-old laptop. Forgot as in no way to remember it, and the hint didn’t help either. I am sure it’s not one of my regular ones because I was trying to be cute and came up with a new word.

And it so happens that in the MS-Windows world, if you don’t have the administrator account enabled and you forget your user password, you’re toast. I guess it’s the same thing with UNIX as well, except in UNIX, you will always have a root (same as administrator) user account.

Now what?

My help came from a rather weird source: Youtube, where a lot of people had apparently solved the same problem. Most of the solutions are .iso files which can be burned onto a CD-RW, so that you boot from the CD instead of the hard drive. Ironically, almost all of them are built around some kind of Linux distribution.

I tried one called Ophcrack which attempted to actually decode the Windows password hash. Of course after cranking the CPU to 100% for 15 minutes, it failed to retrieve my password. I didn’t say I came up with weak passwords :)

Finally I tried what’s called the Trinity Rescue Kit (just google it, it’s better than my linking to a website that may not be there when you need it). It is a 149MB .iso file, which I burned onto a CD-RW disc.  It gave me a command prompt after booting on my laptop.

I proceeded to type:
winpass -u [my_username] and I was presented with a couple of options, one of which was to clear the password. I selected it, and less than one second later it announced that my password had been cleared. Holding my breath, I issued shutdown -r now and waited.

And voila! My laptop booted right into the account without asking for the password. Whew! That was a relief.

NOTE: Had I been able to enable the administrator account, I might have been spared this exercise, but as I found out later, the version of Windows 7 that came with my laptop does not provide a way to enable the administrator account. Strange!

So my thanks to those ‘White Hat’ hackers who wrote the winpass utility and to those distro-jockeys who made it available in the .iso format.

And this time I remembered to create a password reset usb-keychain drive.  Without it, I’d be repeating this story all over again.


Accessing Intranet Sites Remotely

Posted under Dev Best Practices, Linux articles on June 21st, 2010

If you are like me, maintaining a server at home is like a hobby.  There is a certain satisfaction to be able to install whatever we like without having to ask for anybody’s permission.

And sometimes thanks to our tinkering, we discover good solutions that are applicable to the task given to us at work.  Think about it as giving our employers a bonus.

One of the most important rules in running a server is to never expose unnecessary information publicly.  Want an example? How about an obvious one: your router’s administration application.  This should never be accessible from outside of your home network, for an obvious reason, obviously.

But the benefit of this approach is also its own downside.  Consider the following (highly likely) case:

You need to urgently change a setting on your router, while you are not at home.

Yep, you can’t.  Not without doing something extra anyways.

And that something extra is SSH tunneling.  Now, there are at least two ways that I know of to accomplish this.  For simplicity’s sake, let’s talk about one now:

If you are on a Windows machine, get yourself PuTTY and follow the steps on this website, replacing the forwarded port numbers with the ones that you are trying to use.

http://www.cs.uu.nl/technical/services/ssh/putty/puttyfw.html

Basically, you are telling PuTTY to tunnel port number x on a machine within your home network to port number y on the machine you are working on.

So in the above example, to access your router’s admin application, you can set up the tunnel from port 80 on your router’s ip-address to, let’s say, port 8080 on the local machine (where you are working from).

In the UNIX world, that translates into the following:

ssh -L 8080:your_router_ip:80 your_username@your_home_server.net

After successfully logging into your home server remotely, you can start a browser (on the machine you’re currently working on), then go to localhost:8080 and voila! You’ll see your router’s administration application as though you were at home.

Pretty handy, eh?

Running GUI Linux programs in Windows

Posted under Dev Best Practices, Linux, Microsoft Windows articles on June 3rd, 2010

Now why on earth would you want to run GUI Linux programs on Windows?

If you were to ask me that a week ago, I would not be able to come up with a good business scenario.  Other than the fact that I did it at home, because my main computer is a Windows box and my development stuff is on my Linux server.

But at the moment I am stuck with an old laptop at work — which, being a good corporate laptop, of course is running Windows XP — because the giant four-letter computer company has messed up the shipment for my new computer.

So to make lemonade out of a lemon, I have enlisted two powerful pieces of free software that can get me out of this jam: PuTTY and Xming.

The idea is to use a powerful Linux machine to host and run the GUI programs that I need to work with from my very resource-limited old laptop (Pentium 4 Mobile with 512 MB, anyone?).

What applications? you say…

Let’s pick a big one, how about the Eclipse IDE.

Step one:

You install Xming and fire up XLaunch.  Out of the three modes that Xming offers: 1) multiple apps in one window, 2) each app running in a separate window, 3) fullscreen mode, mode 2 is the one that works best in this scenario.

So I went ahead and created an .xlaunch file that basically fires up Xming in the background, ready to accept X-window connections on the old laptop. NOTE: Don’t forget to check the box that says “No Access Control”, otherwise your Linux application will be rejected when it tries to connect to Xming.

NOTE: Xming has a file called X0.hosts (that’s X and zero, not the letter ‘o’) which is usually located in the Xming install directory.  This file may need to be modified to include the IP address of the linux box.  But only if it connects to Xming via another network domain (other than the primary one).

For example, I set up a virtual linux guest on my windows 7 machine; its local network subdomain looks like 192.168.56.x, unlike the default network domain which is 192.168.1.x.  The latter is where Xming waits for connections from.  Therefore I need to add 192.168.1.101 (the IP address of the linux machine) to the end of the X0.hosts file (on a line by itself), then Xming will work as expected.

A subnote, on my machine, windows 7 forces me to become administrator user to edit this file.  Although not as elegant as UNIX’s sudo command, the Switch User facility in windows makes this not as painful as it could have been.

Next, I fired up PuTTY.  I followed the steps on this website to configure a PuTTY session that allows X-forwarding and all that good stuff.

At this point I am connected to the Linux server, on which I have already downloaded the appropriate Eclipse distribution, ready to go.

All I need to do once I am connected via PuTTY is:

  1. Set the DISPLAY environment variable to point to the laptop’s Xming instance: export DISPLAY=192.168.x.x:10.0 (the 192.168.x.x being the laptop’s internal IP address, of course).
  2. Run Eclipse IDE: ./eclipse &

And voila! Eclipse is running on the Linux server with every bit of its UI tunneled via X onto the old laptop.  Of course the performance depends on the network connection, but it is surprisingly snappy on a typical 100Mbps Gigabit Ethernet.  And I am talking about Eclipse loaded with a large project.

Now if this article does not make you run to the storerooms in the back and pull out those old laptops and give those out to your developers, I don’t know what else will :)
