Data Best Practices

5 Requirements of Truly Useful Reports

Posted by Will Gunadi on January 31, 2016  /   Posted in BI and Custom Development, Business, Data Best Practices

Suppose you have followed the suggestion in my previous post, where I presented the different levels of Analytics usage that any company can find itself in: Impaired, Localized, Coordinated, Company-wide, and Competitive.

First, let's restate the definition of Analytics that we're discussing here:

Analytics is the effort to discover and communicate useful patterns within data

You are now aware of the potential power of analyzed data. Here are 5 requirements that will help you evaluate your existing reports and get them in shape for the next level of Analytics.

#1 Reports Must Be Worth The Cost

Measure the time it takes to compile and generate a report. If the time is significant, calculate how much of the company's revenue you are paying the resource that generates it. For example, a report that takes an analyst four hours a week at $50/hour costs over $10,000 a year.

If the time is not significant, measure how long it takes to update the report the next time it's needed. Add the two measurements together, then calculate the cost as above.

If both measurements added together are still insignificant, then you have one of those "magic" reports, whose benefits may need to be re-evaluated.

Do you have a budget for company-wide reporting? If the answer is no, it’s time to rethink the way your company does reporting.

#2 Reports Must Be Timely

A truly useful report must be up-to-date when the report consumer needs it. Sounds obvious, doesn’t it?

Unfortunately, a lot of truly useful reports must be compiled from multiple data sources. Generating an always up-to-date report is manageable for one source. But how about four? Six?

Each of these sources usually has its own best time for retrieving the data, and if you have 5 data sources (for example: 3 spreadsheets and 2 databases), you most likely also have multiple ways to reliably retrieve them.

Without some kind of automation, how many hours of human resources must be dedicated to this?
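
To make this concrete, here is a minimal sketch (in PostgreSQL-flavored SQL, with hypothetical table names) of the consolidation step, once each source has been staged into its own table by an automated process:

-- Consolidate already-staged sources into one reporting table.
-- Each staging table is refreshed on its own schedule.
INSERT INTO consolidated_orders (source, order_date, customer_id, amount)
SELECT 'sales_db', order_date, customer_id, amount
FROM staged_sales_db_orders
UNION ALL
SELECT 'spreadsheet_1', order_date, customer_id, amount
FROM staged_spreadsheet1_orders;

The query itself is the easy part; the real work is scheduling each source's extraction at its own best time and detecting when one of them is stale, which is exactly what automation (an ETL tool) is for.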

#3 Reports Must Be Role-Sensitive

A single set of data may be processed and analyzed in many different ways. Before generating a report for a particular role within the company, be it the CEO, Managers, or even a Field Operator, it is prudent to “walk a mile in that person’s shoes.”

If done correctly, each role in the company will have its own specific reports that truly help the person in that role see more of what is happening, anticipate changes, and adapt to them.

Bonus points if you have a screen where the person in the role can input changes in policies/regulations and processes, so those can be used to tailor the updates of said reports. Hint: you need a well-designed Data Warehouse to accomplish this.

#4 Reports Must Be Digestible

What do you call a reporting process that produces 23 Excel spreadsheets per day? Some may call it impressive, productive, or "Wow!"

But the receivers of these reports call it "useless." Being the receiver of a mountain of non-aggregated, indigestible data does not make someone productive. It makes the person look for an EXIT sign.

Producing a lot of reports must result in an equally significant increase in visibility and decision-making quality and speed. To do this, reports (or even better: Dashboards) need to be consolidated at the appropriate level.

Does your company have a process in place for this?

#5 Reports Must Have A Purpose

Reports that are truly useful have names, which indicate their sole purpose in life. Here are a few examples:

  • Bad Profits -report (Truthfully, I’d be impressed if you currently have this report always available to you as a business owner or CEO)
  • Unnecessary, Untracked, Unknown Expenses -report
  • High Maintenance, Low Revenue Customers -report
  • High Revenue, About-To-Leave Customers -report
  • Financial “Leaks-Busting” -report
  • Low Productivity Team -report
  • Fuel Consumption On Billable Vehicles or Machines -report
  • Where Is The Industry Going and How About Us -report
  • Time To Renegotiate Vendor Contracts -report

In your next Management Meeting, ask your executives and managers to come up with a list similar to the above. Then compare it with your existing reports; if many of the reports don't have a name and don't serve a singular, clear, actionable purpose, question their existence.

Be ruthless, life is too short for useless reports.

At nextCoder

We make new reports or tweak existing ones to meet the 5 requirements above. Our clients love the truly useful reports that we compile and generate for them, because through those reports they see not only what is going on, but also where to go next.

In summary, we save our clients' money and increase their company-wide efficiency by:

  1. Replacing expensive human resources with automated electronic "minions" (computer programs).
    So what happens to the human resources? They are now happier (because they no longer do tedious tasks better suited for computers anyway) and they contribute more significantly by doing tasks that are fit for human brains: Analytics.
  2. Using our highly tuned and constantly improved Data Warehouse platform that can serve multiple companies at the same time. Or, if you like exclusivity, we'll clone one just for you.
  3. Designing and consolidating numerous and diverse data sources into one readable Dashboard. Yes, it can be done, and no, it's not as easy as it looks.

We invite you to contact us if your company is ready to make your data work harder for you.

Why Visibility is Useful

Posted by Will Gunadi on May 18, 2015  /   Posted in BI and Custom Development, Business, Data Best Practices

What is Visibility?

In my previous post I established why visibility is very important in running and growing a business, how the lack of visibility is a vicious cycle that can result in the decline of a business, and how visibility goes both ways, top-down and bottom-up.

Now let's explore further: what exactly do we mean by visibility? In other words, what needs to be visible, and to whom?

The short (and useful) definition of visibility:

Being able to see, at any time, what is going on in the business that is relevant to the question or task at hand

From Top to Bottom

Executives have the task of making strategic decisions that should affect the business positively. From this vantage point, visibility is being able to see what is going on in order to make not only good decisions, but timely ones. In practice, this translates to knowing what trends are in effect within the business; in other words, where things are going based on the events that happen over certain periods of time.

NOTE: Sampling data at certain intervals is a common way to get this kind of visibility. The cross-section of the company where the sample is taken can also provide important insights.
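
As an illustration, here is a minimal sketch of such a sampling query (PostgreSQL-flavored SQL, hypothetical orders table) that samples revenue at monthly intervals so the trend becomes visible:

-- One revenue sample per month, ready to be plotted as a trend line.
SELECT DATE_TRUNC('month', order_date) AS sample_month,
       SUM(order_total) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY sample_month;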

From Bottom Up

Employees doing daily tasks are the main source of the data on which sampling can be performed. Every aspect of a task that can be measured should be measured. Visibility here means more than just being able to perform the task; it also means knowing why a task is required and how it affects other areas across the company.

The more successful a company is in providing this visibility, the more it can work as a unit rather than as disparate departments busy protecting their own interests.

This is truly what is meant by the term Business Intelligence.

What do we do with visibility?

When we can see clearly what is going on in the business, we can decide which action to take and when to take it. So the main motivation for increasing visibility is to know what to do when something that needs attention transpires in the business, well before it becomes a problem.

More important is the long-term effect of visibility: the increase in the overall performance of the company in conducting business. At the extreme, this could mean becoming a game-changer in the industry. I don't need to belabor why this is important; it's what makes a company valuable to anyone who deals with it.

Enough with Theory!

Let's take an example: say company A finds out that revenue has increased over the same period previously, but so has cost. Let's use the concepts we learned above about visibility to find out why.

In this situation, the first step is to find out what caused the revenue increase. Let's say the increase in revenue is caused by more new customers purchasing the products.

Then we take measurements across the departments in the company of how much each department contributes to, or is affected by, the increased revenue (new customers).

NOTE: This is a good example of looking at things differently; instead of focusing on the Sales Department vs. Accounting, we start to get a feel for what is taking place in other departments because of this known fact (increased revenue).

Upon analyzing the results of the measurements, we notice that Customer Service calls spiked during the period and that the procurement department is also busy sending out part replacements.

It turns out the new customers are not equipped with enough knowledge about the product, resulting in a higher refund rate (dissatisfaction) or parts replacement due to damage.

Knowing that, the company tasked the marketing department with coming up with materials that can be incorporated into the product packaging, so the customer has to go through them in order to enjoy the product.

So What?

The above scenario sounds like an easy and typical problem to solve. But consider this:

  1. What raised the red flag and triggered the "investigation"?
  2. How did the company find out so quickly that the revenue increase was caused by new customers?
  3. How long would it take to get the measurements out of each department in the company? Without a conscious and concentrated effort to gather the data, this can take months or longer.
  4. How long would it take for Accounting to approve the budget for the new marketing materials?

Interesting questions, aren't they? And it's a set of questions that is rarely talked about. Let's consider some answers:

  1. What triggered the investigation is the executives having visibility into what happened to both revenue and cost over a period of time. Contrary to what most believe, this is not the job of the accounting department. Without a conscious and concentrated effort, no one would have time to do this.
  2. The company can quickly pinpoint the source of the revenue increase because it knows its customers. Only companies that have taken the time to know their customers by analyzing their data are nimble enough to use the insight when the need arises.
  3. The reason this company was able to analyze the situation is the visibility across different departments. Each department has indicators that everyone can see when its tasks (and costs) spike.
  4. Lastly, the reason this company was able to address the problem quickly is that the accounting department has visibility into what the marketing department is capable of in terms of designing the new packaging, and into the production department's ability to roll it out.

As we can clearly see, without visibility this whole process would be much slower, if it happened at all.

In Short

Now let’s make this closer to home. Were the above story to happen within your company (or the company you work for), what would be the outcome?

Do you have enough visibility?

In the form of a list, this is Visibility for starters:

  1. Periodic sampling of data to discover trends
  2. Transparent cross-department measurements
  3. Customer tracking and scoring
  4. Employee performance scoring
  5. Vendor tracking and scoring
  6. Documented Time

Visibility is something we don't miss until bad things start to become noticeable. Companies that proactively put in place a system that allows visibility to be achieved (Business Intelligence tools such as a Data Warehouse and Dashboards fall into this category) will find that the more they use it, the more effortless it becomes not only to react, but also to anticipate changes and "roll with the punches," so to speak.

In the next post, we'll see what makes up a good system that provides the 6 aspects of Visibility listed above.

Change The Way You See Your Business

Posted by admin on October 02, 2014  /   Posted in BI and Custom Development, Business, Data Best Practices

What do we tend to do when we are too focused on something? That's right, we tend to lose sight of the big picture, and lacking that crucial perspective can be costly when you are in charge of running and growing a business.

As the saying goes, "Don't miss the forest for the trees." As business owners and managers, we must be vigilant in maintaining a clear vision and a good sense of where our business currently is. Only when we manage to do so are we in a position to determine where to go next and how to get there. Furthermore, it helps us focus on the right details, the ones that have a big impact on the business.

The good news is, there are a plethora of tools and techniques out there that can help us in this regard.

In this article, let me try to illustrate this concept using a simple example:

[Chart: available seating per performance slot at a small theater, as a bubble chart]

Take a look at the chart above. What do you notice? Bubbles, different shapes, and colors. For those of us who are data geeks, this is one variation of a bubble chart.

Now the story behind this chart: once upon a time, there was an imaginary theater company that performed in its own facility three times a week: on Monday and Thursday afternoons, and on Saturday mornings (represented on the vertical axis of the chart). Each performance slot is indicated on the horizontal axis.

So if we map a time slot to a day, we find a single performance (a bubble and a number) representing the available seating for that particular performance; in other words, the seating capacity of the theater minus the seats already sold for the performance.

Now that you know what the chart is about, take a moment and look at the chart again and let’s see what jumps out at you.

All done? Now put yourself in the position of the business owner of this theater company and let’s compare notes:

  • The size of the bubble intuitively indicates the seats left (which means tickets unsold)
  • The color of the bubble indicates the type of seating available (the main floor has more capacity than the balcony or the boxes)
  • The larger the bubble, the worse it is from a business perspective; again, intuitively, you want to shrink the bubble to 0 available seats, which means a full-house performance
  • What about those large negative numbers on some of the bubbles? Those indicate overbooking, something that must be dealt with before the actual performance, otherwise two or more annoyed audience members will be the result, which in today's world will likely translate into negative tweets and Facebook posts (bad for our reputation)
  • The chart represents one week; it would be easy to produce one for next week, the next two weeks, etc. In fact, the chart can be an effective planning tool for promoting the performances based on which time slots still have a lot of seats available
  • Once we have multiple of these charts, we start to see the forest: which weeks of the month are the best time to promote a particular performance. Is this the time to introduce a Season's Pass ticket? How far are we from having to get a new building?

Do you see what we've accomplished? With a single chart, as a business owner, we can see:

  1. What is going on
  2. Where to focus our attention
  3. What bad things could happen
  4. Which options are available to us
  5. How to create a strategy to do something about it
  6. What the big picture looks like

That is the power of Data Analytics and Visualization.

What About Real-world Scenarios?

Here is the kicker: although I created a fictitious story for the chart, it is actually based on a real-world visualization that we produced for one of our clients. It is a real business tool that takes the concept of charting to a whole new level. That is, we made it a user interface where the business owner can actually reserve a seat, or upgrade it, right there on the chart, without having to go through extra screens.

Our client loves this view of their business because it gives them visibility far beyond the hundreds of numbers in the spreadsheets they used to deal with. And it makes efficient use of both their time and their strategic thinking, so they can arrive at better decisions, faster.

Ever wonder why military generals through the ages have planned their next moves by analyzing a map?

Let us know what you think about this, and let’s work towards changing the way we see our businesses.

-WG

Business Dashboards – The Next Generation

Posted by admin on February 13, 2014  /   Posted in BI and Custom Development, Business, Data Best Practices

Dashboards Today

Unfortunately, most vendors that provide dashboard software have done little to encourage the effective use of this medium. They focus their marketing efforts on flash and dazzle that subvert the goals of clear communication.

They fight to win our interest by maximizing sizzle, highlighting flashy display mechanisms that appeal to our desire to be entertained. Once implemented, however, these cute displays lose their spark in a matter of days and become just plain annoying.

An effective dashboard is the product not of cute gauges, meters, and traffic lights, but rather of informed design: more science than art, more simplicity than dazzle. It is, above all else, about communication.

- Stephen Few, Information Dashboard Design, O’Reilly, 2006

Amazingly, this comment is still pretty much true even today. But it doesn't have to be. The next generation of effective dashboards is closer to the mixing panel of a music studio than to the plain, static dashboards we see today. And we at nextCoder are going to make sure that you, business leaders, are well informed about it.

Recap: What is a Business Dashboard?

Nowadays you see fancy dashboards in cars that almost rival a desktop computer. But what does the most basic dashboard in your car actually do? It tells you, the driver, in real time, at least these four pieces of information:

  1. How fast you are going (the speedometer)
  2. How far you can go (tank full of gas or almost empty)
  3. Alerts to pending breakage (weak battery, low engine oil, engine needs checking)
  4. Alerts about what is going on (turn signal, headlamp indicator)

Running a business, just like driving a car, requires your full attention in real time; unfortunately, a business does not come with a built-in dashboard the way cars do. Either you have to build one yourself, or have someone build it for you.

And just like a car dashboard that is connected at all times to the sensors that feed it with data, a Business Dashboard is connected to data sources located either within your business or outside it.

The information that you receive from a Business Dashboard is actually pretty similar:

  1. How fast is the business growing? (is it growing in the right direction?)
  2. How long can the business last given the current situation?
  3. Are we paying too much to our vendors/suppliers?
  4. Are we serving our customers the best we could?
  5. And many, many more

But... believe it or not, as useful as knowing all of these is, that list merely covers the basic usage of a Business Dashboard. If we stop at this level, we are missing the full potential of what the next generation of dashboards can do for us.

What is brewing?

Thanks to the ever-increasing popularity of game-changing web technologies such as jQuery and HTML5/CSS, and the growing list of charting and visualization libraries, we enjoy the power of data visualization unlike in any computing era before.

The new way to develop dashboards allows us to produce dashboards that will rival web-applications in terms of information flow. Gone are the days of static, read-only dashboards. Say “Hello” to the new generation that takes interaction to a whole new level:

  • Built-in mapping: With the numerous mapping and geocoding APIs available for us to use, it is unthinkable that a dashboard should be without one. Most businesses could benefit from geographically mapped data. Imagine being able to visually see where your products are being purchased, or your technicians on their service routes, or your suppliers, to optimize material or component shipments. And many more uses.
  • User inputs: Ever seen a dashboard that is not read-only? If yes, you are in good company. More and more executives, managers, and customer support personnel ask to be able to punch in data in real time. Why should they have to switch to another application to do that? We at nextCoder agree.
  • User-specific business rules: Let's face it, a business owner or CEO has his or her own "rules" for determining whether the business is doing okay or floundering. These "rules" are not for everyone to see, and for a very good reason: panic prevention (just kidding... a little bit). But the fact remains: if a dashboard cannot even contain user-specific rules, then we are shortchanging the users. Plain and simple.
  • External data: A lot of smaller businesses assume that just because their data volume is not gigantic, they have no use for aggregated data analysis. This was true in the past, but not anymore. Today there are volumes of data about any business, but they do not originate within the business itself; rather, they come from the vast social-media network. You might be surprised at what your customers broadcast about your product or service to their friends and family, good and bad. And for the sake of your business, monitoring it is a good course of action.

These are just the tip of the iceberg when it comes to what a Business Dashboard is capable of with current technology. Our goal is to deliver these features to our clients with each dashboard we build for them. In the next blog entries, we'll take a peek at how to do just that.

Pentaho 5 CE Hands-on Review

Posted by admin on January 16, 2014  /   Posted in BI and Custom Development, Data Best Practices

One of the most exciting software releases toward the end of 2013 was Pentaho 5.0 CE (Community Edition), which was rolled out on November 18th, 2013. While the EE (Enterprise Edition) was released a couple of months prior, the CE version has always been my favorite, both to work on and especially for being part of a community that is always full of new (and wonderful) ideas and that actually has the brainpower to realize them. Truly one of the most interesting Open Source communities.

As a BI Consultant, I had several requests to review this new release, so without further ado, let’s take a look.

As usual, I got the zip files for each Pentaho BI Suite component from here. This is what my folder looked like when I was done downloading:

  • pad-ce-5.0.1-stable.zip – Pentaho Aggregation Designer (missing as of the time of this review, no idea where it went)
  • psw-ce-3.6.1.zip – Pentaho Schema Workbench
  • biserver-ce-5.0.1-stable.zip – Pentaho BI Server
  • pdi-ce-5.0.1-stable.zip – Pentaho Data Integration (Kettle and Spoon)
  • pme-ce-5.0.1-stable.zip – Pentaho Metadata Editor
  • prd-ce-5.0.1-stable.zip – Pentaho Report Designer

Unzipping any of these zip files will "install" the component. Simple as that.
I haven't had time to look at PAD or PME, so we'll review those in the future. For now, let's start with PRD.

The new Pentaho Report Designer has a very convenient and useful item in the Wizard that you see when you start the report-designer.sh (or .bat on Windows) script. It's called "What's New".

It's basically a report that we can Preview, and it lists all the new features in this 5.0 release; very handy for reading about the improvements. What especially piqued my interest is that they seem to have improved the creation of interactive HTML reports, which can now serve links to other reports within Pentaho. Maybe a new way to serve content that is somewhere between reports, dashboards, and wizard pages.

The popular Kettle (or Spoon, or PDI) increases the number of steps, including one that I have been waiting for: OpenERP Input and Output. Speaking of OpenERP, I need to contribute the custom OpenERP step that we developed last year back to the community.

I'm also eager to try out the MongoDB steps, as I have started to use MongoDB in our projects. I'll have more to say about these two wonderful tools in upcoming articles. They are big enough to deserve their own reviews.

Pentaho BI Server

But the biggest changes are truly visible in the Pentaho BI Server itself. After unzipping biserver-ce-5.0.1-stable.zip, dive into the biserver-ce directory and issue ./start-pentaho.sh if you are on UNIX or start-pentaho.bat if you are on Windows.

By starting the BI Server from this location, the startup scripts already set the memory allocation and other environment parameters to more reasonable values than the ones that come by default with Apache Tomcat.

After starting the server, wait for a while; or, if you are familiar with Tomcat's logging features, on UNIX do:

tail -f biserver-ce/tomcat/logs/catalina.out

This will allow you to see whether the server started correctly or failed with errors. On Windows, use the Tomcat Start/Shutdown application to see the logs. When the log file stops scrolling, bring up a browser (on the same computer) and hit the Tomcat server by entering http://localhost:8080/pentaho if you use the default settings. You should see:

[Screenshot: customized Pentaho User Console login page]

Yes, that’s what our version of the Pentaho User Console (PUC) looks like after a couple of customization steps:

  • Change the login image:
    – user@server:~/pentaho5/biserver-ce/pentaho-solutions/system/common-ui/resources/themes/crystal/images$ mv ~/your-own-similarly-sized-image.jpg ./login-crystal-bg.jpeg
  • Change the Pentaho logo to your own (here, nextCoder's):
    – user@server:~/pentaho5/biserver-ce/pentaho-solutions/system/common-ui/resources/themes/images$ mv ~/your_logo.png puc-login-logo.png
  • Change the wording of the Login page:
    – user@server:~/pentaho5/biserver-ce$ vi tomcat/webapps/pentaho/jsp/PUCLogin.jsp

Gone is the usual 'joe' user, replaced by 'admin' with the same default password, 'password'. Use these to get in and you'll be greeted by the Home screen:

[Screenshot: PUC Home screen]

Again, with some modifications, you can tailor the Home screen to suit your purposes. In this case the customization step is:

  • Change the content of Home
    – beruin@yamato:~/pentaho5$ vi ./biserver-ce/tomcat/webapps/pentaho/mantle/home/content/welcome/index.html

If you notice, gone are the compartmentalized panes of the old PUC, replaced by a much better-flowing (plenty of white space) minimalistic layout.

Another paradigm switch is the central navigation (it says 'Home' in the above screenshot). When you click on it, a dropdown is displayed showing the available modes. The Home -mode is what you see above; next is the Browse Files -mode, which looks like this:

[Screenshot: Browse Files -mode]

This is another departure, from the file-based pentaho-solutions repository to this JCR-based one. What is JCR? Java Content Repository is a database-backed content (file) repository specification implemented by, among others, the Apache Jackrabbit project, which is the one used here by Pentaho.

What does this all mean for users? A database-backed repository has its advantages in terms of better control of metadata and versioning of the files without sacrificing ease of use, but it also means we have to use a plugin if we want to synchronize this repository with a file system so we can version-control our files on our own. It remains to be seen whether this switch will bear fruit down the line.

One question for the Pentaho team: Why can’t I select multiple files and do some actions with them?

Speaking of plugins, which are a source of productivity in the platform, the next mode we'll talk about is the Marketplace -mode:

[Screenshot: Marketplace -mode]

In version 4.8 and before, we had to install plugins such as Saiku, CDE, CDA, CDF, etc. manually, either by using the ctool-install.sh script or by unzipping files into the right folders and hoping it would work.

The new Marketplace -mode provides a more organized way to manage plugins and their versions. Although you still have to restart the server manually after installing or upgrading these plugins, it is still miles ahead; more importantly, just a month or two after release, we started to see plugins written by developers outside of Pentaho, which is wonderful and in line with the spirit of the community.

Next up is the Opened -mode, which retains all of the files we are working on (whether editing or just viewing). This mode is somewhat similar to the new Microsoft Office paradigm (starting with Office 2010).

The Scheduled -mode is an improved user interface to schedule ETL runs:

[Screenshot: Scheduled -mode]

A newly introduced feature is the ability to define block-out times, within which scheduled ETLs will not run. This is useful for scheduled downtime or maintenance of the host servers.

The last mode is the Administration -mode:

[Screenshot: Administration -mode]

This is the answer to "Where is PAC?" The old Pentaho Administration Console is gone; it is now reborn as this mode. I can't tell you how many times (with the previous version) I received raised eyebrows or dumbfounded looks when I had to explain that you had to run another server just to create a new user or assign roles. This is definitely a very welcome improvement!

Now, how about some real work? The plugins now take center stage as Pentaho CE matures into a real platform. Old favorites like CDE:

[Screenshot: CDE dashboard editing]

are improved with the much more professional-looking "Crystal" theme as the default. You can still switch to the old "Onyx" theme if you like.

Another good tool, Saiku Analytics, has thankfully also returned:

[Screenshot: Saiku Analytics charts]

The charting ability of Saiku Analytics has been improved tremendously. I almost couldn't believe my eyes when I saw the various charts glide along, visualizing the data effortlessly.

A promising newcomer among the Analytics tools, called Pivot4J, is also available to install through the Marketplace -mode:

[Screenshot: Pivot4J editing]

Pivot4J has one thing that has been missing from all of the Pentaho Analytics tools: the ability to render aggregates in the last row or column. You have no idea how many times my clients have asked for this little feature. Yes, business people love their totals; totals help them make better decisions. So good job, Pivot4J team!

Are there any negatives? Yes: the charting in Pivot4J is not intuitive to me. Take a look at the above screenshot; you see four columns. When you click the control that generates the bar-chart representation of the table, what would you expect? I expected one bar chart with four bars, each representing one of the columns. What did Pivot4J give me? Four bar charts. Why?? And I don't see any way to merge them or change them in any way.

Summary

In summary, I couldn't be happier with this new 5.0 release of Pentaho CE. There are enough new features here to warrant companies considering an upgrade of their Data Warehouses. The most exciting trend for me is the third-party plugins that are starting to become available through the Marketplace. This can signal real growth in the quantity and quality of what is already one of the most useful BI suites on the market.

So to Mr. Pedro Alves and his team: big kudos, thank you, and good job. 2014 is looking like another stellar year for Open Source BI, starting with Pentaho 5.0 CE.

What You Should Expect from Your Data Warehouse

Posted by admin on October 01, 2013  /   Posted in BI and Custom Development, Business, Data Best Practices

When it comes to the benefits of having a Data Warehouse (DW from here on), most of our clients are already cognizant of its critical role as part of a business system. What many still don't realize is that there are good DWs and not-so-good ones. Of course everyone is aware that both exist (good vs. not so useful), but not everyone has the time to think of ways to distinguish between the two.

A Good Data Warehouse Goes (Far) Beyond a Database

The number one question I get when I talk to anyone about the services we provide to our customers is the difference between a Database and a Data Warehouse. A lot of people associate a data warehouse with storing data (just like a physical warehouse), and that's about it.

Truth is, a good DW serves the business in at least five areas in addition to hosting business data:

  1. Automatically applying business rules
  2. Automatically generating reports (both on-demand and periodic)
  3. Automatically notifying parties of data events of interest
  4. Allowing data owners to look into their data in multiple dimensions
  5. Allowing for data error detection, investigation, and correction

When you are building a DW, the above functions must be present in the planning, implementation, and testing phases. Otherwise, you'd be left with basically just a database.

A Good Data Warehouse Grows into Maturity

In the next blog entry, I'll discuss the different levels of DW maturity in detail. There is a lot of misconception out there regarding how to assess the reliability and usefulness of a DW. Most people expect a DW to be mature at the end of the initial implementation project. This is not realistic given the typical complexity of the metadata, the data, and the business rules surrounding them. The correct way to proceed with a DW-building project is to realize that a DW goes through certain phases that bring it along the path to maturity.

A Good Data Warehouse Is Redundant

An experienced BI consultant will build one or more copies of the DW that can be switched at any given time. Not only does this provide a good development and testing environment for analytic outputs (reports, dashboards, ETLs), but it also lets you switch back and forth when you have to re-run an ETL in the middle of a busy day, without taking reporting offline.

A Good Data Warehouse Highlights Data Problems

Due to its proximity to the people who do the accounting and forecasting in the organization, a good DW is usually the first place where these people notice data inconsistencies. A good way to take advantage of this characteristic is to put in place monitoring ETLs that run periodically, executing queries designed to check the consistency of the data.
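
As a sketch of such a monitoring query (PostgreSQL-flavored SQL, with hypothetical invoice and invoice_line tables), this one flags invoices whose header total no longer matches the sum of their lines:

-- Flag invoices whose header total disagrees with the sum of their lines.
SELECT i.invoice_id,
       i.total_amount,
       SUM(l.line_amount) AS computed_total
FROM invoice i
JOIN invoice_line l ON l.invoice_id = i.invoice_id
GROUP BY i.invoice_id, i.total_amount
HAVING i.total_amount <> SUM(l.line_amount);

Schedule it in a monitoring ETL and have a non-empty result trigger a notification to the data owner.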

A Good Data Warehouse Reduces the Load on the Transactional System

A lot of the systems that maintain the transactional data input from users are inundated with processes that shouldn't belong there in the first place. These processes typically do checking, apply business rules, and monitor the validity of the data, all of which take resources away from what Transactional Systems should mainly be about: collecting data and preserving its integrity.

Now the follow-up question is: how do we know which parts of the Transactional System should be offloaded to the DW?

A good rule of thumb for the boundary: whenever a process can be executed against the data in batch mode, that is an indication it should be taken off the Transactional System and moved to the DW. And by the way, it is not taboo for a DW to pump data back into the Transactional System when a valid use-case can be made.
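
For instance, recomputing a per-customer revenue score is a classic batch-mode process: it scans all the data on a schedule, so it belongs in the DW, not in the order-entry system. A minimal sketch, assuming hypothetical dim_customer and fact_orders tables (PostgreSQL-flavored SQL):

-- Batch rule: recompute each customer's trailing 12-month revenue nightly.
UPDATE dim_customer c
SET revenue_12m = s.revenue
FROM (SELECT customer_id, SUM(amount) AS revenue
      FROM fact_orders
      WHERE order_date > CURRENT_DATE - INTERVAL '12 months'
      GROUP BY customer_id) AS s
WHERE s.customer_id = c.customer_id;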

In Summary
There are many other ways to assess whether a DW is "good" or "not good enough," but all of them revolve around the relationship between the DW and the existing Transactional Systems on one side of the coin, and the decision-maker team on the other. A good DW should always bring clarity and bridge the communication barriers caused by the incongruent views of data held by different parts of the business. Hopefully this gets you started on the correct line of thinking as you embark on your DW projects.

Seven Tips to Successfully Start Your BI System

Posted by admin on July 31, 2012  /   Posted in BI and Custom Development, Business, Data Best Practices

Given a mountain of data that comes from different sources in different flavors, it can be a daunting prospect to get it all in line. Here are seven simple steps that can help you wade through the first phase of implementing a successful BI process, one that will not only solve the immediate problems but also grow with your company.

  1. Trace your steps backwards: begin with the reports that you must have in order to run your business. Do not start with the data itself. A lot of BI implementations make this mistake, which soon overwhelms everyone involved and contributes to a slow start.
  2. Identify two types of reports: those that are very important for strategic decision making (annual bank loan reports, ISO certifications, etc.) and those that are important for tactical everyday decisions (daily status report, daily manufacturing job schedule, warehouse inventory, customer returns, etc.)
  3. Split your attention between the two types of reports you identified above. Pick one report of each type, and work on both at the same time. By working on two different sets of data types simultaneously, you will gain more insight into how to transform the existing data to satisfy both types of reports. Failing to do this will cause you to backtrack a lot, and that means wasted effort.
  4. Make use of your Time Dimension table. This is the lookup table consisting of different facts about one measure of time (usually a day). In a lot of BI projects, people neglect to use this very useful table, opting instead to create complex SQL queries, which end up being error-prone, difficult to maintain, and a big time-waster (see the sketch after this list).
  5. Be mindful of your database's idiosyncrasies. Know what NULL values mean in a field, especially one that will be part of a calculation. Know what it means to say BETWEEN two dates (also illustrated below). And no, non-SQL databases are not immune to these quirks. They are just as susceptible to unfamiliar and untested usages.
  6. Prepare two identical databases that contain your BI tables. Set up two instances of your ETL, each building one of the databases separately. This way you can switch back and forth when you are testing or when you have to re-run the ETL in the middle of a busy day.
  7. You will not catch every bug, and there is no perfect BI system. Test, then re-check your tests; that's why you have two identical databases. BI is an ongoing process, so it is important to come up with a good system and an implementer who understands this.
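
To illustrate tips 4 and 5, here is a minimal sketch in SQL, assuming a hypothetical time_dim lookup table (one row per day, with pre-computed attributes) plus fact_sales and order_line tables:

-- Tip 4: let the Time Dimension do the date logic instead of ad-hoc SQL.
SELECT t.fiscal_quarter,
       SUM(f.amount) AS revenue
FROM fact_sales f
JOIN time_dim t ON t.date_key = f.date_key
WHERE t.fiscal_year = 2012
GROUP BY t.fiscal_quarter;

-- Tip 5a: BETWEEN is inclusive on both ends. Against timestamp data,
--   WHERE order_ts BETWEEN '2012-07-01' AND '2012-07-31'
-- includes midnight of July 31 but silently drops the rest of that day.
-- Tip 5b: NULL poisons arithmetic. price * quantity is NULL if either
-- side is NULL, so guard with COALESCE where that is not what you intend:
SELECT SUM(COALESCE(price, 0) * COALESCE(quantity, 0)) AS gross
FROM order_line;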

By following these simple steps and precautions, you will minimize a lot of backtracking in the middle of the project. More importantly, by starting the project the right way, you will avoid the burnout and frustration that would otherwise hurt the quality of the data at the end of the process.

Tackling Expiration Dates in Pentaho

Posted by admin on December 16, 2010  /   Posted in BI and Custom Development, Data Best Practices

The title may be a little misleading, because this article encompasses a wider scope than just expiration dates. But I deliberately chose that title because the expiration date is probably the most commonly found case of date-triggered state (or status) changes, which is what this article is about.

Almost all business entities have a state associated with them at any given time. Sometimes these states are explicitly recorded, via some kind of Status field (column) for instance; sometimes they are implicit. But all of them undergo state-to-state transitions governed by business rules.

This concept is extremely important in the world of BI, where transactional information is processed further to produce meaning, or, with even higher stakes, to drive decisions. In this article, we'll see one way to represent it.

Pre-requisite: This article requires some familiarity with MDX as it is being utilized in the Pentaho BI stack (Mondrian OLAP engine, to be exact).

The Schema

Every MDX query is run against a schema, so let's talk about some ground rules that are not always easy to find online:

  1. Refresh this string of terminology in your mind: [Dimension.Hierarchy].[Level].<member>.<member-property>. This is the terminology and hierarchy of a dimension in Mondrian-flavored MDX (which is the flavor we use in this article).
  2. A schema may have more than one Date/Time dimension. I didn't know this, and had to find out about it through necessity. It actually is quite a useful feature.
  3. A schema may have multiple dimensions that share one single database table. Sometimes you *have to have* these in order to display different columns of the table in the result using CrossJoin(). SIDE NOTE: I had an interesting conversation regarding this with Roland Bouman, who regards this as a peculiar limitation that is not imposed by any technical reason.
  4. Members of two Time dimensions can be used as a condition specifier in functions that accept one. For example: Filter([TimeDim1].[Level].Members, [TimeDim1].CurrentMember.Name > CurrentDateMember([TimeDim2], "<some-formatting>").Lag(30).Name)
    IMPORTANT: Notice that the comparison is made between the 'Name' property of the two Time dimension members. You can't seem to make the comparison at the member level directly.

The Problem At Hand

Let's say your Customers have different states that they transition from and to based on some business rules. For example, customers that haven't ordered any gadgets from your company for more than two years are given INACTIVE status, which, by the way, is a red flag for your sales force that it's time to go after them again.

Let's say you want to show this on your salespeople's Pentaho dashboard, so they can be reminded of this fact every day they log into your system.

One Possible Solution

One way to tackle this is to create an ETL script that attaches a date column to the Customer dimension table. Why the Customer dimension? Why not the fact table itself? Two reasons:

  1. The Customer dimension may be used by more than one fact table; if we can find out when a certain customer switches status using the dimension, we only have to do this once.
  2. The Customer dimension has the correct granularity. The status changes for a customer, not at the order level, nor at any other level. So the dimension is it.

After we attach the status-change date information to the Customer dimension, we can then use it in an MDX query, which makes sense whether we are talking about reports, cubes, or dashboards.
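
As a sketch of what that ETL step can compute (PostgreSQL-flavored SQL with hypothetical dim_customer and fact_orders tables; the date is stored as a string matching the YYYYMMDD format used by the MDX query below):

-- Attach the status-switch date to the Customer dimension: the day a
-- customer crossed the two-years-without-an-order line.
UPDATE dim_customer c
SET status = 'INACTIVE',
    status_switch_date = TO_CHAR(lo.last_order + INTERVAL '2 years', 'YYYYMMDD')
FROM (SELECT customer_id, MAX(order_date) AS last_order
      FROM fact_orders
      GROUP BY customer_id) AS lo
WHERE lo.customer_id = c.customer_id
  AND lo.last_order < CURRENT_DATE - INTERVAL '2 years';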

How do we use it? Consider the following query:

select NON EMPTY {[Measures].[Sales]} ON COLUMNS,
NON EMPTY Crossjoin(
    [Customer].[Name].Members,
    Filter(
        [Status.Switch].[Date].Members,
        [Status].CurrentMember.Name >
            CurrentDateMember([Date.Timeline], '["Date.Timeline"].["Date"].[yyyymmdd]').Lag(60).Name
    )
) ON ROWS
from [Sales]

Things to notice in the above MDX query:

  • The goal is to display customer names whose status has changed in the past 60 days.
  • The date when the customer status changed is recorded in the Customer dimension table, and defined in the schema as the [Status] dimension; the hierarchy name is 'Switch', and the level is [Date], which is encoded as a string in YYYYMMDD format. NOTE: I chose this format because it can be sorted ordinally. That's why it works for <, >, or = conditions.
  • The powerful Filter() is used to evaluate the condition.
  • I defined two dimensions, [Customer] and [Status], that are keyed against the same customer dimension table. This way I can use the [Status] dimension in the Filter() and use the [Customer] dimension to display the name of the customer.
  • It is very important to specify the correct Level in the first parameter to the Filter() function, otherwise you'll never get a match. Read this sentence again; it took me almost one full day to figure this out.
  • I use the equally powerful CurrentDateMember() to get the member of yet another dimension, [Date], that has the same format as the [Status] member. To get to know this very useful function, read this primer by Diethard Steiner.
  • Then I use the .Name property of the two "YYYYMMDD" members in the condition. This Member.Name facility is extremely powerful and flexible, but only after you understand what is going on.

The Result

Goal accomplished: now I have a list of customers whose status has changed in the past 60 days. A very useful list to show to a salesperson. Of course I still have to wire this query into a dashboard or report, but that's beyond the scope of this article.

And now you have the full know-how to tackle various reporting and query needs associated with date-sensitive state changes. Again, the expiration date is just one of the most common instances of this problem. In reality, there are tons of other scenarios that fit it.

How can I download these gazillion links?

Posted by admin on October 06, 2010  /   Posted in Data Best Practices, Linux

Don't you hate it when you find yourself needing to download hundreds of files by clicking on each HTML link, one at a time?

For example, some publishers have decided that it is cute to chop their big document/manual/manuscript/research paper/etc. into per-chapter PDF files. Not too cute for those of us who want to save them all, is it?

Or your ace graphic artist in the Philippines refuses to learn Zip and has instead given you access to a directory containing hundreds of images that you need for tomorrow's demo to the client.

I bet one of your first thoughts is to view the source of the web page and copy out all of the links, right? That would work, but it still involves hunting and pecking to fish out all the links amidst the hairy HTML/CSS/JS code in the page.

Sure, it's easy if you are a UNIX command-line hacker who can spit out the exact grep/awk/perl script to automate this in a heartbeat. But in this journal we want an easier approach, one that most of us sophisticated (read: lazy) geeks would want to use.

Well, here is *one* easy way to solve this problem:

Use Firefox (version 3.6 at the time of this writing) and install a nifty add-on called Link Gopher. After installation and a restart, down at the bottom right corner of the browser there will appear a small word, Links (next to the window resizer).

Go to the web page that lists all the links.

Right-click that Links word and select Extract All Links. Voila! All of the links now appear in a neat, clean web page. Simply copy and paste all the links that you want to download into a text file, and run this command in a terminal:

cat links_list.txt | xargs -n 1 -P 4 wget

This command pipes the content of links_list.txt (that is the name of the file you saved the links into, by the way) into xargs, which in turn fires off up to 4 parallel 'wget' processes (-P 4), each handed one of the links at a time (-n 1). NOTE: that -P parameter is pretty slick.

That's it! Now all you need to do is wait. The output of the wget processes downloading each link is also fun to watch.

When all the processes have finished, you'll be left with all the files you need, plus a text file containing the list of files in URL form. Now that's pretty easy, wouldn't you say?

Accessing Intranet Sites Remotely

Posted by admin on June 21, 2010  /   Posted in Data Best Practices, Linux

If you are like me, maintaining a server at home is like a hobby. There is a certain satisfaction in being able to install whatever we like without having to ask for anybody's permission.

And sometimes, thanks to our tinkering, we discover good solutions that are applicable to the tasks given to us at work. Think of it as giving our employers a bonus.

One of the most important rules in running a server is to never expose unnecessary information publicly. Want an example? How about an obvious one: your router's administration application. This should never be accessible from outside your home network, for obvious reasons.

But the benefit of this approach is also its own downside.  Consider the following (highly likely) case:

You need to urgently change a setting on your router, while you are not at home.

Yep, you can't. Not without doing something extra, anyway.

And that something extra is SSH tunneling. Now, there are at least two ways that I know of to accomplish this. For simplicity's sake, let's talk about one now:

If you are on a Windows machine, get yourself PuTTY and follow the steps on this website, replacing the forwarded port numbers with the ones you are trying to use.

http://www.cs.uu.nl/technical/services/ssh/putty/puttyfw.html

Basically, you are telling PuTTY to tunnel port number x on a machine within your home network to port number y on the machine you are working on.

So in the above example, to access your router's admin application, you can set up the tunnel from port 80 on your router's IP address to, let's say, port 8080 on the local machine (where you are working from).

In the UNIX world, that translates into the following:

ssh -L 8080:your_router_ip:80 your_username@your_home_server.net

After successfully logging into your home server remotely, you can start a browser (on the machine you're currently working on), then go to localhost:8080, and voila! You'll see your router's administration application as though you were at home.

Pretty handy, eh?

