Wednesday, July 18, 2012

What Do You Prefer - Inmon or Kimball?

I was asked that question during an interview for a position on a client project using Microsoft BI tools.  The client interviewer asked “Which do you prefer -Inmon or Kimball?”  

The question was easy for me to answer.  I could talk about how the Kimball data modeling technique called for using fact and dimension tables, star schema, conformed dimension to link different fact tables, relied on data marts, and was considered a bottom up approach. On the other hand, the Inmon data modeling technique called for a more top down approach using enterprise wide data warehouse using normalized tables. 

The client interviewer pressed on.  “Which would you choose?”   

I thought to myself.  I never really considered one over the other.  They both seemed like acceptable approaches to accomplish the same thing. 

I was first exposed to dimensional modeling and the Kimball technique in graduate school and have used it on a number of projects since then.  Building OLTP cubes in IBM Cognos Transformer and Microsoft SQL Server Analysis Services both seemed to favor using the Kimball technique.  On a few projects I used the Inmon technique.  These were typically projects I inherited or were large, very complex data models.            

I started off saying “It will depend on the project, the client......” and was abruptly cut off.  

“Which one do you prefer?” asked the client interviewer in a challenging tone.

I had to pick one. I said something like “I used both techniques in the past, but would favor the Kimball method”.

The client interviewer responded “Good!” and moved on to the next questions. 

Today in 2012 vendors have more products that use column oriented database technology that are within the price point of most companies.  Microsoft has implemented the column store index in SQL Server 2012 and uses column store technology as the engine behind Power Pivot for Excel and SharePoint 2010, and SQL Server Analysis Services in tabular mode.  Oracle has Endeca which supports ultra quick searches of structured and unstructured data.  Endeca uses a column oriented technology for its MDEX Engine.   

With column store technology so available in today’s products and CPU speed and memory size for servers continues to grow at mind numbing rates, it seems we are approaching the point that CPU and memory is not a primary constraint for many projects that have a dataset size of less than 1 terabyte. 

I began to think that I may not have the same response to the client interviewer if I was asked today .  The Inmon technique seems to have merit when considering products that use column store technology.    The Kimball star schema may not be entirely necessary with column store technology.

We also may be looking at a new approach entirely.   Perhaps a simplified variation of the Inmon technique will evolve over time.  I can see merit in modeling a few related entities that contain denormalized (repeated attributes) tables.  In data sets under 1 terabyte this approach may be something to consider as it may be faster to develop and deliver a BI solution initially. It may also help a company be more agility by supporting quicker changes and enhancement requests.

Tuesday, July 17, 2012

What is Predictive Analytics?

I wanted to pass on four articles by Alex Guazzelli I found on the IBM - DeveloperWorks site that gave a nice overview of predictive analytics that I found interesting and easy to read.

Part 1: What is predictive analytics?

Part 2: Predictive modeling techniques

Part 3: Create a predictive solution

Part 4: Put a predictive solution to work