When it comes to Data Modelling you cannot go past the contributions of Ralph Kimball and Bill Inmon. There are numerous articles/white papers that go to great detail in dissecting their modelling techniques. There are serious fanatics on both camps. Rather than getting caught up in the debate, this post gives a snippet of both Kimball and Inmon’s model and let you choose which model is right for you next BI project.
The basis of this post is the illustration shown in Figure 1, pay special attention to the definition of Data Warehouse and Datamarts in both these models.
- Kimball’s model follows a bottom-up approach. The Data Warehouse (DW) is provisioned from Datamarts (DM) as and when they are available or required.
- The Datamarts are sourced from OLTP systems are usually relational databases in Third normal form (3NF).
- The Data Warehouse which is central to the model is a de-normalized star schema. The OLAP cubes are built on this DW.
- Inmon’s model follows a top-down approach. The Data Warehouse (DW) is sourced from OLTP systems and is the central repository of data.
- The Data Warehouse in Inmon’s model is in Third Normal Form (3NF).
- The Datamarts (DM) are provisioned out of the Data Warehouse as and when required. Datamarts in Inmon’s model are in 3NF from which the OLAP cubes are built.
Strengths and Weakness
Both these models have their own strengths and weakness. Kimball’s model is more scalable because of the bottom-up approach and hence you can start small and scale-up eventually. The ROI is usually faster with Kimball’s model. Because of this approach it is difficult to created re-usable structures/ ETL for different datamarts. On the other hand Inmon’s model is more structured and easier to maintain while it is rigid and takes more time to build. The significant advantage of Inmon’s model is because the DW is in 3NF; it is easier to build data mining models.
Both Kimball and Inmon models agree and emphasise that DW is the central repository of data and OLAP cubes are built of de-normalized star schemas.
In conclusion, when it comes to data modelling, it is irrelevant which camp you belong to as long as you understand why you are adopting a specific model. Sometimes it makes sense to take a hybrid approach.