6 Steps to creating your own data warehouse

Leke Seweje
4 min readApr 29, 2019

--

Essentially, a data warehouse is a large data pool containing data from various operational sources such as applications, functions, departments, sensors, etc. usually for the purpose of analysing this larger data set for analytics, studying patterns, digging information and top level decision making.

For organisations/departments that have administrative roles, a data warehouse is a very important tool as it helps to converge and organise data in a way that it is useful for monitoring and evaluation that leads to intelligent management decision making, proper and cost-effective allocation of resources, organizational direction, sales forecasts, growth benchmarking, etc.

So how can we develop such a useful tool?

  1. Understand the problem

The development team must first understand and define a clear problem statement that will guide what solution will be developed and how it will be developed. What data need to be made available, the organisation and transformations necessary to be done on data, etc. Only with a clear set of requirements like this can a good result can be achieved at the end of the day.

2. Identify the data sources/marts

A data warehouse typically pulls data from various sources (a.k.a. Data Marts). You must identify all the necessary sources of data that will contribute to provide the data you need to achieve your goals and pick the necessary data points/elements from them. Data sources can be of any type — other databases (SQL/NoSQL), applications, social media, surveys, sensors/IoT, Excel/CSV files, operational forms, etc. If the data is needed, it should be fed into the warehouse.

3. Create the data model

Since all the data sources and data elements are now defined, it is time to create a central database for all the elements to form our warehouse. A database model illustrates all the entities and/or objects that will go into the data warehouse and their properties. A good data modelling tool will also help to engineer the model into a database schema in your RDBMS of choice.

You create the diagram of the entities/objects and the relationships between them in the modeller and export to your database to set things up. Choose a tool that can easily integrate or generate the schema SQL for the RDBMS that you will be using.

For example, Oracle SQL Developer Modeller can directly create schemas in your Oracle database, MySQL Workbench can do same for MySQL. A more general purpose modeller is Erwin which integrates with almost all popular databases.

4. Extract, Transform, Load (ETL)

So now we have identified the data sources and data elements on the one hand and the warehouse database on the other. How then do we get the data into the database for analysis. The process of doing this is called Extract-Transform-Load (ETL). We extract the data from the sources and load into the warehouse database.

Extraction is more easily done with APIs and data files that can be imported into an ETL tool such as Talend. You connect/integrate data elements to pull automatically from all sources at intervals and directly feed this into the database. Hence, the ETL tool connects the data sources and the database and loads the data from the sources into the database.

5. Documentation

This is more operational than technical. A successful data warehouse has to be a functional part of an organization’s operations and evolve as the business and data sources evolve too. Hence a good documentation of how things were set up, policies and conventions for further development is essential to ensure continuity and easy maintenance.

6. Business Intelligence(BI)/Analytics

BI is the primary derivative of a data warehouse. At this stage, all the data you need to analyse are in the database and now you need to create charts, grids, tables, etc. to visualize the data for making informed decisions. The visualisations required are usually defined as a part of your problem statement in Step 1. However, a number of tools are worthy of mention to help with this task depending on your environment, configuration and budget e.g. SAP BI, Oracle BI, Pentaho, PowerBI, Tableau, etc. You could also develop a custom one if you so prefer.

Conclusion

The above steps give much simplified details of each stage in creating a data warehouse but understanding these steps and tools necessary at each stage will start you well up in the direction of developing a reliable data warehouse that can help with strategic and reliable decision making in your organization. One wrong decision can make a company lose millions of dollars so having a data warehouse that helps with decision making is a really good idea.

For more information, you can contact me at sewejeolaleke[at]gmail.com.

Thanks for reading.

--

--