Is your data running hot and cold?
Perhaps it should be.
A warehouse is a place to store things. Yes, but not forever. A business wants to store goods for an appropriate amount of time and then distribute them for sale. A business that stores goods in its warehouse, never uses them and continually adds more will very obviously go out of business quickly.
What is an appropriate amount of time? That depends on what you are warehousing and how it is warehoused.
Blue cheese, for example, has a short shelf life and needs to be stored cold. So it needs to be stored in a climate-controlled environment for as short a time as possible.
Bricks don’t care much about the environment and don’t go off, so they can be warehoused outside.
Gold bars don’t care much about the environment and don’t go off either, but they are not going to be stored in the same place as bricks.
Data storage also needs different approaches
There are different types of data.
- Some needs to be stored carefully and accessed quickly, like cheese
- Some can be stored anywhere and will get used eventually with manageable lead times, like bricks
- Some data we may be keeping for a rainy day but it is valuable (or sensitive) so needs to be highly secure, like gold
What we see
Way too often we see a data warehouse system gradually evolve into a data archiving system.
Generally this happens by stealth, but sometimes there is a genuine need to keep old (cold) data and the data warehouse becomes the default solution.
We also see that a data warehouse can become like the junk drawer at home. ‘Stuff’ that may or may not ever be needed is being kept there just in case. To continue the analogy, if something is eventually needed from the junk drawer, it is easy to forget that it was ever put there, so you search for it in other places or source it again. A junk drawer can become so full that it overflows, the other drawers do not open, or the floor of the drawer cannot handle the volume or weight and it breaks. Keeping ‘just in case’ data in a data warehouse can have similar effects.
This can lead to:
- Higher resourcing to manage the data warehouse
- Degraded data warehouse performance, causing user frustration and perhaps even perpetuating data silos
- Higher infrastructure costs
- Unnecessary complexity within the data warehouse structures
How would you know if your data warehouse may need some attention?
Some telltale signs to look out for:
- You hear that it takes a long time for regular reports to run
- You are being asked for budget to upgrade infrastructure too regularly
- Changes to the data warehouse structures are long and expensive projects
What could you do about it?
The DWS Analytics Practice suggests taking a hot and cold data approach.
Examples of Hot Data
- New data, straight out of the source systems
- Regularly used data
- Reference data such as products, students, customers, suppliers etc.
Examples of Cold Data
- Point in time data that was important last week/month/quarter/year
- Detailed data, where aggregates are still needed for reference but the detail is not
- Historical regulatory data
Create a set of rules to categorise hot and cold data. Move cold data off to cheaper storage. Free up resources on the data warehouse(s). Make relevant data easier and quicker to find. Focus on the data that matters.
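As a rough illustration of what such a set of rules could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Dataset` fields, the `classify` function and the 90-day window are hypothetical, and your own rules would come out of the conversation with your business users.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    name: str
    last_accessed: date   # when a report or query last touched this data
    is_reference: bool    # e.g. products, students, customers, suppliers


def classify(ds: Dataset, today: date, hot_window_days: int = 90) -> str:
    """Return "hot" or "cold" for one dataset.

    Hypothetical rules: reference data is always hot, and anything
    accessed within the last `hot_window_days` days is hot.
    Everything else is a candidate for cheaper storage.
    """
    if ds.is_reference:
        return "hot"
    if today - ds.last_accessed <= timedelta(days=hot_window_days):
        return "hot"
    return "cold"


if __name__ == "__main__":
    today = date(2024, 6, 30)
    catalogue = [
        Dataset("products", date(2020, 1, 1), is_reference=True),
        Dataset("sales_current", date(2024, 6, 1), is_reference=False),
        Dataset("sales_2019_detail", date(2021, 3, 15), is_reference=False),
    ]
    for ds in catalogue:
        print(ds.name, classify(ds, today))
```

Note that a rule like this only works if you can tell when data was last accessed. It also says nothing about sensitivity: cold data that is valuable or regulated (the gold bars) may still need a secure archive rather than just the cheapest one.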
A few questions you could ask your team.
- What is our data archiving strategy?
- Do we monitor or have a method of knowing when data was last accessed?
- How much does it cost to store the data in our data warehouse? How much is that rising every year?
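The cost question in particular can be answered with some simple arithmetic. The sketch below is only an illustration: the function names are hypothetical and all of the figures are made-up placeholders, so substitute your own storage volumes and prices.

```python
def annual_cost(size_tb: float, price_per_tb_month: float) -> float:
    """Yearly storage cost for a volume of data at a flat monthly price."""
    return size_tb * price_per_tb_month * 12


def tiered_annual_cost(size_tb: float, cold_fraction: float,
                       hot_price: float, cold_price: float) -> float:
    """Yearly cost if the cold share of the data moves to a cheaper tier."""
    cold_tb = size_tb * cold_fraction
    hot_tb = size_tb - cold_tb
    return annual_cost(hot_tb, hot_price) + annual_cost(cold_tb, cold_price)


def yearly_growth(previous_cost: float, current_cost: float) -> float:
    """Year-on-year cost growth as a fraction (0.2 means 20%)."""
    return (current_cost - previous_cost) / previous_cost


if __name__ == "__main__":
    # Placeholder figures only, not real prices.
    size_tb = 40.0
    hot_price = 100.0    # per TB per month on the warehouse
    cold_price = 10.0    # per TB per month on archive storage

    everything_hot = annual_cost(size_tb, hot_price)
    with_archive = tiered_annual_cost(size_tb, 0.75, hot_price, cold_price)
    print("All data in the warehouse:", everything_hot)
    print("With cold data archived:", with_archive)
    print("Potential saving:", everything_hot - with_archive)
```

Running the same calculation for the last few years shows how quickly the cost is rising, which is a useful number to have before the next infrastructure budget request.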
We worked with a large utilities company to help define hot and cold for their organisation. Cold data was then moved from expensive in-memory storage to much cheaper disk storage. The organisation not only realised significant savings on infrastructure costs but also improved performance and reduced the complexity of gaining insights from the hot data.
A large financial institution had many databases running on old, unsupported database software. These were costing too much to manage and support, and the upgrade cost was prohibitive. Being highly regulated, they needed to understand what data was being used and then find a way to archive the data in an auditable, reversible way. A visual interface was built where authorised users could identify cold data to move to a data archive system. Any data moved was logged and the entire process was integrated with the organisation’s formal change management processes. The interface could also restore data and any relationships if ever needed. The result was infrastructure savings, with the ability to restore any data if required for business or regulatory reasons.