Data Warehouse Training — Episode 1 — What is Data Warehouse

Published in Data Science Earth
6 min read · Mar 18, 2021

What is a Data Warehouse?

Data warehousing is a technology that aggregates structured data from one or more sources so that it can be compared and analyzed for business intelligence. A data warehouse stores current and historical data in a single place, where it is used to create analytical reports for workers throughout the enterprise. Data warehousing is also the process of transforming data into information and making it available to users in a timely manner, so that it can make a difference.

You may know that a 3NF-designed database for an inventory system may have many tables related to each other. For example, a report on current inventory information can include more than 12 join conditions. This can quickly slow down the response time of the query and the report. A data warehouse provides a different design that reduces response time and improves the performance of queries for reports and analytics.
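To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table and column names (product, category, location, stock, inventory_report) are hypothetical and only illustrate the idea: the normalized tables need several joins for every report, while the warehouse-style table pays for the joins once, at load time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized, 3NF-style operational tables (hypothetical names).
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE stock    (product_id INTEGER, location_id INTEGER, quantity INTEGER);

    INSERT INTO category VALUES (1, 'Tools'), (2, 'Paint');
    INSERT INTO product  VALUES (10, 'Hammer', 1), (11, 'Brush', 2);
    INSERT INTO location VALUES (100, 'Ankara'), (101, 'Izmir');
    INSERT INTO stock    VALUES (10, 100, 5), (10, 101, 7), (11, 100, 3);

    -- Warehouse-style table: the joins run once, when the data is loaded.
    CREATE TABLE inventory_report AS
    SELECT c.category_name, p.name AS product_name, l.city, s.quantity
    FROM stock s
    JOIN product  p ON p.product_id  = s.product_id
    JOIN category c ON c.category_id = p.category_id
    JOIN location l ON l.location_id = s.location_id;
""")

# Report against the normalized tables: every run repeats the joins.
normalized_sql = """
    SELECT c.category_name, SUM(s.quantity)
    FROM stock s
    JOIN product  p ON p.product_id  = s.product_id
    JOIN category c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
"""

# Same report against the warehouse-style table: no joins at query time.
warehouse_sql = """
    SELECT category_name, SUM(quantity)
    FROM inventory_report
    GROUP BY category_name
    ORDER BY category_name
"""

normalized_rows = conn.execute(normalized_sql).fetchall()
warehouse_rows = conn.execute(warehouse_sql).fetchall()
print(normalized_rows == warehouse_rows)  # both queries give the same answer
print(warehouse_rows)
```

With only a handful of rows the speed difference is invisible; the point of the sketch is the shape of the two queries, not a benchmark.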

How Does a Data Warehouse Work?

A data warehouse may contain multiple databases. Within each database, data is organized into tables and columns. For each column, you can define a description of the data, such as integer, data field, or string. Tables can be organized inside schemas, which you can think of as folders. When data is ingested, it is stored in the various tables described by the schema. Query tools use the schema to determine which data tables to access and analyze.
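As a small illustration of tables, columns and column descriptions, the sketch below uses Python's built-in sqlite3 module. The sales table and the describe_table helper are hypothetical; SQLite is used only because it ships with Python, and a real warehouse engine will have its own type names and catalog views. Note that SQLite has no dedicated date type, so the date is declared as TEXT here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical table: each column gets a name and a declared type.
conn.execute("""
    CREATE TABLE sales (
        sale_id   INTEGER PRIMARY KEY,
        sale_date TEXT,
        customer  TEXT,
        amount    REAL
    )
""")

def describe_table(connection, table_name):
    """Return {column_name: declared_type} read from the table's schema.

    table_name is interpolated directly, so pass trusted names only.
    """
    rows = connection.execute(f"PRAGMA table_info({table_name})").fetchall()
    # Each row is (cid, name, type, notnull, default_value, pk).
    return {row[1]: row[2] for row in rows}

sales_columns = describe_table(conn, "sales")
print(sales_columns)
```

A query tool does something similar behind the scenes: it reads the schema first, then decides which tables and columns to access.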

Data may be:

  1. Structured
  2. Semi-structured
  3. Unstructured

Data warehousing makes data mining possible. Data mining is looking for patterns in the data that may lead to higher sales and profits.
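A very simple form of pattern hunting is counting which items are bought together. The sketch below is only an illustration with made-up baskets; the frequent_pairs function and the min_count threshold are hypothetical names, and real data mining tools use far more sophisticated techniques.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets, as they might be read from a warehouse table.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "butter", "coffee"},
]

def frequent_pairs(baskets, min_count):
    """Count how often each pair of items appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a stable (a, b) order before counting.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}

pairs = frequent_pairs(baskets, min_count=2)
print(pairs)
```

Here only bread and butter appear together often enough to pass the threshold, which is the kind of pattern a retailer might use for a promotion.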

Who Needs a Data Warehouse?

A data warehouse is useful for many types of users, such as:

  • Decision makers who rely on massive amounts of data.
  • Users who use customized, complex processes to obtain information from multiple data sources.
  • People who want a simple technology to access the data.
  • People who want a systematic approach to making decisions.
  • Users who want fast performance on the huge amounts of data needed for reports, grids or charts.
  • Anyone who wants to discover ‘hidden patterns’ of data flows and groupings; a data warehouse is the first step.

What Is a Data Warehouse Used For?

Here are the most common sectors where data warehouses are used:

Airline:

In the airline sector, it is used for operational purposes such as crew assignment, analysis of route profitability, frequent flyer program promotions, etc.

Banking:

It is widely used in the banking sector to effectively manage the resources available on desk. A few banks also use it for market research and for performance analysis of products and operations.

Healthcare:

The healthcare sector also uses data warehouses to strategize and predict outcomes, generate patients’ treatment reports, share data with tie-in insurance companies, medical aid services, etc.

Public sector:

In the public sector, data warehouses are used for intelligence gathering. They help government agencies to maintain and analyze tax records and health policy records for every individual.

Investment and Insurance sector:

In this sector, data warehouses are primarily used to analyze data patterns and customer trends and to track market movements.

Retail chain:

In retail chains, data warehouses are widely used for distribution and marketing. They also help to track items, customer buying patterns and promotions, and are used for determining pricing policy.

Telecommunication:

In this sector, a data warehouse is used for product promotions, sales decisions and distribution decisions.

Hospitality Industry:

This industry uses warehouse services to design and estimate its advertising and promotion campaigns, which target clients based on their feedback and travel patterns.

Steps to Implement a Data Warehouse

The best way to address the business risk associated with a data warehouse implementation is to employ a three-pronged strategy, as below:

  1. Enterprise strategy: Here we identify the technical aspects, including the current architecture and tools. We also identify facts, dimensions, and attributes. Data mapping and transformation are also addressed.
  2. Phased delivery: Data warehouse implementation should be phased based on subject areas. Related business entities like booking and billing should first be implemented and then integrated with each other.
  3. Iterative prototyping: Rather than a big bang approach to implementation, the data warehouse should be developed and tested iteratively.
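The terms facts, dimensions and attributes from the first step can be sketched with a tiny star schema. Everything below is a hypothetical example built with Python's sqlite3 module: fact_booking holds the measurable facts (nights, amount), while dim_customer and dim_date are dimensions whose columns (segment, month, and so on) are the attributes used to slice the facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: descriptive context, one row per customer or date.
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY, name TEXT, segment TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER
    );

    -- Fact table: numeric measures plus keys pointing at the dimensions.
    CREATE TABLE fact_booking (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        nights       INTEGER,
        amount       REAL
    );

    INSERT INTO dim_customer VALUES (1, 'Ayse', 'Business'), (2, 'Mehmet', 'Leisure');
    INSERT INTO dim_date VALUES (20210301, '2021-03-01', 2021, 3),
                                (20210402, '2021-04-02', 2021, 4);
    INSERT INTO fact_booking VALUES (1, 20210301, 2, 300.0),
                                    (2, 20210301, 5, 450.0),
                                    (1, 20210402, 1, 150.0);
""")

# Slice the booking amounts by two attributes: customer segment and month.
revenue_by_segment = conn.execute("""
    SELECT c.segment, d.month, SUM(f.amount)
    FROM fact_booking f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date     d ON d.date_key     = f.date_key
    GROUP BY c.segment, d.month
    ORDER BY c.segment, d.month
""").fetchall()
print(revenue_by_segment)
```

In a phased delivery, a second subject area such as billing could get its own fact table and reuse the same dimensions, which is one way the subject areas end up integrated with each other.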

Here are the key steps in data warehouse implementation, along with their deliverables.

Best practices to implement a Data Warehouse

  • Decide on a plan to test the consistency, accuracy, and integrity of the data.
  • The data warehouse must be well integrated, well defined and time stamped.
  • While designing the data warehouse, make sure you use the right tool, stick to the life cycle, take care of data conflicts and be ready to learn from your mistakes.
  • Never replace operational systems and reports.
  • Don’t spend too much time on extracting, cleaning and loading data.
  • Make sure to involve all stakeholders, including business personnel, in the data warehouse implementation process. Establish that data warehousing is a joint team project. You don’t want to create a data warehouse that is not useful to the end users.
  • Prepare a training plan for the end users.
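A test plan for consistency, accuracy and integrity can start with a few automated checks on each batch before it is loaded. The sketch below is only one possible shape for such checks; the check_rows function, the field names (order_id, customer_id, amount, loaded_at) and the sample rows are all hypothetical.

```python
from datetime import datetime

def check_rows(rows, known_customer_ids):
    """Return a list of (order_id, problem) pairs found in a batch of rows."""
    problems = []
    seen_ids = set()
    for row in rows:
        order_id = row["order_id"]
        # Consistency: the same order must not be loaded twice.
        if order_id in seen_ids:
            problems.append((order_id, "duplicate order_id"))
        seen_ids.add(order_id)
        # Integrity: every row must point at a customer that exists.
        if row["customer_id"] not in known_customer_ids:
            problems.append((order_id, "unknown customer_id"))
        # Accuracy: amounts must be present and non-negative.
        if row["amount"] is None or row["amount"] < 0:
            problems.append((order_id, "invalid amount"))
        # Time stamping: every row needs a parseable load timestamp.
        try:
            datetime.fromisoformat(row["loaded_at"])
        except (TypeError, ValueError):
            problems.append((order_id, "missing or malformed loaded_at"))
    return problems

sample_rows = [
    {"order_id": 1, "customer_id": 7, "amount": 20.0, "loaded_at": "2021-03-18T10:00:00"},
    {"order_id": 2, "customer_id": 99, "amount": 15.0, "loaded_at": "2021-03-18T10:00:00"},
    {"order_id": 2, "customer_id": 7, "amount": 12.0, "loaded_at": "2021-03-18T10:05:00"},
    {"order_id": 3, "customer_id": 8, "amount": -5.0, "loaded_at": None},
]

problems = check_rows(sample_rows, known_customer_ids={7, 8})
for order_id, problem in problems:
    print(order_id, problem)
```

Checks like these are cheap to run on every load, and their output gives business users something concrete to review.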

Why Do We Need a Data Warehouse? Advantages & Disadvantages

Advantages of Data Warehouse:

  • A data warehouse allows business users to quickly access critical data from several sources, all in one place.
  • A data warehouse provides consistent information on various cross-functional activities. It also supports ad-hoc reporting and queries.
  • A data warehouse helps to integrate many sources of data, which reduces stress on the production system.
  • A data warehouse helps to reduce total turnaround time for analysis and reporting.
  • Restructuring and integration make the data easier to use for reporting and analysis.
  • Because critical data from a number of sources is accessible in a single place, users save the time they would spend retrieving data from multiple sources.
  • A data warehouse stores a large amount of historical data. This helps users to analyze different time periods and trends to make future predictions.
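The last advantage, analyzing different time periods, can be sketched in a few lines of Python. The sales_history records and the two helper functions are made up for illustration; in practice the rows would come from a query against the warehouse.

```python
from collections import defaultdict

# Made-up historical records: (ISO date, sales amount).
sales_history = [
    ("2021-01-05", 100.0),
    ("2021-01-20", 50.0),
    ("2021-02-11", 200.0),
    ("2021-03-02", 120.0),
    ("2021-03-15", 30.0),
]

def monthly_totals(records):
    """Sum amounts per 'YYYY-MM' month, returned in chronological order."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[day[:7]] += amount  # the first 7 characters are 'YYYY-MM'
    return dict(sorted(totals.items()))

def month_over_month_change(totals):
    """Difference between each month's total and the previous month's."""
    months = list(totals)
    return {cur: totals[cur] - totals[prev] for prev, cur in zip(months, months[1:])}

totals = monthly_totals(sales_history)
changes = month_over_month_change(totals)
print(totals)
print(changes)
```

A simple month-over-month difference like this is often the first step before any real forecasting is attempted.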

Disadvantages of Data Warehouse:

  • Not an ideal option for unstructured data.
  • Creating and implementing a data warehouse is a time-consuming affair.
  • A data warehouse can become outdated relatively quickly.
  • It is difficult to make changes to data types and ranges, data source schemas, indexes, and queries.
  • A data warehouse may seem easy, but it is often too complex for average users.
  • Despite best efforts at project management, the scope of a data warehousing project tends to increase.
  • Sometimes warehouse users develop different business rules.
  • Organisations need to spend a lot of their resources on training and implementation.

The Future of Data Warehousing

  • Changes in regulatory constraints may limit the ability to combine disparate sources of data. These disparate sources may include unstructured data, which is difficult to store.
  • As databases grow, the estimate of what constitutes a very large database continues to grow. It is complex to build and run data warehouse systems that are always increasing in size. The hardware and software resources available today do not allow a large amount of data to be kept online.
  • Multimedia data cannot be manipulated as easily as text data, whereas textual information can be retrieved by the relational software available today. This could be a research subject.

Data Warehouse Tools

There are many data warehousing tools available in the market. Here are some of the most prominent ones:

1. MarkLogic:

MarkLogic is a useful data warehousing solution that makes data integration easier and faster using an array of enterprise features. This tool helps to perform very complex search operations. It can query different types of data like documents, relationships, and metadata.

https://developer.marklogic.com/products/

2. Oracle:

Oracle is a widely used database. It offers a wide range of data warehouse solutions, both on-premises and in the cloud. It helps to optimize customer experiences by increasing operational efficiency.

https://www.oracle.com/index.html

3. Amazon Redshift:

Amazon Redshift is a data warehouse tool. It is a simple and cost-effective tool for analyzing all types of data using standard SQL and existing BI tools. It also allows running complex queries against petabytes of structured data, using query optimization.

https://aws.amazon.com/redshift/?nc2=h_m1

Here is a complete list of useful Datawarehouse Tools.

Alperen Kezay
