The internet and large-scale technological development have caused the explosive growth of data in today's world. Enterprise decision-makers want to study the relationship between data, tap the hidden features of data and analyze and explore the deeper levels of data.
But data sharing is impossible between different databases of enterprise, due to multiple databases in the same enterprise, the integration between databases has big challenges especially in terms of merging and storage of huge data
The operational databases may be scattered around Microsoft SQL Server database or Oracle database, the data warehouse goal is to extract multiple databases and accumulate the comprehension from hundreds of gigabytes of data for processing and according to the required format, process into necessary conversion, cleaning, and finally loaded the data into the warehouse.
According to IBM researchers (Barry Devlin and Paul Murphy) “Data warehouse is a subject-oriented, integrated, relatively stable data collection that reflects historical changes, used to support decision-making in management”
- By nature, a data warehouse is used to complement the decision-making for management, and it is used as analytics of business data operation, but it is different from the operational database of the enterprise.
- A data warehouse is an effective integration and management of multiple heterogeneous data sources into one repository, it is organized in terms of historical data, and in a data warehouse, there is no need for transactional modification of the data.
After the emergence of data warehouses, the information needs of enterprise has moved away from relational databases to a decision support system. This decision support system is actually what we refer to as Business Intelligence (BI).
As compare with data warehouse data mart can be understood as a “small data warehouse”, it is not depending on heterogeneous databases but only on a single instance of operational database and the scope of data is not wide enough.
The data mart is specifically targeted at a specific business operation(sales, production) users of the data mart quickly find the data they need, in data mart you only need to design and build database tables, populate the database tables with relevant data, and decide who can access the data set.
Data marts can be divided into two types:
- The first one is independent data mart, the ETL architecture, and source of the database belong to a single entity.
- The second one is dependent data mart, in this type of data mart the incoming data arrived from other sources mainly from the data warehouse. Datamart can simply provide users with a subset of information of the data warehouse.
It is much like the flowing water in a natural state, data flows from the multiple source system to this lake, users can obtain specific data they want, verify, mangling and other BI task will be carried out outside the data lake.
The data lake can develop in a way that the following features will be implemented
- It will import all data from the source systems, there is no data loss from source systems.
- The data is stored in its original state without conversion of original data.
- Data lake schema accurately meets data analysis requirements.
- A data lake has locks, control, and governance
Operational Data Storage
Operational Data Storage (ODS) is a database that is used for transactional processing data, the data in ODS is mainly the raw data, the data from ODS always move out to the data warehouse or data mart for further processing. In ODS, you can query data, and only have access to the latest development in business operation