Current Analytical Architecture

Priyanka Bhagat
5 min read · Jul 17, 2020

Analytics architecture refers to the systems, protocols, and technology used to collect, store, and analyze data. The concept is an umbrella term for a variety of technical layers that allow organizations to more effectively collect, organize, and parse the multiple data streams they utilize.

When building analytics architecture, organizations need to consider both the hardware (how data will be physically stored) and the software that will be used to manage and process it.

Analytics architecture also spans multiple layers, starting with data warehouse architecture, which defines how users in an organization can access and interact with data. Storage is a key aspect of creating a reliable analytics process, as it establishes how your data is organized, who can access it, and how quickly it can be referenced.

Structures like data marts, data lakes, and more standard warehouses are all popular foundations for modern analytics architecture. On the user side, creating easier processes for access means including tools like natural language processing and ad-hoc analytics capabilities to reduce the need for specialized workers and wasted resources. When seen as a whole, analytics architecture is a key aspect of business intelligence.

How Can I Use Analytics Architecture?

No matter what kind of organization you have, data analytics is becoming a central part of business operations. The fast-rising amount of data your many touchpoints collect means that managing it in a simple spreadsheet is quickly becoming infeasible.

Analytics architecture helps you not just store your data but plan the optimal flow for data from capture to analysis. Understanding these steps can give you a better idea of your hardware and logistics needs and clue you in on the best tools to use.

One important use for analytics architecture in your organization is the design and construction of your preferred data storage and access mechanism. Many companies prefer a more structured approach, using traditional data warehouses or data mart models to keep data more organized and easily sorted for access later.

Others prefer to keep data in a single storage structure such as a data lake, which comes with its own benefits but makes data slightly less accessible and organized. Regardless, your analytics platform architecture will largely define how your organization interacts with data, as well as how you gain insights from it.
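To make the contrast concrete, here is a minimal Python sketch of the two styles; the database file, table, directory layout, and sample records are all made up for illustration. It writes the same order records once into a warehouse-style typed table (schema-on-write) and once as raw files in a lake-style directory (schema-on-read).

```
import json
import sqlite3
from pathlib import Path

# Illustrative records; real pipelines would ingest these from source systems.
events = [
    {"order_id": 1, "customer": "acme", "amount": 120.50, "ts": "2020-07-01"},
    {"order_id": 2, "customer": "globex", "amount": 75.00, "ts": "2020-07-02"},
]

# Warehouse style: schema-on-write -- records must fit a typed table up front.
wh = sqlite3.connect("warehouse.db")
wh.execute(
    """CREATE TABLE IF NOT EXISTS orders (
           order_id INTEGER PRIMARY KEY,
           customer TEXT NOT NULL,
           amount   REAL NOT NULL,
           ts       TEXT NOT NULL)"""
)
wh.executemany(
    "INSERT OR REPLACE INTO orders VALUES (:order_id, :customer, :amount, :ts)",
    events,
)
wh.commit()

# Lake style: schema-on-read -- raw records land as files and are interpreted
# only when someone reads them later.
lake = Path("data_lake/orders/2020-07")
lake.mkdir(parents=True, exist_ok=True)
for event in events:
    (lake / f"order_{event['order_id']}.json").write_text(json.dumps(event))
```

The warehouse path forces organization up front; the lake path defers it, which is exactly the accessibility trade-off described above.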

Data Science projects need a workspace that is purpose-built for experimenting with data, with flexible and agile data architectures.

Many organizations still have data warehouses, which provide excellent support for traditional reporting and simple data analysis activities, but problems arise when more robust analysis is needed.

Fig 1 illustrates a typical data architecture and the various challenges it presents to data scientists and other users who are trying to implement advanced analytics. This section examines the data flow to the Data Scientist and how this individual fits into the process of getting data to analyze on projects.

Fig 1. Typical analytic architecture

For data sources to be loaded into the data warehouse, the data needs to be well understood, normalized with suitable data type definitions, and in a structured format. Although this kind of centralization enables security, backup, and failover of highly critical data, it also means that data typically must go through significant preprocessing and checkpoints before it can enter this sort of controlled environment, which does not lend itself to data exploration and iterative analytics.
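As a rough illustration of that preprocessing step, the sketch below (pandas-based; the column names and validation rules are invented for the example) coerces a raw extract into the types a warehouse schema would expect and rejects rows that fail the checks rather than loading them.

```
import pandas as pd

# Raw extract from a hypothetical source system; in practice this would be
# read from a staging file or table.
raw = pd.DataFrame({
    "customer_id": ["1001", "1002", "bad-id", "1004"],
    "signup_date": ["2020-06-01", "not a date", "2020-06-05", "2020-06-10"],
    "country": ["us", "US", "gb", "GB"],
})

# Enforce the data type definitions the warehouse schema expects.
clean = raw.copy()
clean["customer_id"] = pd.to_numeric(clean["customer_id"], errors="coerce")
clean["signup_date"] = pd.to_datetime(clean["signup_date"], errors="coerce")
clean["country"] = clean["country"].str.upper()

# Checkpoint: reject rows that fail validation instead of loading them.
rejected = clean[clean["customer_id"].isna() | clean["signup_date"].isna()]
clean = clean.drop(rejected.index)

print(f"{len(clean)} rows ready to load, {len(rejected)} rows rejected")
```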

As a result of this level of control on the EDW, additional local systems may emerge in the form of departmental warehouses and local data marts that business users create to accommodate their need for flexible analysis. These local data marts may not have the same constraints for security and structure as the main EDW, and they allow users to do some level of more in-depth analysis. However, these one-off systems reside in isolation, often are not synchronized or integrated with other data stores, and may not be backed up.

Once in the data warehouse, data is read by additional applications across the enterprise for BI and reporting purposes. These are high-priority operational processes getting critical data feeds from the data warehouses and repositories.
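For a sense of what those reporting feeds look like downstream, here is a small pandas sketch (the fact table and dimension names are made up) that produces the kind of pre-aggregated, cube-style roll-up a BI or reporting tool is typically built around.

```
import pandas as pd

# Illustrative sales fact table; a real EDW feed would be far larger.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "month":   ["2020-06", "2020-07", "2020-06", "2020-07", "2020-07"],
    "revenue": [100.0, 150.0, 80.0, 120.0, 60.0],
})

# A cube-style roll-up by region and month, with grand totals -- the sort of
# pre-aggregated view BI and reporting applications serve to end users.
report = pd.pivot_table(
    sales,
    values="revenue",
    index="region",
    columns="month",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(report)
```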

At the end of this workflow, analysts get data provisioned for their downstream analytics. Because users generally are not allowed to run custom or intensive analytics on production databases, analysts create data extracts from the EDW to analyze data offline in R or other local analytical tools. Many times these tools are limited to in-memory analytics on desktops, analyzing samples of data rather than the entire population of a dataset. Because these analyses are based on data extracts, they reside in a separate location, and the results of the analysis — and any insights on the quality of the data or anomalies — rarely are fed back into the main data repository.
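The text mentions R and similar desktop tools; a rough Python equivalent of that offline-extract workflow might look like the sketch below (the connection, table, and query reuse the hypothetical names from the earlier sketches), where only a bounded sample ever reaches the analyst's machine.

```
import sqlite3
import pandas as pd

# Hypothetical read-only copy of the EDW used for extracts.
edw = sqlite3.connect("warehouse.db")

# Analysts cannot run heavy custom queries in production, so they pull a
# bounded random sample and work on it offline, in memory.
extract = pd.read_sql_query(
    "SELECT customer, amount, ts FROM orders ORDER BY RANDOM() LIMIT 100000",
    edw,
)

# The analysis now runs on a desktop-sized sample, not the full population --
# exactly the sampling constraint described above.
monthly = extract.assign(month=pd.to_datetime(extract["ts"]).dt.to_period("M"))
print(monthly.groupby("month")["amount"].sum())
```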

Because of the rigorous validation and data structuring process, new data sources are slow to enter the EDW, and the data schema is slow to change.

Departmental data warehouses may have been originally designed for a specific purpose and set of business needs, but over time evolved to house more and more data, some of which may be forced into existing schemas to enable BI and the creation of OLAP cubes for analysis and reporting. Although the EDW achieves the objective of reporting and sometimes the creation of dashboards, EDWs generally limit the ability of analysts to iterate on the data in a separate nonproduction environment where they can conduct in-depth analytics or perform analysis on unstructured data.

The typical data architectures just described are designed for storing and processing mission-critical data, supporting enterprise applications, and enabling corporate reporting activities. Although reports and dashboards are still important for organizations, most traditional data architectures inhibit data exploration and more sophisticated analysis. Moreover, traditional data architectures have several additional implications for data scientists.

● High-value data is hard to reach and leverage, and predictive analytics and data mining activities are last in line for data. Because the EDWs are designed for central data management and reporting, those wanting data for analysis are generally prioritized after operational processes.

● Data moves in batches from EDW to local analytical tools. This workflow means that data scientists are limited to performing in-memory analytics (such as with R, SAS, SPSS, or Excel), which will restrict the size of the datasets they can use. As such, analysis may be subject to constraints of sampling, which can skew model accuracy.

● Data Science projects will remain isolated and ad hoc, rather than centrally managed. The implication of this isolation is that the organization can never harness the power of advanced analytics in a scalable way, and Data Science projects will exist as nonstandard initiatives, which are frequently not aligned with corporate business goals or strategy.

All these symptoms of the traditional data architecture result in a slow “time-to-insight” and lower business impact than could be achieved if the data were more readily accessible and supported by an environment that promoted advanced analytics. As stated earlier, one solution to this problem is to introduce analytic sandboxes to enable data scientists to perform advanced analytics in a controlled and sanctioned way. Meanwhile, the current Data Warehousing solutions continue offering reporting and BI services to support management and mission-critical operations.
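To picture what such a sandbox might look like in the simplest terms, the sketch below (reusing the hypothetical warehouse.db and orders table from the earlier examples, with an invented sandbox.db) copies the full table of interest into a separate database where the data scientist can create experimental tables freely, without touching production.

```
import sqlite3

# Hypothetical production warehouse and a separate analytic sandbox.
edw = sqlite3.connect("warehouse.db")
sandbox = sqlite3.connect("sandbox.db")

# Copy the full table of interest (not just a sample) into the sandbox so
# in-depth analysis is not constrained by extract size.
rows = edw.execute("SELECT order_id, customer, amount, ts FROM orders").fetchall()
sandbox.execute(
    """CREATE TABLE IF NOT EXISTS orders (
           order_id INTEGER PRIMARY KEY,
           customer TEXT,
           amount   REAL,
           ts       TEXT)"""
)
sandbox.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows)
sandbox.commit()

# Inside the sandbox, experimental derived tables are fair game and never
# interfere with the production EDW or its reporting workloads.
sandbox.execute(
    """CREATE TABLE IF NOT EXISTS customer_features AS
       SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total_spend
       FROM orders GROUP BY customer"""
)
sandbox.commit()
```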

I hope you found this blog informative. Got a question for us? Please mention it in the comments section and we will get back to you.
