Data as a Service

Keerthipriyan
Walmart Global Tech Blog
5 min read · Nov 13, 2020

A data lake holds a vast amount of data in its native format. Data engineering teams clean and enrich the raw data into structured data, which is then used across the business for ad hoc analysis or by data scientists for machine learning. The data then flows to different downstream teams, each of which transforms it, stores it in its own infrastructure, and builds applications on top of it based on business requirements.

Image by : https://medium.com/@keerthipriyan

Although most of the key performance indicators and insights are common across applications, each downstream team stores the data in its own infrastructure and has to follow the same business rules, governance and compliance requirements that are applied at the source. This can lead to additional cost and resource constraints to procure, maintain and follow all of these processes.

Data and processes are always evolving, and any change, such as adding a new data asset, modifying an existing one or changing the business rules, has to be communicated to and implemented by all the teams. The source team has to ensure backward and forward compatibility of the changes, as the teams might not all implement them at the same time. This often delays the whole process, because older versions have to be supported until all the teams adopt the new changes.

Data as a Service (DaaS) provides a solution by leveraging the software as a service model and offering data and insights as an API. It uses its own infrastructure to process huge volumes of data and provides insights with low latency. It decouples the use of data from the cost of a specific software environment or platform and centralizes the quality, security, compliance and governance of the data. It provides a unified way to access and control the data.

Image by : https://medium.com/@keerthipriyan
Data as a Service Design

Let’s take a look at the benefits of Data as a Service.

Consistent Business Rules

In an organization, data stewards own a data asset and are responsible for its governance. They define the business rules and access controls, which apply to all the other teams consuming the data.

Let’s take the example of a sales metric that is consumed by other teams, where the same business rules apply across all of them to keep the data consistent. If the reporting structure of sales is updated, all of these teams have to implement the change and always stay in sync with the source team. The more often the rules change, the more additional effort and time this requires.

Data as a Service wraps these business rules in the API and provides better governance, so that the data is consistent across applications. It also exposes details of the underlying business rules and offers the capability to add further rules as per the application requirements.
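As a minimal sketch of that idea, the snippet below keeps one steward-owned rule inside a service class, lets a consumer list the rules, and lets it layer its own filter on top. The names (Sale, SalesMetricService, exclude_returns) and the rule itself are hypothetical and chosen only for illustration; they do not describe any actual Walmart API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


# Hypothetical record shape; the field names are illustrative only.
@dataclass(frozen=True)
class Sale:
    store_id: str
    amount: float
    is_return: bool


# A business rule is a predicate deciding whether a record counts.
Rule = Callable[[Sale], bool]


class SalesMetricService:
    """Owns the canonical rules so every consumer gets the same answer."""

    def __init__(self) -> None:
        # Assumed steward-owned rule: returns do not count towards net sales.
        self._core_rules: Dict[str, Rule] = {
            "exclude_returns": lambda sale: not sale.is_return,
        }

    def describe_rules(self) -> List[str]:
        # Lets consumers see which rules are applied on their behalf.
        return sorted(self._core_rules)

    def net_sales(
        self, sales: Iterable[Sale], extra_rules: Iterable[Rule] = ()
    ) -> float:
        # Core rules always apply; consumers may only add further filters.
        rules = list(self._core_rules.values()) + list(extra_rules)
        return sum(s.amount for s in sales if all(rule(s) for rule in rules))


sales = [
    Sale("A", 100.0, False),
    Sale("A", 40.0, True),
    Sale("B", 60.0, False),
]
service = SalesMetricService()
total_all_stores = service.net_sales(sales)
total_store_a = service.net_sales(sales, [lambda s: s.store_id == "A"])
```

Because the core rule lives in one place, a change to how returns are treated would be made once in the service instead of in every consuming team's pipeline.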

Reduced Data Duplication and Infrastructure Cost

Most teams run an ETL from big data systems to relational databases such as Azure SQL or MariaDB for low latency reads. This leads to multiple copies of the data across teams and can hurt performance as data volume increases day by day. Further, the data has to be indexed to allow faster reads, but indexing slows down writes and impacts the SLA. The cost of maintaining the infrastructure is huge, and business requirements nowadays expect larger volumes of data to be processed with faster performance. This often leads to infrastructure upgrades, which involve effort and time.

Data as a Service reduces data duplication across domains and can be operated at a low infrastructure cost. It runs on top of software as a service and provides data and insights on a pay-per-use basis. With the advent of microservices architecture and container-based workloads, teams can consume these APIs and use them in their applications. Most cloud vendors also provide storage APIs running on HTTP/2, which can be used for ad hoc analysis.

Accelerated Response and Real Time Data

Data as a Service leverages acceleration layers such as Apache Druid and ClickHouse to provide low latency responses. It also leverages distributed SQL query engines like Presto, or a cloud data warehouse, for ad hoc analysis. Near real time data can be consumed by end users without waiting for it to be loaded and enriched in various repositories. It provides a way to process big data with better performance.
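One reason an acceleration layer can answer quickly is that it serves pre-aggregated data instead of scanning raw events on every request. The toy sketch below illustrates that general idea with an in-memory rollup. It is not how Apache Druid or ClickHouse are implemented, and the event shape and function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical raw event: (day, store_id, amount).
Event = Tuple[str, str, float]
Rollup = Dict[Tuple[str, str], float]


def build_rollup(events: Iterable[Event]) -> Rollup:
    """Aggregate raw events once, at load time, into (day, store) totals."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for day, store_id, amount in events:
        totals[(day, store_id)] += amount
    return dict(totals)


def daily_total(rollup: Rollup, day: str) -> float:
    """Answer a query from the small rollup instead of the raw events."""
    return sum(total for (d, _store), total in rollup.items() if d == day)


events = [
    ("2020-11-01", "A", 10.0),
    ("2020-11-01", "A", 5.0),
    ("2020-11-01", "B", 2.5),
    ("2020-11-02", "A", 1.0),
]
rollup = build_rollup(events)
first_day_total = daily_total(rollup, "2020-11-01")
```

Here four raw events collapse into three rollup rows. With real volumes the reduction is typically far larger, which is what keeps read latency low.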

Data Asset Development

As Data as a Service centralizes governance and compliance, data stewards can focus on creating value from the data asset and deriving insights that help the business. In addition, maintaining the metadata and user documentation in one place gives consumers a better understanding of the data and seamless integration. It speeds up the process of putting the data into action.

Descriptive Analytics to Prescriptive Analytics

Data as a Service not only supports low latency responses and ad hoc analysis, it is also flexible enough for prescriptive analytics. Critical business decisions are based on real time predictions. There are many examples, such as fraud detection, sales forecasts, product offerings and product pricing, and all of them use APIs. Many households now use virtual assistants, natural language processing and IoT devices. Data inflows keep increasing day by day, and Data as a Service provides the capability to process the data and respond quickly, thus improving the customer experience.

Cloud Native

Cloud native, containerized workloads deployed on public clouds such as Microsoft Azure or Google Cloud Platform provide reliable, scalable and maintainable APIs. Cloud native applications can maintain performance under varying load by scaling horizontally and releasing resources when they are no longer required. The platform supports building fault tolerant systems, software upgrades, high availability and disaster recovery to provide reliable services. It also comes with the necessary tools for troubleshooting and monitoring, for example Splunk for application logs and Dynatrace for application performance monitoring.

Mobile and Web applications

A Data as a Service API can be used in both mobile and web applications, as it decouples the business logic from the presentation layer. Because the data is cached in the acceleration layers, response times hold up better at higher loads, and performance improvements such as responding only with the metrics requested by the user reduce latency and make it well suited to mobile applications. The JSON format keeps payloads small, is flexible enough to support different representations, and can be easily integrated with libraries such as D3.js and Plotly to build dashboards.
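Responding only with the requested metrics can be sketched as a simple field filter applied before serialization. The function name select_metrics and the metric names below are hypothetical; a real API would most likely take the requested fields from a query parameter or a similar mechanism.

```python
import json
from typing import Any, Dict, Iterable, Optional


def select_metrics(
    record: Dict[str, Any], requested: Optional[Iterable[str]] = None
) -> str:
    """Serialize only the requested metrics as compact JSON.

    When no metrics are requested, the full record is returned.
    """
    if requested is None:
        payload = record
    else:
        wanted = set(requested)
        payload = {key: value for key, value in record.items() if key in wanted}
    # Compact separators drop the spaces json.dumps adds by default.
    return json.dumps(payload, separators=(",", ":"), sort_keys=True)


# Hypothetical metric record for one store.
record = {"net_sales": 160.0, "units": 12, "returns": 40.0}
full_body = select_metrics(record)
mobile_body = select_metrics(record, ["units"])
```

A mobile client that only renders unit counts would receive the smaller mobile_body instead of the full record, which reduces the bytes sent over the network.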


Keerthipriyan
Walmart Global Tech Blog

Staff Software Engineer, Finance Data Factory, Walmart Global Tech