Thriving in the On-Demand Economy with DataOps
Analytics at Amazon Speed: Part III
In our latest blog series, we explore data analytics in the on-demand economy. Companies like Amazon and Google have turned instant fulfillment into a competitive advantage and are being rewarded in the marketplace. As consumers adapt to this “new normal,” the expectation of instant delivery is crossing into other domains. For example, data analytics users can’t or won’t wait weeks or months for new analytics. Data analytics teams that can meet the demand for rapid delivery of new analytics will play a high-visibility role in helping their organizations compete in the on-demand economy. A process-and-tools approach called DataOps can improve both the speed and robustness of analytics. This blog is part 3 of a 4-part series.
At Google, over 23,000 R&D employees build over 5,000 different services (software components) such as login, storage, and indexing. These services are shared among Google’s wide array of products, which are continuously evolving. A large number of Google services are released to users multiple times in a single week. The Google consumer surveys group deploys code to customers eight minutes after a developer completes writing and testing it. Google maintains two instances of their system, one for production and one for testing, and runs over one hundred million automated test scripts per day to make sure that newly released features work cohesively with the rest of the services. Product managers at Google deploy a feature to a small percentage of users before rolling it out to everyone, which lets them gather feedback before going fully live. Continuous releases refine the feature until product managers are sure that it is robust and enhances the user experience. As a feature matures, they release it to larger and larger segments of the customer base, ensuring that the new service integrates with the existing user experience. This powerful tool, enabled by continuous delivery, reduces risk and keeps the product teams focused on the needs of their customers.
Continuous Delivery in Data Analytics
Most analytics teams are back in the waterfall world of painstaking bureaucracy: writing detailed specifications, planning sequential development schedules and taking months to implement changes.
Continuous delivery has become one of the core competencies that will determine which companies will thrive and which will be left behind. The demands of the on-demand economy will fall squarely on the shoulders of data analytics teams who, in this new environment, are expected to deliver analytics at Amazon speed. In a recent survey, the technology research firm Gartner found that only about half of all chief data officers (CDOs) in large organizations are considered to be successful in their roles. The unsuccessful half work for companies facing a rocky future — without a responsive analytics function, how can companies expect to compete and win in rapidly evolving markets? In the last 15 years, 52% of Fortune 500 companies have simply disappeared.
To compete in the new economy, a company needs information about customers, trends, and markets that only the data analytics team can provide. Analytics can become the core competency that enables companies to successfully navigate the rocky waters of the on-demand economy. In this new world, the CDO and the data analytics team have unique visibility. Successful analytics teams will help lead their companies to bright futures. Those who ignore the changes afoot will fade into oblivion like most former Fortune 500 companies.
CDOs and data analytics teams can improve their performance and quality by instituting process and tools changes that have been shown to work in software development and lean manufacturing. These changes, called DataOps, can help a team deliver analytics at Amazon speed while ensuring that new analytics do not disrupt operations.
DataOps is an approach that enables data analytics teams to thrive in the on-demand economy. It incorporates the speed of Agile software development, and the responsiveness of DevOps and continuous delivery, into data analytics. Like the continuous delivery process at Google, DataOps places a great deal of emphasis on automated testing at each stage of the data analytics pipeline, in order to ensure quality. This testing supports statistical process control (SPC) that is so important to manufacturing quality improvement.
Lean Data Analytics
Lean manufacturing is a key part of the intellectual heritage of DataOps. Like a manufacturing process, data analytics progresses through a series of steps to produce the desired output in the form of reports, models, and views. At an abstract level, the data-analytics pipeline is analogous to a manufacturing process. Statistical process control (SPC) is a well-known method used to improve manufacturing quality. If key measures fall within specified limits, the process is known to be functioning within its expected bounds. When SPC is applied to the data-analytics pipeline, it can help to assure quality as well as warn the analytics team of unexpected patterns in the data, enabling them to update the analytics and/or develop more robust tests. The emphasis on testing in DataOps reflects the importance of SPC in achieving adaptive, robust analytics. Companies like Google have implemented continuous-delivery pipelines with automated testing with great success.
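To make the SPC idea concrete, here is a minimal sketch (not a production implementation) of a control-limit check on a pipeline metric such as a daily row count; the metric, history values, and function name are illustrative assumptions:

```python
import statistics

def within_control_limits(history, new_value, sigmas=3):
    """Flag whether a pipeline metric (e.g. a daily row count) falls
    inside control limits derived from its recent history."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    lower, upper = mean - sigmas * sd, mean + sigmas * sd
    return lower <= new_value <= upper

# Hypothetical row counts from the last six pipeline runs
history = [10_120, 10_310, 9_980, 10_250, 10_075, 10_190]
print(within_control_limits(history, 10_200))  # normal run -> True
print(within_control_limits(history, 4_500))   # anomalous drop -> False
```

A check like this, run automatically after each pipeline stage, is how unexpected data patterns get flagged before they reach users.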
Data Analytics in a DataOps World
By applying SPC, DevOps, and Agile development to data analytics, DataOps is well suited to analytics in the on-demand economy. DataOps requires changes in both the processes and the tools that deliver analytics. With DataOps, CDOs and data analytics teams are able to respond to requests for changes with previously unfathomable speed. When a C-level executive asks for a new view of a data set, the data analytics team responds that same day. This unlocks the productivity and creativity of decision makers. It allows key contributors to experiment with analytics, seeking new patterns and trends. Good companies that master their analytics become outstanding companies.
The data analytics pipeline in DataOps is automated so changes flow through to the users rapidly and continuously. Testing ensures that any updates are implemented without errors. No more waking up on Saturday morning after a long week wondering if a change the day before broke something unanticipated. No more enterprise-critical IT alerts at off hours.
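One way such automated checks can look in practice is a small validation step that runs after a pipeline stage and fails loudly before a bad change reaches users; the field names and rules below are illustrative assumptions, not a prescribed schema:

```python
def validate_customers(rows):
    """Run basic data tests on a stage's output; raise AssertionError
    on failure so a broken change never reaches production dashboards."""
    assert len(rows) > 0, "stage produced no output"
    ids = [r["customer_id"] for r in rows]
    assert len(ids) == len(set(ids)), "duplicate customer_id found"
    assert all(r["revenue"] >= 0 for r in rows), "negative revenue found"
    return True

# Example output of a hypothetical transformation stage
rows = [
    {"customer_id": 1, "revenue": 120.0},
    {"customer_id": 2, "revenue": 0.0},
]
print(validate_customers(rows))  # True when every test passes
```

Wiring tests like this into every stage is what makes fully automated, continuous deployment of analytics safe.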
With DataOps, data analytics engineers, analysts, and scientists can work on changes without getting in each other’s way. All changes to the analytics are captured, managed, and backed up so that they are easily reproducible. The diffuse bits and pieces of analytics that are spread across the many hard drives of individual employees are collected into one coherent repository. Team members can work on their own private copies of the data, so there are no more clashes with production or interference with live, business-critical systems.
DataOps allows the data analytics team to share code and methods with each other. No more reinventing the wheel. Changes can be shared and adopted across the whole team. Complex processing is encapsulated and isolated so that modularity is improved across the data analytics code base.
DataOps makes the data analytics pipeline flexible enough to adapt to changing run-time conditions. You can filter a database according to any required criteria and include or exclude specific steps in the workflow. If a new model is released, the data analytics team can make both the new and old models available simultaneously. If an analysis step is interrupted, it can be restarted without losing hours of batch processing time.
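A toy sketch of this flexibility, with step names, model versions, and the registry layout all chosen for illustration, might parameterize which steps run and keep old and new models available side by side:

```python
def run_pipeline(data, steps, models):
    """Run only the requested steps, then apply every registered model
    so old and new versions remain available side by side."""
    registry = {
        "filter": lambda d: [x for x in d if x > 0],   # drop invalid values
        "scale":  lambda d: [x * 2 for x in d],        # illustrative transform
    }
    for name in steps:
        data = registry[name](data)
    return {version: model(data) for version, model in models.items()}

models = {
    "v1": lambda d: sum(d) / len(d),   # old model: mean
    "v2": lambda d: max(d),            # new model: max
}
print(run_pipeline([-1, 2, 3], steps=["filter", "scale"], models=models))
# {'v1': 5.0, 'v2': 6}
```

Because the step list is just a parameter, rerunning an interrupted stage or skipping an expensive one is a configuration change rather than a code change.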
DataOps stores raw data in data lakes, with purpose-built data marts and data warehouses serving the organization’s day-to-day analytics needs. When modifications are required, the code that generated the data warehouses and data marts can be updated to accommodate the new requirements. Cloud resources allow the modified data warehouse to be spun up in a matter of minutes without impacting operations.
How then can you implement DataOps in your data analytics pipeline? It can be carried out in seven simple steps with your existing tools.
This blog series explores how data analytics teams can cope with delivering analytics at Amazon speed. In our next installment, we provide a simple plan for implementing DataOps in your enterprise.