Serverless BI: A data-driven path to digital transformation

Adopting a Serverless approach simplifies an organisation’s path to data-driven Business Intelligence.

“69% [of executives] report that they have not created a data-driven organization”

The reasons many executives have not been able to make their organisations data-driven are manifold. Some of these are, of course, organisational: larger companies often lack the agility to transform the processes and mindsets that becoming data-driven requires. Many transformations also fail for technical reasons, with existing IT teams unable to build the tooling that gives the business access to accurate, reliable and up-to-date data.

Where does Serverless come in?

Serverless is a broad, diverse and rapidly evolving space. The term is widely used but not well understood. For the purposes at hand, let’s think of Serverless as running applications and services without having to manage the underlying infrastructure, combined with a “buy-not-build” approach to services that have been commoditised.

Third-Party Services

In a Serverless approach, companies rarely build commoditised capabilities such as user authentication themselves; instead, they consume them as a service (e.g., Amazon Cognito, Okta, Auth0). This use of third-party services, together with the pay-per-use pricing model, makes Serverless an ideal fit for BI systems and data pipelines, as it reduces both the Total Cost of Ownership (TCO) and the time to market.

Security

There are security benefits as well. The cloud provider manages the underlying infrastructure and many critical security functions (for example, operating-system patching), which makes securing critical data simpler and easier to audit and manage.

Data governance

Data governance concerns can also be simplified with a Serverless approach, by employing a repeatable architecture that can easily be deployed across many regions. While Infrastructure as Code is not a new concept, Serverless architectures take it further, because more of the application’s infrastructure is simply the orchestration of managed cloud resources. The abstraction provided by Serverless services, and the increased use of cloud-native solutions, make multi-region deployments easier to manage and maintain.
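To make “define once, deploy to many regions” a little more concrete, here is a minimal Python sketch that describes one regional copy of a data-lake stack as a pure function of the region. `DataLakeStack`, `data_lake_stack`, `plan_deployments` and the naming scheme are hypothetical; in practice the definition would live in an Infrastructure as Code tool (for example AWS CloudFormation, the AWS CDK or Terraform), which would actually create the resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLakeStack:
    """Hypothetical description of one regional copy of the data-lake stack."""
    region: str
    bucket_name: str
    delivery_stream_name: str
    athena_workgroup: str


def data_lake_stack(app: str, region: str) -> DataLakeStack:
    # Every region gets the same set of resources; only region-specific names differ.
    # S3 bucket names are globally unique, so the region is part of the bucket name.
    return DataLakeStack(
        region=region,
        bucket_name=f"{app}-data-lake-{region}",
        delivery_stream_name=f"{app}-events",
        athena_workgroup=f"{app}-analytics",
    )


def plan_deployments(app: str, regions: list[str]) -> list[DataLakeStack]:
    # One identical definition per region: adding a region is a one-line change.
    return [data_lake_stack(app, region) for region in regions]


for stack in plan_deployments("bookless", ["eu-west-1", "us-east-1"]):
    print(stack.region, stack.bucket_name)
```

Because every region is produced by the same function, adding a market means extending the region list, and reviewers only need to audit one definition.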

Pay-Per-Use

Finally, pay-per-use is a natural fit for many BI needs, as reports are run periodically and access tends to be sporadic. This makes Serverless BI cost-efficient for large organisations and affordable for startups.
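A back-of-the-envelope comparison shows why sporadic use favours pay-per-use. The sketch below assumes a query service that charges per terabyte scanned (as Amazon Athena does) and a licence model that charges per seat; all prices and volumes are hypothetical inputs, not quotes from any provider.

```python
def pay_per_use_monthly_cost(queries_per_month: int, gb_scanned_per_query: float,
                             price_per_tb_scanned: float) -> float:
    # Pay-per-use: cost grows only with the data the queries actually scan.
    tb_scanned = queries_per_month * gb_scanned_per_query / 1024
    return tb_scanned * price_per_tb_scanned


def fixed_monthly_cost(seats: int, price_per_seat: float) -> float:
    # Licence model: every seat is paid for, whether or not it is used.
    return seats * price_per_seat


# Hypothetical numbers: 2,048 queries a month, each scanning 0.5 GB.
usage = pay_per_use_monthly_cost(queries_per_month=2048, gb_scanned_per_query=0.5,
                                 price_per_tb_scanned=5.0)
licences = fixed_monthly_cost(seats=40, price_per_seat=25.0)
print(f"pay-per-use: ${usage:.2f}, fixed licences: ${licences:.2f}")
# pay-per-use: $5.00, fixed licences: $1000.00
```

Because scan-based pricing depends on bytes read, columnar formats and partitioning that reduce the data each query touches lower the bill directly.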

Practical Serverless BI — In 4 Steps

For the purpose of this exercise, let’s take a simple case of an e-commerce company selling books. This company’s existing application happens to have a 100% Serverless architecture on AWS, and until now, the only analytics have been provided by Google Analytics. Other than that, executives have had to hunt down developers to run manual data extracts from the live databases. Let’s call our company “BookLess”.

Step 1: Storage — The Serverless Data Lake

Where the data is stored is the first concern of any BI strategy. Many BI solutions have traditionally worked off relational databases, with SQL as their lingua franca, in combination with other, less structured sources. Sometimes data is aggregated in a data warehouse. At other times, it is pulled from a range of application-specific databases, or from an unstructured collection of data known as a “Data Lake”.

Figure: Data streaming into the S3 data lake (DynamoDB through Lambda and Kinesis Firehose into the S3 bucket, and from a client using Amplify through Kinesis Firehose into the same bucket).

Step 2: Query — Getting Data From The Lake

We now have the data in our lake, and the business stakeholders are keen for insights. Because the data is stored in Parquet (an open-source, columnar file format that is efficient to query), we can begin to gather insights using Amazon Athena (an interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL).
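As a sketch of what “gathering insights with Athena” can look like from code, the function below uses the boto3 Athena client calls `start_query_execution`, `get_query_execution` and `get_query_results`. The client is passed in as a parameter so the logic can be exercised without AWS; the database, table and results bucket in the usage comment are hypothetical.

```python
import time
from typing import Any

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> list[dict[str, Any]]:
    """Run one Athena query and return the first page of results as dicts."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} {status['State']}: {reason}")

    # get_query_results is paginated (NextToken); this sketch reads one page only.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    if not rows:
        return []
    # For SELECT queries, the first row holds the column names.
    header = [cell.get("VarCharValue", "") for cell in rows[0]["Data"]]
    # Values come back as strings; NULL cells have no VarCharValue key.
    return [dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
            for row in rows[1:]]


# Example usage (requires AWS credentials; database and bucket names are hypothetical):
#   import boto3
#   top_books = run_athena_query(
#       boto3.client("athena"),
#       "SELECT title, COUNT(*) AS orders FROM orders "
#       "GROUP BY title ORDER BY orders DESC LIMIT 10",
#       database="bookless_lake",
#       output_location="s3://bookless-athena-results/",
#   )
```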

Figure: The same queries of the Serverless data lake, with Redshift in place of Athena.
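Swapping Athena for Redshift mostly changes the client code; much of the SQL can stay the same, although the two dialects differ in places. The sketch below assumes Redshift Serverless, with the lake’s tables exposed to Redshift (for example through a Redshift Spectrum external schema), and uses the boto3 `redshift-data` client calls `execute_statement`, `describe_statement` and `get_statement_result`. The workgroup, database and schema names in the usage comment are hypothetical.

```python
import time
from typing import Any

FAILED_STATES = {"FAILED", "ABORTED"}


def _field_value(field: dict[str, Any]) -> Any:
    # Each field is a one-key dict such as {"stringValue": "Dune"} or
    # {"longValue": 42}; SQL NULLs come back as {"isNull": True}.
    if field.get("isNull"):
        return None
    return next(iter(field.values()))


def run_redshift_query(client: Any, sql: str, database: str, workgroup: str,
                       poll_seconds: float = 1.0) -> list[dict[str, Any]]:
    """Run SQL on Redshift Serverless via the Redshift Data API (first page only)."""
    statement_id = client.execute_statement(
        Sql=sql, Database=database, WorkgroupName=workgroup
    )["Id"]
    # Like Athena, the Data API is asynchronous: poll until the statement finishes.
    while True:
        desc = client.describe_statement(Id=statement_id)
        if desc["Status"] == "FINISHED":
            break
        if desc["Status"] in FAILED_STATES:
            raise RuntimeError(f"statement {statement_id} {desc['Status']}: {desc.get('Error', '')}")
        time.sleep(poll_seconds)

    result = client.get_statement_result(Id=statement_id)
    names = [column["name"] for column in result["ColumnMetadata"]]
    return [dict(zip(names, map(_field_value, record))) for record in result["Records"]]


# Example usage (requires AWS credentials; workgroup and schema names are hypothetical):
#   import boto3
#   top_books = run_redshift_query(
#       boto3.client("redshift-data"),
#       "SELECT title, COUNT(*) AS orders FROM lake.orders "
#       "GROUP BY title ORDER BY orders DESC LIMIT 10",
#       database="dev",
#       workgroup="bookless-analytics",
#   )
```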

Step 3: Insights and Analytics

So far, we’ve moved data around and formatted it so that developers and data engineers can run ad-hoc queries. Now, we need to put the data in the hands of the business.

Data-driven organisations need everyone at every level of the organisation to have access to the right data at the right time.

Stakeholders without SQL skills need to be able to query the data, and we need to combine multiple data sources and provide data visualisation, dashboard creation, sharing and automated reports.
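A BI tool performs this kind of combination for non-technical users, but a small sketch shows what “combining multiple data sources” means in practice: joining order records from the data lake with daily session counts exported from Google Analytics to produce a conversion-rate metric a dashboard might display. The field names and the shape of the session export are hypothetical.

```python
from collections import Counter
from typing import Any


def daily_conversion(orders: list[dict[str, Any]], sessions: dict[str, int]) -> dict[str, float]:
    """Join orders (one dict per order) with daily session counts, keyed by date."""
    orders_per_day = Counter(order["date"] for order in orders)
    return {
        day: round(orders_per_day[day] / count, 4)
        for day, count in sorted(sessions.items())
        if count > 0  # skip days with no sessions to avoid dividing by zero
    }


orders = [
    {"order_id": "o-1", "date": "2021-03-01"},
    {"order_id": "o-2", "date": "2021-03-01"},
    {"order_id": "o-3", "date": "2021-03-02"},
]
sessions = {"2021-03-01": 80, "2021-03-02": 50, "2021-03-03": 0}
print(daily_conversion(orders, sessions))
# {'2021-03-01': 0.025, '2021-03-02': 0.02}
```

Two orders against 80 sessions gives a 2.5% conversion rate for that day; neither source could produce this figure on its own.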

Figure: The Amazon QuickSight mobile app showing graphs and metrics (source: https://aws.amazon.com/blogs/big-data/announcing-the-new-mobile-app-for-amazon-quicksight/).

Step 4: Artificial Intelligence — Discovery and Future-proofing

Companies look to gather clean data because it not only supports their day-to-day reporting but also lets them apply AI and machine learning (ML) to stay competitive in the future.
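In practice, “clean data” usually means a repeatable cleaning step between the raw lake and the datasets that reports and ML models consume. Here is a minimal sketch with hypothetical field names and rules: deduplicate replayed events, drop rows with missing or impossible prices, and strip hyphens from ISBNs.

```python
import math
from typing import Any


def clean_orders(raw_orders: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical cleaning pass over raw order records."""
    seen_ids = set()
    cleaned = []
    for order in raw_orders:
        order_id = order.get("order_id")
        if not order_id or order_id in seen_ids:
            continue  # drop rows without an ID, and replayed duplicates
        try:
            price = float(order["price"])
        except (KeyError, TypeError, ValueError):
            continue  # drop rows whose price is missing or not a number
        if not math.isfinite(price) or price < 0:
            continue  # drop impossible prices (NaN, infinite or negative)
        seen_ids.add(order_id)
        cleaned.append({
            "order_id": order_id,
            # Strip hyphens from ISBNs, e.g. "978-1-23-456789-7" -> "9781234567897".
            "isbn": str(order.get("isbn", "")).replace("-", "").strip(),
            "price": round(price, 2),
        })
    return cleaned
```

Running the same function over every new batch keeps reporting and any future ML training data consistent with each other.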

Conclusion

A Serverless BI solution ensures that investment is focused on getting the right data to the right people at the right time. The pay-per-use model ensures that large companies don’t waste money on licences for unused systems, and it lowers the barrier to entry for smaller companies to become data-driven, essentially democratising the data-driven approach to business intelligence.
