Published in Analytics Vidhya

Data Lakehouse Architecture — Azure Synapse Serverless SQL Pools

"Data Lakehouse" is the latest buzzword in the data analytics world.

Today most businesses rely on data to make smarter decisions, and that data arrives from many sources, in many forms and sizes, and at varying frequencies. Companies are looking beyond the limitations of the traditional data warehouse architecture to enable advanced analytics, data science, and machine learning on all of this data. The Data Lakehouse is one architecture that addresses many of those limitations.

Now let's see how we can build a Data Lakehouse architecture using the services that Azure offers.

Serverless SQL pools can query various data sources (data lake files, Spark tables, and Cosmos DB) in place, without extracting, transforming, and loading the data into another data store. This removes the need for an additional data store to hold curated data.
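To make the idea of querying files in place a little more concrete, here is a minimal Python sketch that assembles the kind of T-SQL `OPENROWSET` query a serverless SQL pool can run against data lake files. It only builds the query text and does not connect to anything. The function name `build_openrowset_query` and the storage account, container, and folder in the URL are made up for illustration, and the CSV branch assumes the files have a header row.

```python
def build_openrowset_query(bulk_url: str, file_format: str = "PARQUET", top: int = 10) -> str:
    """Build a T-SQL query that reads data lake files in place with OPENROWSET."""
    fmt = file_format.upper()
    if fmt not in {"PARQUET", "CSV", "DELTA"}:
        raise ValueError(f"unsupported format: {file_format}")
    if top < 1:
        raise ValueError("top must be a positive integer")

    # Double any single quotes so the URL stays inside the T-SQL string literal.
    safe_url = bulk_url.replace("'", "''")
    options = [f"BULK '{safe_url}'", f"FORMAT = '{fmt}'"]
    if fmt == "CSV":
        # Assumption: the CSV files carry a header row.
        options += ["PARSER_VERSION = '2.0'", "HEADER_ROW = TRUE"]

    joined = ",\n    ".join(options)
    return f"SELECT TOP {top} *\nFROM OPENROWSET(\n    {joined}\n) AS [rows];"


# Hypothetical storage account, container, and folder.
query = build_openrowset_query(
    "https://examplelake.dfs.core.windows.net/raw/sales/*.parquet"
)
print(query)
```

The printed statement can be pasted into a serverless SQL pool query window once the URL points at real files that your identity is allowed to read.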

Data Flow

  1. Bring together all your structured, unstructured, and semi-structured data (logs, files, and media) into Azure Data Lake Storage using Synapse Pipelines.
  2. Use Serverless SQL pools to clean and transform the raw datasets and store the results in the Data Lakehouse using CETAS (CREATE EXTERNAL TABLE AS SELECT).
  3. Connect Power BI to the Serverless SQL pool endpoint and create beautiful visuals.
  4. Use scalable machine learning and deep learning techniques to derive deeper insights from this data using Python, Scala, or .NET, with notebook experiences in an Apache Spark pool.
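The CETAS step in the flow above can be sketched roughly as follows. This is a small Python helper that only generates the `CREATE EXTERNAL TABLE ... AS SELECT` text. It assumes an external data source and an external file format have already been created in the database (with `CREATE EXTERNAL DATA SOURCE` and `CREATE EXTERNAL FILE FORMAT`). The helper name `build_cetas` and every object name in the example (`curated`, `sales_orders`, `lakehouse_ds`, `parquet_ff`, `raw.sales_orders`) are hypothetical.

```python
import re

# Plain identifiers only, so names can be wrapped safely in [brackets].
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def build_cetas(schema: str, table: str, location: str,
                data_source: str, file_format: str, select_sql: str) -> str:
    """Build a CETAS statement that writes a query result back to the lake."""
    for name in (schema, table, data_source, file_format):
        _check_identifier(name)

    safe_location = location.replace("'", "''")  # keep the path inside the literal
    body = select_sql.strip().rstrip(";")         # one terminating semicolon only
    return (
        f"CREATE EXTERNAL TABLE [{schema}].[{table}]\n"
        "WITH (\n"
        f"    LOCATION = '{safe_location}',\n"
        f"    DATA_SOURCE = [{data_source}],\n"
        f"    FILE_FORMAT = [{file_format}]\n"
        ")\n"
        "AS\n"
        f"{body};"
    )


# Hypothetical names: a cleaned copy of a raw view, written to curated/sales_orders/.
statement = build_cetas(
    schema="curated",
    table="sales_orders",
    location="curated/sales_orders/",
    data_source="lakehouse_ds",
    file_format="parquet_ff",
    select_sql=(
        "SELECT order_id, CAST(amount AS DECIMAL(18, 2)) AS amount\n"
        "FROM raw.sales_orders\n"
        "WHERE amount IS NOT NULL"
    ),
)
print(statement)
```

Running the generated statement both writes the result files under the given location and registers an external table over them, which is what lets the cleaned data be queried later without a separate load step.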

Components

  • Azure Synapse Pipelines allows you to create, schedule, and orchestrate your ETL/ELT workflows.
  • Azure Data Lake is massively scalable and cost-effective storage for any type of unstructured, semi-structured, and structured data.
  • Azure Synapse Serverless SQL Pools is an auto-scaling compute environment that uses T-SQL to query the data lake directly (no need to copy or load data into a specialized store). Because the pool is serverless, there’s no infrastructure to set up and no clusters to maintain. A default endpoint for this service is provided within every Azure Synapse workspace, so you can start querying data as soon as the workspace is created.
  • Azure Data Lakehouse is a logical data warehouse built on top of Azure Data Lake storage using Serverless SQL Pools. This allows data from disparate systems to be viewed without movement or transformation.
  • Power BI is a suite of business analytics tools that deliver insights throughout your organization. Connect to hundreds of data sources, simplify data prep, and drive ad hoc analysis. Produce beautiful reports, then publish them for your organization to consume on the web and across mobile devices.
  • Azure Machine Learning is a cloud-based service for creating and managing machine learning solutions. It’s designed to help data scientists and machine learning engineers leverage their existing data processing and model development skills and frameworks.
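Since every workspace comes with a default serverless endpoint, connecting a client is mostly a matter of knowing its address. The sketch below assumes the endpoint follows the `<workspace>-ondemand.sql.azuresynapse.net` naming pattern and that the client uses the Microsoft ODBC Driver 18 for SQL Server with interactive Azure AD sign-in. The workspace name `contoso-synapse` and database name `lakehouse` are invented, and the code only builds strings rather than opening a connection.

```python
def serverless_endpoint(workspace: str) -> str:
    """Return the serverless SQL endpoint host for a Synapse workspace."""
    # Assumption: the default "-ondemand" naming pattern applies to this workspace.
    return f"{workspace}-ondemand.sql.azuresynapse.net"


def odbc_connection_string(workspace: str, database: str) -> str:
    """Build an ODBC connection string for the serverless SQL endpoint."""
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": f"tcp:{serverless_endpoint(workspace)},1433",
        "Database": database,
        "Encrypt": "yes",
        # Assumption: the user signs in interactively with Azure AD.
        "Authentication": "ActiveDirectoryInteractive",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Hypothetical workspace and database names.
host = serverless_endpoint("contoso-synapse")
conn_str = odbc_connection_string("contoso-synapse", "lakehouse")
print(host)
print(conn_str)
```

In Power BI, the host name on its own is what goes into the server field of the SQL connector. The full connection string is the sort of thing you would hand to an ODBC client library from a script or notebook.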

That’s it for now, folks! I hope you enjoyed the article. See you in my next article (how to build a Data Lakehouse using Serverless SQL Pools). Until then, stay healthy and happy learning.
