Build a Medallion Architecture in MS Fabric Real-Time Analytics
How to build real-time analytics with a medallion architecture?
Summary
This article describes how to implement a medallion architecture in Real-Time Analytics using Microsoft Fabric. I have broken this exercise into two parts. In this article you will find:
- Architecture diagram of the medallion architecture for Fabric Real-Time Analytics.
- How to create an EventHub and generate synthetic data to mock real-time data ingestion?
- How to connect the EventHub to the KQL DB in your EventHouse?
- What is the use of OneLake availability?
You will find the last two points of this exercise in the next article: https://medium.com/@suryaprakashmcetit/build-a-medallion-architecture-in-ms-fabric-real-time-analytics-ii-4e462dec425c
- How to promote raw data to the silver layer using update policies?
- How to aggregate data using materialized views and promote it to the gold layer?
Architecture Diagram
Creating EventHub and generating synthetic data:
Navigate to the Azure portal and create an Event Hubs namespace under your resource group.
Please follow this Azure documentation link to see how to create an EventHub: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-create
Create an event hub instance under the new namespace.
Once you have created the event hub, create a shared access policy (SAS policy) for it. The connection string this policy gives you, sketched below, will be used later for integrating the EventHub with Fabric.
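For reference, the connection details you copy from that policy look roughly like the following. All values here are placeholders; take the real ones from the SAS policy blade in the Azure portal.

```python
# Placeholders only: copy the real values from your SAS policy in the Azure portal.
EVENTHUB_CONN_STR = (
    "Endpoint=sb://<your-namespace>.servicebus.windows.net/;"
    "SharedAccessKeyName=<your-policy-name>;"
    "SharedAccessKey=<your-policy-key>"
)
EVENTHUB_NAME = "<your-event-hub-name>"
```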
Find the Generate data (preview) blade and select the Stocks data option. It previews the JSON dataset that your KQL DB will receive.
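If you prefer to script the mock stream instead of using the Send button in the portal, here is a minimal sketch using the azure-eventhub Python package (pip install azure-eventhub). The event fields (symbol, price, timestamp) are illustrative assumptions rather than the exact schema of the Stocks preview dataset; the destination table in Fabric will simply receive whatever JSON you send.

```python
import json
import random
import time
from datetime import datetime, timezone

from azure.eventhub import EventData, EventHubProducerClient

# Placeholders: reuse the connection string and event hub name from the SAS policy above.
EVENTHUB_CONN_STR = "<connection string from your SAS policy>"
EVENTHUB_NAME = "<your-event-hub-name>"

SYMBOLS = ["MSFT", "AAPL", "GOOG", "AMZN"]

def make_stock_event() -> dict:
    """Build one synthetic stock tick; the fields are illustrative assumptions."""
    return {
        "symbol": random.choice(SYMBOLS),
        "price": round(random.uniform(100, 500), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

producer = EventHubProducerClient.from_connection_string(
    conn_str=EVENTHUB_CONN_STR, eventhub_name=EVENTHUB_NAME
)

with producer:
    for _ in range(10):
        batch = producer.create_batch()
        batch.add(EventData(json.dumps(make_stock_event())))
        producer.send_batch(batch)
        time.sleep(1)  # simulate a slow real-time stream
```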
Integrate EventHub with EventHouse in MS Fabric:
Navigate to the MS Fabric portal, go to your workspace, and create a new EventHouse.
This creates a KQL DB along with the EventHouse. Click on it.
Once inside the KQL DB, you can turn on OneLake availability. It is optional for our exercise, so feel free to skip it; I added it here for educational purposes only.
Select OneLake availability. I have already activated it, so it shows as Active in the picture below.
What is the use of OneLake availability?
Enabling data availability of KQL Database in OneLake means that customers can enjoy the best of both worlds: they can query the data with high performance and low latency in their KQL database and query the same data in Delta Lake format via any other Fabric engines such as Power BI Direct Lake mode, Warehouse, Lakehouse, Notebooks, and more.
KQL Database offers a robust mechanism to batch the incoming streams of data into one or more Parquet files suitable for analysis. The Delta Lake representation is provided to keep the data open and reusable. This logical copy is managed once, is paid for once and users should consider it a single data set.
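To make this concrete: once OneLake availability is on, the same table can be read from a Fabric Spark notebook without writing any KQL. The sketch below assumes a notebook with a default Lakehouse attached and a OneLake shortcut named RawStockData (a made-up name) pointing at the KQL DB table.

```python
# Runs in a Microsoft Fabric Spark notebook, where `spark` is predefined.
# Assumes a OneLake shortcut named "RawStockData" exists in the notebook's
# default Lakehouse and points at the KQL DB table exposed via OneLake availability.
df = spark.read.format("delta").load("Tables/RawStockData")

df.printSchema()      # schema inferred from the Delta Lake copy
df.limit(10).show()   # peek at a few of the ingested stock events
```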
Back to EventHub Integration :)
Click on your KQL DB, select Get Data, and then select Event Hub.
Once you select it, you will be asked for a destination table; you can create a new one.
Next, set up a connection using the shared access policy (SAS key) that you created earlier for the EventHub.
Create a new table with this connection.
Now, if you go back to the EventHub and send the stock data, it should appear within a few seconds in the new table that you created in the EventHouse.
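If you want to verify this outside the Fabric UI, a quick option is to run a KQL query from Python with the azure-kusto-data package (pip install azure-kusto-data). The cluster URI, database, and table names below are placeholders; take the Query URI from your KQL database page in Fabric.

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholders: copy the Query URI from your KQL database page in Fabric.
CLUSTER_URI = "https://<your-eventhouse>.kusto.fabric.microsoft.com"
DATABASE = "<your-kql-database>"
TABLE = "<your-destination-table>"

# Opens a browser window for interactive sign-in to your Fabric tenant.
kcsb = KustoConnectionStringBuilder.with_interactive_login(CLUSTER_URI)
client = KustoClient(kcsb)

# Count how many stock events have landed in the destination table so far.
response = client.execute(DATABASE, f"{TABLE} | summarize Events = count()")
for row in response.primary_results[0]:
    print(row.to_dict())
```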
Please find the rest of the exercise here: https://medium.com/@suryaprakashmcetit/build-a-medallion-architecture-in-ms-fabric-real-time-analytics-ii-4e462dec425c