<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by John Elisa on Medium]]></title>
        <description><![CDATA[Stories by John Elisa on Medium]]></description>
        <link>https://medium.com/@johnelisaaa?source=rss-d1bc041ef9a3------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*w8IHLPq1mQLS7JjGhKXJrg.png</url>
            <title>Stories by John Elisa on Medium</title>
            <link>https://medium.com/@johnelisaaa?source=rss-d1bc041ef9a3------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 17 May 2026 09:29:35 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@johnelisaaa/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Azure & Power BI Project: Build a Retail Data Warehouse from Scratch]]></title>
            <link>https://medium.com/@johnelisaaa/azure-power-bi-project-build-a-retail-data-warehouse-from-scratch-97ec96c4ba3c?source=rss-d1bc041ef9a3------2</link>
            <guid isPermaLink="false">https://medium.com/p/97ec96c4ba3c</guid>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[etl-pipeline]]></category>
            <category><![CDATA[data-engineering]]></category>
            <category><![CDATA[azure-devops]]></category>
            <category><![CDATA[cloud-services]]></category>
            <dc:creator><![CDATA[John Elisa]]></dc:creator>
            <pubDate>Wed, 01 Oct 2025 04:21:43 GMT</pubDate>
            <atom:updated>2025-10-01T04:21:43.567Z</atom:updated>
            <content:encoded><![CDATA[<p>Before starting the tutorial, download these .csv files: <a href="https://drive.google.com/drive/folders/10BKlhBGrjFuPNvgBb5czxNiil5R8xXUy?usp=sharing">https://drive.google.com/drive/folders/10BKlhBGrjFuPNvgBb5czxNiil5R8xXUy?usp=sharing</a></p><p><em>These CSV files are short and simple, but nonetheless will be sufficient for the purposes of the tutorial.</em></p><h3>1. The Story: Why We Are Building This</h3><p>Bobby is an Operations Manager for a growing retail chain called “BobbyMart”. As the business grows, he wants to start making data-driven decisions.</p><h4>1.1 How Retail Data is Commonly Handled</h4><p>A common situation in the business world, especially in sectors like retail, is that it’s rare for a single system to handle everything. Data is almost always separated into at least two categories: <strong>Transactional Data</strong> and <strong>Master Data</strong>.</p><p><strong>Transactional Data</strong> is the high-volume data generated by daily operations. For a retailer, this comes directly from the Point-of-Sale (POS) system.</p><ul><li><strong>Legacy/On-Premise POS Systems:</strong> Commonly, at the end of the business day, the system will automatically generate a file — most often a CSV — containing every single sales transaction from that day.</li><li><strong>Modern/Cloud POS Systems (e.g., Shopify, Square):</strong> These systems often have APIs for real-time data access and cloud storage.</li></ul><p><strong>Master Data</strong> is the descriptive, low-volume data that doesn’t change often. This data is almost always <strong>stored in a relational SQL database</strong> that is part of a larger system.</p><h4>1.2. Why build a data warehouse?</h4><p>You might be thinking: isn’t it possible to answer critical questions like “What are our top 10 selling products?” with a standard SQL database?</p><p>All we would need to do is set up a MySQL server instance, create tables, load the CSV data into the tables, and run a single query to get the answer.</p><p><strong>So, why build a full data warehouse?</strong></p><p>A standard database (like MySQL) is an <strong>OLTP (Online Transaction Processing) system</strong>. It’s designed to be fast and efficient at handling thousands of small transactions per second, but it is not designed to handle huge, complex analytical queries that scan millions of rows.</p><p>A Data Warehouse (like Azure Synapse) is an <strong>OLAP (Online Analytical Processing) system.</strong> It is specifically architected to be incredibly fast at running large, complex analytical queries across massive datasets.</p><p><strong>A simple SQL approach requires manual work every time.</strong> Creating a pipeline to automate the whole process ensures proper data transformation, reliability, and quality.</p><p>When a product category changes, a transactional (OLTP) database usually just overwrites the value. A data warehouse <strong>(OLAP) instead uses Slowly Changing Dimensions (SCDs) to preserve history.</strong> We don’t need to understand all the details of that yet; just knowing it exists is good for now.</p>
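<p>To make that idea concrete, here is a minimal sketch of what an SCD Type 2 change could look like. The table and columns (DimProduct_SCD, EffectiveDate, ExpiryDate, IsCurrent) are hypothetical and are not part of this tutorial’s schema; they only illustrate how a change adds a row instead of overwriting one:</p><pre>-- Hypothetical SCD Type 2 dimension: a category change adds a new row<br>-- instead of overwriting the old one, so history is preserved.<br>CREATE TABLE DimProduct_SCD (<br>    ProductKey      INT IDENTITY(1,1) NOT NULL,<br>    SourceProductID VARCHAR(50) NOT NULL,<br>    Category        VARCHAR(50),<br>    EffectiveDate   DATE NOT NULL, -- when this version became valid<br>    ExpiryDate      DATE,          -- NULL while this version is current<br>    IsCurrent       BIT NOT NULL<br>);<br><br>-- Step 1: close off the old version of the product...<br>UPDATE DimProduct_SCD<br>SET ExpiryDate = &#39;2025-09-30&#39;, IsCurrent = 0<br>WHERE SourceProductID = &#39;PROD-001&#39; AND IsCurrent = 1;<br><br>-- Step 2: ...then insert the new version alongside it.<br>INSERT INTO DimProduct_SCD (SourceProductID, Category, EffectiveDate, ExpiryDate, IsCurrent)<br>VALUES (&#39;PROD-001&#39;, &#39;Home Goods&#39;, &#39;2025-10-01&#39;, NULL, 1);</pre><p>Old sales rows keep pointing at the old version’s key, so historical reports still reflect the category that was true at the time of sale.</p>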
<h3>2. Azure &amp; DevOps Setup</h3><h4>2.1. Create Azure DevOps Project:</h4><p>Go to <a href="http://dev.azure.com">dev.azure.com</a> and click the “Get started with Azure” button. You need to sign in &amp; have an active subscription, which Azure provides a free trial for.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ypt17OsiXQ2mpoCm5TIdVg.png" /></figure><p>You will be redirected to the main Azure portal. Then, click the “Azure DevOps organizations” service.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*17hIGAZNXiOqtppn2Pc1jg.png" /></figure><p>Then, click the “Create new organization” button. An organization is basically a container for all your future projects; name it after yourself or your company.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RGUI61M0RM-Igee6Eq41Ow.png" /></figure><p>We can now create a project. I’ll name mine “Retail-Data-Warehouse”, keeping the visibility as private.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*au4tc6BSyV1TYQyASEwsRg.png" /></figure><p>Navigate to “Repos” and initialize a main branch with a README file.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HUpdqQ21WYd4mIhG55OHUQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xyOKkGhzZejWD0giL5LUqg.png" /></figure><p>Then, create a new folder “sql-scripts” for our .sql files. Git folders can’t be empty, so just add a text file in there for now. Ensure you commit the change.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pZGgTtnXKioJH7z_t2Cl4A.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*EVcGyLrjwIoRzZ_XHsSYTw.png" /></figure><h4><strong>2.2. Create Azure Resource Group:</strong></h4><p>Great! Now that we’ve set up our repo, let’s head back to the Azure portal (portal.azure.com).</p><p>A Resource Group is used for:</p><ul><li><strong>Access control</strong>: Assigning the Engineering team a “Contributor” role and the Finance team a “Reader” role.</li><li><strong>Lifecycle management</strong>: If our project has 10 different Azure services, deleting the entire resource group saves us from having to shut down all of the services manually, one at a time.</li><li><strong>Billing &amp; Cost Management</strong>: In large companies, different departments have to pay for their own cloud usage (e.g., running a report comparing the cost of the resource groups “retail-data-warehouse” vs “webapp-prod”).</li></ul><p>Click on the “Resource groups” option.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*laEH3Ph78IAsaLXp34iKwg.png" /></figure><p>Create a new Azure resource group. Give it a descriptive name like “rg-retail-dw-project” (rg = resource group; dw = data warehouse) and choose your region.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*06XyOdZRnlpAiwclLoxrlg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZdB3pAWY2GFPDjSGj-DBUw.png" /></figure>
<h4>2.3. Create Storage Account (Data Lake):</h4><p>This is a highly scalable cloud storage service, which will be the central landing zone for all our raw, unprocessed data (our .csv files).</p><p>Then, go to your newly created resource group and click “Create”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UxR5wGcqMU3Nmx3rchoEjg.png" /></figure><p>Search for “Storage Account”, choose the one from Microsoft, and click “Create”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YzzpKDZmAZIE1EQXfD_mZg.png" /></figure><p>Name it something unique like “stadatalakebobbymart” (st = storage; adatalake = its purpose; bobbymart = the retail store’s name).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KjeC_fq1zRgFZj6aFDsONQ.png" /></figure><p>Crucially, on the “Advanced” tab, <strong>check “Enable hierarchical namespace”</strong> (we’ll explain later on why this is crucial).</p><p>Then, click on Review + Create.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Pdc0nW6sHOi0EoHF3V-k-A.png" /></figure><p>Once created, go to the resource, navigate to “Containers”, and create a container named raw-data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PQOxCGBG2CviEQnjmQsaCA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*zKHui2ELnd_iLyTsMLmTvg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NaqKHvCjJAx9wJpil0aepQ.png" /></figure><p>Inside the “raw-data” container, upload the three CSV files you downloaded earlier.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wIMlsGszpXPeZFW2UCBtiw.png" /></figure><p>Enabling “hierarchical namespace” is the specific action that transforms a standard Azure Storage Account into an Azure Data Lake Storage Gen2 (ADLS Gen2).</p><p>Without it, when we use the portal to upload the .csv files into a “folder” we named “raw-data”, the system doesn’t create a real folder. It simply creates a blob and gives it the literal name “raw-data/products.csv”.</p><p>This flat layout causes problems for a few reasons:</p><ul><li><strong>Listing Files is Slow:</strong> If we want to get all the files in the raw-data folder, the system can’t just open a directory. It has to scan the name of every single blob in the entire container to find the ones that start with the prefix “raw-data/”.</li><li><strong>Renaming “Folders” is Impossible:</strong> You cannot perform a single operation to rename “raw-data” to “processed-data”. You would have to manually run a script that copies every single one of your files to a new blob with the new name prefix and then deletes the old ones.</li><li><strong>Directory-Level Security Doesn’t Exist:</strong> You can’t apply a security rule like “Grant the sales team access to the raw-data folder.” Your only options are to grant them access to the entire container or to manage permissions for every single file individually.</li></ul><p>Enabling hierarchical namespace is great for scalability and efficiency. 
Although we don’t need it for this particular project, it’s worth knowing about for when we work with big data.</p><h4>2.4: Create Azure Data Factory (ADF):</h4><p>ADF tells other services when and how to perform their tasks.</p><p>In the resource group, click “Create”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*iz7JUs9U4bdjuAZBljF04g.png" /></figure><p>Search for “Data Factory”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*c7Y3GSQ2eCcrPiLIE3nA1A.png" /></figure><p>Name it “adf-retail-dw-project” (dw = data warehouse).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uN9c6Kwu83gGh_6n7HsELg.png" /></figure><p>Click “Review + create”.</p><h4>2.5: Create Azure Synapse Analytics Workspace:</h4><p>This serves as both the final, structured data warehouse and the workbench for querying and analysing that data.</p><p>In the resource group, click “Create”. Search for “Azure Synapse Analytics”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*o0okjtf14OtTnjVoyoXbTg.png" /></figure><p>Name the workspace “synw-retail-dw-project”.</p><p>Create a new Data Lake Storage Gen2 account when prompted (this is for Synapse’s own use); it is separate from the data lake we made earlier, which holds our raw CSV data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/694/1*rYThI7vnfg-XpmP6LGcR4w.png" /></figure><p>During creation, set the SQL administrator password. Remember this password!</p><p>Click “Review + create”. This can take a few minutes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*EIhaerqCq5UP4jCe5ypANg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0m5fkNzT2x-mA8JO60_rfg.png" /></figure><h3>3. Data Modeling — Build the Warehouse Structure</h3><blockquote>Ensure you pause your dedicated SQL pool to save your free credits when you’re not using it. Navigate to your Synapse workspace, go to SQL pools, select your pool, and click Pause.</blockquote><p>First, we need to create a dedicated SQL pool.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*U4FDtkG77VgE5Rs16PdqJg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/933/1*8jF4otmKVG1QRbAFLe7Dag.png" /><figcaption>Ensure the performance level slider is all the way to the left for this project</figcaption></figure><p>Click the “Review + create” button, then “Create”. The deployment will begin; it can take 5–10 minutes for the dedicated SQL pool to be ready.</p><p>In the meantime, let’s take a look at what our .csv files look like:</p><p>products.csv:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/296/1*Vd2IF_v9o0ugCodXpsT_BQ.png" /></figure><p>stores.csv:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/374/1*t0BJEyt2-FHm5eR2blPCjQ.png" /></figure><p>sales_*.csv:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/427/1*_Xvm-139WdyvEVGhxO-rmQ.png" /></figure><p>Let’s reiterate our goal here: transform raw CSV files into a fast, interactive dashboard that enables data-driven decisions. We’ve now finished setting up an empty warehouse, so it’s time to define its structure and blueprint.</p><p>This is a prerequisite for ingesting data. 
We first want to create tables that match our CSV data structure, and then INSERT data into them.</p><p>So, let’s open Synapse Studio:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YmWvSYa5dYfXrOOVsC2AJQ.png" /></figure><p>Click the “Develop” tab, and create 3 SQL scripts.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*587gNSQemncZQoUxfh1Odg.png" /></figure><blockquote>For each of these SQL scripts, ensure you are connected to the dedicated SQL pool you created earlier before running them.</blockquote><p>When you create a Synapse Workspace, you get two different SQL engines (serverless &amp; dedicated). We’ll be using the dedicated SQL pool.</p><p>A <strong>Dedicated SQL Pool</strong> is a traditional, stateful database. You can do everything (CREATE, INSERT, UPDATE, DELETE), and the data is permanently stored inside the pool.</p><p>The <strong>Serverless SQL Pool</strong> is a stateless query engine. It doesn’t store any data itself. Its only job is to run SELECT queries on files that live outside of it, in our data lake. We cannot run UPDATE or DELETE commands because we’re just reading external files, not managing data in a database.</p><h4>SQL Script 1:</h4><p>This script builds the foundational <strong>structure</strong> of our final, optimized data warehouse. It creates the three permanent, internal tables (DimProduct, DimStore, FactSales) that will hold the clean and structured data ready for analysis.</p><pre>-- Drop existing tables<br>IF OBJECT_ID(&#39;dbo.FactSales&#39;, &#39;U&#39;) IS NOT NULL<br>    DROP TABLE dbo.FactSales;<br>GO<br><br>IF OBJECT_ID(&#39;dbo.DimProduct&#39;, &#39;U&#39;) IS NOT NULL<br>    DROP TABLE dbo.DimProduct;<br>GO<br><br>IF OBJECT_ID(&#39;dbo.DimStore&#39;, &#39;U&#39;) IS NOT NULL<br>    DROP TABLE dbo.DimStore;<br>GO<br><br>-- Create tables fresh<br>-- Dimension for Products<br>CREATE TABLE DimProduct (<br>    ProductKey INT IDENTITY(1,1) NOT NULL,<br>    SourceProductID VARCHAR(50) NOT NULL,<br>    ProductName VARCHAR(100) NOT NULL,<br>    Category VARCHAR(50),<br>    CONSTRAINT PK_DimProduct PRIMARY KEY NONCLUSTERED (ProductKey) NOT ENFORCED<br>)<br>WITH (<br>    DISTRIBUTION = REPLICATE,<br>    CLUSTERED COLUMNSTORE INDEX<br>);<br><br>-- Dimension for Stores<br>CREATE TABLE DimStore (<br>    StoreKey INT IDENTITY(1,1) NOT NULL,<br>    SourceStoreID VARCHAR(50) NOT NULL,<br>    StoreName VARCHAR(100) NOT NULL,<br>    State VARCHAR(50),<br>    CONSTRAINT PK_DimStore PRIMARY KEY NONCLUSTERED (StoreKey) NOT ENFORCED<br>)<br>WITH (<br>    DISTRIBUTION = REPLICATE,<br>    CLUSTERED COLUMNSTORE INDEX<br>);<br><br>-- Fact Table for Sales<br>CREATE TABLE FactSales (<br>    SalesKey INT IDENTITY(1,1) NOT NULL,<br>    ProductKey INT NOT NULL,<br>    StoreKey INT NOT NULL,<br>    SaleTimestamp DATETIME,<br>    Quantity INT NOT NULL,<br>    UnitPrice DECIMAL(10, 2) NOT NULL,<br>    TotalSaleAmount DECIMAL(10, 2) NOT NULL,<br>    CONSTRAINT PK_FactSales PRIMARY KEY NONCLUSTERED (SalesKey) NOT ENFORCED<br>)<br>WITH (<br>    DISTRIBUTION = HASH(ProductKey),<br>    CLUSTERED COLUMNSTORE INDEX<br>);</pre><p>To learn more about T-SQL, I recommend watching:</p><ul><li><a href="https://www.youtube.com/watch?v=k9DpO91W76o">https://www.youtube.com/watch?v=k9DpO91W76o</a></li><li><a href="https://www.youtube.com/watch?v=BxAj3bl00-o">https://www.youtube.com/watch?v=BxAj3bl00-o</a></li></ul><p>The naming convention (<em>DimProduct, FactSales, SourceProductID</em>) is a best practice that comes from a data 
warehousing design pattern called a <strong>Star Schema.</strong></p><p>In a star schema, you have two types of tables:</p><ul><li><strong>Dimension Tables (Dim)</strong>: These tables hold descriptive, contextual information about our data. DimProduct describes our products and DimStore describes our stores; their role is to provide context.</li><li><strong>Fact Tables (Fact):</strong> This is the central table that holds the numbers; FactSales stores the sales transactions.</li></ul><h4>SQL Script 2:</h4><p>This script creates a direct connection between Synapse and the raw CSV files stored in our data lake. It defines “external tables” that act as pointers, allowing us to read and query the raw source data using SQL without having to import it first.</p><pre>-- Step 1: Create Database Scoped Credential<br>CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential<br>WITH IDENTITY = &#39;Managed Identity&#39;;<br>GO<br><br>-- Step 2: Create External Data Source<br>CREATE EXTERNAL DATA SOURCE RawDataSource<br>WITH (<br>    TYPE = HADOOP,<br>    LOCATION = &#39;abfss://raw-data@&lt;your-data-lake-name&gt;.dfs.core.windows.net&#39;,<br>    CREDENTIAL = AzureStorageCredential<br>);<br>GO<br><br>-- Step 3: Create External File Format<br>CREATE EXTERNAL FILE FORMAT CsvFormat<br>WITH (<br>    FORMAT_TYPE = DELIMITEDTEXT,<br>    FORMAT_OPTIONS (<br>        FIELD_TERMINATOR = &#39;,&#39;,<br>        STRING_DELIMITER = &#39;&quot;&#39;,<br>        FIRST_ROW = 2,<br>        USE_TYPE_DEFAULT = FALSE<br>    )<br>);<br>GO<br><br>-- Step 4: Create External Tables<br>CREATE EXTERNAL TABLE ext_products (<br>    ProductID VARCHAR(50),<br>    ProductName VARCHAR(100),<br>    Category VARCHAR(50)<br>)<br>WITH (<br>    LOCATION = &#39;/products.csv&#39;,<br>    DATA_SOURCE = RawDataSource,<br>    FILE_FORMAT = CsvFormat<br>);<br>GO<br><br>CREATE EXTERNAL TABLE ext_stores (<br>    StoreID VARCHAR(50),<br>    StoreName VARCHAR(100),<br>    State VARCHAR(50)<br>)<br>WITH (<br>    LOCATION = &#39;/stores.csv&#39;,<br>    DATA_SOURCE = RawDataSource,<br>    FILE_FORMAT = CsvFormat<br>);<br>GO<br><br>CREATE EXTERNAL TABLE ext_sales (<br>    SaleTimestamp DATETIME,<br>    ProductID VARCHAR(50),<br>    StoreID VARCHAR(50),<br>    Quantity INT,<br>    UnitPrice DECIMAL(10, 2)<br>)<br>WITH (<br>    LOCATION = &#39;/sales_2025-09-23.csv&#39;,<br>    DATA_SOURCE = RawDataSource,<br>    FILE_FORMAT = CsvFormat<br>);<br>GO<br></pre><p>Here’s what’s going on:</p><pre>CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential<br>WITH IDENTITY = &#39;Managed Identity&#39;;</pre><p>This step creates a security credential. Instead of using a password, it uses the Managed Identity of your Synapse workspace.</p><pre>CREATE EXTERNAL DATA SOURCE RawDataSource<br>WITH (<br>    TYPE = HADOOP,<br>    LOCATION = &#39;abfss://raw-data@&lt;your-data-lake-name&gt;.dfs.core.windows.net&#39;,<br>    CREDENTIAL = AzureStorageCredential<br>);<br>GO</pre><p>This simply gives a name (RawDataSource) to the physical location of your data lake for easy access.</p><pre>CREATE EXTERNAL FILE FORMAT CsvFormat<br>WITH (<br>    FORMAT_TYPE = DELIMITEDTEXT,<br>    FORMAT_OPTIONS (<br>        FIELD_TERMINATOR = &#39;,&#39;,<br>        STRING_DELIMITER = &#39;&quot;&#39;,<br>        FIRST_ROW = 2,<br>        USE_TYPE_DEFAULT = FALSE<br>    )<br>);<br>GO</pre><p>This tells Synapse how to read the files. 
It’s a set of instructions:</p><ul><li><em>FORMAT_TYPE = DELIMITEDTEXT:</em> The file uses a character to separate values.</li><li><em>FIELD_TERMINATOR = ‘,’:</em> The character used is a comma.</li><li><em>FIRST_ROW = 2:</em> This is important. It tells Synapse to skip the first line (the header row) and start reading the actual data from the second line.</li></ul><pre>CREATE EXTERNAL TABLE ext_products ( ... )<br>WITH ( LOCATION = &#39;/products.csv&#39;, ... );</pre><p>This is the final step where everything comes together.</p><p><em>CREATE EXTERNAL TABLE</em> does not create a real table that stores data inside your database. Instead, it creates a metadata-only object that combines:</p><ul><li><em>The Schema: </em>The column names and data types (ProductID VARCHAR(50), etc.).</li><li><em>The Location:</em> The specific file to point to (/products.csv).</li><li><em>The Address: </em>Where to find that file (using RawDataSource).</li><li><em>The file format:</em> How to read that file (using CsvFormat).</li></ul><p>After running this, you can write a query like<em> “SELECT * FROM ext_products”</em>, and Synapse will, in real-time, go to the data lake, open products.csv, use the rules from CsvFormat to read it, and show you the results as if it were a normal SQL table.</p><h4>SQL Script 3:</h4><p>This script packages the entire data transformation and loading process into a single, repeatable command. It creates a stored procedure that extracts data from the external tables, transforms it by joining tables and calculating new columns, and loads the final, clean results into our permanent warehouse tables.</p><pre>-- Create the procedure<br>CREATE PROCEDURE sp_LoadRetailData<br>AS<br>BEGIN<br>    SET NOCOUNT ON;<br><br>    TRUNCATE TABLE DimProduct;<br>    TRUNCATE TABLE DimStore;<br>    TRUNCATE TABLE FactSales;<br>    <br>    -- Load DimProduct from external table<br>    INSERT INTO DimProduct (SourceProductID, ProductName, Category)<br>    SELECT ProductID, ProductName, Category<br>    FROM ext_products;<br>    <br>    -- Load DimStore from external table<br>    INSERT INTO DimStore (SourceStoreID, StoreName, State)<br>    SELECT StoreID, StoreName, State<br>    FROM ext_stores;<br>    <br>    -- Load FactSales from external table<br>    INSERT INTO FactSales (ProductKey, StoreKey, SaleTimestamp, Quantity, UnitPrice, TotalSaleAmount)<br>    SELECT<br>        dp.ProductKey,<br>        ds.StoreKey,<br>        rs.SaleTimestamp,<br>        rs.Quantity,<br>        rs.UnitPrice,<br>        rs.Quantity * rs.UnitPrice AS TotalSaleAmount<br>    FROM ext_sales rs<br>    JOIN DimProduct dp ON rs.ProductID = dp.SourceProductID<br>    JOIN DimStore ds ON rs.StoreID = ds.SourceStoreID;<br>END;<br>GO</pre><p>Here’s what’s going on:</p><pre>TRUNCATE TABLE DimProduct;<br>TRUNCATE TABLE DimStore;<br>TRUNCATE TABLE FactSales;</pre><p>The first thing it does is completely empty out your three main data warehouse tables. The TRUNCATE command is a very fast way to delete all rows. This ensures that every time the procedure runs, you are starting fresh with the latest data and not mixing it with old data.</p><pre>INSERT INTO DimProduct ... SELECT ... FROM ext_products;<br>INSERT INTO DimStore ... SELECT ... FROM ext_stores;</pre><p>Next, it performs a simple data copy. 
It reads all the rows from our external “pointer” tables (ext_products and ext_stores) and inserts them directly into our permanent dimension tables (DimProduct and DimStore).</p><pre>INSERT INTO FactSales (ProductKey, StoreKey, SaleTimestamp, Quantity, UnitPrice, TotalSaleAmount)<br>    SELECT<br>        dp.ProductKey,<br>        ds.StoreKey,<br>        rs.SaleTimestamp,<br>        rs.Quantity,<br>        rs.UnitPrice,<br>        rs.Quantity * rs.UnitPrice AS TotalSaleAmount<br>    FROM ext_sales rs<br>    JOIN DimProduct dp ON rs.ProductID = dp.SourceProductID<br>    JOIN DimStore ds ON rs.StoreID = ds.SourceStoreID;</pre><p>This final INSERT command is the most important part, as it transforms and loads our main sales data in one powerful step.</p><p>Here’s exactly what it does:</p><ul><li>It combines the data, reading from the raw sales data (ext_sales) and joining it with the DimProduct and DimStore tables we just loaded.</li><li>It swaps the text IDs for number keys. The join looks up and replaces the original, slow text IDs (like ‘PROD-001’ and ‘MEL-01’) with their fast, integer-based ProductKey and StoreKey.</li><li>It calculates a new column on the fly by multiplying the Quantity and UnitPrice to create the TotalSaleAmount.</li><li>It loads the final, clean data — now enriched with the correct keys and the calculated total — into the permanent FactSales table.</li></ul>
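<p>With the stored procedure created, the entire load is now a single command. As a quick sanity check, here is a minimal sketch that runs the load and then answers Bobby’s question from section 1.2 (the exact numbers will depend on the CSVs):</p><pre>-- Run the full load<br>EXEC sp_LoadRetailData;<br><br>-- Sanity check: what are our top 10 selling products?<br>SELECT TOP 10<br>    dp.ProductName,<br>    SUM(fs.TotalSaleAmount) AS TotalRevenue<br>FROM FactSales fs<br>JOIN DimProduct dp ON fs.ProductKey = dp.ProductKey<br>GROUP BY dp.ProductName<br>ORDER BY TotalRevenue DESC;</pre><p>If this returns sensible rows, the warehouse is loaded and ready for a BI tool to connect to.</p>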
<h3>4. Visualization — Build the Dashboard in Power BI</h3><blockquote>Ensure you have Power BI Desktop installed.</blockquote><ol><li>Open Power BI Desktop.</li><li>On the Home ribbon, click Get data.</li><li>In the search box, type Synapse and select Azure Synapse Analytics SQL. Click Connect.</li><li>Find Your Server Name: You need your Synapse SQL endpoint.</li><li>Go to the Azure Portal and navigate to your Synapse workspace.</li><li>On the Overview page, find the Dedicated SQL endpoint. It will look like your-workspace-name.sql.azuresynapse.net. Copy this value. Paste the endpoint URL into the Server box in Power BI.</li><li>Set Data Connectivity mode to DirectQuery. This means Power BI will query the database live instead of importing the data. Click OK.</li><li>Ensure you’ve signed in.</li><li>Select Tables: A “Navigator” window will appear, showing your database. Expand your database.</li></ol><p><strong>Build Visual 1: Total Sales Card</strong>:</p><p>Check the boxes next to your three tables: DimProduct, DimStore, and FactSales. Click Load.</p><p>In the Visualizations pane on the right, click the Card visual (looks like 123). A blank card will appear on your canvas.</p><p>In the Data pane (far right), expand FactSales and drag the TotalSaleAmount field onto the “Fields” well of the card visual. You’ll now see the grand total of all sales.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*hufIrEc0rDXIbxaI0hTgLw.png" /></figure><p><strong>Build Visual 2: Sales by State Bar Chart</strong>:</p><p>Click on a blank area of the canvas. In the Visualizations pane, click the Stacked bar chart icon.</p><p>Drag fields from the Data pane: From DimStore, drag State to the Y-axis. From FactSales, drag TotalSaleAmount to the X-axis.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*Z0f5GVBzFXHYWWu6WoHqIA.png" /></figure><p>Hopefully, this tutorial helped provide a foundation for Azure data warehousing. There are many more things to explore and experiment with; this is just the tip of the iceberg in the world of data analytics!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[7 Essential Advanced SQL Techniques Explained Simply]]></title>
            <link>https://medium.com/@johnelisaaa/7-essential-advanced-sql-techniques-explained-simply-1abbfd1e1823?source=rss-d1bc041ef9a3------2</link>
            <guid isPermaLink="false">https://medium.com/p/1abbfd1e1823</guid>
            <category><![CDATA[database]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[data-analyst]]></category>
            <category><![CDATA[data-engineering]]></category>
            <category><![CDATA[sql]]></category>
            <dc:creator><![CDATA[John Elisa]]></dc:creator>
            <pubDate>Wed, 09 Jul 2025 06:43:53 GMT</pubDate>
            <atom:updated>2025-07-09T06:43:53.558Z</atom:updated>
            <content:encoded><![CDATA[<p>Let’s break down some powerful SQL tricks for solving real-world problems. We’ll use simple examples and step-by-step queries to see how they work, focusing on PostgreSQL.</p><h3><em>1. ROW_NUMBER()</em></h3><p>We have an orders table with <em>order_id, customer_id, order_date</em>, and other fields.</p><pre>CREATE TABLE orders (<br>    order_id SERIAL PRIMARY KEY,<br>    customer_id INT,<br>    order_date DATE,<br>    amount NUMERIC<br>);<br><br>INSERT INTO orders (customer_id, order_date, amount) VALUES<br>(1, &#39;2021-01-01&#39;, 50),<br>(1, &#39;2021-01-05&#39;, 100),<br>(1, &#39;2021-01-03&#39;, 70),<br>(2, &#39;2021-02-01&#39;, 200),<br>(2, &#39;2021-01-15&#39;, 120),<br>(3, &#39;2021-01-20&#39;, 60),<br>(3, &#39;2021-02-02&#39;, 40);</pre><p>Your boss tells you to find the most recent order — well, that’s simple, right?</p><pre>SELECT * FROM orders<br>ORDER BY order_date DESC;</pre><p>But then, your boss wants the most recent order <strong>for each customer</strong> (including its details). How can we turn this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*c7V4QcYmVgXgqo28oLCZLA.png" /></figure><p>Into this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Rtemx-6cDy1zbKVTZ2CzMg.png" /></figure><p>A naive approach might use aggregation to get the max date per customer, but then <strong>we still need the order details</strong>.</p><p><strong>Approach 1:</strong></p><pre>SELECT customer_id, MAX(order_date) AS recent_date<br>FROM orders<br>GROUP BY customer_id;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/581/1*BztNrQvZaNK2YjujZAv1-Q.png" /></figure><p><strong>Approach 2:</strong></p><p>One workaround is to join the table from Approach 1 back to the orders table:</p><pre>SELECT o.customer_id, o.order_id, o.order_date, o.amount<br>FROM orders o<br>-- join the table from approach 1 with original orders table<br>JOIN (<br>    SELECT customer_id, MAX(order_date) AS recent_date<br>    FROM orders<br>    GROUP BY customer_id<br>) AS m<br>-- match based on customer id and date<br>  ON o.customer_id = m.customer_id<br>  AND o.order_date = m.recent_date<br>ORDER BY o.customer_id;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RqClf7OpfIL7c5sxGYFQYg.png" /></figure><p><strong>Limitation:</strong> What if two orders have the same MAX(order_date)? Then both rows would show up in the result — so you’d get 2 orders for the same customer_id.</p><h4>☝️🤓 A Better Way: Use ROW_NUMBER() Instead</h4><p>SQL has a window function called ROW_NUMBER() that lets you assign a rank to each row, within groups. Let’s walk through how to use it step by step.</p><p><strong>Step 1: Add Row Numbers (No Grouping Yet)</strong></p><pre>SELECT <br>  order_id, <br>  customer_id, <br>  order_date,<br>  ROW_NUMBER() OVER() AS rn<br>  <br>FROM orders</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IQjjhly9rrn3rhpGPE6Wiw.png" /><figcaption>Assigns a unique row number to each row</figcaption></figure><p><strong>Step 2: Row Numbers Per Customer (Still No Order)</strong></p><pre>SELECT <br>  order_id, <br>  customer_id, <br>  order_date,<br>  ROW_NUMBER() OVER(PARTITION BY customer_id) AS rn<br>  <br>FROM orders</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cQVLiJ1nu5W8-L-5NmRwvQ.png" /></figure><p>This resets the row number for each customer. 
But the order of rows within each customer is still random unless we tell SQL how to sort them.</p><p><strong>Step 3: Add Sorting — Most Recent Order First</strong></p><pre>SELECT <br>  order_id, <br>  customer_id, <br>  order_date,<br>  ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC) AS rn<br>  <br>FROM orders</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*S9tpVcH2_7BsNZxOlSJZKQ.png" /></figure><p>Now we’re telling SQL:</p><ol><li>Group by customer</li><li>Sort each customer’s orders by date descending</li><li>Then number the rows</li></ol><p>That means the most recent order for each customer is always rn = 1.</p><p><strong>Step 4: Keep Only the Most Recent Order Per Customer</strong></p><p>Now we wrap the above in a subquery and filter for <em>rn = 1</em>:</p><pre>SELECT order_id, customer_id, order_date, amount<br>FROM (<br>-- Move step three into a subquery<br>  SELECT <br>    order_id, <br>    customer_id, <br>    order_date,<br>    amount,<br>    ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC) AS rn<br>  <br>  FROM orders<br>) as ranked<br>-- filter for rn = 1<br>WHERE rn = 1;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Fm00KxzqSWJefwwfJDMKEQ.png" /><figcaption>This gives you exactly one row per customer, even if multiple orders had the same date.</figcaption></figure><p>As an alternative in PostgreSQL, you could use DISTINCT ON:</p><pre>SELECT DISTINCT ON (customer_id) *<br>FROM orders<br>ORDER BY customer_id, order_date DESC;</pre><p><em>DISTINCT ON (customer_id) *</em> says:<br>→ “Only return one row for each customer. The asterisk is to include all columns from that row”</p><p><em>ORDER BY customer_id, order_date DESC</em> ensures:<br>→ “Pick the row with the latest date per customer.”</p><blockquote><strong>This only works in PostgreSQL</strong> — other databases like MySQL, SQL Server, or SQLite don’t support it.</blockquote><blockquote>If you’re using PostgreSQL, and you want one row per group based on some ordering, then DISTINCT ON is a powerful and clean shortcut.</blockquote><blockquote>However, if you want to stay database-portable, or need more flexibility (like top 3 orders per customer), go with ROW_NUMBER().</blockquote>
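<p>For instance, here’s that “top 3 orders per customer” variant as a minimal sketch. The ranking logic is identical to Step 4; only the filter changes:</p><pre>-- Top 3 most recent orders per customer: same ranking, looser filter<br>SELECT order_id, customer_id, order_date, amount<br>FROM (<br>  SELECT<br>    order_id,<br>    customer_id,<br>    order_date,<br>    amount,<br>    ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC) AS rn<br>  FROM orders<br>) AS ranked<br>WHERE rn &lt;= 3;</pre>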
<h3>2. RANK() or DENSE_RANK()</h3><p>Let’s say you work with a sales table that tracks how much each product sold, and you want to figure out the best-selling products in each category.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/913/1*eDuu0ne61_Ah2ybWZIvHuw.png" /><figcaption>You can see there are ties — multiple products with the same sales in a category.</figcaption></figure><p>A simple way to get “the top seller per category” might be using something like:</p><pre>SELECT <br>  category, <br>  MAX(sales) <br>FROM sales_data <br>GROUP BY category;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/785/1*HqUWbXGSHTq6WKsjjhXhog.png" /></figure><p>But that just gives you the number — not which product actually sold the most, or what the full ranking looks like. That’s where ranking functions come in handy.</p><p>We want to rank the products by sales, within each category.</p><pre>SELECT<br>  category,<br>  product,<br>  sales,<br>  RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS rank,<br>  DENSE_RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS dense_rank<br>FROM sales_data;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/929/1*vP8kHaAOHDvBjsE0jS5p_g.png" /></figure><p><strong>What’s the difference?</strong></p><p>Let’s look at Gizmos. WidgetD and WidgetE both sold 150 — a tie!</p><p>RANK() gives them both 1, and then jumps to 3 for the next one. So it skips the number 2. DENSE_RANK() also gives them both 1, but the next rank is just 2 — no gap.</p><blockquote>- Use RANK() if you want to reflect actual placement like in sports. (Ties push the next item down.)</blockquote><blockquote>- Use DENSE_RANK() if you want consecutive numbers with no gaps — easier for filtering top N items.</blockquote><h3>3. LAG() and LEAD()</h3><p>Imagine you’re looking at a table of daily sales:</p><pre>CREATE TABLE daily_sales (<br>  sale_date DATE PRIMARY KEY,<br>  total_sales INTEGER<br>);<br><br>INSERT INTO daily_sales (sale_date, total_sales) VALUES<br>  (&#39;2021-01-01&#39;, 1000),<br>  (&#39;2021-01-02&#39;, 1200),<br>  (&#39;2021-01-03&#39;, 900),<br>  (&#39;2021-01-04&#39;, 1500);</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/937/1*_o5OOJsk279Uh1zJFEftKw.png" /></figure><p>We want to know:</p><ol><li>What were today’s sales?</li><li>What were yesterday’s sales?</li><li>How much did sales go up or down compared to yesterday?</li></ol><h4>LAG() lets you peek at the value from the previous row.</h4><pre>SELECT<br>  sale_date,<br>  total_sales,<br>  LAG(total_sales) OVER (ORDER BY sale_date) AS prev_day_sales<br>FROM daily_sales<br>ORDER BY sale_date;</pre><p><em>LAG(column_name)</em>: Give me the value of column_name from the previous row.</p><p>But how does SQL know what “previous” means? That’s where the OVER clause comes in:</p><p><em>OVER (ORDER BY sale_date)</em>: Order the data by sale_date, then look one row backward from the current row.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/930/1*k3rU7onT_YpqOtyBhIA-kg.png" /><figcaption><strong>Notice how the first row has no previous row, so the result is NULL.</strong></figcaption></figure><h4>LEAD(): Look at the Next Row</h4><p>LEAD() does the opposite — it looks ahead to the next row.</p><pre>SELECT<br>  sale_date,<br>  total_sales,<br>  LEAD(total_sales) OVER (ORDER BY sale_date) AS next_day_sales<br>FROM daily_sales;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/952/1*BdcHqQchYFHB9TiMcEe81g.png" /><figcaption><strong>Again, the last row doesn’t have a “next day,” so it returns NULL.</strong></figcaption></figure><h4>Want to See Sales Differences?</h4><pre>SELECT<br>  sale_date,<br>  total_sales AS sales_today,<br>  LAG(total_sales) OVER (ORDER BY sale_date) AS sales_yesterday,<br>  total_sales - LAG(total_sales) OVER (ORDER BY sale_date) AS diff<br>FROM daily_sales<br>ORDER BY sale_date;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UH40u5DTADv2YyNtBM5o-Q.png" /></figure>
<h3>4. CASE</h3><p>Let’s say you’re working with an orders table that has a column called amount, and you want to label each order as either “small”, “medium”, or “large”, depending on how much was spent.</p><pre>-- Create the orders table<br>CREATE TABLE orders (<br>    order_id SERIAL PRIMARY KEY,<br>    amount NUMERIC<br>);<br><br>-- Insert sample data<br>INSERT INTO orders (amount)<br>VALUES<br>    (25),     -- small<br>    (49.99),  -- small<br>    (50),     -- medium<br>    (120),    -- medium<br>    (199.99), -- medium<br>    (200),    -- large<br>    (350);    -- large</pre><p>Think of CASE like an if/else in code. It lets you check conditions row by row and return a value depending on which condition matches.</p><pre>SELECT order_id,<br>       amount,<br>       CASE <br>         WHEN amount &lt; 50 THEN &#39;small&#39;<br>         WHEN amount &lt; 200 THEN &#39;medium&#39;<br>         ELSE &#39;large&#39;<br>       END AS size_category<br>FROM orders;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/753/1*gvDWhSLrwwJv__Qc7H54oQ.png" /><figcaption><strong>This creates a new column called size_category that classifies each order. Note: The conditions are checked in order, and SQL stops at the first one that matches.</strong></figcaption></figure><p>If you don’t include an ELSE, SQL will just return NULL for rows that don’t meet any condition. So it’s a good habit to always add an ELSE just in case.</p><p>Another Example: Grouping Customers by Region</p><pre>SELECT customer_id, country,<br>       CASE<br>         WHEN country IN (&#39;US&#39;, &#39;Canada&#39;) THEN &#39;North America&#39;<br>         WHEN country = &#39;Mexico&#39; THEN &#39;Central America&#39;<br>         ELSE &#39;Other&#39;<br>       END AS region<br>FROM customers;</pre><p>Now you’ve got a nice region column that tells you where each customer is from, based on their country.</p><p>Remember that:</p><ol><li>Every CASE must end with an END.</li><li>Always list more specific conditions first. If a general one comes first, it can “hide” the others.</li><li>No ELSE means unmatched rows will be NULL.</li></ol>
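<p>CASE also pairs nicely with aggregates. As a minimal sketch using the same orders table, you can count every size bucket in a single pass over the data (a pattern often called conditional aggregation):</p><pre>-- Count each size bucket in one scan of the table<br>SELECT<br>  SUM(CASE WHEN amount &lt; 50 THEN 1 ELSE 0 END) AS small_orders,<br>  SUM(CASE WHEN amount &gt;= 50 AND amount &lt; 200 THEN 1 ELSE 0 END) AS medium_orders,<br>  SUM(CASE WHEN amount &gt;= 200 THEN 1 ELSE 0 END) AS large_orders<br>FROM orders;</pre><p>With the sample data above, this returns 2, 3, and 2.</p>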
<h3>5. Common Table Expressions (CTEs)</h3><p>Imagine you’re working with a sales table that tracks region, sale_date, and amount.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1008/1*_tqysVJpbEyqdAcYoUwFNA.png" /></figure><p>Your boss asks:</p><blockquote>“Can you tell me the average monthly sales per region over the past year?” (Assume today is 09/07/2025.)</blockquote><p>This might sound simple, but under the hood, it requires multiple steps:</p><ol><li>First, you need to total up the sales per month and region.</li><li>Then, calculate the average of those monthly totals for each region.</li></ol><p>Instead of cramming all that logic into one messy query, we can use a CTE to break it down.</p><p>A CTE (Common Table Expression) is like a named temporary table you define at the top of your query using WITH: a temporary result set holding a subquery’s output that the rest of the query can reuse.</p><pre>-- Step 1: Build a CTE to calculate monthly sales<br>WITH monthly_sales AS (<br>  SELECT<br>    region,<br>    DATE_TRUNC(&#39;month&#39;, sale_date) AS month,<br>    SUM(amount) AS month_total<br>  FROM sales<br>  WHERE sale_date &gt;= (CURRENT_DATE - INTERVAL &#39;1 year&#39;)<br>  GROUP BY region, DATE_TRUNC(&#39;month&#39;, sale_date)<br>)<br><br>-- Step 2: Use that result to calculate the average monthly sales<br>SELECT<br>  region,<br>  AVG(month_total) AS avg_monthly_sales<br>FROM monthly_sales<br>GROUP BY region;</pre><p><em>DATE_TRUNC(‘month’, sale_date):</em> This trims the sale_date down to just the first day of that month.</p><p>For example:</p><ul><li>‘20-03-2025’ becomes ‘01-03-2025’</li><li>‘15-08-2024’ becomes ‘01-08-2024’</li></ul><p>This makes it easier to group everything that happened in the same month.</p>
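<p>You can also chain multiple CTEs with commas, each one building on the last. Here’s a minimal sketch extending the query above to keep only regions whose average monthly sales exceed 10,000 (an arbitrary threshold for illustration):</p><pre>WITH monthly_sales AS (<br>  SELECT<br>    region,<br>    DATE_TRUNC(&#39;month&#39;, sale_date) AS month,<br>    SUM(amount) AS month_total<br>  FROM sales<br>  WHERE sale_date &gt;= (CURRENT_DATE - INTERVAL &#39;1 year&#39;)<br>  GROUP BY region, DATE_TRUNC(&#39;month&#39;, sale_date)<br>),<br>-- A second CTE can reference the first one<br>region_averages AS (<br>  SELECT region, AVG(month_total) AS avg_monthly_sales<br>  FROM monthly_sales<br>  GROUP BY region<br>)<br>SELECT region, avg_monthly_sales<br>FROM region_averages<br>WHERE avg_monthly_sales &gt; 10000;</pre>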
<h3>6. Cumulative and Running Totals with Window Aggregates</h3><p>Let’s say we have a table that tracks daily sales in different countries. It looks like this:</p><pre>CREATE TABLE sales (<br>  country TEXT,<br>  sale_date DATE,<br>  sales_amount INTEGER<br>);<br><br>INSERT INTO sales (country, sale_date, sales_amount) VALUES<br>(&#39;US&#39;, &#39;2021-01-01&#39;, 500),<br>(&#39;US&#39;, &#39;2021-01-02&#39;, 300),<br>(&#39;FR&#39;, &#39;2021-01-01&#39;, 200),<br>(&#39;US&#39;, &#39;2021-01-03&#39;, 700),<br>(&#39;FR&#39;, &#39;2021-01-02&#39;, 400);</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/949/1*4J2Cb-bXjzn5YLh1rmeB_Q.png" /></figure><p>We want to answer two questions:</p><p><strong>1. What’s the running total of all sales, day by day?</strong></p><p>For each day, add up all the sales from that day and every day before it.</p><pre>SELECT<br>  sale_date,<br>  SUM(sales_amount) OVER (ORDER BY sale_date) AS running_total<br>FROM sales;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/857/1*c5MyNsqceIPQjwO5HwZA2g.png" /></figure><p><strong>2. What’s the running total per country, day by day?</strong></p><p>For each country, show the total sales so far for that country, ordered by date.</p><pre>SELECT<br>  country,<br>  sale_date,<br>  SUM(sales_amount) OVER (<br>    PARTITION BY country<br>    ORDER BY sale_date<br>  ) AS country_running_total<br>FROM sales<br>ORDER BY country, sale_date;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/884/1*GlQGw-TiJgNawKQT4BDpkA.png" /></figure><p><em>PARTITION BY country</em>: This resets the running total for each country. So now, <em>US</em> sales are summed separately from <em>FR</em> sales.</p><h3>7. GROUP BY ROLLUP</h3><p>ROLLUP tells SQL: “<em>Please group by all combinations of these columns, plus their subtotals and the grand total.</em>”</p><p>Let’s say we have a table that tracks how much was sold each month of each year. Like this:</p><pre>CREATE TABLE sales (<br>  year INTEGER,<br>  month INTEGER,<br>  amount INTEGER<br>);<br><br>INSERT INTO sales (year, month, amount) VALUES<br>(2020, 1, 1000),<br>(2020, 2, 1500),<br>(2021, 1, 1200),<br>(2021, 2, 1800);</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/780/1*pP3EHD8aWMnKG2Kg1M9Zxg.png" /></figure><p>Now we want a sales report that shows:</p><ol><li>Monthly sales</li><li>Yearly sales totals</li><li>A final row showing grand total sales across all years and months</li></ol><pre>SELECT <br>  year, <br>  month, <br>  SUM(amount) AS total_sales<br>FROM sales<br>GROUP BY ROLLUP (year, month)<br>ORDER BY year, month;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/803/1*qDe5LxaU1QOMZXM-FnrzcA.png" /><figcaption><strong>ROLLUP walks down the hierarchy — first year/month, then just year, then nothing (the grand total: 2500 + 3000).</strong></figcaption></figure><p>What this does:</p><ol><li>Groups by both year and month to get monthly totals</li><li>Adds subtotal rows per year (month = NULL)</li><li>Adds a grand total row (year = NULL and month = NULL)</li></ol><p>Those NULLs aren’t real missing data. They’re markers saying “this row is a subtotal or the grand total”. But seeing NULL in a report can be confusing.</p><p>We can use COALESCE() to fix that:</p><p>COALESCE(value1, value2, …, valueN) returns the first non-NULL value from the list.</p><p>Example:</p><pre>SELECT COALESCE(NULL, NULL, &#39;hello&#39;); -- returns &#39;hello&#39;<br>SELECT COALESCE(NULL, 0, 99);         -- returns 0</pre><p>So, if we find a NULL in the year column, we change it to the string “ALL YEARS”, and we apply similar logic to month as well!</p><pre>SELECT <br>  COALESCE(CAST(year AS TEXT), &#39;ALL YEARS&#39;) AS year_label,<br>  COALESCE(CAST(month AS TEXT), &#39;TOTAL&#39;) AS month_label,<br>  SUM(amount) AS total_sales<br>FROM sales<br>GROUP BY ROLLUP(year, month)<br>ORDER BY year, month;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/864/1*sJX5TM9Z3wT1ymBeNT3PlA.png" /></figure><p>Because year is an INTEGER and ‘ALL YEARS’ is TEXT (a string), and in SQL all arguments passed to COALESCE() must be of the same data type, we do <em>CAST(year AS TEXT)</em>.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[SQL Survival Guide]]></title>
            <link>https://medium.com/@johnelisaaa/sql-survival-guide-9cdfcb738809?source=rss-d1bc041ef9a3------2</link>
            <guid isPermaLink="false">https://medium.com/p/9cdfcb738809</guid>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[business-analysis]]></category>
            <category><![CDATA[database]]></category>
            <category><![CDATA[data-engineer]]></category>
            <category><![CDATA[developer]]></category>
            <dc:creator><![CDATA[John Elisa]]></dc:creator>
            <pubDate>Thu, 03 Jul 2025 07:07:34 GMT</pubDate>
            <atom:updated>2025-07-03T07:07:34.881Z</atom:updated>
            <content:encoded><![CDATA[<p>Back when I was a student taking the Databases unit at Monash University, I remember coming back from mid-sem break with a quiz looming and just wishing I had a no-nonsense recap of everything I needed to know to ace it.</p><p>Are you in the same boat? Maybe you’re cramming for a uni exam. Or maybe you tweaked your resume a little and now you’re scrambling to survive that SQL online assessment. Or maybe you’re just genuinely curious and want to learn SQL the right way.</p><p>Either way — you’re in the right spot.</p><p>In this guide, we’ll walk through the SQL concepts that actually matter, from the academic basics to how things are used in real companies and production environments. Whether you’re prepping for exams or job interviews, this is your one-stop survival kit. Let’s dive in.</p><h3>1. What is SQL, and Why Use It?</h3><p>SQL (Structured Query Language) is how we talk to databases — specifically, relational databases that store information in rows and columns, kind of like a super-powered spreadsheet. With SQL, you can grab the data you need, add new entries, update stuff, or delete things you no longer want. It also lets you define how different pieces of data relate to each other — like linking a user to all the orders they’ve made.</p><p><strong>If your data is structured and predictable </strong>— like a list of users, products, bookings, messages — <strong>SQL is usually the best tool for the job.</strong> It’s great when you need to filter, sort, join, or analyze data, especially when relationships between data matter.</p><p>When you’re first learning about databases, you’ll probably hear people say things like “MongoDB is better for unstructured data,” or “Firebase is good for real-time apps”, but what does that actually mean? Why wouldn’t you just store everything in SQL and call it a day?</p><p><strong>MongoDB is your go-to when your data keeps changing shape.</strong> Say you’re building a marketplace where sellers customize their profiles differently — one adds social links, another lists store hours, and a third includes multiple locations. SQL would force you to constantly redesign your tables. MongoDB just stores everything as flexible documents, so each seller can have completely different fields without breaking anything.</p><p><strong>Firebase shines when you need live updates.</strong> Think group chats where messages appear instantly, or collaborative docs where you see others typing in real-time. Firebase handles this automatically without any complex setup.</p><blockquote>Each tool has its strengths. It’s less about which is better overall and more about which is right for the job.</blockquote><h3>2. Setting up your Database and Tables</h3><p>You can try out code from this tutorial with an online SQL sandbox: <a href="https://sqlplayground.app/">https://sqlplayground.app/</a></p><p><strong>Imagine you’re asked to build a simple employee directory for a company with multiple departments.</strong></p><h4>2.1 We need a table to store our department names.</h4><blockquote>But how do we guarantee that we can uniquely identify each department, even if two have similar names?</blockquote><p>In SQL, we use something called a <strong>Primary Key. </strong>A primary key is a column in a table whose values uniquely identify each row. It has two simple rules: it can’t be empty (NULL), and it must be unique. 
Think of it like a Social Security Number or a student ID.</p><pre>CREATE TABLE departments (<br>    dept_id   INTEGER PRIMARY KEY,  -- A unique ID for each department (1, 2, 3, ...)<br>    name      TEXT                  -- Name of department (e.g., &quot;Sales&quot;)<br>);</pre><p>We told SQL we want a new table named departments. Then we created a column named dept_id, specified its data type as INTEGER (a whole number), and designated it as the PRIMARY KEY. We also added a column for the department’s name, which will hold text.</p><h4>2.2 We need a table to store employee data.</h4><p>This table will also need its own primary key (emp_id) to uniquely identify each employee.</p><blockquote>How do we specify which department an employee belongs to?</blockquote><p>We could add a text column and type “Sales” or “Engineering.” But that leads to potential typos and errors. A much better way is to refer to the ID from our departments table. This creates a reliable link.</p><p>This link is called a <strong>Foreign Key.</strong> A foreign key is a column in one table that refers to the primary key of another table. It’s the “bridge” that connects our data. It enforces a rule: you can only put a department ID in the employees table if that ID already exists in the departments table. This ensures you can’t assign an employee to a non-existent department!</p><pre>CREATE TABLE employees (<br>    emp_id    INTEGER PRIMARY KEY, -- A unique ID for each employee<br>    name      TEXT,                -- The employee&#39;s name<br>    salary    INTEGER,             -- Their salary<br>    dept_id   INTEGER,             -- This will link to the departments table<br><br>    -- Now, we define the relationship<br>    FOREIGN KEY (dept_id) REFERENCES departments(dept_id)<br>);<br><br>-- We&#39;re telling SQL that the dept_id column <br>-- in this table is a foreign key that points <br>-- directly to the dept_id column in the departments table.</pre><h4>2.3 Adding data to our tables</h4><p>Let’s populate our departments and employees tables.</p><pre>-- Insert some departments<br>INSERT INTO <br>  departments (dept_id, name) <br>VALUES<br>  (1, &#39;Sales&#39;),<br>  (2, &#39;Engineering&#39;),<br>  (3, &#39;HR&#39;),<br>  (4, &#39;Marketing&#39;),<br>  (5, &#39;Legal&#39;);</pre><pre>-- Insert some employees<br>INSERT INTO <br>  employees (emp_id, name, dept_id, salary) <br>VALUES<br>  (1, &#39;Alice&#39;,   2, 80000),<br>  (2, &#39;Bob&#39;,     2, 60000),<br>  (3, &#39;Charlie&#39;, 1, 70000),<br>  (4, &#39;Diana&#39;,   3, 50000),<br>  (5, &#39;Ethan&#39;,   1, 40000),<br>  (6, &#39;Fiona&#39;,   4, 75000);</pre><p>Notice how the dept_id for each employee corresponds to one of the IDs we just created. For example, Alice is in department 2, which is “Engineering”. We now have a small, functional database!</p><p>Notice we didn’t add any employees to the “Legal” department (dept_id 5). This will be useful for examples later on.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1005/1*b2b3jg7oZ8tJ0KqSu8nxZw.png" /></figure>
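<p>Here’s a quick way to see the foreign key doing its job. “Grace” and department 99 are made up for this example; since no department has dept_id 99, the database should reject the insert (the exact error wording varies by engine, and some sandboxes don’t enforce foreign keys by default):</p><pre>-- This INSERT should fail: no department has dept_id = 99<br>INSERT INTO employees (emp_id, name, dept_id, salary)<br>VALUES (7, &#39;Grace&#39;, 99, 55000);<br>-- ERROR: foreign key constraint violated (wording varies by database)</pre>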
<h3>3. Basic Data Retrieval with SELECT</h3><p>The most common thing you’ll do in SQL is ask for data. The command for this is SELECT.</p><h4><strong>3.1 Getting All Data from a Table</strong></h4><p>Let’s start by looking at everything in our employees table. The <strong>*</strong> symbol is a wildcard that means “all columns.”</p><pre>SELECT * FROM employees;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qEflIBla02ARSLWHvMwpqw.png" /><figcaption><strong>This gives us a full table view of all our employee records, just as we entered them.</strong></figcaption></figure><h4>3.2 Selecting Specific Columns</h4><pre>SELECT name, salary FROM employees;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/871/1*y7NbEdITUKX_n5-1VpIDlA.png" /><figcaption><strong>This is much cleaner and gives you just the data you asked for.</strong></figcaption></figure><h4>3.3 Creating Calculations and Renaming Columns with “AS”</h4><p>You can also perform calculations directly in your SELECT statement!</p><p>Let’s say the salary column stores the monthly salary, and you want to see the annual salary. You can do the math right there. But a column header like <em>salary * 12</em> is ugly. We can give it a more readable, temporary name using an alias with the <em>AS</em> keyword.</p><pre>SELECT<br>    name,<br>    salary * 12 AS annual_salary<br>FROM employees;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/685/1*eOZ96VReT8mwwnO98QnUzA.png" /><figcaption><strong>The output will now have a clean column titled annual_salary.</strong></figcaption></figure><h4>3.4 Finding Unique Values with <em>DISTINCT</em></h4><p>What if you wanted to make sure all departments have at least 1 employee? If you just select <em>dept_id</em> from the employees table, you’ll get duplicates (e.g., 2, 2, 1, 3, 1, 4). To see only the unique department IDs, use <em>SELECT DISTINCT</em>.</p><pre>SELECT DISTINCT dept_id FROM employees;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/331/1*8FzX7I5ltNSGKMU1-ZzZjg.png" /><figcaption><strong>We find out that the “Legal” department (dept_id=5) has no employees!</strong></figcaption></figure><h4>3.5 Filtering Data with “WHERE”</h4><p>Getting all your data is great, but the real power of SQL comes from filtering it to find exactly what you need. This is done with the <em>WHERE </em>clause.</p><p><strong>3.5.1 Basic Filtering</strong></p><p>Let’s find all the employees who work in the “Engineering” department (dept_id = 2).</p><pre>SELECT name, dept_id FROM employees<br>WHERE dept_id = 2;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/802/1*vBXxr_qybGkaHFGjVNUTTA.png" /><figcaption><strong>Note: SELECT name FROM employees also works, and will only display the names instead</strong></figcaption></figure><p><strong>3.5.2 Combining Filters with AND and OR</strong></p><p>Let’s find employees who are in the Engineering department (dept_id = 2) <strong>AND </strong>earn more than $60,000.</p><pre>SELECT name, salary, dept_id FROM employees<br>WHERE dept_id = 2 AND salary &gt; 60000;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3ofigOWI3VrI-z0IAGwSrg.png" /><figcaption><strong>Only Alice meets both criteria.</strong></figcaption></figure><p>Now, let’s find employees who are in the Sales department (dept_id = 1) <strong>OR </strong>whose name starts with ‘D’.</p><pre>SELECT name, dept_id FROM employees<br>WHERE (dept_id = 1) OR (name LIKE &#39;D%&#39;);</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/925/1*uCbR5-4RQFTlxRo5E6swCw.png" /><figcaption><strong>This will give you Charlie and Ethan (from Sales) and Diana (whose name starts with ‘D’). 
The LIKE ‘D%’ clause matches any name that starts with the letter “D”, where % means “any sequence of characters” after “D”.</strong></figcaption></figure><blockquote><strong>Pro Tip:</strong> When you mix AND and OR, <strong>AND is processed first</strong>. It’s a good habit to use parentheses () to make your logic clear, like WHERE (dept_id = 1 AND salary &gt; 50000) OR dept_id = 3;.</blockquote><p><strong>3.5.3 Other Useful Filtering Tools</strong></p><p><strong>IN:</strong> A shorthand for multiple OR conditions. To find employees in either the Sales (1) or HR (3) departments:</p><pre>SELECT name, dept_id FROM employees<br>WHERE dept_id IN (1, 3);</pre><p><strong>BETWEEN: </strong>For checking a range (inclusive). To find employees earning between $50,000 and $80,000:</p><pre>SELECT name, salary FROM employees<br>WHERE salary BETWEEN 50000 AND 80000;</pre><p><strong>LIKE:</strong> For finding patterns in text. The % wildcard matches any number of characters.</p><ul><li>Find names starting with ‘A’: <em>WHERE name LIKE ‘A%’</em> (matches Alice).</li><li>Find names ending in ‘a’: <em>WHERE name LIKE ‘%a’</em> (matches Diana and Fiona).</li></ul><h3>4. Sorting and Limiting Your Results</h3><p>By default, your database doesn’t guarantee the order of the results. To get a sorted list, you need to be explicit using <em>ORDER BY.</em></p><ul><li><strong><em>ASC</em></strong>: Ascending order (A-Z, lowest to highest). This is the default.</li><li><strong><em>DESC</em></strong>: Descending order (Z-A, highest to lowest).</li></ul><p>Let’s get a list of employees sorted by their salary, from highest to lowest.</p><pre>SELECT name, salary FROM employees<br>ORDER BY salary DESC;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/892/1*5FGYPawMdJuYTRRlP5RPcw.png" /></figure><p>What if you only want the top 3 highest-paid employees? You can combine ORDER BY with LIMIT.</p><pre>SELECT name, salary FROM employees<br>ORDER BY salary DESC<br>LIMIT 3;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/909/1*h_6ijvMUU0-qGC8IUs2kCg.png" /></figure><h3>5. Summarizing Your Data (and Grouping)</h3><p>Before diving into multiple tables, let’s look at another powerful aspect: aggregate functions and grouping. The most common are:</p><ul><li><strong>COUNT(): </strong>Counts the number of rows.</li><li><strong>SUM(): </strong>Adds up the values in a column.</li><li><strong>AVG():</strong> Calculates the average.</li><li><strong>MIN() / MAX(): </strong>Find the minimum or maximum value.</li></ul><p><strong>5.1 Simple Summaries</strong></p><p>How many employees do we have in total?</p><pre>SELECT COUNT(*) AS num_of_employees<br>FROM employees;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/825/1*jvayPk3K_qRkk3PAJfR_AQ.png" /></figure><p>What’s the average salary across the entire company?</p><pre>SELECT AVG(salary) AS avg_salary<br>FROM employees;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/822/1*sdjGnHsLekRUutAyQWkBBg.png" /></figure><h4>5.2 Summarizing by Category with GROUP BY</h4><p>This is where things get really interesting. What if you want to know the average salary per department? This is what GROUP BY is for.
It bundles rows together based on a column and then runs the aggregate function on each bundle.</p><pre>SELECT<br>    dept_id,<br>    AVG(salary) AS average_salary,<br>    COUNT(*) AS number_of_employees<br>FROM employees<br>GROUP BY dept_id;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/773/1*TQNu61Nriyq8ll4I_SeF7A.png" /></figure><p>This query tells SQL:</p><ul><li>First, group the employees by their dept_id. (SQL splits the table into smaller “mini-tables”, one for each unique dept_id.)</li><li>Then, for each of those groups, calculate the average salary and count the number of employees.</li></ul><h4><strong>5.3 Filtering Groups with HAVING</strong></h4><p>The <em>WHERE </em>clause filters rows before they are grouped. But what if you want to filter the groups themselves? For example, “Show me only the departments with more than 1 employee.”</p><p>For this, you use the <em>HAVING </em>clause, which works on the results of your aggregations.</p><pre>SELECT<br>    dept_id,<br>    COUNT(*) AS num_employees<br>FROM employees<br>GROUP BY dept_id<br>HAVING COUNT(*) &gt; 1;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/646/1*agsWVD_pvT7euCHQklTh-Q.png" /><figcaption><strong><em>WHERE </em>filters individual rows BEFORE grouping</strong>, while <strong><em>HAVING</em> filters entire groups AFTER GROUP BY</strong> is done.</figcaption></figure><h3>6. Bringing Our Tables Together with “<em>JOIN</em>”</h3><p>So far, we’ve worked with our employees and departments tables separately. But the real power of a relational database comes from connecting them.</p><p>Let’s say you want a list of all employees and the name of the department they work in. The employee’s name is in the employees table, but the department’s name is in the departments table. How do we pull information from both at the same time?</p><h4><strong>6.1 The Most Common Join: INNER JOIN</strong></h4><p>An <em>INNER JOIN</em> looks for rows where the key (our dept_id) exists in both tables. It finds the perfect matches and returns a combined row.</p><pre>SELECT<br>    e.name AS employee_name,<br>    d.name AS department_name<br>FROM<br>    employees AS e<br>INNER JOIN<br>    departments AS d ON e.dept_id = d.dept_id;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/639/1*PpmLTlZ7EBvkzZIqVXmt1w.png" /></figure><p><strong><em>FROM employees AS e INNER JOIN departments AS d</em>: </strong>We’re telling SQL we want to join employees and departments. We also give them short nicknames (aliases) e and d to make our query easier to read and write.</p><p><strong><em>ON e.dept_id = d.dept_id</em>:</strong> This is the crucial part. It’s the rule for the join. It says, “Match a row from the employees table with a row from the departments table wherever their dept_id values are the same.”</p><h4><strong>6.2 Seeing Everything with LEFT JOIN</strong></h4><p>But what if you wanted to see a list of all departments, and just show which employees are in them, if any? You might want to do this to find departments that are empty. For this, we use a LEFT JOIN.</p><p>A <em>LEFT JOIN</em> says: “Give me every single row from the left table (the one mentioned first), and then bring in any matching rows from the right table.
If there’s no match, just show NULL.” An INNER JOIN, by contrast, <strong>only </strong>shows rows where there is a match in both tables.</p><pre>SELECT<br>    d.name AS department_name,<br>    e.name AS employee_name<br>FROM<br>    departments AS d<br>LEFT JOIN<br>    employees AS e ON d.dept_id = e.dept_id;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/812/1*MYluMpL8-BaxIhSywsP72A.png" /><figcaption>The NULL value instantly tells us that the Legal department has no matching employees.</figcaption></figure><h4><strong>6.3 Other Joins</strong></h4><p>There are also RIGHT JOIN (the opposite of LEFT JOIN) and FULL OUTER JOIN (which shows all rows from both tables, matching where possible and using NULL for any non-matches). However, INNER and LEFT joins will handle over 95% of your needs.</p>
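<p>For completeness, here’s what the FULL OUTER JOIN version of our departments/employees query would look like. (Note: MySQL doesn’t support FULL OUTER JOIN directly; this syntax works in engines like PostgreSQL and recent versions of SQLite.)</p><pre>-- Every department and every employee,<br>-- with NULLs wherever there is no match on either side<br>SELECT<br>    d.name AS department_name,<br>    e.name AS employee_name<br>FROM<br>    departments AS d<br>FULL OUTER JOIN<br>    employees AS e ON d.dept_id = e.dept_id;</pre>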
<h3>7. Modifying Your Data</h3><p>Databases are living things. People get raises, new employees are hired, and sometimes people leave. You need commands to manage these changes.</p><h4>7.1 Changing Existing Data: UPDATE</h4><p>Let’s say Alice got a well-deserved raise, and her new salary is $90,000. We use the UPDATE command to change her existing record.</p><pre>UPDATE employees<br>SET salary = 90000<br>WHERE emp_id = 1; -- Using the unique ID is safest!</pre><p>This is the most important rule of updating: Always use a WHERE clause! If you forget it, the command UPDATE employees SET salary = 90000; would give every single employee a salary of $90,000.</p><h4>7.2 Removing Data: DELETE</h4><p>Now, imagine Bob has decided to leave the company. We need to remove his record from the database using the DELETE command.</p><pre>DELETE FROM employees<br>WHERE emp_id = 2;</pre><h3>8. Ensuring Safety with Transactions</h3><p>Imagine a banking app. When you transfer money, two things must happen:</p><ol><li>Money is subtracted from your account.</li><li>Money is added to the other person’s account.</li></ol><p>What if the system crashes after step 1 but before step 2? The money vanishes! To solve this, databases use transactions.</p><p>A transaction is a wrapper around a sequence of SQL commands that treats them as a single, all-or-nothing operation.</p><ol><li>It starts with BEGIN or START TRANSACTION.</li><li>You run your UPDATE, INSERT, or DELETE commands.</li><li>If everything is successful, you COMMIT the changes, making them permanent.</li><li>If something goes wrong, or you change your mind, you ROLLBACK the changes, and the database returns to the state it was in before you started.</li></ol><p>This is guaranteed by a set of properties known as ACID (Atomicity, Consistency, Isolation, Durability), which ensure your data remains reliable and consistent.</p><p>Let’s safely give all Engineering employees a 10% raise!</p><pre>-- Step 1: Start the transaction<br>BEGIN;<br><br>-- Step 2: Apply the raise (but don’t commit yet!)<br>UPDATE employees<br>SET salary = salary * 1.10<br>WHERE dept_id = 2;<br><br>-- Step 3: Safety check — inspect the updated rows<br>SELECT name, salary<br>FROM employees<br>WHERE dept_id = 2;</pre><p>Our data originally looked like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/296/1*DhSMUw4s1uEw3ntF2kF8KQ.png" /></figure><p>After running the above code, it looks like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/609/1*vOhLYZw24wWgRBC4tW6aVw.png" /><figcaption>Manually check that these are the updated values we want! It looks correct here!</figcaption></figure><pre>-- (if something looks wrong): Undo it<br>-- ROLLBACK;<br><br>-- (if everything looks good): Save it<br>COMMIT;</pre><p>Without <em>BEGIN</em>, every <em>UPDATE</em>, <em>INSERT</em>, or <em>DELETE </em>is immediately saved — you can’t undo anything unless you manually reverse it.</p><p>With BEGIN…COMMIT or ROLLBACK:</p><ol><li>You can test and inspect changes</li><li>Only COMMIT makes them permanent</li><li>ROLLBACK erases them like they never happened</li></ol><h3>9. Evolving Your Database Structure</h3><p>What happens when your needs change? Maybe you decide you need to store each employee’s email address. You don’t have to start over; you can modify your database’s structure (its schema) on the fly.</p><h4>9.1 ALTER TABLE</h4><p>This command lets you modify an existing table. Let’s add that email column:</p><pre>ALTER TABLE employees<br>ADD email VARCHAR(100); -- A string up to 100 characters</pre><p>Now, the employees table has a new email column, which will be NULL for all existing employees until we UPDATE them.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/455/1*h1NTJy-sohMEayk95nRkUw.png" /></figure><h4>9.2 DROP TABLE</h4><p>If you no longer need a table, you can delete it completely. This action is permanent and deletes both the table structure and all the data inside it, so use it with extreme caution!</p><pre>-- example usage<br>DROP TABLE departments;</pre><h3>10. Creating Shortcuts with Views</h3><p>Imagine you frequently need to run a complex JOIN query to see high-earning employees and their department names. Typing it out every time is tedious and prone to error.</p><p>A View is a “virtual table” that is based on the result of a query. You can save a complex query as a View and then interact with it as if it were a simple table.</p><p>Let’s create a view for our high-earners.</p><pre>CREATE VIEW high_earners_view AS<br>SELECT<br>    e.name,<br>    e.salary,<br>    d.name AS department_name<br>FROM<br>    employees AS e<br>INNER JOIN<br>    departments AS d ON e.dept_id = d.dept_id<br>WHERE<br>    e.salary &gt; 60000;</pre><p>Now, instead of re-typing that whole query, you can simply do this:</p><pre>SELECT * FROM high_earners_view;</pre><p>And there you have it — the SQL survival guide I wish I had during uni. Thanks for reading and coding along with me!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building a Full Stack React App: Integrating AWS Cognito + Lambda + RDS(MySQL) + Amplify + API…]]></title>
            <link>https://medium.com/@johnelisaaa/building-a-full-stack-react-app-integrating-aws-cognito-lambda-rds-mysql-amplify-api-a4f616559acb?source=rss-d1bc041ef9a3------2</link>
            <guid isPermaLink="false">https://medium.com/p/a4f616559acb</guid>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[tutorial]]></category>
            <category><![CDATA[amazon]]></category>
            <category><![CDATA[fullstack-development]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[John Elisa]]></dc:creator>
            <pubDate>Thu, 16 Jan 2025 03:57:48 GMT</pubDate>
            <atom:updated>2025-01-16T03:57:48.793Z</atom:updated>
            <content:encoded><![CDATA[<h3>Building a Full Stack React App: Integrating AWS Cognito + Lambda + RDS(MySQL) + Amplify + API Gateway</h3><p>In this tutorial, we’ll build a secure REST API for our e-commerce website by integrating AWS Cognito, AWS Lambda, and MySQL (RDS).</p><h3>User Authentication</h3><p>We start with an already implemented sign-in button in our navigation bar that leverages Cognito to handle user authentication. If you haven’t implemented one yet, here is a six-minute tutorial of mine that should get you up to speed:</p><p><a href="https://medium.com/@johnelisaaa/setting-up-amazon-cognito-for-your-react-app-787de7999c07">Setting Up Amazon Cognito for your React App</a></p><p><strong>We won’t spend much time on the frontend; for the purposes of this tutorial we’ll mostly focus on the backend.</strong></p><p>Final Result:</p><iframe src="https://cdn.embedly.com/widgets/media.html?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DRJZTQkexW4o&amp;type=text%2Fhtml&amp;schema=youtube&amp;display_name=YouTube&amp;src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FRJZTQkexW4o%3Ffeature%3Doembed" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/c05a23a59761df474c83bc75d04546dd/href">https://medium.com/media/c05a23a59761df474c83bc75d04546dd/href</a></iframe><h3>How it’s going to work:</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/761/1*C1q6c3hlUzR5X-lqAw82vA.png" /></figure><ol><li><strong>User Authentication with AWS Cognito</strong>: The React-based front end uses Cognito for sign-in/sign-out processes. When a user clicks “Sign In,” Cognito manages the authentication flow, securely storing user tokens and profile information.</li><li><strong>Serverless Backend with Lambda</strong>: After a successful sign-in, the application calls various REST endpoints to interact with user data. AWS Lambda functions act as the backend logic, triggered by API Gateway when endpoints like /syncUser, /profile, or /products are called. These Lambda functions use the mysql2/promise library to connect to a MySQL database on RDS.</li><li><strong>Data Persistence with MySQL (RDS)</strong>: User details — such as email, name, and userID — are stored in a MySQL database, managed via MySQL Workbench. The Lambda functions handle the CRUD operations for products and purchases.</li><li><strong>Pages — Products, Sold Items, My Purchases, My Profile:<br></strong>Authenticated users can access these pages. They interact with the backend through functions like fetchUserProfile, updateUserProfile, and fetchProducts, updating the MySQL database via Lambda.</li><li><strong>Conditional UI Rendering</strong>:<br>In the navigation bar, features like the “Sell” button (represented by a ‘+’ symbol) or profile dropdown only appear for authenticated users. This means that only signed-in users can list items for sale or access their account details.</li></ol><h3>Creating the REST API (Lambda + API Gateway)</h3><p>Imagine you want to create a small service on the internet, like a button people can click to get a greeting message (e.g., “Hello, world!”).</p><ul><li>A <strong>REST API </strong>is just a way to connect that button to the code that makes the greeting.</li><li><strong>AWS Lambda</strong> is a tool that runs your code for you on the internet. You write a small piece of code that says, “When someone clicks the button, show them ‘Hello, world!’”, and Lambda runs it without you needing to set up a whole computer or server.</li><li><strong>AWS API Gateway</strong> is like the doorman to your Lambda. It listens for clicks on the button (or requests from anyone on the internet) and sends them to your Lambda code. Once Lambda finishes running, API Gateway sends the result back to whoever clicked the button.</li></ul><h4>How do you connect these?</h4><ol><li>You write the “Hello, world!” code in Lambda.</li><li>You use API Gateway to create a URL (like a website link) that triggers your Lambda when visited.</li><li>Now, anyone who clicks the link gets your greeting!</li></ol><p>This is the same backend architecture we’re going to use for our app! A minimal sketch of that “Hello, world!” Lambda is shown below.</p>
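<p>Here’s what that might look like in code: a minimal, hypothetical Node.js handler that uses the same Lambda proxy response shape as the real functions we’ll write later.</p><pre>// index.mjs - the smallest possible Lambda behind API Gateway<br>export const handler = async (event) =&gt; {<br>  // API Gateway invokes this function and relays whatever we return<br>  return {<br>    statusCode: 200,<br>    headers: { &quot;Content-Type&quot;: &quot;application/json&quot; },<br>    body: JSON.stringify({ message: &quot;Hello, world!&quot; }),<br>  };<br>};</pre>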
<h3>Initializing Database (RDS MySQL):</h3><p>Proceed to your AWS Console, search “RDS”, and create a database. Click “Easy create”, choose your engine type and the free tier, then create your database.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*H0Dq8CrkAI1_PsKgZ2H1Lw.png" /></figure><p>Go to your created database, click “Modify”, make the database publicly accessible, and apply the changes immediately.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/765/1*9_sFS0Q_nSERSx3-x-8ekQ.png" /></figure><p>You also want to click on the VPC your database is on, and add these inbound rules. This will allow us to connect to MySQL from our device.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PZjn1buJHgnOv2SdmafMeA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*A5B7gSURDX2EzNq-qVoegA.png" /></figure><p>Now, go to your database instance and take note of your endpoint and port.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*m5LvQL9584Ho1RJgX_mWwA.png" /></figure><p>Paste your endpoint into the hostname field and your port into the port field. Fill in the username and password you created earlier; the default schema should be the name of the database you’re about to create.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hdgTXuW3T3dAtJUvsKOmCg.png" /><figcaption>MySQL Workbench</figcaption></figure><p>Great! Let’s create the database and its tables:</p><pre>-- 1. Create the database (schema) <br>CREATE DATABASE chemazon;<br><br>-- 2. Switch to the database<br>USE chemazon;<br><br>-- 3. Users table<br>CREATE TABLE IF NOT EXISTS users (<br>  id INT AUTO_INCREMENT PRIMARY KEY,<br>  cognito_sub VARCHAR(255) NOT NULL,<br>  email       VARCHAR(255) NOT NULL,<br>  name        VARCHAR(255),<br>  created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,<br>  updated_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP <br>                 ON UPDATE CURRENT_TIMESTAMP<br>);<br><br>-- 4. Products table<br>CREATE TABLE IF NOT EXISTS products (<br>  id          INT AUTO_INCREMENT PRIMARY KEY,<br>  seller_id   INT NOT NULL,<br>  title       VARCHAR(255) NOT NULL,<br>  description TEXT,<br>  image_url   VARCHAR(255),<br>  price       DECIMAL(10,2),<br>  discount    DECIMAL(3,2),<br>  stock       INT,<br>  rating      DECIMAL(3,2),<br>  status      VARCHAR(20),<br>  created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,<br>  updated_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP <br>                 ON UPDATE CURRENT_TIMESTAMP<br>);<br>
-- 5. Purchases table<br>CREATE TABLE IF NOT EXISTS purchases (<br>  id           BIGINT AUTO_INCREMENT PRIMARY KEY,<br>  product_id   INT NOT NULL,<br>  buyer_id     INT NOT NULL,<br>  purchased_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,<br>  CONSTRAINT fk_product<br>    FOREIGN KEY (product_id) REFERENCES products(id)<br>      ON DELETE CASCADE<br>      ON UPDATE CASCADE,<br>  CONSTRAINT fk_buyer<br>    FOREIGN KEY (buyer_id) REFERENCES users(id)<br>      ON DELETE CASCADE<br>      ON UPDATE CASCADE<br>);</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/491/1*vx7kcK5BRcC_6sXnoGeqtw.png" /></figure><h3>API Gateway + Services:</h3><ol><li>Sign in to <a href="https://aws.amazon.com/console/">https://aws.amazon.com/console/</a> if you haven’t already and search for “API Gateway”.</li><li>Press “Create API”, scroll down to “REST API” and click “Build”. Go ahead and name your API (e.g. chemazonAPI), and confirm creation.</li></ol><p>We’re going to have these methods:</p><ul><li><strong>syncUserInfo(userInfo):</strong> This method sends user information to the server (API Gateway + Lambda) to sync or store it. It uses an HTTP POST request to send the data and ensures it’s in JSON format.</li><li><strong>fetchUserProfile(userSub):</strong> This method retrieves a user’s profile from the server based on their unique identifier (userSub). It sends an HTTP GET request with the user ID as a query parameter to get the data.</li><li><strong>updateUserProfile(userSub, profileData)</strong>: This method updates a user’s profile on the server with new information. It uses an HTTP PUT request, sending both the user ID and updated profile data in JSON format.</li><li><strong>createProduct(productData)</strong>: Sends a POST request to POST /products with the new product’s details in the request body. Creates a new product listing in the database.</li><li><strong>fetchProducts():</strong> Sends a GET request to GET /products, optionally with query parameters (e.g., filter, page, etc.). Retrieves a list of products from the server.</li><li><strong>buyProduct(productId, userSub)</strong>: Sends a POST request to POST /buyProduct with productId and userSub in the request body. Allows a user to purchase a specific product (e.g., creating a record in the “purchases” table).</li><li><strong>fetchPurchases(userSub)</strong>: Sends a GET request to GET /purchases?userSub={userSub}. Retrieves all the purchase records for a given user from the server.</li><li><strong>fetchUserProducts(sellerSub):</strong> Sends a GET request to GET /products?seller_sub={sellerSub}. Fetches products that belong to a specific seller (identified by their Cognito sub).</li></ul>
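<p>To make the shape of these helpers concrete, here’s a sketch of what syncUserInfo might look like on the frontend. (This is illustrative only: the base URL is a placeholder for the invoke URL that API Gateway generates for your deployed stage.)</p><pre>// api.js - hypothetical client-side helper for the /syncUser endpoint<br>const API_BASE = &quot;https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod&quot;;<br><br>export async function syncUserInfo(userInfo) {<br>  // POST the user&#39;s sub, email, and name as JSON;<br>  // API Gateway forwards the request to our Lambda<br>  const response = await fetch(`${API_BASE}/syncUser`, {<br>    method: &quot;POST&quot;,<br>    headers: { &quot;Content-Type&quot;: &quot;application/json&quot; },<br>    body: JSON.stringify(userInfo),<br>  });<br>  if (!response.ok) throw new Error(&quot;Failed to sync user&quot;);<br>  return response.json();<br>}</pre>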
<h3>Linking API Gateway to Lambda</h3><p>First, go to your database instance and set up a Lambda connection.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LWO6NFTTam8Eip4elENnMQ.png" /></figure><p>I’ll walk through how the “SyncUserFunction” API service works; the rest of the functions are available on my <a href="https://github.com/johnbobelisa/chemazon">GitHub</a> if you would like to see their implementations.</p><p>So, create a new function “SyncUserFunction”, and choose your existing database proxy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4S-AgTccg0y_JvuohPFTjA.png" /></figure><p>Then, proceed to our previously created “chemazonAPI”, click “Create resource”, and type in “syncUser” as the resource name. Then choose POST as the method type, tick “Lambda proxy integration”, select SyncUserFunction in the Lambda function field, and create the method.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QEUrHVloV_xLgw4E9-xweQ.png" /></figure><p>Now, ensure that your Lambda function has the outbound rules below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lPsX8EXKg8jCcSxW8s8zqQ.png" /></figure><p>Finally, let’s add the logic to our Lambda function.</p><p>Proceed to the “syncUserFunction” code section, and replace the index.mjs code with this:</p><pre>import mysql from &#39;mysql2/promise&#39;;<br><br>export const handler = async (event) =&gt; {<br>  // Define CORS headers to include in all responses<br>  const corsHeaders = {<br>    &quot;Access-Control-Allow-Origin&quot;: &quot;*&quot;,<br>    &quot;Access-Control-Allow-Methods&quot;: &quot;OPTIONS,POST,GET,PUT&quot;,<br>    &quot;Access-Control-Allow-Headers&quot;: &quot;Content-Type&quot;<br>  };<br><br>  let connection;<br><br>  try {<br>    console.log(&quot;Received event:&quot;, JSON.stringify(event));<br><br>    // Handle request body parsing<br>    let body;<br>    if (typeof event.body === &#39;string&#39;) {<br>      try {<br>        body = JSON.parse(event.body);<br>      } catch (parseError) {<br>        throw new Error(&quot;Failed to parse JSON body: &quot; + parseError.message);<br>      }<br>    } else if (typeof event.body === &#39;object&#39; &amp;&amp; event.body !== null) {<br>      body = event.body;<br>    } else {<br>      throw new Error(&quot;Request body is not valid JSON.&quot;);<br>    }<br><br>    const { sub, email, name } = body;<br>    if (!sub || !email || !name) {<br>      throw new Error(&quot;Missing required user fields (sub, email, name).&quot;);<br>    }<br><br>    // Create a MySQL connection using mysql2/promise<br>    connection = await mysql.createConnection({<br>      host: process.env.DB_HOST,<br>      port: process.env.DB_PORT || 3306,  // default MySQL port<br>      user: process.env.DB_USER,<br>      password: process.env.DB_PASS,<br>      database: process.env.DB_NAME,<br>      connectTimeout: 5000  // 5-second timeout<br>    });<br><br>    // Check if user already exists<br>    const [rows] = await connection.execute(<br>      &#39;SELECT id FROM users WHERE cognito_sub = ?&#39;,<br>      [sub]<br>    );<br><br>    // Insert new user if not found<br>    if (!rows.length) {<br>      await connection.execute(<br>        `INSERT INTO users (cognito_sub, email, name, created_at, updated_at) <br>         VALUES (?, ?, ?, NOW(), NOW())`,<br>        [sub, email, name]<br>      );<br>    }<br><br>    return {<br>      statusCode: 200,<br>      headers: corsHeaders,<br>      body: JSON.stringify({ message: &#39;User synced successfully.&#39; }),<br>    };<br>  } catch (error) {<br>    console.error(&#39;Error syncing user:&#39;, error);<br>    return {<br>      statusCode: 500,<br>      headers: corsHeaders,<br>      body: JSON.stringify({ error: &#39;Error syncing user.&#39;, details: error.message }),<br>    };<br>  } finally {<br>    if (connection) {<br>      try {<br>        await connection.end();<br>      } catch (endError) {<br>        console.error(&#39;Error ending connection:&#39;, endError);<br>      }<br>    }<br>  }<br>};</pre><p>This AWS Lambda function synchronizes Cognito user details with a MySQL database by checking if the user exists and inserting them if not.
It handles database connections, parses the request, and ensures proper CORS headers and error handling.</p><p>Now you have a working REST API!</p><h3>Automated Build and Deployment for Web Hosting</h3><p>Now all that’s left is to deploy your app for other people to see!</p><p>Here is my app: <a href="https://main.d1ktvyh6vc45ny.amplifyapp.com/">https://main.d1ktvyh6vc45ny.amplifyapp.com/</a></p><p>First, push your project to GitHub.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*E4ld3zA_c9_EgPIPc3kjig.png" /></figure><p>Then, create a new Amplify app in AWS Amplify:</p><p>Select your repository and branch. From now on, every time you push to GitHub, your changes will automatically build and be deployed to your Amplify domain.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*i7yUXfG9GAA-_wZzHUTS4g.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oIhFEBZtx8InQf_KibR6Ig.png" /></figure>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Setup Web Hosting for your React App with AWS Amplify + S3 + Cognito]]></title>
            <link>https://medium.com/@johnelisaaa/setup-web-hosting-for-your-react-app-with-aws-amplify-s3-cognito-b726ecb13ec2?source=rss-d1bc041ef9a3------2</link>
            <guid isPermaLink="false">https://medium.com/p/b726ecb13ec2</guid>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[cognito]]></category>
            <category><![CDATA[s3-bucket]]></category>
            <category><![CDATA[amplify]]></category>
            <category><![CDATA[web-hosting]]></category>
            <dc:creator><![CDATA[John Elisa]]></dc:creator>
            <pubDate>Tue, 31 Dec 2024 05:26:39 GMT</pubDate>
            <atom:updated>2024-12-31T05:26:39.843Z</atom:updated>
            <content:encoded><![CDATA[<p>This tutorial will teach you how to deploy your React app online using AWS Amplify + S3. If you have set up Cognito on your React app, there is a section at the end of this tutorial on how to configure it with Amplify + S3 as well.</p><p>If you haven’t added Cognito (user authentication, e.g. sign-in/out) to your app and you’re interested in adding it, read this quick six-minute tutorial: <a href="https://medium.com/@johnelisaaa/setting-up-amazon-cognito-for-your-react-app-787de7999c07">https://medium.com/@johnelisaaa/setting-up-amazon-cognito-for-your-react-app-787de7999c07</a>.</p><h4>1. Run “npm run build” on your React project</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/685/1*upROpAJjjM8ZsajVY1aJ1w.png" /></figure><h4>2. Proceed to the AWS Console:</h4><p><strong>2.1: Creating an S3 bucket (storage for all your frontend assets):</strong></p><ul><li>Click “Create Bucket”.</li><li>Add a name to your bucket, keep the default settings for the rest of the fields, and click “Create Bucket”.</li><li>Then, click your created bucket, press “Upload”, and upload your build/ folder. If you’re using Vite + React, this will be a dist/ folder instead.</li><li>Go to your bucket’s properties, scroll down to “Static website hosting”, enable it, and set your index document (index.html).</li><li>After that, click the “Create Amplify app” button.</li></ul><p><strong>2.2: Creating Amplify App + Linking to S3 bucket:</strong></p><ul><li>After clicking the “Create Amplify app” button, choose the “Amazon S3” method, find and choose your previously created S3 bucket, and click “Save and Deploy”.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*T8JZ6mAQ6cDPl2HtUp4k7A.png" /></figure><p>It should now deploy your frontend to the given domain.</p>
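<p>As an aside, the console steps in section 2.1 can also be done from the terminal with the AWS CLI. A rough sketch, assuming the CLI is installed and configured, with “my-react-app-bucket” as a placeholder bucket name:</p><pre># Create the bucket (names must be globally unique)<br>aws s3 mb s3://my-react-app-bucket<br><br># Upload your build output (use dist/ for Vite projects)<br>aws s3 sync build/ s3://my-react-app-bucket<br><br># Enable static website hosting with index.html as the index document<br>aws s3 website s3://my-react-app-bucket --index-document index.html</pre>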
<h4>3. Linking Cognito to the newly deployed URL (OPTIONAL):</h4><p><strong>3.1: Changing code configuration:</strong></p><pre><br>const cognitoAuthConfig = {<br>  authority: &quot;&lt;cognitodomain&gt;&quot;,<br>  client_id: &quot;&lt;clientID&gt;&quot;,<br>  // Change redirect_uri from http://localhost:xxxx --&gt; amplify deployed URL<br>  redirect_uri: &quot;https://staging.xxxxxxxxx.amplifyapp.com/&quot;,<br>  response_type: &quot;code&quot;,<br>  scope: &quot;email openid profile&quot;,<br>  onSigninCallback: (_user) =&gt; {<br>    window.history.replaceState({}, document.title, window.location.pathname);<br>  },<br>};</pre><pre>const signOut = async () =&gt; {<br>    await auth.removeUser();<br>    <br>    const clientId = &quot;&lt;clientID&gt;&quot;;<br>    // Change logoutUri from http://localhost:xxxx --&gt; amplify deployed URL<br>    const logoutUri = &quot;https://staging.xxxxxxxxx.amplifyapp.com/&quot;;<br>    const cognitoDomain = &quot;&lt;domain&gt;&quot;;<br>    window.location.href = `${cognitoDomain}/logout?client_id=${clientId}&amp;logout_uri=${encodeURIComponent(logoutUri)}`;<br>  };</pre><p><strong>3.2: Adding Authorized URLs to Cognito and External Providers:</strong></p><ul><li>Begin by going to AWS Console &gt; Cognito &gt; Your project’s user pool &gt; App Clients &gt; Your app client &gt; Login Pages &gt; click “Edit” for the Managed login pages configuration section.</li><li>Add your Amplify deployed URL to both the Callback URLs and Sign Out URLs.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*btcO_zTH_fM5ropHkMjI_g.png" /></figure><ul><li>Next, if you have an external provider like Google, go to Google Cloud Platform console &gt; Your project &gt; APIs &amp; Services &gt; Credentials &gt; Add your deployed Amplify URL to the authorized redirect URIs.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/867/1*NypcEJsf7XRwyu3Z_mgLDg.png" /></figure><h3>Final Result:</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FYE-3BLNNOTBm6c7FKOQQA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9ZNWLScRJhciK9tQBj--BA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OtOdKSCYEw60PVR3E4A07g.png" /></figure>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Setting Up Amazon Cognito for your React App]]></title>
            <link>https://medium.com/@johnelisaaa/setting-up-amazon-cognito-for-your-react-app-787de7999c07?source=rss-d1bc041ef9a3------2</link>
            <guid isPermaLink="false">https://medium.com/p/787de7999c07</guid>
            <category><![CDATA[amazon-cognito]]></category>
            <category><![CDATA[reactjs]]></category>
            <category><![CDATA[user-authentication]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[vitejs]]></category>
            <dc:creator><![CDATA[John Elisa]]></dc:creator>
            <pubDate>Mon, 30 Dec 2024 14:26:43 GMT</pubDate>
            <atom:updated>2024-12-31T04:40:13.199Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qZ09x2978aT0l3bJ-3WlZQ.jpeg" /></figure><h3>Set Up Amazon Cognito for your React App in 6 minutes</h3><p>I have a React project initialized with Vite; the app is basically a simple replica of Amazon. I have a navigation bar here that has a “Sign In” button.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*yYRV-cusPLE8nwgV" /></figure><blockquote><strong>The goal of this tutorial:<br></strong>- Implement user registration/login using Cognito.<br>- Add a sign-in method using an external identity provider (Google). <br>- Support user registration and login through username/password. <br>- Once a user is logged in, the “Sign in” button in the navbar changes to a “Sign Out” button.</blockquote><p>If you have absolutely no idea what Amazon Cognito is, watching this video by “be a better dev” will help a lot, but you don’t need to watch it to proceed with this tutorial.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FQEGo6ZoN-ao%3Fstart%3D386%26feature%3Doembed%26start%3D386&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DQEGo6ZoN-ao&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FQEGo6ZoN-ao%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/f93c6789781e6e617a630374803448c5/href">https://medium.com/media/f93c6789781e6e617a630374803448c5/href</a></iframe><h3><strong>Part 1: Setup (AWS)</strong></h3><ol><li>Go to <a href="https://aws.amazon.com/">https://aws.amazon.com/</a>. Create an AWS account and sign in to the console.</li><li>Search “Cognito”. Click “Get started for free in less than five minutes” under the “Add sign-in and sign-up experiences to your app” section.</li><li>Define your application and click “Create”. We’ll skip the return URL section for now.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gVrd9Iq1vYNkqHF0FAhzRg.png" /></figure><h3>Part 2: Setup (Google Sign-in)</h3><ol><li>Go to <a href="https://console.cloud.google.com/">https://console.cloud.google.com/</a> &gt; create a new project &gt; APIs &amp; Services &gt; Credentials &gt; Create Credentials &gt; OAuth client ID. Fill in the name and application type, and click “Create”.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/897/1*VgsHa1EQrTJvT8TivLsA8w.png" /><figcaption><strong>IMPORTANT: Add your localhost/web URL and “&lt;cognito-domain&gt;/oauth2/idpresponse” to the authorized redirect URIs section. After creation, you will be given a ‘Client ID’ and ‘Client Secret’.</strong></figcaption></figure><p>Next, go back to the AWS console &gt; proceed to your newly created Cognito application &gt; user pool &gt; social and external providers &gt; add identity provider &gt; Google.</p><p>Copy your <strong>Client ID</strong> and <strong>Client Secret </strong>obtained earlier and paste them into the appropriate fields.
In the “Authorized scopes” section, add “email profile openid”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Yc36vmGr2IXyYDwq29QlgQ.png" /></figure><p>You’ve successfully added a “Sign in with Google” feature.</p><h3>Part 3: Code Implementation</h3><p><strong>Here was my initial code:</strong></p><pre>// main.jsx<br>import { StrictMode } from &#39;react&#39;<br>import { createRoot } from &#39;react-dom/client&#39;<br>import &#39;./index.css&#39;<br>import App from &#39;./App.jsx&#39;<br><br>createRoot(document.getElementById(&#39;root&#39;)).render(<br>  &lt;StrictMode&gt;<br>    &lt;App /&gt;<br>  &lt;/StrictMode&gt;,<br>)</pre><pre>// App.jsx<br>import Navbar from &quot;./components/Navbar/Navbar&quot;;<br><br>function App() {<br>  return (<br>    &lt;Navbar /&gt;<br>  );<br>}<br><br>export default App;</pre><pre>// Navbar.jsx<br>import React, { useState } from &#39;react&#39;<br>import &#39;./navbar.css&#39;<br>import amazonLogo from &#39;../../assets/amazon_logo.png&#39;<br><br>const Navbar = () =&gt; {<br>  const [isResponsive, setIsResponsive] = useState(false)<br><br>  const toggleResponsiveMenu = () =&gt; {<br>    setIsResponsive(!isResponsive)<br>  }<br><br>  return (<br>    &lt;nav className=&quot;navbar&quot;&gt;<br>      &lt;div className=&quot;navbar-logo&quot;&gt;<br>        &lt;img src={amazonLogo} alt=&quot;Company Logo&quot; /&gt;<br>      &lt;/div&gt;<br><br>      &lt;div className=&quot;search-bar&quot;&gt;<br>        &lt;input type=&quot;text&quot; placeholder=&quot;Search for products...&quot; /&gt;<br>        &lt;button&gt;Search&lt;/button&gt;<br>      &lt;/div&gt;<br><br>      &lt;ul className={`navbar-links ${isResponsive ? &#39;active&#39; : &#39;&#39;}`}&gt;<br>        &lt;li&gt;&lt;a href=&quot;#&quot;&gt;Products&lt;/a&gt;&lt;/li&gt;<br>        &lt;li&gt;&lt;a href=&quot;#&quot;&gt;Cart&lt;/a&gt;&lt;/li&gt;<br>        &lt;li&gt;&lt;button&gt;Sign In&lt;/button&gt;&lt;/li&gt;<br>      &lt;/ul&gt;<br><br>    &lt;/nav&gt;<br>  )<br>}<br><br>export default Navbar;</pre><p>Currently my app only has a navigation bar.
I want users to be able to sign in by clicking the “Sign In” button on my navigation bar. After a successful sign-in, the navigation bar swaps the old “Sign In” button for a “Sign Out” button, and signed-in users can click it to sign out.</p><p>Proceed to the app client you made during the AWS setup, and scroll down to the “quick setup guide”.<strong> Their quick setup guide works, but there are some additional things that need to be added to our code, which I’ll show in the guide below</strong>.</p><h4><strong>Install the oidc-client-ts and react-oidc-context libraries.</strong></h4><pre>npm install oidc-client-ts react-oidc-context --save</pre><h4><strong>Newly modified main.jsx:</strong></h4><pre>// main.jsx<br>import { AuthProvider } from &quot;react-oidc-context&quot;;<br>import App from &quot;./App&quot;;<br>import ReactDOM from &quot;react-dom/client&quot;;<br>import React from &quot;react&quot;;<br><br>const cognitoAuthConfig = {<br>  authority: &quot;&lt;domain-url&gt;&quot;,<br>  client_id: &quot;&lt;your-client-id&gt;&quot;,<br>  redirect_uri: &quot;http://localhost:5173/&quot;, // Point to your localhost<br>  response_type: &quot;code&quot;,<br>  scope: &quot;email openid profile&quot;,<br>  onSigninCallback: (_user) =&gt; {<br>    window.history.replaceState({}, document.title, window.location.pathname);<br>  },<br>};<br><br>const root = ReactDOM.createRoot(document.getElementById(&quot;root&quot;));<br><br>// wrap the application with AuthProvider<br>root.render(<br>  &lt;React.StrictMode&gt;<br>    &lt;AuthProvider {...cognitoAuthConfig}&gt;<br>      &lt;App /&gt;<br>    &lt;/AuthProvider&gt;<br>  &lt;/React.StrictMode&gt;<br>);</pre><blockquote><strong>AuthProvider:</strong> <br>Wraps the app to provide OpenID Connect (OIDC) authentication (the brain of your app that handles login and logout).</blockquote><blockquote><strong>cognitoAuthConfig:<br></strong>- <strong>authority</strong>: The AWS Cognito domain URL (it tells the app where to send authentication requests).</blockquote><blockquote>- <strong>client_id</strong>: A unique identifier for your app, issued by AWS Cognito.</blockquote><blockquote>- <strong>redirect_uri</strong>: The URL to which users are redirected after signing in or out.</blockquote><blockquote>- <strong>response_type</strong>: Indicates the OAuth flow being used (code means Authorization Code Flow).
Basically a website’s way of asking for a secret key safely after you log in.</blockquote><blockquote><strong>- scope</strong>: Tells the app what parts of your profile (like your email) it’s allowed to see.</blockquote><p><strong>The quick setup guide didn’t include:</strong></p><p><strong>onSigninCallback: </strong>This is called after a successful login to clean up the browser’s URL.</p><p>When a user signs in, they are redirected back to your application with a URL containing query parameters like:</p><pre>http://localhost:5173/?code=abc123&amp;state=xyz987</pre><p>If the user refreshes the page while these parameters are still in the URL, this can cause problems like <strong>duplicate processing</strong>: when the app reloads, the react-oidc-context library sees the ‘code’ and ‘state’ in the URL and tries to process them again.</p><p>Since the ‘state’ parameter was already consumed during the initial sign-in, the library can’t find a matching state in its storage, resulting in a common error like:</p><pre>No matching state found in storage</pre><p><strong>The onSigninCallback function fixes this</strong> as it removes the ‘code’ and ‘state’ parameters from the URL <strong>immediately </strong>after they are processed. If the user refreshes the page now, there are no leftover code or state parameters for the library to reprocess. The library simply reads the user’s tokens (already stored) and continues as normal.</p><p>The app is then finally created and wrapped in the AuthProvider so every part of the app can know if you’re logged in or out.</p><h4><strong>Newly modified App.jsx:</strong></h4><pre>import React from &quot;react&quot;;<br>import { useAuth } from &quot;react-oidc-context&quot;;<br>import Navbar from &quot;./components/navbar/navbar&quot;;<br>import &quot;./App.css&quot;;<br><br>function App() {<br>  const auth = useAuth();<br><br>  const signOut = async () =&gt; {<br>    // Remove the user from local session<br>    await auth.removeUser();<br>    <br>    // Then redirect to Cognito’s logout endpoint<br>    const clientId = &quot;&lt;client-id&gt;&quot;;<br>    const logoutUri = &quot;http://localhost:5173/&quot;;<br>    const cognitoDomain = &quot;&lt;cognito-domain&gt;&quot;;<br>    window.location.href = `${cognitoDomain}/logout?client_id=${clientId}&amp;logout_uri=${encodeURIComponent(logoutUri)}`;<br>  };<br><br>  switch (auth.activeNavigator) {<br>    case &quot;signinSilent&quot;:<br>      return &lt;div&gt;Signing you in...&lt;/div&gt;;<br>    case &quot;signoutRedirect&quot;:<br>      return &lt;div&gt;Signing you out...&lt;/div&gt;;<br>  }<br><br>  if (auth.isLoading) {<br>    return &lt;div&gt;Loading...&lt;/div&gt;;<br>  }<br><br>  if (auth.error) {<br>    return &lt;div&gt;Oops... {auth.error.message}&lt;/div&gt;;<br>  }<br><br>  // Pass signOut to Navbar<br>  return (<br>    &lt;div&gt;<br>      &lt;Navbar signOut={signOut} /&gt;<br>    &lt;/div&gt;<br>  );<br>}<br><br>export default App;</pre><blockquote><strong>useAuth:</strong><br>Provides authentication-related state/methods (e.g., auth.isLoading, auth.removeUser). Tracks whether the user is logged in or out.</blockquote><blockquote><strong>signOut Function:<br></strong>- <strong>auth.removeUser()</strong>: Logs the user out locally (removes the user session data from the browser).
<br>- Then redirects to Cognito’s logout endpoint, completing the logout process on Cognito’s side.</blockquote><blockquote><strong>auth.activeNavigator: </strong><br>- <strong>signinSilent</strong>: Renders a message when signing in. <br>- <strong>signoutRedirect:</strong> Renders a message when signing out via redirect.</blockquote><blockquote><strong>auth.isLoading: </strong><br>Displays a loading message while the authentication process is ongoing.</blockquote><blockquote><strong>auth.error: </strong><br>Displays an error message if something goes wrong with authentication.</blockquote><p>Finally, we pass the signOut function to the Navbar component so the sign out button can trigger logout when needed.</p><h4>Newly modified navbar.jsx (Navigation bar)</h4><pre>import React, { useState } from &quot;react&quot;;<br>import &quot;./navbar.css&quot;;<br>import amazonLogo from &quot;../../assets/amazon_logo.png&quot;;<br>import { useAuth } from &quot;react-oidc-context&quot;;<br><br>// Receive the signOut method passed down from App.jsx<br>const Navbar = ({ signOut }) =&gt; {<br>  const [isResponsive, setIsResponsive] = useState(false);<br>  const auth = useAuth();<br><br>  const toggleResponsiveMenu = () =&gt; {<br>    setIsResponsive(!isResponsive);<br>  };<br><br>  return (<br>    &lt;nav className=&quot;navbar&quot;&gt;<br>      &lt;div className=&quot;navbar-logo&quot;&gt;<br>        &lt;img src={amazonLogo} alt=&quot;Company Logo&quot; /&gt;<br>      &lt;/div&gt;<br><br>      &lt;div className=&quot;search-bar&quot;&gt;<br>        &lt;input type=&quot;text&quot; placeholder=&quot;Search for products...&quot; /&gt;<br>        &lt;button&gt;Search&lt;/button&gt;<br>      &lt;/div&gt;<br><br>      &lt;ul className={`navbar-links ${isResponsive ? &quot;active&quot; : &quot;&quot;}`}&gt;<br>        &lt;li&gt;&lt;a href=&quot;#&quot;&gt;Products&lt;/a&gt;&lt;/li&gt;<br>        &lt;li&gt;&lt;a href=&quot;#&quot;&gt;Cart&lt;/a&gt;&lt;/li&gt;<br>        &lt;li&gt;<br>          {/* If the user is logged in, swap the Sign In button for Sign Out */}<br>          {auth.isAuthenticated ? (<br>            &lt;button onClick={signOut}&gt;Sign out&lt;/button&gt;<br>          ) : (<br>            &lt;button onClick={() =&gt; auth.signinRedirect()}&gt;Sign in&lt;/button&gt;<br>          )}<br>        &lt;/li&gt;<br>      &lt;/ul&gt;<br>    &lt;/nav&gt;<br>  );<br>};<br><br>export default Navbar;</pre><blockquote>- <strong>If the user is logged in (auth.isAuthenticated is true):</strong> <br> The navbar shows a Sign Out button.</blockquote><blockquote>- <strong>If the user is not logged in (auth.isAuthenticated is false):<br> </strong>The navbar shows a Sign In button that triggers the auth.signinRedirect() method to start the login process.</blockquote><p><strong>Finished!</strong></p><h3>Final Result:</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nFEHu3QW2qqfdHwHxmicYQ.png" /></figure>]]></content:encoded>
        </item>
    </channel>
</rss>