Data Warehousing in AWS part 1

Muhammad Ariq Naufal
Feb 22 · 4 min read

Hello this is my first medium article, this article purpose is a notebook from my Data Warehousing in AWS course. So let’s begin 😊.

Before I’m going to data warehousing, the first thing we need to know is basic about databases. In databases there are 2 types of database which is OLTP and OLAP.

Online Transaction Processing (OLTP) is mostly a relational databases, collect and manages transactional and operational data. OLTP example is Point-of-Sale (POS) terminals.

Online Analytical Processing (OLAP) is mostly a relational databases, perform complex analytical queries, and the data is imported from transactional or systems.

Here is the main different of OLTP and OLAP:

So what is a Data Warehouse?

A data warehouse is:

  • A central repository of business data from disparate sources.
  • A type of relational database that enables analysis of data.
  • A collection of approved and trusted historical corporate data.

Data warehouse also have a goals to provide access to data, ensure consistency and quality of data, and enable querying of the data in different ways using common BI tools.

Data warehouse have a small subset that called a data mart. Data mart is used to model a single subject scoped to a department or other small entity within the business. Data mart is often organized in a star or snowflake schema.

What is star and snowflake schema?

Star schema is organized around a central table — called the fact table — that contains measurements for a specific event, such as the sale of an item. The fact table contains foreign key relationships to one or more dimension tables, which contain primary keys and descriptive attributes for the sale, such as the customer or product.

Star Schema example

Snowflake schema is similar to a star schema, but the dimensions are split up to further normalize database normally form to 3NF.

Snowflake Schema example

when we discussing about data warehouse we also talk about big data, you know what big data is?

Big data is high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. — Gartner

So now we can talk about data warehouse in AWS.

AWS have a tools that called Amazon Redshift that take care of many challenging task of deployment and ongoing maintenance of data warehouse. in Amazon Redshift we can integrate various data source into the Redshift.

There are 2 ways of interacting into Redshift

Amazon Redshift built on top of Massively Parallel Processing (MPP) architecture, MPP is a database architecture in which a master node (called a leader node in Amazon Redshift) distributes work to one or more slave nodes (called compute nodes in Amazon Redshift) in the database cluster.

When we create a database a traditional database store their data in a row-based.

Amazon Redshift have a different ways to store their data, Redshift using a columnar storage to make scanning larger data sets and complex analytics queries more fast and efficient.

This is an example how columnar storage perform faster than a row storage. Suppose you want to query on values in column_2 (represented in blue in the diagram). When data is stored by row, the database engine must scan through all of the values for each row to obtain the value from column 2 in each row. However, with columnar storage, Amazon Redshift must only locate the block (or blocks) containing data for column_2 to return the requested values.

When we create a data warehouse we must plan how big the cluster is and many thing you must consider like…

There are 2 type of cluster in AWS.

When using Amazon Redshift you can define how to launch your cluster, Amazon provide this settings using Amazon VPC

Maybe I will split this article in to 2 article, so the next article will talk about how to load data into the redshift and so on. You can find the next article here.

Thank you for reading

Sources :

  • Amazon Training Course

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…

Sign up for Analytics Vidhya News Bytes

By Analytics Vidhya

Latest news from Analytics Vidhya on our Hackathons and some of our best articles! Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Muhammad Ariq Naufal

Written by

average student who want to learn new things

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store