A Beginners Guide to SQL and Database Management System

Great Learning
8 min readMay 29, 2020

--

Contributed by: Srinivasan Valady Ramanathan

Rationale

Tabular data, arguably the oldest data structure, an easy visual presentation of data for consumption by humans has been in existence since ancient times. Remember the log tables we used in school? Does it ring a bell? What is the percentage of the population using spreadsheets daily?

In 1994, Robert ran a retail chain of supermarket stores across the USA. The company’s sales transaction volume (Data Volume) is said to cross 1 Million in the coming fall. Robert could no longer capture the data in excel, with all the calculations. Since the transactions were entered by different stores, there were data quality issues and he could not capture all the data integrity needs in the spreadsheet. He needed more consistent data with security and access Control Mechanisms (Data security). He was surveying solutions to address these concerns. Guess what he found? Database Management System.

Database Management System

The Database Management system is a piece of software to store, organize, manage, and access data in the files system. A Database stores the information in the file system addressing the issues. While the software interacts with the file system to store and retrieve data, it should also provide a way to logically store the data, and give instructions to operate in a friendly way. The logical storage units in the Database are called Tables and a way to interact with tables is through SQL. SQL (Structured Query Language) provides a common query language across Databases to store, manage, and retrieve data. It is used in a client environment or from applications.

Back at Robert’s Den, Robert has decided to deploy a new DBMS solution for his company. The company’s technology team is tasked with the job of planning a design to use the DBMS thereby providing speed and ease of access for the users. The company’s staff at the point of sale cannot be coding to record transactions in the database. There is a need to build an interface with ease of use to record transactions and retrieve as necessary. If the data is independent of the application (Data Independence), it is convenient. Let us do a thought experiment, if all the data points are stored in a single table, what would happen? Yes, you guessed it correctly. A retail chain with customer data (name, address, etc.) will have to repeat for transaction data (every transaction a customer makes). Think about the Data redundancy and duplicity here. The Data here has to be split into multiple tables and modeled to avoid redundancy (remember we are talking in the age of 8MB RAM Desktops) and faster responses. This data modeling effort is called the Normalization of data. SQLs should have a way to join between tables to fetch combined data.

Tables and its Interactions

Every table talks about a particular object with every row having its members and all the columns are attributes of the same Item. A Table has two keys-

Primary Key

A column that uniquely identifies customers in a table. E.g. Aadhar number is a unique identifier for a person in India even though their names can be similar. This helps in identifying a customer to retrieve as the need arises.

Foreign Key

A key to link two tables together. A Foreign key in the table refers to the Primary Key in another table.

The SQL commands are divided into Data definition language which deals with the creation, dropping, and modification of table structure and Data manipulation language which deals with retrieval, aggregation, and other operations on data.

How to Create, Drop and Alter a Table ? — Data Definition Language

Let us look at two tables we want to create-

Customer

Transaction

A table can be created by using CREATE TABLE COMMAND-

CREATE TABLE Customer (
CustomerID int,
LastName varchar(255),
FirstName varchar(255),
City varchar(255)
CONSTRAINT PK_Person PRIMARY KEY (CustomerID));

If you notice, we mentioned what would be the data type of a column while defining the table itself thereby maintaining Data Integrity. The Database will throw an error if the user tries to enter the alphabets in the CustomerID column. The data type options in a typical database would be-

  1. Numeric data types such as int, tinyint, bigint, float, real, etc.
  2. Date and Time data types such as Date, Time, Datetime, etc.
  3. Character and String data types such as char, varchar, text, etc.
  4. Unicode character string data types, for example nchar, nvarchar, ntext, etc.
  5. Binary data types such as binary, varbinary, etc.
  6. Miscellaneous data types — clob, blob, XML, cursor, table, etc.

A table definition can be altered by using the command.

ALTER TABLE Transaction
ADD FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID);

Now that the foreign key constraint is added, the database will not record a transaction without a customerID.

We can drop the table by-

DROP TABLE Customer

Insert, Update and Delete Statements

After a creation of a table, records have to be inserted into a table. The table insertion can be done by

INSERT INTO Customers(CustomerID,LastName,FirstName,City)
VALUES (400,’cameroon’,’james’,’OH’);

Updating a correction to an inserted record is done by update command. Update command run as follows-

UPDATE Customers
SET City= ‘cincinatti’
WHERE CustomerID = 400;

A record can be deleted by

DELETE FROM Customer WHERE CustomerID=100;

Reading from a Database — Data Manipulation Language

After implementing the application and a few months in operation, there was a need to use the data to build reports monthly or daily at Robert’s retail chain. The questions asked were like the below

1. Who are my most valuable customers?

2. How many customers have not done a transaction yet this year?

3. How many transactions happen in Los Angeles city?

4. How does a customer’s average transaction value compare against the city’s Average transaction value?

Answering the above questions required SQL operations like row-level calculations, aggregation, Joins, filters, and Analytic functions. To pull the data from the database we use a select command. select is used as follows-

SELECT column1, column2, …
FROM table_name;

Row-level Calculations

The calculations or formulas written in an excel sheet are similar to row level calculations. The operation performed apply to only one row. What if we want to capitalize the last name and first name of a customer?

SELECT
Upper(LastName),
LOWER(FirstName) from
Customer

Aggregate Operation

To answer the question “ Who are my most valuable customers?”, we need aggregation. We have many aggregate functions like SUM(), AVG(),MAX(),MIN(),COUNT() etc. The answer can be queried as follows-

SELECT

cust.CustomerID,cust.FirstName||cust.LastName as CustomerName, sum(trans.TransactionAmount) as Total_Transaction_Amount
FROM Customer cust inner join Transaction trans on cust.CustomerID=trans.CustomerID group by cust.CustomerID, cust.FirstName||cust.LastName order by sum(trans.TransactionAmount) desc;

GROUP BY — is used to aggregate data and corresponding aggregate function, will produce the aggregation. Group by requires all the columns in the group by clause to be present in the select clause.

Join Operation

Join helps to join between two tables. There are four types of joins and seven ways to join them-

To answer the question “How many customers have not done a transaction yet this year?”, left outer join should be used-

SELECT cust.CustomerID,cust.LastName,cust.FirstName, cust.city
FROM Customer cust Left Join Transaction trans on cust.CustomerID=trans.CustomerID
where trans.CustomerID is NULL and year(TransactionDate) = <Currentyear>

Left Join — Left Join combines all the items in the left table to only the joining items in the right table. e.g. we can select all the customers irrespective whether they have done a transaction or not with their transaction details,

Right Join — Right Join combines all the items in the right table to only the joining items in .he left table.

Inner Join — Joins only the items present in both the tables.

Filter Operation

This operation helps to filter the records. The question “ How many transactions happens in Los Angeles city?” is answered by using this operation-

SELECT count(TransactionID)
FROM Customer cust Inner Join Transaction trans on cust.CustomerID=trans.CustomerID

where cust.city=’Los Angeles’

Analytical Functions

Analytical functions help in computing aggregate values over a group of rows. They also help in comparing against the previous and next values by using Lead () and Lag() functions. These functions help in calculating moving averages in a moving window by providing windowing functions.

The question “ How does a customer’s average transaction value compare against the city’s Average transaction value?” is answered by analytical functions.

SELECT cust.CustomerID,

avg(trans.TransactionAmount) over (Partition by cust.CustomerID) as avg_Customer_Transaction amount,

avg(trans.TransactionAmount) over (partition by cust.City) as avg_City_Transaction_Amount,

FROM Customer cust Inner Join Transaction trans on cust.CustomerID=trans.CustomerID

We can compare the average between two different groups.

Finale

Robert’s Retail chain was able to satisfy their reporting needs from the database. However, querying a database for reporting from a live system serving operational needs, will put a lot of load on the database server. Secondly, storing much of history in an operational system renders the live system slow. It would be ideal to move the data from the operating system to a warehouse/data lake to store the data for reporting purposes. The data model for a warehouse will be different to support fewer joins at the cost of storage.

Every enterprise like the retail chain stores its data in multiple databases, data warehouses, and data lakes. There is an imminent need for a data scientist to bring data from one/multiple databases into a data discovery area required for their analysis. SQL is an important skill to achieve it. In the case of heterogeneous data sources, ETL tools can be used to combine data.

** The examples used in the article are fictitious and not real, Any resemblance is coincidental. This can serve as a basic pointer to SQL and not exhaustive.**

--

--

Great Learning

Great Learning is an ed-tech company for professional and higher education that offers comprehensive, industry-relevant programs.