Basics MySQL knowledge required for Data Science.
In this blog we will be discuss about the fundamental course of SQL which is useful for all professional be it data analyst, business analyst, data scientist to extract, manipulate or draw insights from data stored in SQL databases. We will be looking at MySQL database starting from introduction, installation and practical sessions.
INTRODUCTION :
1. Why do we need databases ?
The rapid explosion of data creates a necessity to have a place to store and retrieve data efficiently. We are actually leaving in age of the data.
We require the databases mainly for 4 operation i.e. Store, Update, Delete and Search. We really need fast search and as we will see later in the blog which is one of the main reason in using specialization databases.
As a data scientist perspective we need data to know how to retrieve this data, manipulate it and save the processed data.
2. What is a database ?
It is an organized collection of data, generally stored and accessed electronically from a computer system.
3. What are the good properties of a database ?
Eg. Food Recipe Blog.
4. What are different types of databases ?
5. How data is stored in database ?
As we see in No-SQL type of databases which is schema less and everything is stored as JSON objects we can see we have key-value type of notification. We can nest things at many levels which is not possible in relational databases and hence No-SQL way of defining tables is much more simple and flexible and is much more popular recently.
6. Which are the list of companies using MySQL ?
80 % of the companies in worldwide uses MySQL. Below is the list of companies using MySQL.
INSTALLATION MySQL / MariaDB
In this part we will learn how to install MySQL. Please refer the following link for installation of MySQL / MariaDB (MariaDB is fork version of MySQL acquired by Oracle).
We will be installing MySQL 5.7 for this blog (Windows)
Installation Links:
GETTING STARTED
In this part we will learn we will learn what are types of queries and get started with writing most of the used queries like SELECT & INSERT. We will also look at the data types of MySQL.
1. What is SQL ?
SQL stands for ‘Structured Query Language’ . It is a language to communicate with databases. It is standard which means that, it does not depends on any hardware. We can create, maintain and retrieve the information related to database and its data using this language.
2. History of SQL ?
3. What are the types of command in SQL ?
We have following types of SQL statements:
- DDL (Data Definition Language)
- DML (Data Manipulation Language)
- DCL (Data Control Language)
4. Exploring Databases.
In this part we will see how we can create a database and connect to that database and use it for our business purpose.
Before creating tables and exploring more let us look at the data types of the MySQL. Ref link is given below.
5. Creating Tables.
Here we will see how we can create the structure of the table in MySQL.
6. Inserting Records in Tables.
Here we will see how we can insert the records in our newly created table User Details.
7. NULL & NOT NULL.
MODIFYING DATABASE STRUCTURE
In this part we will see how we can use DELETE and UPDATE command. Also we will see how we can view the structure of the table using DESCRIBE and lastly we will use ALTER command to alter the structure of the table.
1.UPDATE Command.
UPDATE command is used to update the data in the tables in a database.
Note:
- Updates can happen only one column at a time.
- Using where clause in an update helps us to update the rows which we are interested in.
- If where clause is not mentioned it will set all rows of column name mentioned to new value.
2.DELETE Command.
DELETE command is use to delete the records from the table.
Note:
- All the matching rows corresponding to where clause will be deleted from the table.
- If where clause is not used all rows will be deleted.
3. DESCRIBE Command.
DESCRIBE or SHOW CREATE TABLE are basically used to show the structure of the table. The difference between two is that DESCRIBE gives us the structure whereas another one give us the syntax of create table which can used further.
4. ALTER Command.
Using ALTER command we can add a column in our table, change it’s name along with change its data type as well.
We can also change the datatype of the column using ALTER command.
IMPORTING & EXPORTING DATA
In this part we will see how we can import and export the data in MySQL which can be in flat files, csv or excel format. We will also see how we can backup the database along with its objects and restore it on another server.
1. Importing data from CSV to MySQL.
2. Exporting data from MySQL to CSV.
3. Backing up databases in MySQL.
In this part we will learn how to backup the databases into SQL file.
4. Restoring database in MySQL.
Here we will see how we can restore the database which is dropped accidently by the user.
For restoring the database first we need to create an empty database and then redirect the output SQL file generated in backup phase.
DATA ANALYSIS USING MySQL
This module plays an important role and we will look at the data analysis part using MySQL. As a data scientist we need to retrieve the insights from the data so this is what this module is all about.
1. Counting Rows & Items.
2. Aggregate Functions (SUM, AVG, STDDEV, MIN, MAX)
3. Slicing required data in MySQL.
In this part we will see IN, OR, NOT operators. Also we will see AND and BETWEEN operators of where clause.
IN OPERATOR:
NOT IN OPERATOR:
BETWEEN OPERATOR:
4. Limiting the data in MySQL.
In this part we will see how we can limit the number of rows to view from our tables.
We can limit the number of rows using LIMIT keyword.
5. Sorting data in MySQL.
ASCENDING ORDER:
DESCENDING ORDER:
6. Filtering patterns using LIKE and wildcard in MySQL.
7. Grouping, Rolling up and Filtering in group in MySQL.
DESCRIPTIVE ANALAYSIS: FIFA 2019
The above link consists a project using MySQL queries. We have done descriptive analysis on FIFA 2019 players dataset.
WORKING ON JOINS IN MySQL.
In real scenario data is not stored in a single table. It is stored in multiple tables. In this module we will see how we can get the data from multiple tables using something known as joins and subqueries.
Types of Joins.
1.Left Join.
2. Inner Join.
3. Cross Join / Cartesian Join.
We must note that this type of join is rarely used in practice.
4. Right Join.
5. Self Join.
Self Join is a scenario where we join a table to itself. It is complex to debug and must be use only when we require pair wise list. Eg. Pairwise players belonging to same club in FIFA_2019 dataset or pairwise distribution of players whose age difference is less than 10.
INTRODUCTION TO INDEXING
Indexing helps us to make query and search fast in the database.
1. How indexing works ?
Let us consider the example of Library and understand a scenario where indexing will help us for faster search.
Case 1: If there is a scene that a new book has comes in and along with that there are few other books as well which needs to updated or removed or have become outdated. So if a librarian keep a new book on some random self so to find that book next time it will take a lot of time.
Case 2: If a smart librarian has sorted all the books based on the subject followed by author in alphabetical order then keeping a new book which belongs to XYZ subject for XYZ author becomes easier for fetching.
So, considering the case 2 we can say that, indexing on subject followed by author will make our search faster and will also help us for insertion, updating selves followed by removing outdated books without any difficulty.
2. Relationship.
3. Table Constraints.
Table constraints put constraints on the data which is inserted in the table. If we try to insert data violating that constraint MySQL won’t allow us to insert that data.
MySQL BUILT IN FUNCTIONS.
In this section we will discuss about some of the built in functions used in MySQL which can help us in reporting.
1.String Function: CONCAT
This function is used to concatenate strings from the data. For eg. if we need to concatenate first and last name of the person from the data. We use the concat keyword in MySQL to achieve this.
2. String Function: CASE
This function is use to convert strings into lowercase or uppercase. We use ucase (UPPER) or lcase (LOWER) keyword in MySQL to achieve this.
3. String Function: Trimming Strings
This function is use to remove trailing and leading spaces in a string. We use ltrim, rtrim and trim functions of MySQL to achieve this.
4. String Function: Extracting / Slicing Strings
This is use to extract some part of strings. We use left, right and substring keyword for extracting first, last and some characters of string.
5. Date-Time Functions: Current Date & Time.
We use now() function to get the current date along with timestamp. Along with that we can also use curdate(), curtime(), unix_timestamp() and observe the following result.
6. Date-Time Functions: Extracting Date & Time.
7. Date-Time Function: Extracting Month / Day / Year
8. Date-Time Function: Formatting Date-Time
9. Numeric Functions
There are various numeric functions which are used in MySQL. We will see following function:
- round() function which is used to round the number to nearest integer.
2. floor() function which is use to round down the values.
3.ceil() function which is used to round up the values.
4.truncate() function which help us to display number of the decimal points required.
5. abs() functions to remove the negative sign of the numbers.
MySQL WINDOWS FUNCTIONS
MySQL windows functions are used to perform data analysis calculation more easily. We will be using OVER() and PARTITIONED BY quite often if we want to use windows function.
OVER(): It determines the window (set of rows)
PARTITIONED BY : It splits the result into partitions where window function is applied.
To demonstrate windows functions we will be creating demo database called SALES. Consider the schema of the database / table below.
1. To get Total amount / Sales.
To get the total amount per sales and then populate back to the main table we would have use CTE (Common Table Expression). The query along with the output is shown below.
Now to debug and grasp the above query is little bit difficult. It is also not easiest to read. So, here comes the windows function to rescue us from this problem where we will be using OVER clause which helps us to divide the data in windows and use PARTITION BY clause along with it to further break down the data.
2. To get Number of Line Amount / Sales along with Total Sales.
3. To get the Daily Sales.
Conclusion
So, in this blog we learnt basic MySQL knowledge required for any Data Scientist. We started with introduction and understanding the history. We defined what are DDL, DML and DCL commands. We studied about them in details and explored them with examples. We understood how analysis can be carried out by writing SQL queries and did descriptive analysis of FIFA_2019 players project. We also understood what are various types of joins and built in functions. Last but not the least we understood the windows functions which can be helpful in reporting purposes.
Hope you understood the basic intuition behind this blog.
Do connect with me on LinkedIn : https://www.linkedin.com/in/gaurav-rajpal/
If you liked it please give 50 applauses.