Basic SQL For Data Analysis
Learn one of the most popular data languages
Hello everybody. Hope you’ve been well. Today we’ll be looking at SQL, a language that’s very important to learn when you’re working with databases. As usual, I want to give you an easy to understand 101 of both SQL and databases. With a tiny bit of effort using SQL can be pretty fun and simple. Let’s get started.
Databases: What are they?
Before we start pulling all the fancy data with a new toy, let’s understand what a database is. A database is an organized collection of data. This can be anything from financial data, students in a university, players in an online video game etc. Databases are useful for pretty much every kind of company and institution and come in many varieties.
You may have seen things like Oracle, MySQL, PostgreSQL, etc when looking up databases or a tech blog. These are called relational database management systems. In short, software that allows us to create and manage databases. In this article I’ll be using the RDMS SQLite . It is pretty easy to set up.
SQL is the language that talks to relational databases in order to store, update, change, and delete data. As to show you how it works, I’ll be creating a database in SQLite with data I created related the video game, Street Fighter 5.
Creating a Database and Importing File
The first thing we’ll do is create a database. We’ll just call it testDB.
-- Creating a new Database Note, statements go after sqlite>
sqlite> .open C:/sqlite/testDB.db
sqlite> .databases
seq name file
--- --------------- --------------------
0 main C:\sqlite\testDB.db
And done. The first line tells SQLite I want to make a permanent database in a specific location called testDB. The second line is just to make sure that the file has been created. Simple non? But alas our database is empty. Let’s fix that now.
Like I mentioned earlier, I created a CSV file of data related to characters in SF5. Like with R, I can upload this file into a database in order to start working with it.
--First create a table
sqlite> create table SF5Characters( Name Text Not Null, Health Int Not Null, Stun Int Not Null, Origin Text Not Null, Projectile Text Not Null, Command_Grab Text Not Null);sqlite> .mode csv
sqlite> .import C:/sqlite/SF5Characters.csv SF5Characters
sqlite> .header on
sqlite> .mode column
In the create table statement I created a table. A table is pretty much the same thing as a data frame in R. It holds a specific set of data for us to view manipulate etc. The lines following are the names of the columns, the datatype the column will be accepting, and Not Null meaning it will not accept blank fields. Each column name is separated by a comma and the statement ends with a semicolon.
The next two lines are to let SQLite know that a CSV file will be coming and the actually importing of the file. Finally the next 2 lines are just so the output of our SQL statements will be clean.
Working With Data: SQL Statements & Queries
We can finally start playing SQL now! Hooray! Most SQL queries look something like this:
SELECT column_name FROM table_name WHERE column_name = "Frank";
filter(dataframe, column_name == "Frank") #dplyr equivalent in R
All queries end in semicolons. Queries are made up of statements (the upper case words) that structure how you want the data returned. The four most common statements are:
- SELECT: Choose this data
- UPDATE: Change this data
- INSERT: Add new data
- DELETE: Remove this data
Also for the those not familiar with SF5 here’s what some of the columns mean.
- Health: How much damage a character can take.
- Stun: How damage a character can take before they are “stunned” or immobilized.
- Origin: What Street Fighter game did a character debut in.
- Projectile: Does a character have a “fireball” type move.
- Command_Grab: Does the character have a special grab besides universal?
Now that those definitions are out of the way let’s start by figuring how many rows are in this table:
SELECT COUNT(*) FROM SF5Characters;
----------
27
Our first query. Looking good except one problem. There’s 26 characters in this data set. Why do we have 27 rows? Let’s investigate.
sqlite> SELECT Name from SF5Characters;
Name
Name
Ryu
"Chun li"
Nash
"M. Bison"
Cammy
Birdie
Ken
Necalli
Vega
"R. Mika"
Rashid
Karin
Zangief
Laura
Dhalsim
F.A.N.G
Alex
Guile
Balrog
Ibuki
Juri
Urien
Akuma
Kolin
Ed
Abigail
Ah I see the problem. The column name “Name” was accidentally added in as a part of the data set. Let’s put delete to work.
sqlite> DELETE FROM SF5Characters WHERE Name = 'Name';
sqlite> SELECT COUNT(*) FROM SF5Characters;
----------
26
There we go. Now let’s answer some basic questions.
Which characters are completely new to the Street Fighter Series?
sqlite> SELECT * FROM SF5Characters WHERE Origin = "SF5";Name Health Stun Origin Projectile Command_Grab
---------- ---------- ---------- ---------- ---------- ------------
Rashid 950 1000 SF5 Yes No
Laura 1000 1000 SF5 Yes Yes
F.A.N.G 950 950 SF5 Yes No
Alex 1025 1050 SF5 No Yes
Kolin 1000 1000 SF5 Yes No
Ed 1000 1000 SF5 Yes No
Abigail 1100 1050 SF5 No Yes
SQL statements can give you all the data you need at once if you use the * after Select. This is convenient so that you don’t have type another statement in order to figure something out. But we got another problem. Alex is not new to Street Fighter but Necalli is. Update will come to our rescue.
sqlite> UPDATE SF5Characters Set Origin = "SF5" WHERE Name = "Necalli";sqlite> UPDATE SF5Characters Set Origin = "SF3" WHERE Name = "Alex";sqlite> SELECT * FROM SF5Characters WHERE Origin = "SF5";Name Health Stun Origin Projectile Command_Grab
---------- ---------- ---------- ---------- ---------- ------------
Necalli 1000 1000 SF5 No Yes
Rashid 950 1000 SF5 Yes No
Laura 1000 1000 SF5 Yes Yes
F.A.N.G 950 950 SF5 Yes No
Kolin 1000 1000 SF5 Yes No
Ed 1000 1000 SF5 Yes No
Abigail 1100 1050 SF5 No Yes
Fixed. There are 7 characters new to the series so far. 2 characters have less than 1000 health, 5 have projectile attacks and 3 have command grabs. Good to know.
How about we do a bit of organizing? Let’s get a list of all the characters by from lowest to highest health. We can use a clause called ORDER BY to do that. a clause is a word that comes after FROM to return data under more specific commands.
sqlite> SELECT Name, Health FROM SF5Characters ORDER BY Health ;Name Health
---------- ----------
Akuma 900
Cammy 900
Karin 900
Dhalsim 925
Chun li 950
Nash 950
R. Mika 950
Rashid 950
F.A.N.G 950
Guile 950
Ibuki 950
Juri 950
Urien 950
Ryu 1000
M. Bison 1000
Ken 1000
Necalli 1000
Vega 1000
Laura 1000
Kolin 1000
Ed 1000
Birdie 1025
Alex 1025
Balrog 1025
Zangief 1050
Abigail 1100
What if you only want characters with a projectile and a command grab? Then there’s the handy AND clause for that. Basically if both parts of the statement is true, you’ll get the results.
sqlite> SELECT Name, Health FROM SF5Characters WHERE Projectile = "Yes" AND Command_Grab = "Yes" ORDER BY Health ;Name Health
---------- ----------
Laura 1000
Huh, just Laura. I guess that makes her unique. I hope you got the gist of it. In SQL think of the data that you want returned to you and create the query that’ll allow you to do that. With a bit of practice you can get the hang of this in no time. Speaking of practice, here’s a a few links to some SQL tutorials. Best part about them is that they’re free and you don’t have to download a database in order to practice! How neat is that?
Well that’s all for now everyone. With the resources listed you can get started on SQL in no time at all. I might do a article in the future that shows you how you can access a database through R! It ought to be fun. If there’s something you’d like me to clear up please let me know.
If you liked this article hit the recommend button.