SQL for Data Professionals (Useful commands)
First of all, What does SQL mean?
SQL is an abbreviation for the structured query language
What does Structured Query Language (SQL) mean?
Structured Query Language (SQL) is a programming language that is typically used in relational databases or data stream management systems.
It was developed by IBM in the early 1970s and is now an official standard recognized by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
Why Use SQL?
Think about it this way: Have you ever opened a very large data set in Excel, only for your computer to freeze or even shut down? SQL allows you to access only certain parts of your data at a time so you don’t have to download the data into a CSV, manipulate it, and possibly overload Excel. In other words, SQL takes care of the data analysis that you may be used to doing in Excel.
SQL subsets
The SQL language is divided into subsets according to the operations we want to perform on a database, such as:
DML — Data Manipulation Language
The first group is DML (Data Manipulation Language). DML is a subset of the SQL language that is used to perform inclusions, queries, changes, and deletions of data present in records. These tasks can be performed on several records in several tables at the same time. The commands that perform the functions mentioned above are Insert, Update and Delete
DDL — Data Definition Language
The second group is the DDL (Data Definition Language). A DDL allows the user to define new tables and associated elements. Most commercial SQL databases have proprietary extensions to DDL.
Basic DDL commands are :
- CREATE INDEX
- CREATE VIEW
- ALTER TABLE
- ALTER INDEX
- DROP INDEX
- DROP VIEW
DCL — Data Control Language
The third group is the DCL (Data Control Language). DCL controls aspects of data authorization and user licenses to control who has access to view or manipulate data within the database.
Two DCL keywords:
GRANT — authorizes the user to perform or set operations.
REVOKE — removes or restricts a user’s ability to perform operations.
DTL — Data Transaction Language
BEGIN WORK — (or BEGIN TRANSACTION, depending on the SQL dialect) — can be used to mark the beginning of a database transaction that can be completed or not.
COMMIT — finalizes a transaction within a database management system.
ROLLBACK — causes changes to existing data since the last COMMIT or ROLLBACK to be discarded.
COMMIT and ROLLBACK interact with control areas such as transaction and rental. Both end any open transactions and release any data-linked padlocks. In the absence of a BEGIN WORK or similar statement, the semantics of SQL are implementation-dependent.
DQL — Data Query Language
Although it has only one command, DQL is the most used part of SQL. The SELECT command allows the user to specify a query (“query”) as a description of the desired result. This command consists of several clauses and options, making it possible to prepare queries from the simplest to the most elaborate.
On this post, we are going to focus on DQL and some Clauses, some Logical Operators and Relational operators such as:
Clauses
- FROM — Used to specify the table to select the records.
- WHERE — Used to specify the conditions that must gather the records to be selected.
- GROUP BY — Used to separate selected records into specific groups.
- ORDER BY — Used to sort selected records in a specific order.
Logical Operators
AND — And logical. Evaluates conditions and returns a true value if both are correct.
OR — OR logical. Evaluates conditions and returns a true value if any are correct.
Relational operators
LIKE — Used to compare a model and to specify records from a database. “Like” +% extension means searching for all results with the same start as the extension.
BETWEEN — Used to specify values within a closed range.
Let’s start coding
First, we have a table called car
We use the command SELECT to tell what columns to choose. In this example, we want to select all the columns so we need to use the Symbol *. After that, we use the clause FROM to tell which table it should select the columns here we want the table car and we finish our code with a semicolon because it is a good practice.
SELECT *FROM car ;
The result is the entire table
Now instead of using the symbol * we are telling which columns we want, after the name we need to use a comma to write the next one. We have a new clause WHERE we can use this command to filter our columns, in this example, we are filtering cars with production year equal 1999.
SELECT brand , production_year , model , colorFROM carWHERE production_year = 1999;
Now we are going to SELECT the columns brand, production_year, model, color, FROM the table car, WHERE production_year is BETWEEN 2000 And 2009.
SELECT brand , production_year , model , colorFROM carWHERE production_year BETWEEN 2000 AND 2009;
Working with two tables
We will select data from two tables because in the real world we usually use more than one table to query the necessary data.
First table director
Second table movies
We are going to SELECT all columns FROM the table movie and JOIN the table director ON(where) movie.director_id is equal Id from table director and ordering the data alphabetically by title using ORDER BY.
SELECT *FROM movieJOIN director ON movie.director_id = director.id ORDER BY title;
We are going to SELECT all columns FROM the table movie and JOIN the table director ON(where) movie.director_id is equal Id from table movie and GROUP BY title ( it will aggregate data that has similarities. For example, if you have any duplicates in your data, you can use “GROUP BY” to count the number of duplicates in your fields.)and LIMIT (it helps when you have a very large database and do not want to be loading data for a long time you can limit the maximum data).
SELECT *FROM movieJOIN director ON movie.director_id = director.idGROUP title
LIMIT 100 ;
For those who are starting to learn SQL, with this knowledge and commands it is already possible to extract the data you need to start your analysis and generate value.