SQL prevents database corruption and injection, except in the ridiculous movie’s hacker scenes.
SQL is the Structured Query Language used to define and manipulate data in most of the databases in the world, and the most critical ones (banks, hospitals, airlines, secret services… ). And then, it gives the impression that with SQL you can do whatever you want, bypassing all application control, as if it were a backdoor to your database, wide opened on the network.
Programmers always laugh when seeing ridiculous hacking scenes in movies. In 2016 there was this “use SQL to corrupt their database” line in Jason Bourne (nothing to do with JSON or /bin/sh, by the way, it’s a movie) and recently in StarTrek: discovery series the hacking 'audit' was explained as “The probe used multiple SQL injections”. I’ve put the links at the end of this post.
I just want to explain that SQL is not a problem in this area, but can be the solution. SQL is not only a powerful language to manipulate data: it also has all features to encapsulate and protect the access to the database.
Actually, SQL is the implementation of Edgar F. Codd rule number 5:
When one says that SQL is DDL (Data Definition Language) and DML (Data Manipulation Language), and forgets the DCL (Data Control Language), he suggests that we can do anything, like dropping or changing data, and forgets all this ‘authorization’ part. And DDL does not only CREATE and DROP the tables, but we can (and should) create views to implement the encapsulation of data access, like the ‘private’ and ‘public’ declarations we find in other languages. And stored procedure when we need a procedural language more complex rules.
If we can “use SQL to corrupt their database”, that’s not because of SQL. That’s because of the developers who did not care about security and maybe didn’t know that:
- schemas should be used to isolate the different layers — internal ones vs. the ones exposed to the users
- views must be used to hide the internal private columns or rows, and expose only the public ones. Views are not only for queries (SELECT), but also modifications (INSERT, UPDATE, DELETE) can be made on views.
- grant/revoke table access must be done according to the visibility specification. Never do a GRANT ALL PRIVILEGES TO PUBLIC thinking that the front-end will guard all access.
- stored procedures should be used to expose the application services, rather than giving direct DML access on tables or views to the front-end
Here is a clear definition, by Philip Salvisberg, about the right design of database application where data is not exposed directly but through a secured application API defined in the database:
The Pink Database Paradigm (PinkDB) - Philipp Salvisberg's Blog
The Pink Database paradigm (PinkDB) is an application architecture for database centric applications. It is focusing on…
With this, impossible to “use SQL to corrupt their database” because the SQL statements are not written by the user, but coded by the developer who implemented this API, with all access rules enforced. But then, what about hackers trying to change the API behavior by trying StarTreck’s “multiple SQL injections”?
The idea is exposed in the following were a user can pass some SQL statements through the application API to tweak a SELECT statement and change it to be a DROP TABLE one:
xkcd: Exploits of a Mom
But this, again, is not a weakness of SQL but the wrong design and misunderstanding of Codd’s Rule 5 which states that there is two way to manipulate data: “interactive and by program”. If the user is able to do this SQL injection, that’s because the developer has used the wrong one. Probably by laziness, he used the ad-hoc query language instead of the programmer’s one.
Yes, SQL can be used by end-users and this is why it is a 4th generation declarative language, looking like an English sentence describing the result we want. Here, users can do data manipulation without the need to learn a procedural language. The security access, in this case, relies entirely upon the DDL (view definition) and DCL (privileges granted) after the end-user authentication (login with his username). The user has not been granted DROP TABLE privilege, and will never be able to do it.
But that is not correct when the application connects to the database with a generic account, like when one program runs to serve multiple users and transactions, because this account has probably more privileges to cope with the multiple user profiles.
As Rule 5 mentions, the SQL language can be used by programs. There, it is not the user who writes the SQL statement. The SQL statement is coded in advance by the application developer. And it is parsed and compiled. And then it is executed, many times, for different end-users, with different parameters. In this case, SQL is not a simple text command passed as a character string to the database engine. Here, SQL defines a cursor that is prepared from a SQL text that contains some variables. And the cursor is executed, passing values to those variables. There is no way to ‘inject’ SQL in this because the user can inject only values to the parameters: he cannot inject code.
Do you take user input and build from it a string to pass to Runtime.exec() in Java or os.system() in Python? This is exactly what you do with something like:
Statement stmt = con.createStatement();
"select * from students where name='"+argv+"'"
This is subject to SQL injection because you can pass a value that will change the WHERE clause predicate.
The right way to program a database access is to prepare a statement (once), and then execute it:
"select * from students where name=:1"
ResultSet rs = stmt.executeQuery(stmt);
Here there’s no way to pass something else than a String value for the name column.
This is mandatory for security, but also for performance. Do you imagine that you compile a different program for each user input?
In summary, are you an end-user running ad-hoc queries or a professional programmer who builds a secured and efficient application? SQL is the right language for both, but you must pick the right approach. Or your code will be ridiculous like these movies hacking scenes:
“Use SQL to corrupt their database” (Jason Bourne, 2016)
“The probe used multiple SQL injections” (Star Trek: Discovery, 2019)
Where to learn?
There are many resources about using prepared statements, but the best multi-database multi-language I know is from Markus Winand: