5 nifty SQL funcionalities that you might find useful
SQL is so much more than CRUD operations
SQL is a language used to query databases in most software systems around the world. It is pretty much a given for software developers and it is now used beyond the developer world, since it is required to understand and represent data.
But in the same way that SQL is now not only required for software developers, the day will come (and it’s coming) when basic CRUD operations, defined as Create, Read, Update and Delete; will no longer be required, they’ll be just the base line, the “Hello World” of SQL.
So, because of that, today I bring you 5 nifty functionalities that SQL provides that might be useful to know and you might use at some point. Consider that this functions require basic concepts of SQL like joins, casting, etc. Keep in mind that I’m using PostgreSQL here because it is one of the most widely used RDBMS in the world, but most of these are also present in other systems or have a pretty similar equivalent.
now(), interval and dates
Dates are specially important when you’re checking things like reports or you want to check the behaviour of the data in a specific range of time, like that time you deployed a new feature and you want to see how the whole operation was affected after it. Well, you can certainly use dates to check that, most databases have a “created_at” and “updated_at” field that tells you exactly that info, but how do you query it?
Well, assume you have something like this database right here
Notice that here I’m using timestamp for the created_at fields, you could use the same one or some equivalent in your case but this is the standard.
Now, if you wanted to filter all the orders created in the moth of december you could use something like this:
select * from orders
where created_at < ‘2021–01–01’
and created_at > ‘2020–12–01’
Or you could use the keyword between
select * from orders
where created_at BETWEEN ‘2020–12–01’ AND ‘2021–01–01’
That’s very good, and much more legible. But what if you want to check something more flexible, say a query that monitors how everything is doing in the last 24 hours. Here you don’t want to continuosly keep changing the dates, or if, for example you want a dashboard that checks that data every couple of minutes you’re not gonna keep changing the dates. Well, here’s where two important functionalities come into place.
now() basically replaces the date with whatever is currently the database’s date whenever you use the query.
So that gives you the end point, but how do you get the first part of the query, well, SQL provides you with interval which basically takes any certain amount of time that can be added or substracted from another date. Check this out:
select * from orders
where created_at > (now()-interval ‘1 day’)
Amazing. You can understand the query without understanding SQL just because it uses human language. You could also replace the days with minutes, hours, weeks; whatever it is you need. Here, interval takes the current date and substracts 1 day from it.
Conditionals
One of the most powerful tools that programming languages have (and what basically elevates them to that category) are conditionals, the way in which you can define the different scenarios and the actions in those. And, yes, SQL has that too, in PostgreSQL its called CASE…WHEN.
So, say you want to display all the products but you have this one retailer that is having a discount sale and their products are 50% off and you need to include that in your query.
select id, name, merchant_id, price,
CASE merchant_id WHEN 4567 THEN price*0.5
ELSE price
END as discount_price
from products
Here you’d have two price columns, one with and one without the discount, and you could have as many retailers and as many cases as you want for different discounts.
So, the sctructur with case is:
case <column> when <column_value> then <value_to_display>
else <default_value>
end as <new_column_name>
row_number()
Say you have a list of users and you want to rank them according to when they created their accounts, maybe you want to divide them into groups, maybe you want to reward your most loyal fanbase.
Well, row_number serves as a way to enumerate your rows (duh) according to a certain criteria, so for example if we wanted them sorted by date it would look something like this:
select *,
ROW_NUMBER() OVER (order by created_at) as position
from users
This would give you an extra column displaying that number, maybe you’d like to multiply that for a discount, or group them using that value.
concat()
Ok, lets continue with a simple one, concat is one that you might know, but it is worth remembering. Concat lets you concatenate two columns into a single one, the most common use for this is to join the user’s first and last name into one single value.
In this case lets try to mix the name of a product to the merchant that sells it, since many retailers tend to offer a product with the same name but you just know that if it comes from your favourite supermarket it’s just that better.
We could try to concatenate both the product price with the merchant name using an inner join here too, kinda like this:
select
CONCAT(products.name, ‘ from ’, merchants.merchant_name)
as full_product_name
from products
inner join merchants
on products.merchant_id = merchants.id
So, what this does is create a new column called “full_product_name” that takes the product name, adds “from” and then the merchant name. So it’d look like “Milk from Wallmart”.
generate_series()
Ok, I’m keeping one of my favourites and most complicated functionalities for last, generate_series lets you do just that, generate a new dataset with the range that you give it. Python developers may be familiar with these using the range function, which takes a start, an end and a step; well this is exactly like that.
So, for this example, think you want to check all the orders created on a sunday, maybe because that is the day that you think is the most active. So you’ll want to create a new series with those dates and then join them with your orders table using the creation date as criteria, something like this.
select orders.* from
from generate_series(
‘2020–12–06 00:00’::timestamp,
‘2021–01–03 00:00’,
‘1 week’) starting_date(dte)
inner join orders
on orders.created_at between starting_date.dte
and starting_date.dte + interval ‘1 day’
Ok, so a couple of things here, first, we use the timestamp in the first parameter because the generate_Series can generate any kind of series, so we specify here that we want timestamps. Then you might be noticing the starting_date(dte), well, since generate_series behaves kind of like another table, we need to define it alongside its columns (you could add that date in your select as well). Finally, you can see I used an inner join and the condition used was the between function we saw before, using the dates again.
Conclusion
The bigger take here is that SQL is so much more than a few select, where and from. There is a rabbit hole out there that you might want to dive into. Also, each RDMS has its own perks so maybe you could explore the one you use to take advantage of its tools in a much deeper level.