databases like im 5: relational databases! (article 1)

ai like im 5
8 min readMar 19, 2024

--

before i start, a couple things

  1. databases are incredibly important for anyone in the data science/ai/ml space and i think they are hard to break into.
  2. so i think that sql, nosql, and a lot of the other data/database tools deserve their own multiple part series.
  3. this is purely about data storage and retrieval, there is not a true relation to machine learning, or deep learning so feel free to skip over it.

anyways,

welcome to the series on

  • sql?
  • sequel?
  • squeal?
  • squirrel?

yeah… i give up on pronouncing.

sql is a very important skill in the world.

so even if we can’t agree on the correct pronunciation, we still use it a lot and should master it.

note: before we dive in, be prepared for a lot of database buzz words. i am convince these are designed just to confuse you and make people sound smarter than they actual are.

a great video about this.

because the best way to learn databases is to build a database,

let’s build a database:

food waste is a major problem in america.

  • when i worked as a wedding/banquet server, we would serve food every weekend.
  • and at the end of the night, we were often allowed to eat and take food home.
  • after a while, we got sick of eating the same stuff and didn’t even want to eat it.
  • and as a result, some nights, we would have to throw food away, a bunch of it.
  • to someone else, that is a incredible meal, but to us who have it had 1000 times, we were good.

million dollar idea: what if there was an app that allowed restaurant workers to trade food with each other

  • allowing people to try all different types of food.
  • allowing people to meet up and make connections.
  • and even the option of using a delivery app, like uber eats, to make help with the logistics.

so you go and build that app and realize that you are going to have to handle data such as:

  • user data → names, profiles, pictures, locations, and more
  • food → tracking types of food, what is available, descriptions, how long it’s been available for
  • transactions → details of what was traded, when it happened, and more

and if it successful, there is going to be a lot of data…

a weally weally big amount of data

for context, facebook has about a million gigabytes of data, do you think they store everything in

  • excel files
  • on local computers
  • and had people manually clicking and dragging everything

no, they use databases!

so if you want your app to be successful and your users to not feel like it’s a calculator running on potatoes

(even though you can run some cool stuff on a calculator running on potatoes)

you are going to want a database

  • databases are data storage systems optimized up the wazoo allowing for efficient retrieval, storage, and more
  • they are incredibly secure, scalable, and super easy to integrate!
  • these databases will only store necessary and instantaneous data, meaning that we have others solution for archived and historical data we will talk about in the future.

there are lots of database options pertaining to different use cases and more but we will maintain focus on the most common and best type:

relational databases

  • the most popular form of databases.
  • based on relation algebra, it is centered on a very important idea, relations.
  • in plain english: relational database are intertwined and connected together.

tables

  • at the heart of a relational database is a table
  • a table is just a optimized dataset/spreadsheet, also made of rows and columns.
  • you will sometimes see these called entities
  • and the the columns called fields
  • but no need for buzz words or corporate jargon

for our app, we need to create a basic table to store user information

  • let’s take some very basic information about our users
  • a primary key is essentially a unique identifier and index
  • although it can be any column, it must be non-null(every corresponding row must have a value) and a unique value
  • but for simplicity and many other reasons, we will often create another column for the primary key.

let’s create a table for food:

  • each user is going to have some type of food
  • should we store all the user information again or is there a better way to do things…

not rocket science!

foreign keys are like our friend borat, it is a foreigner connecting
worlds together.

>

for example, if we wanted to know the name of who has tom yum soup

  • we would search the food table and look for that dish name
  • find that corresponding userid
  • go back to users table and look at the user information
  • finally finding the name for that id!

there’s no need to re-store the name of who has tom yum soup!

and now we we create a table for transactions

  • we have to be super careful when creating keys like this though
  • having good design is incredibly important bringing me to my next point

schemas

  • relational databases are sensitive, meaning that making a lot of changes to the structure of tables, their features, and their keys is probably not the best idea.
  • before we go out and blind build a very complex database like this, we should probably draw it out and get a thought out plan of how to design a database.

we call these thought out plans schemas!

  • schemas are the logic behind databases explaining x and y
  • there are many different types of schemas in the data world but we don’t have to dive too deep into them for now
  • some are visual and some are a bit more complex
  • but just get a basic though idea of how to construct a diagram that shows each table (entity), it’s columns (fields), connections (keys), what is lacks, and more

some great questions to ask based on this

  • is there redundant information?
    - we call the process of evaluating this normalizing
    - it is a deep topic!
  • are the keys correct and well thought out?
  • what data types should we use for each column/field, should they contain null (empty values), and is the information sensitive or a special type of data?
  • what are we missing from the tables?

and most importantly, how scalable is the data?

  • i know it sounds weird, but we should design our database, thinking 5 to 10 to even 25 years down the line
  • because of sql’s sensitivity, scaling requires intricate database design from the beginning.

relational databases can be like dominoes, if one thing goes down, the whole system could too.

if you don’t start with a great base, stuff like this can happen.

so for this reason, an important thing to grasp is:

types of relationships

  • it’s a really good idea to constrain our table relationships
  • or in plain english: we should specify that exact relationship, we when we create a table.

there are going to be 3 main types of these relationships

  • one to one
  • one to many/many to one
  • many to many.
  • don’t worry if it doesn’t make sense yet, it will click as things go on.
  • i chose a not totally clear example, because there is not always a 100% correct choice for constraining
  • this will hold true with all lot of other things in the database world.
  • sometime there is a clear answer but each has it’s own pros and cons so most of the time it depends on the situation at hand!

but in my opinion, one to many makes the most sense!

building a better database

let’s fix up our user information table

  • password data needs security. we should encrypt the passwords with hashing (nerd encryption)
  • adding in phone numbers, active status, restaurant, and last login data would be a good idea.
  • and we should store the address/location of our users.

more of that relational database beauty!

transaction table

  • our transaction table is pretty good
  • but what about that deliver feature, we should probably store some information about that!
  • we’re going to need a couple more tables to handle that!

although you can’t see the full power of the key system yet, hopefully you can see this type of design makes everything so much simpler and scalable.

  • we can directly access the user’s address

food items

  • our food items table is pretty good, but let’s add some nutritional info, and a image type.

we have a great foundation and now we can design:

a refined schema!

relationships in this schema!

  1. users to addresses: one-to-one
  • each user has one address

2. users to food items: one-to-many

  • a user can list multiple food items

3. transactions to users (buyers and sellers): many-to-one (twice)

  • each transaction has one buyer
  • each transaction has one seller
  • but each seller and buyer can have multiple transactions

4. transactions to food items: one-to-one

  • each transaction references one food item

5. transactions to delivery (optional): one-to-one

  • each transaction may be linked to one delivery

6. delivery to user address: one-to-one

  • each delivery is linked to a user’s address

7. delivery to couriers: many-to-one

  • each delivery can be linked to a courier
  • a courier can do multiple deliveries

8. delivery to transactions: one-to-one

  • each delivery is linked to a one transaction

what would our database look like without these 8 relationships!

anyways,

learning databases is hard, because most of the stuff is incredibly simple. but the only way to really learn it, is to encounter it. so why not just build an entire database and fill it with fake data and fake scenarios! that’s what i will do.

anyways, here is an absolutely insane video:

have a amazing day and

godspeed!

--

--