Creating a Database on PostgreSQL — Airbnb#006

And failing miserably at it

Douglas Rocha
3 min read · Sep 8, 2022

Welcome to another journal on my Airbnb project! Read more about it here. This is the next step after First Look at the Data pt. 3 — Airbnb#005.

Well, there is a large amount of data to be used here. My first thought when learning that was to use SQL to store all of it properly. Luckily, the Power BI course I'm doing right now (for more on that, take a look at First Steps on Power BI and Car Sales Dashboard, or any of my other posts about projects I did thanks to it) had a couple of chapters on PostgreSQL. The first one, which I have already completed, explained how to install PostgreSQL and use it to create databases. "That is exactly what I need!", or so I thought.
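To give you an idea of why I was so confident: creating the database itself really is a one-liner in Postgres (the name here is just what I'd pick for this project):

CREATE DATABASE airbnb;

The hard part, as I was about to find out, is everything that comes after.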

It was indeed a great bunch of lessons and I learned a whole lot; the problem came when I tried to actually create a database for this Airbnb project. As I usually like to do, I started with what was hardest and took on the "widest" dataset, even though it was not the longest. If you have read my other journals, you know that is listings.csv, with a whopping 74 columns. That meant (at least as far as I have learned) that I'd have to create all of those columns manually.

Maybe I wanted to stall on that, I don't know, but I chose instead to start by modeling the database. And I actually started doing it; this is what it looked like:

Yes, I am aware it looks terrible. I tried following the material from a Databases class I took during my degree and two videos I found on YouTube: Entity Relationship Diagram (ERD) Tutorial — Part 1 and Entity Relationship Diagram (ERD) Tutorial — Part 2. Don't blame them, blame me. I'm not a database architect, nor do I aspire to be (you can see why). Nevertheless, I started realizing I wouldn't be able to properly create the database with all the relationships I wanted, and quit trying to model it.

Then came the job of manually creating 74 columns on PostgreSQL. I did it: created all of them, one by one. I started through the program's interface but soon switched to writing the actual query. It ended up looking something like this:

CREATE TABLE "Unmodeled"."Listings"
(
    listing_id integer NOT NULL,
    listing_url text,
    scrape_id bigint,
    last_scraped timestamp without time zone,
    name text,
    description text,
    neighborhood_overview text,
    picture_url text,
    host_id integer NOT NULL,
    host_url text,
    -- ... (the remaining columns, omitted here) ...
    calculated_host_listings_count_private_rooms integer,
    calculated_host_listings_count_shared_rooms integer,
    reviews_per_month integer,
    PRIMARY KEY (listing_id)
)
TABLESPACE pg_default;

ALTER TABLE IF EXISTS "Unmodeled"."Listings"
    OWNER TO postgres;

Next was importing the csv file. I couldn't even count the number of errors I faced, but the biggest one was definitely the price column: it was formatted like "$100", and PostgreSQL couldn't read that as a number. And that wasn't the only data type problem; imagine how many errors show up when trying to import 74 columns at once.
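For reference, the kind of workaround I suspect exists (I haven't tried it, so take this as a sketch rather than as something I actually ran) would be to declare price as plain text in the CREATE TABLE, import the file, and only then strip the symbols and cast the column to a number:

-- Import the csv first; the file path here is made up for the example
-- (psql's \copy does the same thing from the client side).
COPY "Unmodeled"."Listings"
    FROM '/tmp/listings.csv'
    WITH (FORMAT csv, HEADER true);

-- "$1,100.00" becomes 1100.00: strip "$" and "," and cast in place.
ALTER TABLE "Unmodeled"."Listings"
    ALTER COLUMN price TYPE numeric
    USING replace(replace(price, '$', ''), ',', '')::numeric;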

Result: I quit. Yes, I gave up on creating the database and using SQL as the first tool to analyze this data. I knew (and still know) there must be a solution to this. My quick research didn't give me an answer, but I'm sure that a few more chapters of this course, or maybe other courses down the line, will make it clear to me how to solve this.

I wanted to tell you what the next step is going to be, but I guess I'll have to do some thinking and learning before finding that out. See you in the next journal!

