Importing the Data into Power BI — Airbnb#007

Douglas Rocha
4 min readSep 9, 2022

--

Welcome to another journal on my Airbnb project! Read more about it here. This is the following step to Creating a Database on PostgreSQL — Airbnb#006.

So, if you read my previous journal you know I’ve given up on trying to import the csv files into PostgreSQL, at least for the time being. Knowing I had a file with over 9 million lines I thought SQL was the only possible answer to that. But then, doing the Power BI course I’m currently enrolled in, the teacher said a phrase that really caught my attention. I don’t have it exactly how he said but it was something like:

“You will need Power BI to store data when Excel can’t.”

Somehow that was exactly my problem. The thing was I didn’t know Power BI could store more data than Excel (in reality I thought its capacity was even smaller). But, hearing that, the idea of importing my datasets to PowerBI came up. And, well, experimentation is never a bad idea. I tried importing the calendar.csv file, the biggest one with over 400Mb and the frightening 9 million rows. It definitely took a while to complete the importing process but, surprise surprise (or not, if you are ever so slightly more used to Power BI than me) it worked! All of the 9 million lines fit in Power BI. Some transformations were due, of course, but it was far easier — for me at least — to perform them in Power BI than in PostgreSQL. And we are going to cover them here as well.

When I first imported it, it looked like this:

Out of the bat we can see that the dates are right (they are in Portuguese because of the language settings in my OS, so that is not a problem). Although, the ids as appearing as integers, we don’t want that. Furthermore, the values in price and adjusted_price are appearing as text, which is not what we want. So, to Transform Data we go.

With just a couple of clicks we can solve our first problem and have the id column be a text as it should:

To solve the other may be a bit more difficult. If we try to just change the data type as we have done here, it will flag an error in every single cell because of the dollar sign. So first we need to remove the dollar sign, or, rather, we are going to replace it with nothing. Right clicking the column name brings up a menu with a lot of options, among which is the Replace Values option.

Clicking it brings up this window where we can type what we want to replace and what we want to replace it with. In this case, I want to replace the dollar sign with nothing. Hit Ok and I have this:

All of the values are already in “number shape” and we can now change the data type exactly as we did before. Repeat the process on adjusted_price and I have my dataset ready. … Or do I?

For some awkward reason, the dates column lost its data type and became text:

The level of difficulty in transforming that into date depends on how your OS and the environment of Power BI are set region-wise. Mine are set correctly as USA’s date format, so with a couple of clicks, how I’ve done before, they now are dates:

And now I really have a working dataset! The problem I had with PostgreSQL was solves ever so simply by just using Power BI instead. Of course I intend to try again later to have all of this data as a single database on PostgreSQL that I can analyze with SQL, but for now this should be enough.

There are other files and tables to be imported into Power BI so I can analyze the data as a whole. But for now maybe I will analyze this set of data on it’s own and see what it tells me. See you in the next journal!

--

--

Douglas Rocha

Software Engineer | Working Java, React, SQL and Python | Writing Best Coding Practices, Clean Code and Software Engineering