First Look at the Data pt. 2— Airbnb#004

Douglas Rocha
2 min read · Aug 30, 2022


Welcome to my journal on my Airbnb project! Read more about it here. This is the follow-up to First Look at the Data pt. 1 — Airbnb#003.

In the last journal I took a look at the biggest file: calendar.csv. The second biggest file is reviews.csv, so we are going to take a look at it now. This time LibreOffice Calc could open it, since it has “only” 418,470 rows. On the other hand, column F holds the whole comment, usually without a line break, so each cell is pretty long (which explains the size of the file).

The Data Dictionary I used as documentation in the last journal didn’t really have much information on these column names, but they are pretty self-explanatory. I just want to mention how I have been led to understand three of them: listing_id refers to the place listed on Airbnb; reviewer_id refers to the guest or, in this case, the person who wrote the review; and the column named just id, despite what one may think, does not refer to the comment but to the stay itself, that is, the hosting of that guest in that listing for that specific period of time.

A quick look at the data already draws my attention to some common words like “copacabana”, “beach” and “favela”. I already want to make a Word Cloud-like chart to look for these common words, but I know Power BI doesn’t offer that chart natively. So I’m adding one more item to the list of things to learn: how to build a Word Cloud in Power BI.
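Before building any chart, the idea of “common words” can be checked with a quick word count. The snippet below is only a minimal sketch in Python, not part of my Power BI workflow: the function name top_words, the tiny stop-word list and the sample comments are all made up for illustration and do not come from reviews.csv.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis needs a fuller one.
STOP_WORDS = {"the", "a", "and", "to", "of", "in", "is", "was", "it", "very"}


def top_words(comments, n=10):
    """Return the n most frequent non-stop-words across an iterable of comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase and keep only runs of letters, so "Beach!" and "beach" match.
        words = re.findall(r"[^\W\d_]+", comment.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up comments for illustration only, not rows from reviews.csv.
sample = [
    "Great place near Copacabana beach",
    "The beach is a short walk away",
    "Lovely view of Copacabana",
]
print(top_words(sample, n=2))
```

For the real file, the comments could be streamed with csv.DictReader instead of the sample list. The name of the comment column would have to be checked in the header first, since I only know it as column F.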

As this was an easy file to open and look into, I’ll also cover my first look at reviews_summary.csv here. The file named listings.csv is actually the third biggest one and should be next in line, but looking into reviews_summary.csv right after reviews.csv seems like a better idea.

As one could expect, this file has exactly the same number of rows as the “full version” but fewer columns. It could also be expected that, as the documentation had close to no information on the columns of the full version, this file isn’t even listed there. Still, it is clear what each column means, and the purpose of this table is mostly to record how many reviews a listing has had and when they occurred.
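Both observations, the row count and the number of reviews per listing, can be verified with a few lines of code. Again, this is just a sketch in Python under stated assumptions: the helper reviews_per_listing is hypothetical, I am assuming the summary file has a listing_id column and a date column, and the rows below are invented rather than taken from reviews_summary.csv.

```python
import csv
import io
from collections import Counter


def reviews_per_listing(csv_file):
    """Return (total_rows, Counter mapping listing_id -> number of reviews)."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row["listing_id"] for row in reader)
    return sum(counts.values()), counts


# Made-up rows for illustration only. The real file would be opened with
# open("reviews_summary.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "listing_id,date\n"
    "101,2022-01-05\n"
    "101,2022-03-17\n"
    "202,2022-02-11\n"
)
total_rows, counts = reviews_per_listing(sample)
print(total_rows, counts.most_common(1))
```

Running the same function on reviews.csv and comparing the two totals would confirm that both files have the same number of data rows.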

That is it for these two files. See you in the next journal!

