Web scraping has three simple steps:
The top Python libraries for web scraping are Requests, Selenium, Beautiful Soup, pandas, and Scrapy. Today, we will cover only the first four and save the fifth, Scrapy, for another post (it requires more documentation and is relatively complex). Our goal here is to quickly understand how the libraries work and to try them for ourselves.
As a practice project, we will use this 20 dollar job post from Upwork:
There are two links that the client wants to scrape and we will focus on the second one. It’s a webpage for publicly traded companies listed in…
If you’re like me, then you use a spreadsheet as a database. Yes, I said it. And no, no one will judge here. This is a safe place for us heathens. After all, who needs a full-blown database for projects that will either die or evolve, right?
The problem with data, regardless of where it’s stored, is that it’s only useful if it’s visible. And getting data in front of people can be a challenge sometimes. Good thing it’s easy to do when the people are in Slack and the data is in spreadsheets!
Here are some other use cases:
Our goal in this post is not to predict, as accurately as possible, the price of bitcoin tomorrow. Instead, we want to see how we can use a machine learning algorithm called Random Forest to create a model that can predict bitcoin prices using historical data on bitcoin supply and demand.
Nota bene: The random forest algorithm, while awesome in many ways, has no awareness of time, meaning the price predictions in this post will ignore seasonality. Again, our goal is NOT to accurately predict bitcoin prices but to see random forest in action.
Specifically, we are interested in how the following factors affect the average market price of bitcoin across major bitcoin…
In this tutorial, we will create our own trivia bot! The questions and answers will be stored in a Google Spreadsheet and we will write our program inside its script editor. We will be using webhooks to connect to Telegram. If you have no idea how to do this, don’t worry; I wrote a seven-step tutorial here.
We will trigger the trivia bot by simply sending a message. Our bot will evaluate whether the message is the correct answer. If so, it will send the next question; otherwise, it will repeat the current question. Here’s the finished product:
There are multiple ways to combine data in Pandas:
An append looks like this:
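It’s what you’d use if you want to stack dataframes vertically. Very straightforward, just like its syntax: df1.append(df2, sort=False). Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, pd.concat([df1, df2], sort=False) does the same job.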
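Here is a minimal sketch of that vertical stack using pd.concat, which works across pandas versions. The two dataframes and their column names are made up for illustration:

```python
import pandas as pd

# Hypothetical example frames; the column names are invented for illustration.
df1 = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})
df2 = pd.DataFrame({"name": ["Cy"], "score": [70]})

# Stack df2 underneath df1. ignore_index=True renumbers the rows from 0,
# so the result does not carry duplicate index labels from the inputs.
stacked = pd.concat([df1, df2], ignore_index=True, sort=False)
print(stacked)
```

Leaving out ignore_index=True keeps each frame's original row labels, which is closer to what append() did by default.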
It takes minimal coding to create your own Telegram bot. In fact, you don’t even need to have a code editor installed to start building one. By the end of this post, you will have learned how to create your personal interactive Telegram bot with just a Google Spreadsheet. The final product will be a bot that can reply to your messages. Something like this:
Before I hash out the step-by-step instructions, it’s important that you have a conceptual understanding of how your bot is going to work. …
There are three main ways to group and aggregate data in Pandas.
There’s not a lot of difference between these functions except performance and readability. The groupby() function has the fastest runtime amongst the three but that is barely noticeable if you are running it against a small dataframe. In this post we will go through the syntax of each function so you can decide which one is most convenient for you.
But first, let us be clear on what we mean by “group and aggregate.” …
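As a rough illustration of grouping and aggregating with groupby(), here is a small sketch. The sales dataframe and its columns are hypothetical, invented only for this example:

```python
import pandas as pd

# Hypothetical sales data, invented for this example.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "amount": [100, 150, 80, 120],
})

# Group: split the rows by the value in "region".
# Aggregate: collapse each group's "amount" values into one number with sum().
totals = sales.groupby("region")["amount"].sum()
print(totals)
```

The result is a Series indexed by region, with one total per group.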
There are multiple ways to filter data inside a Dataframe:
The name of this function is often a source of confusion. Contrary to what you might expect, the filter function cannot filter values inside a Dataframe. It can only filter the row and column labels.
To demonstrate what I mean, we will use a Dataframe called books that has data of the top 100 books from 1990 to 2010:
With the filter() function, I can filter the columns I want to see — for example, if I’m interested to know which authors made it to the list, I filter for the Author…
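To make the label-versus-value distinction concrete, here is a short sketch. The books dataframe below is a tiny hypothetical stand-in with placeholder values, not the real top-100 dataset:

```python
import pandas as pd

# Hypothetical stand-in for the books dataframe; all values are placeholders.
books = pd.DataFrame({
    "Title": ["Book A", "Book B", "Book C"],
    "Author": ["Author X", "Author Y", "Author X"],
    "Year": [1995, 2001, 2008],
})

# filter() works on labels: keep only the column labelled "Author".
authors = books.filter(items=["Author"])

# like= matches labels containing a substring. It still looks at the
# column names, never at the values inside the cells.
auth_like = books.filter(like="Auth", axis=1)

# To filter on values, a boolean mask is one option instead of filter().
recent = books[books["Year"] > 2000]
```

Here authors and auth_like both contain just the Author column, while recent keeps only the rows whose Year is after 2000.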
There are two main ways to locate data inside a dataframe.
Here’s an example dataframe:
To select data by integer location, we will use the iloc method which, yep, literally translates to “integer location”.
df.iloc[ ]
.iloc accepts:
Using .iloc with an integer will select a single row of data. Here we selected our first row using the integer location, 0. …
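A quick sketch of a few input forms that .iloc accepts, using a small made-up dataframe (the column names and index labels are assumptions for illustration):

```python
import pandas as pd

# Hypothetical dataframe with string index labels, to show that
# .iloc ignores the labels and uses positions only.
df = pd.DataFrame(
    {"fruit": ["apple", "banana", "cherry"], "qty": [3, 5, 7]},
    index=["a", "b", "c"],
)

first_row = df.iloc[0]       # single integer: the first row, as a Series
first_two = df.iloc[[0, 1]]  # list of integers: those rows, as a DataFrame
sliced = df.iloc[0:2]        # slice: rows 0 and 1 (the end position is excluded)
cell = df.iloc[1, 1]         # row position 1, column position 1
```

Even though the index labels are "a", "b", and "c", position 0 still refers to the first row.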
Data exploration can be overwhelming for anyone who has little to no background in data analysis. There are technically no limits to what you can explore and there are no guidelines for what you should be looking for. But as the saying goes, all journeys begin with a single step. This post attempts to lay down not one but five easy steps you can follow when you’re exploring a dataset. I call it LAUGH:
Step 1: L-oad the data
Step 2: A-sk for definitions
Step 3: U-se questions
Step 4: G-et a feel
Step 5: H-ave next actions
By the end of this post, you will have learned what you should care about when exploring a dataset and how to do basic data exploration in a Jupyter Notebook. …