Data cleaning is the most crucial step in any project; if we do not handle it properly, it might lead us to a completely different conclusion. More often than not, we spend half of our time in a project cleaning data.
In this article, I am going to share some Python functions that can help us with data cleaning, especially in:
We are going to use pandas in this project, so let's install the package if we do not have it yet.
conda install pandas
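This excerpt doesn't list the functions the article covers. The following is only a minimal sketch of a few common pandas cleaning calls (`str.strip`, `drop_duplicates`, `isnull`, `fillna`, `dropna`) on a small DataFrame. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, made up purely for illustration
df = pd.DataFrame({
    "name": [" Alice", "Bob ", "Bob ", "Chloe", None],
    "age": [25, 30, 30, np.nan, 40],
    "city": ["KL", "Singapore", "Singapore", "Penang", "KL"],
})

# Strip stray whitespace from a text column (missing names stay missing)
df["name"] = df["name"].str.strip()

# Drop exact duplicate rows (the two "Bob" rows are identical)
df = df.drop_duplicates()

# Count missing values per column before deciding how to treat them
print(df.isnull().sum())

# Fill missing ages with the median age, then drop rows with no name
df["age"] = df["age"].fillna(df["age"].median())
df = df.dropna(subset=["name"]).reset_index(drop=True)
print(df)
```

Filling versus dropping is a per-column judgement call. Here a missing age is imputed, while a row without a name is treated as unusable.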
Matplotlib is one of the most famous libraries for visualising data in Python. In this article, I will go through the key functions that I think are useful and important, so that those who are new to Matplotlib can pick it up quickly.
As usual, we need to install the matplotlib package if we do not have it.
pip install matplotlib
Let’s import the package that we need in this tutorial.
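Here is a minimal sketch of the usual import convention and a basic line plot using the object-oriented `fig, ax` interface. The data points and output file name are hypothetical. The `matplotlib.use("Agg")` line is an assumption for running headless (for example on a server) and can be dropped in Jupyter or an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

fig, ax = plt.subplots(figsize=(6, 4))   # one figure with a single set of axes
ax.plot(x, y, marker="o", label="series A")
ax.set_title("A simple line plot")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("line_plot.png")             # write the figure to disk
```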
Some of us are familiar with data manipulation in SQL but not in Python, so we tend to switch frequently between SQL and Python within a project, which reduces our efficiency and productivity. In fact, we can achieve results similar to SQL in Python using pandas.
As usual, we need to install the pandas package if we do not have it.
conda install pandas
We will be using the famous Titanic dataset from Kaggle in this session.
After installing the package and downloading the data, we need to import them into our Python environment.
We will use pandas…
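As a rough sketch of the SQL-to-pandas mapping, the snippet below uses a handful of made-up rows. They only borrow a few column names from the Titanic dataset (`Pclass`, `Sex`, `Fare`, `Survived`). With the real data you would load the downloaded file with `pd.read_csv` instead. Each pandas expression is paired with the SQL query it roughly corresponds to, and the table name `titanic` in the comments is hypothetical.

```python
import pandas as pd

# Made-up rows that only mimic a few Titanic column names
df = pd.DataFrame({
    "Pclass":   [1, 3, 3, 2, 1],
    "Sex":      ["female", "male", "female", "male", "male"],
    "Fare":     [71.28, 7.25, 7.93, 13.00, 53.10],
    "Survived": [1, 0, 1, 0, 0],
})

# SELECT Sex, Fare FROM titanic WHERE Pclass = 1
first_class = df.loc[df["Pclass"] == 1, ["Sex", "Fare"]]

# SELECT Pclass, AVG(Survived) AS survival_rate FROM titanic GROUP BY Pclass
survival_by_class = (
    df.groupby("Pclass", as_index=False)["Survived"].mean()
      .rename(columns={"Survived": "survival_rate"})
)

# SELECT * FROM titanic ORDER BY Fare DESC LIMIT 3
top_fares = df.sort_values("Fare", ascending=False).head(3)

print(first_class, survival_by_class, top_fares, sep="\n\n")
```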
With over 2 billion users, Facebook is the most popular social platform today. Most Facebook users, including me, use the Facebook Messenger app to communicate with each other. I am really curious about how I behave on this platform and how the rise of other social media apps like Instagram, WeChat and Snapchat impacts my usage of Facebook.
Today, I am going to show you how to download your Facebook message data and how to analyse it.
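As a starting point for the analysis, here is a minimal sketch. It assumes that each conversation in the JSON export contains a top-level `messages` list whose items carry `sender_name`, `timestamp_ms` and, for text messages, `content`. Please check this against your own download. The helper name `load_messages` and the sample messages are hypothetical.

```python
import json
import pandas as pd

def load_messages(raw):
    """Turn one parsed conversation (assumed export layout) into a DataFrame."""
    df = pd.DataFrame(raw["messages"])
    # Facebook timestamps are assumed to be milliseconds since the Unix epoch
    df["timestamp"] = pd.to_datetime(df["timestamp_ms"], unit="ms")
    return df

# Made-up conversation in the assumed layout; in practice, load your exported
# conversation file with json.load instead
raw = json.loads("""
{"messages": [
  {"sender_name": "Alice", "timestamp_ms": 1546300800000, "content": "Happy new year!"},
  {"sender_name": "Me",    "timestamp_ms": 1546304400000, "content": "You too!"},
  {"sender_name": "Me",    "timestamp_ms": 1548979200000, "content": "Lunch?"}
]}
""")

messages = load_messages(raw)

# Number of messages sent by each person in each calendar month
per_month = messages.groupby(
    [messages["timestamp"].dt.to_period("M"), "sender_name"]
).size()
print(per_month)
```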
I believe most of us have run this command to execute a Python script:
$ python main.py
Can we do a little bit more, like defining our own arguments for this script? The answer is definitely yes!
$ python main.py arg1 arg2
We are going to use the Python argparse module to configure command-line arguments and options. The argparse module makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv. …
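Here is a minimal sketch of how `python main.py arg1 arg2` could be wired up with argparse. The argument names mirror the example command. The `--verbose` flag and the choice to parse `arg2` as an integer are hypothetical additions to show an option and type conversion.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Example script with custom arguments")
    # Two positional arguments, matching `python main.py arg1 arg2`
    parser.add_argument("arg1", help="first positional argument")
    parser.add_argument("arg2", type=int, help="second positional argument, parsed as an int")
    # An optional flag: False unless --verbose (or -v) is given
    parser.add_argument("-v", "--verbose", action="store_true", help="print extra output")
    return parser

# In main.py you would call parse_args() with no arguments so it reads sys.argv[1:];
# here an explicit list simulates `python main.py hello 3 --verbose`
args = build_parser().parse_args(["hello", "3", "--verbose"])
print(args.arg1, args.arg2, args.verbose)
```

argparse also generates `-h/--help` output automatically from the `help` strings. If a required positional argument is missing, it exits with a usage message.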
Although Python 2 is officially deprecated (Python 3, your time is now!), I believe some of us still have to maintain existing Python 2 projects before fully porting them to Python 3.
In this article, I will show you how to manage your Python virtual environments using Conda. Conda is a package and environment management system that allows us to create and switch environments easily on our local machine.
I will be using the iTerm terminal in macOS. …
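When you switch between a Python 2 and a Python 3 environment, it helps to confirm which interpreter a script actually runs under. The following is a small sketch. It relies on the `CONDA_DEFAULT_ENV` environment variable, which `conda activate` sets to the active environment's name. The helper name `describe_environment` is hypothetical, and the code avoids Python 3-only syntax so it runs under both versions.

```python
import os
import sys

def describe_environment():
    """Report which conda environment and interpreter this script runs in."""
    return {
        # Set by `conda activate`; None when no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root folder of the running interpreter (the environment's directory under conda)
        "prefix": sys.prefix,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
    }

print(describe_environment())
```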
Spyder was my all-time favourite when I started programming in Python. I like its variable explorer, IPython console and project directory selector. The variable explorer allows me to check and debug variables without printing them. The IPython console lets me run quick tests before writing them into a script, while the project directory selector allows me to change directories without using the command line. How convenient!
Python development in Visual Studio Code has improved dramatically over the past year, and it has piqued my interest. I have been using it for a few months…
As a Data Analyst, more often than not I receive requests like "Can you send this report to me on a weekly basis?" or "Can you send this data to me through email every month?". Sending a report is easy, but it becomes irritating if you have to do the same thing every week. That's why you should learn how to use Python to send emails and reports and schedule the script on your server.
In this article, I will show you how to extract data from Google BigQuery and send it as a report. …
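The BigQuery extraction step is not shown in this excerpt. As a minimal sketch of the sending side, the snippet uses the standard library's `email.message.EmailMessage` to attach query results as a CSV file and `smtplib` to send them. The helper names, addresses, SMTP host, port and credentials are all hypothetical placeholders. The rows are made up and stand in for a query result.

```python
import csv
import io
import smtplib
from email.message import EmailMessage

def build_report_email(rows, sender, recipient, subject):
    """Build an email with the given rows attached as report.csv."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)       # serialise rows to CSV text in memory

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("Hi,\n\nPlease find this week's report attached.\n")
    # A str payload is sent as text/<subtype>; a filename marks it as an attachment
    msg.add_attachment(buffer.getvalue(), subtype="csv", filename="report.csv")
    return msg

def send_email(msg, host, port, username, password):
    """Send via an SMTP server using STARTTLS (all connection details are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)

# Made-up rows standing in for a query result
rows = [["date", "active_users"], ["2019-01-07", 120], ["2019-01-14", 135]]
report = build_report_email(rows, "me@example.com", "manager@example.com", "Weekly report")
# send_email(report, "smtp.example.com", 587, "me@example.com", "app-password")
```

Keeping message construction separate from sending makes the report easy to inspect or test without touching a mail server. The scheduler on your server then only needs to run the script.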
Google Search Console (previously Google Webmaster Tools) is a web service offered by Google that helps you monitor and maintain your site's presence in Google Search results. It is also a great tool for analysing the organic search traffic from Google that leads to your site.
As a Data Analyst, most of the time I need to share extracted data with my product managers and stakeholders, and Google Drive is always my first choice. One major issue here is that I have to do it on a weekly or even daily basis, which is very boring. All of us hate repetitive tasks, including me.
Fortunately, Google provides APIs for most of its services. We are going to use the Google Drive API and PyDrive to manage our files in Google Drive.
Before going into coding, you should get Google Drive API access ready. I have written an article on…
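Once API access is ready, here is a minimal sketch of an upload with PyDrive. It assumes `pip install PyDrive` and an OAuth client file named `client_secrets.json` in the working directory, which is the file PyDrive's `GoogleAuth` reads by default. The helper name `upload_to_drive` is hypothetical. The imports sit inside the function so the module can be imported even where PyDrive is not installed.

```python
def upload_to_drive(local_path, title):
    """Upload a local file to Google Drive with PyDrive and return its file id.

    Assumes PyDrive is installed and client_secrets.json (OAuth client for the
    Drive API) is in the working directory.
    """
    # Imported lazily so importing this module does not require PyDrive
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive

    gauth = GoogleAuth()
    gauth.LocalWebserverAuth()  # opens a browser window for the OAuth consent flow
    drive = GoogleDrive(gauth)

    drive_file = drive.CreateFile({"title": title})  # new file metadata in Drive
    drive_file.SetContentFile(local_path)             # content comes from the local file
    drive_file.Upload()
    return drive_file["id"]
```

For a scheduled job you would not want a browser prompt on every run. PyDrive can cache credentials between runs, but how you configure that depends on your setup.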
Data Analyst, Game Designer and future Pirate King