Natural Language Processing

Create a knowledge graph using spaCy for natural language processing

Photo by Andrew Itaga on Unsplash

Today, we will take the contents of a Wikipedia article and prepare it for natural language processing. We will use spaCy to process the text and use Power BI to visualize our graph.

# for manipulating dataframes
import pandas as pd
# for webscraping
from requests import get
from bs4 import BeautifulSoup
# for natural language processing
import spacy
import en_core_web_sm
nlp = en_core_web_sm.load()

Then, we’ll issue a GET request to Wikipedia like so:

url = 'https://en.wikipedia.org/wiki/QAnon'
response = get(url)
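Once the page has been fetched, the HTML still has to be turned into plain text before spaCy can process it. The following is only a minimal sketch of that step, not the article's own code: the helper name extract_paragraphs and the sample HTML string are hypothetical, and it assumes the text of interest lives in <p> elements.

```python
from bs4 import BeautifulSoup


def extract_paragraphs(html):
    """Return the non-empty text of every <p> element in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for p in soup.find_all("p"):
        text = p.get_text().strip()
        if text:  # skip paragraphs that contain only whitespace
            paragraphs.append(text)
    return paragraphs


# Hypothetical stand-in for response.text so the sketch runs offline.
sample_html = (
    "<html><body><h1>Title</h1>"
    "<p>First paragraph.</p><p> </p><p>Second <b>one</b>.</p>"
    "</body></html>"
)
paragraphs = extract_paragraphs(sample_html)
print(paragraphs)
```

With a live request, the same helper could be called as extract_paragraphs(response.text), ideally after checking that response.status_code is 200.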

To get an idea…


Data Science Tricks

A friendly tutorial on getting zip codes and other geographic data from street addresses.

Photo by Jasmin Sessler on Unsplash

Knowing how to deal with geographic data is a must-have for a data scientist. In this post, we will play around with the MapQuest Search API to get zip codes from street addresses along with their corresponding latitude and longitude to boot!
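To make the idea concrete, here is a rough sketch of turning a street address into a zip code, latitude, and longitude. It is an illustration under stated assumptions rather than the post's actual code: it targets MapQuest's Geocoding API address endpoint (not necessarily the Search API the post uses), the response shape is assumed to follow results → locations → postalCode / latLng, and the helper names, the API key placeholder, and the sample payload are all hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the MapQuest Geocoding API.
GEOCODE_ENDPOINT = "https://www.mapquestapi.com/geocoding/v1/address"


def build_geocode_url(street, city, state, api_key):
    """Build the request URL for a single street address."""
    location = f"{street}, {city}, {state}"
    query = urlencode({"key": api_key, "location": location})
    return f"{GEOCODE_ENDPOINT}?{query}"


def parse_first_location(payload):
    """Pull zip code, latitude, and longitude from the first match, if any."""
    try:
        loc = payload["results"][0]["locations"][0]
    except (KeyError, IndexError):
        return None
    lat_lng = loc.get("latLng", {})
    return {
        "zip_code": loc.get("postalCode"),
        "latitude": lat_lng.get("lat"),
        "longitude": lat_lng.get("lng"),
    }


# Made-up payload in the assumed response shape; the values are illustrative only.
sample_payload = {
    "results": [
        {
            "locations": [
                {"postalCode": "78205", "latLng": {"lat": 29.42, "lng": -98.49}}
            ]
        }
    ]
}

url = build_geocode_url("100 Main St", "San Antonio", "TX", "YOUR_KEY")
print(url)
print(parse_first_location(sample_payload))
```

Keeping the URL building and the response parsing in separate functions means the parsing logic can be checked against a saved payload without spending API calls.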

The Scenario

In 2019, my friends and I participated in the CivTechSA Datathon. At one point in the competition, we wanted to visualize the data points and overlay them on San Antonio's map. The problem was, we had incomplete data. Surprise! All we had were a street number and a street name: no zip code, no latitude, and no longitude. …


Data Science Tools

A hands-on introduction to Microsoft’s analytics tool.

Photo by Jacob Bowman on Unsplash

As a data scientist, you’ll need to learn to be comfortable with analytics tools sooner or later. In today’s post, we will dive headfirst and learn the very basics of Power BI.

The Data

The dataset that we will be using for today’s hands-on tutorial can be found at https://www.kaggle.com/c/instacart-market-basket-analysis/data. This dataset is “a relational set of files describing customers’ orders over time.” Download the zip files and extract them to a folder on your local hard drive.

Download Power BI Desktop

If you haven’t already, go to https://powerbi.microsoft.com/desktop


Data Science / Power BI Visualization

A visual step-by-step guide to forecasting using Power BI.

Photo by Luma Pimentel on Unsplash

In this post, we’ll go through the process of creating a forecast in Power BI.

Get the Data

You can download the dataset that I used here. It contains daily female births in California in 1959¹. For a list of other time-series datasets, check out Jason Brownlee’s article.

Let’s load the data into Power BI. Open up Power BI and click on “Get data” on the welcome screen as shown below.


Data Science / Opinion

Learning a few basic skills just might make a data scientist out of you yet!

Photo by Ilham Rahmansyah on Unsplash

Every once in a while, I come across an article that decries online data science courses and boot camps as pathways towards getting a data science job. Most of these articles aim not to discourage but to serve as a reminder to take a hard look in the mirror first and realize what we’re up against. However, a few detractors have proclaimed that the proliferation of these online courses and boot camps has caused the degradation of the profession.

Bridging the Skill Gap

Data science has captured the popular imagination ever since Harvard Business Review dubbed data scientist as…


Exploring Trump

For adventurous beginners in NLP.

Photo by Lorenzo Rui on Unsplash

In a previous post, we set out to explore the dataset provided by the Trump Twitter Archive. This is part three of the Exploring Trump series.

PyCaret is an open source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within seconds in your choice of notebook environment.¹

PyCaret does a lot more than NLP. It also does a whole slew of both supervised and unsupervised ML including classification…


Coding Tools

A visual step-by-step guide to replacing Mac’s default terminal application with iTerm2.

Photo by Safar Safarov on Unsplash

This weekend, I decided to restore my MacBook Pro to factory settings so I can have a clean start at setting up a programming environment.


Data Science / Careers

Opinionated advice for the rest of us. Love of math, optional.

Photo by Andrew Itaga on Unsplash

Since my article about my journey to data science, I’ve had a lot of people ask me for advice regarding their own journey towards becoming a data scientist. A common theme started to emerge: aspiring data scientists are confused about how to start, and some are drowning because of the overwhelming amount of information available in the wild. So, what’s one more article, right?


Exploring Trump

For adventurous beginners in NLP.

Photo by Leon Seibert on Unsplash

In a previous post, we set out to explore the dataset provided by the Trump Twitter Archive. This is part two of the Exploring Trump series.

Housekeeping

Let’s import pandas and also set the display options so Jupyter won’t truncate our columns and rows. Let’s also set a random seed for reproducibility.

# for manipulating data
import pandas…
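As a rough sketch of the housekeeping described above (not the article's own code), the display options and seed could be set as follows. The seed value and the tiny demo dataframe are hypothetical, and seeding both Python's random module and NumPy is an assumption about which random number generators matter here.

```python
import random

import numpy as np
import pandas as pd

# Show every column and row instead of letting Jupyter truncate them.
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# Hypothetical seed value; any fixed integer works.
SEED = 493
random.seed(SEED)
np.random.seed(SEED)

# Passing the seed explicitly makes a random sample repeatable.
df = pd.DataFrame({"n": range(10)})
first = df.sample(3, random_state=SEED)
second = df.sample(3, random_state=SEED)
print(first.equals(second))
```

Setting max_rows to None can flood the notebook when a large dataframe is displayed, so it may be worth resetting it with pd.reset_option("display.max_rows") once the exploration is done.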


Data Science / Python NLP Snippets

A beginner’s guide to PyCaret’s natural language processing module.

Photo by Gabriel Gurrola on Unsplash

I remember a brief conversation with my boss’ boss a while back. He said that he wouldn’t be impressed if somebody in the company built a face recognition tool from scratch because, and I quote, “Guess what? There’s an API for that.” He then went on about the futility of doing something that’s already been done instead of just using it.

Ednalyn C. De Dios

Data Scientist. Breaking things to solve problems.
