NLP Deep Learning

Photo by Patrick Tomasso on Unsplash

In this post, I will summarize what I learnt from Natural Language Processing with Deep Learning, a course offered by Stanford University, including the Winter 2017 video lectures and the Winter 2019 lecture series. Both lecture series were taught by Prof. Christopher Manning at Stanford University. …

Exploration of Spark Performance Optimization

Photo by NASA on Unsplash

Welcome back! This is the third part of the series on Exploration of Spark Performance Optimization. In the first two posts, we discussed the characteristics of Spark and how to use the YARN web UI to check code performance. …

Exploration of Spark Performance Optimization

Photo by NASA on Unsplash

Welcome back to the Exploration of Spark Performance Optimization series! In Introduction To Apache Spark, I briefly introduced the core modules of Apache Spark. Although I mentioned in the last post that Spark enables distributed computing and that end users barely need to know about the resource and task…

Making Sense of Big Data, Exploration of Spark Performance Optimization

Photo by NASA on Unsplash

Apache Spark is a popular framework in the field of Big Data. Coming from a background of coding in Python and SQL, it didn’t take me long to get started with Spark. However, without understanding its mechanisms, I was often confused at the very beginning. The shift…

Feature Engineering Using PySpark

Photo by Markus Spiske on Unsplash

As data scientists, we spend most of our time on feature engineering. To develop features that broadly cover multiple dimensions of the targets, we usually need to extract information from multiple data sources. As a result, I find table joining to be a particularly useful technique to merge features…

Feature Engineering Using PySpark

Photo by Ales Krivec on Unsplash

When we travel to a foreign country, the different time systems can be a headache. Sometimes, after several hours of flight, we might arrive in a foreign country at a local time even earlier than when we departed. It is not that we can magically travel back in…

Photo by Aaron Burden on Unsplash

In my previous posts, I revisited the Python Pandas library, which handles tabular data. Another important aspect of data analysis is data visualization. Thus, in this post, I will summarize 5 quick facts about Python Matplotlib, which is one of the most popular Python visualization libraries.

1. Figure Anatomy

Matplotlib supports object-oriented programming…

Photo by Aaron Burden on Unsplash

Following my last post on 10 Quick Facts About SQL And SQL Server, I continue to revisit key concepts in the Python Pandas library. This post summarizes 10 quick facts about Python Pandas that I found particularly useful.

For code examples, I will use the historical Apple stock price data, ranging…

Photo by Aaron Burden on Unsplash

Recently, I have been revisiting some key concepts about SQL. I am using Microsoft SQL Server for practice. This post will summarize 10 quick facts about SQL and SQL Server. …

Photo by Clément H on Unsplash

I never had a chance to learn Python in the classroom. Instead, I picked up the language through self-study. A disadvantage of not learning a programming language systematically is that sometimes I cannot fully understand other people’s Python code and I might not be able to make full use…

Jiahui Wang

Motivated to LEARN and SHARE
