5 Python Libraries You Use a Lot

Uyen Nghb
Published in The Startup
4 min read, Aug 31, 2020

Python is one of the most common programming languages used by data scientists, not only because it is beginner-friendly but also because of its free libraries and online resources.

Python libraries are bundles of code that you can install into a project to complete tasks more quickly. Because Python is a coder's tool, these libraries are installed separately rather than built into a single piece of software (compare Tableau, where you create visualizations without writing code), which gives you a lot of freedom to customize your project. You don't have to write a ton of code to manipulate data, perform mathematics, or create visualizations, yet you keep that flexibility.

To make the most of this resource, I identified five of the most common Python libraries that you will use a lot as a data scientist.

Pandas

Pandas is one of the most widely used Python packages: an open-source library for data analysis, manipulation, and visualization. Built on Python and NumPy, it lets Python users work with tabular data through a one-dimensional data structure (Series) and a two-dimensional data structure (DataFrame).
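To make the two structures concrete, here is a minimal sketch using made-up fruit prices and sales; the variable names and numbers are purely illustrative. A Series holds one labeled column of values, and a DataFrame is a table whose columns are Series.

```python
import pandas as pd

# A Series is a one-dimensional labeled array: here, a made-up price per fruit.
prices = pd.Series([2.5, 3.0, 4.25], index=["apple", "banana", "cherry"])

# A DataFrame is a two-dimensional table whose columns are Series.
sales = pd.DataFrame(
    {
        "fruit": ["apple", "banana", "cherry", "apple"],
        "quantity": [10, 5, 8, 3],
    }
)

# Look up each row's price by fruit name, then compute revenue per row.
sales["revenue"] = sales["quantity"] * sales["fruit"].map(prices)

print(prices.ndim, sales.ndim)  # 1 2
print(sales)
```

Passing a Series to `map` looks each value up in that Series' index, which is a common way to attach reference values to a table.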

If you are familiar with SQL, Pandas offers similar functionality (filtering, joining, grouping, and aggregating), but many people reach for Pandas when their data does not all come from a single data warehouse.
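As a rough illustration of the SQL comparison, the sketch below joins two small made-up tables and sums an amount per group, with the approximately equivalent SQL query shown in a comment. The table and column names are hypothetical.

```python
import pandas as pd

# Two small made-up tables, e.g. loaded from different sources (a CSV and a spreadsheet).
orders = pd.DataFrame(
    {"customer_id": [1, 2, 1, 3], "amount": [20.0, 35.0, 15.0, 50.0]}
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "region": ["North", "South", "North"]}
)

# Roughly equivalent SQL:
#   SELECT c.region, SUM(o.amount) AS total
#   FROM orders o JOIN customers c ON o.customer_id = c.customer_id
#   GROUP BY c.region;
totals = (
    orders.merge(customers, on="customer_id", how="inner")  # the JOIN
    .groupby("region", as_index=False)["amount"]            # the GROUP BY
    .sum()                                                   # the SUM
    .rename(columns={"amount": "total"})                     # the AS alias
)
print(totals)
```

Because the inputs are ordinary DataFrames, the two tables could just as well come from a CSV file and a spreadsheet instead of one database.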

Because Pandas is open source and widely used, plenty of Pandas code (yours and other people's) is shared across the internet, which can make your work much easier: you can take really good code and adapt it to your own purpose.

NumPy

While plain Python can store numbers in lists (and lists of lists that act like matrices), it is not very efficient at performing mathematical operations on them, the kind of work known as scientific computing. Hence, we have NumPy, a numerical library that performs these operations efficiently within the Python environment.

NumPy is built around arrays. A NumPy array looks much like a Python list, but it holds elements of a single data type in a compact block of memory, so it is more efficient, can be processed faster, and lets us do math with a simpler syntax. This video explains the basic functions of NumPy.
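The "simpler syntax" point is easiest to see side by side. This is a small sketch with made-up heights: the list version needs a comprehension, while the NumPy version applies the division to every element in one expression.

```python
import numpy as np

heights_cm = [150, 165, 180]

# Plain list: converting each value needs an explicit loop or comprehension.
heights_m_list = [h / 100 for h in heights_cm]

# NumPy array: the same division is applied element-wise in one expression.
heights = np.array(heights_cm)
heights_m = heights / 100

# Other math is just as direct, e.g. the average height in metres.
average_m = heights_m.mean()

print(heights_m_list)
print(heights_m, average_m)
```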

The name you will often see in NumPy code is ndarray, which stands for n-dimensional array. An array can have one dimension or many (n) dimensions. Practically, if you want to make a graph, you need values for both the x-axis and the y-axis: for example, two 1-D arrays of the same length, or a single 2-D array with one row per axis.
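A short sketch of how dimensions show up in practice: `ndim` reports the number of dimensions and `shape` the size along each one. The x and y values here are arbitrary.

```python
import numpy as np

x = np.array([0, 1, 2, 3])  # 1-D array, shape (4,)
y = x ** 2                  # element-wise square, also 1-D

# Two 1-D arrays of equal length are enough for a line plot (x values and y values).
# Stacking them gives a single 2-D array: row 0 holds x, row 1 holds y.
points = np.stack([x, y])

print(type(points).__name__, x.ndim, points.ndim, points.shape)  # ndarray 1 2 (2, 4)
```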

Matplotlib

To make a graph, you can use Matplotlib, a Python plotting library for creating static and interactive visualizations. Say you need a graph showing the relationship between location and purchases of a certain good at a company, or the daily increase in coronavirus cases.

You will often see matplotlib.pyplot or pylab in code. Many Python packages are split into modules with different functions. Matplotlib is a whole package on its own, but the plotting interface most people use lives in its pyplot module, which automatically sets up figures and axes for you. Hence, we usually see 'matplotlib.pyplot' imported (typically as 'import matplotlib.pyplot as plt') rather than just 'matplotlib'.
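Here is a minimal sketch of the daily-increase example mentioned above, using made-up cumulative case counts. It imports pyplot, lets `plt.subplots()` create the figure and axes, and saves the chart to an in-memory PNG; the non-interactive "Agg" backend is an assumption chosen so the sketch also runs on a machine without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical cumulative case counts over five days (made-up numbers).
cumulative_cases = np.array([100, 120, 150, 190, 240])
daily_increase = np.diff(cumulative_cases)  # [20, 30, 40, 50]
days = np.arange(1, len(daily_increase) + 1)

# pyplot sets up the figure and the axes for us.
fig, ax = plt.subplots()
ax.plot(days, daily_increase, marker="o")
ax.set_xlabel("Day")
ax.set_ylabel("New cases")
ax.set_title("Daily increase in cases")

# Save to an in-memory PNG; in practice you might use fig.savefig("chart.png") or plt.show().
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Note that NumPy does the math (`np.diff`) and pyplot does the drawing, which is exactly the pairing the pylab module below tries to bundle into one import.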

Pylab is another module that bundles pyplot and NumPy into a single import. This is convenient when we want to do math and plot graphs at the same time, although Matplotlib's own documentation now discourages pylab and recommends importing NumPy and pyplot separately.

I found DataCamp's explanation of this quite good: here.

SciPy

With the three basic libraries out of the way, we come to SciPy, a scientific computing package that offers more mathematical functions than NumPy. It is built on NumPy and uses it to perform “various commonly used tasks in scientific programming, including linear algebra, integration (calculus), ordinary differential equation solving, and signal processing.”
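Two of the tasks in that quote, integration and ordinary differential equation solving, can be sketched with `scipy.integrate`. The functions below are toy examples picked because their exact answers are known, so you can check SciPy's numerical results against them.

```python
import numpy as np
from scipy import integrate

# Integration: area under f(x) = x**2 between 0 and 3 (exact answer: 9).
area, abs_error = integrate.quad(lambda x: x**2, 0, 3)

# ODE solving: dy/dt = -y with y(0) = 1, whose exact solution is y(t) = exp(-t).
solution = integrate.solve_ivp(
    lambda t, y: -y, t_span=(0, 1), y0=[1.0], rtol=1e-8, atol=1e-10
)
y_at_1 = solution.y[0, -1]  # value of y at the final time point t = 1

print(area)                # close to 9.0
print(y_at_1, np.exp(-1))  # both close to 0.3679
```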

The difference between NumPy and SciPy is that NumPy covers basic sorting, indexing, and simple array math to make sense of your data, which is what you will use most of the time in practice. SciPy, on the other hand, contains algebraic and other specialized functions that NumPy does not fully provide, which are useful for more technical purposes.
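The split can be sketched like this: everyday sorting, indexing, and averaging with NumPy, then one example of a more specialized routine from `scipy.linalg`, the matrix exponential `expm`, which NumPy itself does not provide. The data and the matrix are arbitrary choices for illustration.

```python
import numpy as np
from scipy import linalg

data = np.array([7, 2, 9, 4])

# Everyday NumPy: sorting, indexing, and simple array math.
sorted_data = np.sort(data)     # [2, 4, 7, 9]
largest_two = sorted_data[-2:]  # [7, 9]
mean_value = data.mean()        # 5.5

# More specialized SciPy: the matrix exponential.
# For this particular matrix, the result is a rotation by 1 radian.
a = np.array([[0.0, 1.0], [-1.0, 0.0]])
rotation = linalg.expm(a)
```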

Scikit-learn/TensorFlow

Scikit-learn is a machine learning library for Python. It is built on NumPy and SciPy and offers algorithms for clustering, support vector machines, k-nearest neighbors, random forests, and more. I also included TensorFlow because the two are often compared, but TensorFlow is much more focused on deep learning, so it complements Scikit-learn rather than acting as a drop-in alternative.
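To give a feel for Scikit-learn, here is a small sketch that trains two of the algorithms mentioned above, k-nearest neighbors and a random forest, on the Iris dataset bundled with the library. The split size, `n_neighbors`, and `random_state` values are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris ships with Scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # every estimator shares fit / predict / score
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out test data
    print(f"{name}: {scores[name]:.2f}")
```

Because every model exposes the same `fit` and `score` methods, swapping one algorithm for another is usually a one-line change.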

Machine learning is one of the techniques used in AI, and deep learning is, in turn, one of the techniques used in machine learning. Machine learning trains a computer to make predictions from real-world data, comparing each prediction against known correct answers and adjusting until its results are as close as possible. Deep learning is one of the many ways to teach a machine to make better predictions. It uses multi-layer artificial neural networks, loosely inspired by how neurons in the human brain work, which is part of why the method became popular.
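The predict, compare, and adjust loop described above can be sketched in plain NumPy (not TensorFlow) with gradient descent on a single linear model. The data, learning rate, and number of steps are made-up assumptions; deep learning frameworks apply the same kind of gradient-based training to networks with many layers.

```python
import numpy as np

# Made-up training data that follows y = 2x + 1 exactly (the "known correct answers").
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_true = 2 * x + 1

w, b = 0.0, 0.0  # start from an uninformed guess
learning_rate = 0.05

for step in range(2000):
    y_pred = w * x + b              # 1. make predictions
    error = y_pred - y_true         # 2. compare them with the correct answers
    loss = np.mean(error ** 2)      #    mean squared error
    # 3. adjust: move w and b a small step against the gradient of the loss.
    w -= learning_rate * 2 * np.mean(error * x)
    b -= learning_rate * 2 * np.mean(error)

print(round(w, 3), round(b, 3))  # close to 2 and 1
```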

You can read more about this topic on Udemy here.

Data lover with a background in PR and a peculiar college experience spreading over seven global cities. Follow my journey. For inquiries: uyen.nghb@gmail.com