My top 5 Python packages for 2023

Andrew Wild
Published in DataJoey
Oct 5, 2022

Python’s meteoric rise over the last 10 years has made it one of the most popular programming languages in the world.

The language combines a simple syntax with an endless supply of third-party packages.

These packages make Python incredibly useful and can boost your productivity beyond belief.

With so many packages available though, it’s difficult to know where to start.

To help, here’s a list of my top 5 Python packages that have helped me as a Python data engineer/developer and look set to gain (or maintain) popularity in 2023.

1. FastAPI

https://github.com/tiangolo/fastapi

Put simply, FastAPI is a framework for building APIs in Python.

If you are building a web app backend and thinking about using Flask or Django, I would highly recommend reading up on FastAPI before you do.

FastAPI allows you to build high performance APIs in a very straightforward manner.

It’s easy to learn and has some of the best documentation I’ve ever read.

The package generates an OpenAPI schema and interactive docs out of the box, meaning you have a documented API by default.

Performance is excellent (hence the name), and the maintainers have managed to create a package that is opinionated but flexible.

We’ve built our back-end microservices at DataJoey (https://datajoey.com/) using FastAPI and I can highly recommend it.

2. Pandas

https://github.com/pandas-dev/pandas

No Python package list would be complete without mentioning Pandas; it has arguably been one of the driving forces behind Python’s rise.

As their website suggests: Pandas is a fast, powerful, flexible and easy to use tool for data analysis and manipulation.

I agree with all of the above. There’s so much functionality built in, it almost feels like cheating at times.

The package allows you to analyse and transform data through its DataFrame object and can help you solve data science, data engineering and machine learning problems.
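
As a rough illustration of the DataFrame workflow, the sketch below builds a small table, derives a column and aggregates it. The column names and the revenue_by_city helper are invented for this example.

```python
import pandas as pd


def revenue_by_city(orders):
    """Total revenue per city, largest first (hypothetical helper)."""
    df = pd.DataFrame(orders)
    # Derive a new column from two existing ones, row by row.
    df["revenue"] = df["quantity"] * df["unit_price"]
    # Group, aggregate and sort in a single chained expression.
    return df.groupby("city")["revenue"].sum().sort_values(ascending=False)


# Made-up sample data, purely for illustration.
sample_orders = [
    {"city": "London", "quantity": 2, "unit_price": 5.0},
    {"city": "Leeds", "quantity": 1, "unit_price": 3.0},
    {"city": "London", "quantity": 1, "unit_price": 4.0},
]
totals = revenue_by_city(sample_orders)
```

The result is a Series indexed by city, which you can carry on transforming or write out with methods such as to_csv.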

Pro tip: two very useful add-on packages for Pandas are GeoPandas and pandasql. GeoPandas allows you to handle geospatial (i.e. map) data within a DataFrame, and pandasql allows you to write SQL against a DataFrame (if you prefer that to the Pandas syntax).

3. PySpark

https://github.com/apache/spark/tree/master/python/pyspark

You can think of PySpark as being Pandas on steroids (now there’s a thought).

PySpark is the Python API for Apache Spark, a framework that can perform processing tasks on very large data sets by distributing the work across a cluster of machines.

Spark has its own DataFrame, so it feels similar to Pandas in a sense, but it comes with all of the scalability of distributed computing.

The framework has taken over the big data processing space in recent years, all but killing off Hadoop MapReduce.

The PySpark package gives Python engineers a way to get all the benefits of Spark without having to learn a new programming language (Scala).

The Spark project even recently introduced a pandas API (pyspark.pandas, added in Spark 3.2) that allows you to get going with little to no learning curve if you’ve used Pandas before.

With the big data industry going from strength to strength and Spark being the processing framework of choice, it’s worth reading up on PySpark.

4. Requests

https://github.com/psf/requests

The Requests module allows you to easily send HTTP requests using Python.

Whether you’re calling out to another service in your application or scraping the web, this library will help.

The main goal of the package was to make HTTP requests more human-friendly and they’ve succeeded in that goal.

In most cases, Requests allows you to get the information you want in one line of code.
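
Here is a hedged sketch of what that looks like in practice. The fetch_json helper is my own wrapper rather than part of Requests, and the session parameter is just a hook I've added so the function can be exercised without touching the network.

```python
import requests


def fetch_json(url, params=None, timeout=10.0, session=None):
    """GET a URL and return the decoded JSON body (hypothetical helper)."""
    # Fall back to the module-level requests.get when no session is supplied.
    http = session or requests
    response = http.get(url, params=params, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of failing silently.
    response.raise_for_status()
    return response.json()


# Example call (not executed here because it needs network access):
# repo = fetch_json("https://api.github.com/repos/psf/requests")
```

The core of it really is the single requests.get call; the timeout and the status check are habits worth keeping once the code leaves your laptop.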

Its simplicity makes it stand out over other packages like urllib and http, and I’d recommend trying it as your first option when making HTTP requests in Python.

5. re (Python Regex)

https://docs.python.org/3/library/re.html

Bear with me on this one. Regex is possibly one of the most hated things in tech.

As the old saying goes: “If you have a problem and you use regex, you now have two problems.”

Having said that, it is often still the best option for extracting structured data from a block of text.

For those unfamiliar, regex allows you to search large amounts of text for specific patterns. Once a pattern is matched, you can pull out exactly the parts of the text you wanted.
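
As a small sketch of that idea, the snippet below pulls dates and order numbers out of some log-style text using named groups. The log format, the pattern and the extract_orders helper are all invented for illustration.

```python
import re

# Made-up log text, purely for illustration.
LOG = """
2022-10-03 12:01:17 order #1042 shipped to leeds
2022-10-04 09:30:02 order #1043 shipped to london
no order on this line
"""

# Named groups make the pattern a little easier to read and maintain.
ORDER_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) \S+ order #(?P<order_id>\d+)"
)


def extract_orders(text):
    """Return (date, order_id) pairs for every match in the text."""
    return [(m["date"], int(m["order_id"])) for m in ORDER_PATTERN.finditer(text)]


orders = extract_orders(LOG)
```

Lines that don't match the pattern are simply skipped, so the third line of the sample contributes nothing to the result.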

The main gripe people have is that the syntax is somewhat funky.

In my experience, https://regex101.com/ makes using regex bearable. It allows you to generate your patterns and test them out against your block of text. I’d recommend using it if you find yourself in regex hell.

Natural language processing tools have come a long way in recent years, but regex doesn’t look like it will be unseated any time soon.

Thanks for reading. If you enjoyed the article, consider reading one of our other blog posts.

For more high quality data/cloud content, please see the DataJoey LinkedIn or Medium pages. To hear more about DataJoey, get in touch at https://datajoey.com/.
