17 Strategies for Dealing with Data, Big Data, and Even Bigger Data
Tips and libraries to speed up your Python code
Dealing with big data can be tricky. No one likes out-of-memory errors. ☹️ No one likes waiting for code to run. ⏳ No one likes leaving Python. 🐍
Don’t despair! In this article I’ll provide tips and introduce up-and-coming libraries to help you efficiently deal with big data. I’ll also point you toward solutions for code that won’t fit into memory. And all while staying in Python. 👍
Python is the most popular language for scientific and numerical computing. Pandas is the most popular Python library for data cleaning and exploratory data analysis.
Using pandas with Python allows you to handle much more data than you could with Microsoft Excel or Google Sheets.
SQL databases are very popular for storing data, but the Python ecosystem has many advantages over SQL when it comes to expressiveness, testing, reproducibility, and the ability to quickly perform data analysis, statistics, and machine learning.
Unfortunately, if you are working locally, the amount of data that pandas can handle is limited by the amount of memory on your machine. And if you’re working in the cloud, more memory costs more money. 💵
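To get a sense of how close you are to that limit, you can ask pandas how much memory a DataFrame actually occupies. Here’s a minimal sketch with a made-up DataFrame; the column names and sizes are just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example: a DataFrame with one million rows
df = pd.DataFrame({
    "id": np.arange(1_000_000),
    "value": np.random.rand(1_000_000),
    "category": np.random.choice(["a", "b", "c"], size=1_000_000),
})

# Per-column memory usage in bytes; deep=True measures the real
# size of object (string) columns instead of just their pointers
print(df.memory_usage(deep=True))

# Total footprint in megabytes
print(f"{df.memory_usage(deep=True).sum() / 1e6:.1f} MB")
```

If that total is creeping toward your machine’s available RAM, the strategies below are for you.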