Evolution of the Pandas Library!

Karthik
Published in Variablz Academy
4 min read · Feb 14, 2023

Pandas is an open-source Python library for data analysis that has been around for over a decade. It provides data structures (chiefly the Series and the DataFrame) for efficiently storing large datasets, along with tools for working with them, which makes it an indispensable tool for data scientists, data analysts, and many others.

In this article, we will take a brief look at the timeline of the Pandas library and how it has evolved over the years.

2008: Pandas Birth

In 2008, Wes McKinney, then working at AQR Capital Management, began developing Pandas, and the project was open-sourced in 2009. He created it to help him manipulate and analyze financial data, but the library soon became popular with data analysts and data scientists in other fields as well.

2010–2014: Growth in Popularity

During this period, Pandas gained popularity in the data analysis community and became widely used across disciplines such as statistics, economics, and finance. It quickly became established as a staple of the data science toolkit.

2015: Increased Accessibility

Pandas 0.17, released in 2015, made the library more versatile and powerful. It brought improvements to time-series functionality (such as timezone-aware datetime columns), simpler handling of missing data, and better performance. These changes increased its popularity even further.

2017–2019: Major Upgrade

Pandas 0.20, released in 2017, marked a significant step forward. It included numerous improvements, among them better datetime processing, enhanced multi-level (MultiIndex) indexing, and better handling of large datasets. These made Pandas more adaptable and capable, drawing even more data scientists and analysts to it.

Versions 0.21 through 0.25 followed between 2017 and 2019. Starting with the 0.25.x series of releases, pandas supports only Python 3.5.3 and higher. These releases focused largely on bug fixes, although they also added several features.

Here are some of the major features added to or improved in the Pandas library between versions 0.20 and 0.25:

  • Categorical Data Type: The "Categorical" data type for categorical variables, available since earlier releases, was extended, including a standalone CategoricalDtype (0.21) for specifying categories independently of the data. Categoricals reduce memory usage and can speed up computation.
  • Nullable Integer Data Type: A nullable integer data type (such as "Int64", added in 0.24) was introduced to store integer values that can contain missing values (displayed as NaN at first, and as pd.NA from version 1.0).
  • pandas.NA: Shortly afterwards, in version 1.0, a new constant, pd.NA, was introduced to represent missing values in the new nullable data types.
  • Merge and Join Enhancements: The merge and join methods received performance enhancements and new capabilities, including faster left and right joins, as well as improved support for multi-level merging.
  • Resampling Enhancements: The resampling method received improvements, including the ability to specify custom aggregation functions and better handling of missing data.
  • TimedeltaIndex: The TimedeltaIndex, which represents a set of durations, received new features, including the ability to perform element-wise operations and new aggregation methods.
  • Deprecation of Panel Data Type: The Panel data type was officially deprecated in 0.20 and was later removed in version 1.0.

Note: This is not an exhaustive list; additional improvements and bug fixes were made in these versions.
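A few of the items above can be made concrete with a short sketch covering the Categorical type, the nullable "Int64" integer type, and the pd.NA marker. It assumes pandas 1.0 or later (pd.NA arrived in 1.0), and the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: a repetitive text column and an integer column with a gap.
df = pd.DataFrame({
    "sector": ["finance", "energy", "finance", "finance", "energy"] * 1000,
    "shares": [100, None, 250, 300, 50] * 1000,
})

# Converting a repetitive text column to "category" stores each distinct
# label once plus small integer codes, which usually saves memory.
object_bytes = df["sector"].memory_usage(deep=True)
df["sector"] = df["sector"].astype("category")
category_bytes = df["sector"].memory_usage(deep=True)
print(object_bytes, category_bytes)

# Without a nullable type, a missing value forces the integers to float64...
print(df["shares"].dtype)  # float64

# ...whereas the nullable "Int64" dtype keeps integers and marks gaps as pd.NA.
df["shares"] = df["shares"].astype("Int64")
print(df["shares"].dtype)  # Int64
print(df["shares"].iloc[1] is pd.NA)  # True
```

Casting to "category" pays off mainly for columns with a few distinct values repeated many times; for mostly unique strings it can use more memory, not less.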

2019 and Beyond:

Pandas versions 1.0 to 1.3 were released between 2020 and 2021, and they further boosted the library's popularity and usage.

Here is a list of some of the features that were added to or improved in Pandas from version 1.0 to version 1.3. Please note that these may not be the latest changes:

  1. Enhanced IntervalIndex for interval-based time series data.
  2. Continued refinements to the .loc and .iloc indexers for DataFrames (both have existed since much earlier releases).
  3. Improvements to string handling, including methods such as .str.split() and the new dedicated "string" dtype.
  4. Improved support for datetime and time delta operations.
  5. Improved support for handling missing values.
  6. Enhanced support for categorical data.
  7. Improved performance for data grouping and aggregation.
  8. Enhanced support for time-based data operations.
  9. Improved support for working with arrays and data frames.
  10. Improved handling of NaN values in data frames.
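The indexing and string items in the list above can be illustrated with a small, hypothetical DataFrame: .loc selects by label, .iloc selects by integer position, and string splitting goes through the .str accessor as Series.str.split(). The data and column names are made up for this sketch.

```python
import pandas as pd

# Hypothetical data: "city, country" strings plus a numeric column.
# A non-default index makes the label/position difference visible.
df = pd.DataFrame(
    {"location": ["Chennai, India", "Paris, France", None],
     "sales": [120, 95, 40]},
    index=["a", "b", "c"],
)

# .loc selects by label, .iloc by integer position; both pick the same row here.
print(df.loc["b", "sales"])  # 95
print(df.iloc[1]["sales"])   # 95

# .str.split() splits each string; expand=True returns one column per piece.
# The missing value (None) stays missing instead of raising an error.
parts = df["location"].str.split(", ", expand=True)
parts.columns = ["city", "country"]
print(parts)

# Missing values can then be counted or filled explicitly.
print(parts["city"].isna().sum())  # 1
print(parts["country"].fillna("unknown").tolist())
```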

Versions 1.4 and 1.5 were both released in 2022, in January and September respectively.

Some of the changes and improvements in Pandas versions 1.4 and 1.5 include:

  1. CSV compression options: compressing CSV files with formats such as bz2, gzip, and lzma was already supported, and these releases added Zstandard (1.4) and tar archives (1.5).
  2. DatetimeIndex improvements: continued work on timezone-aware datetime indexes and better performance for time-series operations.
  3. Improved interpolation: refinements to interpolation methods such as "nearest" and "zero", and improved performance for existing interpolation methods.
  4. merge_asof: this function, first added in version 0.19, merges datasets on the nearest key within a specified tolerance.
  5. Numeric aggregation: improved performance for aggregation operations on numeric data.
  6. Datetime slicing: improved support for working with datetime indexes over time intervals such as "3 hours" or "1 day".
  7. Improved handling of NaN values: refinements to how NaN values are treated during operations, for example via the skipna parameter of reductions such as sum and mean.
  8. Improved categorical handling: better handling of categorical data, including improved performance for categorical operations.
  9. Better handling of missing data: improvements building on existing methods such as dropna and fillna.
  10. Improved indexing: performance and behavior improvements for existing indexers such as loc and iloc.
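As a sketch of two of the items above, the example below uses pd.merge_asof to attach to each trade the most recent quote that is at most two seconds old, then writes the result to a gzip-compressed CSV and reads it back. The trade and quote data, column names, and file name are hypothetical. Note that merge_asof looks backward by default; passing direction="nearest" matches the closest key in either direction.

```python
import os
import tempfile

import pandas as pd

# Hypothetical trades and quotes; merge_asof requires both frames sorted on the key.
trades = pd.DataFrame({
    "time": pd.to_datetime(["2023-02-14 09:30:01", "2023-02-14 09:30:05",
                            "2023-02-14 09:30:20"]),
    "price": [101.0, 101.5, 102.0],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2023-02-14 09:30:00", "2023-02-14 09:30:04"]),
    "bid": [100.9, 101.4],
})

# For each trade, take the latest quote at or before it (the default "backward"
# direction), but only if that quote is at most 2 seconds older than the trade.
# The third trade has no quote within 2 seconds, so its bid is NaN.
merged = pd.merge_asof(trades, quotes, on="time",
                       tolerance=pd.Timedelta("2s"))
print(merged)

# Write a gzip-compressed CSV and read it back; read_csv infers the
# compression from the ".gz" extension (compression="infer" is the default).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merged.csv.gz")
    merged.to_csv(path, index=False, compression="gzip")
    restored = pd.read_csv(path, parse_dates=["time"])
print(restored)
```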

In conclusion, the Pandas library has advanced significantly and become a vital tool for data analysis and manipulation. Pandas is a tool that you simply cannot do without, whether you work in finance, economics, statistics, or any other field that requires data analysis. And it’s always a good idea to check the Pandas library's official documentation for the latest features, updates, and bug fixes.

Follow me on LinkedIn for more insightful data science talks and content

https://www.linkedin.com/in/karthik-sa/

Karthik Saravanan

Adios!
