How to Quickly Compare Data Sets

How to get a quick summary of any differences between two data sets

Costas Andreou
Towards Data Science
4 min read · Jul 21, 2019

Photo by Joshua Sortino on Unsplash

Every now and again, you will need to compare two data sets, either to prove that there are no differences or to highlight exactly where they differ. Depending on the size of the data, you may have a number of options available to you.

In my previous article, 3 Quick Ways To Compare Data in Python, we discussed three ways of comparing data. None of those options, however, could give you a quick, detailed summary or let you allow for minor differences between the data sets.

In this article, I would like to offer another option, one that I believe could prove very helpful: the DataComPy Python library.

What is the DataComPy library?

As highlighted on their GitHub page, the DataComPy library is a package to compare two Pandas DataFrames and provide a human-readable…
