Parsing RIPE Bulk Whois Data

Thomas Gorman
Parsing Bulk Whois Data
Jun 18, 2019

At one time, I was working with large amounts of whois data from numerous sources such as ARIN, AFRINIC, and RIPE. Some of these sources suggested using Perl or Ruby to parse the data. But I am a Python guy, so I set out to parse it with my language of choice.

In the following, you will see how it is done with Python.

After downloading the latest RIPE database file from ripe.net, we need to read this data in. I chose the pandas library for that.
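As a rough illustration of that first read, here is a minimal sketch that loads the text one line per row into a single-column dataframe. The sample text, the column name `raw`, and the helper `read_whois_lines` are made up for this example; the real file is far larger, and the exact read call used for the numbers in this post may have differed.

```python
import io

import pandas as pd

# Tiny stand-in for the real dump (hypothetical records, documentation IP ranges).
SAMPLE = """\
# The contents of this file are subject to
# RIPE Database Terms and Conditions
#
inetnum:        192.0.2.0 - 192.0.2.255
netname:        EXAMPLE-NET
descr:          Example network
descr:          Documentation range
country:        NL

inetnum:        198.51.100.0 - 198.51.100.255
netname:        EXAMPLE-NET-2
country:        DE
"""


def read_whois_lines(handle):
    """Load a text stream into a one-column dataframe, one row per line."""
    lines = [line.rstrip("\n") for line in handle]
    return pd.DataFrame({"raw": lines})


raw = read_whois_lines(io.StringIO(SAMPLE))
print(raw.shape)
print(raw.head())
```

For the real download, the same function could be fed from `gzip.open(path, "rt", encoding="latin-1")`, where `path` is wherever you saved the file. The encoding is an assumption worth checking against the file you actually have.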

After the initial read, the dataframe contains 87,447,972 rows.

To put it nicely, the data is a mess. Several rows for the same CIDR block start with the same key, such as ‘descr:’ and ‘remarks:’, and some rows carry no useful information, like the first 5. So some cleaning needs to be done.
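One possible cleaning pass, sketched under a few assumptions: comment lines start with `#` or `%`, a blank line separates one object from the next, and every remaining line has the form `key: value`. Continuation lines are ignored here. The column names `record`, `key`, and `value` are my own choices for the example.

```python
import pandas as pd

# Hypothetical raw rows, one line of the dump per row.
raw = pd.DataFrame({"raw": [
    "# header comment",
    "#",
    "inetnum:        192.0.2.0 - 192.0.2.255",
    "netname:        EXAMPLE-NET",
    "descr:          Example network",
    "descr:          Documentation range",
    "",
    "inetnum:        198.51.100.0 - 198.51.100.255",
    "netname:        EXAMPLE-NET-2",
]})


def clean(raw):
    """Drop comments and blanks, label each object, split key from value."""
    text = raw["raw"].str.strip()
    # A blank line ends an object, so a running count of blanks labels each record.
    record = text.eq("").cumsum()
    # Keep only non-empty lines that are not comments.
    keep = text.ne("") & ~text.str.match(r"[#%]")
    # Split on the first colon only, so values containing colons stay intact.
    parts = text[keep].str.split(":", n=1, expand=True)
    out = pd.DataFrame({
        "record": record[keep],
        "key": parts[0].str.strip(),
        "value": parts[1].str.strip(),
    })
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Labeling the records before dropping the blank lines matters: once the blanks are gone, there is nothing left to tell where one object ends and the next begins.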

We are looking a lot better after the “cleaning the data” phase.

Next comes the parsing portion where we will split, group, map, and unstack the data.
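A minimal sketch of the group, map, and unstack steps, assuming the data is already in a long table with one `key`/`value` pair per row and a `record` label per object (hypothetical column names). Repeated keys such as `descr` are joined into one string, the keys are pivoted into columns, and the address range is mapped to CIDR notation with the standard `ipaddress` module. This is one way to do it, not necessarily the exact pipeline behind the results in this post.

```python
import ipaddress

import pandas as pd

# Hypothetical long-form data: one key/value pair per row.
long_df = pd.DataFrame({
    "record": [0, 0, 0, 0, 1, 1],
    "key": ["inetnum", "netname", "descr", "descr", "inetnum", "netname"],
    "value": [
        "192.0.2.0 - 192.0.2.255", "EXAMPLE-NET",
        "Example network", "Documentation range",
        "198.51.100.0 - 198.51.100.255", "EXAMPLE-NET-2",
    ],
})

# Group: collapse repeated keys within a record into a single string.
# Unstack: turn the keys into columns, one row per record.
wide = (
    long_df.groupby(["record", "key"])["value"]
    .agg(" | ".join)
    .unstack("key")
)


def range_to_cidr(value):
    """Convert 'first - last' into one or more CIDR blocks (IPv4 ranges assumed)."""
    first, last = (ipaddress.ip_address(part.strip()) for part in value.split("-"))
    nets = ipaddress.summarize_address_range(first, last)
    return ", ".join(str(net) for net in nets)


# Map: derive a CIDR column from the inetnum range.
wide["cidr"] = wide["inetnum"].map(range_to_cidr)
print(wide)
```

Records that lack a key simply get a missing value in that column after the unstack, which is usually what you want before loading the table elsewhere.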

We can now view specific fields from our refined dataframe.
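Selecting fields from the refined dataframe is plain column indexing. The sketch below uses a small hypothetical dataframe with column names chosen for the example; substitute whichever fields your own parse produced.

```python
import pandas as pd

# Hypothetical refined dataframe: one row per object, one column per key.
wide = pd.DataFrame({
    "inetnum": ["192.0.2.0 - 192.0.2.255", "198.51.100.0 - 198.51.100.255"],
    "netname": ["EXAMPLE-NET", "EXAMPLE-NET-2"],
    "descr": ["Example network | Documentation range", None],
    "country": ["NL", "DE"],
})

# Pick only the columns of interest, in the order we want to see them.
fields = ["inetnum", "netname", "country"]
subset = wide.loc[:, fields]
print(subset.head())

# Rows can be filtered the same way, for example by country code.
nl_only = subset[subset["country"] == "NL"]
print(nl_only)
```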

It’s looking a lot better. Now your data is ready to import into a MySQL database, Hadoop, or another destination of your choosing.
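As a sketch of the load step, pandas can write a dataframe to a SQL table with `DataFrame.to_sql`. To keep the example runnable without a server, it uses an in-memory SQLite database as a stand-in; for MySQL you would pass a SQLAlchemy engine for your own server instead. The table name `ripe_inetnum` and the sample rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical refined dataframe ready for loading.
wide = pd.DataFrame({
    "inetnum": ["192.0.2.0 - 192.0.2.255", "198.51.100.0 - 198.51.100.255"],
    "netname": ["EXAMPLE-NET", "EXAMPLE-NET-2"],
    "country": ["NL", "DE"],
})

# In-memory SQLite stands in for the real target database.
con = sqlite3.connect(":memory:")

# chunksize writes the rows in batches, which helps with very large frames.
wide.to_sql("ripe_inetnum", con, if_exists="replace", index=False, chunksize=1000)

# Read the row count back to confirm the load.
count = con.execute("SELECT COUNT(*) FROM ripe_inetnum").fetchone()[0]
print(count)
con.close()
```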

Thomas Gorman
IT Security Professional and Big Data Analytics Developer