Navigating Petfinder’s API Using the Petpy Wrapper

Jessica Bow
Published in Analytics Vidhya
Jan 26, 2021 · 6 min read

Struggling shelters throughout California are faced with a major ‘catch-22’: the shelters with the greatest need are found in the communities with the fewest resources. This leaves many shelters stretched thin, often run by skeleton crews and largely supported by volunteers, which means the animals in their care spend much of their time in solitude.

Shelters are loud, chaotic, and often overpopulated and underfunded. Studies show that the longer a dog spends in a shelter, the more likely it is to suffer long-term or irreparable emotional and behavioral damage. While California has declared its intent to become a ‘no kill’ state, I set out to learn how we can do better now to minimize a dog’s time in rescue (stay tuned).

In order to infer what features can minimize a dog’s time in shelter, I needed data including a dog’s duration of stay from arrival to adoption. Access to comprehensive shelter data is sporadic and inconsistent at best, but anyone who’s ever debated adopting a pet has likely traversed the pages of Petfinder. In this article I’ll walk you through navigating Petfinder’s API to gather data on previously adopted animals using the Petpy wrapper.

Petfinder API
You will first need to create an account and submit an API key request from the ‘Developer’ page. The key is approved quickly, and from there all relevant information can be found on your account under ‘Developer Settings’. Standard-level API access provides you with an API key and secret key, your account status, and your request usage and rate limit.

Unfortunately, navigating the Petfinder API from the terminal is not so transparent. After an hour of troubleshooting, I had retrieved one listing for one previously adopted dog in the form of a JSON file, enough to show me that it was possible, but I could make it no further. It was then that I discovered Petpy, and thirty minutes later I had a robust, intact DataFrame containing the information I sought.

Petpy Wrapper
API wrappers are packages that ‘wrap’ API calls into simpler functions, providing a more user-friendly experience than working directly with the API. Petpy is no exception. In Jupyter, run:

!pip install petpy

Once the download is complete, you can import Petpy and get started. You’ll first want to instantiate the Petpy wrapper with your API and secret key, as follows:
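A minimal sketch of that setup, assuming petpy 2.x (the key and secret strings below are placeholders for the credentials on your ‘Developer Settings’ page):

from petpy import Petfinder

API_KEY = 'your-api-key'        # copied from 'Developer Settings'
SECRET_KEY = 'your-secret-key'  # copied from 'Developer Settings'

# authenticate against the Petfinder API
pf = Petfinder(key=API_KEY, secret=SECRET_KEY)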

From there you are ready to make your requests. Petpy has a variety of capabilities and its documentation is clear and well explained. You can begin by searching through the possible parameters and testing the capabilities of Petpy guided by one of the many examples. Or, if you know what you’re looking for, you can get down to business.

For my specific purposes, I wanted data for all dogs marked as adopted near the Bay Area spanning all of 2019 and 2020. The request begins with pf.animals(), where all specific parameters are included inside the parentheses. Following the documentation, and knowing that I wanted my result in the form of a pandas DataFrame, I named my DataFrame ‘dogs’ and set the animal type and DataFrame parameters:
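A bare-bones version of that call might look like the following (return_df is the flag petpy uses to return a pandas DataFrame instead of raw JSON):

# all dogs, returned as a pandas DataFrame
dogs = pf.animals(animal_type='dog', return_df=True)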

Had I left animal type blank, the resulting output would include all animal types listed on Petfinder. Using Petpy I could pull all dogs of a particular breed, adoptable horses within 10 miles of me, only cats with long coats adopted last April and so much more. To narrow down my search to only adopted dogs within the last two years, I added several more parameters:
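Sketching that out, and assuming petpy’s after_date and before_date parameters for the date window (the dates below are my stand-ins for ‘all of 2019 and 2020’):

dogs = pf.animals(
    animal_type='dog',
    status='adopted',          # previously adopted listings only
    after_date='2019-01-01',   # start of the window
    before_date='2021-01-01',  # end of the window
    return_df=True
)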

It’s important to read up on the default parameters before making any request. Since I did not specify results per page or number of pages, Petpy defaulted to 20 results per page. The maximum results per page is 100, but the maximum number of pages seems unclear. There did not appear to be a drawback to requesting a high number of pages: if the pages exceeded the amount of data available, an error informed me of the last page reached, but the DataFrame was still made available.

I first toyed with the idea of pulling adopted dogs throughout all of California, but the documentation states that location must be listed in the form of ‘city, state’, ‘zipcode’ or ‘latitude, longitude’. Instead, I narrowed my focus for the time being to the Bay Area and, using the default distance setting, pulled data for dogs within 100 miles of the given zip code. I eventually dropped the few cities in my DataFrame that are considered outside the Bay Area, as well as a few listings from 2021 that slipped through. Had I specified time as well as date, I believe I could have avoided this.

To obtain my final working DataFrame, my request looked like so:
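My reconstruction of that final request, with a hypothetical Bay Area zip code and a deliberately generous page count (the exact values were judgment calls):

dogs = pf.animals(
    animal_type='dog',
    status='adopted',
    location='94601',          # hypothetical Bay Area zip code
    distance=100,              # the default search radius, written out for clarity
    after_date='2019-01-01',
    before_date='2021-01-01',
    results_per_page=100,      # maximum allowed per page
    pages=250,                 # intentionally high; excess pages are reported, but the DataFrame still comes back
    return_df=True
)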

The result was a pandas DataFrame of roughly 20,000 rows by 49 columns.

From here we can confirm that the listings in this DataFrame are in fact of species and type ‘dog’ and status ‘adopted’. Details are provided on organization, age, gender, size, primary and secondary breeds and so much more. I did find that some duplicates appeared in my request; since id is unique to each dog, this was easily dealt with:
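Since ‘id’ comes back as its own column in the DataFrame, pandas’ drop_duplicates takes care of it (a quick sketch):

# keep one row per unique listing id
dogs = dogs.drop_duplicates(subset='id').reset_index(drop=True)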

If you’ve followed along to this point, you may realize we are missing a crucial element: the duration of time listed on Petfinder. Taking a closer look at the features in this DataFrame, I am given ‘published_at’ and ‘status_changed_at’. With the goal of inferring features that minimize time in shelter and maximize adoption speed, I can engineer a feature that gets me as close to my target as possible by capturing the total time a dog is listed on Petfinder. I created my target column as follows:
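A sketch of that feature engineering, assuming the two timestamp columns arrive as ISO strings with timezone offsets (hence utc=True):

import pandas as pd

# convert the timestamp strings to proper datetimes
dogs['published_at'] = pd.to_datetime(dogs['published_at'], utc=True)
dogs['status_changed_at'] = pd.to_datetime(dogs['status_changed_at'], utc=True)

# total time the listing was active on Petfinder, truncated to whole days
dogs['days_on_petfinder'] = (dogs['status_changed_at'] - dogs['published_at']).dt.days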

First converting both columns to DateTime, then subtracting the date at which the listing was published from the date at which its status changed to ‘adopted’, gives the amount of time the listing was active on Petfinder in a new column titled ‘days_on_petfinder’. From there we convert this column to days and round down to whole numbers.

After these steps I saved my DataFrame as a .csv file and was ready to continue with exploratory analysis and feature engineering. I hope this helps any analysts and data scientists out there looking to study animal adoption with the hope of decreasing an animal’s time in shelter. If we can infer the features that maximize adoption rate and implement changes, we can hope to ease the heavy burden placed on our local shelters and the animals they care for.
