Open Data Project

TA, published in Design Computing, Apr 28, 2017

This week I started working on the Open Data Project. My dataset is a list of all registered dogs and cats in Geelong. I started out using the csv module alongside pandas, but then found out pandas already has a built-in read_csv function:

import pandas as pd
pd.read_csv("registeredpets.csv")

And that’s it.

Lots of issues with the actual data:

  • Loads of inconsistencies. Ages range from 0 to 70+ (human or dog years?)
  • Breeds may be the same across rows but typed in differently (e.g. one row has the dog breed “Labrador Cross Breed Dog”, which I’d consider the same as “Labrador”)
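
A rough sketch of how both issues could be handled in pandas. The sample rows, the column names (breed, age), the simplify_breed helper and the age cut-off of 30 are all assumptions made up for illustration; the real file may need different rules.

```python
import pandas as pd

# Hypothetical sample rows; the real column names and values may differ.
pets = pd.DataFrame({
    "breed": ["Labrador", "Labrador Cross Breed Dog", "labrador ", "Poodle"],
    "age": [3, 12, 71, 0],
})


def simplify_breed(breed: str) -> str:
    """Trim, lower-case, and drop a trailing 'cross breed dog' suffix."""
    cleaned = breed.strip().lower()
    suffix = " cross breed dog"
    if cleaned.endswith(suffix):
        cleaned = cleaned[: -len(suffix)]
    return cleaned.title()


# Keep the original column and put the tidied value next to it.
pets["breed_clean"] = pets["breed"].map(simplify_breed)

# Flag ages that look implausible; 30 is an arbitrary cut-off for illustration.
pets["age_suspect"] = pets["age"] > 30

print(pets["breed_clean"].value_counts())
print(pets[pets["age_suspect"]])
```

Keeping the raw breed column next to the cleaned one makes it easy to check which spellings got merged, and flagging odd ages rather than deleting them leaves the decision for later.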

What graphs I want to implement:

  • Dogs v Cats
  • Ages of both / compare
  • Registered/Not registered
  • Suburb/maybe a geographical map
  • (Un)common names
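
Most of these graphs boil down to counting or averaging one column, so here is a minimal sketch of the numbers behind them. The rows are made up and the column names (type, age, suburb, animal_name) are my guesses at how the real columns would be named.

```python
import pandas as pd

# Made-up sample rows standing in for the real dataset.
pets = pd.DataFrame({
    "type": ["Dog", "Cat", "Dog", "Dog", "Cat"],
    "age": [2, 5, 7, 2, 1],
    "suburb": ["Belmont", "Corio", "Belmont", "Lara", "Corio"],
    "animal_name": ["Max", "Bella", "Max", "Ziggy", "Bella"],
})

# Dogs v cats: number of rows per animal type.
type_counts = pets["type"].value_counts()

# Ages of both: average age per animal type.
mean_age_by_type = pets.groupby("type")["age"].mean()

# Suburb: number of pets per suburb.
suburb_counts = pets["suburb"].value_counts()

# (Un)common names: most frequent names, plus names that appear only once.
name_counts = pets["animal_name"].value_counts()
rare_names = name_counts[name_counts == 1].index.tolist()

print(type_counts)
print(mean_age_by_type)
print(suburb_counts)
print(name_counts.head(3))
print(rare_names)
```

Each of these results is a pandas Series, so with matplotlib installed something like type_counts.plot(kind="bar") should be enough for a first chart.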

Columns: Suburb, breed, type (dog/cat), colour, registered, age, animal name.

Data of the first ten pets

There are over 46,000 pets in the dataset.
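
For reference, a small sketch of how the first ten rows and the total count can be pulled out with pandas. The two CSV rows and the header names below are invented stand-ins so the snippet runs on its own; with the real file you would pass "registeredpets.csv" to read_csv instead.

```python
import io

import pandas as pd

# Invented stand-in for the real file; header names are assumptions.
csv_text = """suburb,breed,type,colour,registered,age,animal_name
Belmont,Labrador,Dog,Black,Yes,3,Max
Corio,Domestic Short Hair,Cat,Tabby,No,5,Bella
"""

# read_csv accepts a file-like object as well as a file path.
pets = pd.read_csv(io.StringIO(csv_text))

first_ten = pets.head(10)  # at most the first ten rows
total_pets = len(pets)     # number of rows, one per pet

print(first_ten)
print(total_pets)
```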
