Open Data Project
Apr 28, 2017
This week I started working on the Open Data Project. My dataset is a list of all registered dogs and cats in Geelong. I started out reading it with the csv module and pandas, but then found out pandas already has a built-in read_csv function:
import pandas as pd
df = pd.read_csv("registeredpets.csv")
And that’s it.
Lots of issues with the actual data:
- Loads of inconsistencies. Ages range from 0–70+ (Human or dog years?)
- The same breed is often typed in differently across rows (e.g. one row has the dog breed “Labrador Cross Breed Dog”, which I’d consider the same as “Labrador”)
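One way to start on both problems, sketched on a few made-up rows. The column names `breed` and `age`, the rule of stripping a trailing "Cross Breed Dog", and the age cutoff of 25 are all my assumptions here, not anything taken from the actual file.

```python
import pandas as pd

# Made-up sample rows; the real file's column names and values may differ.
pets = pd.DataFrame({
    "breed": ["Labrador Cross Breed Dog", "labrador ", "Poodle", "Domestic Short Hair"],
    "age": [3, 70, 12, 0],
})

# Normalise breed text: trim whitespace, drop a trailing "Cross Breed Dog",
# and use title case so "labrador" and "Labrador" end up identical.
# Treating a cross as the base breed is a judgement call, not a rule.
pets["breed_clean"] = (
    pets["breed"]
    .str.strip()
    .str.replace(r"\s+Cross Breed Dog$", "", regex=True, case=False)
    .str.title()
)

# Flag ages that look implausible for a dog or cat (the cutoff is a guess).
MAX_PLAUSIBLE_AGE = 25
pets["age_suspect"] = pets["age"] > MAX_PLAUSIBLE_AGE

print(pets[["breed", "breed_clean", "age", "age_suspect"]])
```

Flagging the odd ages rather than dropping them keeps the rows around until it is clear whether they are human years, dog years, or typos.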
Graphs I want to implement:
- Dogs v Cats
- Ages of dogs and cats, compared
- Registered/Not registered
- Suburb/maybe a geographical map
- (Un)common names
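Most of these graphs boil down to counting or averaging one column. A rough sketch of the numbers behind them, again on made-up rows. The column names and the "Yes"/"No" values for `registered` are assumptions about how the real file is laid out.

```python
import pandas as pd

# Made-up sample rows; column names and value formats are assumed.
pets = pd.DataFrame({
    "type": ["Dog", "Cat", "Dog", "Dog", "Cat"],
    "age": [3, 5, 8, 1, 10],
    "registered": ["Yes", "Yes", "No", "Yes", "No"],
    "suburb": ["Belmont", "Corio", "Belmont", "Lara", "Corio"],
    "name": ["Max", "Bella", "Max", "Charlie", "Luna"],
})

# Dogs v cats: number of rows per animal type.
type_counts = pets["type"].value_counts()

# Age comparison: mean age for each animal type.
mean_age_by_type = pets.groupby("type")["age"].mean()

# Registered / not registered.
registered_counts = pets["registered"].value_counts()

# Pets per suburb (a starting point for a map).
suburb_counts = pets["suburb"].value_counts()

# Names sorted by frequency: common ones first, rare ones last.
name_counts = pets["name"].value_counts()

print(type_counts, mean_age_by_type, registered_counts, suburb_counts, name_counts, sep="\n\n")
```

Each result is a pandas Series, so calling `.plot.bar()` on it should draw a bar chart, provided matplotlib is installed.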
Columns: Suburb, breed, type (dog/cat), colour, registered, age, animal name.
There are 46,000+ pets in the dataset.
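A quick sanity check on the columns and the row count after loading might look like the sketch below. It reads a two-row stand-in instead of the real file, and the header names are my guess at how the columns above are spelled in the CSV.

```python
import io

import pandas as pd

# Stand-in for registeredpets.csv: two made-up rows with assumed header names.
sample_csv = io.StringIO(
    "suburb,breed,type,colour,registered,age,name\n"
    "Belmont,Labrador,Dog,Black,Yes,3,Max\n"
    "Corio,Domestic Short Hair,Cat,Tabby,No,5,Bella\n"
)
df = pd.read_csv(sample_csv)

# Check that every expected column is present and count the rows.
expected = ["suburb", "breed", "type", "colour", "registered", "age", "name"]
missing = [col for col in expected if col not in df.columns]
row_count = len(df)

print(f"{row_count} rows, missing columns: {missing}")
```

With the real file, `sample_csv` would be replaced by the path "registeredpets.csv", and the row count should come out at 46,000 or more.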