Lucky guy! I’m struck by how kind your internship hosts were for this project: they gave you a simple task with clear parameters and visible impact.
But I hope they also gave you a glimpse of the rough and tumble of the real world of user-generated data. Your breezy account didn’t mention the overwhelming variability of names for the same place (check out geonames.org for some synonyms, then think about possible typos, translations, and transliterations for each: sooner or later they all show up) or the depressing frequency of missing or incorrectly stored information (e.g., the wrong latitude and longitude for a place). And then there are the things that are often just plain impossible to work out (Is this particular San Jose in California or in Costa Rica? What do “here” or “the museum” really mean?). Tagging and searching for locations are huge and difficult tasks!
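To make the name problem concrete, here is a minimal sketch in Python. The `GAZETTEER` table and the `normalize` and `lookup` helpers are made up for illustration (a real system would draw on something like geonames.org and far more data). It shows that cheap normalization handles accents and capitalization, but it does nothing for typos and it cannot resolve true ambiguity.

```python
import unicodedata
from typing import Dict, List


def normalize(name: str) -> str:
    """Casefold, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


# Hypothetical toy gazetteer: normalized name -> every place it could mean.
GAZETTEER: Dict[str, List[str]] = {
    "san jose": ["San Jose, California, US", "San José, Costa Rica"],
    "munich": ["Munich, Germany"],
    "munchen": ["Munich, Germany"],  # transliterated German name
}


def lookup(name: str) -> List[str]:
    """Return all candidate places; an empty list means no match."""
    return GAZETTEER.get(normalize(name), [])


print(lookup("  San  JOSÉ "))  # two candidates: still ambiguous
print(lookup("München"))       # one candidate, found via the alias
print(lookup("Sna Jose"))      # typo: normalization alone finds nothing
```

Note that `lookup` returns a list rather than a single answer. Deciding between the candidates takes context that the name alone does not carry.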
And I fervently hope that you didn’t leave your internship thinking that “mutually exclusive and collectively exhaustive” might be even temporarily workable as a constraint for taxonomies! It’s not. In practice, “mutually exclusive” means “agonize over which might be the best option, then guess”, simply because we can classify anything in many ways. El Botín in Madrid is an attraction, a location, a restaurant, a historical landmark, an activity, etc. If you pick one and only one category, then you’ve just crippled your taxonomy and murdered your search engine. In my experience, “collectively exhaustive” means “when you don’t know, make up a category”, because everything has to be forced into some place in the taxonomy, whether or not that makes sense. This constraint means that there’s no way to represent “don’t know”. Why not simply leave the category blank? “Exhaustive” just leads people to invent lots of spurious categories that we have to clean up later.
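Here is a minimal sketch of the alternative, again in Python. The `Place` class, the `search` function, and the sample data are all hypothetical and only there to illustrate the point: let each place carry a set of categories, and let an empty set stand for “don’t know”.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Place:
    name: str
    # Zero or more categories; an empty set means "don't know".
    categories: Set[str] = field(default_factory=set)


def search(places: List[Place], category: str) -> List[str]:
    """Return the names of all places tagged with the given category."""
    return [p.name for p in places if category in p.categories]


places = [
    Place("El Botín", {"attraction", "restaurant", "historical landmark"}),
    Place("Unlabeled spot"),  # nobody knows yet, so nothing is invented
]

print(search(places, "restaurant"))           # El Botín is found
print(search(places, "historical landmark"))  # and found again here
print(search(places, "activity"))             # no spurious matches
```

With a single-category field, El Botín would show up in only one of those searches, and the unlabeled place would need a made-up category just to exist.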
Moral of the story: the stuff that you worked on is waaaay more awesome and challenging than this, your first taste, suggests. Hope to see you come back for more after graduation! We can use your help!
