New Foundations of Ontology

For some students, the ideal library shelf arrangement for discoverability (photo by Pietro Bellini)
“An open mind is like a fortress with its gates unbarred and unguarded.”
 — Isador the Librarian (Warhammer 40,000: Dawn of War)

The capital of Finland is Helsinki. I know this with certitude because I just double-checked the Wikipedia entry. If I felt suspicious, I could have looked it up in the World Factbook or a Country Study, or even gone full retro and called a library reference desk to consult their Encyclopædia Britannica. But why bother? Wikipedia can be wrong, but so can any information source, and its accuracy is comparable to theirs. Holding impractical and paranoid beliefs is no way to go through life.

Libraries used to organize books by a hierarchical nomenclature using controlled vocabulary. They still do. It beats shelving books by color, I suppose. A remote storage facility with closed stacks, on the other hand, can arrange materials more efficiently by size, much as computer files are stored without regard to how they might best be physically browsed. The search engine technology that provides access to billions of webpages likewise relies on crawlers and ranking algorithms, largely without anyone cataloging individual entries by hand.

This is nothing new. The Yahoo! Directory, once the ultimate finding aid on the Web (and itself reminiscent of Gopher-era pathfinder lists), closed down a few years ago, and the company itself is not what it once was. As an editor for the Open Directory Project, I spent my time, along with an army of volunteers, organizing and describing webpages so people could better find what they needed online. The ODP is still around, although today it seems more a relic of how web searching used to be done.

Back in the library world, cataloging is alive and well. Nowadays even the databases that aggregate hundreds of millions of articles are built on linked data generated by humans. Books, articles, and digital collections are classified by subject headings, the original hashtag. More automated mechanisms for discerning the aboutness of a work, such as full-text indexing, co-citation mapping, and usage analytics (à la Amazon and others' "people viewing this product also bought"), are still not the primary method of categorizing library materials.
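For readers curious what "usage analytics" amounts to in practice, here is a minimal sketch of simple item-to-item co-occurrence counting: items that show up together in the same session are treated as related. The function names (`build_co_occurrence`, `also_viewed`) and the sample sessions are hypothetical illustrations, not how Amazon, Netflix, or any library system actually implements its recommendations.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_occurrence(sessions):
    """Count how often each ordered pair of items appears in the same session."""
    co = defaultdict(Counter)
    for session in sessions:
        items = set(session)  # ignore repeat views of the same item within a session
        for a, b in permutations(items, 2):
            co[a][b] += 1
    return co


def also_viewed(co, item, k=3):
    """Return up to k items most often seen alongside `item`.

    Ties on count are broken alphabetically so the output is deterministic.
    """
    neighbors = co.get(item, Counter())  # .get avoids inserting unknown items
    ranked = sorted(neighbors.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical borrowing sessions; each inner list is one patron's checkouts.
sessions = [
    ["A Modest Proposal", "Gulliver's Travels"],
    ["A Modest Proposal", "Gulliver's Travels", "The Joy of Cooking"],
    ["A Modest Proposal", "The Battle of the Books"],
    ["The Joy of Cooking", "Mastering the Art of French Cooking"],
]

co = build_co_occurrence(sessions)
print(also_viewed(co, "A Modest Proposal", k=2))
# prints ["Gulliver's Travels", 'The Battle of the Books']
```

Nothing in this sketch reads the texts themselves; the signal comes entirely from what people happened to use together, which is what distinguishes usage analytics from assigning subject headings.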

Archaic taxonomies and nominalist pitfalls aside, it remains to be seen whether this system will ever change. Those responsible for creating manually generated concordances, back-of-the-book indexes, and descriptive subject analyses unsurprisingly think they do it better than machines. One common refrain is that a robot would classify Jonathan Swift's "A Modest Proposal" as cooking advice. Compare this with the profession's longstanding antagonism toward the Internet as a technology, however much it has advanced information access.

Search engines hold more data than a human brain is capable of holding. Computers are making inroads in voice and image recognition and natural language processing; walking robots and self-driving cars are becoming more commonplace; and Siri-like agents are taking hold even in the medical and legal fields.

Yet terms such as "deep learning" and "neural networks" make many people uncomfortable. If my job were readers' advisory, I could see how advances in software such as the Netflix recommendation engine might be interpreted as a threat worth maligning or suppressing, and how merely contemplating one's own obsolescence might be viewed as heresy.

Machines improve our lives in countless ways. There are frustrations and bumps in the road, to be sure, but in many cases the best tool for the job has no human element. As a result, research is easier now than it was in the era of chained books. I don't know whether librarian-created metadata will inevitably become an unnecessary surrogate, but it seems a very real possibility. Google sure doesn't use subject headings, and it works rather well. The real pipe dream is believing that our existing methods of organizing information cannot be improved upon by coming technological advances.
