Search vs Discovery

Published in The Graph · 9 min read · Jun 10, 2015

Search is great — if you know what you are looking for.

However, as the amount of online content (especially video) grows exponentially, general search often struggles to deliver meaningful results unless you’re very explicit and goal-oriented. And you will never know what you missed out on. Search is brilliant for quick, specific answers but terrible for discovering and exploring new ideas. Generally, search results are biased towards popular links from the top 1% of all content and mainstream websites, trapping us all in a filter bubble of Wikipedia articles and increasing personalisation. Much like a classroom, search is best suited to specific paths towards preordained outcomes.

Down the corridor from this classroom is the library. This is where discovery and curiosity live. Movement is free, more self-determined and can cross many subjects. Discovery reveals worlds you didn’t know existed, powers up your critical thinking and enriches your understanding in ways the classroom isn’t designed for. Just as there is a fundamental difference between the role of the classroom and the library, there is a fundamental dichotomy between search and discovery. Search promises to make vast swathes of information available, whereas discovery aims to make more relevant knowledge immediately accessible.

What impact will discovery have on the future of education?

Arm yourselves

Those who work in learning have been scratching their heads for some time over what it means to be truly educated. Noam Chomsky, the eminent cognitive scientist and philosopher, gives his view:

“To be truly educated means to be in a position to enquire and create on the basis of the resources available to you.”

He refers to a colleague at MIT who always started a course by telling his students:

“It’s not important what we cover in class, it’s important what you discover.”

The argument both gentlemen make is that learning always takes place inside an individual, and that discovery is a fundamental part of the educational process. How else can we attain something we don’t know?

Accepting that better discovery, at least indirectly, means better education, how can we make sure the world’s best resources will be increasingly accessible to all?

Ask your librarian

The unstoppable tsunami of global information makes the chances of the right resource surfacing at the right time incredibly slim. There is a latent supply of resources everywhere on the web and in archives, and a latent demand for great resources from millions of educators and learners all over the world, yet both sides seem poorly served. However, some would say the solution is actually just a library card away. English author Neil Gaiman once wrote:

“Google can bring you back a hundred thousand answers. A librarian can bring you back the right one.”

How does a librarian make this happen? Danny Hillis, who created the technology behind Google’s Knowledge Graph, describes the ways of his school librarian:

“Mrs. Wilner was somebody who got to know every kid that came into the library. At that time I was obsessed with rock collecting and I would always go to her and ask for books on rocks. And she would find me the books on rocks, but she would also bring back a couple of other books she thought I might be interested in too.”

Hillis goes on to say Mrs. Wilner greatly influenced his life:

“She didn’t write any of the books and she didn’t do any teaching. She did this brilliant thing of knowing exactly where I was and what I was interested in, and bringing me the right material at the right time.”

What great discovery and the librarian have in common is the ability to narrow down what you are looking for, match it to content that may not yet be known to you, and help you make new connections. This last bit is particularly important. The ever-optimistic Arthur Schopenhauer once concluded:

“Every person takes their own field of vision for the limits of the world.”

For all its utility, search will largely return a more detailed reflection of your existing field of vision, rather than expanding and challenging it in relevant ways.

Breaking free

What about other initiatives? Online learning in its broadest sense is transforming each stage of the education sector, and Massive Open Online Courses (MOOCs) are one of the most established distribution channels.

MOOCs are no doubt doing a great job in delivering structured courses to a lot of learners across the world, but what about the experience Mrs. Wilner gave the kids in her library? For every MOOC platform, we need a library and a librarian to guide the self-determined part of the learning journey.

Understand the learner

Unfortunately, there are too many learners and too much content for Mrs. Wilner and all of the librarians in the world combined. We can’t all enjoy the personal touch of a full-time curator who knows us, knows what is out there, and provides what we need, including what we didn’t even realise we needed. Instead we need to look for solutions in emerging semantic and digital technology.

To fill the role of the librarian and deliver successful discovery, we need to answer at least two fundamental questions.

What are the goals and interests of the learner?
What relevant knowledge is most closely aligned with them?

The first part of this process aims to understand the context and intentions of the learner; the second attempts to map this understanding against resources that could expand their world.

The right stuff

In recent years, discovery and recommendation engines have quickly become the norm in popular digital services: just think of Facebook’s News Feed or your Instagram feed. Back in 2006, Netflix launched a now-famous competition to find the best way to recommend content to its viewers. The winning team used machine learning techniques to reveal many new aspects of the discovery and recommendation process. They realised that the rating criteria users adopted for old movies were very different to those used for the latest blockbusters. Also, ratings given on a Friday were significantly different to those submitted on a Monday morning. Its slick discovery and recommendation mechanisms are now one of Netflix’s strongest assets.

Similarly, the next step in the librarian puzzle involves recommending and delivering the right material to each individual learner. How can we fulfil the promise of better discovery in the knowledge domain, and perhaps occasionally create genuine moments of serendipity?

All Dewey-eyed

To find this out, we could start by asking ourselves how Mrs. Wilner organised her books. The writer Finley Peter Dunne once quipped:

“The first thing to have in a library is a shelf. From time to time this can be decorated with literature. But the shelf is the main thing.”

Wise words. The chances are Mrs. Wilner used Dewey Decimal Classification (DDC) to organise the shelves in her library. Before Melvil Dewey’s system, libraries gave books permanent shelf locations based on when they were acquired. Dewey introduced the concepts of relative location and relative index, which enabled librarians like Mrs. Wilner to systematically organise their books by subject.

The DDC is now one of many taxonomies that play a role in organising and curating our learning resources. Most of them, however, have a couple of fundamental weaknesses when seen through contemporary eyes: they struggle to deal with multimedia content and often fail to recognise the value a resource can have across multiple knowledge domains.

Dewey, like most librarians before and after him, ultimately had to pick just one shelf to place each resource on. This naturally focuses our attention on a single classification of the content rather than exposing its many possible connections to other subject areas.

Historically, ontologies have been built in two dimensions, but with the advent of new graph database technology, we can now re-engineer the shelves themselves. True discovery often happens on the edge of existing knowledge or in emerging fields, in the rich seams between subjects. Dewey’s system could never capture and represent these connections due to the technical limitations of his time and the physical confines of shelves.

In his talk, Hillis describes his wish to build what he calls The Learning Map: a dataset of all there is to know in the world. The map would show any teachable skill as a node, with relationships to other skills or assessments. The learner can then navigate through the map. For this to work, instructional resources need to be tagged in such a way that those relationships can be plotted on the multi-dimensional map.
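To make the idea of skills as nodes a little more concrete, here is a minimal sketch in Python. It is not Hillis's design or anyone's production system: the skill names, the links between them, the resource titles and the helper functions `learning_path` and `resources_for` are all hypothetical, invented purely for illustration. It assumes the simplest possible map, where each skill points to the skills it unlocks and each resource is tagged with every skill it teaches.

```python
from collections import deque

# Hypothetical skills and the skills they lead on to (illustration only).
LEARNING_MAP = {
    "counting": ["addition"],
    "addition": ["multiplication", "subtraction"],
    "subtraction": [],
    "multiplication": ["fractions", "area"],
    "fractions": ["probability"],
    "area": [],
    "probability": [],
}

# Hypothetical resources, each tagged with every skill it teaches.
# Unlike a book on a physical shelf, one resource can sit in several places.
RESOURCES = {
    "Dice games video": ["probability", "fractions"],
    "Tiling a floor": ["area", "multiplication"],
    "Number line song": ["counting", "addition"],
}


def learning_path(start, goal, graph=LEARNING_MAP):
    """Breadth-first search for the shortest chain of skills from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for next_skill in graph.get(path[-1], []):
            if next_skill not in seen:
                seen.add(next_skill)
                queue.append(path + [next_skill])
    return None  # no route between the two skills


def resources_for(skill, resources=RESOURCES):
    """Return the titles of all resources tagged with the given skill."""
    return sorted(title for title, skills in resources.items() if skill in skills)
```

Calling `learning_path("counting", "probability")` walks the map from one skill to the other, and `resources_for` can then be called on each step of the route. A real map would need weighted, typed relationships and far richer tagging, but the shape of the problem is the same.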

Back to the future

Three years later, Hillis’s vision is slowly coming true. Using technologies like advanced natural language processing, which enable computers to derive meaning from human language, it is now possible to automatically process many types of instructional resources.

To borrow the words of Christopher Nguyen, an expert in the field:

“Philosophers, psychologists, linguists, and neuroscientists have studied these topics for a long time. The connection to machine learning and computer science is more recent, especially with the advances in big data and deep learning. When fed with huge amounts of text, images, or audio data, the latest deep learning architectures are demonstrating near or even better-than-human performance in language translation, image classification, and speech recognition.”

Making significant headway, the team here at Bibblio has identified and analysed over 150,000 quality video and audio materials in the last twelve months. During that time we have also learned a lot about the building blocks of discovery: rich metadata. Our system now uses rules to determine if a resource is suitable for learning and conducts a multi-layered semantic process to extract as much information about the content as possible. At times this is extremely challenging, as the metadata generated can still leave it unclear what value a learner could glean from the resource. The context in which a learning resource is presented can also significantly affect its value to the audience.

Smart Graph

To make Hillis’s Learning Map a reality, you need to process lots of content from all corners of the web. You must think big or go home. This is the challenge that Bibblio has taken on with our Smart Graph, an ever-improving knowledge map that will empower educators and learners in their quest for better discovery and a wider lens to explore the world through.

Bibblio’s London and Cape Town teams are currently mapping and indexing millions of instructional resources. For scalability, Smart Graph uses automated ingestion, transcription, keyword extraction and concept recognition to gather detailed data on each content item. Once merged with pre-existing metadata, a unique content fingerprint is generated and matched to curricula and subject categories, before being delivered to educators and learners the world over.
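As a rough illustration of the steps just described, the sketch below extracts keywords from a transcript, merges them with existing tags into a fingerprint, and matches that fingerprint to a subject category. It is a toy under stated assumptions, not Smart Graph's actual method: word counting stands in for real keyword extraction and concept recognition, Jaccard overlap stands in for whatever matching is really used, and every name here (`extract_keywords`, `fingerprint`, `best_category`, the stopword list and the category vocabularies) is hypothetical.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "into", "is", "as",
             "how", "we", "this", "that", "are", "by", "with", "on", "for"}

# Hypothetical subject categories, each described by a small vocabulary.
CATEGORIES = {
    "biology": {"cells", "plants", "photosynthesis", "light", "energy"},
    "physics": {"energy", "force", "motion", "light"},
}


def extract_keywords(transcript, top_n=5):
    """Naive keyword extraction: the most frequent non-stopword terms."""
    words = re.findall(r"[a-z]+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def fingerprint(transcript, existing_tags):
    """Merge extracted keywords with pre-existing metadata tags into one set."""
    return frozenset(extract_keywords(transcript)) | frozenset(
        tag.lower() for tag in existing_tags
    )


def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_category(fp, categories=CATEGORIES):
    """Pick the category whose vocabulary overlaps most with the fingerprint."""
    return max(categories, key=lambda name: jaccard(fp, categories[name]))
```

Given a transcript about plants turning light into energy and the existing tags `["Biology", "Plants"]`, `best_category(fingerprint(transcript, tags))` picks the biology vocabulary over the physics one. Treating the fingerprint as a plain set keeps the example short; weighting terms, or comparing against a whole curriculum rather than a handful of words, would be the obvious next steps.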

The stars align

We’ve hit a Cambrian moment. An ocean of fragmented digital resources is rapidly evolving into a rich ecosystem of quality knowledge. This is forcing the disjointed value chain of publishing, packaging and distribution to be completely reimagined as a single process. The simplification of this infrastructure will increasingly allow providers to focus on what really matters: creating high-value content and learning design.

At Bibblio we are excited to help solve the urgent need for greater and smarter discovery across vast expanses of knowledge content. We are reaching across the web and around the world to work with archives, broadcasters and institutions of many shapes and sizes. Fundamentally, we are working to put educators and learners firmly in the driving seat, giving them the controls to discover the wealth of resources that are now available. That is the end goal. After all, what we cover in class is less important than what we discover for ourselves.

Robbert van der Pluijm, Learning Coordinator
Rich Simmonds, Co-founder
Mads Holmen, Co-founder

Bibblio is a content recommendation platform that helps content businesses and publishers deliver more relevant and engaging discovery experiences to their users. Visit us on Twitter, LinkedIn and Facebook.

More juicy posts by Bibblio:
Clicks vs Satisfaction
Popularity vs Diversity
Trees vs Networks
Education vs Learning
60 YouTube Channels that will make you smarter