The Future of Search Engines: Next-Gen Conversations Through Contextual Search
“Now I see what you’re saying!”
You’re fresh out of school and you just started your first job. You walk into a meeting and sit down with your computer. The CEO launches into his presentation. You’re sitting there, laptop out, ready to go, but in his first sentence he says “We need to double MRR by end of Q2, here’s how.” Oh, no. You know you’ve heard “MRR” before, but you can’t remember what it stands for. Also, you know these “Q’s” divide up the year, but when exactly is the “end of Q2”? You reach for your laptop, open your browser, and are about to type in “what is MRR” when you realise that your manager, and his manager, are sitting at your sides, and you’re not sure you want to advertise your ignorance so soon. So you sneak out your phone and enter “what is MRR” into a search engine. A card pops up: “MRR stands for ‘monthly recurring revenue’.” Oh, of course! You knew that, you just couldn’t remember it. Then you do a search for “q2 months of year”, switch over to images, and see a chart that shows Q2 ending in June. Oh yeah, of course, they’re quarters of the year. You glance up and realise the CEO is on the third slide, and you’ve missed the last 40-second explanation of why this MRR deadline is so important. Now you’re behind.
How could this have gone better? You already knew what MRR is from reading about it, you just weren’t able to recall it when your CEO said it aloud. You also have an intuitive understanding of “the end of June”, but you’re not used to thinking in quarters. Your search engine queries met your needs, but the process of pulling out your device, opening the app, and forming a query took you out of the context of the meeting and into the context of your phone, making you miss critical information that came next. What if that information had been available to you immediately, without requiring you to switch context to your phone? What if, as soon as you heard a word or concept that you didn’t remember or already know, you had immediate access to the information that you needed?
Imagine you have a personal team of assistants that are remotely hearing what you hear and seeing what you see. The team is comprised of experts in many domains — technology, business, neuroscience, history, etc. This team has been with you for several years, and has a detailed representation of everything that you’ve learned, your experiences, and even the ways that you think. In your meeting, when the CEO says “MRR”, your “business-expert agent” suspects you don’t remember this term, so he presses a “talk button” and whispers in your ear “monthly recurring revenue”. As soon as you hear that, you understand, and it happened so quickly that you didn’t lose track of what the CEO is saying. When “Q2” is said, your “data-expert agent” sees on his computer that you have never used quarters to refer to a time of year, so he quickly whispers in your ear “late June”. Again, this simple information takes only a brief moment to attend to and integrate, but it drastically increases your understanding of the conversation.
This type of system is a fundamental way to improve our ability to execute and increase our knowledge, intelligence, and rate of learning. Unfortunately, it’s not reasonable for 99.9% of people to have a full-time personal assistant team. However, with the modern ecosystem of AI, wearable devices, and powerful edge compute and storage, it’s just now becoming possible to fully automate this process and, in doing so, upgrade human cognition. The automation of this process results in a system I refer to as a “Contextual Search Engine”.
The Cognitive Workflow — Search Engines of Today (2023)
Search engines are the prime example of a technology that improves our cognitive capabilities. Today, they form a fundamental step in our cognitive workflow, so much so that nearly everyone, regardless of age, occupation, or interests, uses a search engine every day, multiple times a day. This article is about the next generation of search engine, so let’s start by looking at what a search engine actually is, and why search engines are fundamentally important to humanity.
When thinking, solving a problem, brainstorming ideas, or performing various cognitive tasks, one often comes across a knowledge gap. A knowledge gap is missing knowledge, or some knowledge that you may hold, but are not able to recall in the current moment and current context. One identifies a knowledge gap as information that, if accessible, would help one better perform the current cognitive task. The next step is to formulate a query whose answer would fill in the knowledge gap one has come across. One then switches context to their device, opens a search engine, types or speaks the query, and receives an answer. With the new information in hand, one is then freed to continue their cognitive task, empowered by their new knowledge.
1,000 years ago, things weren’t so easy. When one came across a knowledge gap, they had to live with it, or search for a learned person who might hold the knowledge they sought, or individually spend countless hours trying to figure it out for themselves. Needless to say, the difficulty of bridging knowledge gaps 1,000 years ago would stop most cognitive processes in their tracks. The problem of going from query to answer was very, very hard.
50 years ago, libraries had already become the chief method of solving the problem of going from a query to an answer. When one came across a knowledge gap, they could go to the library and search for books on the topic. The semantic ordering of the Dewey Decimal system and references in books allowed one to find related information. What used to take years could be done in a few days at the library.
Today, search engines have brought that process down to seconds by automating the most time-consuming step — going from a query to an answer. Instead of manually identifying useful sources and manually sifting through the information, search engines have indexed everything, allowing us to skip over resource identification, searching, reference hopping, etc. and go straight to the answer. This automation is not a trivial thing — it has completely revolutionised the way that humans think. The average person performs half a dozen to dozens of queries per day to find the answers or resources that they need. In conversations, while working on problems, or while learning new information, we’re constantly met with things that we don’t know. Search engines are the first place we turn to fill in these knowledge gaps.
Problems of Modern Search Engines
However, today’s search engines have weaknesses that drastically limit their power.
- They’re too slow to use. When you’re in the middle of a conversation and you don’t understand something that was said, you don’t have time to figure out the right question to ask, pull out your phone, and search for that thing. It’s very common during conversation that extra information would be useful, but the average time of 20 seconds that it takes to pull out one’s phone, open a browser, search, and find an answer is too long. This action is also too mentally resource intensive — it takes our attention away from our conversation as we switch context out of our conversation and into our phone. For this reason, search engines don’t live up to their potential in conversations.
- They help us when we know what we don’t know, but they don’t help us when we don’t know what we don’t know. Our usage of search engines today is usually explicit and directed: we realise there’s some knowledge we lack, and we use a search engine to find it. The opportunity to discover new things, knowledge that we don’t know that we don’t know, is untapped by today’s search engines.
- They only act on public knowledge, not private knowledge. So often, the most valuable information in a given context comes from a previous conversation, an email, a book we read, etc., the presentation of which would trigger our existing memory. But today’s search engines consider none of these data sources.
A contextual search engine is an upgrade to the modern search engine that solves these problems. It solves them by automating the manual steps of today’s search engines — knowledge gap discovery, query formation, and search. A contextual search engine listens to your conversation, identifies a knowledge gap, and immediately provides an answer in a modality that doesn’t require you to switch context. The contextual search engine doesn’t just tell you what you know you don’t know — it continually searches for relevant information to the current conversation and presents that as a prompt for further thought, discussion, and exploration. The data that a contextual search engine searches includes not only public knowledge, but also all of the information that you have experienced in the past — your private knowledge base.
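The three automated steps named above (knowledge gap discovery, query formation, and search) can be sketched as a tiny pipeline. This is only a toy illustration under stated assumptions, not the design of a real system: the `KNOWN_TERMS` set, the `PUBLIC_KB` dictionary, and every function name are hypothetical stand-ins for a model of the user's knowledge and a real search backend.

```python
import re
from typing import Optional

# Hypothetical stand-ins: a real system would use a learned model of the
# user's knowledge and a real search backend instead of these literals.
KNOWN_TERMS = {"we", "need", "to", "double", "by", "end", "of", "here's", "how"}
PUBLIC_KB = {
    "mrr": "monthly recurring revenue",
    "q2": "April to June",
}


def find_knowledge_gaps(utterance: str, known_terms: set) -> list:
    """Step 1: flag tokens the user is not known to understand."""
    tokens = re.findall(r"[a-z0-9']+", utterance.lower())
    return [t for t in tokens if t not in known_terms]


def form_query(term: str) -> str:
    """Step 2: turn a gap into a query (here, just the normalized term)."""
    return term.strip().lower()


def search(query: str, knowledge_base: dict) -> Optional[str]:
    """Step 3: look the query up; returns None when nothing is found."""
    return knowledge_base.get(query)


def contextual_search(utterance: str) -> list:
    """Run all three steps and return terse prompts for display."""
    prompts = []
    for gap in find_knowledge_gaps(utterance, KNOWN_TERMS):
        answer = search(form_query(gap), PUBLIC_KB)
        if answer is not None:
            prompts.append(f"{gap.upper()}: {answer}")
    return prompts


print(contextual_search("We need to double MRR by end of Q2, here's how."))
```

The user never forms a query here; the utterance itself is the only input, which is the point of automating the manual steps.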
Why Upgrade Conversations?
I care a lot about conversations. Human technology has been hyper focused on continuously improving our remote interactions, but we’ve done almost nothing to improve face-to-face interactions. A conversation with your friends in the living room today unfolds in a manner that is nearly identical to what we would have done 50 or 1,000 years ago. We augment ourselves in every way, yet the most important bit of our existence — our relationships and connection to others — remains largely unenhanced.
In fact, we are regressing in this regard. The early internet was about connection and communication (chat and conversations), promising a future where we might enhance and extend the feelings of felt-presence that we share with each other, but our modern internet has devolved into a circus of feeds. One-to-many distribution can be extremely useful and valuable, but this format does not replace social interaction and synchronous communication. Feeds are an asynchronous, depersonalizing channel that has largely been overtaken by attention-hacking information. “Social media” isn’t social: most of the content one views on social media is created by people that users don’t know and never will, and the modality of engagement is not a social one of presence and experience.
Conversations are a fundamentally important aspect to our existence as intelligent agents. They’re the foundation of our relationships with other intelligent beings. They’re where we learn, grow, explore, laugh, and cry. Conversations are the reason we achieve higher intelligence, as language encodes our knowledge and models of the world, and conversations are how we pass it around. Convo is king, yet it’s stuck in the stone age — it’s time to upgrade our conversations to allow us to understand each other deeper, learn more, and go further. It’s time to add a new layer to the synchronous channel of communication from person to person. It’s time to upgrade conversations.
Another aspect of “why conversations” is practical — conversations are the only time that live human thought is encoded in a way that we can digitally represent. Conversations cause us to make our stream of thoughts into speech, which we can capture with computers, and then do all sorts of wonderful things with.
Finally, conversations are a form of thinking. Speaking thoughts aloud, working ideas out with friends, and presenting information in a way that can be understood by others are effective tools for critical thinking.
In summary, conversations are fundamentally important for three reasons:
- Communication, felt presence, and relationships are the most important things to humans.
- They are the only time we think in a way such that we can capture the stream of thought digitally.
- Conversations are thinking — and thus improving them is improving thinking.
Why Just Conversations?
However, conversations are not the final frontier for contextual search engines — thoughts are. This article explores the use of a contextual search engine when thoughts are being spoken aloud, and specifically when thoughts are being communicated. This is achievable in the near term using technologies like automatic speech recognition (ASR) and smart glasses. However, it’s likely in the not-so-distant future, we will achieve wearable semantic decoding brain computer interfaces which allow for the direct reading of thoughts from functional brain imaging in daily life. Importantly, this is not about subvocalization (another form of speech that people don’t normally employ while thinking), it’s about directly decoding the semantic content of thoughts in a user’s brain.
When this type of interface is achieved in a practical form factor with real-world/in-the-wild accuracy, then the methods, arguments, and ideas this article presents for conversational contextual search engines will apply directly to the contents of our thoughts. Once this type of interface is achieved, it might not be long before we achieve semantic input directly into the brain — allowing us to not just provide visual or audio information, but to directly inject thoughts into our brains by reactivating the neocortical spatio-temporal patterns that represent those ideas — directly recreating neurological latent space states. This feedback loop of thought, external search/inference, and re-injection of thought might very well be a “merge” killer use case.
Example User Stories of a Conversational Contextual Search Engine
The discussion so far has been somewhat abstract: knowledge gaps, queries, context, etc. What are some actual use cases of a contextual search engine, and how is it better than the search engines of today? Let’s look at a few concrete examples that just scratch the surface of what a contextual search engine could do:
- You’re pitching your startup to a venture capitalist. They interject and ask how your solution is different from the one offered by “X Inc.” You’ve heard of X before, and you have a big spreadsheet on your computer of all your competitors, but you can’t remember exactly which one they are right now. Before you even realise your ignorance, the contextual search engine has searched the web for “X” and presented their logo and a quick summary of their business, jogging your memory. Milliseconds later, the contextual search engine pulls up the row in your spreadsheet where you describe your competitive advantage. You’re able to answer the venture capitalist with confidence.
- You’re talking to your coworker and they mention their kids aren’t practising piano, and they’re thinking about paying them to practise. You start explaining that there’s some related research on extrinsic vs. intrinsic reward that you read about years ago. Your contextual search engine hears this, finds your notes on the paper you’re discussing, summarises them, and presents a couple of bullet points that trigger your memory of the paper immediately. You’re now able to remember and explain the relationship of the research to your coworker’s decision. A quick gesture allows you to send the paper reference to your coworker, backing up your claims.
- It’s your first time meeting your boyfriend’s parents, and you’re a bit nervous. You know his dad is big into politics, something you don’t know much about. As the conversation at the dinner table picks up, his dad starts talking about “Guantanamo Bay”. Before you can even panic that you don’t remember what that is, your contextual search engine has overlaid a summarised definition of Guantanamo Bay on your vision. To help you recall what you know about the place, it finds a documentary you watched years ago about George Bush and extracts a picture of the entrance to Guantanamo Bay to trigger your recall. Your memory comes flooding back, and you’re able to provide the viewpoint you’d developed years ago when you watched that documentary.
- You’re taking a 3-week vacation to travel the world. You want to see places and cultures unlike any you’ve seen before, so you decide to go to Southeast Asia. Once there, you meet people from all over the world. It always used to be a little awkward and slow when people mentioned they were from a place you’d never heard of and knew nothing about. Now though, whenever someone mentions a city or country, you instantly see a world map in the corner of your vision with a pin on the place mentioned. The map zooms in to a satellite view and reveals a closer look at the location, with information about the local language and population.
- As in the introduction, you just started a new job and you’re in a meeting with the CEO. He says “We need to double MRR by end of Q2, here’s how we’re going to do that.” As soon as the words are said, the contextual search engine has identified your knowledge gaps and filled them in. You see “MRR : monthly recurring revenue” and “Q2 : April to June” overlaid on your vision. In the amount of time it’s taken your CEO to take a breath, the contextual search engine has filled in your knowledge gaps, and you’re still listening and alert.
These are just a few of many, many examples in conversation where extra information, whether personal or public, could improve a user’s capabilities.
Private vs. Public Knowledge
There is a “public knowledge base” which contains all of the collective knowledge of the human race that has been made public: books, articles, lectures, courses, recordings, etc. that are available to anyone. This is the type of information that one usually accesses when utilising a search engine.
At the same time, every individual has their own collection of notes, experience, emails, conversations, internet browsing history, etc. that forms their private knowledge base.
These overlap, as the user’s private knowledge includes a record of all of the public knowledge that the user has consumed. If two books discuss the same topic, but only one of the books has been read by the user, it would be better for a contextual search engine to reference the book that the user has read, as it will serve as a memory trigger for what the user has already learned and encoded in their own brain.
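One simple way to express the "prefer the book the user has read" rule is a ranking score that adds a bonus for sources already in the private knowledge base. The sketch below is an assumption-laden illustration: the `Source` class, the `topic_match` values, and the `FAMILIARITY_BONUS` constant are all hypothetical, and a real system would tune or learn the weighting.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    topic_match: float      # 0..1, how well the source covers the topic
    consumed_by_user: bool  # True if it is in the user's private knowledge base


# Hypothetical bonus; a real system would tune or learn this value.
FAMILIARITY_BONUS = 0.25


def score(source: Source) -> float:
    """Topic relevance plus a fixed bonus for sources the user already knows."""
    bonus = FAMILIARITY_BONUS if source.consumed_by_user else 0.0
    return source.topic_match + bonus


def pick_reference(sources: list) -> Source:
    """Return the best source, preferring familiar ones when relevance is close."""
    return max(sources, key=score)


books = [
    Source("Book A (unread)", topic_match=0.80, consumed_by_user=False),
    Source("Book B (read by the user)", topic_match=0.75, consumed_by_user=True),
]
print(pick_reference(books).title)
```

Because the bonus is additive rather than absolute, a much more relevant unread source can still win over a weakly related familiar one.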
The approach of a contextual search engine consumer product should be to provide immediate value to users by injecting relevant public knowledge into the user’s conversations and workflow. Then, as time goes on, the system collects data about the user’s conversations, media consumption, education, etc., and starts to search not just public knowledge, but private knowledge. As this transition takes place, a contextual search engine goes from being only a tool to discover and pull in public knowledge to also being a tool that augments and extends the memory of the user: a memory augmentation device.
Ignoreability
The information presented by a contextual search engine must be ignoreable. This means that if the user doesn’t want any extra information right now, they can easily ignore whatever the system is presenting. The act of ignoring should place minimal cognitive load on the user, not affecting their other activities. Ignoreability is also temporally constrained: if the user wants to ignore the information right now while they finish their sentence, and then attend to the information when they’re done with their sentence, they should be able to do so (time restricted ignoreability).
Audio interfaces are ignoreable, but they don’t have time restricted ignoreability. If you ignore the info, you can’t pay attention to it a few seconds later, because it’s already gone by then.
Visual interfaces are ignoreable, they do feature time restricted ignoreability, and they are persistent and thus can present information to the subconscious mind, which can filter and pass up only relevant suggested information to the conscious mind.
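The difference between the two modalities can be made concrete by modelling how long a prompt stays available after it appears. This is a minimal sketch with made-up numbers: the `Prompt` class, the `linger_s` field, and the durations are hypothetical and only illustrate time restricted ignoreability.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    text: str
    modality: str    # "audio" or "visual"
    shown_at: float  # seconds on any monotonic clock
    linger_s: float  # how long the prompt can still be attended to


def can_still_attend(prompt: Prompt, now: float) -> bool:
    """True while the prompt is still available to the user at time `now`."""
    return prompt.shown_at <= now <= prompt.shown_at + prompt.linger_s


# Hypothetical durations: speech is gone once it has been spoken, while a
# displayed card can wait until the user finishes their sentence.
spoken = Prompt("monthly recurring revenue", "audio", shown_at=0.0, linger_s=1.5)
shown = Prompt("MRR: monthly recurring revenue", "visual", shown_at=0.0, linger_s=10.0)

after_sentence = 4.0  # the user finishes their sentence 4 seconds later
print(can_still_attend(spoken, after_sentence))  # False
print(can_still_attend(shown, after_sentence))   # True
```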
Platform — Human I/O
A contextual search engine requires a hardware platform that can:
- Sense and capture the user’s environment to understand their current context and conversation.
- Provide information to the user.
- Work in a way that is ignoreable and doesn’t require a context switch to use.
There are a few options for how this type of system could be achieved:
- Smartphones and/or desktop computers. Contextual search engine output can be displayed on your screen. This is low-hanging fruit for video calls and a good environment for rapid prototyping. While a useful and valid contextual search engine could be built on just a phone or desktop computer interface for video calls, these platforms don’t work well in the real world: switching attention away from your conversation and into your phone is a bad user experience that takes you out of your conversation.
- In-ear microphone and speaker and/or audio smart glasses. Earbuds/AirPods and audio smart glasses (smart glasses without displays) can sense the environment with omnidirectional, far-field microphones and can be paired with text-to-speech (TTS) to read out information to the user. This is a promising platform as it solves the context switching problem, but it suffers from the fact that speech is temporal and linear: you have to listen to it when it’s being said, or you miss it. You don’t have as much attentional control. Speech is ignoreable, but if ignored, it can’t be absorbed afterwards.
- Smart glasses with display. Smart glasses have microphones to capture context and speakers to play speech. They solve the problems of context capture just like a wearable audio solution, but they also solve the problem of ignoreability and context switching. They do so by presenting information visually to the user, such that the user can ignore that information completely, or delay paying attention to it until they have a chance.
- Neural Interfaces. A semantic encoding and decoding interface that could read your thoughts and inject new thoughts, paired with all of the visual and audio wearable sensors on smart glasses.
Why Smart Glasses?
This article will explore the use of smart glasses as the platform for widespread consumer adoption of contextual search engines. This is because they meet the fundamental needs of a contextual search engine while remaining possible today (wearable, real-world-ready semantic neural interfaces are still some years away, as of this writing).
- Always with us — no matter where we are or what we’re doing.
- No context switch to receive information — always available visual and audio human input.
- Environment/contextual sensing — sensors hear what we hear and see what we see.
Contextual Search Engine Prompts
A “prompt” or “trigger” is the information the contextual search engine provides to the user.
Memory Triggers — Less is More — Cognitive Summarization
The modality of contextual search engine that is most useful for live conversations is one in which the information presented needs to be incredibly terse. There is no time to read a long winded description of something mid-conversation.
Whenever the information that a contextual search engine wants to present is something that a user has already learned, the focus should be on triggering the user’s existing memory. If the information is new, something the user doesn’t already know, then the information should be summarised to the minimum necessary to allow the user to continue to follow and understand their current conversation. In summary, contextual search engine prompts should maximize user knowledge triggering while minimizing consumption time.
Visual vs. Audio
Visual and audio information are the two main modalities for receiving digital information today. This is quickly growing to include other modalities, and will eventually grow to be a direct neurological semantic interface, but for now, consumer-facing products will likely rely on audio or visual.
There are benefits and downsides to each, too in depth for the scope of this article. Briefly, audio is beneficial because it can be used in a wearable immediately, it’s been shown to be easy for humans to ignore, and speech is a basic and easy to understand input method. Audio suffers from the major problem that it’s linear: if you ignore it in the moment, you won’t get it back. Visual information, on the other hand, can be ignored for some time, and then glanced at briefly when the moment is right. Visual information also affords the ability for subconscious perusal: one’s subconscious brain can be scanning visual information in the periphery, and only cause one to consciously glance at that information if it’s deemed relevant and useful in the moment.
The modality of the trigger also depends on the nature of the memory — if you’re trying to remember a song, a visual cue probably won’t do much. If you’re trying to remember a face, audio won’t help. The modality will often follow the information.
The issue of memory trigger modality could use an entire article (book!) on its own, but let’s continue.
Challenges of Contextual Search Engines
False Positives
Imagine you wake up, walk down the hallway, and run into your girlfriend in the kitchen. She got up early today. She looks up at you, smiles as you walk in, and says “I brought the dog for a walk already, honey.” As she says this, your contextual search engine pulls up the page for Canis, giving you a quick summary of what a dog is, what dogs look like, and any recent conversations you’ve had about dogs. The system isn’t wrong, per se, but it’s providing you with information that is not useful to you at this current time, in this current context.
If a contextual search engine were to constantly provide you with information that is not valuable to you, it would lead to a number of negative consequences, like information overload (overwhelm) and annoyance. This would ultimately lead to you ignoring the contextual search engine output, destroying any chance of you getting value from it when the information provided is useful. This is a fundamental challenge of an implicit interface — we want to proactively provide information or do things for a user before they even realize they want it, or faster than they can command the device to do so, but we don’t want to ever do anything the user doesn’t want us to do.
How do we solve this? There’s no silver bullet, but there are a few approaches to solving this problem. The first comes from personalization. The system will build a database of every conversation you’ve had, book you’ve read, website you’ve visited, etc., so it knows what you know and what you don’t know. The system can search through your memory and ask “does the user already know about this thing?” If I’ve used the word “dog” 1,000 times in the past, I likely don’t need a definition of it. This is on a per-individual basis. If someone mentions smart glasses to me, I don’t need to see a definition of what they are. Some of my family friends and acquaintances, however, have never heard the term “smart glasses”. For them, their contextual search engine would mine their past, come up blank for any previous exposure to “smart glasses”, and thus pull in the data that they need. First time users of a contextual search engine have not yet built up a database of their knowledge. In this circumstance, a “20 Questions” style survey could be utilized, where questions about education, background, occupation, media consumption, etc. could be asked of the user to build a general model of their knowledge in a very short time. All of this personalization needs to be done to ensure that only valuable and relevant information is displayed by a contextual search engine.
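The "does the user already know about this thing?" check can be approximated by counting how often a term shows up in the user's recorded history. The sketch below assumes the history is available as plain text snippets; the function names and the `EXPOSURE_THRESHOLD` value are hypothetical, and plain phrase counting is a crude proxy for real knowledge modelling.

```python
import re

# Hypothetical cut-off: how many past exposures count as "already known".
EXPOSURE_THRESHOLD = 3


def exposure_count(term: str, history: list) -> int:
    """Count whole-phrase occurrences of `term` across the user's history."""
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    return sum(len(pattern.findall(text.lower())) for text in history)


def user_likely_knows(term: str, history: list,
                      threshold: int = EXPOSURE_THRESHOLD) -> bool:
    """Treat a term as known once it has appeared often enough."""
    return exposure_count(term, history) >= threshold


history = [
    "I took the dog out this morning.",
    "The dog barked at another dog.",
    "We need more dog food.",
]
print(user_likely_knows("dog", history))            # True
print(user_likely_knows("smart glasses", history))  # False
```

A brand-new user has an empty history, so every term looks unknown; that is the gap the survey idea above is meant to fill.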
The public-knowledge frequency or popularity of a thing is another way to determine if information about it should be displayed. If someone mentions a word, concept, event, place, etc. to me and the contextual search engine needs to decide whether or not to define that entity, it can reflect on how common or popular that entity is in the public knowledge base. When someone mentions “Canada”, a new user might not have “Canada” saved in their personal knowledge base yet, but this is such a well-known place that it’s likely not useful to give the user more information about “Canada”. Metrics such as word frequency lists on the web, the QRank of a concept’s Wikidata entry, the number of search results for a specific thing on a classical search engine, etc. can serve as proxy metrics for the global frequency of an entity. For example, someone might say “smart glasses in the 1950s were an anachronism…”. Even though you might have heard that word once or twice before, the system can recognize that “anachronism” is a very rare word, and this will increase the likelihood that it will be defined.
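A rarity score built from a frequency table is one way to turn this idea into a decision. In the sketch below, the frequencies in `FREQ_PER_MILLION` are invented for illustration (real values would come from a word frequency list or a popularity ranking), and the `UNSEEN_FREQ` floor and the threshold of 5.0 are equally hypothetical.

```python
import math

# Invented occurrences per million words, for illustration only.
FREQ_PER_MILLION = {"canada": 120.0, "dog": 150.0, "anachronism": 0.4}
UNSEEN_FREQ = 0.01  # hypothetical floor for terms missing from the table


def rarity(term: str) -> float:
    """Higher means rarer: negative log10 of the relative frequency."""
    per_million = FREQ_PER_MILLION.get(term.lower(), UNSEEN_FREQ)
    return -math.log10(per_million / 1_000_000)


def should_define(term: str, rarity_threshold: float = 5.0) -> bool:
    """Define a term only when it is rare enough in public text."""
    return rarity(term) >= rarity_threshold


print(should_define("Canada"))       # False
print(should_define("anachronism"))  # True
```

The log scale keeps very common and very rare terms on a comparable range, so a single threshold can separate them.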
Another metric to consider is relevancy. Above, we decided not to show more information about Canada because it’s a common, well-known country. However, if the contextual search engine heard you say “where is Canada?”, then the relevancy of that information is increased to the point that it would be worth showing. Or, if you’re sitting in a talk about the Canadian oil industry and “Canada” is mentioned 3 times in the first 20 seconds, the relevancy of the term might have increased to the point where it’s worth showing more information about Canada.
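Both relevancy signals described above (an explicit question, and repeated mentions in a short window) are easy to prototype on a timestamped transcript. This is a rough sketch under assumptions: the transcript format of `(seconds, text)` pairs, the question pattern, and the defaults of 20 seconds and 3 mentions are hypothetical choices taken from the example in the text, not tuned values.

```python
import re


def is_explicit_question(term: str, utterance: str) -> bool:
    """Detect simple questions such as 'where is Canada?'."""
    pattern = (r"\b(what|where|who) (is|are|was|were) (the |a |an )?"
               + re.escape(term.lower()) + r"\b")
    return re.search(pattern, utterance.lower()) is not None


def mentions_in_window(term: str, transcript: list, now: float,
                       window_s: float = 20.0) -> int:
    """Count mentions of `term` in utterances from the last `window_s` seconds."""
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    return sum(
        len(pattern.findall(text.lower()))
        for t, text in transcript
        if now - window_s <= t <= now
    )


def is_relevant(term: str, transcript: list, now: float,
                min_mentions: int = 3) -> bool:
    """Relevant if just asked about, or mentioned often enough recently."""
    latest = transcript[-1][1] if transcript else ""
    return (is_explicit_question(term, latest)
            or mentions_in_window(term, transcript, now) >= min_mentions)


talk = [
    (2.0, "Canada exports a lot of oil."),
    (9.0, "Most of it comes from Alberta, in western Canada."),
    (17.0, "Canada also refines some of it."),
]
print(is_relevant("Canada", talk, now=20.0))                                # True
print(is_relevant("Canada", [(1.0, "Where is Canada?")], now=2.0))          # True
print(is_relevant("Canada", [(1.0, "I visited Canada once.")], now=2.0))    # False
```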
Finally, to determine what information to show, we can look inward and consider the user’s response to information. I have performed an experiment on myself and others as follows: watch a conversation (for example, a podcast) and pay attention to any moment in which you hear something you don’t fully follow or understand. Others and I have experienced a distinct, repeatable feeling when a foreign word or concept arises. It’s as if there is a discrete shift in our mental state that happens when one goes from following along semantically to no longer following. There is a pulling back, going inward, and trying to figure out what’s going on. Psychologists refer to this type of event as a “gestalt shift”, and there are downstream effects in our physiology: P300 ERPs, EEG Shannon entropy deltas, discrete shifts in auditory attention entrainment, increased heart rate, decreased HRV, head and hand twitches, facial expressions, eye movements, etc. The brain and body can be sensed to understand our understanding, that is, to identify when we are following along and when we are confused. These types of implicit cues can serve as powerful inputs to tell a contextual search engine when to pull in new information. Interestingly, the consumer-facing BCIs of the future that sense these metrics will require significant scalp real estate and a head-mounted computer in order to operate. Smart glasses are the ideal candidate as the form factor for all-day, everyday BCIs.
Wearable Hardware
It’s Q2 2023 right now, and we still don’t have smart glasses with displays that can be worn all day, every day. I personally own an array of the world’s most advanced smart glasses, some of which are not even on the market yet, and none of these have hit the physical comfort, social comfort, and usability bar to be worn all day. However, as I discuss elsewhere, the optics technology is coming of age, and the hardware OEMs are finally realising the need to cut back on features and focus on comfort, such that I expect to see all-day, everyday smart glasses (with displays) hitting the consumer market around 2024. That is to say, this problem won’t be around for long, and so now is a good time to start building.
The smart glasses hardware industry has had a problem for quite some time. Hardware makers define overly ambitious feature requirements for their hardware: stereoscopic HD displays with cameras, WiFi, multiple microphones, speakers that replace your headphones, a full applications processor, etc. The reality of the optics, wireless communications, processor, and battery technology today is that these requirements need to be massively rolled back to create a pair of smart glasses that are light and small enough for consumer adoption. Monochrome, monocular, camera-less, BLE/UWB/HBC communications, microcontroller as processor, ultra-low power designs are what is required to hit the form factor requirements. Only a few companies actually take this to heart, with most spending millions to build a 150 gram brick that you can’t wear for more than 30 minutes (if the battery isn’t dead by then).
Fortunately, most of the immediately valuable and sought-after use cases that have real ROI in users’ lives don’t need immersive mixed reality (MR) glasses; they can run on light, slim, feature-light smart glasses. Contextual search engines, intelligent assistants, live translation, live captions, shopping assistants, notifications, etc. are all possible with this type of glasses. All they need is a display and a microphone.
The fact that the first wave of smart glasses will all use microcontrollers means that apps will have to run on a connected smartphone. This is the reason we’ve built the Smart Glasses Manager, a way to run apps on your phone that you see and interact with on your glasses, allowing developers to write one simple app that runs on any pair of smart glasses: https://github.com/TeamOpenSmartGlasses/SmartGlassesManager
Context Capture: Signals and Sensors
The signal-to-noise ratio (SNR) of wearable environmental sensors needs to be high in order to understand the context and use it to provide helpful information.
Today, we have very powerful ASR systems when they’re running on high-quality speech audio. However, the real world contains a lot of noise, and there is necessarily some distance between the worn sensor and the people the user is trying to transcribe. This noise and separation lead to poor signals. We will need our wearables to employ audio sensing technology that is enhanced for both user speech recognition and environmental speech recognition. Omnidirectional, far-field microphones are a start, but we’ll likely need microphone arrays in our wearables to sense audio of high enough quality to pump through our ASR models.
Some of the challenges in sensing have led me to develop a wearable microphone array (https://github.com/CaydenPierce/MSA) that eventually was integrated into a pair of smart glasses (https://github.com/TeamOpenSmartGlasses/OpenSourceSmartGlasses). However, these are prototypes, and we’ll need consumer-ready products that are physically and socially comfortable, run all day, and achieve high SNR in order to enable powerful contextual search engines.
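As a rough illustration of why a microphone array can raise SNR, here is a minimal delay-and-sum sketch run on simulated data. This is not the method used in the projects linked above; it is a textbook technique shown under simplifying assumptions: the per-microphone delays are known whole numbers of samples, the noise at each microphone is independent, and the "speech" is just a sine wave. All function names and parameters are hypothetical.

```python
import math
import random


def snr_db(clean, noisy):
    """SNR in dB of `noisy` measured against the known clean signal."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


def delay_and_sum(channels, delays):
    """Shift each channel back by its delay (in samples), then average."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]


def simulate(num_mics=4, n=2000, noise_std=1.0, seed=0):
    """Return (single-mic SNR, beamformed SNR) in dB for a simulated array."""
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * i / 40) for i in range(n)]
    delays = list(range(num_mics))  # assumed: mic k hears the source k samples late
    channels = [
        [(clean[i - d] if i >= d else 0.0) + rng.gauss(0, noise_std) for i in range(n)]
        for d in delays
    ]
    beamformed = delay_and_sum(channels, delays)
    reference = clean[: len(beamformed)]
    single = snr_db(reference, channels[0][: len(beamformed)])
    return single, snr_db(reference, beamformed)


single, array = simulate()
print(f"single mic: {single:.1f} dB, 4-mic delay-and-sum: {array:.1f} dB")
```

Because the aligned speech adds coherently while independent noise averages out, the theoretical gain for N microphones is 10·log10(N) dB, or about 6 dB for four. Real arrays fall short of this when noise is correlated across microphones or the delays are estimated imperfectly.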
Related Work and Where Does This Idea Come From?
“It would actually be quite useful if somebody would listen to your conversation, and say, oh that’s so and so actress, tell you what you’re talking about… that can just about be done today where we listen to your conversation, listen to what you’re saying, listen to what you’re missing and give you that information.”
- Ray Kurzweil
No one person or company can lay claim to the idea of the Contextual Search Engine. It’s a concept that has been growing and taking shape in research, science fiction, and the public zeitgeist for many years. This article and my own thinking aim to clarify, refine, and discuss the use case and how it might be achieved. This application has gone by many names, and probably will continue to do so, but I hope we can start to recognise it under one name, and Contextual Search Engine is a prime candidate, because that’s exactly what it is. For some excellent past work in this area, check out the Remembrance Agent project (among many others).
Feel free to look into the keyword “implicit interface” for more information. Another critical keyword is “Search and Discovery” — specifically the “discovery” part.
Shareability and Improving Face-to-Face Communication
The idea of a contextual search engine has largely been explored here through the lens of providing information directly to the user, and only to the user. However, the most transformative technologies have strong memetic tendencies — they spread because they don’t just help the user, they help the people around the user.
In a conversation, the participants become part of a system that is greater than the sum of its parts. Information technologies like a contextual search engine have the potential to improve that system, helping all interlocutors better understand each other and express themselves. A step in this direction is what I’ve termed a “Wearable Referencer”, which listens to what you’re saying, searches your personal knowledge base for your sources, and streams them to your conversation partner. This type of system could create a new layer of semantic exchange. It could allow us to spend less time proving facts and more time discussing their implications. Of course, I could just make up references or send you unrelated references, and you wouldn’t have time in the conversation to check them. But your wearable intelligent assistant will be representing you, sifting through the large amount of data I am streaming to you from my wearable intelligent assistant, fact checking and argument mining what is said. One can imagine a beam of light connecting my head to yours, a new cyborg data layer in our conversations, atop language, that helps us better express ourselves and understand each other.
Since your system knows what you know, it could constantly be translating what I say and my contextual search engine output into a representation that you understand. Maybe I learned about a concept from a mathematical view, and you learned it in a programming class, so we use different names and language to discuss the same topic — our intelligent assistants would recognise this and translate from one representation to another.
Conversations could rise above linear streams of language and instead begin to become semantic objects that are modifiable and queryable by interlocutors. A simple example would be a mind map representation of your conversation. Two conversation partners, both wearing smart glasses, could see a 3D mind map floating in the middle of the room. As they talk, concepts they mention could be added to the graph of the mind map. Each concept is represented by its own bubble, which grows and changes color as it becomes more or less important to the conversation. Maybe the graph is mapped to a public knowledge graph and shows “semantic holes”, areas that are closely related to your conversation but that you haven’t yet discussed. If you mention a bit of data, say how many smartphones were sold over the last 10 years, the corresponding chart is automatically loaded and connected as part of the graphical representation. You could then interface with that data in natural language with your conversation partner: manipulating it, comparing it to other data sources, normalising it for world population change, etc. This ability to turn speech into a semantic and programmatic representation could allow us to better hold the content of our conversations while pushing us to explore further and go deeper.
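The mind map described above can be sketched as a small data structure. This is only an illustration under heavy simplifications: concepts are matched as single words against a fixed vocabulary (a stand-in for real concept extraction), bubble size is just a concept’s share of all mentions, and the “public knowledge graph” is a plain dictionary. The class and method names are hypothetical.

```python
from collections import Counter
from itertools import combinations


class ConversationMap:
    """Toy mind map: concepts are nodes, co-mentions are edges."""

    def __init__(self, vocabulary):
        self.vocabulary = {word.lower() for word in vocabulary}
        self.mentions = Counter()  # concept -> number of utterances mentioning it
        self.links = Counter()     # (concept_a, concept_b) -> co-mention count

    def add_utterance(self, text):
        """Record every known concept in `text` and link co-mentioned pairs."""
        words = {w.strip(".,!?").lower() for w in text.split()}
        found = sorted(words & self.vocabulary)
        self.mentions.update(found)
        self.links.update(combinations(found, 2))
        return found

    def bubble_sizes(self):
        """Relative importance of each concept (shares sum to 1)."""
        total = sum(self.mentions.values())
        return {c: n / total for c, n in self.mentions.items()} if total else {}

    def semantic_holes(self, knowledge_graph):
        """Neighbours of discussed concepts that nobody has mentioned yet."""
        discussed = set(self.mentions)
        holes = set()
        for concept in discussed:
            holes |= set(knowledge_graph.get(concept, ())) - discussed
        return sorted(holes)


cmap = ConversationMap(["glasses", "battery", "optics"])
cmap.add_utterance("These glasses need better optics.")
cmap.add_utterance("The battery in my glasses died!")

print(cmap.bubble_sizes()["glasses"])  # 0.5
print(cmap.semantic_holes({"glasses": ["optics", "display"], "battery": ["charging"]}))
# ['charging', 'display']
```

A real version would need to decay old mentions over time so that bubbles can shrink as well as grow, which this sketch leaves out.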
Implementation: Continuous Inference Task
The implementation of these systems is an interesting continuous inference task. They require a new kind of operating system for our devices, one with an overarching intelligent arbiter that can retrieve data from any source. The engineering and science required to achieve this (natural language processing, wearable technology, human factors, biosensing, etc.) is vast and difficult. It’s likely that recent advancements in large language models and downstream named entity recognition, augmented retrieval, summarisation, etc. will serve as fundamental building blocks for the realisation of contextual search engines.
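To give a feel for what “continuous inference” means in practice, here is a deliberately tiny sketch of the loop: transcript chunks stream in, candidate terms are spotted, terms the user already knows or has already been shown are filtered out, and the remainder are looked up. It is a toy under stated assumptions, not a description of any real system: the glossary dictionary stands in for an arbitrary retrieval backend, exact token matching stands in for named entity recognition, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContextualSearch:
    glossary: dict                                  # stand-in for any retrieval source
    known_terms: set = field(default_factory=set)   # what the user already knows
    shown: set = field(default_factory=set)         # cards already displayed

    def process(self, transcript_chunk):
        """Return the cards to display for one chunk of live transcript."""
        cards = []
        for token in transcript_chunk.split():
            term = token.strip(".,!?\"'").upper()
            if term not in self.glossary:
                continue  # nothing to retrieve for this token
            if term in self.known_terms or term in self.shown:
                continue  # avoid telling the user what they already know or saw
            self.shown.add(term)
            cards.append(f"{term}: {self.glossary[term]}")
        return cards


engine = ContextualSearch(
    glossary={"MRR": "monthly recurring revenue", "Q2": "April through June"},
    known_terms={"Q2"},
)
print(engine.process("We need to double MRR by end of Q2, here's how."))
# ['MRR: monthly recurring revenue']
print(engine.process("MRR is the number that matters."))
# []
```

The two filters are the interesting part: most of the difficulty in a real system lies in deciding what not to show.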
It would require a much longer article to describe all the details of implementation, and much of that would change in a month or two, so I’ll leave that description for another post. I’m working with multiple institutions and teams to develop contextual search engine technologies — please reach out if you want to collaborate.
Conclusion
A contextual search engine is an always-available system that answers your questions before you ask, provides you with useful information exactly when you need it, enhances your memory, and deepens conversations by understanding your context and knowledge.
Appendix
- Nils Pihl on synchronous conversation — https://youtu.be/4xu4_NThoZ4?t=665
- The Remembrance Agent — https://link.springer.com/article/10.1007/BF01682024
- Emex Labs — Building Contextual Search Engines — https://emexwearables.com/
- The AR Show — Cayden Pierce Interview about Contextual Search Engines — https://www.thearshow.com/podcast/148-cayden-pierce