Transforming Notes by Tagging for Search

Ryan Julyan
Published in ILLUMINATION
Jul 17, 2024
Photo by Keila Hötzel on Unsplash

The Challenge of Tagging and Searching

Tagging is a common feature in document systems and is often sold as the backbone of digital organization. In theory, it's simple: assign labels to your files so you can find them easily. In practice, it's a nightmare. Inconsistent tags, human error, tags that are never applied at all, and differing terminology turn what should be a straightforward task into a Herculean challenge, one that ends up forgotten in the history books and treated as mythology.

Whenever I encounter a tag feature in a document management system (cough, cough, SharePoint, 🤫), it tends to be hidden away and overlooked.

Inefficient tagging and searching can cost businesses time, money, and sanity. Teams can spend hours sifting through files, often missing critical deadlines or redoing work that already exists. You might try various solutions, from rigid guidelines to expensive software, but nothing seems to work long-term.

I hypothesize that the root cause is the search, not the tagging itself. The searcher's frame of mind often differs from that of whoever set the tag, and tags often lack essential context, so the correct documents are not found.

Example:

Let's say I tag my document with Client. What if, when searching, I use the plural, Clients, or the word Customer instead? Basic tagging with basic exact-match search does not cater for this and will return nothing.
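
To make that failure concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the tag index, the hand-written `SYNONYMS` map, and the crude plural-stripping rule are not taken from any real document management system.

```python
# Hypothetical tag index: document name -> set of tags (illustrative data).
TAG_INDEX = {
    "proposal.docx": {"Client"},
    "invoice.pdf": {"Finance"},
}

# Assumed hand-written synonym map; a real system might use a thesaurus instead.
SYNONYMS = {"customer": "client"}


def basic_search(term):
    """Exact, case-sensitive tag match: the 'basic search' described above."""
    return sorted(doc for doc, tags in TAG_INDEX.items() if term in tags)


def normalize(term):
    """Lowercase, strip a trailing plural 's' (a crude rule), then map synonyms."""
    word = term.strip().lower()
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return SYNONYMS.get(word, word)


def forgiving_search(term):
    """Compare the normalized query against the normalized form of every tag."""
    wanted = normalize(term)
    return sorted(
        doc
        for doc, tags in TAG_INDEX.items()
        if wanted in {normalize(tag) for tag in tags}
    )
```

With this data, `basic_search("Clients")` and `basic_search("Customer")` both return an empty list, while `forgiving_search` finds `proposal.docx` for either query. The point is not the specific rules, which are far too naive for real use, but that the fix lives on the search side rather than in stricter tagging.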

A Taxonomy Triumph

Let's pause for a moment and consider Wikipedia. With millions of articles, it's a mammoth repository of human knowledge. How does it manage to organize such a vast amount of information effectively? The answer lies in its robust taxonomy and tagging system.

Wikipedia's taxonomy is meticulously structured, with categories and subcategories that ensure information is logically organized and easily searchable. However, this system has its challenges. Wikipedia relies heavily on human editors who must consistently apply tags and categories, a process fraught with potential errors and inconsistencies.

Despite these challenges, Wikipedia's model demonstrates the importance of a well-structured taxonomy. It shows that combining human oversight and automated tools can create an efficient system for managing vast amounts of information.

Enter NLP: The Game-Changer

Inspired by Wikipedia's example, I propose a somewhat unconventional solution to support better searching: Natural Language Processing (NLP). This isn't just a technological upgrade; it is a paradigm shift. NLP could understand and process human language, automating and standardizing tagging across a vast repository of digital assets.

This approach might be met with skepticism and doubt. Could a machine really understand the nuances of business and documents? While NLP isn't a magic bullet, it could help us treat tagging as a first-class citizen.

This shouldn't just be about keywords; NLP can capture the meaning behind the words and so provide more relevant and reliable search results.

Double Win

This relates to the feedback I got from the form in the previous post, meaning it could almost immediately provide value to a potential audience.

My Plan to Include Traditional NLP Thinking in the Meeting Notes App

Starting with a careful consideration of the flow could be the key to bringing this task/process to life. Here is my proposed flow:

  1. Categorize File Type/Document Type/Media Type etc.
  2. Convert a File/Document to Plain Text
  3. Extract Keywords from the Text
  4. Look for Specific/Provided Keywords (User Provided)
  5. Analyze Sentiment
  6. Topic Classifier/Categorization
  7. Get Sentiment by Topic/Category
  8. Extract Entities
  9. Extract Nouns
  10. Extract Verbs
  11. Get All Word Forms
  12. Get Synonyms
  13. Store all of this as metadata in a vector store for better searchability
  14. Extend with Custom Tags (Prompt/Encourage this tagging)
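
A rough sketch of how a few of these steps could fit together, using only the Python standard library. It covers steps 3, 4, 11, 12, 13, and 14 in a very naive form and skips sentiment, topics, and entities entirely. The stop-word list, the `SYNONYMS` map, the function names, and the in-memory `STORE` list (standing in for a real vector store) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tiny stop-word list and synonym map, for illustration only.
STOP_WORDS = {"the", "and", "for", "with", "our", "that", "this", "was"}
SYNONYMS = {"client": ["customer"], "meeting": ["call"]}

# Plain list standing in for a vector store (step 13); no embeddings here.
STORE = []


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text, top_n=5):
    """Step 3: most frequent non-stop-words, a naive stand-in for real extraction."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def find_provided(text, provided):
    """Step 4: which user-provided keywords actually occur in the text."""
    words = set(tokenize(text))
    return [k for k in provided if k.lower() in words]


def singular(word):
    """Crude plural stripping; real lemmatization would be more careful."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def search_terms(keyword):
    """Steps 11 and 12: singular and plural forms plus any known synonyms."""
    base = singular(keyword.lower())
    return {base, base + "s", *SYNONYMS.get(base, [])}


def index_document(name, text, provided=(), custom_tags=()):
    """Steps 13 and 14: build one metadata record and keep it in the store."""
    keywords = extract_keywords(text)
    matched = find_provided(text, provided)
    terms = set()
    for word in [*keywords, *matched, *custom_tags]:
        terms |= search_terms(word)
    record = {
        "name": name,
        "keywords": keywords,
        "provided_keywords": matched,
        "custom_tags": list(custom_tags),
        "search_terms": sorted(terms),
    }
    STORE.append(record)
    return record


def search(query):
    """Look the normalized query up in each record's expanded search terms."""
    term = singular(query.strip().lower())
    return [r["name"] for r in STORE if term in r["search_terms"]]
```

For example, after indexing a note whose text mentions "client" a few times, `search("Customers")` would return that note, because the plural is stripped and "customer" was stored as a synonym of "client" at indexing time. Expanding terms when the document is stored, rather than at query time, is just one possible design choice; a real vector store with embeddings would replace most of this.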

Future enhancements:

  • Describe images and then follow the above flow, including image understanding
  • Text mining from URLs
  • Deal with unstructured data types like Audio (Oh wait, already have some of that mechanism 😉)
  • Predict Intent
  • Longer Context of Each Aspect
  • Define interactions between Subjects and Nouns using Verbs to define better “Scope”

What do you think about enhancing the meeting notes with tags as a first-class citizen? Are there any features your current process lacks that would be a game changer for you?

Please share any feedback here: https://forms.gle/3yszPZ8Ayp6rdeyz6.
