Search Anything Model: Combining Vision and Natural Language in Search
In the current AI boom, one thing is certain: data is king.
Data is at the heart of the production and development of new models, and yet the processing and structuring required to get data into a form consumable by modern AI are often overlooked.
One of the most fundamental capabilities for facilitating this is search. Search is crucial to understanding data: the more ways you can search and group data, the more insights you can extract. The greater the insights, the more structured the data becomes.
Historically, search capabilities have been limited to uni-modal approaches: models used for images or videos in vision use cases have been distinct from those used for textual data in natural language processing. With GPT-4’s ability to process both images and text, we are only now starting to see the potential impacts of performant multi-modal models that span various forms of data.
Embracing the future of multi-modal data, we propose the Search Anything Model. This unified framework combines natural language, visual property, similarity, and metadata search in a single package. Leveraging computer vision processing, multi-modal embeddings, LLMs, and traditional search techniques, Search Anything supports multiple forms of structured data querying using natural language.
If you want to find all bright images with multiple cats that look similar to a particular reference image, Search Anything will match over multiple index types to retrieve data of the requisite form and conditions.
We have launched the Search Anything Model in Encord Active on Product Hunt. Check it out here.
What is Natural Language Search?
Natural Language Search (NLS) uses human-like language to query and retrieve information from databases, datasets, or documents. Unlike traditional keyword-based searches, NLS algorithms employ Natural Language Processing (NLP) techniques to understand the context, semantics, and intent behind user queries.
By interpreting the query’s meaning, NLS systems provide more accurate and relevant search results, mimicking how humans communicate. Computer vision calls for the same kind of general understanding of data content, without relying on pre-existing metadata for the visuals.
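Under the hood, systems like this typically rely on joint text-image embedding models such as CLIP, which map queries and images into a shared vector space so that semantic matching reduces to a distance computation. Below is a minimal sketch of that idea using the open-source CLIP model via Hugging Face transformers; it illustrates the general technique, not Encord’s exact implementation.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_images(query: str, image_paths: list[str]) -> list[tuple[str, float]]:
    """Rank images by cosine similarity to a natural language query."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Normalize embeddings so the dot product equals cosine similarity.
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    image_embs = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    scores = (image_embs @ text_emb.T).squeeze(-1)
    return sorted(zip(image_paths, scores.tolist()), key=lambda x: -x[1])
```

Calling rank_images("two cats on a sofa", paths) returns the paths sorted by how well they match the description, which is essentially what a free-form visual query resolves to.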
What Can You Use the Search Anything Model for?
Let’s dive into some examples of computer vision uses for the Search Anything Model.
Data Exploration
Search Anything simplifies data exploration by allowing users to ask questions in plain language and receive valuable insights.
Instead of manually formulating complex queries and algorithms that may require pre-existing metadata, you can pose questions such as:
“Which images are blurry?”
Or
“How is my model performing on images with multiple labels?”
Search Anything interprets these queries and quickly returns visualizations or summaries of the matching data.
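As a concrete example, a query like “Which images are blurry?” needs a quantitative notion of blurriness. A standard heuristic is the variance of the Laplacian: sharp images produce high-variance edge responses, blurry ones do not. Here is a minimal sketch with OpenCV; the threshold is dataset-dependent and purely illustrative.

```python
import cv2

def is_blurry(image_path: str, threshold: float = 100.0) -> bool:
    """Flag an image as blurry when its Laplacian variance is low."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return sharpness < threshold
```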
Data Curation
Search Anything streamlines data curation, making the process highly efficient and user-friendly. Filter, sort, or aggregate data using only natural language commands.
For example, you can request the following:
“Remove all the very bright images from my dataset”
Or
“Add an ‘unannotated’ tag to all the data that has not been annotated yet.”
Search Anything processes these commands, automatically performs the requested actions, and presents the curated data, all without complex coding or SQL queries.
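To make the mechanics concrete, here is a rough sketch of what executing those two commands could look like programmatically. The record layout and the brightness threshold are assumptions for illustration, not Encord’s internals.

```python
import numpy as np
from PIL import Image

def mean_brightness(image_path: str) -> float:
    """Mean pixel intensity in [0, 255] of the grayscale image."""
    return float(np.asarray(Image.open(image_path).convert("L")).mean())

def curate(dataset: list[dict], bright_threshold: float = 220.0) -> list[dict]:
    """Apply both example commands; each record is a hypothetical
    {"path": str, "labels": list, "tags": set} dict."""
    kept = []
    for record in dataset:
        # "Remove all the very bright images from my dataset"
        if mean_brightness(record["path"]) >= bright_threshold:
            continue
        # "Add an 'unannotated' tag to all the data that has not been annotated yet."
        if not record["labels"]:
            record["tags"].add("unannotated")
        kept.append(record)
    return kept
```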
Data Debugging
Search Anything expedites the process of identifying and resolving data issues.
To investigate anomalies or inconsistencies, ask questions or issue commands such as:
“Are there any missing values for the image difficulty quality metric?”
Or
“Find records that are labeled ‘cat’ but don’t look like a typical cat.”
Once again, Search Anything analyzes the data, detects discrepancies, and provides actionable insights to help you identify and rectify data problems efficiently.
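One plausible way to serve the second query is to compare each image labeled “cat” against a text prompt in a joint embedding space and surface the least similar images as suspects. A sketch, assuming CLIP-style embed_text and embed_image functions that return unit-norm vectors (as in the earlier sketch):

```python
import numpy as np

def find_suspect_labels(records, embed_image, embed_text, label="cat", k=10):
    """Return the k images whose embeddings are least similar to the label prompt."""
    prompt = embed_text(f"a photo of a {label}")  # unit-norm text embedding
    scored = []
    for rec in records:
        if label in rec["labels"]:
            # Dot product of unit-norm vectors == cosine similarity.
            sim = float(np.dot(embed_image(rec["path"]), prompt))
            scored.append((rec["path"], sim))
    return sorted(scored, key=lambda x: x[1])[:k]  # lowest similarity first
```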
Cataloging Data for E-commerce
Search Anything can also enhance the cataloging process for e-commerce platforms. By understanding both product photos and descriptions, Search Anything enables users to search and categorize products efficiently. For example, users can ask:
“Locate the green and sparkly shoes.”
Search Anything interprets this query, matches the desired criteria against the product images and descriptions, and displays the relevant products, improving product discovery and the customer experience.
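In practice, catalog search like this is often served from precomputed embeddings of both product images and descriptions. The sketch below illustrates the matching step; the catalog layout and the equal weighting of the two modalities are assumptions, not a prescribed design.

```python
import numpy as np

def search_catalog(query_emb, catalog, top_k=5, image_weight=0.5):
    """Rank products by blended similarity of the query to the image and text
    embeddings; `catalog` is a list of dicts with unit-norm 'image_emb' and
    'text_emb' vectors plus a 'name'."""
    results = []
    for product in catalog:
        score = (image_weight * float(np.dot(query_emb, product["image_emb"]))
                 + (1 - image_weight) * float(np.dot(query_emb, product["text_emb"])))
        results.append((product["name"], score))
    return sorted(results, key=lambda x: -x[1])[:top_k]
```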
How to Use the Search Anything Model with Encord
At Encord, we are building an end-to-end visual data engine for computer vision. Our latest release, Encord Active, empowers users to interact with visual data using only natural language.
Let’s dive into a few use cases:
Use Case 1: Data Exploration
User Query: “red dress,” “denim jeans,” and “black shirts”
Encord Active identifies the images in the dataset that most accurately correspond to each query.
Use Case 2: Data Curation
User Query: “Display the very bright images”
Encord Active displays filtered results from the dataset based on the specified criterion.
Use Case 3: Data Debugging
User Query: “Find all the non-singular images.”
Encord Active detects duplicated images and displays those that are not unique within the dataset.
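Duplicate detection of this kind is commonly built on perceptual hashing: visually similar images produce hashes within a small Hamming distance. A minimal sketch with the imagehash library, illustrating the general technique rather than Encord Active’s internal implementation:

```python
from PIL import Image
import imagehash

def find_near_duplicates(image_paths, max_distance=4):
    """Pair up images whose perceptual hashes fall within a Hamming distance."""
    hashes = {p: imagehash.phash(Image.open(p)) for p in image_paths}
    paths = list(hashes)
    duplicates = []
    for i, a in enumerate(paths):
        for b in paths[i + 1:]:
            if hashes[a] - hashes[b] <= max_distance:  # Hamming distance
                duplicates.append((a, b))
    return duplicates
```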
Can I Use My Own Model?
Yes, Encord Active allows you to leverage your own models. By fine-tuning or integrating custom embedding models, you can tailor the search capabilities to your specific needs, ensuring optimal performance and relevance.
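Conceptually, the integration point is an embedding interface: anything that can turn images and text into comparable vectors can back the search index. The sketch below is a hypothetical illustration of that pattern, not Encord Active’s actual API.

```python
from typing import Protocol
import numpy as np

class Embedder(Protocol):
    """Hypothetical interface a custom model would satisfy."""
    def embed_image(self, image_path: str) -> np.ndarray: ...
    def embed_text(self, query: str) -> np.ndarray: ...

class MyFineTunedEmbedder:
    """Wrap a fine-tuned model (with its own encode_image/encode_text
    methods, assumed here) so it satisfies the Embedder protocol."""
    def __init__(self, model):
        self.model = model

    def embed_image(self, image_path: str) -> np.ndarray:
        vec = self.model.encode_image(image_path)
        return vec / np.linalg.norm(vec)  # unit-norm for cosine search

    def embed_text(self, query: str) -> np.ndarray:
        vec = self.model.encode_text(query)
        return vec / np.linalg.norm(vec)
```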
Conclusion
Natural Language Search is revolutionizing the way we interact with data, enabling intuitive and efficient exploration, curation, and debugging.
By harnessing the power of NLP and computer vision models, our Search Anything Model allows you to pose queries, issue commands, and obtain actionable insights using human-like language. Whether you are an ML engineer, a data scientist, or an e-commerce professional, incorporating NLS into your workflow can significantly enhance productivity and unlock the full potential of your data.
Originally published at https://encord.com.