Using machine-learning intelligence to go beyond simple keyword search

Oct 18, 2017

Below is a post about how Cloudtenna uses machine learning, natural language processing, and an understanding of who is using which files to deliver far more effective file search than traditional desktop search.

It is a mess to manage files.
(1) We are creating more content than ever.
(2) That content is lost, scattered across more apps than ever.

Employees spend upwards of 20% of their time searching for or recreating documents that already exist (IDC). It is crazy to think that one full day a week of productivity is lost because people don’t know where their files are. Clearly, file search today has not evolved to match the pace of data growth.

Traditional file search sucks. The search built into Windows File Explorer or Apple Spotlight is not effective.

Search is the tool that makes browsing large datasets feasible. Google is able to index the entire internet and, more often than not, return the exact webpage you are looking for at the top of its search results. Why is searching your files so much more difficult? And with data growth exploding, search will only become more important.

What does traditional file search lack that intelligent search would help solve?

Search must encompass all of your files
Search is only as good as the data it indexes. Google indexes the entire internet. Traditional desktop search, on the other hand, only has access to the subset of files stored locally on your computer. For search to be effective, it must search all the places you store files.

Search must go beyond simple file name queries
It is unrealistic to expect you to remember every file name. Search must index file attributes such as file names and also dive into the full text of each document. That way, if the term you are looking for is buried in the content of a poorly named file, that file will still jump to the top of the search results.
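
To make that concrete, here is a minimal sketch of an index that covers both file names and document text. It is an illustration only, not Cloudtenna's implementation: the helper names (`tokenize`, `build_index`, `search`) and the sample files are made up for this example, and a real system would also rank the matches rather than just list them.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files):
    """Map each token to the set of files whose name or body contains it.

    `files` is a dict of {file_name: body_text} (hypothetical sample data).
    """
    index = defaultdict(set)
    for name, body in files.items():
        for token in tokenize(name) + tokenize(body):
            index[token].add(name)
    return index


def search(index, query):
    """Return the files that contain every query token, sorted by name."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


files = {
    "final_v2_REAL.docx": "Quarterly budget forecast for the marketing team",
    "budget_2017.xlsx": "Line items and totals",
    "notes.txt": "Lunch order for Friday",
}
index = build_index(files)
results = search(index, "budget")
```

A name-only search for "budget" would find just `budget_2017.xlsx`. Because the body text is indexed too, the poorly named `final_v2_REAL.docx` is returned as well.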

Search must include intelligence beyond keywords
Keywords inside a file are only part of a file’s DNA. A file is a living and breathing organism. Effective search needs to take into account additional search parameters beyond simple keywords.

(1) Natural language processing (NLP) expands a search to take similar words and phrases into account.

(2) File activity tracking records who touches a file, adding context about which project that file is associated with.

When Google searches the internet, it does more than just search for a word on a webpage. It recognizes that when the New York Times, which Google knows to be credible, links to a webpage, that webpage is likely to be credible as well. Google search results are driven as much by content as they are by who is reading and linking to a page.
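
The idea that credibility flows through links can be sketched with a simplified link-analysis loop in the spirit of PageRank. This is a toy model under stated assumptions, not Google's actual ranking: the function name `link_scores`, the damping value of 0.85, and the tiny link graph are all chosen for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its links.

    `links` maps a page to the list of pages it links to (hypothetical data).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


links = {
    "nytimes": ["article"],
    "blog_a": ["nytimes"],
    "blog_b": ["nytimes"],
    "blog_c": ["obscure"],
}
scores = link_scores(links)
```

Here `article` and `obscure` each have exactly one inbound link, but `article` ends up with the higher score because its link comes from `nytimes`, which is itself well linked.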

Cloudtenna brings this intelligence to an enterprise.

Cloudtenna’s cross-silo search searches on-prem repositories, cloud storage, and SaaS apps like Slack and Salesforce. It is an all-encompassing bird’s-eye view of all your work files. If Cloudtenna can’t find the file that you are looking for, it probably doesn’t exist.

Cloudtenna performs machine learning in near real-time to help sort through the mound of files inside your enterprise. It creates three core data graphs to determine file relevancy: the Data Graph, the User Graph, and the Corporate Policy Graph. The Data Graph includes information about the file for natural language processing. The User Graph generates a shadow organization chart to understand how files are being used in your organization. And the Corporate Policy Graph adds mission-critical file permissions and other compliance functions.

With Cloudtenna search, you can perform a simple search query for “Walmart” and the system will be smart enough to understand that files relating to “Sam’s Club” are relevant. It also knows that your boss recently touched the document “Sam’s Club Financial Forecast.xlsx” and will bubble that up to the top of the search results.
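
The Walmart example can be sketched as a query expansion step followed by an activity boost. Everything here is an assumption made for illustration: the hand-written `RELATED_TERMS` table stands in for what an NLP model would learn, the `close_colleagues` set stands in for the User Graph, and the scoring weights are arbitrary. It is not how Cloudtenna actually computes relevance.

```python
from dataclasses import dataclass, field

# Hypothetical related-terms table; a real system would learn these links.
RELATED_TERMS = {"walmart": {"sam's club"}}


@dataclass
class FileRecord:
    name: str
    text: str
    touched_by: set = field(default_factory=set)  # users who recently used the file


def expand(query):
    """Return the query term plus any related terms."""
    term = query.lower()
    return {term} | RELATED_TERMS.get(term, set())


def score(record, terms, close_colleagues):
    """Count matching terms, then boost files touched by close colleagues."""
    haystack = f"{record.name} {record.text}".lower()
    keyword_hits = sum(1 for term in terms if term in haystack)
    if keyword_hits == 0:
        return 0.0  # activity alone does not make a file relevant
    activity_boost = len(record.touched_by & close_colleagues)
    return keyword_hits + 2.0 * activity_boost  # the weight of 2.0 is arbitrary


def search(records, query, close_colleagues):
    """Return names of matching files, highest score first."""
    terms = expand(query)
    scored = [(score(r, terms, close_colleagues), r.name) for r in records]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked]


records = [
    FileRecord("Walmart Vendor Agreement.pdf", "Supplier terms for Walmart stores", {"alex"}),
    FileRecord("Sam's Club Financial Forecast.xlsx", "Projected revenue by quarter", {"boss"}),
    FileRecord("Team Offsite Agenda.docx", "Schedule and travel details", {"boss"}),
]
results = search(records, "Walmart", close_colleagues={"boss"})
```

In this sketch the forecast spreadsheet never mentions "Walmart", yet it ranks first: the expanded query matches "Sam's Club" and the boss's recent activity adds a boost. The offsite agenda was also touched by the boss but matches no term, so it is left out.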

In a previous post, I talked about how Cloudtenna uses Spark in-memory processing to enable this type of real-time data modeling, personalized for each individual user. You can read more about that here.

It’s time we reinvent file search. New forms of search will go beyond the simple keyword search used by traditional desktop search. Cloudtenna, and others, will begin to add context to your files. It’s important that we make this leap to truly intelligent search because we now deal with more files than we can easily organize, and those files are scattered across more repositories than we can manage. It’s a mess, but search can solve it. Google did it for the internet. Cloudtenna is doing it for your work files.


Written by Aaron Ganek, CEO, Cloudtenna