Making the Leap from Biased Taxonomies to Data-Driven Text Analytics

Modern computing architectures have trained us to think in taxonomies, using defined keywords to refine data searches. Knowingly or not, you’re probably using taxonomies in everyday life. A quick example: search engines require you to enter keywords to retrieve relevant data. As consumers of information, we leverage taxonomies for a simple reason: to make analysis efforts more focused and results more fruitful.

The same holds true in analytics, although the process is much more involved. At Taste Analytics, we’ve seen many complex taxonomy structures used within the walls of corporate America. We’ve seen small armies of analytics experts devote hundreds of hours to filling page after page of spreadsheets with logic strings to help them parse data.

Anyone tasked with managing taxonomies knows that this process comes with plenty of downside. Specifically, taxonomies are limited because they:

  • Require a lot of effort. Considerable time, resources, and expertise are needed to accurately capture the jargon used in certain types of conversations.
  • Need constant updating. Existing taxonomies need to be updated as language continually evolves, while new ones need to be created as business needs expand.
  • Are biased. A small group of people tasked with creating a taxonomy fights a never-ending battle to anticipate how an entire population will express itself in text.
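
To make the bias point concrete, here is a minimal sketch of a keyword taxonomy of the kind described above. The categories, keywords, and feedback messages are invented for illustration and do not come from any real taxonomy; the substring matching is an assumption chosen for brevity.

```python
# Hypothetical keyword taxonomy: category name -> trigger keywords.
# Categories, keywords, and feedback below are invented for illustration.
TAXONOMY = {
    "store layout": ["aisle", "layout", "signage"],
    "product quality": ["broken", "defective", "damaged"],
}


def tag(text):
    """Return every taxonomy category whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, keywords in TAXONOMY.items()
        if any(keyword in lowered for keyword in keywords)
    )


feedback = [
    "The aisle signage was confusing.",
    "My blender arrived defective.",
    "The cashier refused to honor my military discount.",
]

for message in feedback:
    # Messages that match no keyword fall through as UNMATCHED.
    print(tag(message) or "UNMATCHED", "<-", message)
```

The third message matches nothing because nobody thought to add a category for it, which is the kind of gap a hand-built taxonomy tends to leave.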

Alongside these challenges, a greater question arises: is a taxonomy-focused approach still viable when handling the explosion of information being created in the dynamic market of human-generated unstructured data?

At Taste Analytics, we believe that taxonomies are a valuable tool that will always have a place in an analysis regimen. That said, there’s an easier and more effective way to glean insight from the many forms of language used in daily communication. When it comes to the discovery process, using machine-led insights as your starting point gives a truer representation of what the data is trying to tell you, without bias!

Data-Driven Text Analysis

At Taste Analytics, the discovery process is completely data-driven. Algorithms and statistics crunch data objectively and automatically. Using deep machine learning algorithms, our engine statistically identifies the themes, topics, and emerging issues in any data set without using taxonomies. This gives the end user an unbiased starting point from which to identify patterns and trends in every data set our engine processes. In very simple terms, a three-step process takes place.

  1. Our platform classifies the data. We extract the “who”, “what”, “when”, and “where” from each piece of content and label it appropriately.
  2. Our platform categorizes the data. We break down the content into n-grams (e.g. bigrams, which are two-word phrases) and build categories of semantically similar conversations.
  3. Our platform makes results easy to interpret. Results start out in high-dimensional form and are converted into intuitive visualizations that end users can consume on mobile, web, or desktop workstations.
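
The n-gram idea in step 2 can be illustrated with a short sketch. This is not the Taste Analytics engine: it only counts how many documents share each bigram, a simple frequency stand-in for the semantic grouping described above. The stop-word list, function names, and sample feedback are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "my", "to", "at", "was", "is", "for", "i", "of"}


def bigrams(text):
    """Lowercase the text, drop stop words, and pair up the remaining neighbors."""
    tokens = [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOP_WORDS]
    return list(zip(tokens, tokens[1:]))


def top_bigrams(documents, n=3):
    """Count each bigram once per document and return the n most common."""
    counts = Counter()
    for doc in documents:
        counts.update(set(bigrams(doc)))
    return counts.most_common(n)


# Invented feedback messages, loosely modeled on the kind of text analyzed.
feedback = [
    "The store refused my military discount.",
    "Military discount was denied at checkout.",
    "Cashier would not apply the military discount.",
    "The serial number is too short for online registration.",
]

print(top_bigrams(feedback, n=1))
# → [(('military', 'discount'), 3)]
```

No category for "military discount" was defined in advance; the phrase surfaces because it recurs across documents, which is the sense in which the starting point comes from the data rather than from a taxonomy.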

The net result: our taxonomy-free, end-user-driven platform automatically provides a statistically relevant representation of what’s contained within a data set, augmenting the user’s ability to extract actionable intelligence.

Data-Driven Insights in Action

It’s tempting to apply a taxonomy to a data set to start the analysis process. Why wouldn’t it be? You’ve invested heavily in creating and maintaining your taxonomies. But resisting this urge can yield insights that you might otherwise have missed.

Some real-world examples:

A large home retailer used Taste technology to analyze customer feedback from the emails it receives. The company correctly assumed it would see complaints about store layout, certain product lines, and a campaign that didn’t go over well. What it didn’t expect the data to reveal was an ongoing problem with stores not properly honoring the military discount policy. Retail locations in certain geographic regions were denying the discount to qualified members of the armed services, causing much frustration among customers. Once this issue was identified, the company was able to quickly reeducate its employees on the proper military discount policy.

A large manufacturer used Taste technology to analyze feedback captured by its customer service representatives. The company correctly assumed that certain product issues were frequently discussed. What it didn’t expect the data to reveal was an ongoing issue with registering products online. Products were stamped with 10-digit serial numbers, yet online registration required customers to enter a 15-digit number. Once this problem was identified, the company was able to quickly update its website to accept serial numbers as printed on its products.

By eliminating the inherent bias of taxonomies, our end users were quickly able to find pain points that weren’t expected and remedy them before they grew into larger issues.

The Taste Takeaway

Taxonomies have played, and will continue to play, an important role in analytics, but by choosing to lead with data-driven insights:

  • Analytical efforts will be based on an objective foundation
  • Time and resources to maintain complex taxonomies will greatly diminish
  • Focus can be directed on analyzing data rather than structuring it

For true discovery to occur, technologies grounded in data-driven methodologies offer the end user a superior alternative.
