Uncovering the Value of Unstructured Data for Your Business
TAKEAWAY:
Most companies make business decisions based on structured data, which represent only a fraction of the total amount of available data. With simple, creative approaches, any company can start getting business value out of unstructured data.
Before exploring how unstructured data can help businesses, let’s define it relative to structured data.
Structured data refers to data that have a high level of organization, that is, structure. Data that fall into this category are usually easy to analyze through a set of parameters. Unstructured data, in contrast, are harder to analyze because they lack the necessary level of organization.
Canvas.ly — an imaginary, but totally realistic, case study
To further illustrate, let’s imagine Canvas.ly, a fictional e-commerce site where you can upload your favorite pictures, then select sizes and types of frames to have them printed and shipped to you.
Canvas.ly has in its database all orders made online since its inception. To help grow the business, Canvas.ly’s digital marketing manager is exploring better online advertising targeting, and wants a list of all orders matching the following parameters: male customers aged 30–40 who placed orders from an Android smartphone in New York City. This is a simple database query, because the orders are structured data in Canvas.ly’s database.
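As a minimal sketch of that request, assume a hypothetical orders table. The column names (gender, age, device, city) and the sample rows below are invented for illustration and are not Canvas.ly's actual schema. In Python with an in-memory SQLite database, the query could look like this:

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           gender TEXT, age INTEGER, device TEXT, city TEXT, amount REAL)"""
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "male", 34, "android", "New York City", 79.0),
        (2, "female", 31, "android", "New York City", 120.0),
        (3, "male", 45, "ios", "Boston", 60.0),
        (4, "male", 38, "android", "New York City", 210.0),
    ],
)


def matching_orders(conn):
    """Return the IDs of orders matching the marketing manager's parameters."""
    rows = conn.execute(
        """SELECT order_id FROM orders
           WHERE gender = 'male'
             AND age BETWEEN 30 AND 40
             AND device = 'android'
             AND city = 'New York City'
           ORDER BY order_id"""
    ).fetchall()
    return [row[0] for row in rows]


print(matching_orders(conn))
```

Every condition maps directly onto a column, which is what makes the data structured: no interpretation is needed before filtering.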
Machine learning can, of course, be applied to structured data. One could create a model that automatically segments buyers into clusters that aren’t obvious from the orders database. Such a model could reveal a correlation between high-value orders and users matching certain criteria. Most such correlations can, however, be uncovered with standard analytics, without requiring machine learning.
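To illustrate the kind of standard analytics meant here, the sketch below groups orders by segment and compares average order value, with no machine learning involved. The field names (device, age_band, amount) and the figures are hypothetical, chosen only to show the idea:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical order records; fields and amounts are invented.
orders = [
    {"device": "android", "age_band": "30-40", "amount": 200.0},
    {"device": "android", "age_band": "30-40", "amount": 100.0},
    {"device": "ios", "age_band": "30-40", "amount": 60.0},
    {"device": "ios", "age_band": "20-30", "amount": 40.0},
    {"device": "android", "age_band": "20-30", "amount": 50.0},
]


def average_order_value(orders, keys):
    """Group orders by the given fields and average the amount per group."""
    groups = defaultdict(list)
    for order in orders:
        segment = tuple(order[key] for key in keys)
        groups[segment].append(order["amount"])
    return {segment: mean(amounts) for segment, amounts in groups.items()}


by_segment = average_order_value(orders, ("device", "age_band"))
# List segments from highest to lowest average order value.
for segment, value in sorted(by_segment.items(), key=lambda item: -item[1]):
    print(segment, value)
```

A plain group-and-average like this already surfaces which segments place higher-value orders. A clustering model becomes worthwhile mainly when the relevant segments are not known in advance.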
Unstructured data — finding meaning where there was none
Canvas.ly has a great deal of unstructured data at its disposal. All e-mails received by the customer service department, for instance, are pieces of unstructured data. Although e-mails have some level of organization, such as sender or order number, it is very hard to sort through all of them at once in order to answer questions that the business may ask. Like e-mails, all social media content is available as unstructured data. One approach to such text is to turn it into structured data using Google’s Cloud Natural Language API.
Another important example of unstructured data for Canvas.ly is the collection of images that users have submitted with their orders. They are all saved in Canvas.ly’s database and they have attributes associated with them like user IDs, dates, order amounts, etc. But the images themselves are just pixels that are not simple to query. The images uploaded by users are very well organized, but still unstructured.
As a matter of fact, most of Canvas.ly’s accumulated data are unstructured. But ironically, the business will nonetheless make the majority of its decisions based on the easily searchable structured data, while leaving this potentially valuable data on the cutting room floor. Canvas.ly spends a sizable amount of money every year to target potential customers through online ads. Thanks to great analytics work, they have a sense of when and where to invest more to get better results. But what if they could do better?
How machine learning can turn unstructured data into a potential goldmine
So let’s say Canvas.ly decides to take a serious look at the images submitted by users in order to see if they could uncover insights to better target potential customers online. As pixels, the imagery would be of little help. But machine learning can make sense of all those pixels and bring back valuable information in the form of labels.
As an example, simply by running the Google Cloud Vision API on its images, Canvas.ly can obtain labels describing what each picture contains, and these labels can be added to the structured database.
This extra information gives the business many more dimensions in its structured data for richer analytics. As a consequence, it will be better able to find out where and when to invest more of its advertising budget.
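The sketch below shows one way labels could become a new, queryable dimension. It does not call the Vision API. Instead, it assumes each image has already produced a list of (description, score) pairs, which mirrors the kind of output label detection returns. The table names, the sample labels, and the 0.8 confidence threshold are all hypothetical:

```python
import sqlite3


def store_labels(conn, order_id, labels, min_score=0.8):
    """Insert one row per label whose score meets the (assumed) threshold."""
    kept = [(order_id, desc) for desc, score in labels if score >= min_score]
    conn.executemany("INSERT INTO image_labels VALUES (?, ?)", kept)
    return len(kept)


def revenue_by_label(conn):
    """Join labels back to orders and total the order amounts per label."""
    rows = conn.execute(
        """SELECT l.label, SUM(o.amount)
           FROM image_labels AS l
           JOIN orders AS o USING (order_id)
           GROUP BY l.label
           ORDER BY l.label"""
    ).fetchall()
    return dict(rows)


# Hypothetical tables and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, month TEXT, amount REAL)"
)
conn.execute("CREATE TABLE image_labels (order_id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "May", 250.0), (2, "May", 80.0), (3, "June", 90.0)],
)

# Stand-in for label-detection results: (description, score) pairs per order.
detected = {
    1: [("wedding", 0.97), ("dress", 0.91), ("tree", 0.40)],
    2: [("dog", 0.95)],
    3: [("wedding", 0.88), ("beach", 0.85)],
}
for order_id, labels in detected.items():
    store_labels(conn, order_id, labels)

print(revenue_by_label(conn))
```

Once the labels sit in an ordinary table, they can be filtered and aggregated like any other column, for example alongside the month or the customer segment.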
Going beyond the Vision API, Canvas.ly can build specific object identification models using TensorFlow in order to identify higher-level abstractions contained in the image database. As an example, it may find out that wedding pictures sell exceptionally well in May across the U.S. and that users who can pay a higher price usually upload pictures where the brides are wearing dresses with certain features. The generic Vision API will only get Canvas.ly so far in determining whether a new picture contains dresses with such details, but the approach can be extended using custom models that localize dresses and look for patterns that reveal a high-income user.
Building custom machine learning models is time-consuming, but the effort may pay off as a highly competitive business advantage. The right place to start, however, is not in technology, but in business hypotheses that you want to test. Your once-ignored unstructured data may be a great source of insights.
________
CI&T helps Fortune 1000 companies transform unstructured data into value. We are an award-winning Google Cloud Premier Partner and the first to be included in Google’s machine learning specialization program. Contact us for more info on how we can help your company.