Jeff Jonas
Feb 20, 2018

When asked about unstructured data, this is all I have to say:

“Unstructured data is only useful if structure can be extracted from it.”

Let me explain. A picture taken in pitch darkness without a flash is useless because it contains no discernible features. Likewise, a mobile phone call that suddenly goes bonkers and becomes garbled is useless because there is no way to extract meaning from the noise.

On the other hand, a parking garage video has the potential to be much more useful, because license plate recognition software can extract plate numbers from it. Combine each plate number with the latitude/longitude and date/time of the capture (metadata), and it becomes a truly useful observation.

The principle that observations are only useful if features can be extracted from them has helped me simplify system architectures:

Observe -> Feature Extract -> Contextualize -> Decide -> Act

When an observation arrives pre-structured (e.g., a database transaction), the Feature Extract step is skipped. Because every input to Contextualize is then structured, Contextualization can be streamlined and made indifferent to whether the original observation was structured or unstructured.
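A minimal Python sketch of the Observe -> Feature Extract -> Contextualize -> Decide -> Act pipeline, assuming each step can be modeled as a function over a dict of features. All names (`Observation`, `process`, `fake_plate_reader`), the "flag anything seen before" decision rule, and the sample coordinates are hypothetical. What it tries to show: a pre-structured observation skips the extractor, and `contextualize` treats both paths identically because it only ever sees structured features.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Features = Dict[str, Any]


@dataclass
class Observation:
    payload: Any  # raw content (e.g. a video frame) or a dict if pre-structured
    structured: bool  # True when it arrives pre-structured, e.g. a database transaction
    metadata: Features = field(default_factory=dict)  # e.g. lat/long, date/time


def feature_extract(obs: Observation, extractor: Callable[[Any], Features]) -> Features:
    # Pre-structured observations skip the extractor; everything else goes
    # through one (OCR, plate reader, ...). Metadata is merged either way.
    features = dict(obs.payload) if obs.structured else extractor(obs.payload)
    return {**obs.metadata, **features}


def contextualize(features: Features, history: List[Features], key: str) -> List[Features]:
    # Sees only structured features, so it never needs to know whether the
    # original observation was a database row or a video frame.
    matches = [h for h in history if key in features and h.get(key) == features[key]]
    history.append(features)
    return matches


def decide(matches: List[Features]) -> str:
    # Hypothetical rule: flag anything that connects to a prior observation.
    return "flag" if matches else "ignore"


def act(decision: str, features: Features) -> None:
    if decision == "flag":
        print(f"seen before: {features}")


def process(obs: Observation, extractor: Callable[[Any], Features],
            history: List[Features], key: str = "plate") -> str:
    # Observe -> Feature Extract -> Contextualize -> Decide -> Act
    features = feature_extract(obs, extractor)
    decision = decide(contextualize(features, history, key))
    act(decision, features)
    return decision


def fake_plate_reader(frame: str) -> Features:
    # Stand-in for real license plate recognition: assumes the plate text is
    # the last token of a toy "frame" string.
    return {"plate": frame.split()[-1]}


if __name__ == "__main__":
    history: List[Features] = []
    video = Observation("garage frame showing ABC1234", structured=False,
                        metadata={"lat": 47.6, "lon": -122.3, "time": "2018-02-20T08:15"})
    payment = Observation({"plate": "ABC1234", "amount": 12.50}, structured=True,
                          metadata={"time": "2018-02-20T17:40"})
    print(process(video, fake_plate_reader, history))    # unstructured path
    print(process(payment, fake_plate_reader, history))  # Feature Extract skipped
```

Run as a script, the video frame finds nothing to connect to, while the later payment record, which never touched the plate reader, links to it through the shared `plate` feature.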

Some common feature extraction algorithms you may have heard of:

Optical character recognition, e.g., converting a picture of words into a text document

Object recognition, e.g., detecting pictures of cats

Facial recognition, e.g., unlocking an iPhone X without a password

Acoustic fingerprinting, e.g., identifying an artist/song from a small audio sample

Named entity recognition, e.g., suggesting a new contact based on an email's contents
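Named entity recognition is usually done with trained models, so the Python sketch below is not NER. It is a crude rule-based stand-in (hypothetical regular expressions and a hypothetical `suggest_contact` function) that only illustrates the shape of the step: free-form email text in, structured contact features out.

```python
import re
from typing import Dict, List

# Toy patterns; a real named entity recognizer would use a trained model.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
# A capitalized name of two or more words on the line right after a sign-off.
SIGNOFF_NAME_RE = re.compile(
    r"^(?:Thanks|Regards|Best|Cheers),?[ \t]*\n([A-Z][a-z]+(?: [A-Z][a-z]+)+)[ \t]*$",
    re.MULTILINE,
)


def suggest_contact(body: str) -> Dict[str, List[str]]:
    """Pull structured contact features out of free-form email text."""
    return {
        "names": SIGNOFF_NAME_RE.findall(body),
        "emails": EMAIL_RE.findall(body),
        "phones": PHONE_RE.findall(body),
    }


if __name__ == "__main__":
    email = (
        "Hi Jeff,\n"
        "Great meeting you. Reach me at jane.doe@example.com or +1 555 010 2000.\n\n"
        "Thanks,\n"
        "Jane Doe\n"
    )
    print(suggest_contact(email))
```

Its brittleness is instructive: a sign-off the patterns don't list, or an accented name such as "José Díaz", yields no name at all.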

Unfortunately, commercially available feature extraction technology has a long way to go. Error rates are often just too high, and every extraction error flows into downstream processes (e.g., Entity Resolution), which become the victim. Breakthroughs in unstructured feature extraction are much needed. I keep waiting. Come on already.
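A back-of-the-envelope Python sketch of why "often just too high" hurts downstream. The numbers are hypothetical, not measured rates of any product: a 7-character plate, a reader with 99% per-character accuracy, and independent character errors (a simplification). Under those assumptions a whole plate is read correctly with probability 0.99^7 ≈ 0.932, so roughly 6.8% of reads carry at least one wrong character, and each of those can fail to match the right vehicle, or match the wrong one, during Entity Resolution.

```python
def p_read_correct(per_char_accuracy: float, length: int) -> float:
    # Probability every character is right, assuming independent errors.
    return per_char_accuracy ** length


def expected_bad_reads(n_reads: int, per_char_accuracy: float, length: int) -> float:
    # Expected number of reads with at least one wrong character.
    return n_reads * (1 - p_read_correct(per_char_accuracy, length))


if __name__ == "__main__":
    for acc in (0.99, 0.999):
        bad = expected_bad_reads(10_000, acc, length=7)
        print(f"per-char accuracy {acc}: ~{bad:.0f} of 10,000 plate reads wrong")
```

Even a tenfold improvement in per-character accuracy still leaves dozens of corrupted reads per 10,000 feeding into resolution.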

JeffJonas

Senzing founder and CEO. Privacy by Design software. Ironman triathlons in my spare time.