Your AI is only as good as its Data

Marc Canter
Published in AI Blogging
3 min read · Dec 18, 2016

It’s kind of an obvious statement, but it’s one that has to be made.

You see, I’m an entrepreneur building an AI system right now, and we’re finding ourselves building all our own data, ’cause our system wouldn’t work without it. But who wants to do THAT?

All AI systems are based upon a large (and in some cases VERY large) corpus of work, ranging from the complete works of Shakespeare to every comment ever left on Facebook.

Data is the heart and soul of AI — so why do I even need to write this article?

I just got off the phone with my new partner Steve DiPaola today, with us both assuming that we’d soon be leveraging a nascent industry of AI “ready” databases of labeled ontologies around every subject matter and topic imaginable.

These databases will be readily available for rent at something like $0.0000001 per transaction, which would enable an entirely new generation of “smart software” (which just happens to be what Steve and I are up to).

Steve and I have no intention of building a business like this — so this article is a plea to entrepreneurs around the world to build these beasts (in every language), so Steve and I can leverage the hell out of them!

By leveraging, I mean that we’ll feed this data to neural networks, train our software in the nuances and usage of this data as it applies to [XYZ], and produce extremely fluid systems that are dynamic and provide a compelling experience for our users.

I’m talking ’bout every food dish that exists, including its ingredients, recipes, origins, history and variants (per culture).

Yes, I know that escalivada is an awesome Spanish dish made of red peppers, tomatoes and [ABC], but what is its relationship to ratatouille?
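
To make that question concrete, here is a minimal Python sketch of what one of these labeled databases might hand back. Everything in it is an assumption for illustration: the `Dish` record, the `DISHES` table and the `relationship` helper are hypothetical names, and the ingredient lists are a hand-typed, deliberately incomplete toy sample, not a real dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dish:
    """One labeled entry in a hypothetical food ontology."""
    name: str
    cuisine: str
    ingredients: frozenset


# Toy, hand-typed entries (assumption: a real database would be far richer
# and would also carry recipes, history and per-culture variants).
DISHES = {
    "escalivada": Dish("escalivada", "Spanish",
                       frozenset({"peppers", "tomatoes"})),
    "ratatouille": Dish("ratatouille", "French",
                        frozenset({"peppers", "tomatoes", "eggplant",
                                   "zucchini", "onions"})),
}


def relationship(a: str, b: str) -> dict:
    """Describe how two dishes relate, using only their labels."""
    x, y = DISHES[a], DISHES[b]
    shared = x.ingredients & y.ingredients
    combined = x.ingredients | y.ingredients
    return {
        "shared_ingredients": sorted(shared),
        # Jaccard overlap: shared labels divided by all labels seen.
        "overlap": len(shared) / len(combined),
        "same_cuisine": x.cuisine == y.cuisine,
    }


result = relationship("escalivada", "ratatouille")
print(result)
```

The point is not the set arithmetic, which is trivial. The point is that the answer is only as good as the labels somebody typed in, which is exactly the data I don't want to build myself.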

I’m talking ’bout every movie, song, band, piece of artwork, dance and performance ensemble that exists today or has ever existed.

Name for me the various groundbreaking cinematic and sound techniques utilized in Citizen Kane. And while we’re at it: “Who was the writer who REALLY wrote Citizen Kane, and whose grandson is now the host of TCM?”

I’m talking ’bout sports: every sport, every player, every team, every league, across all nations, every kind of statistic and style of play.

Who was Walter “Sweetness” Payton and who did he play for?

Turning real-world knowledge into machine-readable features, which can then be fed to neural networks that make AI systems smarter over time, is the holy grail of AI. They call it “feature engineering.”
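
For anyone who hasn't seen feature engineering up close, here is a rough sketch of its simplest form: turning labeled records into rows of numbers a neural network can consume. It is one common technique (multi-hot encoding), not a description of our actual pipeline, and the record layout, the placeholder dish names and the function names `build_vocabulary` and `to_feature_vector` are all assumptions made up for this example.

```python
def build_vocabulary(records):
    """Collect every distinct ingredient label, sorted for a stable column order."""
    return sorted({label for r in records for label in r["ingredients"]})


def to_feature_vector(record, vocabulary):
    """Multi-hot encode one record: 1.0 if the label is present, else 0.0."""
    present = set(record["ingredients"])
    return [1.0 if label in present else 0.0 for label in vocabulary]


# Placeholder records (hypothetical, not real dishes or real data).
records = [
    {"name": "dish_a", "ingredients": ["peppers", "tomatoes"]},
    {"name": "dish_b", "ingredients": ["tomatoes", "onions", "eggplant"]},
]

vocabulary = build_vocabulary(records)
vectors = [to_feature_vector(r, vocabulary) for r in records]

print(vocabulary)  # column order of every vector
for record, vector in zip(records, vectors):
    print(record["name"], vector)
```

Each row has one column per known label, so every dish becomes a fixed-length vector regardless of how many ingredients it has. Better labels mean better columns, and better columns mean a network that has something worth learning from.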

That’s why we need data on all knowledge, properly labeled and readily available via APIs: dirt cheap and distributed around the world, with no latency or delay.

Please, world: can you have this data competitively priced, with multiple players and vendors in each product category? And hurry up, ’cause we need this data sometime in Q1 2017.

And you can hold the mayo, on rye with extra pickles and onions. Please.

Toasted.

With a schmear of Chutzpah and Mensch — to go.
