New kind of AI Data play — please

Marc Canter
Published in AI Blogging
Jan 1, 2017

I’m going to reiterate an idea I stated in my previous article — “Your AI is only as good as its Data.” There’s a new kind of AI Data play that needs to happen.

This Data Play would be a SaaS database covering a wide range of knowledge and information, accessible for something like $0.00001 per transaction, and used in all sorts of AI applications and services.
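To get a feel for that price point, here is a rough back-of-the-envelope sketch. The $0.00001 rate is the figure mentioned above; the monthly call volumes and the `monthly_cost` helper are hypothetical, picked only for illustration.

```python
from decimal import Decimal

# Price per transaction, taken from the figure mentioned above.
PRICE_PER_TRANSACTION = Decimal("0.00001")


def monthly_cost(transactions_per_month: int) -> Decimal:
    """Return the monthly bill in dollars for a given number of transactions."""
    return PRICE_PER_TRANSACTION * transactions_per_month


# Hypothetical usage levels, purely illustrative.
for volume in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"{volume:>14,} transactions -> ${monthly_cost(volume):,.2f}")
```

At that rate a million lookups would run about ten dollars, which is the kind of "dirt cheap" that lets a small startup experiment without a data budget in the hundreds of thousands.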

I’m writing this reiteration because (surprise, surprise) Robin Bordoli includes this new kind of Data play as one of his 7 predictions for 2017.

Prediction #4: Budgets for Training Data explode

For too long corporate IT budgets have had 2 main line items: people and technology. So they have budgeted for employees and contractors and software. But given the overwhelming evidence of the value of business data (Uber, Facebook, Alibaba, and Airbnb have each dominated the transportation, media, retail, and lodging categories respectively by owning and curating data rather than physical assets) it’s time that there is a 3rd category: data.

Specifically within the realm of machine learning, CIOs need to budget for training data. Buying a machine learning cloud service without budgeting for training data is like buying a car without budgeting for gas. You’ve just purchased an asset with no value. In economic terms, training data is a complement to machine learning. With the cost of machine learning set to fall (see prediction #3 above), then economics tells us that the value of the training data will rise. When something increases its value it requires management attention. Budgeting is the highest form of management attention possible within a corporate environment. If you don’t budget for it and allocate capital, then it will be ignored.

So 2017 will be the year smart CIOs build budgets to automate business processes which have 3 categories: people, technology, and data.

BigData image by Spencer Levine — http://spencerlevine.co.nz/big-data/

I came upon this same assumption myself as I looked at the budget for our current startup, which is building a Deep Learning Conversation engine.

To create anything resembling a compelling experience, we’re going to have to invest hundreds of thousands of dollars in data entry, data wrangling, data massaging, data science and data just-about-everything-else.

Our databases will feed our deep learning engines and provide the foundation for unique experiences for our users. Like I said — your AI is only as good as its Data.

We’ve discovered all sorts of existing databases that we can beg, borrow or steal, but this is just the tip of the iceberg. Academic databases cannot be relied upon. Partially scraped, hacked together and connected by chewing gum and kite string just ain’t gonna cut it.

What we need are industrial-strength, four-nines (.9999) databases, with solid API endpoints, designed to scale.
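Reading ".9999" as an availability target of 99.99% (that reading is an assumption, not something spelled out here), a quick sketch shows how little downtime such a database is allowed per year. The `allowed_downtime_minutes` helper is a hypothetical name used only for this illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Compare three, four and five nines.
for availability in (0.999, 0.9999, 0.99999):
    minutes = allowed_downtime_minutes(availability)
    print(f"{availability:.3%} available -> about {minutes:.1f} minutes down per year")
```

Four nines works out to under an hour of downtime a year, which is a very different bar from a hacked-together academic dataset.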

For AI to succeed (and certainly everyone believes that now) we’re all gonna need a whole bunch of these “off-the-shelf” databases of vernacular language, mainstream content data, corpora of every kind of knowledge there is, data specifically designed for learning, Statistical Data and Big Data, in just about every way, shape and size.

So my tip to those entrepreneurs out there contemplating the next big Unicorn? Give us access to, and usage of, these kinds of databases “dirt cheap” and we’ll all come running and sign up.

I know we will.
