Proprietary Data or New Math

Twitter had a bunch of chatter today about AI after a fantastic post by Shivon Zilis titled “The Current State of Machine Intelligence 2.0”. She did a great report on “Big Data” a few years ago that brought much-needed order to an industry that was in chaos. Her report on MI (her label for an industry that includes AI & ML) is another fantastic effort to sort out the many emerging products built on these technologies.

The talk post-report is that for these companies to be successful, they will need either proprietary data or some tech advantage, the argument being that Goog, FB, and Amazon have vastly superior troves of data. I’m not sure I totally agree, because A) most companies are doing an extremely poor job of collecting data, and B) as we continue to digitize products and services, every company that survives will have a wealth of data driving its business. It’s also tough to create new math, especially when the current algorithms aren’t being fully utilized, so I don’t know that new math needs to be part of a competitive advantage. Simply creating UIs for ONE use case that allow companies to operationalize ML/AI tech could be immensely valuable to the private sector.

But one area that isn’t being discussed enough is the application of existing machine learning algorithms. A data scientist can’t simply pick a clustering algorithm when they want to cluster a customer base. They need to know the limitations of each algorithm, how those limitations affect the data set at hand, and why one is more appropriate than another. For this, we need data scientists to get real-world experience. Unfortunately, right now there isn’t enough open data for training and educating these data scientists. So instead of just hiring ops research, comp sci, or data sci talent, you need to hire EXPERIENCED talent in order for your company to get off the ground.
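To make the clustering point concrete, here is a minimal sketch of what "knowing the limitations" can look like. Everything in it is my own illustration, not something from the report: the toy data is two concentric rings (a stand-in for any data set whose groups aren't blob-shaped), and the helper names `make_rings`, `kmeans`, `link_neighbors`, and `agreement` are hypothetical. It compares plain k-means against a simple neighbor-linking approach that joins any points closer than an assumed threshold `eps`.

```python
import math


def make_rings(n_per_ring=40, radii=(1.0, 5.0)):
    """Toy data: two concentric rings. Returns (points, true_labels)."""
    points, labels = [], []
    for ring_id, r in enumerate(radii):
        for i in range(n_per_ring):
            theta = 2 * math.pi * i / n_per_ring
            points.append((r * math.cos(theta), r * math.sin(theta)))
            labels.append(ring_id)
    return points, labels


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = list(init_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment


def link_neighbors(points, eps):
    """Label connected components of the graph joining points closer than eps."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) < eps:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def agreement(labels, truth):
    """Fraction matching truth, under the better of the two possible namings."""
    same = sum(a == b for a, b in zip(labels, truth)) / len(truth)
    return max(same, 1 - same)


points, truth = make_rings()
# Start k-means with one centroid on each ring (index 40 is the first outer point).
km = kmeans(points, init_centroids=[points[0], points[40]])
linked = link_neighbors(points, eps=1.0)  # eps is an assumed threshold

print("k-means agreement:", agreement(km, truth))
print("neighbor-linking agreement:", agreement(linked, truth))
```

On this toy data the neighbor-linking approach recovers both rings exactly, while k-means cannot: with two centroids it always splits the plane with a straight line, and no straight line separates a ring from the ring around it. Neither algorithm is "better" in general. Neighbor-linking needs a sensible `eps` and can chain unrelated groups together on noisy data, which is exactly the kind of trade-off that experience teaches you to spot.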

Shift business models to those that produce data: analog to digital, as Fred Wilson said on his blog today. If you look at the top 5 rated TV series on Rotten Tomatoes, 3 are on Netflix and 2 are on Amazon Instant Video. Why? Because they are able to collect data on viewers. They can better decide what the consumer wants using proprietary data than TV networks can using their gut and focus groups. Again, I can’t understand why networks aren’t shifting to the web as if their lives depended on it (because they do).

The other area that I am interested to see play out is anonymized open datasets. If the “little guys” are going to compete against Goog and FB, then they probably want to create a consortium that builds a massive data platform that each member can build apps on top of. Everyone gets to share the data. Success depends on who uses it most effectively. It will probably take a few years, as execs will need to see that their time is up before making a move, but it is something that would be great for all businesses.

Regardless, I am PSYCHED to see AI/ML being talked about as an area that could break out in 2016. We are sooooooooooooo far behind the curve in adopting these technologies, and I think this is the year that we will see companies that have adopted data-driven decision-making pull away from legacy competitors. Cable and media could see their mobile counterparts rapidly take eyeballs away, and retailers will see e-commerce continue to steal market share. Legacy companies need to move FAST if they want to be around in a few years. Start collecting data. Start utilizing companies that are EXPERIENCED in applying these technologies. And get ready to change everything you do.