Takeaways from StrataData London

Tomek Chudzik
Unit8 - Big Data & AI
May 28, 2019

Strata is one of the largest conferences about (Big) Data. It is organised by O’Reilly every year in London, New York and San Jose. This year we went to London to get an idea of what the industry is up to, meet some extraordinary people, listen to some of the more than 200 speakers and get inspired.

It was great to see how the world of data has grown in the last couple of years. “Data is the most valuable resource on earth”: this is quite obvious to everyone nowadays. Even Harrods, one of the oldest and most famous department stores on earth, is working on its AI strategy. But now that data is the driver of the modern world, people are starting to notice and acknowledge the dark side of this revolution.

“Data is weaponised”, “Data is at risk”, “AI fairness”, “AI transparency”: you could hear these or similar phrases many times in all sorts of presentations at Strata. The topic feels even more relevant in the UK, a country that is facing Brexit right now. Carole Cadwalladr has something to say about this in her great TED talk: https://www.ted.com/talks/carole_cadwalladr_facebook_s_role_in_brexit_and_the_threat_to_democracy

But it’s great that we are talking about these things because that raises awareness and drives innovation.

Begin with the end in mind

A McKinsey survey this year asked executives if their company had achieved a positive ROI with their big data projects: 7% answered “yes”.

It is easy to fall into the trap of thinking that buying the latest and greatest tools is going to turn a company into Facebook or Tesla. Indeed, a lot of companies in the Expo Hall of Strata tried to promise an easy and affordable “Data Revolution”. But we have seen a number of times at Unit8 that a “technology first architecture” is usually a dead end.

Mark Madsen and Todd Walter gave a very interesting presentation, “Architecting a data platform for enterprise use”, where they talked about data architecture and data curation but also about how important it is to understand the big picture before looking at the tools. You need a purpose and a focus on outcomes, business goals and use cases. Architecture is a pattern that supports that purpose. Without it you end up with an accidental architecture, which is hard to support, maintain and extend and in the end has to be replaced.

Shingai Manjengwa, who teaches high school students about data analytics, also made a very good point: schools today teach students mostly about the tools (mathematics, statistics, computer science) but not how to solve problems.

DataOps is here for good

Without proper management, a model in production can do more harm than good. If you forget about this, your model can get out of control and, in extreme cases, cost you $440 million in 45 minutes, as happened to Knight Capital Group. Have a look at the story here.

This year the problem of developing, testing, releasing and monitoring machine learning models was a very popular topic. A couple of very successful companies showed how they approach the problem of repeatable and predictable model development and deployment. Once again the message is clear: it is not about the tools, it’s about methodology and philosophy.

Everyone wants to be in The Cloud

Tech companies have been embracing The Cloud for many years now, but it has always been harder for more traditional and established businesses to go in this direction. That’s why it was exciting to see some very well-known brands with more than 100 years of history share their “cloud migration” stories. One common pattern we could see is that deployments are very heterogeneous mixtures of on-premise and cloud. Companies are investing a lot of effort into keeping those systems operational and secure while avoiding vendor lock-in. Some of them claim that trying to be cloud-agnostic cost them more than it was worth, and they decided to stop worrying about this aspect.

No wonder almost all of the vendors in the Expo Hall of Strata boasted that their systems are cloud-native and cloud-agnostic but can also run on-premise. Is it really possible? Our experience shows that you have to be very cautious.

Empower your Data Scientists

As the Chief Decision Scientist of Google said in one of the keynotes: “Data science should go at a speed of thought”. We could see a lot of movement around tools and techniques that help increase the productivity of Data Scientists.

Data access was one of the topics here. Finding the right data set and consuming it without having to worry whether it is a batch or a streaming source is the key to success.

When Data Scientists start to work on a project, they usually begin by looking for the right data sources. If they are experienced, they may know where to look, but if they are new, they will turn to a more experienced colleague and ask basic questions. This leads to situations where experienced people spend hours solving their colleagues’ basic problems. Lyft, a ride-sharing company from the US, described how people had to spend hours monitoring a dedicated Slack channel for Data Scientists, answering questions about sources of data. They also showed how they solved this problem by introducing their own data catalog, Amundsen. This solution allows users to look for data in a consistent way. It has a built-in ranking system; they call it “google search for data”. The result: much less time spent on answering simple data-related questions and more time for creative work.
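The idea of a ranked search over dataset metadata can be illustrated with a toy sketch. This is not Amundsen’s code or API: the `Dataset` fields, the sample entries and the scoring rule (keyword matches weighted by how often a table is queried) are all assumptions made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    description: str
    query_count: int  # hypothetical popularity signal: how often the table is queried


def search(catalog, query):
    """Return datasets matching the query, most relevant first."""
    terms = query.lower().split()
    scored = []
    for ds in catalog:
        text = f"{ds.name} {ds.description}".lower()
        matches = sum(1 for term in terms if term in text)
        if matches == 0:
            continue
        # Keyword matches scaled by a dampened popularity score (an assumed rule).
        score = matches * math.log(2 + ds.query_count)
        scored.append((score, ds))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ds for _, ds in scored]


# Made-up catalog entries.
catalog = [
    Dataset("rides_daily", "Aggregated rides per city and day", 950),
    Dataset("rides_raw_tmp", "Temporary dump of raw rides events", 3),
    Dataset("drivers", "Driver profiles", 400),
]

results = search(catalog, "rides city")
print([ds.name for ds in results])
```

With this scoring, a heavily used, well-described table outranks a rarely queried temporary one, which is roughly the kind of answer a newcomer would otherwise ask a colleague for.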

Go-jek, a very interesting startup from Indonesia, showed a new feature store concept called Feast, a tool that takes the concept of a data catalog to the next level. The idea is to give Data Scientists the ability to consume features available in the catalog using a very convenient Python API. It speeds up development and reduces time to production.
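To make the feature store idea more concrete, here is a minimal in-memory sketch. It is not the Feast API: the class `ToyFeatureStore`, its methods and the feature names are hypothetical and only show the pattern of publishing a feature once and then fetching it by name.

```python
class ToyFeatureStore:
    """In-memory stand-in for a feature store (hypothetical, not the Feast API)."""

    def __init__(self):
        self._features = {}  # feature name -> {entity id -> value}

    def register(self, name, values):
        """Publish a feature computed once so that others can reuse it."""
        self._features[name] = dict(values)

    def get_features(self, entity_ids, feature_names):
        """Return one row per entity with the requested features (None if missing)."""
        rows = []
        for entity_id in entity_ids:
            row = {"entity_id": entity_id}
            for name in feature_names:
                row[name] = self._features[name].get(entity_id)
            rows.append(row)
        return rows


store = ToyFeatureStore()
# Made-up features and values.
store.register("trips_last_7d", {"driver_1": 42, "driver_2": 17})
store.register("avg_rating", {"driver_1": 4.9})

training_rows = store.get_features(
    ["driver_1", "driver_2"], ["trips_last_7d", "avg_rating"]
)
print(training_rows)
```

The point of the pattern is that the person building a model asks for features by name and does not need to know which pipeline produced them.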

I want to know the future

We have to say that forecasting was one of the hottest topics this year. There was everything from low-level descriptions of new types of neural networks to hands-on labs on time series forecasting using the latest offerings from cloud vendors. Not to mention AutoML.

It is interesting to see the renaissance of this very old problem. More data and more attention from the business drive innovation and the development of new advanced techniques. What is also interesting is that algorithms known for decades, like ARIMA, are still pretty successful and were often mentioned in the talks.
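As a reminder of how simple the classical approach can be, the sketch below fits an AR(1) model, which is the special case ARIMA(1,0,0) and not a full ARIMA implementation, using ordinary least squares in plain Python. The series is made up, and the function names `fit_ar1` and `forecast` are our own.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast(series, steps, c, phi):
    """Roll the fitted recursion forward from the last observed value."""
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Made-up, noise-free series generated by x[t] = 2 + 0.5 * x[t-1], starting at 10.
history = [10.0, 7.0, 5.5, 4.75, 4.375, 4.1875]

c, phi = fit_ar1(history)
print(c, phi)
print(forecast(history, 2, c, phi))
```

On real data the fit would not be exact, and a library implementation would handle differencing and moving-average terms as well. Still, the core of the method is only a few lines.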

This is something that we also see at Unit8. Sometimes old and well-known tools are just the best.

Where is my T-shirt?

In between the keynotes and many great sessions, people gathered in the Expo Hall, where a number of vendors tried to attract their attention with stickers, T-shirts, drones and other unbelievable gadgets.

If there is one lesson to be learned from the many great speakers at Strata, it would be: “Build for use case”. This reassures us that we are on the right path at Unit8, always focusing on the problem and finding the right technology to solve it.

It was great to be part of Strata. We will come back next year.
