The Future of Data Infrastructure: From Source to Insight

Vipin Chamakkala
Published in Work-Bench
Mar 28, 2018
“Pipelines” Nikos Koutoulas

At Work-Bench I’ve spent a bunch of time in the data infrastructure space, and we’ve made investments in companies such as Algorithmia and Tamr to that end. Recently I’ve been seeing an increasing need for data infrastructure and engineering tooling.

The primary question I’ve been exploring is what enterprises need in order to leverage data and turn it into intelligence at the scale and speed of the internet giants. That means activities like automating workflows, uncovering insights, and delivering new products or better experiences to end customers (and employees).

My take is that the intrinsic motivations of modern data workers (data scientists, engineers and analysts) are not being addressed by existing processes or tools, largely because of poor data management practices, access and ownership issues, a lack of relevant training, and the cost of talent. I think there’s been significant investment in model-building platforms but very little in engineering tools. In the Fortune 1000, that has led to bottlenecks in accessing, understanding and transforming data so that it’s usable for analysis and production. As companies think about adding AI to their businesses, there’s a lot of room for improvement in these areas, and we could all learn from folks who’ve done it already.

As I get up to speed, I figured I’d share interesting content I’ve come across on data engineering, mostly from web-scale companies, which hints at what may be coming for the enterprise.

Writing

A Beginner’s Guide to Data Engineering (and part II) by Robert Chang (Airbnb)

  • “A data scientist’s capability to convert data into value is largely correlated with the stage of her company’s data infrastructure as well as how mature its data warehouse is. This means that a data scientist should know enough about data engineering to carefully evaluate how her skills are aligned with the stage and need of the company.”

The AI Hierarchy of Needs by Monica Rogati (Formerly Jawbone & LinkedIn)

  • “However, under the strong influence of the current AI hype, people try to plug in data that’s dirty & full of gaps, that spans years while changing in format and meaning, that’s not understood yet, that’s structured in ways that don’t make sense, and expect those tools to magically handle it.”

The Downfall of the Data Engineer by Maxime Beauchemin (Lyft)

  • “Watching paint dry is exciting in comparison to writing and maintaining ETL. Most ETL jobs take a long time to execute and errors or issues tend to happen at runtime or are post-runtime assertions. Since the development time to execution time ratio is typically low, being productive means juggling with multiple pipelines at once and inherently doing a lot of context switching.”

Videos

The State of Artificial Intelligence by Andrew Ng

  • Andrew Ng discusses the current state of AI and, more importantly, lays out how he believes businesses should build centralized AI teams and matrix that talent into various business units.

Scaling the Data Infrastructure @Spotify presented by Matti Pehrs

  • Walks through how Spotify works with data, the challenges its data infrastructure faced during the summer of outages in 2015, and how the team solved them by building tools like Datamon, Styx and GABO.

Machine Learning in Uber’s Data Science Platforms by Franziska Bell

  • “Michelangelo is an end-to-end ML workflow and allows Uber teams to manage data; teach, evaluate and employ models; and create and track predictions. It also serves deep learning, time series forecasting and other machine learning models, and the company is focusing on improving developer productivity on the platform”

Stay tuned as I continue to explore this thesis and share more content and the insights I glean along the way.

If you have any thoughts on this space, interesting content to point me to, or are a startup working on something in this space, please reach out to me via Twitter @V1P1N or email me.

Director of Customer Partnerships @Sequoia. Prev. Investor at @Work_Bench. LinkedIn: https://www.linkedin.com/in/vipinchamakkala/