Kyso: the future of data is collaboration

Tooling for better data insights from your data team

Mick Halsband
Lunar Ventures
8 min read · May 11, 2020

--

We’re excited to share our investment in Kyso, a collaboration platform and knowledge hub that allows data analysts to write better code, maintain it more reproducibly and, most importantly, communicate data-based insights effectively to all decision-makers.

‘Data Has A Better Idea’ courtesy of Franki Chamaki.

Ever since I learned about Kyso, I’ve been fascinated by its grand vision: making data science more collaborative, and more inclusive. Throughout my career, I’ve dedicated years to learning how to best spread data insights across organizations. But only once I discovered Kyso did I realize that the key is not so much making data analysts more productive as improving the collaboration between data analysts and the other data stakeholders inside the organization. Kyso’s trick is to improve evidence-based decision making by offering better tooling, making collaboration around data insights seamless, powerful, and fun.

A long time ago (in a galaxy far away) my day job was lead data engineer, toiling away on building mission-critical software for medical devices. Our product was meant to correctly diagnose brain aneurysms (while making sure to not kill anyone!). This was my first glimpse into data science, at the heyday of “big data” and long before the incredible rise of deep learning. In retrospect, this was the gateway drug that got me hooked on modern Machine Learning (ML). Not many years later, as founder and CTO of a hedge fund, ML was at the core of our secret sauce. In those days, Big Data was still a new buzzword. Most of the super-powerful tech stack wielded today by any fledgling data scientist was then novel or not yet available. (Hadoop, for example, was created in 2005; the first 1TB hard drives were released by Hitachi and Seagate in 2007; and MongoDB dates back to just 2009.)

“HUGE Capacity” — first 1TB hard drive by Hitachi (2007)

Since then, our data science pipelines have become super-powerful, and every modern business considers itself a “data-driven business”, making decisions based on the insights of an armada of data scientists doing anything from A/B testing to deep learning. But one old elephant has remained in the room all along — we are still balkanized into two types of users: insight producers and insight consumers. Or, as we called them in the world of hedge funds: quants and traders. This balkanization is an effective killer of data-based insights.

Let me tell you how it worked for us. Our quants’ job was to come to work in the morning and think up a bright new trading strategy. They would then build it and “backtest” it, simulating its behavior in the markets, using Python and Jupyter Notebooks to pull the data and run the research. The quants’ task was to prove to the traders that their approach had merit. Once they finished their rocket-science research, the quants would render the results into a PDF report and email it to our traders. In one fell swoop, the workflow of these highly trained scientists would go from 2020 techno-magic to a 1980s Dilbert cartoon.

That’s how it worked in our hedge fund, and incredibly it’s how it works in many businesses today. How well did this work for us? Well, if you have ever emailed a spreadsheet back and forth, tried to follow a redline edit, or collaboratively written code before modern source control — you can take a safe guess. This kind of crooked communication kills any chance of effective evidence-based decision making.
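To make the quants’ side of this loop concrete, here is a minimal, purely illustrative sketch of the kind of backtest a quant might run in a notebook — a toy moving-average crossover strategy on a synthetic price series. The strategy, parameters and data are all hypothetical, not anything our fund actually traded.

```python
import random

def backtest_sma_crossover(prices, fast=5, slow=20):
    """Toy backtest: go long when the fast moving average is above the slow one.
    Returns (strategy return, buy-and-hold return) over the series."""
    def sma(series, n, i):
        # Simple moving average of the n points ending at index i
        return sum(series[i - n + 1:i + 1]) / n

    position = 0       # 0 = flat, 1 = long
    strat_ret = 1.0
    for i in range(slow, len(prices)):
        if position:   # earn the day's return while long
            strat_ret *= prices[i] / prices[i - 1]
        position = 1 if sma(prices, fast, i) > sma(prices, slow, i) else 0
    hold_ret = prices[-1] / prices[slow]
    return strat_ret - 1, hold_ret - 1

# Synthetic random-walk prices standing in for data pulled from a market feed
random.seed(42)
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] * (1 + random.gauss(0.0005, 0.01)))

strat, hold = backtest_sma_crossover(prices)
print(f"strategy return: {strat:+.1%}, buy-and-hold: {hold:+.1%}")
```

In the workflow described above, the numbers this prints would be rendered into a PDF and emailed — losing the code, the data, and any ability for a trader to interrogate the result.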

But the real craziness started after we decided to deploy a trading strategy in the live market. Now our traders — market-savvy economists fluent in candlesticks and market behavior — would need to monitor it manually. The market is an unpredictable beast, full of flesh-and-blood humans who behave in messy human ways. No algorithmic trading strategy survives contact with the reality of the market. The traders’ job was then to document unexpected behavior and loop these potential anomalies back to our data science team, which would quickly escalate the email craziness (think screenshots with Microsoft Paint drawings on them flying back and forth). Many strategies were being worked on at the same time, and sometimes we’d find that the quants and the traders were not even talking about the same strategy anymore! This whole process was error-prone, messy, and incredibly wasteful.

The root cause was that the two groups — the insight producers and the insight consumers — were using primitive tooling to communicate with one another. This was driving me nuts: my background was in software development, where this kind of tooling was a vestige of the 70s. Code development evolved over decades and created extremely sophisticated tooling that allows all stakeholders to collaborate effectively. Software development uses version control and agile practices — rapid cycles of creation and refactoring supported by cultural and methodological collaboration. Software development, which started its life as a side-hustle of electrical engineers writing COBOL, has evolved into a discipline. Nowadays all it takes to develop, build and deploy projects of millions of lines of code — collaborating with thousands of strangers across the internet — is a $1000 laptop and a cloud provider — and it just works! But in the land of evidence-based decision-making, we are still in the old days of COBOL programmers sitting together in a room.

“import antigravity” — coding with the likes of Python feels like having superpowers (xkcd)

Over the last 10 years there has been tremendous progress in the tooling for data science. With today’s practically limitless on-cloud computation and storage, massive open data sets, and open-source ML, statistics and math libraries, data science is now a rapid-experimentation discipline. Compared to 10 or 20 years ago, today’s data scientists have plenty of data and don’t need to wait for experiments to finish running. The next frontier for data science is experiment management, with new tools for data set and experiment versioning that allow better and easier experiment reproducibility, and continuous learning.

However, even with this new tooling, one thing continues to be overlooked: the balkanization of quants from traders, of data insight producers from consumers. We can’t just pretend these two groups are the same and need the same treatment. The insight producers have quantitative skills, programming skills, scientific skills. The insight consumers have invaluable domain expertise that is required to guide the work and integrate it with the messiness of the real world. These groups must collaborate intensely, but they have distinctly separate needs, habits and cultures. In my hedge fund job, just as I dedicated my career to honing my skills at producing clear and high-quality code, so did our traders spend years honing their financial insights. But getting that insight from the trading room back to the research teams was proving very difficult. And this problem is not unique to hedge funds: while data science skills are mostly transferable between industries, the domain experts’ industry-specific insights are necessary for feature selection, and for designing and deploying machine learning models that are applicable, valuable and robust. Whether in finance, energy, industrial manufacturing, health, or almost anywhere else — it’s the domain expertise that enables data scientists to unlock value. And in all of these domains, the domain experts’ insights need to be transferred back to research in order to create strong models.

But can organizations truly unlock the full value that domain experts hold? Or are we destined to continue emailing PDF reports and hoping to magically get back valuable insights? — Enter Kyso.

Kyso is your data analytics knowledge hub

Kyso is a collaboration platform that connects the data-expert insight producers with the domain-expert insight consumers, while allowing both groups to continue using the tools they are used to. Kyso takes the code and data repositories behind data science experiments and continuously pulls them from GitHub, executes them, and renders them — ensuring that all reports are always up to date and in sync. All reports on the Kyso platform are searchable, dynamic and interactive, and always represent the latest codebase and datasets. When a domain expert reads a report and notices a model misbehaving, they can comment on that particular spot, and the comment shows up alongside the relevant part of the report, allowing the data science team to address it effectively.
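Kyso’s actual pipeline isn’t described in implementation detail here, but the core “notebook in, report out” step can be sketched with the open-source nbformat and nbconvert libraries (the same ecosystem Jupyter itself uses). This is an assumption-laden illustration of the idea, not Kyso’s code; a production system would also execute the notebook against the latest data and handle authentication, scheduling, and repository sync.

```python
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell
from nbconvert import HTMLExporter

def render_notebook_to_html(nb) -> str:
    """Render a notebook object into a standalone HTML report string."""
    body, _resources = HTMLExporter().from_notebook_node(nb)
    return body

# Build a tiny notebook in memory; in a Kyso-like pipeline this would be
# pulled from a git repository and executed before rendering.
nb = new_notebook(cells=[
    new_markdown_cell("# Strategy report"),
    new_code_cell("print(1 + 1)"),
])

html = render_notebook_to_html(nb)
print("report rendered:", len(html), "bytes of HTML")
```

The point of automating this step is exactly what the paragraph above describes: the rendered report is regenerated from the latest source, so readers never see a stale PDF snapshot.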

Kyso can be thought of as ‘Jupyter-meets-git-meets-Medium-meets-Confluence’, but the true value is in their integration into a seamless product. With Kyso, Dilbert-level stakeholder miscommunication makes way for effective short-loop communication that respects the habits and competencies of both data scientists and decision-makers, and enables them to work together efficiently, producing more value than either could produce on their own.

Collaboration over visualization of data insights in ‘Avatar’ (2009)

Arguably, the role of Science Fiction is to point the way for science. In the case of data science, Hollywood’s vision of the future, in Cameron’s 2009 movie ‘Avatar’ (pictured above), or in Spielberg’s 2002 movie ‘Minority Report’, is about producing crucial insights, for the right users, at the right time — turning data into insights, and insights into decisions. In ‘Minority Report’, Tom Cruise’s character needs to go, in 10 minutes, from knowing about a murder that is about to happen to discovering and apprehending the would-be murderer. In my hedge fund, our traders would be lucky if they fixed a trading-strategy bug within the week! Data Science needs a systematic solution: without getting your data insights to the right stakeholders, without getting the stakeholders engaged with the data and contributing their expertise and experience — without closing the loop — we cannot harness the full potential of data science the way we have for software development. Much like we scaled software development using Atlassian, Git, Kanban, Scrum and other tools and methodologies, so do we expect new tools to scale data science.

If you are a data scientist wasting time on miscommunication with decision-makers, compiling PDFs, struggling to leverage domain experts, and losing valuable insights to poor feedback loops, hop on to Kyso and give it a spin. If you are a decision-maker who keeps getting reports that are opaque, difficult to handle, or not amenable to drilling down in a high-resolution feedback loop — consider adopting Kyso for your data team.



Mick Halsband is a seasoned tech leader and early-stage investor at Lunar Ventures.