Unraveling Data Bootstrapping: Learnings From An Interview With Rodolfo Rosini

Keertan Menon
Published in DataSeries
6 min read · Jan 10, 2019

“The world is one big data problem.” - Andrew McAfee, MIT scientist

Data pretty much rules the world. When it comes to AI startups, the absence of relevant data can be a bit of a snag on their ambitions. Some of these companies could be targeting markets that are inherently fragmented and don’t have readily available data sets.

An investor’s lack of faith in such companies, owing to the data problem, could end up sabotaging their plans altogether.

So, how do they go about collecting the required data sets?

Rodolfo Rosini, from Zeroth.ai, has some compelling answers! Having been an entrepreneur in the fields of cybersecurity and AI himself, he is now putting his experience to good use by helping AI startups find their footing in the increasingly competitive world that’s fixated on data!

I had the pleasure of interviewing the entrepreneur turned investor, to get some insights into Zeroth’s process of investing in AI startups, the problem of finding relevant data sets, and how we should go about solving this problem.

What Does Zeroth Do?

It’s common for investors to ignore startups that don’t have the necessary data to support their plans. Zeroth invests in exactly such AI companies at the earliest stages.

When it comes to AI, the absence of data can be a major hitch. Unlike other fields, AI startups need to compete with established giants like Amazon and Google, which enjoy the benefits of scale. The error rates in their data sets are lower owing to the size of their operations.

So, smaller companies and startups need a ‘cheat sheet’ to find a way to compete with the behemoths. Relax… cheating doesn’t imply anything illegal! It’s just a bunch of ways to ensure the smaller ones find their own footing!

Key Takeaways from our Interview With Rodolfo Rosini:

Lesson #1. Go After The Smaller Markets!

Most series-A investors tend to think that going after the bigger markets is where the action lies! However, big companies like Google and Microsoft play by a certain set of rules, which would prove difficult for startups to adhere to. Competing on common ground is extremely challenging!

It is ideal for startups to try and seek smaller markets, to gain a foothold, and then work their way upwards.

Lesson #2. Use Local, Specific Data Sets!

A lot of companies use general data sets to go after larger markets like America or China. However, to capture a specific market, startups need local data sets that are tailored to it. These could be public or private data sets.

For instance, Zeroth invested in a company in Kenya that collects feedback data for a particular service. The company couldn’t trust the ratings provided by consumers and wanted a way to rank the ratings based on parameters like social proximity. In such a situation, data sets specific to Europe or America wouldn’t come in handy; they needed local ones!
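To make the idea concrete, here is a minimal sketch of what ranking ratings by social proximity could look like. Everything in it is hypothetical: the `Rating` fields, the 0–1 proximity score, and the simple down-weighting rule are illustrative stand-ins, not the Kenyan company’s actual method.

```python
from dataclasses import dataclass

@dataclass
class Rating:
    reviewer_id: str
    score: float             # raw 1-5 star rating
    social_proximity: float  # hypothetical 0-1 closeness of reviewer to the rated business

def trust_weighted_score(ratings: list[Rating]) -> float:
    """Down-weight ratings from reviewers who are socially close to the business.

    The weighting scheme is purely illustrative: reviewers with high social
    proximity (friends, family, employees) count for less, so the aggregate
    score is harder to game.
    """
    if not ratings:
        return 0.0
    weights = [1.0 - 0.8 * r.social_proximity for r in ratings]
    total = sum(w * r.score for w, r in zip(weights, ratings))
    return total / sum(weights)

ratings = [
    Rating("a", 5.0, 0.9),   # likely a friend of the owner
    Rating("b", 3.0, 0.1),   # arm's-length customer
    Rating("c", 4.0, 0.2),
]
print(round(trust_weighted_score(ratings), 2))
```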

Lesson #3. Don’t Have Data? Find A Way Out!

Not having data is no reason to hang up your boots — it is a test of your ingenuity! It is upon you to devise a smart way to get the data you need! You can either go door-to-door collecting the required data, or access certain private or public data sets.

For instance, Fano Labs, a company backed by Zeroth, managed to get a head start by going door-to-door to different companies in its target sector. The company develops speech recognition and natural language processing technologies focused on processing and analyzing Chinese dialects. They needed data to demonstrate the efficacy of their AI system, which analyzes phone calls (in various dialects) at call centers and provides managers with business intelligence to control the quality of calls.

With Zeroth’s backing and a Pre-Series A round led by Horizon Ventures, the private investment arm of Li Ka-shing, they were able to test their solution at various call centers and obtain the necessary data! The call centers didn’t mind parting with their data either; after all, it was just a matter of recording the calls!

Another example is that of Seoul Robotics, a Korean self-driving car technology company. They developed deep-learning-based LIDAR object detection software that helps self-driving car manufacturers save on the cost of a full hardware stack and on power consumption. They sourced their data sets from Kaggle, a public source.
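As a rough illustration of that kind of sourcing, the snippet below pulls a public data set with the official Kaggle Python client, assuming the `kaggle` package is installed and an API token sits in `~/.kaggle/kaggle.json`. The dataset slug is a placeholder, not the data set Seoul Robotics actually used.

```python
# Sketch of downloading a public data set from Kaggle via its Python client.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads the API token from ~/.kaggle/kaggle.json

# "some-owner/lidar-point-clouds" is a hypothetical slug; swap in any
# public point-cloud or driving data set you actually want.
api.dataset_download_files(
    "some-owner/lidar-point-clouds",
    path="data/lidar",
    unzip=True,
)
```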

Lesson #4. Data sets Need To Be Dynamic!

At no point can you be complacent about your data sets! They cannot be static. Consumer behavior will change over time, subject to various factors, and this needs to be factored in when you are trying to solve problems.

To make sure you aren’t left behind, your data sets need to be constantly streaming and dynamic. Assume that the competition will always have an updated data stream; to succeed, your data sets have to be updated, always!
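One lightweight way to act on this, sketched below with scikit-learn’s incremental-learning API, is to fold each fresh batch of data into the model instead of retraining from scratch. The `fetch_next_batch` function and the synthetic data are stand-ins for whatever feed actually supplies new consumer data.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def fetch_next_batch(rng, n=256, n_features=20):
    # Placeholder: in practice this would read from a message queue,
    # a log stream, or a scheduled export of fresh consumer data.
    X = rng.normal(size=(n, n_features))
    y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)
    return X, y

rng = np.random.default_rng(0)
model = SGDClassifier()
classes = np.array([0, 1])

# Fold each new batch into the model as it arrives, so predictions
# track drifting consumer behavior instead of going stale.
for _ in range(10):
    X_batch, y_batch = fetch_next_batch(rng)
    model.partial_fit(X_batch, y_batch, classes=classes)
```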

Lesson #5. No Data? Change The Rules Of The Game!

There are certain industries where you may simply not have the data at all. For instance, as Rodolfo explains, there’s the case of nuclear reactors. There aren’t enough reactors going into meltdown, so the data may simply not be available!

In such instances, the existing data sets may be garbage from a deep learning perspective. You will need to change the rules of the game and look at unorthodox architectures that are still machine learning and will help you solve the problem. That’s right: the deep learning and big data solutions offered by behemoths like Amazon or Google may not always give you the right answer!

Lesson #6. Don’t Underestimate The Older Technologies!

While it’s natural to ask ‘what’s next?’ and focus on newer technologies, it’s wise not to discount the older ones. There has been a lot of research over the past few decades, including probes into approaches that didn’t work out at the time. Some of those seemingly useless technologies are now finding a use. For instance, neural networks called extreme learning machines work extremely well with smaller data sets and require fewer resources to run. In short, on small-data problems they can outperform deep learning, given its need for scale.
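For a sense of why extreme learning machines are so cheap to run, here is a minimal sketch: the hidden layer gets random, frozen weights, and only the output weights are fitted, in closed form, so there is no iterative training at all. The toy data and hyperparameters are arbitrary, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy regression problem: learn y = sin(x) from 80 samples.
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0])

n_hidden = 50
W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never trained
b = rng.normal(size=n_hidden)                 # random biases, never trained

def hidden(X):
    return np.tanh(X @ W + b)                 # random nonlinear feature map

H = hidden(X)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # only these output weights are learned

X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(hidden(X_test) @ beta)                  # approximates sin on test points
```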

Lesson #7. Big Doesn’t Necessarily Mean Better!

The future is expected to get bigger, and bigger still. The models offered by Google and Microsoft are going to make data bigger and deeper, but they may not always be the most efficient. In Rodolfo’s words, we are using a missile to kill a fly; in other words, we are overkilling the problem!

Given the rate at which data is changing and growing in volume, and the sheer size of deep learning models, some data sets will be treated as garbage. But that verdict doesn’t hold over time: what is garbage today may become relevant in a few years’ time.

Zeroth Is Solving A Pressing Problem By Investing In AI Startups…

Towards the end of the interview, Rodolfo spoke about Zeroth’s philosophy for investing in AI startups. He mentioned that there’s a need for companies to solve problems in a more sustainable manner, without putting too many of their resources at risk, and without (negatively) contributing to the burgeoning problem of climate change.

It is important to invest in these companies so that future generations have the necessary resources and the wherewithal for a good way of life.

AI and ML startups that lack the required data, and are struggling to find a foothold in a market ruled by internet and tech giants, need to buckle up and look for ways to make the most of their minimal resources, or disrupt the game entirely. Rodolfo Rosini and Zeroth are trying to ensure that these companies not only survive to see the light of day but also succeed, and in the process hopefully encourage similar companies to take a leap of faith!

Follow DataSeries on Medium, on Twitter, and head to the website to gain insights into the world of data-intensive companies through their charismatic leaders and executives!
