How to Bootstrap a Machine Learning Startup

Zetta Venture Partners has a good article on the practical challenges facing Machine Learning (ML) startups. The final paragraph summarizes their position:

Startups find markets by specifically addressing customers’ needs. Startups can only specifically address customers’ needs with the power of ML if they build models that make accurate predictions. Building accurate models requires accurate input data and programming of features that are predictive in your customers’ domain. Thus, any company hoping to derive their competitive advantage from ML technology should figure out how to incorporate domain experts and form a data acquisition strategy on Day Zero.

In other words, startups don’t have access to clean data and clear problem definitions the way textbook examples do. In the beginning we have no data and only a vague idea of customer problems.

And that’s just where I’m at: the beginning of an ML startup. Stephen and I started working on ScribbleIQ 2 months ago and have dealt with these challenges. Here is what we’ve learned. I hope it helps get your ML startup off the ground.

Primer — ML reality for startups

In the mid-1990s I studied Neural Networks at Stanford. I found it captivating that machines could make predictions — like stock prices. Then I tried creating a simple Neural Network to solve a real problem.


I quickly realized these algorithms need a lot of training data and guidance by people. Real data is messy and incomplete. Goals get redefined when we get real results. There wasn’t enough data and computers were just too slow to build anything practical.

Times are better — but we’re not Google

20 years later, startups can finally build practical ML systems for 2 reasons: computers are faster and we have more data. AWS offers unlimited cheap computing. With APIs and platforms like Segment, even tiny startups can process petabytes of customer data.

For Facebook and Google this is enough — they have good problem domains for ML and more useful, proprietary data than anyone.

Unfortunately startups are not so lucky:

  • Customers need to give us access to data.
  • Even if they do, there might not be enough of it.
  • We need to know how to train ML systems.

See the dilemma?

We can’t build an ML system without customer data and the knowledge of how to train it … and without an ML system, why would customers bother working with us?

I know of only 2 ways to overcome this dilemma:

  1. Throw a big bucket of money at it.
  2. Bootstrap your ML system with customers while delivering value for them.

To date we’ve taken the latter approach. Here is what we’ve done.

We started with a big, long-term vision

The inspiration for ScribbleIQ came from my own experiences doing content marketing. Long, long gone are the days when I could bang out a blog post and watch it float to #1 on Hacker News. Over the past 2 years content has become too hard. Even companies like Buffer experience “content shock”. I saw ways that ML might make it more effective.

Thus the big, broad vision for ScribbleIQ: To use AI to make B2B content marketing more effective.

Pick early customers who buy into your vision

This is our second-biggest unfair advantage — we started with a network of >100 prospective customers. We were able to get dozens of meetings a week and test major assumptions in our Lean Canvas.

We met earlyvangelists who “get” our vision and are willing to consider a proposal. A few (thank you — you know who you are!) took a chance on us based on personal trust and a shared belief in our vision.

Dogfood. Yum.

The fastest way to iterate a product is to use it yourself. We decided to use our AI to write our own content and even started calling her Mika. I use Mika to generate our ideas and she found the post which inspired me to write this.

If at all possible, start by using your ML in your own work.

Deliver a complete solution

Our original vision was to generate content ideas based on our customers’ CRM data. We had lots of good conversations but struggled to close any deals. An advisor with a lot of experience in the space gave us some feedback:

“Guys, when I’m feeling sick I don’t want you to cure ½ of my disease”.

For many B2B companies content marketing is an awful, nagging, distracting problem. “Act like a media company” is wonderful advice — unless you don’t have the skills/desire to research and write for 8 hours/week. These customers didn’t want a tool — they wanted a solution.

So we changed our offer and started writing content for them. Since research is one of the hardest parts of content marketing we can use Mika to do it — and eat more yummy dog food.


Concierge MVP overcomes the data dilemma

Building a concierge MVP has helped us overcome the ML data dilemma.

We build as little “product” as possible. We instead run scripts and do analysis to identify features, set goals, and evaluate data. The customers couldn’t care less how we implement the solution; they care only about results.

We gather the data. We generate the ideas. We write the content. We help measure results.

And thus we’re in the best position to train our own ML system.

ML is a long game

We’re realists. We’ve created ML solutions at places like FINRA, Google, Apple, and the US intelligence community.

Creating models that make accurate predictions takes a long time, lots of data, and years of trial-and-error. Harder still is building a defensible moat around your startup based on proprietary algorithms.

Not every startup can use our approach — some customer problems are too complex and dogfooding may not be an option.

But before you rush off to Sand Hill Road and start pitching, see if you can do some services work for your customers. Challenge yourself to monetize early and start creating value. Solving real problems is the only way to train your ML system and validate your business.

Originally published at ScribbleIQ Blog.