Are you really an AI startup?
AI research has long been the domain of universities, public institutes, and large corporations. Thanks to some amazing developments in the field over the past few years, and an unhealthy dose of PR hype, every startup, VC, agency and hot dog cart is now scrambling to get on the bandwagon: to be AI-powered, AI-adjacent, or just mechanical-turks-masquerading-as-machines-until-we-actually-build-it-AI.
This is Peak AI. AI conferences, AI events, AI podcasts. Flash. Bang. $$$$.
My rule of thumb: the first time a buzzword shows up in some manager’s keynote at some generic industry conference, all news related to that topic should be taken with a grain of salt from then on (see: big data, IoT, SoLoMo, ephemeral messaging, topical graph, growth hacking).
Doing research at a startup is anything but easy.
Of the startups doing AI out there, the majority can be put into one of the following buckets:
- AI practitioners building the equivalent of a very expensive, convoluted job application, getting acqui-hired at a markup before any real product launch.
- AI researchers pursuing an ambitious goal. They typically struggle to raise money, and in the meantime build tailored solutions (vertical implementations, consulting, etc.) to create revenue streams that keep the business growing. In rare cases (DeepMind), a company breaks through and becomes a powerhouse.
- Startups that do everything manually (mechanical turks) in the hope of one day having the skills and resources to automate, but that still want to ride the AI wave.
- Established companies with clear models and revenue streams, investing in their own future.
To complicate things further, the typical VC is not set up for long-term research; any research really worth doing is a multi-year effort that can outlast a fund’s lifespan. Raising future rounds requires clear traction on core KPIs (I’m not sure most VCs count p-values and F-scores as such), and the revenue growth those rounds demand is hard to show when you need years to bring a product to market. Enough has been written about the difficulty of raising VC money for companies with a longer lead time.
The remaining companies, ourselves included, are the irrational bunch. In retrospect, it was crazy for a company that was only a couple of years old to start building a research team with the mandate to “solve search”, a mandate that quickly evolved into “research the future of photography”. A vision that grand is, in my mind, a huge gamble: one that can only succeed if it’s backed by a solid business model, targets a clear market (horizontal solutions such as frameworks and general-purpose models are the domain of much larger corporations, and have more or less become a commodity), and is built on a clear foundation of targets and KPIs.
Granted, that gamble paid off, but believe me, there were some terrifying months (read: years) before we started seeing any significant results or impact.
With EyeEm Vision, we are now able to confidently answer questions like “what’s in this photo?”, “is this a beautiful photo?” or “is this a commercially viable photo?”. We’ve trained models that can understand the aesthetic preferences of individuals or brands, and we’ve managed to make these models run very quickly and cost-effectively.
Our forays into research started with an acquisition (sight.io), whose founder, Appu, became our Head of Research & Development (R&D). Together, we’ve hired dozens of really talented researchers and engineers to join our mission and help us build the future of photography. Their work has a direct impact on revenue, user experience, and cost.
R&D challenges every principle you might know about building agile, lean, scalable startups (buzzzzzzz). There is no easy way to build an “MVP”, and good luck working in sprints! There’s no such thing as lean academics. It is very difficult to define simple KPIs that we can truly and comprehensively track. Initial funding is readily available (Peak AI, remember), but beyond that, raising money becomes very difficult.
Business (vs?) Research
Machine vision for us started as a means to an end. In order to empower any photographer to find their best photos and earn money with them, we had to move away from a manual review and keywording process to an automated one. While the solution is a highly technical, research-heavy one, it can only generate real value if it aligns with what our business team needs, how our photography team works, and how our product team wants to translate those needs and workflows into products.
Working cross-functionally like this is very challenging: it requires our researchers to step away from “pure research” and deal with real-world requirements, and our product and business teams to understand and plan around high levels of uncertainty. You need a special set of people who can work in an environment like that, alternately putting on a business or a research hat as we move forward.
Hiring
Surprise. Hiring highly qualified people in technology is hard!
Now that the field has emerged from its AI winter, machine vision researchers are in high demand, and all the big boys are throwing money and perks at them like it’s Christmas. You only need to go to a conference like CVPR or ICML (I’ve heard 30–40% of attendees work for Google, Microsoft or Facebook) to experience that firsthand. Globally, the candidate pool that fits the profile we are looking for is a few thousand people. That’s terrifying.
Finding the right researchers for EyeEm meant finding people with a scientific itch who are also intrigued by the entrepreneurial side of things: people with the desire to build something innovative and push the limits of what’s possible, who understand the need to ship quickly and often (as irritating as that might be). People comfortable with uncertainty and with stepping outside their comfort zones, who want to collaborate cross-functionally with engineers, photographers, designers and business people! It’s a symbiotic relationship. An expensive, symbiotic relationship.
A series of experiments
Another major challenge is that training deep learning models is an inherently experimental process, akin to turning and twisting metaphorical knobs and levers. From the outside, it looked as though the team disappeared into its own world for weeks (read: months) on end. Sometimes they emerged with amazing results after two weeks, sometimes with a failure after three months. Back to the drawing board.
And this is for iterative work: improving the quality of existing models. Very often, trying to solve a new problem takes months before any initial results are available, and more often than not, we end up solving a different problem than originally planned. Try building a short-term product roadmap with that!
The nondeterministic nature of this beast makes it a veritable rate-limiting step: a black box around which the other engineering, product and marketing processes had to be designed. We generally knew what the team needed to build or iterate on one of their models, just as we knew how that model would be put into production. In between, we (non-researchers) waited and prayed. Fortunately, working at a startup means being comfortable with that dark cloud of uncertainty constantly looming over your head.
BTW, while it’s MUCH cheaper than it was a decade ago, training these networks still requires special hardware that doesn’t come cheap. I shiver at the cost when I read that Facebook conducts 1.4 million experiments a week. Try being lean and building AI.
“Done”
From an R&D perspective, the main challenge of detecting what’s in a photo was addressed as soon as we found a reproducible, scalable method for learning new concepts. Beyond that point, a lot of the work became iterative. While large jumps in accuracy still require algorithmic improvements, a lot can be gained through iterative work (training on larger datasets, etc.). That kind of work does not qualify as exciting research.
From a company perspective, on the other hand, “done” meant 100%. As long as we hadn’t fully automated keywording for every single photo, it was still an open problem. You fix your precision and optimize for recall to solve search, then fix your recall and optimize for precision for keywording individual photos (and, let’s face it, what everyone really wants is 100%/100%). It’s a vicious cycle.
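The “fix one metric, optimize the other” loop can be made concrete with a decision threshold on a classifier’s confidence scores. This is a minimal sketch, assuming a model that outputs one confidence score per photo for a single keyword; the function names and the toy data are hypothetical and are not EyeEm Vision’s actual pipeline. It only shows the general technique: treat one metric as a hard constraint, then pick the threshold that maximizes the other.

```python
from typing import List, Optional, Tuple


def precision_recall_at(scores: List[float], labels: List[bool], threshold: float) -> Tuple[float, float]:
    """Precision and recall if every photo scoring >= threshold gets the keyword."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no predictions: count as 0
    recall = tp / (tp + fn) if tp + fn else 0.0     # no positives: count as 0
    return precision, recall


def best_threshold_for_recall(scores: List[float], labels: List[bool], min_precision: float) -> Optional[float]:
    """Fix precision, optimize recall: highest-recall threshold with precision >= min_precision."""
    best_t: Optional[float] = None
    best_recall = -1.0
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        if p >= min_precision and r > best_recall:
            best_t, best_recall = t, r
    return best_t  # None if no threshold meets the precision constraint


def best_threshold_for_precision(scores: List[float], labels: List[bool], min_recall: float) -> Optional[float]:
    """Fix recall, optimize precision: highest-precision threshold with recall >= min_recall."""
    best_t: Optional[float] = None
    best_precision = -1.0
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        if r >= min_recall and p > best_precision:
            best_t, best_precision = t, p
    return best_t  # None if no threshold meets the recall constraint


# Hypothetical toy data: model confidence for one keyword, and whether a human tagged it.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [True, True, False, True, True, False, True, False]

print(best_threshold_for_recall(scores, labels, min_precision=0.75))  # 0.6
print(best_threshold_for_precision(scores, labels, min_recall=1.0))   # 0.3
```

On this toy data, no single threshold reaches 100% precision and 100% recall at once, which is exactly why each fixed constraint leads to a different operating point.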
Innovating
If we spend all our time iterating and improving on known unknowns, we lose out in the long run. I always say that photo classification is a race to the top and to the bottom at the same time. Our models regularly outperform those of MUCH larger companies, but as the technology becomes a commodity, economies of scale kick in, and you don’t want to be competing with AWS when your unit of currency is $/GPU-hour.
Besides, once the research problems are tackled, the iterative work eventually becomes unattractive for a researcher. We believe the solution to this problem is to clearly delineate how we approach projects:
- iterative applied research (3–5 week cycles): improving existing models mostly through more data, but occasionally algorithmically as well.
- new applied research (5–10 week cycles): implementing new algorithms where simply adding more data doesn’t cut it.
- pure research (3–6 month cycles): working on a vague problem that will still be relevant in the future. This is where the magic happens.
Not to mention the work of handing over libraries to engineers, cleaning up code, and working cross-functionally with our photography, product and business teams to define how our tech is integrated into products.
Moving forward, we also want to make sure that we have room to write articles and publish papers on the work we’re doing. This can only work if we find a healthy balance between those approaches, and requires people that are comfortable wearing different hats.
Working closely with our R&D team these past few years has been an inspiring (we can do that?!), humbling (I thought I knew math!) and frustrating (how accurate is it? Just give me a %!) experience. It’s different from anything else I’ve done before.
In three years, we have taught machines to fully describe the contents of photos, rank them by beauty and commercial value, and personalize these rankings for individual tastes. We managed to compress the algorithms so much that they run in real time on mobile devices, and we’ve built technology that lets us train them in real time and expand their vocabulary as needed. And we’re just getting started.
Startups are hard enough as it is. Doing serious research at startups is as close to the edge as you can get.
Do you want to help build the future of visual media? Hit us up!