Applied Research at Reasonable Scale

Building ML-heavy products without breaking the bank

Jacopo Tagliabue
The Techlife


ML Research is hard. How hard? If we consider e-commerce tech (our main focus in the last three years), it is so hard that almost all contributions come from a handful of large, public, B2C companies:

Publications in e-commerce tracks at top-tier conferences in 2020. While the KPI is obviously simplistic, the spirit of the argument will stay intact for most conceivable metrics.

We discussed elsewhere (scholarly and not) the barriers to entry, but never told the missing piece of the story: how did we end up in this chart? This post is our attempt to answer this question, and share what worked and what didn’t when running our applied R&D practice at Coveo — which we code-named Coveo Labs, CL for short (for somebody so often engaged in eccentric paper titles, I do see I could have done better here!).

There are many flavors of R&D in ML, and today we are only interested in the applied side of it, since at “Reasonable Scale” the value is 90% in product-driven questions: think more Home Depot research than OpenAI papers. Certainly, not every scale-up company needs applied research. However, it is equally certain that there are terrific opportunities for many companies, but management may not have any blueprint they can tweak for their own scale.

As we have forcefully argued in the case of MLOps, it is simply false that Big Tech is the only game in town: at the time of writing, CL is responsible for several successful collaborations with practitioners from industry (Microsoft, NVIDIA, Outerbounds, Farfetch) and academia (Stanford, Bocconi, Tilburg, Oxford).

If you’re still interested in reading more, this blog post has 3+1 sections:

  • The case for Applied R&D: where we explain what applied scientists do and build a case of why you may want them.
  • Three years of Coveo Labs: where we show a concrete example of R&D practice; having a “successful implementation” in mind will make it easy to understand, going backwards, what makes it possible.
  • R&D at Reasonable Scale: where we share values and strategies that can sustain applied R&D.
  • FAQs: where we answer Frequently Asked Questions (duh!).

Finally, we are unapologetically proud of what CL achieved, but this post is about general motivation and methods: while the story we tell is ours, we emphasize what could work irrespective of the people involved. People are indeed fundamental, but we are interested in what can be replicated, not so much in what (if anything) makes us unique.

In the end, our open-science mantra has always been the same, “What one fool can do, another can”: I’m your fool.

The case for Applied R&D

“You know it when you see it” applies to obscenity as well as to “good applied AI”: good and applied are obviously a spectrum, and all companies have their own flavor. In our flavor, typical projects are things like RecList, “The Embeddings that Came in From the Cold” and Query2Prod2Vec.

While a general definition is impossible, the role of an applied R&D practice is broadly construed as “looking over the product horizon”.

Instead of incrementally improving a feature within the current setup, applied scientists tackle the question with fewer constraints: they may use new modeling techniques, more computational resources, or tap into completely new datasets. As they venture into the unknown, they aim to discover something innovative that could become part of the product in 6–12 months. Compared to traditional engineers, applied scientists often enjoy being part of the scientific community, disseminating their work in public through papers and talks.

Not everybody needs to look over the ML horizon, but some do: in some cases, ML is the product, so there is obvious financial and marketing value in being good at it (for example, during our tenure at Coveo the company has been cited next to YouTube and Netflix for its innovation pace). In others, better ML improves your business directly: the better your churn prediction model, the better your relationship with your customers.

Even if your company does not sell models, or revenues are not directly tied to ML, there may still be good reasons to invest in applied R&D. If you’re building a case for management, all the following worked out well for us:

  • Motivation: Hiring good AI people is hard, and a good and visible R&D practice may work well when it comes to luring practitioners into startups. Our experience: Several people mentioned CL as a reason to join Coveo as opposed to similar companies.
  • Motivation: Writing a paper is a great example of multi-task learning: we found that improving a feature and sharing it produces better outcomes than working on the feature alone. Even if not building with a paper in mind, having it as a side objective encourages good reproducibility, and the clarity of thought that comes from writing stuff down for a demanding audience. Our experience: the documentation produced as a side-effect of scholarly writing typically has a great ROI.
  • Motivation: Being part of a community is important: going to conferences puts younger people in contact with new peers and mentors, and greatly broadens the pool of interesting feedback they can get. Our experience: we became trusted members of both the MLOps and the e-commerce tech communities, as we serve in various roles in events such as ECNLP, EMNLP, SIGIR eCom, CIKM.

Three years of Coveo Labs

To encourage replication of CL, it may be useful to give a precise picture of its structure: while we don’t expect our blueprint to fit all companies, we wish to provide enough details for you to map your situation into ours.

The following chart depicts all people involved in our research projects (note: this does not include our open source collaborations, such as DAG cards with Outerbounds, or RecSys with NVIDIA Merlin). The inner circle captures frequent collaborations, while the outer circle contains one-hit wonders:

Coveo Labs resources, by strength of contribution and type: people on several projects are in the inner circle, people with multiple projects in the middle, people with one project to the right.

We leveraged three types of resources: FTEs (Coveo employees), Industry Practitioners (colleagues in other for-profit organizations), and Academic Researchers (colleagues in universities). Interestingly, some close collaborators are not FTEs, and some FTEs made only infrequent contributions. In other (more prosaic) words, FTEs are the only people the company needs to budget for; everybody else is donating their time and skills to projects of common interest. Since CL has not been the full-time job of anyone, the budget won’t total more than a few FTEs a year: working with other institutions is therefore not just desired, but the only way to achieve high throughput (if you’re interested in concrete examples, check the FAQs at the end). By aligning our roadmap with other people’s interests, we have been able to strategically form alliances that benefit everybody.

R&D at Reasonable Scale: From Principles to Operations

While we never wrote down our principles (all these years in Silicon Valley, and we learned nothing!), there are a few things we always kept in mind:

  1. Clear product focus: we know the domain well, having built a product search company before — we are autonomous in defining interesting problems, evaluating solutions and connecting features with innovation.
  2. Startup mentality: we move fast, we are mostly generalists, we have no time for bureaucracy or formalities. Young scientists write code with directors, professors work with interns: while some division of labor is necessary, we act as peers most of the time.
  3. Build in the open: AI success is rooted in ideas and code others have shared before, as we stand on the shoulders of open source giants. While we recognize how hard it is to strike a perfect balance between competition and openness, we release code, ideas, and even datasets whenever possible.

When it comes to living and breathing these values, it turns out that most items in the “to-do” list unsurprisingly revolve around people: how you pick them and how you treat them. Since growing CL means working with people we don’t pay, it is imperative that the work is interesting, easy and fun: a breath of fresh air from corporate life, if you will. Our strategy is based on three main rules:

  1. Only work with fun people: being fun and easy to work with is the main “hiring factor”.
  2. Only work with great people: being great at something is valuable. The general SFI rule-of-thumb applies: in every project, there should be at least one person who knows the domain well; as for the rest, contamination is encouraged (we worked with psychologists, computer scientists, physicists, logicians, ethicists).
  3. Empower people to do their job well and easily: onboarding time should be almost zero. You should streamline NDAs and invest early on in tooling: applied R&D is mostly about speed of exploration, so tooling is crucial. We have an entire series on tooling, so we won’t comment further here. On the data side, note that e-commerce is a particularly great use case, as all data is anonymous at the source.

Of course, we often failed to work with person X or institution Y, as incentives may not be properly aligned for a project to take off. And that’s ok! Working with academia has been especially helpful, and it is a fantastic way for smaller companies to do R&D: academic researchers are competent and curiosity-driven; they won’t mind collaborating for free if the project is interesting, and they provide a depth of knowledge that even the best industry researcher may not have.

Finally, success is partly shaped by your competitive advantages: our success in recruiting didn’t come from brand recognition or grant money, but from culture and productivity; these traits are attractive to postdocs more than tenured professors, especially ones at institutions used to Big Tech modus operandi. For similar reasons, working with big groups in large corporations is challenging: they are very busy, and may not be as open as they need to be for this model to work.

In the same vein, success didn’t really come from new ML architectures, but from a deep understanding of the domain and the unique datasets we have; for example, our roadmap in session-based personalization started from realizing how few shoppers are indeed recurrent users. By picking appropriate venues, we maximize both the chance of acceptance and the quality of the feedback we receive: comments from workshop reviewers have often been more helpful than those from more general venues (surely, the sad state of peer review in ML deserves its own post).

When a paper is published, it counts as a win for our friends in academia, and a chance to disseminate our work in the community.

See you, space cowboys

We had a lot of fun running Coveo Labs: we worked with old friends, met new ones, and did a few cool and impactful things along the way.

We believe that building bridges with academia and other practitioners is not just effective, but the only way to really do R&D at “Reasonable Scale”: getting the culture right is tricky, but the product reward can be immense.

We added some FAQs at the end of this already long post, based on common questions we received: hopefully, this post will not just be a closure operation for us, but an encouragement for more scale-ups to start successful R&D practices (why leave all the fun to Big Tech?).

What one fool can do, another can: I’m your fool.


What concrete examples of “achievements” can I use to pitch CL to my company?

While the value of these initiatives varies depending on your business and culture (also: the concept of “achievement” is very relative), you may use some of the following examples to ask yourself if R&D is worth it. The answer for us has been “yes!”, as in three years CL produced:

  • 20+ peer-reviewed papers, between conferences, journals, workshops (including NAACL 2021 Best Paper);
  • 20+ blog posts written together with product marketers to evangelize the field and our customers;
  • 50+ talks between conferences, industry talks and meetups, including a KDD 2022 invited talk and an NVIDIA Summit Keynote;
  • >1200 GitHub stars across our open source projects, 2 data challenges (SIGIR 2021, CIKM 2022) and 3 datasets, including the most complete session-based dataset of all time.

What’s the ideal company size to start an applied R&D practice?

Explicit, purposeful R&D tends to be rewarding when a company has (some) market fit already and therefore (some) ideas on what future success would look like; in the case of CL, it can be “giving every shopper the same experience they get on Amazon”. For this reason, companies after series B are usually the best equipped to start, thanks to their combination of capital, talent, data and ROI.

How do you lead projects with multiple collaborators involved?

Our MO is the same as in our startup days: one research brief as a Google Doc, one kick-off call with all the people involved, and then one Slack channel (one GitHub repo, one Overleaf, etc.) where everybody shares ideas. While the project channel keeps track of important milestones, and hosts strategic discussions, we encourage one-to-one parallel threads to figure out directly how to get unstuck. We do expect interns to ping a director if necessary, and allow distributed decision-making when possible.

How can you best align incentives for collaborators?

Running a project at CL is possible only if the situation is a clear win-win for everybody: how do we make up for the huge opportunity costs of talented researchers donating their time and skills to our projects? In our experience, it is important to be able to break down a general roadmap (say, session-based personalization) into “bite-size” chunks of work: e.g. how to build great product vectors first, then how to inject them into type-ahead. By roughly mapping features to projects, it is easier to spread the work across multiple collaborations and make sure everybody stays engaged. On the other hand, hard, time- and GPU-consuming work is harder to pursue: applied R&D cannot be “unbounded” as resources are constrained, so some “big questions” cannot be properly answered in this framework.

Is there also room for theoretical work?

Sometimes, collaborations are born not out of a product question, but out of stimulating discussions in our network: Language in a (Search) Box and On the Plurality of Graphs came from Federico and Nicole. Considering our resources and needs, theoretical work is kept to a minimum, and it’s typically pursued in an “opportunistic” fashion. In other words, we rarely wrote a paper that did not answer a product-driven question.


As should be abundantly clear by now, research is a team sport. Everybody mentioned played a crucial role in our story: today we emphasized the process, but in the last three years we cherished the people.

First and foremost, thanks to the Tooso boys, Ciro, Luca, Andrea, Mattia: without that clumsy but special AI company, none of the above would have been possible.

Asking forgiveness in advance for omissions, immense thanks also to Giovanni, Diogo, Giuseppe P., Lucas, Giuseppe A., Nicole, Ville, Reuben, Ana Rita, Valay, Gabriel, Tobias, Jean-Francis, Borja, Piero, Marie, Chloe, Brian.

Special thanks to Federico Bianchi for our long-lasting collaboration: he taught us way more than we could teach him, but he is gracious enough not to keep score. Finally, special thanks to our affectionate, fantastic, joyful minions, Christine and Patrick John: the future is all yours, and I’ll always be your biggest fan.


