Part 2: The proteomics marketplace
Longstanding problems demand new solutions and unlock new revenue streams. As the technological landscape of proteomics matures and evolves, so do the business models of both incumbents and new entrants.
This is the second part of a larger series exploring the field of proteomics. In the first part, we covered the basics of proteomics: why we need to study it, where the challenges lie and what is expected to happen in the very near future. In the next chapters, we will look at some disruptive companies and their offerings. But before we get there, it is worth understanding the revenue streams and business models in the proteomics space. Proteomics, by its nature, is a multilayered marketplace that supports multiple revenue streams. Not surprisingly, there are multiple ways to slice the market.
To develop an overarching perspective, I have found it useful to look at the marketplace through the lens of the user. The user has collected a sample of interest and is seeking to interrogate it to answer a specific question. We are trying to understand what the user needs, and what they would like to have, in order to get the answer they are seeking from their sample.
Generally, the question will take the form of one (or a combination) of the three possibilities described below.
A. What are the biomarkers of a clinical condition I am interested in?
B. What are the levels of some known biomarkers in my sample?
C. What is the effect on the proteome of a particular perturbation I am testing (environmental stress, biochemical cues, administering a therapeutic, etc.) and how can I create a map of cascading links based on this effect?
As we follow the journey of a sample from being collected to becoming data mined for insight, we will be able to discuss in context how every step represents a revenue stream and how that space is evolving. This will allow us to dive deeper, compare offerings better and, in turn, survey the landscape with enough clarity to see connections and identify whitespaces.
Let’s begin.
1. Sample collection
Standard consumables and tools are required for a user to collect a sample from the source of their choosing. The choice of source is highly dependent on the question being investigated. Since the source can vary (cell culture media or lysate, clinical samples like blood, faeces, urine), the reagents and tools required for collection can vary a lot as well. However, the product offerings that seek to monetise this layer of the workflow are, for the most part, undifferentiated from sample collection for any other form of life sciences research or diagnostics. Specialised tools may be required if the sample needs to be collected invasively or through a clinical procedure (as with cerebrospinal fluid).
There is an increasing push towards automating the process of collection to reduce or eliminate human and human-factors related errors. But those movements are not unique to proteomics; they are sweeping through the sections of clinical practice where the workforce has been depleted over the years, such as clinical research assistants and nurses.
What is being sold: tools and devices that ensure sterile collection and storage of samples.
2. Sample preparation
In Part 1, we discussed at length the importance of sample preparation in the proteomics workflow. Sample preparation is critical because it:
a. Gets rid of contaminants which can reduce the sensitivity of the instrumentation that will be used to measure proteins in the sample. Native samples are NEVER a good fit for any measurement technology because they lead to a lot of “background” or non-specific interactions with the instrumentation.
b. Modulates the biomolecular environment around the samples such that the instrumentation’s strength can be utilised fully. Suspending samples in a buffer that allows maximum specific probing by the analytical technology platform of choice increases both sensitivity and specificity of measurements.
c. Tries to make sure that lower abundance proteins are sampled correctly. This involves sample fractionation, or splitting up the sample into classes whose individual members fall within detectable abundance of each other.
Expectedly, this layer of the proteomics workflow needs highly differentiated offerings. The specific goals of this step have created a huge marketplace of specialty reagents that allow a user to accomplish some or all of the objectives described above. The sheer number of consumables and reagents employed in this step has caused this layer to become totally fragmented.
Additionally, the outsized impact of sample preparation on the measurement has led to the proliferation of bespoke workflows, often combining reagents from a multitude of suppliers. However, bespoke workflows don’t scale well and are frowned upon in the age of transparency in science. Further, more steps and more reagents mean more active “hands-on” time from researchers and users. Companies have found that users will pay a premium for offerings that reduce the active time required for sample preparation. This has given rise to a new breed of products built around the concept of “walk-away” time. The value proposition can sometimes simply be that a closed-loop benchtop device performs most, or even part, of the same workflow but requires no attention. The user can just “walk away” and be more productive in the meanwhile.
What is being sold: reagents, tools and automated platforms for sample preparation
Startup to watch out for: Seer Bio
3. Measurement
This step captures the act of measuring the abundance of proteins in the sample through an analytical instrument of choice. For clarity, we will break this further into two subclasses:
A. Pre-measurement processing:
A good example of this is chromatography columns. Chromatography columns are used to further separate a complex sample by exploiting some physical or chemical difference between the individual proteins. The general idea is to NOT create a scenario where ALL proteins in a sample reach the measurement source at the same time. In the case of mass spectrometry, for example, if ALL proteins do reach the ionisation source at the same time, only the proteins that are “good” at ionising will dominate the readout. As a result, the measurement becomes deeply flawed and deviates significantly from the true composition of the sample. No matter what the measurement technique is, it is ALWAYS a good idea to separate out the individual proteins in the sample before they are interrogated.
The ability to separate classes of proteins from a complex sample depends on thermodynamic considerations: how specific proteins interact with the chromatographic column. Without going into too much technical detail in this piece, imagine the driving force behind separation to be how proteins interact with a set of microstructures (with or without some specific chemistry on them, maybe hollow, maybe not). Even with state-of-the-art manufacturing methods, we don’t have tight control over the dimensions of these microstructures. We can only say that, on average, each microstructure is 60 micrometres in diameter. Note the word average: two populations can share the same 60 micrometre average and still look nothing alike in reality.
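To make the point concrete, here is a minimal sketch (illustrative numbers only, assuming simple Gaussian size distributions) showing how two particle populations can share the same 60 micrometre average while having wildly different spreads:

```python
# Illustrative only: two hypothetical particle-size populations with the same
# 60 um average but very different spreads. The mean alone hides the variation
# that ultimately shows up as column-to-column irreproducibility.
import random

random.seed(0)

tight = [random.gauss(60, 1) for _ in range(10_000)]    # tightly controlled sizes
loose = [random.gauss(60, 15) for _ in range(10_000)]   # poorly controlled sizes

def summarise(name, sizes):
    mean = sum(sizes) / len(sizes)
    std = (sum((s - mean) ** 2 for s in sizes) / len(sizes)) ** 0.5
    print(f"{name}: mean = {mean:.1f} um, std dev = {std:.1f} um, "
          f"range = {min(sizes):.1f}-{max(sizes):.1f} um")

summarise("Precision-engineered", tight)
summarise("Conventional packing", loose)
```

Both populations report the same average, yet one spans a few micrometres and the other spans tens of micrometres.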
Variation on the microstructure level gives rise to irreproducibility on the performance level. That is why the next generation of technologies in this space are utilising precision engineering and advanced manufacturing techniques to better control the performance.
What is being sold: Chromatography columns, reagents, HPLC setups
Startup to watch out for: Pharmafluidics
B. Analytical instrumentation/platforms
This is where the user spends MOST of their money. There are multiple options to quantify the abundance of proteins in a sample. Any technology platform chosen comes with its set of strengths and limitations. Cost of instrumentation itself can become an influencing factor.
Up until now, mass spectrometry has been the technology of choice for discovery, while clinical applications have been handled by immunoassays. Both these technologies have their limitations, and radically new approaches are now on the horizon. Since these will comprise the bulk of the discussion in the future parts of this series, I am keeping this section short.
What is being sold: technology platforms for quantifying abundance of proteins in a sample
Startups to watch out for: Somalogic, Nautilus Bio, Quantum Si, Erisyon
4. Data analysis
The user has now obtained a set of experimental data. The most fundamental problem in analysing it is that, at this stage, there is no direct connection between the intact peptide and the fragments used for identification.
It is like trying to put together a jigsaw puzzle, only harder: sometimes more than one possibility has an equal chance of being the right identity.
The instrument usually comes bundled with software to make sense of the data and link it to prior knowledge, thereby making it possible to identify the proteins in a sample. This is easier said than done, though. Because proteins can carry multiple layers of changes (post-translational modifications, single amino acid variants, etc.), it is a big computational challenge to reliably identify species from the raw data. This creates a whitespace for algorithms that push the accuracy of identification from the acquired data. Traditionally, this has been filled by a mix of open-source and proprietary packages.
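To give a feel for the underlying computational problem, here is a deliberately toy sketch of database-style matching: candidate peptide sequences are scored by how many of their theoretical b/y fragment masses land near the observed peaks. The peptides and the “observed” spectrum below are made up for illustration, and real search engines additionally handle charge states, modifications, noise models and statistical validation:

```python
# Toy peptide-spectrum matching: score candidate peptides by how many of their
# theoretical b/y fragment masses fall within a tolerance of observed peaks.
PROTON, WATER = 1.00728, 18.01056  # Da
RESIDUE = {  # monoisotopic residue masses (Da), subset used in this example
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}

def fragment_masses(peptide):
    """Singly charged b- and y-ion masses for a peptide string."""
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y

def score(peptide, observed_peaks, tol=0.02):
    """Count theoretical fragments that match an observed peak within tol Da."""
    return sum(any(abs(t - p) <= tol for p in observed_peaks)
               for t in fragment_masses(peptide))

# A made-up spectrum: a handful of peaks generated from one of the candidates.
peaks = fragment_masses("GASEF")[:6]
candidates = ["GASEF", "GAVEF"]          # differ by a single residue
print({pep: score(pep, peaks) for pep in candidates})  # the true sequence scores higher
```

Even in this toy example the two candidates share several fragments, which hints at why real identification needs far more sophisticated scoring and statistical control.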
New companies are trying to build software products such that customers don’t have to wait for days or weeks to process their data. To the best of my knowledge, there is gleaming whitespace here:
- Codebases are rarely, if ever, optimised for efficiency and speed of operation.
- Software suites are licensed to the institute and typically disallow simultaneous use by multiple users. As a result, users have to wait for the previous user’s analysis to finish before commencing theirs. This can take weeks, depending upon study size.
- Computation is always performed locally, on-premise, and hence there is no scope for cloud-based acceleration, even though per-sample processing parallelises naturally (see the sketch after this list).
- The algorithms are not optimised for high throughput and are only updated yearly (or even less frequently).
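To illustrate the high-throughput point: the per-sample part of the analysis is naturally parallel, so even simple batching across local cores (or cloud workers) can cut wall-clock time dramatically. A minimal sketch, where process_raw_file is a hypothetical stand-in for the real analysis step and not any vendor’s API:

```python
# Minimal sketch: independent raw files can be processed concurrently, locally
# or on cloud workers. process_raw_file is a hypothetical placeholder, not a
# real vendor API; it sleeps to stand in for minutes or hours of real work.
from concurrent.futures import ProcessPoolExecutor
import time

def process_raw_file(path: str) -> str:
    time.sleep(1)  # placeholder for the actual search/quantification step
    return f"{path}: processed (placeholder result)"

raw_files = [f"sample_{i:03d}.raw" for i in range(8)]

if __name__ == "__main__":
    start = time.time()
    with ProcessPoolExecutor() as pool:      # scales with available cores
        results = list(pool.map(process_raw_file, raw_files))
    for line in results:
        print(line)
    print(f"Processed {len(raw_files)} files in {time.time() - start:.1f} s")
```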
What is being sold: software packages for analysing raw data obtained from the analytical instrumentation of choice.
Startup to watch out for: Infineq
5. Insights
As the field of proteomics explodes in terms of promise, demand and technological maturity, users are looking at increasingly larger sample pools. This is creating a data deluge which is very different from the scenario of the early 2000s.
In a couple of days, a user can generate terabytes of data. The first challenge is to turn this into something meaningful and hopefully, draw some previously unknown insight. The second is to bank these datasets such that researchers everywhere can have access to them and keep building on the acquired knowledge. The proteomics community — and increasingly the scientific community in general — realizes that data transparency improves the level of trust among researchers, even with those in different fields. But there is no standardized format in which data is shared. Over and above standardization of format, there are also rising concerns about ensuring that quality of data clears some mutually acceptable threshold.
The proteomics community has seen the benefits of democratized clinical data of consistent quality being made available for the machine-learning community to run algorithms on. Whether data is easy to share or not depends largely on the format chosen by the instrument manufacturer. Institutes with thought leadership in the domain are now insisting on analytical platforms that include technologies which simplify data sharing. The incumbents have been responding to this demand: Bruker, for instance, built its trapped ion mobility spectrometry time-of-flight mass spectrometry platform around a data format that is available to anyone.
To the best of my knowledge, there is no commercially available tool explicitly catering to this layer. The biggest academic clusters have developed their own data processing, handling and sharing pipelines.
State of the market: fragmented demand, fragmented supply
Given the nature of the workflow, it is no wonder that demand in the proteomics marketplace is extremely fragmented. The same is true for the supply side. In such marketplaces, where both the demand and supply sides are fragmented, there is always additional value available to be captured through consolidation. Specifically in these markets, consolidation can come through three routes:
A. Disruptive offerings that are well-differentiated from incumbents
A fragmented supply side is indicative of major disruption being long overdue. The absence of a clear market leader in any layer means every product offering is undifferentiated in functional capabilities from its competitors. This leads to a loss of pricing power for the incumbents, which in turn leaves fewer resources available for R&D.
This is prime territory for startups. With offerings that are definitively better than ALL incumbents, startups can start to capture large swathes of the market even at premium pricing. Illumina’s rise to near monopoly in the genomics space is an example of consolidation through this route. Over the years, high margins allowed Illumina to build out technological moats which reinforced its position as the market leader.
B. Commoditisation of complements
Complements are products that are often purchased and used together: cars and gas, razors and blades, computers and software packages.
Let’s say the core offering of a company is the mass spectrometer. What are the complements of this offering? In other words, what will the user definitely need/buy when they use a mass spectrometer? HPLC setups, chromatography columns, reagents for sample preparation. These are now complements of the mass spectrometer. You can choose to go as far up or down the workflow as you want. Typically though, complements that are commoditised, at least as defined historically, are from adjacent layers. As an example, consider Thermo Fisher. For the proteomics market, their most expensive offerings are the mass spectrometer systems. But there is a lot of value to be captured in the complements. So Thermo Fisher also offers HPLC, columns and even reagents through Fisher Scientific.
C. Mergers and acquisitions:
The largest players in this space are public companies under pressure to show growth for a continued increase in stock price. If they can’t achieve the growth they aspire to by capturing value in the market through product offerings, they will want to boost revenues through mergers and acquisitions. This is the primary force behind M&A activity. For instance, the two biggest players in this space, Thermo Fisher and Danaher, have growth rates of 4.4% and 4.1% respectively (2019). Both have revenues in excess of $20B (2019). If they wanted to add 1% to total revenue, they would need to find a company with $200M in revenue. There simply aren’t that many life sciences “tools” companies with that kind of revenue. Usually, the sweet spot of revenue for an acquisition target is $30–50M while the acquiring entity has $2–3B. An example of this is Danaher acquiring Labcyte when it was clocking a revenue of $60M. The recent acquisition of GE Healthcare’s biopharma business by Danaher, which adds $3B to Danaher’s revenue, is rather large for this space. However, there are expectations that this is indicative of a trend: the biggest players in fragmented markets are willing to do unusual deals to move towards consolidation. This trend is also visible in the revenue multiples that companies are willing to pay for acquiring assets, moving towards 7X (as seen for Danaher-GE and Thermo Fisher-Brammer Bio) from the usual 5X.
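As a quick back-of-the-envelope check of the numbers above (using the approximate 2019 figures quoted in this section, not exact financials):

```python
# Back-of-the-envelope: how much target revenue is needed to add 1% growth,
# and what that target might cost at typical revenue multiples. Figures are
# the rough 2019 values quoted above, not exact financials.
acquirer_revenue = 20e9   # ~$20B (Thermo Fisher / Danaher scale)
growth_to_add = 0.01      # add 1% to total revenue

target_revenue = acquirer_revenue * growth_to_add
print(f"Target revenue needed: ${target_revenue / 1e6:.0f}M")      # $200M

for multiple in (5, 7):   # the usual ~5x versus the recent ~7x deals
    print(f"At {multiple}x revenue, deal size is roughly ${target_revenue * multiple / 1e9:.1f}B")
```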
The incumbent advantage
Traditionally, customers have been able to buy reagents, tools, instrumentation and occasionally, software packages to solve their problems in this field. That has been the dominant revenue stream. The size of the customer can vary, from the academic researcher to the clinical diagnostics company itself. What does not vary however, is what is being sold and how.
Technology development in this space is both difficult and extremely expensive. It is tough to find data to support this claim, but it would seem that a lot of promising technology doesn’t mature enough to reach product level. In the event that a new technology does manage to reach product level maturity, it struggles to replace incumbents if the benefits are not striking enough. This is due to the fact that incumbents enjoy “lock-in” privileges.
- As a user, if you have spent a lot of money buying equipment (and building a workflow around it) that performs at the state of the art, you are much less likely to switch to something else that is only marginally better.
- From the users’ perspective, there is a lot of scrutiny to overcome. If your peers are reporting data acquired on the incumbent platform of the space and you choose a different platform, you are bound to face additional scepticism in a field which already struggles with reproducibility. This is equally true when trying to build diagnostic assays that will ultimately need to be regulated. A trusted platform that is widely used and has historical precedent in literature will see lower regulatory hurdles when used in clinical applications.
Hence the target customer often wonders, “do I really need it?”
Another tall hurdle that new technologies struggle with is cost. R&D in this space is expensive, and companies typically require several years to translate a concept into a product, often pulling in little or no revenue during this period. Once a lot of expensive R&D has been incurred and there is a product to be offered, new companies would ideally want to amortise the cost over several years’ worth of sales cycles. This can sometimes push the price of an offering into unviable territory. In turn, it creates a dilemma that emerging players in this space have always had to juggle:
- Need to get to the sweet spot of revenue to be a prime acquisition target,
- but can’t take on unsustainable amounts of capital to get there.
New business models
As entrepreneurs hone their disruptive products in the life sciences space, market realities force them to think of new business models to capture the most value they can in a fragmented marketplace. Broadly, there are three trends through which this is happening.
Fee-for-service: bundling
Even when the value proposition of a new approach for quantifying the proteome is very clear to the customer, it can be prohibitively expensive for a single user to procure a system. Companies are trying to solve this problem by vertically integrating across the layers from sample preparation to insights. Essentially, the customer collects the sample and ships it to the company, which performs the service for a fee. The company prepares the sample, measures it, analyses the data and sends the findings over to the customer.
The customer doesn’t need to buy an instrument, pay for its consumables or maintenance, or even hire the manpower to operate it. On the other hand, the company can enjoy economies of scale because it can procure or manufacture reagents at lower cost, run its instrumentation at maximum possible utilisation and continue to grow its eventual customer base. However, fee-for-service models have generally proven to be lower-margin businesses with hard scalability limits. It makes sense for startups to offer this model to grow their customer base while they bring down the cost of the core offering.
Fee-for-value: the data advantage
When selling to research users in academia or corporate labs, fee-for-service can quickly hit pricing limits. However, when selling to customers procuring services for clinical proteomics, or for drug discovery in biopharma, prices can be much higher and potentially lucrative. Additionally, in population-level studies such as the two use cases just described, datasets are orders of magnitude larger than those generated by research users. Hence, there is additional value to be captured if companies can mine the data for insights, transitioning from a fee-for-service model to a fee-for-value model. Going forward, you can expect to see more companies book revenue through this channel while using the margins to develop a consumer-facing offering.
Complement the incumbent
For lack of better terminology, we will use this phrase to describe a model in which a company studies the weaknesses of the incumbent technology a customer is already locked into, identifies the most critical bottleneck it can relieve for that customer and offers a product that attacks those pain points specifically. These companies build technology that fits into existing workflows and covers up the limitations of the incumbents. They don’t aim to replace the incumbent; they build to complement it.
We will discuss this model with Seer Bio as an example. Seer has identified that its biggest opportunity lies with customers who have already invested roughly $1M in a mass spectrometer and are still unable to probe the proteome at the depth they would like using the usual workflows. Hence, Seer built its offering around the accepted workhorse rather than trying to replace it altogether. Seer’s offering is an automated platform that performs proprietary sample preparation (deep dive in later chapters) which in turn enables the mass spectrometer to “see” 10X better. Additionally, they have built software to analyse the data and calibrate the mass spectrometer to its newly acquired “visual prowess,” finally returning actual protein abundance values to the user.
Concluding thoughts
The next wave of companies building product offerings that seek to address the longstanding problems in the proteomics domain will face a new set of challenges that, perhaps, the incumbents didn’t. On the other hand, the cost of iteration in building deep tech products has also been dropping due to the convergence of fantastic progress in biotechnology, precision engineering, advanced manufacturing and computation-driven design. It remains to be seen what effect this new paradigm has on:
- the proportion of early technologies that mature into products, and
- the cost of bringing new products to market.
This article is by no means exhaustive and only based on personal study. If there are interesting developments that you are aware of and I have not covered, please feel free to reach out. I am always looking to learn more. If you would like to collaborate, again, please write to me.