A biotech’s product is data. Design your IT accordingly.

Jake Feala
Dec 27, 2017 · 15 min read

You might think you’re making new medicines, but if you’re a biotech startup, your primary deliverable over the next 5–10 years will be data.

At every critical stage — from the weekly research meeting, to the CEO presenting to her board of directors, to the first IND submitted to the FDA, to the IPO filing with the SEC — the outcome depends on successful delivery of a data package. So why is the choice of software tools for managing and communicating that data — such an essential part of your company’s operations — often left as an afterthought relegated to the IT department?

IT directors choose software based not only on user requirements, but also cost, security, compatibility with existing systems, ease of administration, vendor relationships, and familiarity. However, those in IT often don’t have a deep understanding of the user requirements of scientists, so the other aspects receive a disproportionate amount of attention.

Instead, software decisions should be made early, should involve all senior leadership as well as a representative set of users, and should be made using sound guiding principles.

In this short guide, I’ll be making a few recommendations for enterprise software for your small but rapidly growing biotech. More importantly, however, I want to try to frame the problem to help you make these decisions on your own.

A guiding principle: choose the best tool for each job

A principle is essentially a generalized mental model, distinct from the unique, case-by-case requirements. While user requirements define “what” you need, principles help explain “why”.

Most decision principles will be specific to your company, but I’d like to propose one that is often overlooked:

Use the best tool for the job.

This might also go hand-in-hand with the principle “use tools that do one thing well,” but not always. Products tend to try to absorb more aspects of your workflow (i.e., add features) over time, and some might actually succeed at being the best at more than one job. It’s also often cheaper to simply leverage the Microsoft Office suite of tools to do as much as possible. But there’s an inverse correlation between the number of distinct jobs a tool can do and how good it is at those jobs.

Trade general-purpose for high-quality if you must, but remember that you can’t get both. I much prefer using a larger number of streamlined tools that do one thing very well, rather than only a few general-purpose, bulky software suites.

Now think about your own decision principles. For example, where does your company stand regarding security, transparency and openness, remote access, scalability, price, and flexibility? Are there any other aspects where you have a strong, well-considered opinion?

Reframe requirements as “jobs to be done”

Choosing the best tool for the job requires that we first define those jobs. A good approach is to scan your current data-related workflows for the various “jobs to be done” by software.

Here are some common categories of data and communication that I’ve encountered in the day-to-day world of biotech companies, each of which can be thought of as a “job” to be handled by well-designed, specialized software.

  1. Research knowledge base
  2. Interpersonal communications
  3. Project management
  4. Journal articles
  5. Metadata
  6. Lab protocols and process details
  7. Raw data
  8. External experts

No doubt these needs can be addressed by email, PowerPoint, Word, and Excel, and most companies do exactly that. But these are the companies that feel overwhelmed by their data, and their missions suffer for it. I’ll go through each of these categories, describe the problem and some example use cases, and, where possible, recommend a software tool that I’ve seen some success with in the past.

1. Research knowledge base

Problem:

In academia, the primary medium of communication is the scientific journal article (also called “publication,” “paper,” or, collectively, the “literature”). From the breadth of the literature survey provided in the introduction, to the depth of detailed methods and supplementary data, journal articles are the gold standard for storing and transferring scientific information. However, the fast pace, constant deadlines, and tight-knit teams in the world of biotech startups have caused us to ditch the formality and rigor of the peer-reviewed article that we relied on in academia.

This isn’t all bad, as it would of course be ridiculous to write an entire article each week on your latest research results. The problem is with what has replaced articles. Unfortunately, the primary mode of communication within biotech is by email and PowerPoint.

I don’t need to bemoan the laziness and imprecision of a PowerPoint culture, because it has already been done for me a hundred times. The main takeaways are that

  1. Slides were designed for the purpose of providing talking points for an in-person presentation. This verbal context is lost when slides are stored and shared electronically.
  2. The constant use of bullet points required by the slide format leaves little room for nuance. In the words of Amazon CEO Jeff Bezos: “Powerpoint-style presentations somehow give permission to gloss over ideas, flatten out any sense of relative importance, and ignore the interconnectedness of ideas.”
  3. Individual slides get copied from deck to deck, making it hard to find the original information, track modifications, and search for figures or information.
  4. PowerPoint briefings were implicated in the communication failures behind the loss of the Columbia shuttle.

We use PowerPoint for one reason: it’s easy. In fact, there’s nothing wrong with slapping together figures and highlights in a slide deck to chat about recent results in an informal weekly update meeting. The problem arises when your presentation slide deck is saved to disk, emailed around, and, by default, becomes the “official record” of the data contained within.

Solution: Wikis

You’re not alone: almost every biotech communicates its internal research results via PowerPoint slides. But I propose a better solution: web-based wikis.

The wiki has several properties that are favorable for scientific communication, and that in some respects are even superior to the traditional publication model:

  • Live document, can grow with time
  • Web-based, making it easy to link out directly (e.g., to websites, lab notebooks, LIMS, internal databases, or other research pages)
  • Can be linked to, even to specific sections or figures
  • Enables collaboration
  • Enables and encourages comments
  • Tracks revisions
  • Can be organized by hierarchy and tagged with one or more descriptors
  • Can absorb legacy PowerPoint decks

The advantage of using web links to connect scientific documents cannot be overstated. In a scientific environment, the hyperlink is like an electronic surrogate for our synapses. Knowledge is essentially a web of connections of information, and the more of these connections that can be made available across the company, the stronger its shared knowledge. A wiki can serve as a repository of these connections, matching electronic lab notebook entries to visualizations as well as raw datasets in a database. In fact, empowering and requiring all of your enterprise software to create and accept hyperlinks allows it to integrate into an ecosystem where humans and machines freely exchange knowledge, centered around the wiki knowledge base.

I’ve seen this wiki/web/hyperlink culture in action at biotech companies where I previously worked, and when implemented right (i.e., by fiat from the executive team, and with ongoing attention and emphasis), it can allow information to flow freely throughout the company.

Summary

Primary use case: Centralized knowledge base

Examples:

  • High-level reporting and communication of scientific research results
  • Shared operational knowledge
  • Onboarding guides

Best product: Confluence

2. Interpersonal communications

Problem

Email is great for many types of communication, but not all. While excellent for external communications and long-form private letters (incidentally, the same type of communications that snail mail was good at), email suffers many drawbacks for the high-volume, internal communications required to run a company.

I don’t plan to cover all of the shortcomings of email, but for me, there are two that are the most sinister:

First, it requires too many choices, producing friction and unnecessary cognitive load. When you write a message, you have to actively choose the recipients — do you send to the group, the whole company, or just a few people? Who should it be “to:”, and who should just be “cc:ed”? What should the subject line be? Do I bother to sign it, or can it be informal enough to dash off without my name?

Second, and more pernicious, is that it is closed-access by default, and by convention. Nobody wants to cc the company-wide email address, blasting everyone with spam, but if you don’t, then anyone not on the recipient list has no access to that information, ever, unless it is actively forwarded their way. New employees have no access to a legacy of information locked up in emails between the existing team members.

A bonus, third gripe with email is that, in long email chains, it can be incredibly confusing just to figure out who wrote what, and in what order. We shouldn’t be dealing with this problem in 2018.

Solution: instant messaging

Instant messaging is far superior for internal, operational, shared, conversational communications (everything that in-person chats are good at, and more). Modern group messaging platforms are based around topics, where both the subject and the recipients are specified as a “channel”, so you don’t have to put much thought into either when composing a new message. No signatures are necessary, and scrolling back through old messages is much more intuitive and in chronological order.

One huge benefit is the transparency offered by communication channels open to all by default, which allows managers and employees in other groups to drop in on day-to-day conversations.

By now we’re all used to having instant messaging in our personal lives, and tools like Slack, HipChat, Glip, and Teams can provide this for work. Slack is the most popular, and for good reason, but the others are gaining ground.

Though chat can be abused, the productivity gains are massive when used well. It is now unforgivable not to provide your employees with some flavor of modern group chat platform.

  • Primary use case: fleeting communications, operational discussions, and shared conversations
  • Examples: “the plate reader is broken”, “are you free at 2?”
  • Best product: Slack

3. Project management

Problem

In biotech, the team’s work plan is often something shared periodically by the boss on a PowerPoint slide in a weekly or monthly update meeting. Unfortunately, this plan is quickly set aside as soon as the team dives down rabbit holes due to unforeseen problems. Nobody knows much about the status and details of the other team members’ work, and the manager and other stakeholders remain in the dark until the next update. Someone else may already know how to solve whatever problem is holding up a team member, but without real-time knowledge of the task status, these problems don’t surface until much time has been wasted.

Solution: Agile and the shared task board

In contrast, the tech industry has widely adopted web-based task management platforms, which are used daily and by all members of the team. These companies need a sophisticated tool for task management, because industrial software development has almost universally adopted Agile methodology. Agile has many variants, but is essentially a style of project management that breaks all work into small tasks, assigns a status (e.g., Backlog, Doing, Done) and responsible party, and makes the entire set of tasks visible to the entire team.

While I believe full adoption of Agile principles can provide massive benefits to biotech companies (a topic I’ll cover in a future post), merely having a shared task board can provide a huge portion of that productivity gain. Being able to quickly glance at a web app to see what everyone is working on, and the status of each task, eliminates the need for most status update meetings (which are generally reviled by your employees).

  • Primary use case: task definition, assignment, due dates, and status
  • Examples: Yesterday I worked on X. Today I’m working on Y. Nobody is working on Z yet, but it’s 3rd in the queue.
  • Best product: Trello

4. Journal articles

Problem

The current state of knowledge from the literature is a crucial driver of a biotech company’s research directions. Unfortunately, key journal articles are emailed around, and, at best, stored as PDFs in a shared drive. This mostly works until the team grows and the shared library multiplies; eventually it becomes difficult to bring new employees up to speed or even keep current employees on the same page. Further, searching becomes difficult, annotations (e.g., highlights) do not always survive, and discussions about the article are kept in an email chain completely removed from the article contents.

Solution: Reference management tools

Journal articles are unique in that they have a shared format. The standard metadata (e.g., author, title, abstract) and distinct patterns of user interaction (read, tag, keyword search, annotate, cross-reference, share, discuss) lend them well to a specialized software tool.

Fortunately, several enterprise-grade software tools exist in this space, and Papers and Mendeley are the leaders. While Papers was originally a Mac app and has a beautiful, user-friendly interface, Mendeley may be the best cross-platform, web-based solution.

Primary use case: storing, searching, and sharing published literature

Examples: “check out this paper”, “where’s that article about gene X?”

Best products: Papers, Mendeley

5. Metadata

Problem

Many early stage biotechs track their cell lines, reagents, samples, and other critical information using a collection of Excel spreadsheets scattered across the company.

This creates a variety of problems. Information sharing becomes difficult because people have to hunt for the person or shared folder that holds the data they need. Versions are tracked in filenames, which hides what actually changed between versions and varies with whoever saved the file. Different names arise for the same entity (e.g., “K562” and “K-562”), making it difficult to harmonize and merge data from different tables or spreadsheets.

These problems are exactly what databases were designed to solve, but there is an activation energy, if not outright fear, associated with implementing a heavyweight database solution without a large in-house IT and software team.
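To make the naming problem concrete, here is a minimal Python sketch of how spelling variants such as “K562” and “K-562” might be flagged before merging spreadsheets. The function names and the sample names are made up for illustration, and the normalization rule (drop punctuation and whitespace, ignore case) is only an assumption about which variants should count as the same entity.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Collapse spelling variants: keep only letters and digits, then uppercase."""
    return re.sub(r"[^A-Za-z0-9]", "", name).upper()


def group_variants(names):
    """Group raw names by normalized key so variants can be reviewed together."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_name(name)].add(name)
    return {key: sorted(spellings) for key, spellings in groups.items()}


# Hypothetical names collected from several spreadsheets.
raw_names = ["K562", "K-562", "k562 ", "HEK293", "HEK-293", "HeLa"]
variants = group_variants(raw_names)

# Report only the keys that have more than one spelling in circulation.
for key, spellings in variants.items():
    if len(spellings) > 1:
        print(key, "<-", spellings)
```

A rule this aggressive can also merge names that really are different, so it is better used to produce a review list for a scientist than to rewrite data automatically.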

Solution: LIMS eventually, Airtable in the meantime

A biotech company’s relationship with LIMS comes in three phases:

  1. What’s a LIMS?
  2. We desperately need a LIMS.
  3. We hate our LIMS.

For very young biotechs in phase 1, a LIMS is a Laboratory Information Management System. Every life sciences company above a certain size has one, and its purpose is to track all of the lab inventory and operations.

For small companies in phase 2, I have a secret. Yes, you will eventually need a LIMS, but there are more lightweight database solutions that can get you 80% of the way there, and very quickly. The process of evaluating LIMS alternatives, getting buy-in from all of the stakeholders, negotiating contracts, and migrating all of your data will take many months. Contact a few LIMS vendors to start the process now, but in the meantime there are some steps you can take to get your data in order.

I recommend Airtable as a “gateway drug” to a database or LIMS. The interface (either web-based or native app) feels like a pared-down spreadsheet, but it has enough features of a database to solve the problems listed above. Airtable provides:

  • Tables that live in one place and are accessible by everyone, without having to “check out” or lock files for editing.
  • Data type restrictions (number fields can only contain numbers, for example)
  • Drop-down menus for columns that have a small, predefined set of values.
  • Links between concepts. Links are native to the interface and encouraged, so connecting tables is easy (no VLOOKUPs required).
  • An API. For the technically inclined, programmatic access is available via a full-featured, modern REST API.
  • Tracked changes, full snapshots, formulas, and lots of other nice features.
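For the technically inclined, programmatic access might look something like the Python sketch below. It assumes the commonly documented form of the Airtable REST API (a GET to https://api.airtable.com/v0/<base ID>/<table name> with a bearer token, returning a JSON object with a “records” list), so check the current API documentation before relying on it. The base ID, table name, token, and helper function names are all placeholders, and the request is built but never sent.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.airtable.com/v0"  # assumed API root; verify against the docs


def build_list_request(base_id: str, table_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the records in one table."""
    url = f"{API_ROOT}/{base_id}/{urllib.parse.quote(table_name, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def rows_from_payload(payload: dict) -> list:
    """Flatten the assumed response shape into plain dicts, one per record."""
    return [{"id": rec["id"], **rec.get("fields", {})} for rec in payload.get("records", [])]


# Placeholder identifiers: substitute your own base ID, table name, and token.
request = build_list_request("appXXXXXXXXXXXXXX", "Cell lines", "YOUR_TOKEN")
print(request.full_url)

# To fetch for real, pass `request` to urllib.request.urlopen and json.load the response.
# Here a hand-written payload stands in for the server's reply.
sample_payload = json.loads(
    '{"records": [{"id": "rec1", "fields": {"Name": "K562", "Organism": "Human"}}]}'
)
rows = rows_from_payload(sample_payload)
print(rows)
```

Keeping the request building and the response parsing in separate functions makes it easy to test the parsing against saved payloads without touching the network.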

Most importantly, moving your data from Excel to Airtable will force you to model your data. This essentially means deciding what you want to track, and how it can be represented solely as tables, columns, and rows. If you don’t have a database expert on-hand to help you, use the following rules of thumb from Hadley Wickham’s Tidy Data paper (and please, please, please read this paper. It is extremely accessible, and will change the way you think about data):

  1. Create a separate table for each entity (e.g., experiments, samples, cell lines, reagents).
  2. Create a column for each entity attribute (e.g., ID, name, description, organism). Each linked entity should have its own column as well. For example, the Sample table might have an Experiment field with Experiment IDs (or links if using Airtable).
  3. Create a new row for each instance or observation of the entity.

These rules seem simple and obvious, but there are many subtle ways they can be broken. Wickham provides several nice illustrations of this in the paper.
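As a rough illustration of the three rules, the Python sketch below models cell lines, experiments, and samples as separate tables using the standard library’s sqlite3 module. The table names, columns, and rows are invented for the example; the same shape would carry over to Airtable or a LIMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces links when asked

# Rule 1: one table per entity. Rule 2: one column per attribute or linked entity.
conn.executescript("""
    CREATE TABLE cell_lines (id TEXT PRIMARY KEY, name TEXT NOT NULL, organism TEXT);
    CREATE TABLE experiments (id TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE samples (
        id TEXT PRIMARY KEY,
        cell_line_id TEXT NOT NULL REFERENCES cell_lines(id),
        experiment_id TEXT NOT NULL REFERENCES experiments(id),
        treatment TEXT
    );
""")

# Rule 3: one row per instance of the entity (all values here are made up).
conn.executemany("INSERT INTO cell_lines VALUES (?, ?, ?)",
                 [("CL1", "K562", "human"), ("CL2", "HEK293", "human")])
conn.executemany("INSERT INTO experiments VALUES (?, ?)",
                 [("EXP1", "Dose response"), ("EXP2", "Time course")])
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)",
                 [("S1", "CL1", "EXP1", "drug Y"),
                  ("S2", "CL1", "EXP2", "vehicle"),
                  ("S3", "CL2", "EXP1", "drug Y")])


def experiments_for(cell_line: str, treatment: str) -> list:
    """Answer: which experiments were run on this cell line with this treatment?"""
    rows = conn.execute("""
        SELECT DISTINCT e.id
        FROM samples s
        JOIN cell_lines c ON c.id = s.cell_line_id
        JOIN experiments e ON e.id = s.experiment_id
        WHERE c.name = ? AND s.treatment = ?
        ORDER BY e.id
    """, (cell_line, treatment)).fetchall()
    return [row[0] for row in rows]


print(experiments_for("K562", "drug Y"))
```

Once the entities live in separate, linked tables, a question that would take an afternoon of spreadsheet cross-referencing becomes a single query.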

And finally, if you’re a larger organization in phase 3, sorry! Switching LIMS in a company with tons of data and many stakeholders is a complex process that will probably require expensive consultants and years of work. My only advice is to do it in small chunks: trial and prototype as much as possible. Also, consider an in-house solution if nothing fits your unique bioprocess or product. It will require a major upfront investment, but having an informatics platform tuned to your needs can be invaluable.

Primary use case: Tracking sample and reagent metadata. Manual data entry and updates.

Examples: Which cell lines do we have in house? What experiments were run on cell line X and treatment Y? Where’s my reagent?

Best product: Airtable, then a LIMS (I’m not ready to make a specific recommendation on LIMS yet)

6. Lab protocols and process details

Problem:

The detailed record of lab work is kept in a notebook. Every scientist would agree that lab notebooks are an incredibly important data resource, yet an astonishing number of organizations still use paper notebooks.

Paper documents are inferior to electronic data in so many ways that electronic lab notebooks (ELNs) should be a strict requirement for your scientists. There needs to be a culture shift, though the lack of good ELN software has been partly to blame for the holdouts.

Solution: web-based ELNs

There is hope on the horizon for better ELN software, however, as several tech startups are now targeting this space. Benchling is probably the leader among the new entrants, with a web-based app that provides utilities for molecular biology as well as some lightweight LIMS capabilities.

Primary use case: electronic lab notebook

Examples: What protocol adjustments were made in last month’s experiment? How did other scientists run this protocol?

Best product: Benchling

7. Raw data

Instrument data is distinct from metadata in that, while metadata can be manually updated and allowed to evolve, raw data should be preserved unmodified, in perpetuity. Furthermore, these are so-called “structured” data, in that the data are created in a regular, fixed format that can be readily converted to tables of rows and columns if not already. And, as discussed above, the correct way to store structured data is in a database, not across multiple Excel files.
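One lightweight way to check that a raw file really has been preserved unmodified is to record a checksum when the file is first archived and compare against it later. The Python sketch below is only an illustration of that idea, not a description of any particular product; the file name and contents are invented.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical raw export from an instrument.
    raw_file = Path(tmp) / "plate_001.csv"
    raw_file.write_bytes(b"well,value\nA1,0.42\n")

    # Record the checksum at archive time (e.g., alongside the sample metadata).
    recorded = sha256_of(raw_file)
    unchanged = sha256_of(raw_file) == recorded

    # Any later edit, however small, changes the checksum.
    raw_file.write_bytes(b"well,value\nA1,0.43\n")
    modified_detected = sha256_of(raw_file) != recorded

print(unchanged, modified_detected)
```

Storing the recorded digest next to the file’s metadata lets any downstream analysis confirm it is working from the original data.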

Problem:

Moving raw data from an instrument to a database seems like it should be an easy problem to solve, but laboratory instrument vendors have been woefully inadequate at providing solutions to retrieve structured raw data from their machines. Every vendor seems to have invented their own proprietary (sometimes insane) format for storing the data, and many force you to export in Excel format. Many person-hours have been spent on extracting, parsing, and reformatting instrument data, usually using Excel.
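To give a flavor of the reformatting work involved, here is a small Python sketch that converts a plate-shaped grid into one row per well. The export layout (column numbers across the top, row letters down the side) is an assumption made for the example; real vendor formats vary widely and are often far messier.

```python
import csv
import io

# Hypothetical instrument export: a grid with column numbers in the header
# and a row letter at the start of each line.
EXPORT = """\
,1,2,3
A,0.11,0.12,0.13
B,0.21,0.22,0.23
"""


def plate_to_rows(text: str) -> list:
    """Convert a plate-shaped grid into one record per well."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = header[1:]  # the first header cell is the empty corner
    rows = []
    for line in reader:
        if not line:  # skip blank lines
            continue
        row_label, values = line[0], line[1:]
        for column, value in zip(columns, values):
            rows.append({"well": f"{row_label}{column}", "value": float(value)})
    return rows


tidy = plate_to_rows(EXPORT)
for record in tidy:
    print(record["well"], record["value"])
```

The one-row-per-well shape is what a database table or analysis script expects, which is why so much manual effort goes into producing it.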

Solution: Lab informatics automation

Some commercial solutions are emerging. Many LIMS can now connect directly to lab instruments and ingest the data into their database. One startup to watch, TetraScience, aims to connect all lab instruments to the cloud. This will allow scientists and lab managers to monitor instrument health and utilization, control experiments remotely, and, most important in this context, retrieve and structure the data to make it accessible to scientists and software applications.

A promising step toward standardizing lab instrument data is being made by the Allotrope consortium. If successful, data standards would make extraction, parsing and ingestion of instrument data much easier, accelerating the development of software in this space.

Primary use case: Retrieving data from an instrument for analysis

Examples: What was the result of my latest experiment? Where is last month’s raw data, so I can apply my new normalization method?

Best product: LIMS, TetraScience

Rolling out your software ecosystem

In this guide, I’ve outlined a number of common data and communication needs that I often see in growing biotech companies, along with some potential solutions. However, please do not follow any of these recommendations without first deeply considering your own needs.

One way to reframe the problem for your company is to lay out the “jobs to be done.” The categories I’ve provided above can get you started, but are there other roles for software in your day-to-day work? These choices don’t have to be made immediately, but any time there is software, data, or communications involved, make sure these decisions are made in the context of the larger IT and informatics ecosystem.

Pilot early and often

Only after you’ve established some basic principles, laid out the jobs to be done, and considered the alternatives, should you start piloting your short list of choices. Every good software product now offers a free trial, and although there is often some up-front work to customize the software to your needs, it’s crucial to test every software tool in a real-world scenario.

Software tends to be sticky once it’s used in production, regardless of whether it’s the best tool for the job. Periodically check in with your team to re-evaluate your software choices. Quickly run through your list of jobs to be done to see if the tool for that job is still suitable, or whether any of the alternatives have since improved and become the best choice.


Smart biotech executives know the importance of information technology to their success. Informatics, data science, and IT are part of a continuum and should share a coherent strategy, which itself must be closely aligned with the scientific and commercial strategy of the company as a whole. Think carefully about where your most precious assets — your data — will live and grow, and don’t overlook your software choices as an integral component of your mission.


Jake Feala is a consultant for the biotech industry, as well as founder and CEO of Outlier Bio, a platform that aims to make top-quality bioinformatics expertise available to every scientist, in every biotech, at every stage of research.
