What is the cost of bioinformatics? A look at bioinformatics pricing and costs

Published in truwl, Feb 2, 2022

Chances are, you might not have a clear picture of what your genomics analyses cost; particularly on the individual job or sample level. It’s especially hard to compare across running on the cloud vs. local compute infrastructure and working on the command line vs. using a commercial platform. My colleague, Jeremy Leipzig, asked a question about costs of running analyses on the cloud nearly a decade ago, and it’s still not clear. I started looking at what bioinformatics costs as an exercise on how to price analyses on our platform, truwl.com. As expected, it’s hard to get a clear picture because of the lack of transparency, non-straightforward and non-standardized ways that bioinformatics is priced, and what an ‘analysis’ includes.

I’ll go over some of the considerations and provide some pricing mechanisms and example costs that I have found or been told about. Some of the numbers are from open and public sources, some can be obtained from platforms after making free accounts, and some were shared with me by users of different systems or came from prices quoted by sales reps. As such, some of these numbers should be seen for what they are: educated guesses based on one or a few data points that I’ve heard or been shown, which may not be representative of all cases. My focus here is on secondary bioinformatics analyses and not so much on exploring the data after it’s gone through processing workflows. Things to take into account are the time and effort from trainees and staff, computing costs, storage costs, and egress costs. Thank you to everyone who provided pricing information, either through conversations or by posting online. I appreciate it.

Time and effort

Doing analyses takes people. Even for high throughput projects where there is a highly automated system, things go wrong, there is maintenance and updating, there are always special cases, and results need to be validated. Plus there’s the time and effort up-front to set up a system in the first place. For lower throughput experiments, time and effort are easily the most expensive part of an experiment; a researcher can spend months or longer getting a workflow to work and they might only have 10 samples. Add in the time to compare and evaluate different methods and to learn how to use them properly and the amount of time balloons quickly. I’ve seen very long time scales multiple times, especially for individual researchers in academic environments. Although time and effort is often the biggest cost, it is also the hardest to quantify because the amount of time spent on specific tasks isn’t tracked well, the difficulty of analyses varies by experiment, and the value of people’s time is different in different settings. In academia, spending large amounts of time doing an analysis can be completely acceptable because the analysis is being carried out by a trainee whose time is not expensive, speed is not imperative to the success of the project, and the trainee gets valuable experience by going through the process. In industry environments, the amount of time spent setting up and implementing analyses can be a much more significant pain point because trained staff are more expensive and the analysis is part of a larger initiative that has a deadline. In some cases, results are also of clinical importance and need to get back to the primary care provider in a reasonable amount of time to be of clinical utility. Although it is very situation-specific, I was able to find a few cases that quantify the time and cost of bioinformatics.

Time and effort is often the biggest cost

The complete costs of genome sequencing: a microcosting study in cancer and rare diseases from a single center in the United Kingdom [1] is a great breakdown of the costs of doing genome sequencing in the clinic. According to this study, which looked at genomics costs for paired tumor and germline samples for cancer cases and trios for rare disease cases, total bioinformatics costs were 11.85% and 7.3% of the total test costs, respectively. Of the total bioinformatics costs, over half was for staff time. The total cost of a cancer case was $9,326 with a bioinformatics cost of $972, with $586 of that cost coming from staff time (60% of the bioinformatics cost). The total cost for a trio was $10,145 with a bioinformatics cost of $618, with $451 of that cost coming from staff time (73% of the bioinformatics cost). These costs were determined for a medium-sized lab (399 samples per year) with a complete genomics process already in place.

The UC Davis Bioinformatics core has a table of time estimates for three project types ranging from 15–40 hours. [2] Presumably, the bioinformatics core has already developed a series of tools to help with the tasks that are common for them such as these.

UC Davis Bioinformatics Core time estimates.

What about starting an analysis from scratch? In Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome [3], the authors determined how much time it took to reimplement an analysis published in the literature. The effort was coordinated between the original author of the analysis and the lab reproducing it. They estimated that it would take 280 hours (seven 40-hr work weeks) for a novice bioinformatician (a researcher with basic bioinformatics expertise) to reproduce the analysis steps. Although this study was published in 2013 and it’s not a genomics experiment, I don’t think published analyses have become much more reproducible since then, so this remains a relevant study.

Clearly, when considering bioinformatics cost and pricing, time and effort from staff and trainees is a serious consideration.

Compute

Compute covers the cost of actually running the analysis on CPUs or other hardware. These costs need to be looked at differently depending on whether analyses run on the cloud, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Azure, or on local hardware, including HPC.

The benefits of the cloud have been repeatedly extolled, and it is the right choice for many applications. With the cloud there is no upfront cost to building infrastructure, there is little chance of downtime for maintenance and equipment failures, there are no hard limit quotas (although there are some default quotas on projects, these can be easily raised once the cloud provider has some assurance that you’re not going to run up a huge bill with the inability to pay), the amount of computing power can scale nearly limitlessly so analyses for many samples can run in parallel to complete in a fraction of the time, there is no cost when you’re not using it (assuming you remember to turn off your VM), and access can be given to collaborators across organizations and geographical locations. On the surface, pricing for compute on the cloud is pretty transparent. All three major cloud providers publish their prices on public pages: AWS, GCP, Azure. However, this is not always that informative, as computing is charged by how long you use the machine and what machine size you use. When running bioinformatics tools for the first time, you might not have any idea how long a task will take. If you use a workflow language and executor such as WDL/Cromwell or Nextflow, it becomes even more complicated because different tasks in the workflow can be handled by different machines with different resources and billing rates. Usually, the clearest path forward is to try a job with a subset of the data to get an idea of time and cost (and errors), then scale up from there: next run a full-sized job, then another if there are common steps across jobs where call-caching might reduce the load, then move up to larger batches. Costs can also be decreased by taking advantage of discount programs, which are offered by all the cloud providers. These include spot instances which take advantage of unused compute capacity, long-term compute commitments, and bulk usage discounts. The caveat with spot instances is that they can be interrupted when a user who requests an ‘on demand’ instance needs that capacity, so they are recommended only for fault-tolerant workloads. These programs can offer significant discounts, up to 90% or more.
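To make the "machine size times runtime" idea concrete, here is a minimal sketch of a per-sample estimate for a workflow whose tasks run on different machine types. The task names, runtimes, hourly rates, and the spot discount are all made-up placeholders, not prices from any provider; you would plug in numbers from a small test run and the provider's price page.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours: float                # measured or guessed runtime of the task
    hourly_rate: float          # on-demand price of the machine type, USD/hour
    spot_discount: float = 0.0  # fraction off on-demand price (0.0 to 1.0)


def task_cost(task: Task) -> float:
    """Cost of one task: runtime x hourly rate, reduced by any discount."""
    return task.hours * task.hourly_rate * (1.0 - task.spot_discount)


def workflow_cost(tasks) -> float:
    """Cost of one workflow run is the sum over its tasks."""
    return sum(task_cost(t) for t in tasks)


# Hypothetical two-step workflow; every number here is an assumption.
tasks = [
    Task("align", hours=4.0, hourly_rate=0.80),
    Task("call_variants", hours=2.0, hourly_rate=0.40, spot_discount=0.5),
]

per_sample = workflow_cost(tasks)
print(f"Estimated cost per sample: ${per_sample:.2f}")
print(f"Estimated cost for 100 samples: ${per_sample * 100:.2f}")
```

The estimate is only as good as the runtimes you feed it, which is why running a subset of the data first is worth the effort. It also ignores retries after spot interruptions, which would add to the bill.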

Running compute jobs locally on compute clusters and individual servers can also make financial sense. Organizations that have consistent compute loads over a long period of time can save significant amounts of cash by building out their own infrastructure compared to paying a cloud provider. There are significant upfront equipment and set-up costs in addition to maintenance, but these can be more than offset by cost savings on compute resources if the infrastructure is used efficiently. In order to still take advantage of the cloud, organizations can also take a hybrid approach and run their “normal” computing load on local infrastructure, but shift jobs to the cloud when there is a need to scale up or there are issues with the local system. Determining costs for running analyses on local infrastructure is very situation-dependent and impossible to quantify in a general fashion. A lot of times, an individual lab doesn’t need a whole cluster and can do a heck of a lot with a single server with a bunch of RAM that costs under $10K or even a loaded-up desktop. If an organization already has local compute infrastructure in place and there aren’t significant drawbacks to using it, such as time and effort (see above) and inability to install needed software (e.g. Docker), then running on local systems is the most economical thing to do as money has already been committed for the hardware and staff. Cost of bioinformatics per analysis/sample on local compute infrastructure can be estimated by total cost (equipment costs + staff and maintenance costs + other costs such as power and space) divided by number of analyses. This is obviously over-simplistic and the costs and lifetime of the system are guesses, but it can be used to help decide whether or not to invest in local computing. Similar calculations can be used to determine cost on a price per hour basis to compare to cloud compute pricing.
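The per-analysis estimate described above is simple enough to write down. This is a rough sketch only: the dollar amounts, core count, lifetime, and utilization are invented for illustration, and the per-core-hour version is just one way to get a number that can be set next to a cloud price.

```python
def local_cost_per_analysis(equipment, staff_and_maintenance, other, n_analyses):
    """Total cost of ownership divided by the number of analyses run."""
    total = equipment + staff_and_maintenance + other
    return total / n_analyses


def local_cost_per_core_hour(total_cost, cores, lifetime_years, utilization):
    """Spread total cost over the core-hours actually used.

    utilization is the fraction of time the cores are busy (0 to 1).
    """
    used_core_hours = cores * lifetime_years * 365 * 24 * utilization
    return total_cost / used_core_hours


# All values below are hypothetical guesses over the lifetime of one server.
equipment = 10_000              # server purchase
staff_and_maintenance = 20_000  # share of staff time and upkeep
other = 5_000                   # power, space, etc.

per_analysis = local_cost_per_analysis(
    equipment, staff_and_maintenance, other, n_analyses=7_000
)
per_core_hour = local_cost_per_core_hour(
    equipment + staff_and_maintenance + other,
    cores=32, lifetime_years=4, utilization=0.5,
)
print(f"Cost per analysis: ${per_analysis:.2f}")
print(f"Cost per core-hour: ${per_core_hour:.3f}")
```

Utilization matters a lot here: halving it doubles the effective price per core-hour, which is the "if it is used efficiently" caveat in numbers.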

Storage and egress

Raw sequencing data and processed results files need to be stored and this can be a much more important cost consideration than compute because it can be a recurring cost. Presumably, you only need to do compute intensive steps once, but you probably want to store the data for a long time, often indefinitely. In clinical applications primary results files (.fastq, .bam, .vcf) need to be kept for a minimum of 2 years and reports need to be stored longer (10+ years). The storage quotas on local systems that I’ve been told about are usually much too small to be used for long term storage. Buying network attached storage devices can work for some small labs where losing drives in a fire wouldn’t be completely catastrophic, and a quick search shows that you can buy a Synology box with 2x4TB drives for under $1000. But for most use cases, storing data on the cloud makes the most sense.

Data storage on the cloud is priced in dollars per GB-month and in Practical estimation of cloud storage costs for clinical genomic data [4] the authors take a thorough look at storage costs for genomic data for several scenarios. They even built a nifty-looking tool for estimating costs at https://ngscosts.info/, but at the time of this writing, I couldn’t get a secure connection to the site. There are different tiers of cloud storage that are appropriate for different levels of data access. The names of tiers vary across cloud providers but they span accessibility levels of high to low availability. High availability storage tiers are for data that needs to be accessed frequently. As tiers progress from higher to lower availability (with slower sounding names like ‘Glacier’ in AWS), data storage costs decrease rapidly but other factors come into play such as minimum storage durations and paying fees if you actually need to get to your data again.

Accessing and moving data around can incur network usage charges called egress fees. Cloud providers typically don’t charge fees to put data on their platforms but there are fees to download data from the cloud to your local computer and to move data out of lower accessibility storage tiers. The physical locations of where the data starts and where it moves to affect these costs; you’re going to pay more if you download your data from a data center in Ohio to a computer in China than to a computer in California. The good news is that if you do your compute on the cloud and use storage on the cloud, you shouldn’t have to move the big data files around too much, and cloud providers offer features to manage the lifecycle of your data with rules such as ‘move this data to a lower cost/less accessible tier if nobody tries to access it for 6 months’.
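As a rough illustration of how GB-month pricing, a lifecycle rule, and egress add up, here is a small sketch. The rates are placeholders I made up for the example, not quotes from any provider, and the model leaves out retrieval fees and minimum storage durations that colder tiers usually carry.

```python
def storage_cost(size_gb, months, hot_rate, cold_rate, months_before_archive):
    """Storage bill when data sits in a 'hot' tier first, then a 'cold' tier.

    Rates are in USD per GB-month. Retrieval fees and minimum storage
    durations on the cold tier are NOT modeled.
    """
    hot_months = min(months, months_before_archive)
    cold_months = months - hot_months
    return size_gb * (hot_months * hot_rate + cold_months * cold_rate)


def egress_cost(size_gb, egress_rate):
    """One-time charge for downloading data out of the cloud (USD per GB)."""
    return size_gb * egress_rate


# Hypothetical rates, for illustration only.
HOT_RATE = 0.02      # USD per GB-month, frequently accessed tier
COLD_RATE = 0.004    # USD per GB-month, archive tier
EGRESS_RATE = 0.09   # USD per GB downloaded

size_gb = 100  # roughly one human WGS BAM file
two_years_hot = storage_cost(size_gb, 24, HOT_RATE, COLD_RATE, months_before_archive=24)
two_years_tiered = storage_cost(size_gb, 24, HOT_RATE, COLD_RATE, months_before_archive=6)

print(f"24 months, hot tier only:       ${two_years_hot:.2f}")
print(f"24 months, archived after 6 mo: ${two_years_tiered:.2f}")
print(f"Downloading the file once:      ${egress_cost(size_gb, EGRESS_RATE):.2f}")
```

Per file these are small numbers, but they recur every month and scale linearly with the number of samples, which is why storage tends to dominate over time.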

Pricing Examples

Okay. With that background in place, it’s time to look at some real examples of how bioinformatics is priced in the marketplace. I picked examples that I think are representative and this is by no means exhaustive.

Outsourcing

Paying somebody else to do your bioinformatics is one way to go and is often priced at an hourly, per sample, or per project rate.

Bioinformatics cores: Several bioinformatics cores at academic organizations post their rates publicly. I put examples from four bioinformatics cores below. [5–8] The cost varies depending on if you are from a non-profit organization or industry and the number of samples. The comparisons aren’t perfect since the analyses performed are not the same across organizations and some cores price more modularly (e.g. pricing mapping and variant calling separately vs. providing a complete WES/WGS price) but gives a good general idea of cost.

Hourly rates and per sample bioinformatics costs at academic bioinformatics cores.

Consulting firms: A little poking around puts professional bioinformatics expertise in a range of $100-$300 per hour, but rates can vary widely. These rates from professional consultants are a bit higher than those of their bioinformatics core counterparts. Surprisingly, per sample costs tend to be similar to academic core prices although the price ranges are broader. There is a nice collection of bioinformatics pricing at https://genohub.com/bioinformatics-services-and-providers/. The data there might be outdated, but not outrageously so.

Bioinformatics Platforms

Bioinformatics platforms provide researchers access to bioinformatics capabilities without using the command line. In this respect they are a self-service type model, but platform companies also provide support and services. There are several platforms out there. I’ll focus on some of the better-known platforms and those that are representative of a range of pricing models. One of the benefits of platforms that I leave out here is data access. In some cases, valuable data can only be accessed and analyzed through bioinformatics platforms, so if you want to use the data, you have to use the platform too.

DNAnexus: DNAnexus is an enterprise platform with a focus on security. Their website shows 3 different versions of the platform: Titan, Apollo, and Portals in addition to offering support packages and other services. DNAnexus charges for enabling bioinformatics capabilities in various ways including but not limited to:

They run analyses on the cloud and mark up cloud resource costs. I haven’t checked prices recently, because my account is locked…again. The most recent data I have, from a year ago, showed that DNAnexus was marking up cloud compute prices about 1.5–2.5x from listed prices for on-demand instances. Storage was being marked up a bit more. I suspect that they take advantage of discount programs and that their cost is significantly lower than listed prices.
They make white-labeled versions of the platform and manage them for organizations. The UK Biobank Research Analysis Platform (RAP), PrecisionFDA, and St. Jude Cloud are all implementations of DNAnexus. The FDA first brought on DNAnexus in 2015 with an $849,000 contract for a pilot of the platform. This was followed up with a $20M 5-year management contract in 2019. Using PrecisionFDA is free to the end-user (it is paid for by the FDA) and anybody can request an account (subject to approval).
The UK Biobank RAP was funded by Wellcome and the Medical Research Council. There are costs to using the platform and the price sheet is publicly available here. I compared a few of the listed prices to prices listed for the London region of AWS, where RAP analyses are executed. The rates for compute on RAP are actually significantly lower than the prices listed for on-demand instances on AWS, so discounts are being passed on to users.

Cost comparison between instance types on UK Biobank Research Analysis Platform and AWS on-demand prices.

They form collaborative partnerships such as with the Regeneron Genetics Center (RGC). DNAnexus and RGC collaborate on RGC’s large-scale sequencing efforts using the DNAnexus platform. Interestingly, Regeneron became an investor in DNAnexus in their last $100M round.

Seven Bridges Genomics (SBG): SBG is a bioinformatics platform not unlike DNAnexus, although there is different technology underneath the interface and their business model is different. According to the SBG website, SBG does not mark up cloud resource costs and passes the cloud resource costs on directly to the end-user. I’m not sure that SBG doesn’t generate revenue from this channel though. I’ve talked to users of the SBG platform and SBG requires that all analyses are run through SBG-owned cloud accounts, which doesn’t allow their clients to use their own negotiated rates with cloud providers. If SBG has rates as favorable as some of their clients, they don’t seem to be passing on the savings. SBG tends to focus on working with large organizations with fat wallets. One contact told me that they were quoted ~$1M to get access to the SBG platform for their (not big or rich) organization. They also have a sizable collection of programmers and data scientists for developing pipelines and working on custom projects. Between platform access, pass-through cloud resource costs, and services, large organizations can easily rack up bills of many millions of dollars per year.

Terra: Terra is unique because it is developed and maintained at a non-profit institution (the Broad) but funded through grants and industry partnerships, notably with Google and Microsoft. Access to Terra is free although you need a Google account (or maybe a Microsoft account in the not so distant future). Terra also takes itself out of the billing game altogether. The app connects to the user’s Google billing account and launches jobs in the user’s own workspace, and cloud resource usage costs are paid directly to the cloud provider. This is great, if you have access to your own or your organization’s Google billing account. At the time of this writing Terra seems to be going through a transition, as I’m getting messages about workflows only being available in the legacy UI (Firecloud) and that I am about to access a government website when trying to access workflows. I don’t remember any of this from before and it might have to do with Terra’s integration with AnVIL. Terra features shared workspaces which are a nice resource, particularly because shared workspace descriptions provide time and cost estimates for running the workflows in the workspace.

Time and cost estimate to run the 1-Mutect2-GATK4 workflow in the terra-outreach/CHIP-Detection-Mutect2 workspace on Terra.

Time and cost estimate to run the workflows in the help-gatk/Reproducibility_Case_Study_Tetralogy_of_Fallot workspace on Terra.

Terra is very cost effective if they have the methods that you need — which is not a given as they are really only focused on methods developed and used by the Broad. Yes, you can launch workflows on Terra from Dockstore, but there is no guarantee that these will work.

Notably, the team at the Broad worked with Google engineers to get the cost of whole genome analysis from fastq’s to vcf down to around $5 (I think it came out closer to $6, but who’s counting? I guess I am.) This was a big achievement and that original shared workspace is now superseded by two newer versions.

Basespace: Basespace is Illumina’s bioinformatics workbench and is slowly being phased out in favor of the Illumina Connected Analytics (ICA) platform. Back in the day, running some analyses on Basespace was included in the price of consumables, but now it has to be paid for separately. James Hadfield made a nice blog post about that here. You can make an account on Basespace for free but executing jobs requires a $500 annual subscription fee and purchasing iCredits. An iCredit is equal to $1 in Basespace-land and is Illumina’s way of making users pre-pay for analyses rather than billing after the fact. Each app on Basespace has its own pricing, but 3.00 iCredits per node hour on the cloud is typical. I couldn’t find documentation on what Illumina defines as a ‘node’ but I assume it is equivalent to a virtual CPU. If this is the case, Basespace is marking up compute costs by about 75x over the listed cloud prices (about $0.04 per virtual CPU per hour assuming 2GB of memory per CPU). Even if a node is a machine with multiple cores, Illumina is marking up compute costs by a lot, and assuming Illumina has cloud rates lower than the listed rates, their margin is even bigger. Even though this is a large markup, it might make sense to do some analysis on Basespace. If you only have ten samples and each analysis would cost $0.25 directly on the cloud or $18.75 on Basespace, you’d be paying $2.50 on your own or $187.50 on the platform, a difference of $185. Assuming you paid several thousand dollars to collect the data and it would take you several days or longer to set up analyses on the cloud, this is still a small cost in terms of the complete experiment cost and is definitely worth it for the convenience. I don’t know of any organizations that use Basespace for high throughput compute intensive analyses though. Although the markup seems high, Illumina is not alone with a high percentage markup plan. I have been told that One Codex, a platform for microbiome sequencing data analysis, marks up their compute costs even more.
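The Basespace arithmetic above is easy to reproduce. This sketch uses the figures quoted in the paragraph and keeps my assumption that one 'node' equals one virtual CPU, which is a guess rather than something Illumina documents.

```python
ICREDIT_USD = 1.00                  # one iCredit costs $1
ICREDITS_PER_NODE_HOUR = 3.00       # typical app price on Basespace
CLOUD_USD_PER_VCPU_HOUR = 0.04      # approximate listed cloud price

# ASSUMPTION: a Basespace 'node' is equivalent to one virtual CPU.
platform_usd_per_hour = ICREDITS_PER_NODE_HOUR * ICREDIT_USD
markup = platform_usd_per_hour / CLOUD_USD_PER_VCPU_HOUR

n_samples = 10
cloud_cost_per_sample = 0.25
platform_cost_per_sample = cloud_cost_per_sample * markup

cloud_total = n_samples * cloud_cost_per_sample
platform_total = n_samples * platform_cost_per_sample

print(f"Markup over listed cloud price: {markup:.0f}x")
print(f"Ten samples on the cloud:       ${cloud_total:.2f}")
print(f"Ten samples on the platform:    ${platform_total:.2f}")
print(f"Difference:                     ${platform_total - cloud_total:.2f}")
```

If a node turns out to be a multi-core machine, divide the markup by the number of cores to get the adjusted figure.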

Galaxy: Galaxy is a publicly funded, academically developed and maintained platform (and community) that is meant to be deployed by individual institutions. Galaxy was one of the first bioinformatics platforms out there and was built to run on HPC, but has since been adapted to also run on the cloud and, along with Terra, is included in AnVIL. There are Galaxy instances that are publicly accessible but these tend to be limited in the resources you can access. The main cost of a Galaxy server is having somebody (usually full time) to deploy and maintain it, assuming there is already some infrastructure available to run it on. Several Galaxy developers have also started a company to sell a commercial version of Galaxy with support.

10x Genomics: The 10x Genomics Cloud Analysis platform is only for 10x single-cell data and at this time only runs Cell Ranger. Analysis on the platform is free for 10x customers, which means it is included in the cost of the 10x kits you buy. Given the cost of 10x assays, the cost of computing is not a significant part of the overall experiment cost, although it takes significant cost to develop and maintain a platform. This is a good strategy as 10x needs to enable its customers to process the data, and labs are used to paying for consumables but can be resistant to paying for bioinformatics. Other assay companies have folded bioinformatics into the cost of the kits and either provide free access to a platform or provide a portal to upload data so that bioinformaticians can process your data and return it to you. Bluebee was a platform for this kind of methods hosting that I saw in use by IDT, but I haven’t seen much of it since its acquisition by Illumina, which makes me think they acquired it for the parts that will be put into ICA. Companies also use DNAnexus to provide ‘free’ bioinformatics to their customers. As I understand it, there is not a way to pass compute costs on to the end users, so the costs must be completely covered by the companies. To put some numbers on things, I estimate the cost to build a basic cloud-based platform that enables end-users to analyze data with very limited capabilities and methods at near the $1M mark, but it can quickly go up from there once you add methods, manage many users and share accounts across organizations, upgrade security, and add features. I have a couple of examples from companies that have used hosted solutions and these packages start around $50K and go up to $300K and beyond, plus additional fees for services and cloud resource usage.

What is reasonable?

First of all, bioinformatics has costs and should (and must) be paid for somewhere. There has been some sentiment that bioinformatics should be free, which is not reasonable or attainable. The idea of free bioinformatics is a myth: the costs have been hidden in the price of consumables, are being paid with somebody’s time, are covered with grant overhead, and are being absorbed by resources that have already been committed to staff and infrastructure. On the opposite end of ‘free’ bioinformatics is the idea that bioinformatics will be the most costly part of the experiment. The Q and A section of the Genome Informatics Facility (GIF) site at Iowa State University [9] says that bioinformatics will be 1x-2x the cost of sequencing. In 2010, as sequencing costs were plummeting, it seemed that the cost of bioinformatics could easily surpass the cost of sequencing, which resulted in some catchy headlines such as The $1,000 genome, the $100,000 analysis?. [10]

Estimated bioinformatics cost answer from GIF at ISU.

As demonstrated with the above examples there are several ways to charge for bioinformatics: including it in the cost of consumables, markup on resources, flat costs per sample, fees for platform access, and charging for time and services. The analysis can be paid for directly by the end user or the cost can be passed on to them indirectly. What is lost in many of these is the direct link between value and cost.

The idea of free bioinformatics is a myth

We asked some of our users what they think is appropriate for bioinformatics costs and most said that bioinformatics should cost between 5% and 15% of the total experiment cost. So if they spend $250 for an RNA-seq assay, spending $25 on processing the raw reads is acceptable. When we received our first STTR grant, we took advantage of NIH’s Niche Assessment Program and the consulting firm evaluated the idea of marking up cloud compute resources. Respondents thought a markup of 10% was reasonable. To be fair, most respondents were computationally sophisticated, had their own systems in place, and wouldn’t benefit a lot from the ease of use and accessibility that our platform provides. I don’t think markups of several times the raw compute costs are unreasonable, as the value of bioinformatics platforms is not in the raw materials (compute) but in the time savings, convenience, job tracking, user management, and myriad other benefits. Just as the value of a painting is not compared to the original cost of the canvas and paints, the value of having data processed does not necessarily correlate with computing costs.

We have several RNA-seq pipelines on Truwl: the ENCODE total RNA-seq pipeline, GTEx, and BioWDL RNA-seq. Compute costs for these pipelines range from a few dollars to $10 for paired-end samples with 2 replicates, depending on sequencing depth, pipeline, and options. Marking up compute costs by 10% clearly isn’t a viable business model as you’re making pennies per job, it takes a lot of work to productionize methods, and there isn’t enough throughput to allow these small margins. The cost/value ratio doesn’t make sense here even though the margin looks reasonable. The point is, users are willing to pay differently based on different pricing models. They don’t want to see computing resources marked up to multiple times the original cost, which looks like a lot, but pricing at a small percentage of the total experiment cost seems okay. If an RNA-seq analysis costs $5 in raw compute charges, and the user is charged $25, that’s a 5x markup on compute, but still only 10% of the total experiment cost and significantly less than the $75-$540 range of the bioinformatics core per sample charges tabulated above. The wild card here is storage. Sequencing files can be big (80GB-100GB for a human WGS BAM file), and processing them produces more big files. Storing these over time at modest markups (but still more than 10%) can bring in significant revenue. Although there are also technical advantages, this is why some platforms will only work with data stored under their cloud accounts.
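The two framings of the same price can be put side by side in a few lines. The sketch below uses the $5 raw compute, $25 charge, and $250 assay figures from the text; the function names are mine.

```python
def markup_factor(price, raw_compute_cost):
    """How many times the raw compute cost the user is charged."""
    return price / raw_compute_cost


def share_of_experiment(price, experiment_cost):
    """Price as a fraction of the total experiment cost."""
    return price / experiment_cost


raw_compute = 5.00       # raw cloud charges for one RNA-seq analysis
experiment_cost = 250.00  # cost of the RNA-seq assay itself

# Pricing model A: 10% markup on compute.
price_a = raw_compute * 1.10
# Pricing model B: charge $25, i.e. 10% of the experiment cost.
price_b = 25.00

for label, price in [("10% markup on compute", price_a), ("flat $25", price_b)]:
    print(
        f"{label}: price ${price:.2f}, "
        f"margin ${price - raw_compute:.2f}, "
        f"{markup_factor(price, raw_compute):.1f}x compute, "
        f"{share_of_experiment(price, experiment_cost):.1%} of experiment"
    )
```

The same $25 reads as either a 5x markup or a 10% line item depending on which denominator you pick, which is the whole point about how pricing models are perceived.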

What are we doing at Truwl?

Truwl is a bioinformatics platform with accessibility, comparability, and broad content support at its heart. There’s a lot to say about that, but we focus a lot on productionizing and disseminating bioinformatics methods as broadly as possible. We want to enable researchers, from those that need to process a few samples a year to those that need to process many samples daily, to find, evaluate, and use the right bioinformatics methods for their projects. Researchers that span this range have different needs and we think we can cover them with a few plan types outlined below.

Free: Some methods are not that costly to run and we think removing all barriers (except for making an account), including cost, for researchers to access them is more beneficial than trying to charge for them. Germline variant benchmarking is one of these and will be our first free workflow. The free version is limited compared to the full featured version as there are ways to incur significant charges with the full featured workflow. We envision this as the first of many “community edition” workflows and as a step towards making benchmarking more uniform and accessible.

Pay-as-you-go: For researchers that don’t need to analyze data frequently, a monthly or yearly subscription plan doesn’t make sense. Some of the most interesting science comes from projects that are not high throughput and we want to support that and enable researchers to run the analyses they need without making a commitment. To do this we are making a pay-as-you-go plan which allows researchers to just pay for the analyses they run and get a monthly invoice for their usage. Pricing on workflows on this plan will vary but will be marked up from the raw cloud costs. For less compute-heavy workflows that use a few dollars or less for compute time, it makes sense to price at a flat rate. This gives more clarity into what a job will cost. For more compute intensive workflows marking up raw costs by a factor seems more appropriate.
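One simple way to express the "flat rate for cheap jobs, markup factor for expensive jobs" idea is to charge whichever is larger. This is only an illustrative sketch: the flat rate and the factor below are hypothetical values I picked for the example, not Truwl's actual prices, and taking the maximum of the two is just one possible rule.

```python
# HYPOTHETICAL pricing parameters, for illustration only.
FLAT_RATE_USD = 10.00   # floor price for light workflows
MARKUP_FACTOR = 3.0     # multiplier on raw cloud cost for heavy workflows


def job_price(raw_cloud_cost):
    """Charge the flat rate or the marked-up raw cost, whichever is larger.

    Cheap jobs land on the predictable flat rate; compute-heavy jobs
    scale with what they actually used.
    """
    return max(FLAT_RATE_USD, raw_cloud_cost * MARKUP_FACTOR)


for raw in (0.50, 2.00, 5.00, 40.00):
    print(f"raw cloud cost ${raw:>6.2f} -> price ${job_price(raw):>7.2f}")
```

Using the maximum avoids a price cliff at the boundary between the two regimes, since the price never drops as the raw cost goes up.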

Subscription plans: For researchers and individual labs that need to process data on a regular basis, a subscription plan does make sense. In addition to giving discounts on compute costs, paying a subscription fee allows us to maintain dedicated compute environments and provide dedicated input and output buckets for these customers. We haven’t decided whether subscription plans will include cloud resources as Galaxy Works is doing or if subscription fees and compute resources should be billed separately. Early feedback shows a preference for separate billing.

Enterprise: For enterprise customers plans can be tailored to needs depending on support, customization, number of users, and volume.

Price transparency

It’s hard to know what an individual compute job is going to cost ahead of time, or even what it costs afterwards. To help provide insight into job costs we provide cost example tables on workflow description pages. These tables show prices of publicly viewable jobs.

Job cost table for the ENCODE ATAC-seq pipeline on Truwl.

It would be really cool, or maybe stressful, if you could monitor the cost of a job as it is running or immediately as it finishes. Unfortunately that’s a big technical lift, so we are starting with the next best thing: providing job costs within 24 hours of a job finishing. Cloud providers store data about compute costs for jobs in their databases, but those are not updated in real time; they are usually updated daily. So we query those databases daily to display job costs to users.
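To give a feel for what that daily step involves, here is a toy sketch that rolls billing line items up into a cost per job. The row format and the job_id tag are hypothetical stand-ins; real billing exports have their own schemas, and this is not a description of how Truwl's system is actually built.

```python
from collections import defaultdict


def cost_per_job(billing_rows):
    """Sum billing line items by job.

    Each row is assumed to be a dict with a 'job_id' tag (or None when
    the charge is not tied to a job) and a 'cost_usd' amount.
    """
    totals = defaultdict(float)
    for row in billing_rows:
        job_id = row.get("job_id")
        if job_id is None:
            continue  # skip charges that were not tagged with a job
        totals[job_id] += row["cost_usd"]
    return dict(totals)


# Hypothetical rows, as they might look after a daily billing refresh.
rows = [
    {"job_id": "job-001", "service": "compute", "cost_usd": 1.20},
    {"job_id": "job-001", "service": "storage", "cost_usd": 0.05},
    {"job_id": "job-002", "service": "compute", "cost_usd": 3.40},
    {"job_id": None, "service": "network", "cost_usd": 0.10},
]

for job_id, total in sorted(cost_per_job(rows).items()):
    print(f"{job_id}: ${total:.2f}")
```

The part that makes this work in practice is tagging every cloud resource with the job that launched it, so charges can be attributed after the fact.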

Truwl job detail page with job cost highlighted.

What is right for you?

What do you think is fair to pay for bioinformatics? Have you seen other payment models that you like? I’m always interested to learn more about this area and interested to learn about what is fair, what is convenient, and what is the status quo so please reach out if you want to discuss or educate me.

1. Schwarze, K. et al. The complete costs of genome sequencing: a microcosting study in cancer and rare diseases from a single center in the United Kingdom. Genet. Med. 22, 85–94 (2020).

2. UC Davis Bioinformatics Core https://bioinformatics.ucdavis.edu/rates (2020).

3. Garijo, D. et al. Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLOS ONE 8, e80278 (2013).

4. Krumm, N. & Hoffman, N. Practical estimation of cloud storage costs for clinical genomic data. Pract. Lab. Med. 21, e00168 (2020).

5. Bioinformatics and Systems Biology Core | University of Nebraska Medical Center. https://www.unmc.edu/bsbc/services/pricing.html.

6. UC Davis Bioinformatics Core. http://dev.bioinformatics.ucdavis.edu/services-2/.

7. Pricing. University of Kansas Medical Center https://www.kumc.edu/research/kansas-intellectual-and-developmental-disabilities-research-center/core-services/bioinformatics/pricing.html.

8. Rates | Arizona State University Core Facilities. https://cores.research.asu.edu/bioinformatics/rates.

9. How much does bioinformatics cost for my project? | Genome Informatics Facility (GIF). https://gif.biotech.iastate.edu/qa/how-much-does-bioinformatics-cost-my-project.

10. Mardis, E. R. The $1,000 genome, the $100,000 analysis? Genome Med. 2, 84 (2010).