The New NIH ‘Rule of 21’ Threatens to Give Up on American Preeminence in Biomedical Research Based on a Flawed Concept and Flawed Analysis

Updated June 8: There is an addendum at the end.

SUMMARY

The planned ‘Rule of 21’ change to NIH funding [1] is the most dramatic change in NIH funding in decades, is largely unvetted, and will have a seismic impact on American biomedical science. There are fatal problems with the proposal. These problems fit three categories: The Data, The Concept, and Other Failings.

The Data. ‘The Lauer Graph’ [2] is the primary basis for NIH Director Collins’ proposal to cap scientists at a maximum of three research grants (3 R01-equivalents, including collaborations, equals a score of 21, hence the ‘Rule of 21’), or even fewer, on the grounds that the graph shows “strong evidence of diminishing returns.” That conclusion is not supported by the data. What the graph actually shows is that, on average, an NIH-funded PI with three R01-equivalents has over six times the impact of a PI with only one R01-equivalent, and a PI with four R01-equivalents has eight times the average impact of a PI with only one. In each case that is an increase in productivity per grant: the opposite of the NIH conclusion.

The Concept. The concept of applying the economic theory of diminishing marginal returns to biomedical science is fundamentally flawed. That economic concept is applied to maximize average productivity, not groundbreaking discoveries. American science has always been ‘going for the gold’, aiming to be the best of the best, the greatest in the world, like many American endeavors. In the Olympics or in professional sports, if you want the best team you do not take many individuals and optimize their average speed and skill at the sport. Instead, you take the best of the best and apply your resources there, for the best possible outcome. Applying the concept of diminishing returns fails because it optimizes for the average, not for extraordinary accomplishments. Extraordinary accomplishments in science are high-risk endeavors that can require significant resources, and that approach has made American biomedical research the best in the world for decades. Going forward with the ‘Rule of 21’ as a new national policy is a direct statement that the goal of the NIH and American biomedical science is to maximize average productivity and produce a larger volume of ‘pretty good’ science at the expense of generating the best biomedical research discoveries in the world. The new policy bars a group of 1000+ scientists from participating in that competition because of their previous success.

Other Major Failings of the ‘Rule of 21’. The Rule of 21 is anti-collaboration and anti-training, penalizing scientists who engage in either of those very important aspects of science.

THE DATA

It is critical to discuss ‘The Lauer Graph’ [2] directly, because NIH Director Francis Collins made the conclusions of that graph and blog post [2] the central rationale for his May 2, 2017 announcement of a change to the fundamental funding policy of the NIH [1]. That graph is the primary basis for the argument for capping scientists at a maximum of three research grants (or fewer, see below) [1,3], on the grounds that the graph shows “strong evidence of diminishing returns” [2]. That conclusion is not supported by the data. What the graph actually shows is that, on average, an NIH-funded PI with three R01-equivalents has over six times the impact of a PI with only one R01-equivalent, and a PI with four R01-equivalents has eight times the average impact of a PI with only one (Figure 1A) [4]. In each case that is an increase in productivity with increased funding. A massive increase! It amounts to roughly double or more the productivity per grant of the benchmark (i.e., 1 R01). That is the opposite of the NIH conclusion.
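The per-grant arithmetic behind that claim can be checked in a few lines. This is a minimal sketch that uses only the two ratios quoted above (read off the linearized curve in Figure 1A, not the raw NIH dataset); the names are illustrative, and ‘over six times’ is entered as exactly 6, so the 3-R01 result is a lower bound.

```python
# Average impact per R01-equivalent, relative to the 1-R01 benchmark.
# Ratios are the ones quoted in the text (read off Figure 1A), not raw NIH data.
# "Over six times" is entered as exactly 6.0, so the 3-R01 result is a lower bound.
RELATIVE_IMPACT = {1: 1.0, 3: 6.0, 4: 8.0}  # R01-equivalents -> impact vs. 1 R01


def impact_per_grant(n_r01: int, relative_impact: float) -> float:
    """Impact per R01-equivalent, in units of a single-R01 PI's impact."""
    return relative_impact / n_r01


for n, impact in RELATIVE_IMPACT.items():
    print(f"{n} R01-equivalent(s): {impact_per_grant(n, impact):.1f}x the benchmark per grant")
```

Both the 3- and 4-R01 levels come out at about twice the per-grant impact of the 1-R01 benchmark, which is the near doubling described above.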

Lauer’s conclusions were based on plotting the data on log10 scales with vastly unequal axes (Figure 1B) [2]. Plotting the data from the curve fit on a linear scale gives a clearer picture, because 1, 2, 3, 4, and 5 R01-equivalents are then equidistant and the viewer can compare them directly (Figure 1A).

Figure 1. A. The ‘Lauer Graph’ data from [2] and [8] plotted on a linear graph. One R01-equivalent equals 7 RCI points. B. The Lauer graph as originally published [2]. Note that, because of the log scale, the dotted line would actually represent a massive increase in productivity per NIH grant starting from below 1 R01 (Y=0.85, X=7). The graph was published on log10 scales with vastly unequal axes.
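To see why log10 axes compress the range that matters, this sketch computes where 1 to 5 R01-equivalents fall on a log10 RCI axis, using only the 7-points-per-R01 conversion from the Figure 1 caption. It illustrates the geometry of such an axis under that assumption; it does not reproduce the published plot, and the function name is hypothetical.

```python
import math

RCI_PER_R01 = 7  # Figure 1 caption: one R01-equivalent = 7 RCI points


def log10_position(r01_equivalents: float) -> float:
    """Position of a funding level on a log10 RCI axis (hypothetical helper)."""
    return math.log10(r01_equivalents * RCI_PER_R01)


# Equal linear steps of one R01 shrink on a log axis: each gap is log10((n+1)/n).
for n in range(1, 5):
    gap = log10_position(n + 1) - log10_position(n)
    print(f"{n} -> {n + 1} R01: {gap:.3f} decades")

# The whole 1-to-5 R01 range occupies only log10(5), about 0.7 of a decade.
span = log10_position(5) - log10_position(1)
print(f"1-5 R01 span: {span:.3f} decades")
```

The step from 4 to 5 R01s is drawn about a third as wide as the step from 1 to 2, even though both add the same amount of funding, which is why the flattening on a log plot says little about per-grant productivity.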

As for the data themselves, the graph draws on by far the largest PI dataset available and uses a single quantitative metric of scientific impact, the Weighted RCR. The Relative Citation Ratio (RCR) scores a paper’s citation-based impact against the average paper in its field, which is set to 1; a scientist’s Weighted RCR over a given period is obtained by adding together the RCRs of that scientist’s papers, so the total can be compared against the average.
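A minimal sketch of the Weighted RCR as described above, assuming the per-paper RCR values have already been computed; the function name and the paper values are hypothetical, chosen only to illustrate the summation.

```python
from typing import Iterable


def weighted_rcr(paper_rcrs: Iterable[float]) -> float:
    """Sum the per-paper RCR values for one scientist over a chosen period.

    Each RCR is a field-normalized citation score in which the average paper
    scores 1.0, so the total reads roughly as a count of "average papers".
    """
    return sum(paper_rcrs)


# Hypothetical scientist with four papers (made-up RCR values, not NIH data).
example = [0.5, 1.0, 2.25, 0.25]
print(weighted_rcr(example))  # 4.0
```

Because the metric is a plain sum, it rewards both volume and per-paper influence, which is one reason it captures citations but not devices, diagnostics, or clinical translation.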

I do not think productivity is truly measured by these publication citation metrics: they ignore fundamentally important biomedical contributions such as device development, diagnostics, and advancing novel concepts and cure ideas into clinical trials, which are not necessarily reflected by paper citations. But if the NIH is determined to make massive policy decisions based on these numbers (the most seismic change in decades), it is worth knowing what Lauer’s arguments are and what the numbers actually show.

The first version of the Lauer graph was published in January 2017. A second version of the graphs was posted in February 2017 [5]. That version showed the actual scale of the X axis, but again used heavily skewed axes. It devoted much of the plot to data points below a single R01-equivalent grant and stretched the vertical axis over a 100,000-fold range, giving the same impression of diminishing returns because of the log scales. A slightly relabeled version of the same Lauer graph is what NIH Director Francis Collins presented to the House of Representatives [6] (Figure 2A), again heavily weighted toward inconsequentially low values of less than a single basic NIH R01 (highlighted in red in Figure 2A). That graph shows the same data, with the same fundamentally wrong conclusions. On a linear scale there is no indication of significant diminishing returns between 1 and 5 R01s (Figure 2B). Again, an NIH-funded PI with three or four R01-equivalents has substantially more average impact than a PI with only one R01-equivalent (Figure 1A).

The relevant comparator group is PIs with exactly 1 R01, for two reasons: 1) Most NIH-funded PIs have exactly 1 R01, so one R01 is the key benchmark for productivity. 2) The Rule of 21 policy is designed to create more PIs with exactly 1 R01. Thus, the relevant comparisons are PIs with 3, 4, or 5 R01s versus PIs with 1 R01.

Now consider the Rule of 21 policy in light of productivity. Based on the Lauer data, the Rule of 21 would clearly make NIH investment less productive, by creating more scientists with 1 R01. When the Lauer curve data are simply transposed onto a normal linear graph, as discussed, they indicate that having a single R01 is about the least productive NIH funding situation, far less productive than 3–5 R01s. By that graph, a single R01 is a very unproductive investment of NIH funds: essentially the opposite of what was initially claimed. This also conflicts with the conclusion Collins passed on to Congress. As stated by Director Francis Collins in his announcement of the ‘Rule of 21’ on the NIH website, implementation of a GSI limit would “broaden the pool of investigators,” which in practice means adding more PIs with a single grant. One of Collins’s first statements on the Rule of 21 in his congressional testimony last Wednesday was, “…above about 3 grants per year it gets pretty flat. That says that those dollars are not giving us as big an impact as if perhaps they were given to somebody who had no grants.” [6] In other words, the aim is to create more PIs with a single R01.

Does that mean the NIH should ban single R01s? No, but it shows that the data do not at all support what was claimed. Hopefully Collins can be convinced that what he was initially led to believe about productivity and the Lauer curve is not actually true, at least over the range of 1 to 5 R01s, based on values for 1–5 R01s extracted from the Lauer log plot.

Figure 2. A. The graph shown by NIH Director Francis Collins to the House of Representatives Subcommittee [6]. Highlighted in red is the area representing < 1 R01-equivalent, which heavily skews the visualization. B. The data from the curve fit shown on a normal linear scale (same graph as Figure 1A).

Rather than reporting a positive association between grant funding and productivity, Lauer has insisted on, and based all of his claims on, the assumption that the slopes between grant levels can be used to calculate diminishing marginal returns. That assumption is false. The derivative is not a meaningful metric in this study, as first pointed out in the Stanford letter of concern. Causality cannot be assessed from these RCI:RCR plots, because the data carry no temporal information and do not assign specific outputs to specific inputs. This analysis does not allow for conclusions about causality; drawing them imputes a level of temporal knowledge that is not present. The presented data are aggregated, so cause and effect are disconnected. Among the PIs with higher RCI points, one cannot tell whether most of the return on investment is spread across all of their grants or is temporally associated with their earlier or later grants. The plot does not show that a PI who is awarded a 4th R01-equivalent accomplishes less with it than they achieved with their previous R01s. The assumption of ‘stepping’ along the graph is simply not supported by the data. Calculating first derivatives assumes a knowledge of causality that the authors do not have. Nevertheless, the new policy rests on these erroneous conclusions about causality. Based on them, the Rule of 21 declares that a group of 1000+ highly productive scientists be banned from merit-based peer review.

THE CONCEPT

The concept of applying the economic theory of diminishing marginal returns to biomedical science is fundamentally flawed. That economic concept is applied to maximize average productivity, not groundbreaking discoveries. American science has always been ‘going for the gold’, aiming to be the best of the best, the greatest in the world, like many American endeavors. In the Olympics or in professional sports, if you want the best team you do not take many individuals and optimize their average speed and skill at the sport. Instead, you take the best of the best and apply your resources there, for the best possible outcome. Applying the concept of diminishing returns fails because it optimizes for the average, not for extraordinary accomplishments. Extraordinary accomplishments in science are high-risk endeavors that can require significant resources, and that approach has made American biomedical research the best in the world for decades.

The first person to run a marathon in under two hours will achieve that extraordinary accomplishment in direct defiance of the concept of diminishing returns. That person will have put in immense hours of training, made heavy sacrifices, and done massive work to be the best in the world. But if you were to define that in terms of diminishing returns, that person is only 1% faster than the second-fastest person, and the accomplishment would not seem worthwhile. Being the best in many human endeavors involves being incrementally better to reach new heights. Biomedical science is the same: reaching new heights can require concentrated effort.

The concept of capping scientists at the relatively low fixed threshold of ‘21 points’ is intrinsically anti-competitive. At that level it clearly becomes a statement that the NIH has decided that PIs with 1 or even no R01s are inherently more capable than scientists who have 3 R01s, in all cases, across the board. Even if a scientist has the #1 best idea, as judged by a panel of scientific peers, that idea will be discarded in favor of a grant proposing a lesser idea. Conceptually, the policy is fundamentally anti-competitive and anti-meritocratic.

Going forward with the ‘Rule of 21’ as a new national policy is a direct statement that the goal of the NIH and American biomedical science is to maximize average productivity and produce a large volume of ‘pretty good’ science at the expense of generating the best biomedical research discoveries in the world. Lastly, as noted above, the data would not support that conclusion even if the concept were valid.

Even if the concept were valid, and the data indicated a defined point of diminishing returns (neither of which is the case), how would it be logical to set the inflection point (slope change) in average productivity as the maximum allowable level of grant support? That is the equivalent of arguing that climbing 29 hills 1,000 feet tall with your FitBit on is the same as succeeding in climbing Mt. Everest. Accepting that logic turns this into a race to the lowest common denominator, not a race to be the best biomedical research institution in the world.

OTHER MAJOR PROBLEMS WITH THE ‘RULE OF 21’

The Rule of 21 Policy is Anti-Collaboration

On the multi-PI side, collaborations are clearly discouraged by what is proposed. A three-PI R01 counts 6 points against each of the three PIs. After three such R01s (18 points, with no room left for a fourth), every PI is effectively capped, yet each of the three labs would have only about a single R01’s worth of money. If the scientists had instead pursued non-collaborative science, each scientist could obtain three times the resources before hitting the ‘Rule of 21’ cap.
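The collaboration penalty is simple arithmetic under the proposed point values (7 points for a single-PI R01, 6 points charged to each PI on a multi-PI R01, a cap of 21). This sketch assumes, hypothetically, that each award is one R01-sized budget split evenly among its PIs, as the comparison above implies, and only considers a PI holding identical awards; the function names are illustrative.

```python
R01_POINTS = 7           # single-PI R01 (one R01-equivalent)
MULTI_PI_R01_POINTS = 6  # charged to *each* PI on a multi-PI R01, per the text
CAP = 21                 # the Rule of 21 ceiling


def max_awards(points_per_award: int, cap: int = CAP) -> int:
    """Largest number of identical awards whose points stay within the cap."""
    return cap // points_per_award


def r01_budgets_per_lab(points_per_award: int, n_pis: int) -> float:
    """R01-equivalents of money each lab holds at the cap.

    Assumes (hypothetically) each award is one R01-sized budget split evenly
    among its PIs.
    """
    return max_awards(points_per_award) / n_pis


solo = r01_budgets_per_lab(R01_POINTS, n_pis=1)             # 3 awards -> 3.0 R01s of money
collab = r01_budgets_per_lab(MULTI_PI_R01_POINTS, n_pis=3)  # 3 awards (18 pts; a 4th = 24) -> 1.0
print(solo, collab, solo / collab)  # 3.0 1.0 3.0
```

Both routes allow three awards before the cap binds, but the collaborating PI ends up with a third of the money, which is the threefold gap described above.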

This anti-collaboration skew is not a theoretical concern. I currently participate in a multi-PI collaborative R01 grant to test innovative vaccine strategies and technologies in non-human primates. There are essentially 3 PIs. The bulk of the money goes to a non-human primate facility. One PI’s budget is $89k, and he will be penalized 6 points under the new regime for his efforts to collaborate instead of ‘staying in his lane’. Scientifically, we are succeeding well with the grant, making progress that none of the three of us could have made alone (nor anyone else that I am aware of), but starting this grant would not be possible now because of the severe NIH points penalty for collaboration.

The Rule of 21 Policy is Anti-Training

Not only are the new NIH rules anti-collaborative, they are also anti-training. Because one of our faculty members generously helps train the next generation of scientists by leading a T32, the NIH now declares him ineligible for a third R01. The scoring system assigns 2 punitive points for leading a T32, which precludes a PI from being awarded a third R01, even as a 100% full-time research scientist.
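The T32 effect can be checked the same way. This sketch uses the originally proposed point values (7 per R01, 2 for leading a T32) and assumes a total of exactly 21 is still allowed, as the ‘three R01-equivalents = 21’ framing implies; the helper name is hypothetical.

```python
R01_POINTS = 7  # one R01-equivalent
T32_POINTS = 2  # leading a T32, under the scoring as originally proposed
CAP = 21        # the Rule of 21 ceiling (a total of exactly 21 is allowed)


def can_add(current_points: int, new_award_points: int, cap: int = CAP) -> bool:
    """True if a new award keeps the PI's total within the cap."""
    return current_points + new_award_points <= cap


with_t32 = 2 * R01_POINTS + T32_POINTS  # 16 points
without_t32 = 2 * R01_POINTS            # 14 points
print(can_add(with_t32, R01_POINTS))     # False: 16 + 7 = 23 > 21
print(can_add(without_t32, R01_POINTS))  # True: 14 + 7 = 21
```

The two training points are enough to push a third R01 over the line, which is why the proposal reads as ‘two R01s plus something else’.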

A T32 frequently has virtually no financial value for the T32 PI; it is simply a way to help train the next generation of scientists.

In fact, in many ways the ‘Rule of 21’ is the ‘Rule of two R01s plus something else’. That is an extraordinarily low ceiling for biomedical scientists, in all cases, across the board.

ADDENDA

June 8. My analyses of the raw data are now posted in a follow-up Medium post: “Considering assessments of scientific productivity and ‘ghost authors’”.

The GSI Rule of 21 plan has been officially cancelled, announced today.

Article text was also slightly edited.

June 1. Others and I have now examined the raw data for the two underlying datasets of the ‘Lauer graph’, which were made available Friday. The data are a mess, and so are the calculations. There are extensive errors and quality-control problems with the underlying datasets.

May 27. In response to feedback from the community, T32s were removed from the Rule of 21 [7]. Essentially all other aspects of the Rule of 21 plan remain unchanged. A preprint of the Lauer et al. study is now online [8], with the raw data.

CITED SOURCES

1. Collins FS. New NIH Approach to Grant Funding Aimed at Optimizing Stewardship of Taxpayer Dollars [Internet]. 2017 [cited 2017 May 19]. Available from: https://www.nih.gov/about-nih/who-we-are/nih-director/statements/new-nih-approach-grant-funding-aimed-optimizing-stewardship-taxpayer-dollars

2. Lauer M. Research Commitment Index: A New Tool for Describing Grant Support [Internet]. nexus.od.nih.gov. 2017 [cited 2017 May 19]. Available from: https://nexus.od.nih.gov/all/2017/01/26/research-commitment-index-a-new-tool-for-describing-grant-support/

3. Basken P. NIH Is Firm on Plan to Limit Per-Person Grant Awards. The Chronicle of Higher Education. 2017 May 17.

4. Lowe D. Changes in NIH Grant Policy? [Internet]. 2017 [cited 2017 May 19]. Available from: http://blogs.sciencemag.org/pipeline/archives/2017/05/11/changes-in-nih-grant-policy

5. Lauer M. Following up on the Research Commitment Index as a Tool to Describe Grant Support [Internet]. nexus.od.nih.gov. 2017 [cited 2017 May 19]. Available from: https://nexus.od.nih.gov/all/2017/02/15/following-up-on-rci-tool-describe-grant-support/

6. House Appropriations Labor, Health and Human Services, and Education Subcommittee. House Appropriations Labor, Health and Human Services, and Education Subcommittee Hearing: Advances in Biomedical Research (EventID=105953) [Internet]. 2017. Available from: https://www.youtube.com/watch?v=z_B2R7Qx508

7. Kaiser J. NIH scales back plan to curb support for big labs after hearing concerns. Science. 2017 May 26.

8. Lauer MS, Roychowdhury D, Patel K, Walsh R, Pearson K. Marginal Returns And Levels Of Research Grant Support Among Scientists Supported By The National Institutes Of Health. bioRxiv 142554 [Preprint]. Cold Spring Harbor Laboratory; 2017 May 26.

ACKNOWLEDGEMENTS. A number of people contributed ideas and insights described in this article, provided via discussion and correspondence.
