To Proxy or Not to Proxy?

Madison Horgan
Infrastructure in the Anthropocene
Jan 24, 2023

Research worships rationality. Scientists routinely use inductive reasoning to discover truths about the world based on measurements and observations. Scientific findings are often used to guide public policy, especially in sustainable development. However, excellence is exacting. Many aspects of the world are difficult to measure because doing so is too expensive, tedious, or uncertain. This is often the case when wicked complexity is in play, which encourages a focus on satisficing (“making most parties happy enough”) rather than optimizing (Allenby, 2021). For variables that are impractical or impossible to measure, satisficing may mean using a proxy measure: a variable that can be measured or computed and then used to estimate the behavior or certain characteristics of a target variable.

Satisficing with proxy variables adds to what philosophers of science describe as the uncertainty endemic to science, which results from its inductive, ampliative nature, “where the evidence for any particular claim is never complete and the power of scientific generalizations is that they go beyond the evidence available, either by extending their descriptions to cases yet unseen or by positing causal relationships and/or explanatory theories that say more than the extant evidence can” (Douglas, 2017). When proxy measures inform science that guides policy decisions, the uncertainty they introduce requires scientists to judge whether the evidence a proxy provides is strong enough to support a claim about the target variable. Using a proxy therefore carries a degree of risk whenever its estimates are used to make a decision.

One way to reduce the risk of relying on proxy measures to make decisions is to assess the quality of the proxy. Statistical methods are commonly used to validate proxies, either by checking how well proxy data align with a sample of directly measured data or by checking whether they track other known correlates of the target measure. However, when a proxy is called upon precisely because the target variable cannot be measured, how can scientists validate its use?
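
For contrast, in the ordinary case where a sample of directly measured data does exist, the statistical check can be as simple as a correlation coefficient. The sketch below computes Pearson's r between paired proxy and directly measured values. Choosing Pearson's r (rather than, say, rank correlation or regression error) is an assumption made for illustration, and the function name `pearson_r` is hypothetical. Note that it requires paired measurements of the target, which is exactly what is missing in the cases discussed here.

```python
import math


def pearson_r(proxy, measured):
    """Pearson correlation between paired proxy values and directly measured values.

    Values near 1 mean the proxy rises and falls with the measured target in a
    roughly linear way; values near 0 mean little linear relationship.
    """
    if len(proxy) != len(measured) or len(proxy) < 2:
        raise ValueError("need two paired samples with at least two points")
    mean_p = sum(proxy) / len(proxy)
    mean_m = sum(measured) / len(measured)
    # Covariance and variances (unnormalized; the normalizing factors cancel in r).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(proxy, measured))
    var_p = sum((p - mean_p) ** 2 for p in proxy)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined when either sample is constant")
    return cov / math.sqrt(var_p * var_m)
```

For example, `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` returns 1.0 because the two samples are perfectly linearly related. How high r must be before a proxy counts as validated is itself a judgment call, and a strong correlation shows only that the proxy tracks the measured sample, not that it captures the aspect of the target that matters for a particular decision.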

When using a proxy measure to collect data, a scientist is making an implicit hypothesis that the proxy estimates the aspect of the target variable that is relevant to the project at hand. Justifying the use of a proxy measure is therefore a special case of managing inductive risk, a type of risk that “concerns the seriousness of potential errors and accidents, or more generally of ‘getting it wrong’ in any inductive context of inquiry” (Axtell, 2017). Error occurs when scientists reject a true hypothesis or fail to reject a false hypothesis based on proxy data. Because such errors can never be ruled out, inductive risk is present in every use of a proxy measure. Risk analysis may thus offer scientists an alternative to statistical validation: a way to justify, or at the very least elucidate, the choices made throughout the stages of a project.

Describing risk entails considering both the probability and the severity of a potential error. When looking at the inductive risk of proxy use, a scientist would consider the probability that a particular proxy captures the relevant aspect(s) of the target variable correctly, and how important it is that the proxy captures the right thing given the potential consequences. Where the proxy is unlikely to capture the intended aspect of the target variable (i.e., the hypothesis that it adequately tracks that aspect is probably incorrect) and a decision based on an incorrect hypothesis would have a large impact, the inductive risk is high. This conclusion can then guide adjustments to the project to mitigate the risk. In other cases, a scientist may feel confident sharing findings from proxy data even if the probability that the proxy represents the key aspects of the target variable correctly is low, so long as the finding is not especially consequential. This could be the case if a scientist is trying to inform a small heuristic aspect of a policy rather than a claim that concerns human life.
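
One common way to make this trade-off explicit, borrowed from general risk analysis rather than from the sources cited here, is to treat inductive risk as the expected severity of an error. In the sketch below, E is the (assumed) event that the proxy fails to capture the relevant aspect of the target variable, P(E) is one minus the likelihood that the hypothesis is correct, and S(E) is the severity of a decision made on that error:

```latex
R_{\text{proxy}} = P(E)\, S(E), \qquad P(E) = 1 - P(\text{hypothesis correct})
```

A high P(E) combined with a high S(E) yields a high R, while even a high P(E) can produce a tolerable R when S(E) is close to zero, which is the low-stakes case described above. In practice both factors are usually judged qualitatively rather than computed.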

Risks of error (e.g., a proxy failing to represent the intended aspects of the target variable and thus leading to incorrect assumptions or decisions) do not lend themselves to a single universal standard for accepting or rejecting proxies. A proxy used only to describe concepts related to a variable, rather than to support a decision, may carry little risk. The social and ethical consequences of a proxy being used to make an erroneous claim largely determine how much risk one is willing to accept: if there is little evidence that the proxy represents the target variable well but a dire need for information to inform policy decisions, withholding the “riskier” finding could itself have a negative social impact (Douglas, 2017).

To identify and justify value judgments in science mindfully and meaningfully, scientists can ask a standard series of questions that analyze risk in terms of the likelihood and impact of a proxy measure failing to fulfill its hypothesized role. This process could proceed in four steps:

  1. Hypothesis articulation: for example, “I hypothesize that [this proxy] will adequately capture [this aspect of the target variable] for the purposes of [this goal].”
  2. Assessment of likelihood: how likely is it that the proxy measure captures the desired attribute(s) of the target variable (i.e., that the hypothesis is correct)?
  3. Assessment of impact: how bad would it be if the proxy did not capture the desired attribute(s) of the target variable (i.e., if a decision were made based on an incorrect hypothesis)?
  4. Statement of proper use: is the proxy adequate for its purpose? If not, how can the project goal(s) be revised?
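
The four steps can be sketched as a small structured record, which makes each judgment explicit and reviewable. This is a minimal sketch under stated assumptions: the three-point `Level` scale, the 3x3 risk matrix with its cut-offs (score of 6 or more is high, 4 or 5 is medium), the default tolerance, and the names `ProxyAssessment` and `statement_of_use` are all hypothetical choices, not part of the method described above. The example proxy (nighttime light intensity standing in for local economic activity) is likewise only an illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point ordinal scale for likelihood, impact, and risk."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class ProxyAssessment:
    proxy: str
    target_aspect: str
    goal: str
    capture_likelihood: Level  # Step 2: how likely is the hypothesis to be correct?
    impact: Level              # Step 3: how bad is a decision built on a wrong hypothesis?

    def hypothesis(self) -> str:
        """Step 1: state the implicit hypothesis using the template from the text."""
        return (f"I hypothesize that {self.proxy} will adequately capture "
                f"{self.target_aspect} for the purposes of {self.goal}.")

    def risk(self) -> Level:
        """Combine likelihood and impact with a hypothetical 3x3 risk matrix."""
        # Invert the scale: a LOW chance of capturing the target is a HIGH chance of error.
        error_likelihood = 4 - self.capture_likelihood
        score = error_likelihood * self.impact  # ranges over 1..9
        if score >= 6:
            return Level.HIGH
        if score >= 4:
            return Level.MEDIUM
        return Level.LOW

    def statement_of_use(self, tolerance: Level = Level.MEDIUM) -> str:
        """Step 4: accept the proxy if its risk is within tolerance, else suggest revision."""
        if self.risk() <= tolerance:
            return f"Adequate: {self.proxy} may be used for {self.goal}."
        return f"Inadequate: revise the goal(s) or find a better proxy than {self.proxy}."


# Hypothetical example: a moderately trusted proxy feeding a high-stakes decision.
lights = ProxyAssessment(
    proxy="nighttime light intensity",
    target_aspect="local economic activity",
    goal="ranking neighborhoods for infrastructure revitalization",
    capture_likelihood=Level.MEDIUM,
    impact=Level.HIGH,
)
print(lights.hypothesis())
print(lights.statement_of_use())
```

Here the moderate likelihood combined with high impact lands in the high-risk cell, so the proxy is flagged for revision. A proxy with a low likelihood of capturing its target but only low impact would instead come out as low risk, mirroring the low-stakes case discussed earlier. The `tolerance` parameter is where context-dependent value judgments about acceptable risk would enter.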

In projects that use multiple proxies, these questions can be asked repeatedly and act as the gates in a Stage-Gate system, in which a project moves through a series of stages separated by gates, each a decision point for whether the project should proceed, be revised, or cease (Cooper, 2008).
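
A minimal sketch of that gating logic follows, assuming each stage relies on one proxy whose risk has already been assessed as "low", "medium", or "high" (for example, with the four questions). The mapping from risk to decision is a hypothetical policy rather than one taken from Cooper (2008), and the stage names are invented for illustration.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    REVISE = "revise"
    CEASE = "cease"


def gate(proxy_risk: str, revisable: bool) -> Decision:
    """Decide at one gate from the assessed risk of the proxy used in the preceding stage.

    The mapping below is a hypothetical policy: low risk proceeds, high risk with
    no way to revise the goal ceases, and everything else is sent back for revision.
    """
    if proxy_risk not in ("low", "medium", "high"):
        raise ValueError(f"unknown risk level: {proxy_risk!r}")
    if proxy_risk == "low":
        return Decision.PROCEED
    if proxy_risk == "high" and not revisable:
        return Decision.CEASE
    return Decision.REVISE


def run_stage_gate(stages):
    """Walk the stages in order and stop at the first gate that does not say PROCEED.

    `stages` is a list of (stage_name, proxy_risk, revisable) tuples; returns the
    (stage_name, Decision) pairs for every gate that was actually evaluated.
    """
    log = []
    for name, proxy_risk, revisable in stages:
        decision = gate(proxy_risk, revisable)
        log.append((name, decision))
        if decision is not Decision.PROCEED:
            break
    return log


# Hypothetical stages of an infrastructure project.
stages = [
    ("site screening", "low", True),
    ("condition assessment", "medium", True),
    ("prioritization", "high", False),
]
for name, decision in run_stage_gate(stages):
    print(f"{name}: {decision.value}")
```

Stopping at the first non-proceed gate reflects the idea that a revised or abandoned stage should be resolved before later stages build on its proxy-based conclusions.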

Figure: A Stage-Gate model of the infrastructure revitalization project, including key questions to guide decision-making based on the proxies used in each stage.

Proxy variables play an important role in timely decision-making because they are relatively easy to measure. Where it is impossible or impractical to compare proxies directly with target variables, scientists can incorporate inductive risk into their justifications for using proxy measures to draw certain conclusions in different contexts. Evaluating whether a proxy is appropriate for making claims about a target variable through inductive risk considerations, in the absence of comparative statistical validation, gives scientists a general sense of how much uncertainty is present in the project's results and which aspects they need to be particularly sensitive to. In this way, scientists can use risk analysis to decide whether a proxy is “good enough” for the research goal(s) at hand, instead of shying away from proceeding because value judgments are required.

References:

Allenby, B. (2021, June 28). Solving Problems Rationally in an Irrational World. https://medium.com/infrastructure-in-the-anthropocene/solving-problems-rationally-in-an-irrational-world-474d7c28a9dd

Axtell, G. (2017). Inductive Risk. PhilPapers. https://philpapers.org/browse/inductive-risk

Cooper, R.G. (2008). Perspective: The Stage-Gate Idea-to-Launch Process — Update, What’s New, and NexGen Systems. Journal of Product Innovation Management, 25, 213–232. https://doi.org/10.1111/j.1540-5885.2008.00296.x

Douglas, H. (2017). Why Inductive Risk Requires Values in Science. In K.C. Elliott & D. Steel (Eds.), Current Controversies in Values and Science. Routledge.
