The Covid-19 pandemic is clearly a world-wide phenomenon which can be observed and studied from many different angles. We have seen many opinions and analyses on its economic impact, its biological properties, its impact on digitization, and even on the future of society, given technological developments that enable political actions once considered illegitimate.
In this series of posts, I want to highlight some of the questions and remarks that I had in mind while watching the rise of this pandemic, and while listening to people around me and on the internet. Clearly, this pandemic causes much worry about our families, friends, and all other individuals (with concerns ranging from potential loss of a job to death). Without underestimating these considerations, I would like to use this platform to elaborate on some other technical and scientific questions which arise from our situation.
Next, I will make a bit of a detour into systems dynamics and control systems theory, since I refer to some of their concepts throughout this post series.
Control Systems and Response to Pandemic
I am not an expert in control systems. I do not have an engineering degree, but I do have a general understanding of systems dynamics. However, what we have been dealing with since the beginning of the Covid-19 outbreak can be put simply in terms of this approach.
A quick and dirty idea of a control system is as follows:
- Measurement: A device measures the external state of its environment.
- Logic of Operation: The measurement value is passed on to an estimator or a logic system. If this system has some sort of memory and a processing unit, it can run operations on the value for several purposes, such as noise removal or prediction of the next state.
- Optimal Action: The device initiates the “optimal action” defined by its designer. Most of the time this optimal action is aimed at maintaining the external state at a preset value.
The simplest example of such a system is found in heating/cooling systems, namely thermostats. These devices measure the temperature, run some estimations, and determine the temperature of the air blown into the external system. At each cycle, the measured temperature is compared with a desired temperature set by the user.
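As a sketch, the three steps can be wired together for a hypothetical thermostat. Everything here (the names, the moving-average window, the gain) is illustrative, not how any real thermostat is implemented:

```python
def thermostat_step(measured_temp, target_temp, history, gain=0.5):
    """One control cycle: (1) take a measurement, (2) estimate, (3) act."""
    # (2) Estimation logic: a short moving average acts as simple noise removal.
    history.append(measured_temp)
    recent = history[-3:]
    estimate = sum(recent) / len(recent)
    # (3) "Optimal action": heat or cool in proportion to the estimated gap.
    action = gain * (target_temp - estimate)
    return estimate, action

# One cycle: room at 18 °C, target at 21 °C, yielding a positive (heating) action.
history = []
estimate, action = thermostat_step(18.0, 21.0, history)
```

Each cycle, the action feeds back into the environment, the next measurement reflects it, and the loop repeats: exactly the cycle described above.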
Why do we care about this concept? Because our response to the Covid-19 outbreak has so far been very similar to the control systems cycle. At some point, the outbreak became an issue important enough that, at the moment, all our attention is devoted to this situation.
Almost every day we discuss the (1) measurement problem: insufficient testing run at different levels in different places. Following the measurements, many institutions, from governments to news outlets, report the numbers of cases/deaths/recoveries in diverse fashions.
Hence, every day we consume information about the outbreak which is merely the output of an (2) estimation logic, which specifically involves an aggregation logic. For example, the information is often reported at the country level. We have to be aware that most of this information, in whatever aggregated or summarized form, becomes part of our everyday decision-making as well as of our everyday conversations with other people. So, the validity of this logic, or awareness of the caveats surrounding it, is crucial for the next step.
(3) Optimal action. Let's assume we measured every case precisely, and we also had the capacity to fully comprehend and remember the number of cases/deaths/recoveries at any location and time. What do we do next? While engineers spend great effort figuring this out in labs with tests, for humankind it is a little more complicated. Basically, we do not have several earths and societies to run experiments with. Yet, today, decision-makers with power, as well as every individual, are following some subjectively believed optimal action.
… and then the cycle begins again. Whatever optimal action is taken, we move on to measure its impact, benefits, and costs, and then adapt. This is not just an exercise in drawing analogies from engineering to our response. In fact, scholars of social theory have applied this logic to explain social structures [1]. At a more practical level, research on the functions of organizations has shown that this is an accurate picture of any organization as well. For those who are interested in more scholarly sources, some relevant fields are (certainly not exhaustive): performance feedback theory [2], very early applications of aspiration level theory [3], and systems dynamics and the role of feedback [4]. As such, I believe this analogy is a good framework to convey some of my ideas while avoiding being esoteric. In what follows, I approach our society as a device attempting to optimize its survival chances by running several measurements and figuring out which state of the system requires which action to reach or maintain the optimal survival chance.
I start with the first step of the cycle, namely the measurement problem. Testing for cases has been crucial for monitoring the evolution of the outbreak. At first, testing was not possible, because the apparatus had not been developed. Then, testing became a question of whether it should be done at all. Oftentimes, lack of testing was attributed to political and economic intentions. Meanwhile, lack of testing is also due to scarce resources, such as people who can run the tests or the availability of testing kits. The accuracy of the tests also plays an important role; a chicken-and-egg problem, where without testing the test we may not know what we are testing for. Luckily, our research methods have developed far enough to isolate the virus and actually establish the usefulness of the tests. However, different tests will still perform differently. Something I would like to highlight in this post is the fundamental difference between these three causes.
- The accuracy of the tests is a familiar concept in medicine, and in decision-making generally. There are two typical errors, namely false positives and false negatives. Occasionally, a test will return a positive result even if the subject was actually a negative case, and vice versa. If we assume that these errors are as good as random, systems dynamics would term this simply noise. To keep things simple, let's assume so. Random noise is part of every system.
- Scarce resources causing lack of testing is a situation where we find capacity limitations on our measurement. This, too, is usually taken to be harmless. Those who are familiar with statistics encounter this problem even without a pandemic. With limited money, time, and energy, one usually collects data on only a part of the population, a process called sampling. For example, statistics bureaus survey only a limited number of people ahead of elections to draw conclusions about the general population. One big problem arising from scarce resources is the selection problem: we may have chosen to sample only a specific part of the population, such as running Covid-19 tests only in big cities. We will then only understand the data generated by people of big cities, with their particular demographics. If a Covid-19 test is expensive and individuals have to pay for it, we may get data, but only about those who can afford it or are willing to spend that money. As was realized very early in the development of statistics, the best remedy for this problem is randomly picking people from the population and testing them. In short, we have to be well aware that where and how these tests are conducted gives us specific information, unless testing is done randomly on a population-wide basis.
- Lastly, either intentionally keeping test results private or not authorizing tests for wide use both constitute ways of tampering with the measurement design. There is no single way of tampering with a design, hence it is not easy to model the consequences of these actions. However, what we should keep in mind, thanks to the analogy, is that the intentions arise from the logic associated with the optimal action. For example, China is accused by many of keeping the early rapid growth of cases hidden. In such cases, the measurement results can be anything, and the main question is then whether governments are actually acting with other goals in mind. As such, I do not include this debate in this part of the post.
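The selection problem described above can be simulated. The city/rural split and infection shares below are made-up numbers, chosen only to show how a convenience sample (testing only in big cities) diverges from a random one:

```python
import random

random.seed(0)

# Hypothetical ground truth: 30% of people live in cities with a 10% infection
# share; 70% live in rural areas with a 2% share. Population-wide rate: 4.4%.
population = ["city"] * 30_000 + ["rural"] * 70_000
infection_rate = {"city": 0.10, "rural": 0.02}

def positive_share(sample):
    # Simulate a perfect test on each sampled person.
    return sum(random.random() < infection_rate[p] for p in sample) / len(sample)

# Convenience sample: 5,000 tests, all run in cities.
city_only = positive_share([p for p in population if p == "city"][:5_000])
# Random sample: 5,000 tests drawn from the whole population.
random_sample = positive_share(random.sample(population, 5_000))
```

The city-only estimate hovers near 10%, while the random sample recovers something close to the true 4.4%: same test, same budget, very different information.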
A categorical note on measurement device accuracy: While we are usually accustomed to thinking that devices can misidentify cases, this also holds for the human brain. Today, many doctors face the decision of categorizing a patient as a case of Covid-19, or even a death by Covid-19. As many psychologists and sociologists have shown, individuals need to think in terms of buckets to facilitate thinking and communication [5,6,7]. These buckets are called representations or categories, respectively. For example, fruit is a bucket (category) for apples, oranges, bananas, etc. However, some fruits, such as the tomato, are often misidentified as vegetables. Hence, people's categorization accuracy matters for the measurement device as well. Beyond testing for cases, this is especially vital when it comes to the reporting of deaths.
Measurement of deaths is without doubt very accurate. However, attributing a death to Covid-19 rather than to some other disease may be wrong. Recent debate has focused on whether all deaths in the presence of Covid-19 (such as deaths due to heart conditions while having Covid-19) should be reported as relevant deaths or not (even for identifying cases!). People may have diverse justifications either way. We could, for example, discuss whether Covid-19 was the necessary factor initiating death (more specifically, whether its absence would have led to survival of the patient despite other diseases) or the sufficient factor (its presence certainly leads to death). While generous reporting measures may rely on arguments of necessity, precise and focused reporting measures may argue on the side of sufficiency. Which reporting behavior is optimal will either be determined by social debate or by looking at which one leads to better outcomes in terms of taking the optimal action. Too much reporting may lead to hysteria and excessive costs; not enough may lead to complacency and a lack of necessary reaction. Once again, those who are interested in deeper discussions of necessity/sufficiency should refer to [8], and of how causal attributions affect our thinking and communication patterns to [9].
Illustrating the Impact of Measurement Accuracy Problem
To illustrate the impact of the measurement problem, we have to set up a baseline. For this, I use the widely known SIR model (S = susceptible, I = infected, R = recovered). Check out this video for a graphical description, and click here to skip the math coming up. The equations are as follows:
Δs = -α δ (s ⋅ i)
Here, s = S/N is the percentage of susceptible people, where N is the population size. Similarly, for i and r:
Δi = α δ (s ⋅ i) - β δ i
Δr = β δ i
α and β denote the infection and recovery rates, respectively. It is important to keep in mind that neither rate is independent of social factors, e.g. social distancing and healthcare capacity. Δs, Δi, and Δr denote the changes in these percentages. δ is a mathematical trick for adding the dimension of time into the system. Hence, if the infection rate is 3 people per day (α = 3), then Δs corresponds to the change in the susceptible percentage in a day, δ = 1 day. The basic intuition behind this model is that the probability of a susceptible person getting infected by an infected person is proportional to the product of the numbers of susceptible and infected people (S ⋅ I). If there are no infected or no susceptible people, then no further infection can take place. The underlying assumption (which will be called into question in the next post) is that every susceptible person meets every infected person once in the given time frame (δ = 1 day, for example).
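As a minimal sketch, the update rules above can be simulated directly; α, β, and the initial infected share below are illustrative values, not fitted to any real data:

```python
def simulate_sir(alpha=0.4, beta=0.1, i0=0.001, steps=200):
    """Discrete-time SIR with delta = 1 time step; rates are illustrative."""
    s, i, r = 1.0 - i0, i0, 0.0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = alpha * s * i   # the α(s·i) term
        new_recoveries = beta * i        # the β·i term
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory

traj = simulate_sir()
```

By construction, s + i + r stays at 1 in every step; the infected share rises to a peak and then decays, which is exactly the curve shape discussed next.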
In these models, what we typically observe is initial exponential growth, the epidemic outbreak, and then a recovery phase, in which individuals recover at a slower rate. See the following figure:
Currently, we observe exponential growth all over the world. Using these graphs as the baseline reality, let's now look at how the measurement problem may impact our perception. In this case, I assumed a true positive rate of 0.9 and a true negative rate of 0.9. With these perhaps acceptable levels of measurement accuracy, we see that the real effect is amplified when the real percentage is below 0.5. Meanwhile, if we go beyond 50%, the measurement errors diminish the effect. Also, better measurements are clearly more accurate, but more so in the recovery phase than at growth and peak.
What is important to keep in mind is that we tested everybody in this population, and everybody meets everybody once! This means that if we had some extra knowledge about a community that is most unlikely to be infected, why would we run a test there in the first place? The graph suggests that this will amplify the results and lead to thinking there are more cases than anticipated. On the other hand, false negatives strongly affect the numbers in high-risk populations. In cities with major outbreaks, even if we test a lot, we may find smaller numbers simply due to measurement error.
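For the measurement step, the expected observed positive share can be written down in closed form: true positives among the infected plus false positives among the healthy. A small sketch with the 0.9/0.9 rates used above:

```python
def observed_share(true_share, tpr=0.9, tnr=0.9):
    """Expected share of positive test results given the true infected share."""
    # True positives among the infected + false positives among the healthy.
    return tpr * true_share + (1.0 - tnr) * (1.0 - true_share)
```

With symmetric 0.9 rates, a true share of 10% appears as 18% (amplified), 50% appears as exactly 50%, and 80% appears as 74% (diminished), matching the crossover at 0.5 described above.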
Take-away message: Measurement errors in testing may inflate the numbers when there aren't many cases, or deflate them when there are many (relative to the total population size). Please keep in mind the accuracy of the tests! Check out the links below.
Are Coronavirus Tests Accurate? - MedicineNet Health News
Fast, portable tests come online to curb coronavirus pandemic
Illustrating the Impact of Resource Scarcity Problem
The case of resource scarcity is more complex to analyze. While there are many ways of looking at it, I chose to focus on two limitations:
- What happens if we test a representative sample of a population, but only at a limited rate? This tends to be a problem due to the limited supply of test kits, etc.
- What happens if we test a sub-population? Our tests will mostly be informative about the impact on that sub-population, but not the rest. Combined with the aggregation problem discussed in the next post, this is an important factor to keep in mind.
Testing at a limited rate
Now let's imagine that we have the perfect test and that the classification of patients is not erroneous. Today, one of the biggest limitations any country suffers from is the limited number of testing kits it has. The trade-off that testing kit producers face is between the speed at which results are obtained and their accuracy. And, clearly, better test kits tend to cost more. Here, we will focus only on how many people can be tested in a day, and look at how that impacts the information we receive.
Before we go into analyzing some plots, I would like to talk briefly about how exponential functions work and how to interpret them under different ways of plotting. For a deeper and visual explanation, please refer to this video; if you are already familiar, then move on to this paragraph.
Exponential functions are mathematically written as: f(x) = A ⋅ exp(Bx). We have our variable x, which is the horizontal dimension of a plot. A and B, meanwhile, describe the shape of the exponential function. See the left figure in the following plot:
The exponential growth pattern is clear. All lines start from either 1 or 2, depending on the value of A, and they grow either slowly or fast, depending on the value of B. Hence, A and B are responsible for determining the initial point and the speed of growth, respectively. However, when we see graphs of cases in the media (see for example), we usually see them plotted with an exponentially growing vertical axis. The plot on the right side shows exactly the same exponential functions plotted this way. The roles of A and B did not change; they still determine the initial point and the rate of growth. However, the functions now become lines, because the vertical axis grows exponentially as well. What is important here is that we can now read off the rate of growth much more easily than from the left plot: the red and orange functions have half the slope, hence half the rate of growth, of the green and blue ones.
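This slope-reading trick can be checked numerically: taking the logarithm of f(x) = A ⋅ exp(Bx) gives the straight line ln f(x) = ln A + B·x, whose slope is exactly B (the values of A and B below are illustrative):

```python
import math

A, B = 2.0, 0.1                  # illustrative intercept and growth rate
xs = [0, 10, 20]
# On a log vertical axis, the exponential becomes the line ln A + B*x.
log_values = [math.log(A * math.exp(B * x)) for x in xs]
# The slope of that line recovers the growth rate B directly.
slope = (log_values[-1] - log_values[0]) / (xs[-1] - xs[0])
```

So whenever a case curve looks straight on a log-scale plot, its slope is the growth rate B, regardless of where the curve starts (A).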
Now, let's apply this knowledge to our analysis of how the rate of testing may impact what we see in reality. First, we zoom in on the growth phase, meaning we focus on the first 500 time steps of this graph. The blue line in the plot on the left, below, shows how the number of infections increases. As expected, the peak is at 500.
The clear first impression is: if there is lower capacity to test, fewer cases will be reported! That is intuitive, nothing new. When the real peak hits a little more than 80%, the testing capabilities lead to numbers around 20%. While the number of cases is clearly important, when we talk about how infectious the virus is, we are trying to infer how fast it grows; more specifically, we want to know B. Just as the FT plot also shows, the growth rates can be easily compared with the dashed line plotted on the exponential plot, on the right side.
Here, we find a hidden lesson: testing capacity does not influence our knowledge of how fast the virus spreads! As you can see, testing capacity only shifts the curves around, meaning it only changes A. Even with limited testing capacity, we can infer how fast the virus spreads and extend our knowledge to some actual numbers. For example, by testing our population at 25% capacity, we found that the dashed line has slope 0.19, a 19% spread rate. To adjust for the capacity, all we have to do is compare the initial points of both curves. Estimations show that the blue line initially started at 0.3% and the red line at 0.1%. Hence, at least until the 300th time step, the period in which the line can be followed nicely, we would expect the number of real cases to be three times the reported number. As we reach the peak, the line is no longer very informative.
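A small sketch of this hidden lesson, using an assumed 25% testing capacity and the illustrative 19% spread rate: scaling the counts by the capacity changes the intercept A but not the fitted growth rate B:

```python
import math

B = 0.19                                    # illustrative spread rate per step
true_cases = [0.003 * math.exp(B * t) for t in range(50)]
reported = [0.25 * c for c in true_cases]   # hypothetical 25% testing capacity

def fitted_rate(series):
    # Slope of log-counts between the first and last points.
    return (math.log(series[-1]) - math.log(series[0])) / (len(series) - 1)
```

The reported curve sits below the true one by a constant factor (here 4x), but both have the same slope on a log plot, so the growth rate survives limited testing intact.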
Take-away message: Limited testing capacity leads to lower absolute numbers, but informs us accurately about the rate of spread, provided the sample is taken randomly from the whole population. Try to interpret the graphs found on the FT tracking page!
Coronavirus tracked: the latest figures as the pandemic spreads | Free to read
Testing a sub-population
What would happen if we test only a subgroup of people, such as those who can afford the tests? This is a more complex issue, as it involves looking into the structure of our society: those who can afford the tests may also have the privilege to stay home or work from home for longer. From the opposite perspective, those who cannot afford the tests may be exposed to less hygienic environments. Especially with increasing infections in countries lacking certain infrastructure (see, for example, parts of Africa), even access to water will matter. The illustration in this section will be fundamental for the next post, as aggregation implies mixing different results together without considering how they were generated.
Sub-populations have been a rather silent but central topic during this pandemic.
- First and foremost, the distribution across age groups has been central. Older people face a higher risk of death than young ones. Being overwhelmed, hospitals make decisions (mostly based on age) regarding who will go into an intensive care unit and who will not. For example, such age-related arguments have been central in explaining why Italy has higher death rates, or in arguing that a national lockdown is suboptimal since young people should be able to survive the virus.
- In terms of economic power, the cost of testing in the USA was initially $1,300 per person (see the transcript of the discussion between Rep. Katie Porter and the CDC Director). Looking at the 2019 poverty guidelines, for households living right at the poverty line this corresponds to 1) 1.25 monthly salaries for a person in a 1-person household, 2) 1.85 monthly salaries for a person in a 2-person household, 3) 2.2 monthly salaries for a person in a 3-person household, and so on. The most conservative implication is that, assuming 11% of the USA lives in poverty, we can expect at least that many people to be unable to afford a test.
- On a rather different note, the national lockdowns are impacting various groups of people differently. Those in the service industry suffer much more than others, as demand has vanished. Unemployment is expected to rise drastically under the given policies, which is also expected to impact poorer people differently than wealthier ones.
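The "monthly salaries per test" figures in the second point can be reproduced, assuming the 2019 US federal poverty guidelines ($12,490, $16,910, and $21,330 per year for 1-, 2-, and 3-person households) and taking per-person monthly income at the poverty line:

```python
# Cost of one test in monthly per-person incomes at the 2019 poverty line.
test_cost = 1300
annual_guideline = {1: 12_490, 2: 16_910, 3: 21_330}   # by household size

ratios = {
    size: test_cost / (annual / size / 12)   # test cost / monthly income per person
    for size, annual in annual_guideline.items()
}
```

Dividing the household guideline by household size is what makes the ratio grow with household size: the same test swallows ever more of each person's monthly income.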
The question, then, is how this focus on a sub-population would impact our understanding of how the virus spreads. First, the earlier investigation showed that if we randomly choose people from the whole population, our estimate of the spread rate is not bad, even if testing is limited. This indicates that if the whole population faced the same infection and recovery rates, we would not expect any change in our measurements. More concretely, if the poor and the wealthy faced the same rates of infection and recovery, then estimates based only on the wealthy would inform us well about the infection rate as it applies to poorer people too. Maybe not the absolute numbers, but the rate, for sure. However, it gets tricky when the rates depend on who can be tested and who cannot.
Once again guided by the SIR model, we can capture some of these effects by looking at how individuals' positions in the social world impact the infection rate α and recovery rate β. To keep the illustration simple, we will assume that half of the population is the baseline (group 1), meaning it follows exactly the dynamics we have built so far. The other half (group 2) falls into one of four cases: 1) they are the same as the baseline, 2) they have a higher infection rate and the same recovery rate, 3) they have a lower recovery rate and the same infection rate, 4) they have both a higher infection rate and a lower recovery rate. By studying these four cases, we can then bring in our social knowledge to start interpreting measurements in context. For each case (except the first, as it adds nothing new to the model), I will present i) measurements focused only on group 1, ii) measurements focused only on group 2, and iii) measurements done 50/50 from both groups.
Our first assumption is that these groups do not meet each other. Hence, infections happen only within each separate group. The results are as follows:
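A sketch of this two-group setup, with illustrative rates (the default gives group 2 a higher infection rate, i.e. case 2); each group runs its own SIR dynamics with no mixing, and the 50/50 measurement is just the average of the two infected shares:

```python
def simulate_two_groups(rates=((0.4, 0.1), (0.6, 0.1)), i0=0.001, steps=300):
    """Two non-interacting groups, each with its own (alpha, beta).

    Returns per-step infected shares: (group 1, group 2, 50/50 mixed).
    """
    groups = [[1.0 - i0, i0, 0.0] for _ in rates]   # [s, i, r] per group
    measured = []
    for _ in range(steps):
        for g, (alpha, beta) in zip(groups, rates):
            new_inf, new_rec = alpha * g[0] * g[1], beta * g[1]
            g[0] -= new_inf
            g[1] += new_inf - new_rec
            g[2] += new_rec
        measured.append((groups[0][1], groups[1][1],
                         0.5 * (groups[0][1] + groups[1][1])))
    return measured

m = simulate_two_groups()
```

With these assumed rates, the higher-infection group peaks earlier, and the mixed measurement always sits between the two group curves, which is what drives the moving peak discussed in the summary.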
To summarize what the plots are telling us:
- Case 2: When one sub-population has a higher infection rate, 1) the position of the peak moves to earlier times compared to the baseline, 2) the measured infection rate first follows the higher infection rate, then comes to resemble the lower one; it is not easy to infer the exact rate. The recovery rate stays constant.
- Case 3: When one sub-population has a lower recovery rate, 1) the position of the peak remains the same, 2) the measured recovery rate is simply the average.
- Case 4: The last case is the combination of both, due to the assumption that the sub-populations do not interact.
So far, I intentionally compared two populations, where one was always in a worse situation than the other in terms of infection and recovery rates. This is a fair assumption, as most of the time those who get infected more also recover more slowly. So case 4 is the most accurate representation of the poverty example, if the poor and the wealthy do not interact with each other. Due to the independence of the two populations, conclusions drawn from cases 2 and 3 can be combined, which makes case 4 the most important outcome of this analysis.
Take-away message: When measurements are done on a population with two different sub-populations, the outcomes will reflect different dynamics at a given time. Most importantly: the position of the peak moves, and the measured infection rate may increase or decrease over time, simply due to mixed measurement.
A social note: From a social perspective, I would like to note that this crisis is making us much more aware of how social structures are built, where the boundaries of social strata lie, and how capital is distributed among them. In the coming months, people will have to consider much more deeply what these salient features of our society mean for them; we should expect a new society with a new mindset after this pandemic ends. See, for example, the Tweet by author and doctor Rachel Clarke:
I do not believe that it was “revealed”, as I believe nobody would have claimed that these were not key workers. However, the current social system allowed people to pay less attention to how important they are and move on with life. In a non-crisis period, people would question neither why the system is sustained nor how. If asked, they would acknowledge it, but not spend their everyday lives wondering about it; this is a psychological phenomenon. Now, in a crisis, we are worried about very basic things like hygiene, healthcare, food supply, internet, etc. This corresponds to Weick et al.'s [10] notion of a period of arousal, in which people experience more emotions than usual and start paying attention to those who were left unattended all along:
If emotion is restricted to events that are accompanied by autonomic nervous system arousal (Berscheid and Ammazzalorso, 2003, p. 312; Schachter and Singer, 1962), if the detection of discrepancy provides the occasion for arousal (Mandler, 1997), and if arousal combines with a positive or negative valenced cognitive evaluation of a situation (e.g. a threat to well-being or an opportunity to enhance well-being), then sensemaking in organizations will often occur amidst intense emotional experience.
In this analysis, I assumed that resource scarcity (of labor, skill, and capital, for example) simply exists. More complicated models look at how the progress of an epidemic also impacts resources, introducing complex dynamics. For example, check out the following work on the impact of economic resource scarcity on dealing with epidemics:
Disease-induced resource constraints can trigger explosive epidemics
In this post, I tried to offer some social and technical perspectives on the evolution of the Covid-19 pandemic. We started with an elaboration of control systems theory, which roughly consists of three main steps: measurement, logical operations, optimal action. We then looked at several ways mismeasurement can occur, hence ways of being informed inaccurately, which clearly has a cascading impact on the subsequent steps of logical operations and optimal action. While I made some very crude assumptions to facilitate the illustrations, we already filtered out some simple (even to some extent obvious) insights which are easy to forget when facing a great amount of news and debate every day:
- Measurement device inaccuracy may inflate or deflate case numbers.
- Temporal limitations to testing (e.g. how many people can get tested per day) will misinform us about the total number of cases, but may still inform us correctly about the rate of spread.
- Spatial (or social) limitations to testing (e.g. geographic accessibility or affordability of testing for some sub-populations) will impact the position of the peak and the measured rate of spread.
These insights should be kept in mind when we look at the variety of plots and hear new numbers every day. Let's take them in not only after having added a grain of salt, but also after having added a grain of knowledge.
Meanwhile, I also tried to offer references to scholarly articles and sources. These not only ground our “thermostat” analogy for approaching the pandemic, but will hopefully also raise interest in some readers. In short, these remarks were:
- Homeostatic systems, performance feedback and aspiration level theory, and systems dynamics approaches to social systems, including organizations, relate to our analogy.
- How people assign their observations into buckets, namely how they assign entities into categories, has a direct impact on measurement errors, as much as technological errors do. For those familiar with artificial intelligence, this is the fundamental problem of feature engineering, which introduces errors at the measurement level even for very powerful machines.
- Finally, the pandemic crisis makes many taken-for-granted aspects of the non-crisis status quo questionable once again; it forces many people to revise their basic assumptions, which sometimes may have very deep consequences for social dynamics as a whole. See [11] for a linguistic discussion of this.
In the next post, we will look at how aggregation, operations of putting data together, may impact the information passed on to decision-makers, whether politicians or individuals.
[1] Stinchcombe, A.L., 1987. Constructing social theories. University of Chicago Press.
[2] Greve, H.R., 2003. Organizational learning from performance feedback: A behavioral perspective on innovation and change. Cambridge University Press.
[3] Starbuck, W.H., 1963. Level of aspiration theory and economic behavior. Behavioral Science, 8(2), pp.128–136.
[4] Rahmandad, H., Repenning, N. and Sterman, J., 2009. Effects of feedback delay on learning. System Dynamics Review, 25(4), pp.309–338.
[5] Rosch, E. and Mervis, C.B., 1975. Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7(4), pp.573–605.
[6] Murphy, G.L., 2002. The big book of concepts. MIT Press.
[7] Hannan, M.T., Le Mens, G., Hsu, G., Kovács, B., Negro, G., Pólos, L., Pontikes, E. and Sharkey, A.J., 2019. Concepts and categories: Foundations for sociological and cultural analysis. Columbia University Press.
[8] Pearl, J. and Mackenzie, D., 2018. The book of why: The new science of cause and effect. Basic Books.
[9] Barsalou, L. and Hale, C., 1993. Components of conceptual representation: From feature lists to recursive frames.
[10] Weick, K.E., Sutcliffe, K.M. and Obstfeld, D., 2005. Organizing and the process of sensemaking. Organization Science, 16(4), pp.409–421.
[11] Toulmin, S.E., Rieke, R.D. and Janik, A., 1984. An introduction to reasoning. Macmillan.