Philip E. Tetlock on Forecasting and Foraging as a Fox (Ep. 93)
Accuracy is only one of the things we want from forecasters, says Philip Tetlock, a professor at the University of Pennsylvania and co-author of Superforecasting: The Art and Science of Prediction. People also look to forecasters for ideological assurance, entertainment, and to minimize regret, such as the regret caused by not taking a global pandemic seriously enough. The best forecasters aren’t just intelligent, but fox-like integrative thinkers capable of navigating values that are conflicting or in tension.
He joined Tyler to discuss whether the world as a whole is becoming harder to predict, whether Goldman Sachs traders can beat forecasters, what inferences we can draw from analyzing the speech of politicians, the importance of interdisciplinary teams, the qualities he looks for in leaders, the reasons he’s skeptical machine learning will outcompete his research team, the year he thinks the ascent of the West became inevitable, how research on counterfactuals can be applied to modern debates, why people with second cultures tend to make better forecasters, how to become more fox-like, and more.
Listen to the full conversation
You can also watch a video of the conversation here.
Read the full transcript
TYLER COWEN: Today, I am speaking to Philip Tetlock, who, quite simply, is one of the greatest social scientists in the world. Up until now, we’ve been doing Conversations with Tyler face-to-face. But for obvious reasons, Philip is in Philadelphia. He teaches at the University of Pennsylvania. And I’m here in Arlington, Virginia.
Let’s just jump right into it. First question, Philip: with our forecasters, do we want accuracy or do we want them to be a kind of portfolio to make us more aware of extreme events and possibilities?
PHILIP TETLOCK: We want a lot of things from our forecasters, and accuracy is often not the first thing. I think that we look to forecasters for ideological reassurance, we look to forecasters for entertainment, and we look to forecasters for minimizing regret functions of various sorts. We would really regret not having anticipated X, Y, or Z, so we want to pump up the probabilities of those things.
COWEN: But if we take, say, the coronavirus — if we had had a few more extreme nuts who were maybe wrong most of the time but insisting that we needed to fear the next pandemic, wouldn’t we have been better off with that kind of portfolio, and thus, we don’t actually want more accuracy from our forecasters?
TETLOCK: Well, in some sense we already did have that portfolio. It was a mainstream position among epidemiologists for the last 20 years or so that you had a recipe for disaster. David Epstein, the guy who recently wrote Range, a very interesting guy — you may have had him on your show, I don’t know — but he recently quoted himself from a 2007 newsletter that he wrote, and he said something like, “The presence of a large reservoir of SARS-like viruses among horseshoe bats, combined with a culture of eating exotic meat, is a time bomb.”
COWEN: But what’s the problem —
TETLOCK: That was in microbiology books. In the first half decade of the 21st century, it was common knowledge. And indeed, even before SARS 1, even before the first SARS outbreak — some epidemiologists prefer to call COVID-19 “SARS 2” — but even before SARS 1, epidemiologists were acutely aware of this, so it’s not as though we didn’t have it in our portfolio. We did.
COWEN: So those forecasters maybe weren’t entertaining enough? Isn’t the margin we want to work on, then, making our better forecasters more entertaining rather than more accurate? Yes or no?
TETLOCK: [laughs] Well, the signal-to-noise ratio isn’t going to be great. I can assure you of that because there are plenty of people who are naturally more entertaining than epidemiologists.
COWEN: But maybe the whole portfolio needs to be more vivid rather than trying to fine-tune the accuracy of particular parts of it. Right?
TETLOCK: We have lots of people competing in the marketplace of ideas for attention, and that’s a hard competition for scientists to beat.
COWEN: What do you think of the argument that science only exists at all because most scientists are overconfident? If they were rational Bayesians, they would just latch onto the opinions of the smartest and best-trained people before them. There’s only progress precisely because people are making forecasting mistakes.
TETLOCK: [laughs] Right. You’ll hear versions of that argument, I suppose, on Wall Street as well.
There’s a great deal of truth to it. I certainly have been guilty of overconfidence at many junctures in my career, thinking I’m going to be able to take on things that looked impossible and often turned out to be impossible. Most projects that most scientists embark on, I think, don’t succeed. It does take a certain amount of quasi-irrational persistence.
COWEN: As with Columbus, right? Or the founding of the United States arguably was irrational to break away from the British Empire, right? Which was doing pretty well back then.
TETLOCK: It seemed like a risk-seeking move.
COWEN: But say I set up an alternate research program, and I sought to take forecasters and (A) make them more entertaining, and (B) maybe I’d give them uppers, so they were more overconfident. Would that do the world good?
TETLOCK: Well, you could certainly have induced the epidemiologists who are worried about the horseshoe bats in central China or other possible sources of zoonotic viruses. You could certainly have induced them to pump up their probabilities. And you would, of course, have started to run into a problem of crying wolf. They’d been saying there’s a 30, 40, 50 percent chance of a viral leap into human beings each year, and it didn’t happen. Didn’t happen. Didn’t happen. You get a crying-wolf effect, right?
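The crying-wolf effect Tetlock describes has simple arithmetic behind it, which can be sketched in a few lines. The assumption of independent years is a simplification for illustration, not something stated in the conversation:

```python
# A minimal sketch of the "crying wolf" arithmetic: a 30 percent annual
# probability fails to verify in most individual years, yet implies near
# certainty over a couple of decades. Assumes independent years, which is
# a simplifying assumption for illustration only.

def prob_at_least_once(p_annual, years):
    """Probability an event with annual probability p_annual occurs at least once."""
    return 1 - (1 - p_annual) ** years

# In any single year, the forecast usually fails to verify . . .
single_year = prob_at_least_once(0.3, 1)
# . . . but over 20 years, the event becomes nearly certain.
twenty_years = prob_at_least_once(0.3, 20)
print(single_year, twenty_years)
```

The tension is that observers judge the forecaster year by year, where the event mostly does not happen, while the long-run implication of the same numbers is near certainty.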
On financial markets and super predictors
COWEN: Right. How do you think about financial markets in relation to your work on super predictors? Are financial markets, in essence, super predictors to begin with? Or can super predictors, on average, beat financial markets?
TETLOCK: Oh, boy! [laughs] We’ve played with prediction markets in our work with the intelligence community, going all the way back to Admiral Poindexter and the original DARPA effort to launch prediction markets inside the intelligence community. I think your colleague, Robin Hanson, was involved with some of that work. I would say around 2003, 2004 — in that range.
We’ve also been working with the prediction markets in parallel with forecasting tournaments, and there are pros and cons to each method of eliciting judgments.
But the IC, the intelligence community, doesn’t let us make those markets deep and liquid the way they are on Wall Street. People are essentially competing for reputational points the way they are in forecasting tournaments, and monetary prizes are either small or nonexistent. So most economists, I think, would not consider that to be a very robust test of the efficacy of prediction markets.
And when prediction markets fall short — and they typically do not perform quite as well as forecasting tournaments — it’s hardly a decisive rebuke of the market mechanism for eliciting forecasts.
There are lots of very powerful institutional actors like Goldman Sachs and so forth that are continually trying to do exactly what you describe. It’s an ongoing process. You’re an economist. I don’t need to tell you that.
COWEN: Sure. There are implicit prediction markets in, say, coronavirus, say, prices of airline stocks, right? And those are very liquid. They’ve been very thickly traded lately.
TETLOCK: They have indeed.
COWEN: If you took your 10 best superforecasters and brought them in with the hedge fund people at Goldman Sachs, and you all sat down together, who would be teaching whom?
TETLOCK: [laughs] It’s an interesting experiment. My project manager from the first set of forecasting tournaments, Terry Murray, founded a company, Good Judgment Incorporated, which does things like that. So that’s a proprietary venture. You’d probably want to talk to Terry about how successful or not successful they’ve been in doing that. I think they’ve had some success.
It’s extremely hard to do that. It’s nontrivial. I think there’s a good deal of similarity in the cognitive ability, cognitive-style profiles of superforecasters and the kinds of people you see on the staffs of Goldman Sachs. It would be a tight race.
COWEN: What about the sports betting market? Do you think there are inefficiencies there because they don’t have enough superforecasters?
TETLOCK: I’m not an expert on sports betting. Better to talk to Nate Silver about that.
COWEN: Let me put the question more generally. There are many markets out there which predict something. Sports betting markets are simply the most obvious, the most explicit about prediction. So if there was something the world didn’t know about prediction already, those markets should be inefficient. Yes or no?
TETLOCK: Why would you think the answer would be yes?
COWEN: Well, let’s say that people favored the home team too much. Too many people might bet on the New York teams, the Los Angeles teams, and then the odds would be skewed. So if there’s a bias in people without superforecasting techniques, we would expect sports odds to somehow be off. Or at least they would have been off before your work was published.
TETLOCK: Well, our arbitrage predated superforecasting.
COWEN: But isn’t arbitrage itself superforecasting? People are arbitraging —
TETLOCK: Yes, in a form.
COWEN: — on the basis of some set of information.
TETLOCK: I think that’s fair, yeah.
COWEN: So all these markets out there — do you think they are, without you, already the best available superforecasters?
TETLOCK: The term best is one I’m just not comfortable with. I rather doubt that they’re at the optimal forecasting frontier at the moment, but they’re often probably fairly close. And is there room to incentivize people to predict the market better? Well, that’s one of those paradoxes that economists have written about, right?
COWEN: When people predict out to many decimal places, do you think that’s absurd? Or do you think it’s useful?
TETLOCK: What’s the optimal level of granularity for different categories of forecasts? For the kinds of things we were looking at in the original IARPA forecasting tournaments with geopolitical events, like how long the Syrian Civil War would last, or what would Russia do in Eastern Ukraine, or things of that sort — yes, it would be absurd to go to three or four decimal points.
The National Intelligence Council, which synthesizes a lot of intelligence analysis — originally, they only distinguished five degrees of uncertainty, and they didn’t put numbers on it. More recently, they’ve moved to seven degrees of uncertainty, and they do put numerical ranges on, so “somewhat likely” represents a certain probability range.
Now, in our work, we’ve explored how granular the best forecasters are by doing various rounding experiments — we round a forecast off to the nearest tenth, that kind of thing. If it’s pseudo-precision — if, when they adjust a forecast and move from 0.6 to 0.65, for example, that doesn’t on average improve their accuracy — we would conclude that they can’t achieve that level of granularity.
Our best statistical estimates are that forecasters, for the types of questions the intelligence community often poses, can distinguish between 10 and 15 degrees of uncertainty, which is considerably more than the seven degrees they now think they can distinguish, and a lot more than the five they used to think they could distinguish.
But how useful it is to be able to distinguish varying degrees of uncertainty is going to hinge on the kind of game you’re playing. If it’s poker, you might well want someone who’s very adept at distinguishing things to, say, three decimal points.
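The rounding experiments Tetlock describes can be sketched in a few lines. The forecasts and outcomes below are made up for illustration, and the Brier score is used as the accuracy measure (a standard choice in forecasting tournaments, though the team’s actual analyses are more elaborate):

```python
# A minimal sketch of a rounding experiment: take probability forecasts,
# round them to a coarser grain, and check whether accuracy degrades.
# All data here are hypothetical.

def brier(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def round_to_grain(forecasts, grain):
    """Round each forecast to the nearest multiple of `grain` (e.g. 0.1)."""
    return [round(f / grain) * grain for f in forecasts]

forecasts = [0.65, 0.72, 0.08, 0.41, 0.93, 0.55, 0.19, 0.80]  # hypothetical
outcomes  = [1,    1,    0,    0,    1,    1,    0,    1]     # realized 0/1

original = brier(forecasts, outcomes)
rounded  = brier(round_to_grain(forecasts, 0.1), outcomes)

# If rounding barely moves the Brier score, the extra digits were
# pseudo-precision; if it hurts accuracy, the precision was real.
print(original, rounded)
```

Running the comparison at several grains (tenths, twentieths, and so on) is how one estimates how many degrees of uncertainty a forecaster can genuinely distinguish.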
COWEN: If you could take just a bit of time away from your research and play in your own tournaments, are you as good as your own best superforecasters?
TETLOCK: I don’t think so. I don’t think I have the patience or the temperament for doing it. I did give it a try in the second year of the first set of forecasting tournaments back in 2012, and I monitored the aggregates. We had an aggregation algorithm that was performing very well at the time, and it was outperforming 99.8 percent of the forecasters from whom the composite was derived.
If I simply had predicted what the composite said at each point in time in that tournament, I would have been a super superforecaster. I would have been better than 99.8 percent of the superforecasters. So, even though I knew that it was unlikely that I could outperform the composite, I did research some questions where I thought the composite was excessively aggressive, and I tried to second guess it.
The net result of my efforts — instead of finishing in the top 0.2 percent or whatever, I think I finished in the middle of the superforecaster pack. That doesn’t mean I’m a superforecaster. It just means that when I tried to make a forecast better than the composite, I degraded the accuracy significantly.
COWEN: But what do you think is the kind of patience you’re lacking? Because if I look at your career, you’ve been working on these databases on this topic for what? Over 30 years. That’s incredible patience, right? More patience than most of your superforecasters have shown. Is there some disaggregated notion of patience where they have it and you don’t?
TETLOCK: [laughs] Yeah, they have a skill set. In the most recent tournaments we’ve been working on with them, this becomes even more evident — their willingness to delve into the details of really pretty obscure problems for very minimal compensation is quite extraordinary. They are intrinsically cognitively motivated in a way that is quite remarkable. How am I different from that?
I guess I have a little bit of attention deficit disorder, and my attention tends to roam. I’ve not just worked on forecasting tournaments. I’ve been fairly persistent in pursuing this topic since the mid-1980s. Even before Gorbachev became general secretary of the Communist Party, I was doing a little bit of this. But I’ve been doing a lot of other things as well on the side. My attention tends to roam. I’m interested in taboo tradeoffs. I’m interested in accountability. There are various things I’ve studied that don’t quite fall under this rubric.
COWEN: Doesn’t that make you more of a fox though? You know something about many different areas. I could ask you about antebellum American discourse before the Civil War, and you would know who had the smart arguments and who didn’t. Right?
TETLOCK: Well, I would know whose arguments take the more integratively complex form — on the one hand, on the other hand, and then synthesis. Whether you want to consider those arguments smarter or not is another matter. But yes, I suppose that’s fair. I’ve always resonated a little more to the foxes than to the hedgehogs, but I think when you look at the great achievements in science, they often come from hedgehogs.
On arguments about COVID-19
COWEN: If you look today at the ongoing debates about coronavirus and what will happen — and I mean now the debates amongst the smart people, the people you respect — what is the mistake you see them making, the biggest mistake?
TETLOCK: Oh, you want me to be an amateur epidemiologist here.
COWEN: No, mistaken reasoning. You don’t have to give your numerical estimate. Procedurally, what are they not getting right?
TETLOCK: Well, is it a mistake if you’re a public health expert who feels that one mistake is much worse than the other? It’s much better to overestimate the threat of the virus than to underestimate it because you have to influence public opinion and public behavior. There, you’re not forecasting. You’re engaged in manipulation, social influence.
It comes back to your original question about what we want from our forecasters. Accuracy is only one of the things we want from them. We look to forecasters for a lot of things — to inspire confidence, to inspire fear, and so forth.
COWEN: If we’re trying to estimate how much people cut back on their risk-taking behavior because they’re afraid of the virus, are the group of people best suited to do that epidemiologists, other social scientists, or your superforecasters? Maybe even economists, right? We study elasticities. Why should it be the epidemiologists?
TETLOCK: I think you’d want an interdisciplinary team. Diversity is one of these words that’s been reduced to a cliché, but we have found in our work that cognitive diversity helps, and it helps in certain quite well-defined ways.
If you want to create a composite that out-predicts the vast majority of the superforecasters, a good way to do it is not only to take the most recent forecast of the best forecasters in the domain, but it’s also to extremize that forecast to the degree that people who normally disagree, agree with each other. And when you have convergence among diverse observers, that’s a signal that the weighted average composite is probably too conservative and you should extremize.
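The extremizing step Tetlock describes can be sketched as follows. The transform and the exponent are illustrative assumptions for this sketch; Good Judgment’s production algorithms are more elaborate, and in practice the exponent is tuned to how diverse the agreeing forecasters are:

```python
# A sketch of extremizing an aggregate forecast: take the mean of the
# individual forecasts, then push it toward the nearer extreme. The
# forecasts and the exponent value here are hypothetical.

def extremize(p, a):
    """Push probability p away from 0.5 via p^a / (p^a + (1-p)^a).
    a > 1 extremizes; a = 1 leaves p unchanged."""
    return p ** a / (p ** a + (1 - p) ** a)

forecasts = [0.70, 0.75, 0.65, 0.72]  # hypothetical: diverse forecasters agree
mean_p = sum(forecasts) / len(forecasts)

# Convergence among forecasters who normally disagree suggests the simple
# mean is too conservative, so the composite gets extremized.
print(mean_p, extremize(mean_p, 2.5))
```

The intuition is that each forecaster holds only part of the evidence; when independent, diverse observers converge anyway, the pooled evidence justifies a more extreme probability than any individual reported.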
COWEN: Does the diverse team have a CEO, someone in charge?
TETLOCK: Not in this case, no. That’s done purely statistically.
COWEN: But would you put someone in charge? And maybe the person in charge would implement that statistical algorithm, right? But who is the person you would put in charge of the team? An epidemiologist? Yourself? Your best superforecaster? Bill Gates?
TETLOCK: That’s a matter of managerial skill, I would say. Going back to one of my old dissertation advisors at Yale, 40-plus years ago, Irv Janis, who wrote Groupthink: I would pick a leader who knows how to shut up, not reveal opinions at the beginning of the meeting, and listen.
COWEN: And which group of people do you think that best describes?
TETLOCK: I think a lot of good executives have the intuition that you get more out of a team of forecasters or problem solvers if you elicit independent judgments initially that are uncontaminated by conformity pressure, and then you create an environment in which ideas can be freely critiqued before lifting the veil of anonymity and letting people see who’s taking which positions.
COWEN: Do you think having machine learning and artificial intelligence has made us much better at forecasting things right now? Social events —
TETLOCK: Certain categories of things for sure.
COWEN: Sure, at the micro level, but social events: whether there’ll be a recession, how many people will die from the coronavirus, whether we’ll settle Mars?
TETLOCK: IARPA just ran a forecasting tournament called Hybrid Forecasting Competition in which they pitted algorithmic approaches and human approaches and hybrid approaches against each other. I should let IARPA speak for itself about how well its programs work or don’t work.
I don’t think there’s a lot of evidence to support the claim that machine intelligence is well equipped to take on the sorts of problems that the intelligence community wanted to have answered when it runs the forecasting tournaments it’s been running with our research team.
Things like the Syrian Civil War, Russia/Ukraine, settlement on Mars — these are events for which base rates are elusive. It’s not like screening credit card applicants for Visa, where machine intelligence is going to dominate human intelligence totally. Machine intelligence dominates humans in Go and chess. It may not dominate humans in poker.
I don’t know that the state of the art is quite there yet, but it’s getting there in StarCraft or whatever the next thing is that Demis Hassabis is going to conquer. No, I don’t see evidence that those approaches work in the domains that we study with the intelligence community.
COWEN: Do you think the hybrid man-machine approaches are overrated?
TETLOCK: No, it’s very domain specific. It sounds like a great idea. Who could be against it? But the devil lurks in the details, and it doesn’t deliver as automatically as you might hope. It’s trench warfare here.
COWEN: Do you think the world as a whole is becoming easier to predict or harder to predict? And again, I mean social events.
TETLOCK: I’m not sure there’s a definite trend one way or the other. You hear a lot of talk, a lot of claims that there are, but if you look back on the 20th century, there certainly were lots of major pockets of unpredictability. It’s not clear.
COWEN: What if I say — again, current events aside — but it seems easier to predict. There hasn’t been a world war since 1945. There’s been steady economic growth in most parts of the world. More peace. Isn’t that easier to predict? You just predict 2 to 4 percent global economic growth and you pick up a fair amount of what’s happened since 1950.
TETLOCK: Indeed. So, simple extrapolation algorithms, historically, are hard to beat. We’re running a COVID-19 mini forecasting tournament right now, and they’re proving to be hard to beat. The skill, of course, is when to alter the trend, whether to accelerate it or decelerate it, or to change direction.
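The simple extrapolation baseline Tetlock says is hard to beat might look like the following sketch. The series, the window size, and the linear-trend rule are all hypothetical illustrations, not the tournament’s actual benchmark:

```python
# A minimal extrapolation baseline: forecast the next value of a series
# by adding the average of its last few steps. Assumes len(series) > window.

def extrapolate(series, window=3):
    """Forecast the next value from the average of the last `window` steps."""
    steps = [series[i] - series[i - 1] for i in range(len(series) - window, len(series))]
    return series[-1] + sum(steps) / len(steps)

# Hypothetical annual growth figures: a steady trend, which the baseline
# simply continues.
gdp_growth = [2.9, 3.1, 3.0, 3.2, 3.1]
print(extrapolate(gdp_growth))
```

As Tetlock notes, the hard part is not running the extrapolation but knowing when to depart from it: when to accelerate, decelerate, or reverse the trend.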
COWEN: And do you have a personal intuition on that?
TETLOCK: I think humans have been repeatedly humbled in competitions against simple statistical algorithms, going back to Paul Meehl’s famous little book on clinical versus actuarial approaches to predicting in medicine and psychiatry. So, I would say, “Be humble.”
COWEN: There’s some of your early research that, if I read it properly, suggests that making people accountable leads to more evasion and self-deception on their part. Are you worried that your work with pundits, by trying to make them more accountable, will lead to more evasion and self-deception from them? How do you square early Tetlock and mid-period-to-late Tetlock?
TETLOCK: [laughs] It’s actually not too difficult in that particular case. It really depends on the type of accountability. Tournaments create a very stark monistic type of accountability in which one thing and only one thing matters, and that is accuracy. You get no points for being an ideological cheerleader and pumping up the probabilities of things that your team wants to be true or downplaying the probabilities of the things your team doesn’t want to be true. You take a reputational hit. So the incentives are unusually tightly aligned to favor accuracy.
That’s extremely unusual in the social world. Most forms of accountability occur in organizational settings in which there are lots of distortions at work. And the rational political response for a decision maker located in most accountability matrices and organizations is to engage in some mixture of strategic attitude shifting toward the views of important others or, as you put it, evasion, procrastination, and so forth.
COWEN: But when a pundit is on, say, the evening news, there’s not a little box at the bottom that gives the Tetlock score of that pundit, correct? So most people don’t know the actual record. Given that out there somewhere is a measure of how good or bad the pundit is, are you worried that, in a sense, you will make those pundits run further away from objective standards precisely because they do poorly by them?
TETLOCK: I don’t know if we can make them run much further away from objective standards than they already are. [laughs] But that’s a very interesting point about forecasting tournaments. I look at the kinds of people who are attracted to participate in them. At the very outset, I invited lots of big shots to participate in forecasting tournaments, and they’ve repeatedly turned me down.
I had a very interesting correspondence with William Safire in the 1980s about forecasting tournaments. We could talk a little about it later. The upshot of this is that young people who are upwardly mobile see forecasting tournaments as an opportunity to rise. Old people like me and aging baby-boomer types who occupy relatively high status inside organizations see forecasting tournaments as a way to lose.
If I’m a senior analyst inside an intelligence agency, and say I’m on the National Intelligence Council, and I’m an expert on China and the go-to guy for the president on China, and some upstart R&D operation called IARPA says, “Hey, we’re going to run these forecasting tournaments in which we assess how well the analytic community can put probabilities on what Xi Jinping is going to do next.”
And I’ll be on a level playing field, competing against 25-year-olds, and I’m a 65-year-old, how am I likely to react to this proposal, to this new method of doing business? It doesn’t take a lot of empathy or bureaucratic imagination to suppose I’m going to try to nix this thing.
COWEN: Which nation’s government in the world do you think listens to you the most? You may not know, right?
TETLOCK: [laughs] I might actually. I’m not going to say. I suppose that the most prominent political fan I have at the moment is probably one of the senior advisors to Boris Johnson, Dominic Cummings, who recently caused a bit of a stir in the UK by appointing a superforecaster who had written some blogs that people interpreted as misogynist or racist or fascist or eugenicist or I don’t know, some mixture of all those things.
I think his name was Andrew Sabisky. He was a young man, 24, 25, the kind of person . . . young people are attracted, as I said before, to forecasting tournaments. It’s a fast track toward upward mobility. You have all the high-status people making vague verbiage forecasts, people like me.
COWEN: But do you think Cummings actually is influenced by you? Because as I understand what he’s doing, correctly or not, he thinks he knows a bunch of things that other people do not. And that seems somewhat non-Tetlockian, right? Maybe, for part of his portfolio of ideological armor, maybe he’s actually very non-Tetlockian, and it’s the people in Singapore who are your true fans.
TETLOCK: Well, there are some people in Singapore, too. It’s interesting you should mention that. Interesting you bring that one up.
Mr. Cummings and Mr. Gove both separately brought up my work at various points during the Brexit debate. And Michael Gove, at least in the UK, famously said that Britain has had enough of experts. I don’t know if you remember that particular quote.
COWEN: Of course.
TETLOCK: He invoked, in support at least of that position, Expert Political Judgment, portions of which compare subject matter experts to minimalist statistical baselines like extrapolation. Can experts beat simple extrapolation algorithms? The answer was often no. Gove was raising the point: where do these guys get off making these confident predictions about the consequences of Brexit, when the best empirical evidence would suggest they’re probably not materially more accurate than simple extrapolation algorithms?
It was brought up for a political reason. He had a political point to make. And Dominic Cummings had a political point to make as well. I believe he fears that parts of the civil service are hostile toward Brexit and want to undermine the Boris Johnson administration’s objectives.
I think this comes to something deeper now. It’s not just about the UK. It’s about the intellectual fissure that exists between social science and conservatives, that most social scientists are liberal, and conservatives are wary of advice from social scientists. I think we may have partly paid some price for that in this epidemiological . . . to put it kindly, the slowness of the Trump administration’s response to COVID-19.
COWEN: Yes. You brought up Britain. If we look back at speeches in the British House of Commons, who is giving the most cognitively complex speeches?
TETLOCK: Bob Putnam collected the data that I reported in that 1984 study. He published the original data in a book, The Beliefs of Politicians, in the 1970s. I think those data were based on interviews with members of the British House of Commons. They were not speeches. They were confidential interviews that Bob Putnam got access to and put a lot of work into obtaining.
He shared them with me, and I used them as grist for a small research program I was running in the 1980s on cognitive style and political ideology, trying to tease apart the rigidity of the right versus the ideologue hypotheses.
COWEN: And who sounds the smartest from that period?
TETLOCK: In that period, it was a mixture of moderate Labour and moderate Conservative. It was the centrists that did better. It was slightly left-shifted.
COWEN: Do you think we can draw any inferences from that about politics? Should we have more faith in the people who sound smarter or not at all?
TETLOCK: Well, I think that particular measure of integrative complexity does have some correlation with forecasting accuracy.
But I think you’re picking up something more than just forecasting accuracy. You’re picking up what I call value pluralism. You’re picking up a tendency to endorse values that are often in conflict with each other. So the more frequently that you, as a political thinker, confront cognitive dissonance between your values, your value orientations, the more pressured you are to engage in integratively complex synthetic thinking.
When your values are more lopsided, it’s easier to engage in what we call simpler modes of cognitive dissonance reduction, like denial or bolstering and spreading of the alternatives. So downplay one value, push up the other value, and make your life stress-free.
But some ideological positions at some points in history require more tolerance for dissonance and some people are more inclined to fill those roles.
COWEN: And you think those politicians are also likely to be better forecasters?
TETLOCK: We’re not talking about huge effect sizes here for integrative complexity. I think that fluid intelligence is probably a more powerful predictor. A combination of fluid intelligence and integrative complexity, I think, does boost forecasting accuracy, yes.
COWEN: And if you were running the CIA, those people who are pluralistic in the manner you just outlined, would you promote them more rapidly?
TETLOCK: Now, of course, bear in mind the CIA is, by statute, supposed to be value-neutral. They’re supposed to be feeding impartial, apolitical advice. Just the —
COWEN: But no one believes that, right?
TETLOCK: Well, that is supposed to be the division of labor here. I think one reason they may be interested in forecasting tournaments is because forecasting tournaments incentivize people to do one thing and only one thing, and that’s accuracy. You don’t get points for skewing your judgments toward your favorite cause. You take a hit in the long term by doing that.
On reforming the CIA
COWEN: If you were in charge of the CIA and had a free hand, how would you reform it?
TETLOCK: Whoa, boy! [laughs] Well, there’s a long history to efforts to reform the CIA. It was founded in 1947, and there have been various efforts since then. People have been unhappy with the CIA for many reasons over time, Vietnam being a big one. But there are lots of other reasons that people have expressed unhappiness. Both liberals and conservatives have, at various points, been unhappy with the performance of the intelligence community.
In 2001, it came to a crisis point, I think, and then there was a commission to reform intelligence analysis. After the WMD fiasco in Iraq, the pressure grew even more. One reason why we’re even talking today is that the intelligence community was forced, essentially by the recommendations of the reform commission, to take keeping score more seriously, to take training for accuracy and monitoring of accuracy more seriously.
That’s when they created the Intelligence Advanced Research Projects Activity, which is the R&D branch housed within the Office of the Director of National Intelligence. Its job is to support innovative research that will improve the quality of intelligence analysis. That can be defined in various ways, but accuracy is certainly one very important component of that. But it’s not accuracy with a liberal skew or a conservative skew. It’s supposed to be just plain, “just the facts, ma’am” accuracy.
COWEN: But which questions they choose to predict — that reflects values, obviously. Whenever a bureaucracy tells me they’re value-free, I start getting more suspicious very quickly, right? I never believe them. It might be okay for them not to be value-free, but the cynical response is the correct one here.
TETLOCK: Well indeed. The old expression, “There’s no view from nowhere.” There is no such thing as pure value neutrality. That doesn’t mean it’s not something worth aspiring to. But you’re right, the values . . . Even if you had a perfectly objective forecasting tournament system, if the people generating the questions were promoting a political agenda, you could skew the results. I think that’s one of your points, right?
On things under- and overrated
COWEN: Now in the middle of all of these discourses, we have a segment called overrated versus underrated. I’ll toss out a few names, ideas, and you tell me if you think they’re overrated or underrated. How’s that?
COWEN: Philadelphia, the city of Philadelphia — overrated or underrated?
TETLOCK: I got endless grief when my wife and I decided to leave Berkeley and move to Philadelphia. People thought that we were borderline insane. But we left for very personal reasons, and without going into what those were, I would say Philadelphia has been a moderately pleasant surprise. It’s a city that has many, many problems. But it’s not as bad as the people in Northern California thought it was.
TETLOCK: I read a little bit of Tolstoy, and I’ve seen a number of films, and I know some of the shorthand versions, and I know that Isaiah Berlin had a hell of a time classifying Tolstoy as a hedgehog or a fox. But I think I’ll pass on that.
COWEN: John Cleese. He’s commented on you. You’re allowed to comment on him.
TETLOCK: I had a wonderful time as a kid watching Monty Python and also Fawlty Towers. I’ve enjoyed his comedy. I haven’t followed him since then, but when I was younger, I thought he was absolutely hilarious and brilliant. And I appreciate the flattering things he said about my work.
COWEN: The television show, The Sopranos.
TETLOCK: I fell for it. James Gandolfini — I fell in love with his performances. One of the last things he did is he played Leon Panetta in Zero Dark Thirty. And there he is, eliciting forecasts from people in the CIA about whether Osama bin Laden is in that compound in Abbottabad. Is he there or isn’t he effing there? Great, wonderful. I never knew him, but I think very highly of his work.
COWEN: The threat of terrorism? Do we overrate it or underrate it in the United States?
TETLOCK: That’s a very difficult question because of the tail risk aspect to it. If you look at the number of people who’ve died from terrorism versus other causes, it would seem that the amount of money we spend on suppressing terrorism would be disproportionate. But the tail risk complicates that a lot.
COWEN: What is your favorite movie?
TETLOCK: I don’t have a hierarchy like that, I’m sorry.
COWEN: Nothing comes to mind? What’s the movie you’ve seen the greatest number of times, that you can count?
TETLOCK: I don’t know if I watch movies all that often over and over. I did see myself coming back — just very recently, last week in the quarantine period — to Westworld. It’s not a movie. It’s a series, of course. I think the first two seasons are quite brilliant.
COWEN: On historical counterfactuals, by what year do you think the ascent of the West was more or less inevitable?
TETLOCK: Well, I have inside information here. We did a survey of some very prominent historians. We reported it in that book, Unmaking the West. We put a part of it, anyway, also in an article in American Political Science Review. If you looked at just the unweighted average of judgments, the median, I think, was probably around 1730, 1740.
COWEN: And you think after that, it was not very contingent?
TETLOCK: I’m not a historian of the West. What do I really know about that? I’m simply reporting the news here.
COWEN: And is there any great hinge of contingency that you think about looking backwards? Like, “Oh my goodness, if there hadn’t been a Reformation.” Or “If there hadn’t been a Council of Trent.” Or what?
COWEN: Because it got people thinking more rationally.
COWEN: More in terms of science.
TETLOCK: And that had huge spillover effects. Now, could there have been an enlightenment in China? Could the Chinese have created certain types of technologies without science? What if they hadn’t scuttled their navy in around 1400? There are all those sorts of counterfactuals.
COWEN: And how necessary or contingent do you think it is that we keep on thinking in Enlightenment-like terms? Is it once you’re locked into it, it keeps on going? Or is it like good government that you have to renew it every generation or two?
TETLOCK: Well, I see the work I’m doing as very much in the spirit of the Enlightenment. Public transparent standards of evidence for judging subject matter expertise. I think one of the great challenges of our time is striking the right balance between democracy and technocracy. And I think the fissures that have emerged . . .
It’s not just that conservatives are skeptical of social science — and currently of some parts of biological science too, which I think is very unfortunate — you see some reflexive skepticism toward science on the left as well. So, how do you manage the relation between small-d democrats and technocrats in a society in which expert guidance is increasingly crucial?
COWEN: What are you learning from playing the game Civilization V, or at least watching others do so?
TETLOCK: That’s a good example, actually, of why I’m not a superforecaster. I don’t play Civilization V. But Civilization V was one of the simulations that IARPA chose to feature in its counterfactual forecasting tournaments under the rubric of FOCUS. And if any of your listeners are interested in signing up to be forecasters for FOCUS, we still have one more round, round 5, and we will be recruiting people.
But again, you see, it reflects my temperament. I don’t have the patience for a game like Civ V. Forecasting is inevitably a mixture of fluid and crystallized intelligence, and you have to invest a lot of energy into mastering a game like Civilization V. And I suppose, as people get older, they may become less likely to make those kinds of cognitive investments. It becomes more and more essential as I get older, I think, to focus on the things where I have a real comparative advantage.
COWEN: The best chess players are all young, right? This we know. There’s clear data.
TETLOCK: Well, yeah. It’s interesting the domains in which child prodigies emerge: music and chess and math.
COWEN: If we take supercounterfactualists and superforecasters, do those two groups basically overlap? Or how do they differ?
TETLOCK: I think they have to be rather intimately connected, although disentangling this one is going to be really, really hard. It’s one of the things I do want to dedicate a few years of my life to doing.
Now obviously, when you say someone’s a good counterfactualizer, some people shrug their shoulders, and they say, “Well, how are you possibly going to know? Suppose you’d gotten to undo the assassination of the archduke in 1914 — do you undo World War I? Undo World War I, you undo Hitler; you undo World War II. You make Kennedy grouchier during the Cuban Missile Crisis, you trigger World War III.”
You’ve got these sorts of arguments that are essentially unresolvable. You can rerun Civilization V, but you can’t rerun history.
So that makes counterfactuals a place where ideologues can retreat. They can make up the data, make up whatever facts they want, to justify pretty much anything. No matter how badly the war in Iraq went, you can always argue that things would have been worse if Saddam Hussein had remained in power. So you have these factual and counterfactual reference points that people use implicitly in debates to make rhetorical points.
Part of what attracted me to counterfactuals was (A) how important they are in drawing any lessons from history, (B) how important they are in policy arguments, and (C) how unresolvable they are.
One of the things we’re hoping to do in the FOCUS program is to develop some objective metrics for identifying people, and methods of generating probabilities, that produce superior counterfactual forecasts in simulated worlds — worlds in which you can rerun history and assess the probability distributions of possible worlds. Well, it turns out, you get World War I 37 percent of the time even if you undo the assassination of the archduke. And you get something like World War II . . . You see where we’re going.
So we’re hoping that one result of FOCUS will be to help us identify people and methods that generate superior counterfactual forecasts in domains where there is a ground truth. The next task will be to connect superior performance in simulated worlds to superior performance in the actual world.
And this is where things get tricky, of course, because in the actual world, we don’t have the ground truth. So I might ask you a counterfactual question of the form: if NATO hadn’t expanded eastward as far as it did in 2004, into the Baltics, would NATO–Russia relations be considerably friendlier than they are now?
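The mechanics Tetlock describes — rerunning a simulated world many times under an altered condition to get a ground-truth counterfactual probability — can be sketched in a few lines. This is a toy model of my own construction, not the actual FOCUS setup; the event, the base rate, and the trigger probability are all invented for illustration.

```python
import random

def war_breaks_out(assassination: bool, rng: random.Random) -> bool:
    # Hypothetical model: underlying great-power tensions drive most of
    # the risk; the assassination only adds a marginal trigger probability.
    base_risk = 0.35
    trigger_bonus = 0.45 if assassination else 0.0
    return rng.random() < base_risk + trigger_bonus

def estimate_probability(assassination: bool, runs: int = 100_000) -> float:
    """Rerun the simulated world `runs` times and count how often war occurs."""
    rng = random.Random(42)  # fixed seed so the estimate is reproducible
    hits = sum(war_breaks_out(assassination, rng) for _ in range(runs))
    return hits / runs

p_factual = estimate_probability(assassination=True)
p_counterfactual = estimate_probability(assassination=False)
# In this toy world, war still occurs roughly a third of the time with the
# trigger undone — the analogue of "you get World War I 37 percent of the
# time even if you undo the assassination of the archduke."
```

The point is that in a simulated world the counterfactual has a scoreable answer, so forecasters’ counterfactual judgments can be graded against it; the hard next step, as Tetlock says, is connecting that skill to the actual world, where no such rerun exists.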
COWEN: Are chess players good forecasters?
TETLOCK: Do you mind if I go on with this for a second? I don’t want to stop on that one.
COWEN: Yeah, keep on going.
TETLOCK: Now, we can’t rerun history. We don’t know how relations with Russia would be if NATO hadn’t gone into the Baltics. “Russia would have gobbled up the Baltics.” “Maybe Russia would be friendly or feel less threatened.” You have people who have more hawkish or more dovish mental models of Russia, and those mental models predispose them to give you certain canned, almost ideologically reflective answers to those counterfactuals, right?
TETLOCK: But you can measure what people’s beliefs are in the counterfactual. And then you can measure people’s beliefs about conditional forecasts that are logically connected to the counterfactuals in a kind of Bayesian inference network. If I know that you think the Russians would be every bit as nasty and snarly even if we hadn’t moved into the Baltics — they might even have been nastier — it’s probably a fair bet that you’re also likely to think it’s a good idea to increase arms sales to Ukraine, ratchet up sanctions on Putin’s cronies, and so forth.
So we can identify the counterfactual belief correlates of more or less accurate conditional forecasting. And in that sense, you can indirectly validate or invalidate — render more or less plausible — certain counterfactual beliefs. That’s the longer-term objective of this research program. We’re not playing Civilization V for the sake of getting better at Civilization V. The ultimate goal is to link the sophistication of counterfactual reasoning about the past to the subtlety and accuracy of conditional forecasts going into the future.
Another thing you should observe, by the way — if people are becoming better counterfactual reasoners, you should observe less ideological polarization in their counterfactual beliefs. The counterfactual beliefs should become as ideologically depolarized as conditional forecasts are.
COWEN: Does playing World of Warcraft a lot help you become a better forecaster, or playing chess?
TETLOCK: I don’t have any evidence bearing on either of those things. There’s a third variable problem there, too.
COWEN: But you get used to a test, right? You know if you lose? There’s very little self-deception.
TETLOCK: This is true. The people who do well in these sorts of things often like games like that.
COWEN: Venture capitalists — when they try to spot talent in others, do you interpret their behavior in terms of a superforecaster model? Someone like Peter Thiel — he found Mark Zuckerberg, Reid Hoffman, Elon Musk. He’s a kind of superforecaster. How do you superforecast talent in other people?
TETLOCK: Well, it really helps to be working in an environment in which super bright people are not super rare. It’s very, very hard to forecast, to identify talent when the base rate falls below 1 in 1,000, 1 in 10,000, 1 in 100,000. You’re looking for a needle in a haystack.
But the great advantage that venture capitalists in Silicon Valley have is that the talent pool is relatively rich. There are quite a few flaky people who come by seeking their money, for sure. But their odds of success are significantly better than they would be if they were working from a population base rate. And of course, they can tolerate a lot of mistakes because just a few hits will pay for a lot of false positives.
COWEN: But some do much, much better than others, right? Mike Moritz, Peter Thiel.
TETLOCK: Right, the question is, are they doing better because they have better social networks? Or are they doing better because they have better judgment?
COWEN: Superforecasting as a technique for predicting the future — does it also apply to predicting how successful people will be?
TETLOCK: Yes. I think everything we know, from the overlap between superforecasting and intelligence and the overlap between intelligence and success in many different professional lines of work, would suggest that that’s almost certainly true.
COWEN: Who first superforecasted your own major success?
TETLOCK: I don’t know what my first major success was. Some people might challenge that there has been a major success in my career.
COWEN: Who first saw it coming?
TETLOCK: I think probably my advisor at the University of British Columbia, Peter Suedfeld, who supported me and believed in me when there didn’t seem to be very many good reasons for doing so. I was a confused Canadian kid, unsure whether to become a lawyer in Canada, go off to Oxford, or go to graduate school in the US, and he tipped me toward social science and the US.
COWEN: And what did he see that other people had not seen?
TETLOCK: God only knows.
COWEN: Well, if anyone would know it’s you, right?
COWEN: You’ve lived with it for some time.
TETLOCK: I think he thought I was probably bright enough to do well, and he probably thought that I was contrarian and weird enough that there was some possibility of doing something distinctively well. Something distinctive and different and doing it well.
COWEN: Do you think academic advisors, in general, today undervalue weird students?
TETLOCK: There’s probably a tendency in that direction, yes. Because weirdness is a big word. Weirdness takes lots of forms that you and I . . . We would not be embarrassed about turning away lots of weird people.
COWEN: Sure. Do you think people who grow up with a second culture are better forecasters?
TETLOCK: Oh, you’re thinking of my work with Carmit Tadmor.
COWEN: Of course.
TETLOCK: Gosh, you’ve really looked into my vita here, right? This is highly unusual.
I think there does seem to be some advantage there, yeah.
COWEN: And where does that come from? How does it work?
TETLOCK: It ties into accountability. Accountability to conflicting audiences and value pluralism. You have a richer internal dialogue. You learn to balance conflicting perspectives more. You have to be a better perspective taker. Perspective taking’s a very important part of superforecasting, too.
COWEN: How do we create more Nate Silvers and Philip Tetlocks? What should we change in the world to get more of you?
TETLOCK: Well, I think Nate and I are probably very different creatures.
COWEN: Sure, but we want more of you both, right? What should we do?
TETLOCK: One thing that would be useful . . . There is this tendency for training in universities to become hyperprofessionalized and compartmentalized. So it is harder for people who have weird interests that straddle, say, psychology and organizational science and political science and law and history. The people who have weird sets of interests — it’s hard for them to get traction in the current career environment.
Certainly, in my home discipline of psychology, there’s been this kind of rampant publication inflation, in which PhD students would be very lucky to get a job. You’d have to have a ridiculous number of publications, some of them in A journals. It’s the sort of thing we didn’t really expect 30 years ago. We thought, “Oh, that’s a tenure case. Oh, that’s a junior hire.” Those kinds of pressures, I think, produce a narrowing of focus.
Now you say, “Well, yes . . .” I said earlier that a lot of the great advances in science come from hedgehogs, and we’re producing hedgehogs on an industrial scale here. But I think there would be some advantage to making a little bit more room for weirdo eclectics.
COWEN: And the way that departments are carved up — would you change that at all?
TETLOCK: A number of academics in the past, like Jim March at UC Irvine many decades ago, tried to do something like that. Amy Gutmann, the president of the University of Pennsylvania, is trying to do that with Penn Integrates Knowledge — the chaired professorships that were used, among other things, to hire Barb [Mellers] and me and a number of other people. And we are sort of floaters. We’re not connected to one unit. We have multiple connections. That’s a very hospitable work environment from my point of view.
But I don’t see a lot of places doing a Jim March–UC Irvine experiment — “Well, let’s integrate all the social sciences together” — or an Amy Gutmann PIK, Penn Integrates Knowledge, kind of program. You don’t see too many of them yet. It runs against the grain. I’m not even sure it’ll survive at Penn beyond Amy. The natural tendency will be for departments to want to claw back the resources.
COWEN: Let’s say someone comes up to you and they say, “Philip, I would like to be more fox-like. I’m not enough of a fox.” What actual advice would you give them to achieve that? What should they do? Or not do? Wake up earlier in the morning? Exercise more?
TETLOCK: I had to commit career suicide to become more fox-like.
COWEN: Maybe they’re not an academic. They’re a smart business person. How do they do this?
TETLOCK: Kind of obvious things, like read a little bit more outside your field. If you’re a liberal, read the Wall Street Journal. If you’re a conservative, read the New York Times. Expose yourself to dissonant points of view. Try to cultivate some interests outside your field, and try to connect them together. I think there’s an optimal distance.
Take history, for example — it sounds quite different from what I did. I started as an experimental psychologist, and history looks very different, but they can be connected, because historical judgment is something that psychologists study to some degree. Psychologists are interested in hindsight and counterfactuals and so forth. So you can link the two.
I think there’s an optimal distance, and when you go foraging as a fox, you probably don’t want to forage way, way far away. You want to forage far enough away that it’ll be stimulating but still possible to reconnect.
COWEN: What kinds of people are best at adversarial collaboration?
TETLOCK: Those are rare people also. It’s really hard to do.
COWEN: But how do you spot them? Is it a personality trait? A cognitive trait?
TETLOCK: Gosh, that’s a hard one. Danny Kahneman, I think, coined the term when he was dealing with his various critics over time. My wife, Barb, actually was involved in an adversarial collaboration between the Kahneman camp and the Gigerenzer camp on the conjunction fallacy. It took at least two or three years of her life.
It’s very hard to get people to . . . Well, lots of us, in principle, are Popperians. We believe in stating our beliefs as falsifiable hypotheses. Most of us believe our beliefs are probabilistic, so we’re somewhat Bayesian. But that’s lip service. There’s what we believe. There’s an epistemological formal self-concept, which is kind of noble, falsificationist, and probabilistic. And then there’s how we actually behave when our egos are at stake in particular controversies.
And those things are quite different. I think Danny Kahneman, who’s not known as an optimist, proposed it. It sounds like an optimistic idea, but I think he’s not all that optimistic about what it can achieve on close inspection. My efforts at adversarial collaboration have not been all that successful. I’d like to jumpstart a few of them again, but it’s very hard to find the right dance partners.
COWEN: Let’s try a question from the realm of the everyday and the mundane. If I go around, and I look at Mexican restaurants, I’m very good at predicting which ones have excellent tacos. What are you good at predicting?
TETLOCK: I’m not good at predicting your questions.
TETLOCK: What am I good at predicting? I think I was pretty good at anticipating the fragility of a lot of microsocial science knowledge prior to the replication crisis erupting.
COWEN: No, I mean everyday life.
TETLOCK: Oh, okay. So not in my —
COWEN: Not social events. Not social science. What in your life are you good at predicting? When you’re going to get tired and wanting to go to bed at night? Or when the dog wants to eat? What is it?
TETLOCK: Well, we have a pet-free existence. Our lives are actually pretty simple. I guess we subscribe to that old adage: be boring and staid in your life so you can be violent and creative in your work. We have kind of a routine here.
COWEN: That was highly predictable.
TETLOCK: And now, in the quarantine days . . . It used to be, pre-quarantine, “Oh, we’re going to go to Europe. We’re going to go here or there.” There were the little points of unpredictability that spiced up life, but those do not exist right now.
COWEN: And two more questions to close. First, what can you tell us about your next project?
TETLOCK: Well, I think the next project is the one that I mentioned earlier. It’s linking historical counterfactual reasoning with conditional forecasting. I think it’ll be the second phase of the FOCUS research tournaments. I think that counterfactual reasoning has, for too long, been the last refuge of ideological scoundrels. Insofar as we can improve the standards of evidence and proof in judging counterfactual claims as well as conditional forecasts, and link the two, I think there’s potential for improving the quality of debates among interested parties.
COWEN: And finally, what should a superforecaster predict about the future course of your influence?
TETLOCK: Oh, to be very cautious because we know we’re running against the grain. We’re running against a psychological grain. We’re running against human nature. We’re running against a sociological grain.
COWEN: But see, the Enlightenment is stable, you tell us, right? If it keeps on cumulating and growing, your influence should be enormous, given your other presuppositions.
TETLOCK: Well, there’s a lot of cognitive resistance to treating one’s beliefs as falsifiable, probabilistic propositions. People naturally gravitate toward thinking of their beliefs as ego-defining, quasi-sacred possessions. That’s one major obstacle.
Then you have the existing status hierarchies. You have subject matter experts who are entrenched and have influence. Why would they want to participate in exercises in which the best possible outcome is a tie — in which, at best, they reaffirm that they deserve the status they already have?
So psychological, sociological resistance. It’s interesting, sociologists and economists have different reactions to forecasting tournaments. Sociologists — they react, “Why would anyone be naive enough to think that anyone would want to have a forecasting tournament in their organization?” They’re status disruptive, right?
TETLOCK: And an economist would say, “Well, if these things are so great, how come they’re not everywhere?”
COWEN: And with that, Philip Tetlock, thank you very much. It’s been a pleasure.
TETLOCK: Take care.