Behind the Book — Interview with Gábor Békés

Published in CEU Threads, Jan 2, 2024

A few years ago, with Gábor Kézdi, you wrote a book titled “Data Analysis for Business, Economics, and Policy”. Tell us a bit more about the inspiration behind it. What made you write the book?

One of the motivations came from a time when I was supervising an MA student’s dissertation more than 10 years ago. It turned out that many of his questions were about relatively unimportant issues, like whether we should use fixed effects or random effects and how we should test for this. And it doesn’t matter, really (and we shall use fixed effects).
So I felt that there was this disconnection between what you should know to do empirical work and what traditionally has been taught in econometrics. I went up to ask Gábor Kézdi, who at that time was head of the department at CEU and had been teaching econometrics for a long time. How about we do a small course on how we do empirical work and how to analyze data? Not like at the academic or PhD level, but at the lower level.
We then started to teach a course together. After three or four years of teaching it, there was a moment when I was sitting in Gábor’s office and we were contemplating the book. We said, okay, let’s try to put what we have into a proposal and see if it flies. And, you know, it has been a long process from start to end. Almost 10 years.

What did that proposal look like?

The way it happens is that once you’ve collected enough material, you can kind of visualize what a book should look like. Then you write one chapter fully, and you also develop a detailed list of topics for the rest of the book. The chapter we submitted to the publisher was the actual Chapter 7 of the book: Introduction to Regression.

But it must have been challenging to get noticed. There are so many books about causal inference and regressions. What do you think made you stand out? What was your vision?

I think the vision was democratizing data analysis. Often statistics and econometrics are daunting for many people because you get bogged down by asymptotic theory and by derivations and proofs. And that’s interesting, but not necessarily the most important. So I think our vision was to curate what really matters in analyzing data and to explain it in plain English. And also to provide a lot of case studies and exercises so that people would get familiar with the practice. We wanted to change what is typically taught in a Master's program in Economics and move away from certain topics while adding exploratory data analysis and machine learning and prediction.

Since you’re talking about the democratization of science, did you ever consider self-publication or posting the content online without a publisher?

We wanted a good publisher because it’s harder to get noticed if you are not a very famous person and/or you are not coming from a top US school. At the time Kézdi was at the University of Michigan, so that helped. But still, you need a publisher as a quality assurance or a quality signal. So we never thought about self-publishing. We always wanted to get a top publisher and basically, the choice was between US and UK or European publishers. Because I was mostly in charge of negotiations, it ended up being London-based publishers. It was not difficult. I met with three publishers and they all wanted in. I made them compete and in the end, Cambridge won.

Tell us more about the competition part.

So, you know, there is a Tom Sawyer bit where he gets other boys to compete and make offers to get to paint the fence. Inspiring. I got in contact with three publishers and let them make offers. I sent this proposal to all of them, went to London, and had three meetings over two days. I told them that I was talking to others. Eventually, it was close between Cambridge University Press and another large publisher. Cambridge was more comfortable letting us pursue our vision, which was to write one single large book, whereas the other publisher said it should be a series of smaller books. Indeed, smaller books may be better commercially, but we wanted to encapsulate what people should know to do data analysis in real life.

Is there anything you included in the book that you now think you should not have, or is there anything you didn’t include that you now wish you had?

There isn’t much I would take out. I hope to have a second edition in 2025. It will be hard as I’m missing my friend and co-author, Gábor Kézdi, who sadly passed away two years ago. So it will be a happy/sad exercise to work on the second edition.
However, there are a few things that I plan to add. We get feedback like you should have added two more paragraphs explaining a concept and then there are some useful concepts we left out. One example is the Frisch-Waugh-Lovell theorem on regressions which will be included because it is pretty useful in many applications. Interpretable machine learning has improved and evolved, so Shapley values (SHAP) will probably go into the book. The methodology around the staggered diff-in-diff estimation has improved quite a bit, so I think there will be two pages covering one new method. But I haven’t made up my mind yet. There’s one thing that almost made it, which is doubly robust estimation. Maybe this time.
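
[For readers who have not come across the Frisch-Waugh-Lovell theorem mentioned above, here is a minimal illustrative sketch in Python. It is not taken from the book: the data are simulated, and the helper names (`solve`, `ols`, `residuals`) are hypothetical. The theorem says that the coefficient on `x1` in a regression of `y` on `x1` and other regressors equals the coefficient from regressing residualized `y` on residualized `x1`, where both have first been regressed on the other regressors.]

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(columns, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    return solve(XtX, Xty)

def residuals(columns, y):
    """Residuals of y after regressing it on the given columns."""
    beta = ols(columns, y)
    return [yi - sum(b * c[i] for b, c in zip(beta, columns)) for i, yi in enumerate(y)]

# Simulated data (an assumption for illustration): x1 is correlated with x2.
random.seed(0)
n = 500
const = [1.0] * n
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * v + random.gauss(0, 1) for v in x2]
y = [1.0 + 2.0 * a - 1.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Route 1: full regression of y on a constant, x1, and x2.
beta_full = ols([const, x1, x2], y)[1]

# Route 2: partial out the constant and x2 from both x1 and y,
# then regress the residualized y on the residualized x1.
x1_res = residuals([const, x2], x1)
y_res = residuals([const, x2], y)
beta_fwl = ols([x1_res], y_res)[0]

print(beta_full, beta_fwl)  # the two estimates agree up to floating-point error
```

[In practice one would use a regression library rather than hand-rolled normal equations; the plain-Python version is only meant to make the two routes explicit.]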

In your interview with Scott Cunningham, you talked about seeing LSE professor Tony Venables distribute handouts in class during his book-writing process. Observing how you integrated similar handouts into your book — which I also contributed to — I’m curious: Were you, perhaps even subconsciously, influenced by Venables’ approach?

Yes, I was a master's student at LSE in London, where Tony Venables taught urban economics and economic geography, which I absolutely loved. He was fantastic. At that time, he was writing a great textbook with Paul Krugman and Masahisa Fujita about economic geography, and he was actually teaching out of the pre-prints or drafts. I loved that textbook, and it was a great experience to see how it was compiled and to see that this is possible. You see a template, so yes, it was certainly an important input for me.

Are there other influences in your career or people that have inspired you, and have directly or indirectly affected the way you think or approach research?

Yes. Gianmarco Ottaviano, my PhD thesis supervisor and co-author, was very influential, teaching me about the importance of theory-based empirics, focusing on why we do some analysis. But I certainly want to talk about Gábor Kézdi. Gábor and I taught a data course together for roughly six to seven years. For me, this was new, as I used to teach urban economics before. I would say our course didn’t have a super strong metrics background, which is a bit weird. I think Kézdi’s common-sense approach to econometrics was a huge influence on me. His approach was not to look for the most sophisticated solution but for the simplest one that still works. Someone said that our textbook is an ode to OLS. Maybe. It’s certainly about trying the simplest method first, and adding layers of complication only if needed.

With Gianmarco, you also have a joint paper titled “Cultural homophily and collaboration in superstar teams” which I find fascinating. Can you tell us a bit more about that?

The paper is about understanding how multinational workplaces with high-skilled people function. Homophily is quite relevant in this setting. We know the self-selection channel: people often pick other people who are similar to them in some sense, whether this is ethnicity, race, gender, or age. We also know the benefits of diversity in combining skills. But then there are also costs associated with diversity. We wanted to see whether, among the highly skilled workforce, people would collaborate more with people who are similar to them. In our case, similar means shared nationality or cultural background. The fun part was realizing that we could use data from football matches to answer this question.

How did that idea come about?

With Gianmarco, we have worked together on a few different projects. He has been interested in diversity in cities. He would tell me about his projects about cities, but he’s also a football fan supporting Inter Milan. Regardless of what we were doing together — for example, we wrote a paper on innovation and internationalization of firms — we would discuss football extensively.
In one of these discussions he was telling me that regarding diversity in cities, the big problem was that you never see who collaborates with whom as you can only see aggregate stuff. Same with firms, you can observe firms and the composition of the workforce and you can observe the performance of these firms, but you don’t really know what’s going on because it’s super hard to observe people in a workplace. And then we realized that the advantage of football is that when you see the game you can observe how people collaborate. So I think that the big idea came from shared interests in urban economics and football. It also coincided with my interest in working with large data sets.
So the paper looks at football teams and how people with different cultural backgrounds collaborate in terms of passing intensity. We have data on people coming from more than 100 countries. We find that people who share either nationality or colonial background (think Argentina and Spain) will actually collaborate (pass) more than those who don’t. There is homophily in collaboration even in these superstar teams, in a setting where language is not that important.

What is the mechanism you propose?

It could be prejudice: people may have never met people from certain countries and there’s a distrust. If this is the case, we can expect homophily to drop once players get to know each other. But instead, what we find is the opposite. When people start to spend more time with each other, their choices become more homophilic: they will collaborate even more with people from the same background. We think this is because people will hang out more outside of work with people who are similar to them. And that kind of creates familiarity with each other which then translates into more collaboration in the workplace.

As a soccer fan, what aspects of soccer attract you to it and how do you compare it to other forms of entertainment?

I think soccer, when played nicely, is just incredibly beautiful. It has the strategy, it has skills, and it has unexpected solutions. I like it when it’s extremely well done, and then you can see the plan behind it. So, you start to see not just whatever is happening at that second, but what’s going on beyond the action.
There is a deeply human element as you learn about people, what happens to them, and sort of get interested in their stories. It’s not that different from liking South American TV soaps. It is very comforting to follow something, anything, regularly. It’s a great idea; unfortunately, it’s not mine. It’s Milan Kundera’s, who passed away fairly recently.

One thing I admire about you is your ability to update your views and not shy away from learning new things. For instance, you took up coding later in your career. Can you share what motivates you?

I actually wrote some code even during the two years that I worked in investment banking in London. I mean, I ran regressions, but I didn’t really think about coding in terms of objects or classes, etc. The book was a big change because that was the first time we had to build a large code base. This was also the first time that I moved away from Stata: I had to learn some R and then also a little bit of Python, at least to read it. And then once you do that, you have to think about the differences across these languages.

What makes you keep learning new diverse things?

I think one thing leads to another. But I’m not genuinely interested in coding. I don’t find beauty in code. I consider it a necessity, a tool, but it’s not like, okay, oh my God, this code is so pretty, and we should really have like 100-line chunks.

[I, instead, happen to have feelings about code :-) Check out the article that Gábor is referring to:]

Tidying Up (Python) Scripts


I also know that you swim. You seem to approach swimming also with an improving/learning mindset. Is my reading correct?

I have always loved swimming. What is new is timing a 1km swim. I realized I needed something to focus on so that I don’t let my mind wander and dwell on problems. Measuring time helps, but not because I am competitive. I enjoy thinking about the time I just made, or how many laps I have left if I finally manage to go beyond a target time. These things help me be more present and relaxed. That’s the reason I do the measurements.

You’re one of the few academics I know who has also had “the real world experience”. How did your work in the industry shape you as an academic?

Yes, before my PhD I had my “real-world experience”. I don’t know if it helped my job as an academic, but one thing that you learn is to have a stake in your results. What you say is just a number, maybe a model coefficient, but people may act upon it. Also, when you produce an analysis and a forecast, you have to sell it to other people in the bank. So this idea of translating analytics to decision-makers, that’s something that I certainly took from those two years. It is helpful in teaching for sure.

How did the industry experience help with your teaching methodology?

Mainly, in the sense that people will use what we teach. It has led me to try to be precise in interpreting results and drawing conclusions. What is the exact meaning of this coefficient? Start with a precise interpretation of a coefficient and then move to a more general policy interpretation. Also, push for economic meaning beyond statistical significance, which in real life is less important.

I am curious about how you envision your future: research and life. What comes next?

There is a project related to open-source software. There is a theory paper that we are working on, which is completely new to me. I’ve never done pure theory before, and I’m enjoying it tremendously. Also, more papers using football data are in the works, one on labor markets and maybe one on organizational behavior.
Because writing the textbook took a large amount of my time, there were kind of a few years where I didn’t do research and it’s nice to be back and thinking about stuff. I enjoy that.
And then I’m also trying to have more time to do fun and slightly crazy things like flying to London to watch Arsenal play.

Very nice. It’s also a sweet spot to end this interview. Thank you so much for doing this.

Thanks a lot. It was fun.

Gábor Békés is an Associate Professor at the Department of Economics at the Central European University. He is also a senior research fellow at the KRTK Institute of Economics in Hungary and a research affiliate at CEPR. You can also find him on social media: Twitter: @GaborBekes and LinkedIn.

You may also be interested in this article from CEU Economic Threads:

The Economist as a Software Engineer
