Image from “Twelve Million Phones, One Dataset, Zero Privacy”. New York Times, Dec 19, 2019

‘Twelve Million Phones, One Dataset, Zero Privacy’— An Interview with the New York Times’ Stuart A. Thompson

By Madison Hall
Published in Nightingale, Jan 27, 2020


Stuart A. Thompson is the Graphics Editor for the New York Times’ Opinion section. Before working at the Times, he was the Graphics Director at the Wall Street Journal. His recent series with journalist Charlie Warzel, “One Nation, Tracked,” has gained significant attention in the data and privacy communities. The series digs into a dataset of 50 billion location ‘pings’ from 12 million unique IDs and shows how it can be used to track the movements of the President of the United States.

MH: When you first received the massive dataset, where did you even begin? Can you elaborate on what languages/services you used to be able to make the connections in the story?

An example of what the dataset looks like from the “One Nation, Tracked” series.

ST: Well, the data is quite large but not that complicated. We published a screenshot of the data in the piece and it’s basically a spreadsheet. It’s 4 to 5 columns, and just billions of rows.

We’re lucky at the Times to have a tech team that has a little bit more experience with really large datasets. These guys come from companies like Spotify and are used to working with data much, much larger than what we have. They work with tools that are slightly above my pay grade to give us a searchable database that’s indexed so we can do geospatial queries on it.

We started off using a slightly different form: basically, we just queried it in a way that let us slim down the results. Then I built a separate interface, a browser-based mapping interface, where you paste in a bunch of JSON and it processes it in-browser, puts it on the map, and lets you filter by time and hours and do some query matching using a library called “Turf.” It’s for JavaScript mapping queries, and that’s basically the tool that we had. We could all jump in there and work on looking at specific data paths.

Back when Google Maps was a big deal and Google API in the newsroom was pretty big, I would build these Google Maps for “bicycle accidents” or something. It was a very similar process. It’s like you have points, you can cluster those points, and you can draw lines between points. I just used Leaflet.js to do that.
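As a rough illustration of the paste-and-filter step described above, here is a minimal sketch in Python. The actual tool ran in the browser with JavaScript, Turf and Leaflet.js, and its code is not shown in the interview, so everything here is an assumption: the field names `id`, `lat`, `lon` and `ts`, the invented sample rows, and the helper names `load_pings` and `filter_by_hours`.

```python
import json
from datetime import datetime

# Hypothetical pasted JSON. Field names and values are invented for
# illustration; they are not taken from the dataset in the story.
PASTED = """
[
  {"id": "a1", "lat": 40.7527, "lon": -73.9772, "ts": "2020-01-06T08:15:00+00:00"},
  {"id": "a1", "lat": 40.7580, "lon": -73.9855, "ts": "2020-01-06T13:40:00+00:00"},
  {"id": "b2", "lat": 40.7484, "lon": -73.9857, "ts": "2020-01-06T23:05:00+00:00"}
]
"""


def load_pings(text):
    """Parse pasted JSON and attach a parsed datetime under the key 'when'."""
    pings = []
    for row in json.loads(text):
        ping = dict(row)
        ping["when"] = datetime.fromisoformat(row["ts"])
        pings.append(ping)
    return pings


def filter_by_hours(pings, start_hour, end_hour):
    """Keep pings whose hour of day falls in [start_hour, end_hour).

    If start_hour > end_hour the window wraps past midnight (e.g. 22 to 6).
    """
    def in_window(hour):
        if start_hour <= end_hour:
            return start_hour <= hour < end_hour
        return hour >= start_hour or hour < end_hour

    return [p for p in pings if in_window(p["when"].hour)]


pings = load_pings(PASTED)
morning = filter_by_hours(pings, 6, 12)    # daytime window
overnight = filter_by_hours(pings, 22, 6)  # window that wraps past midnight
print([p["id"] for p in morning], [p["id"] for p in overnight])
```

An overnight window is included because where a device sits at night is the kind of pattern a time filter makes visible; a real tool would also need to convert timestamps to local time, which this sketch skips.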

MH: So for clarification, a different internal team scoped out the dataset and you were working in JavaScript/Google Maps API to show routes and organize the information?

All of the pings from one day at Grand Central Terminal in New York City.

ST: Yeah, so the data itself, because it’s LAT/LONG in the spreadsheet, is not super useful — you definitely have to map it. We had a web interface tool to query the data. We basically would draw a box around an area we were interested in and then either got back a bunch of IDs that we could then query or we’d get just a bunch of points. So with the big dot maps at the beginning of the first story, those are the result of drawing a box around an area and then downloading all the points that are in that area.

We’d then query an ID and it would spit back the LAT/LONG and date/time. Then we just pasted that into the interface tool, which would map the points for us and let us kind of interrogate and work through the de-anonymization. That was one of the risk assessments that we were trying to do.
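The two queries described here (draw a box and get back points or IDs, then look up a single ID) can be sketched as follows. This is a simplified illustration in Python, not the Times’ system: the team used an indexed, searchable database for geospatial queries, while this sketch scans a small list in memory. The row layout, the sample values, and the function names `query_box` and `track_for` are all hypothetical.

```python
# Invented sample rows: (device_id, lat, lon, unix_seconds).
PINGS = [
    ("a1", 38.8719, -77.0563, 1000),
    ("b2", 38.8725, -77.0550, 1100),
    ("a1", 38.9000, -77.0365, 4000),
    ("c3", 40.7527, -73.9772, 1200),
    ("a1", 38.8895, -77.0353, 2500),
]


def in_box(lat, lon, south, west, north, east):
    """True if the point lies inside the box (no antimeridian handling)."""
    return south <= lat <= north and west <= lon <= east


def query_box(pings, south, west, north, east):
    """Return every ping inside the box, plus the distinct IDs seen there."""
    points = [p for p in pings if in_box(p[1], p[2], south, west, north, east)]
    ids = sorted({p[0] for p in points})
    return points, ids


def track_for(pings, device_id):
    """Return all pings for one ID, ordered by time, ready to map as a path."""
    return sorted((p for p in pings if p[0] == device_id), key=lambda p: p[3])


# Step 1: draw a box and collect the points and IDs inside it.
points, ids = query_box(PINGS, south=38.86, west=-77.07, north=38.88, east=-77.04)

# Step 2: pull the full, time-ordered history for one of those IDs.
track = track_for(PINGS, ids[0])
print(ids, [(lat, lon) for _, lat, lon, _ in track])
```

The second step is what makes the data sensitive: an ID first seen inside the box comes back with every other place it pinged, which is the path that gets mapped and examined.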

MH: How many frames and different ways of presenting this information did you go through? You told me the project took seven months, so clearly it wasn’t a one-drafter.

ST: From the very beginning, we had a couple ideas we wanted to do and build off of the great work of our colleagues in the newsroom who had already done a piece about location data. We knew that the difference here was one of volume and scale, so we always wanted to show a big view of the data. But we really didn’t start working on that until pretty late, like the last month/last few weeks. I had a good sense of what I wanted to show.

With the Pentagon, we looked at that early on and we found it shocking. That was what I showed people in meetings and my editors. I’d say, “What do you think the Pentagon looks like?” And then I’d show it to them. So stuff like that I knew the particulars. It’s dots — that’s what the data is — so that’s what we’re going to show. The fact that it’s green, a designer came up with that. We originally used orange and it had kind of a biohazard-type feel to it. Blue seemed too acceptable and normal of a color and seemed like we were condoning it. So green is better.

We played with the base layer of what the maps looked like. We had originally done a bunch of treatments that were not satellite images and slowly worked our way back to using satellite images. It was just the clearest thing that made it feel the most real.

With a custom-built library, Thompson was able to highlight specific areas and visualize just a small glimpse of the data.

I coded that zooming map library which is what powers the Central Park thing and the Trump thing. I’ve used that library on a couple of projects before and then the Saturday before we published I completely refactored it, deleted it all and started again, because it was performing so shittily. I think in the end it worked out.

MH: You wrote a piece in the past about online advertisements utilizing data from browsers and credit card tracking. Does the new cell phone piece change how you see the future of online advertising?

ST: I mean, it could. It depends what lawmakers do and what the public calls for. I think there’s a bit of inertia with all stuff related to privacy. Because it’s not enshrined as a right, we have to constantly find harm in order to care about it. Whereas, if it’s a right then you defend it on that basis alone. Like, we have a right to privacy from the government, but we should think of our private lives as a right, I think. Then you don’t have to constantly find and then convince people that there’s a harm associated with losing every moment of privacy.

But it really depends what happens next. You can see a world where at the bare minimum these companies are contained slightly so that they can’t keep data forever or sell to whoever they want (how it is now) and that they have to treat it as personally identifiable information. Right now, they treat it as “anonymous data”, sell it to anyone, anytime, and could give it away for free if they wanted to.

The outcome of the story remains to be seen. I think it freaked a lot of people out and woke a lot of people up. I should say the second-most-read piece in the series wasn’t the article where we tracked someone assigned to President Trump’s security detail, but “How to Protect Your Phone in 3 Steps.” Tons of people shared that and wanted to know how to protect themselves, which I think shows how concerned people are. There’s a sense of, “well, I have nothing to hide,” but then if I say, “okay, well, here are three ways to protect yourself,” they’re probably going to go and do those three things because, why not? So that speaks to how concerned people actually are. We’ll see what happens.

MH: You mentioned that people say “I have nothing to hide,” but I think we have a lot more to hide than we really think.

ST: That’s a great point, and a good rebuttal to people who say that. It’s like, well maybe you don’t have anything to hide. There’s an idea that you can’t escape the system even if you want to, so if you’ve got nothing to hide, that’s great. But what if somebody does? They can’t get out of it either. Shouldn’t we have a system set up where if you want to you can remove yourself from it? Or maybe a system that is not going to, by default, collect all the private moments from your life? We’re so willing to accept it the way it is, but it just takes a little bit of work to make something better. And more cynicism and distrust of the companies that are involved in collecting all the stuff.

MH: Do you think this apathy is localized to the United States or is this more widespread than that?

ST: I was talking to somebody from Europe and he was like, “The people in my country don’t care, and it’s nice to see that the U.S. cares and there’s a story out there.” He said, “Everyone here says ‘I have nothing to hide so I don’t care if I’m being tracked.’”

Well that’s what everybody says here too! We were in Pasadena reporting on the story and we talked to a couple people — we’d be knocking on the door saying “we know everything you did for this period of time and now we’re on your doorstep showing it to you” — we expected people to be freaked out, like out of their minds concerned, and go throw their phone in a lake. But they weren’t and we couldn’t figure that out, like why?

The conclusion we reached is that people are a little bit brainwashed. These companies have controlled the terms of engagement this entire time. So they control the consent screens, they control the information that’s given out to people, and they control the reward system. So if you want to use this new app, you have to consent right away, and you’re not really sure what they’re doing with the data. They don’t tell you and maybe in the privacy policy they say something that’s not even clear there.

People are going to not care because it’s not enshrined as a right that we have, that we could get out of this kind of monitoring if we want. I think it’s a universal thing. I think people everywhere kind of feel that way.

MH: As more data gets collected and it becomes easier to access, do you see this data being utilized for blackmail and times of war in the future?

ST: Yeah, definitely. The national security risks are all there. We had a great quote from a guy who said “Russia/China have people that sit down at a computer from 9 to 5 every day and try to steal data and that’s just what they do.”

This data is extremely intimate. More intimate than any other data I can think of except for search history or emails. But there’s no industry that I know of that sells raw email histories or raw search histories. Like it’s definitely monetized, but it’s not sold and transferred. But this data is! This data can be shared easily — it’s not a complicated relational database, it’s a spreadsheet. A very big spreadsheet, but a spreadsheet nonetheless.

There’s no telling what some of these companies have done with the data and there’s been no regulation to stop them from doing anything they wanted, more or less.

MH: Other than what’s in the “3 Steps” article that you’ve mentioned, how did the project change your personal habits and how you view your own personal data?

ST: Yeah, I mean the project scared me a lot. It totally changed the way I think about everything related to privacy. Now I turn off my location at all times.

It was not long ago when a friend of mine had this app that would track where you go and then would tell this little journey of your life. I was like, “Cool, that is so neat; I’m gonna turn that on.” I got rid of it eventually as it was draining my battery, but I thought of it as innocent fun. It’s like “What a cool thing that our phones can use GPS and come up with these cool ideas.” But 100 percent, that app is using that stuff for attribution, advertising and all the things like it. No doubt in my mind.

It’s really made me cynical about the world we’re building. It’s exciting to have this technology that’s improving our lives and making things better, and then at some point in the last 10 years it’s totally been corrupted — where any piece of information that you give out just by living your life and moving through the world is harvested and collected and stored for all time under these conditions where they’re not going to tell you that they’re doing it and they don’t want you to know what’s going on. But they can also claim that they are being transparent because they’ve given you one consent screen, one time, 5 years ago and that’s all they need to do legally. So it’s a system that exists all around us, but is so easy to miss because we’re just living our lives. We shouldn’t be expected to be experts in all this stuff but when you start to peel it back and think like “wow, I don’t want to be a part of this system and participate,” sort of where I am now, it’s kind of too bad. It’s so hard to get out of it.

I got a targeted ad the other day that was definitely from something I clicked on or looked at, but I try to be bulletproof. I have like 5 ad blockers, I use the Brave browser, I use DuckDuckGo, the DuckDuckGo mobile browser. I am locked down and it still gets through. I use a fake email address to sign up for everything. I am a nut. My editor was joking that I was going to file these stories by carrier pigeon because I was going to be holed up in the woods somewhere — and he’s not far off because I’m freaked out all the time. I think more than changing my day to day behavior, which it did, it just made me so cynical about the companies that are in our lives all the time.

MH: “Scrollytelling” is a trend that seems to be growing among publications, especially at the NYT. Why do you think that resonates so much with the readers?

ST: Yeah it’s funny, there’s always these little trends in the industry and that’s definitely one that we’ve seen. You could probably trace it all back to this one presentation that Archie (Tse) did a while ago where he said people don’t click on anything… and then everybody stopped. I think [scrollytelling is] the response to that idea of simple user interaction. You can contrast it to like a decade ago where it was dashboards: buttons, drop-downs, filtering, editing — giving readers the power to find the story themselves. Then slowly we’ve realized it’s a horrible idea and instead, people just want to get the information and get out. Scrollytelling is a way to do that. It requires one interaction that we’re all really good at — scrolling. Then you can update data and show different slices and the only thing people have to do is read and look. I think that’s the reason why it’s so popular.

A screenshot from the “Peak Olympics” interactive from the New York Times Opinion team. This utilizes clicking rather than scrolling to move to the next point.

Will that stop being popular and go away? Probably. If you’re going to think about it in terms of interaction, we’ll probably go back to clicking because of mobile phones and the way that Instagram stories work. We’ll probably have a lot more of that; we have a template at the Times for tap stories. In the Times Opinion, that’s been one of our calling cards. If you look at a project we did, “Summer Songs” or “Peak Olympics”, we’ve done a bunch of things in that mode which is sort of a response to how mobile phone users — who are the majority of our audience by far — tend to react to stuff. So it’s a bit of a trend but it’s also the easiest way for people to get through content. I’d say it’s probably going to change.

MH: What difference is there working with graphics in an opinion section rather than the traditional news side?

ST: That’s a good question, I get that a lot. Sometimes people are like, “How can you make a graphic an opinion? Those are contradictory ideas” and I think that basically people have the wrong idea of what opinion is. People think of it as the equivalent of Fox News or something where it’s the Fox News commentator saying “This is what I think.” When really opinion is, what I find, agreeing on the same set of facts (like the facts are the same and not quite in dispute with the normal newsroom), but the conclusions and what you do with that information is different. So with climate change, you have a lot of people who agree on the fact that climate change is happening. Opinion’s job is what to do about it.

There’s cases where I think Opinion plays a slightly different role than the newsroom. Like we might be more willing to imagine scenarios, make some assumptions, build a model, draft some conclusions from a data set or let people play with the dataset and look up projections.

MH: What advice do you have for people aspiring to join the design and data visualization/journalism field?

ST: Something I always try to tell people is to just do the work. It should just be for yourself. If you don’t have another opportunity, or work in a newsroom that doesn’t do this type of work but you’re interested in it, just do something. It doesn’t matter what it is or how impactful it is. It doesn’t have to be a big investigation. Some of the coolest work we pass around is just someone’s fun project that tried something different and did something innovative, or posted something on Observable that’s a little unique.

Being able to make the things you’re dreaming of is pretty big. It doesn’t have to be groundbreaking; it could just be something that interests you. I think a lot of people who are trying to get into the industry or thinking about getting into it, especially if they’re coming from the reporting side, say, “I’ll totally get into that and learn that library once I get a job doing it.” But that’s not really enough. If you’re passionate about it and enjoy it, that’s going to give you the most success.

You can find more of Stuart A. Thompson’s work in the New York Times or by following him on Twitter.

Madison Hall is a data journalist and visualization enthusiast, @byMadisonhall on Twitter.