How “Platforms as Publishers” Could Threaten Journalistic Ethics

News organizations have gotten themselves into a tough spot. After years of neglecting page load performance, social platforms have begun implementing systems that either host articles directly (Facebook Instant Articles) or dictate technical standards for how story elements are structured (Google AMP). The result is largely the same: news organizations that use these platforms see faster load times, but they lose control over how their journalism is distributed and presented.

Although performance is an area that the news industry should dedicate more resources to, using these platforms to solve that problem raises ethical questions.

This issue is most visible with interactive stories: articles that ask readers questions about themselves and use that data to personalize the resulting narrative. If platforms host these articles, they could capture reader responses and, for example, add that data to an advertising profile or sell it to a third party. If a news organization knows or suspects this is possible, it could also chill the newsroom’s output of these types of stories, which raises questions of press freedom.

For example, the New York Times ran an interesting interactive story on jury selection. The article asked the reader about their personal values and then showed how those views might affect whether they’d be picked for a jury. The story states prominently at the top: “Your responses will not be stored.” If this story ran within, say, a future Facebook Instant Articles quiz component, how could we guarantee that those answers truly go nowhere?

In the wake of the Newtown shootings, I was an interactive reporter at the Daily Beast, where we put together a quick project asking readers why they did or did not own a gun. We used a similar format for the 40th anniversary of Roe v. Wade, asking readers to tell us about their experiences with abortion and to complete either the sentence “I am pro-choice but…” or “I am pro-life but…”. The responses in both projects were honest and fascinating, and they helped us tell stories that fell outside the longstanding frames of the debate.

Facebook is one of the largest collectors of personal data in the world. Could an answer in the “I am pro-life but…” column be sold to a political campaign that might serve that reader ads to sway their views? Anti-abortion groups are reportedly using online ad profiles to send targeted ads to women inside abortion clinics.

Red flags should go up any time one creates structured data around what people believe or value.

Maybe you clicked the button that says you “strongly distrust the police” in a recent New York Times story on body cameras. Law enforcement is largely within its rights[1] to request everyone’s answer to that question and add it to the predictive “threat score” its jurisdiction uses for each individual.

Reader answers in these stories are not guaranteed to be truthful, either. A reader could click a button that doesn’t reflect their views simply out of curiosity about how the interactive reacts. It would be hard, if not impossible, for a computer or a law enforcement agent to tell the difference, however.

Could a platform promise not to collect data on certain topics? Yes, certainly. And companies such as Google and Facebook do have guidelines on what data they will or will not target against. But a newsroom’s view of what constitutes sensitive data might not align with a platform’s policies. And even with such a policy in place, the online data collection ecosystem is so opaque that it would be almost impossible for a newsroom to audit with certainty.

As a news organization, what is our responsibility to run, or not run, stories that we suspect the platform will mine for information that could be used against the reader’s interests? The question has no good answer, and it leaves the journalist only two choices: do something that could harm your reader, or censor your work[2].

But does surveillance scale?

How likely is it that a platform would go through the effort of sniffing this data given how many thousands of articles are published a day? I’m not sure, but here are a few possible scenarios using some of the examples above:

  • The social platform mines answers from only the high-traffic pieces. These quizzes are some of the most popular content news organizations publish; the Times dialect quiz, for example, was one of the most-visited pieces of content the paper has ever run.
  • Law enforcement is interested in a few stories, or in specific individuals’ responses to them.[3]
  • An advertiser is interested in a segment of the population (gun owners, or women who have had abortions, etc.) and hasn’t previously found a reliable way to get that data.

In any of these scenarios, it’s plausible that the platform could assign an engineer for a day to come up with a simple tagging system.

Because some of these platforms already enforce markup standards, they could in the future require content creators to semantically tag response submissions in these types of “quiz” stories. In that case, the page would return structured data automatically.
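To make concrete how little engineering such harvesting would take, here is a minimal sketch in Python. It assumes a hypothetical tagging convention (the attribute names `data-quiz-question`, `data-quiz-answer`, and `data-selected` are invented for illustration; no platform mandates them today). Once answers are semantically tagged, a few lines of standard-library parsing turn every reader’s response into structured data:

```python
from html.parser import HTMLParser

# Hypothetical markup convention: imagine a mandated standard that requires
# quiz elements to carry attributes like data-quiz-question / data-quiz-answer.
SAMPLE_PAGE = """
<article>
  <div data-quiz-question="Do you own a gun?">
    <button data-quiz-answer="yes" data-selected="true">Yes</button>
    <button data-quiz-answer="no">No</button>
  </div>
</article>
"""

class QuizHarvester(HTMLParser):
    """Collects every (question, selected answer) pair from tagged markup."""

    def __init__(self):
        super().__init__()
        self._question = None
        self.responses = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-quiz-question" in attrs:
            # Remember which question the following answers belong to.
            self._question = attrs["data-quiz-question"]
        elif "data-quiz-answer" in attrs and attrs.get("data-selected") == "true":
            # Record only the answer the reader actually chose.
            self.responses.append((self._question, attrs["data-quiz-answer"]))

harvester = QuizHarvester()
harvester.feed(SAMPLE_PAGE)
print(harvester.responses)  # [('Do you own a gun?', 'yes')]
```

The point is not that this exact code exists anywhere, but that once responses are semantically tagged, extracting them at scale stops being an engineering problem at all.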

Quizzes, iframes and code reviews (the technical section)

This problem isn’t unique to pre-built “quiz components”, though. Our readers are vulnerable in any scenario where the code from an interactive story element is not served directly from the news organization’s servers.

For example, if custom-coded story elements (e.g., a JavaScript form that accepts user input) are served to the user through an iframe[4] or a similar technique, and that code is delivered from the platform’s CDN, as Google AMP does, the news organization loses some of its guarantees over data privacy because the code could be modified in the CDN mirroring process. Such modifications would certainly require effort and specific intent on the platform’s part, but the scenarios discussed above are still plausible.
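The open web already has a standard mechanism for exactly this guarantee: Subresource Integrity, in which the publisher pins a cryptographic hash of its script and the browser refuses to execute any copy that doesn’t match. The sketch below computes an SRI-style value in Python (the script contents and the injected line are invented for illustration). The catch is that SRI only protects you when you control the tag that loads the code, which is precisely the control a newsroom gives up when the platform’s CDN serves it:

```python
import base64
import hashlib

def sri_hash(source: bytes) -> str:
    """Compute a Subresource Integrity value (sha384, per the SRI spec)."""
    digest = hashlib.sha384(source).digest()
    return "sha384-" + base64.b64encode(digest).decode()

# Invented script contents, for illustration only.
original = b"document.getElementById('form').addEventListener(...);"
mirrored = original + b"\n/* injected during CDN mirroring */ siphonAnswers();"

# A newsroom serving its own script can publish its hash; any rewritten
# copy produces a different hash and a browser would refuse to run it.
print(sri_hash(original) == sri_hash(mirrored))  # False
```

When the platform both rewrites and serves the code, it also controls whatever integrity value accompanies it, so the check verifies the platform’s copy rather than the newsroom’s.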

Another idea that’s seen some discussion is to allow custom story JavaScript to run outside of an iframe, within the same window context as other elements on the page. This is an interesting idea, but many software systems rightly frown upon executing third-party code for user privacy and platform security reasons. To mitigate those risks, a platform could institute a code review process. While that makes good technical sense, a review prior to publication would be a serious threat to journalistic independence. If I write an interactive story that takes a critical lens to the technology industry, would I have to submit it to a tech company for approval?

Newsrooms are jumping on board with platforms under the promise of larger audiences and more revenue. I’m certainly sympathetic to the need to create a business model because I’m in favor of a strong press, which requires healthy news organizations. However, this relationship is not just a new distribution model or a business decision. It could affect what stories journalists do and how we do them. For my field, interactive journalism, it could threaten whether we can continue doing our job while being both creative and ethical.


[1] Nationally, the FBI has recently pushed for easier access to user data, arguing that a “typo” in the current law is preventing them from accessing it today. The Intercept reported that language authorizing access to emails and browsing history was inserted into this year’s intelligence authorization.

Some states have moved in a different direction. California recently instituted a measure that requires law enforcement to obtain a warrant before accessing user data. Texas, Maine and Utah have similar laws. California’s law was criticized by privacy advocates, though, because it doesn’t stop companies from proactively disclosing user data in the absence of a warrant. In other words, if a company wants to give up user data, they don’t need to wait for anyone’s permission — they can simply hand it over.

[2] It’s worth noting that the idea of sniffing user responses is not unique to platforms. News organizations already allow a number of third-party scripts on their pages. A recent CNBC article that used a form for reader submissions was shown to inadvertently allow that data to be sniffed by ad code on the page. At newsrooms where I’ve worked, I’ve never seen that on our pages, but it is something worth discussing because it could mean that our pages are already collecting and codifying this information. The trend toward implementing HTTPS for news articles recognizes that reader behavior deserves some measure of protection, particularly in countries with authoritarian regimes that monitor internet use.

[3] The idea here is not that law enforcement should be forever barred from accessing this information; per footnote one above, that question is currently being worked out by state and federal governments. What this post is concerned with is how newsrooms react to a world where such information could be stored and used by third parties for arbitrary purposes.

[4] Many in the data and interactive journalism community feel that serving interactive content via iframes is unworkable for creativity. For further reading on views from the news community, I recommend this post from Rich Harris, an interactive journalist at the Guardian US, and this roundtable from Source.