What I’ve learned about AI and why journalists should care

Florencia Coelho
JSK Class of 2019
7 min read, Jun 6, 2019

And why I’m excited about satellite imagery, the environment, and human rights

Deep Solar by Stanford

As journalists, we need to understand that AI offers a new dimension of opportunities and challenges that builds on computational and data journalism.

How useful AI will be for investigative journalism is still being debated, but there are definitely interesting solutions, and I think its use will spread through newsrooms around the world, as happened with blogs, video, mobile, social media, and data journalism.

Some scientists object to the name “Artificial Intelligence” and argue that it should be changed. Either way, there are several definitions of AI; I’ll stick with this one from Google Dictionary:

“Artificial Intelligence is the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”

AI is intrinsically related to computer science (CS) and data science (DS). I’ve even heard it described as “fancy statistics.”

Image: Internet meme inspired by the original work of Sandserifcomics

TYPES OF AI

The first recorded use of the term “artificial intelligence” was in 1955, by John McCarthy, who later became a professor at Stanford University.

Interested in history? You can explore a comprehensive historical infographic prepared by Stanford’s new Human-Centered AI Institute.

There are two main types of AI you should understand right now, usually called Strong (or General) AI and Weak (or Narrow) AI.

Strong or General AI. This type implies that computer systems have consciousness, sentience, and self-awareness. Ideally, they could multitask across different challenges and projects.

This type is NOT happening anytime soon, and from what I heard from scientists and researchers, it’s doubtful it will ever exist.

This is the scary kind of AI you might see in movies, where robots come after humans. At one conference, I heard that some consultants are raising money for General AI research by exploiting people’s fear of this type of AI.

Weak or Narrow AI. This is the type that has been evolving to solve specific problems. It can reason logically, find patterns, and “learn” within a limited scope.

This is the kind of AI that is discussed, tested, and used in academia, government, and business.

As journalists, we should perhaps be more concerned about the harms associated with Weak AI, such as autonomous weapons and increasing automation in specific job markets.

Subdomains of Narrow AI.

From there, different subsets of AI have been developed.

The ones most relevant to journalism, and the ones you’ll probably read about, are Machine Learning, Natural Language Processing (NLP), Speech, Vision, Expert Systems, and Robotics, among others.

These subdomains have their own subsets. For example, Machine Learning can be Supervised or Unsupervised. But you will also read about reinforcement learning, weakly supervised learning, and deep learning’s neural networks.

DO NOT PANIC!

What journalists need to know is that these subdomains of AI have their own subsets. They have different goals, and they often intertwine within the same project or challenge.

Machine Learning is the predominant subdomain, and its general-purpose algorithms are also used in the more specific AI subdomains (e.g., language, speech, vision). Expert systems are an older form of AI, but they have been used as lead generators.
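Unlike Machine Learning, an expert system does not learn from data: a specialist’s knowledge is written down as explicit if-then rules. This minimal sketch assumes hypothetical public-contract records and rules invented for illustration (the `Contract` fields, the thresholds, and the `generate_leads` helper are not from any real newsroom tool). Records that trip enough rules become leads for a reporter to check, not findings.

```python
from dataclasses import dataclass

@dataclass
class Contract:
    """A hypothetical public-procurement record (fields invented for illustration)."""
    vendor: str
    amount: float
    bids: int
    days_to_award: int

# Each rule pairs a human-readable reason with a test an expert might write down.
# The thresholds are illustrative assumptions, not real procurement guidelines.
RULES = [
    ("single bidder", lambda c: c.bids == 1),
    ("very large award", lambda c: c.amount > 1_000_000),
    ("rushed award", lambda c: c.days_to_award < 3),
]

def generate_leads(contracts, min_hits=2):
    """Return (vendor, reasons) for records that trigger at least min_hits rules."""
    leads = []
    for c in contracts:
        reasons = [name for name, test in RULES if test(c)]
        if len(reasons) >= min_hits:
            leads.append((c.vendor, reasons))
    return leads

records = [
    Contract("Acme Paving", 2_500_000, 1, 2),
    Contract("Beta Supplies", 40_000, 5, 30),
    Contract("Gamma Builders", 1_200_000, 3, 1),
]
print(generate_leads(records))
```

Because every rule carries its reason, a reporter can see exactly why a record was flagged, which is the main appeal of rule-based systems over opaque models.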

In this graphic, one of many included in the Artificial Intelligence Index 2018 Annual Report, you can see how Machine Learning has led AI research papers over the last 15 years.

Scopus is a citation database of peer-reviewed publications

MOMENTUM

Why is AI exploding? What’s happening?

The most basic answer is: data + technology

Data: Large amounts of data are needed to train computer systems and models using AI techniques, and governments and corporations are producing data in those quantities. Journalists can work with large volumes of data obtained through leaks, scraping, Freedom of Information Act requests, open data, etc.

Technology: Data storage capacity and computing power have also increased. The challenge is money: depending on how much data is being analyzed, the cost can become a burden. News organizations and universities will probably need to collaborate to make some journalism projects happen.

AI AND JOURNALISM

So, why should we care?

1) To fulfill our journalistic mission.

2) To take advantage of a great opportunity for newsrooms.

Any journalist with a strong mission for public service should understand the challenges AI presents to society.

It’s already happening.

Some decision-making algorithms used in government and corporate projects are producing unfair results.

Is your city using facial recognition tools to pursue criminals? What are they doing with the data? How many false positives do they have? How are they protecting privacy rights? Is the trade-off between security and privacy worth it? What happens if an abusive government uses the technology to oppress dissidents and control its own supporters?
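Answering “How many false positives do they have?” often comes down to simple arithmetic once you obtain audit records. A minimal sketch, assuming hypothetical records that say whether each person was flagged by the system and whether they were actually the person sought (the data and the `false_positive_rate_by_group` name are invented), computes the false positive rate, FP / (FP + TN), separately for each group. Comparing groups is one way unfair results show up.

```python
from collections import defaultdict

# Hypothetical audit records: (group, flagged_by_system, actually_a_match).
# In a real audit these might come from records requests or court files.
records = [
    ("A", True, True), ("A", True, False), ("A", False, False), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", True, True),
]

def false_positive_rate_by_group(rows):
    """FPR = FP / (FP + TN): the share of people who were NOT a match
    but were flagged anyway, computed per group."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)  # FP + TN: everyone who was not a true match
    for group, flagged, actual in rows:
        if not actual:
            negatives[group] += 1
            if flagged:
                false_pos[group] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}

print(false_positive_rate_by_group(records))
```

Note that this is different from the share of flags that were wrong (the false discovery rate); officials may quote either, so it is worth asking which one a figure refers to.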

On the other hand, in an era of shrinking newsrooms and competition with digital-only players, a long-term strategy using AI solutions for the different phases of the news cycle sounds like a must-have.

I’ve been collecting inspiring examples for journalists; below are some related to different subsets of Narrow AI.

Machine Learning (general purpose algorithms)

Supervised (implies labelled training data): The Atlanta Journal-Constitution’s “Doctors & Sex Abuse” story “…collected more than 100,000 disciplinary documents. To assist us in identifying those involving sexual misconduct, we then created a computer program based on “machine learning” to analyze each case and, based on keywords, [gave] each a probability rating that it was related to a case of physician sexual misconduct. We then read all the documents in over 6,000 cases to determine the nature of each case and board action. …” (Methodology)
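The AJC’s methodology doesn’t say which algorithm it used. As a minimal sketch of the general idea, assuming a multinomial naive Bayes classifier written from scratch and a few invented training snippets (not the AJC’s data or code), a model learns word frequencies from labelled examples and then gives a new document a probability of being relevant.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(docs, labels):
    """Fit a multinomial naive Bayes model on documents labelled 1 (relevant) or 0."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab

def probability_relevant(model, doc):
    """Return P(label=1 | doc), using add-one (Laplace) smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in tokenize(doc):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn log scores into a probability without numeric overflow.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])

# Invented training snippets for illustration only.
docs = [
    "physician inappropriate touching of patient during exam",
    "doctor sexual contact with patient license revoked",
    "physician failed to keep medical records",
    "doctor prescribed controlled substances without exam",
]
labels = [1, 1, 0, 0]
model = train(docs, labels)

p_new_a = probability_relevant(model, "board found inappropriate sexual contact with patient")
p_new_b = probability_relevant(model, "doctor failed to keep records of prescriptions")
print(round(p_new_a, 2), round(p_new_b, 2))
```

As in the AJC’s workflow, scores like these only help decide what to read first; the reporters still read the documents themselves.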

Unsupervised: The Associated Press’s “Guns in school-accidents” story “used unsupervised machine learning to find hidden patterns in a data set of 140,000 human-entered incidents documented by the Gun Violence Archive (GVA). After discovering a host of errors in the initial data, the AP used unsupervised machine learning to simplify the data and flag certain entries for further review without specific guidance.” (The Future of Augmented Journalism report, p. 10)
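The quote doesn’t name the AP’s algorithm. One common unsupervised technique is k-means clustering, sketched minimally here from scratch, assuming each incident has already been reduced to two hypothetical numeric features (the points, `k`, and `threshold` are invented). No labels are used: the records are grouped, and anything unusually far from its group’s center is flagged for a human to review.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest center, then move
    each center to the mean of the points assigned to it."""
    centers = points[:k]  # simple deterministic start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers

def flag_for_review(points, centers, threshold):
    """Flag points farther than `threshold` from their nearest center."""
    return [p for p in points if min(math.dist(p, c) for c in centers) > threshold]

# Hypothetical incidents reduced to two numeric features each.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.1, 7.9), (7.8, 8.2), (1.1, 9.0)]
centers = kmeans(points, k=2)
print(flag_for_review(points, centers, threshold=3.0))
```

Real projects usually pick `k` and the threshold by inspecting the data, and a flagged entry may be an error or simply an unusual but genuine incident; that is why the flags go to a person.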

Natural Language Processing (NLP)

The most comprehensive guide I’ve found in recent months is Jonathan Stray’s paper “What Do Journalists Do With Documents? Field Notes for Natural Language Processing Researchers,” in which he describes several examples that use custom NLP.

Topic modeling. Reuters’ “The Echo Chamber” story is about a small group of lawyers and their outsized influence on the U.S. Supreme Court. (Methodology: J. Stray, 4.1)

Sentiment analysis. The Washington Post’s “Whistleblowers say USAID’s IG removed critical details from public reports.” (Methodology: J. Stray, 4.2)
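The details of the Post’s method are in Stray’s paper. As a toy illustration of sentiment analysis in general, assuming a tiny hand-picked lexicon (a real project would use a validated lexicon or a trained model), a lexicon-based scorer can compare the tone of two versions of the same finding, such as a draft and a published report. The word lists and sentences are invented.

```python
import re

# Tiny illustrative lexicon (an assumption for this sketch, not a real resource).
NEGATIVE = {"failed", "fraud", "waste", "mismanaged", "deficient"}
POSITIVE = {"improved", "effective", "successful", "adequate"}

def sentiment_score(text):
    """Return (positive hits - negative hits) / number of words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

draft = "The program failed to track funds and mismanaged contracts."
published = "The program improved how it tracks funds."
print(sentiment_score(draft), sentiment_score(published))
```

A drop in negative language between versions is a lead to investigate, not proof that anything was removed.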

Speech

Speech to text. In Folha’s “A whole presidential campaign categorized” project, “…We also captured and stored TV spots and videos posted on Youtube, using Google’s Speech-to-text API to [transcribe] the audio content of these videos to text. By the end of the first round campaign, we had [transcribed] more than 95 hours of videos. …”

I’ve found more examples of different AI subdomains and subsets. You can browse them on my Pinboard by combining tags with “journalism.”

What am I excited about? Satellite imagery, the environment, and human rights.

I’m interested in the opportunity to combine satellite imagery and AI tools.

Take a look at some examples.

Environmental projects

Stanford has projects to identify potentially polluting animal farms in North Carolina and map every solar panel in the United States.

Stanford.edu

This year’s WiDS (Women in Data Science) Conference datathon winners worked on detecting oil palm plantations in satellite imagery.

Texty’s “Leprosy of the Land” project “…used a deep learning model to search satellite images of 70,000 km² in northern Ukraine for traces of illegal amber mining.” (More on the methodology via QZ’s AI Studio.)

And to track expansion in the South China Sea, Reuters used satellite imagery and AI to provide a first pass. Then the team manually edited the initial results to make sure there were no false positives or missed buildings.
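That human-in-the-loop step can be summarized with a few set operations. A minimal sketch, assuming the model’s first pass and the manually verified result are both sets of hypothetical building or image-tile IDs (invented here, not Reuters’ data), counts the false positives and misses the editors had to fix, plus precision and recall.

```python
def review_summary(model_detections, verified_buildings):
    """Compare a model's first pass with the manually verified result.
    Both arguments are sets of hypothetical IDs (e.g., image-tile IDs)."""
    false_positives = model_detections - verified_buildings  # flagged, but not real
    missed = verified_buildings - model_detections           # real, but not flagged
    true_positives = model_detections & verified_buildings
    precision = len(true_positives) / len(model_detections) if model_detections else 0.0
    recall = len(true_positives) / len(verified_buildings) if verified_buildings else 0.0
    return {
        "false_positives": sorted(false_positives),
        "missed": sorted(missed),
        "precision": precision,
        "recall": recall,
    }

model_first_pass = {"t01", "t02", "t03", "t07"}
verified = {"t01", "t02", "t04", "t07"}
print(review_summary(model_first_pass, verified))
```

Tracking these numbers also tells a newsroom how much manual editing the AI actually saved.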

Human Rights

Amnesty International’s project quantifying the destruction of villages in Darfur used crowdsourcing and transfer learning to automatically analyze satellite imagery on a country-wide scale in Sudan.

For its “Seafood from Slaves” story, the AP used satellite imagery to secure high-resolution images of vessels in Southeast Asia. Reporters gathered critical evidence for an investigative project on abuses in the seafood industry that won the Pulitzer Prize for Public Service in 2016. (Detailed behind the scenes in The Future of Augmented Journalism report, Vision, p. 14.)

Human Rights Watch wants to use a neural network to scale up an expert’s eye and tell smoke plumes apart from puffy white clouds, a task it faced when monitoring the outbreak of ethnic violence in Myanmar in 2017.

This is the end of my journey at Stanford. Now I’m returning to Argentina and LA NACION, where I plan to deepen my understanding of this field, work on related projects, and exchange experiences with my extended data community.

Keep in touch! Twitter @fcoel or fcoelho [at] stanford [dot] edu

BIBLIOGRAPHY

To learn more about opportunities and tools for newsgathering, production, and distribution, these have been my favorite sources.

AP’s report The Future of Augmented Journalism: A guide for newsrooms in the age of smart machines by F. Marconi and A. Siegman

Making Artificial Intelligence Work for Investigative Journalism by Jonathan Stray

What do journalists do with documents? by Jonathan Stray (NLP)

Knight Center for Journalism in the Americas MOOC News Algorithms: The Impact of Automation and AI on Journalism, with Nicholas Diakopoulos

Computational Journalism Symposium 2019 (videos and papers)

QZ AI Studio

Columbia Journalism Review by Nicholas Diakopoulos

Weapons of Math Destruction by Cathy O’Neil (algorithm bias)

Artificial Unintelligence by Meredith Broussard (expert systems)

Artificial Intelligence for Journalists by Daniel Kirsch and Julius Tröger (explanation of different machine learning algorithms). It’s in German, but you can use Google Translate.


JSK Stanford Fellow. Class of 2019. LA NACION (Argentina). #neverstoplearning