Artificial Intelligence & Real Journalism
While our brains still matter, I’d like your help in brainstorming the opportunities that artificial intelligence, deep learning, et al. hold for journalism. I’m asking this of my students, and I’m asking for your help as well.
I organize my thinking about the roles AI can play into these buckets:
- For the journalist: gathering, sourcing, and analyzing information.
- For the public: making information relevant and presenting it.
- And for both: a feedback loop that should become not the end of the process of journalism but instead the start — that is, using these new tools to better discern and serve the public’s needs and wants.
My buckets are a variation on Nick Diakopoulos’ useful and succinct roadmap for innovation in computational journalism, in which he identifies four tasks: information gathering, organization and sense-making, communication and presentation, and dissemination and public response.
“It is relevant to discuss how journalism interacts with augmented intelligence of individuals, the collective intelligence of society, and the artificial intelligence of machines,” say Noam Lemelshtrich Latar and David Nordfors in a 2009 paper. “Ideally, journalism raises intelligence — empowering the audience — as it improves itself from the higher intelligence of the system surrounding it, i.e., the audience and the machines.”
There is no way that we in journalism can explore the opportunities in AI on our own. Hell, we tie our knickers in knots trying to use a simple CMS and still see “data” as singular. Harvard’s and Twitter’s Ryan Adams identifies four requirements for working in AI (ideas, code, data, and computing horsepower), telling MIT Technology Review: “So you can have the ideas. You can have the code. But if you don’t have the data and you don’t have the horsepower, what are you going to do with them?” Right. We in media don’t have data or horsepower or code. But perhaps we can generate the questions — the ideas.
There are many potential partners that can help us. Facebook and Google have the data, code, and horsepower, and they also want to make friends in media and need the human analysis and value we in journalism can bring. OpenAI has backing from the likes of Elon Musk, Reid Hoffman, and Peter Thiel to explore uses of the technology. IBM’s Watson is now led by media savant David Kenny (formerly of Publicis, Akamai, and the Weather Company).
Adams argues that a first task is to demystify AI and see it as just another tool toward our end of informing society. “Part of the challenge of this is the need to anthropomorphize the concept of intelligence,” he told MIT Technology Review. “We use the phrase ‘artificial’ intelligence, as though intelligence isn’t a property of the world. We don’t call airplanes artificial birds, and they don’t have artificial flight. They have actual flight, right?” Right. And journalism supported by thinking machines is not artificial; it is a real opportunity.
Gathering, sourcing, and analyzing information
There are, and now will be, ever more sets of data available to analyze for patterns and anomalies: pictures of how we live and the correlations and exceptions that can make news.
The tools available include pattern recognition, image recognition, computer vision, clustering, relationship mapping, translation, prediction, hypothesis recognition, modeling and simulation, and others.
Data sets will include the exhaust of sensors in our public environments, in our homes and cars, and in our phones; health data; economic and sales data; interest data from the public’s content — including photos and video — and interactions online and from their behavior with media; and, importantly, government data.
We in journalism must lead the fight for opening access to data, especially government data. We also should at least be at the table when discussing the ethics and standards for both openness and the protection of privacy.
We in journalism will also benefit from and can help with efforts to identify trust, authority, expertise, and originality of sources.
With that landscape, what can you imagine journalism can accomplish with systems that learn? Just a few examples:
- Seeing the miracles Google Photos performs with one person’s pictures — identifying not just dogs but breeds and recognizing one person even back into childhood — I wonder what such learning systems could do (and what they are doing behind the scenes at Instagram) with the huge corpus of images produced every day: tracking shifting sentiment in selfies, the diversity of relationships, the changing targets of memes.
- I hope journalists — and more particularly scientists — can negotiate the requirements of privacy and HIPAA to identify and investigate correlations in disease and illness with income, location, race, timing, and other factors.
- I’m supportive of efforts to identify trust and authority — including the Trust Project for news content and the Coral Project and Diakopoulos on comments — but believe this will never be a simple, one-dimensional label but instead a complex set of signals that will mean different things to different users. Trust is dependent on subject area, context, and the needs and biases of the judge.
- The Counted, the magnificent, Pulitzer-worthy project from the Guardian that finally tallied every police-caused death in America, opens the door to so many more analyses of crime, punishment, and justice through data against time, place, race, economics, and other factors.
- And on the news, imagine if we had systems at the ready to take the Panama Papers — 2.6TB of data in 11.5m documents implicating some 140 politicians and public officials, analyzed by roughly 400 journalists from more than 100 news organizations in 80 countries — and draw connections to power around the world. Oh, what such a connection machine could do. (My first exposure to computer-assisted reporting came in the mid-’70s, when I tried to convince my colleague at the Chicago Tribune, investigative reporter Chuck Neubauer, to let me computerize his amazing stack of index cards that helped him connect donors to contracts in that capital of corruption.)
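To make the idea of such a connection machine concrete, here is a minimal sketch, assuming the documents have already been reduced to the named entities inside them (a real pipeline would first need entity extraction and resolution at enormous scale); the names and documents here are hypothetical stand-ins:

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each document already reduced to the named entities found
# in it (a real pipeline would run entity extraction and resolution on raw files).
documents = [
    {"Shell Co A", "Law Firm X", "Official P"},
    {"Shell Co A", "Official Q", "Bank Y"},
    {"Law Firm X", "Official P", "Bank Y"},
]

# Count how often each pair of entities appears in the same document.
cooccurrence = Counter()
for entities in documents:
    for pair in combinations(sorted(entities), 2):
        cooccurrence[pair] += 1

# Surface the strongest connections first: the leads a reporter would chase.
for (a, b), count in cooccurrence.most_common(5):
    print(f"{a} <-> {b}: together in {count} document(s)")
```

It is the digital descendant of Chuck’s index cards: count who shows up together, then send a reporter to ask why.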
What else?
Relevance and presentation
There is a separate set of opportunities around breaking our one-size-fits-all, one-way, mass-media presumptions in how we choose and present information for people and communities. We can vastly improve the relevance and thus the value we deliver. And we can deliver news and information in many new forms.
First, relevance: In 2009, Marissa Mayer challenged my talk of hyperlocal with her notion of a “hyperpersonal news stream” that would filter and prioritize not only our email but also our social feeds, news, and other notifications. I see an opportunity for smart systems to learn well what I need.
This requires a few things: First, the systems have to get much smarter about me as a person so they can serve me as an individual. That means I have to trust the system to get to know me and my behavior and needs — a big task in itself, not just technically and logistically but also ethically. There need to be standards and means for me to share my identity voluntarily in a transaction that I know will bring me relevance and value in return.
Separate conversation: Here is where I see a role for blockchain automated contracts in news: ‘I give you this information under these conditions to get that in return.’ This is how we can launch agents that both search for the news we know we want (‘alert me when — ’) and, better yet, deliver news we didn’t know we wanted because the agent knows (as editors used to) that it’s just plain important or because other people like me in some way are interested in or talking about this news. Someday, there will be effective serendipity agents.
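To make the agent idea a bit more tangible, here is a minimal sketch (purely illustrative, with hypothetical story fields and reader-defined rules); a standing request is just a condition waiting for a story that satisfies it:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical story record and reader-defined alert rule. A real agent would
# sit on a live feed of published items and a store of standing requests.
@dataclass
class Story:
    headline: str
    topics: set
    place: str

@dataclass
class AlertRule:
    description: str
    matches: Callable  # takes a Story, returns True if the reader should be told

def run_agent(rules, incoming):
    """Check each new story against each reader-defined rule."""
    for story in incoming:
        for rule in rules:
            if rule.matches(story):
                print(f"ALERT ({rule.description}): {story.headline}")

rules = [
    AlertRule("school news in my town",
              lambda s: "education" in s.topics and s.place == "Hoboken"),
    AlertRule("anything on police accountability",
              lambda s: "police" in s.topics),
]

run_agent(rules, [
    Story("Board votes on new school budget", {"education", "budget"}, "Hoboken"),
    Story("Council debates parking rules", {"transit"}, "Hoboken"),
])
```

The serendipity agent is the harder part: rules the reader never wrote, learned from people like her.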
Next, we in news media must get *much* smarter about our own content. Now, we know our content only in terms of our simplistic news/business/sports/life taxonomy and newsroom structure or, at best, by extracting the entities inside. We need smart systems to take even our own content and understand its topicality, archetypes, and connections so that we can target it to specific people with much greater nuance and utility. I will call again on the wonderful example Richard Gingras of Google News gives: He followed Anthony Weiner not out of interest in the man’s politics, Congressional district, or all-too-public private parts but out of a hunger for stories of falls from grace. Do we know even our own content well enough to feed such desires? We need systems to learn why some stories are more appealing to some people.
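As a toy illustration of the gap between what we have and what we need, here is a minimal sketch that scores an archive against a reader’s stated interest using plain TF-IDF and cosine similarity (the stories and the profile are hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical archive and reader profile. A production system would need far
# richer representations (entities, embeddings, learned archetypes); this only
# scores shared vocabulary, which is exactly its limitation.
archive = [
    "Senator resigns after scandal over misused campaign funds",
    "Local team wins the championship in an overtime thriller",
    "CEO steps down amid fraud scandal and shareholder anger",
]
reader_profile = "scandal resignation fraud and other falls from grace"

vectorizer = TfidfVectorizer(stop_words="english")
story_vectors = vectorizer.fit_transform(archive)
profile_vector = vectorizer.transform([reader_profile])

# Rank the archive for this one reader.
scores = cosine_similarity(profile_vector, story_vectors)[0]
for score, story in sorted(zip(scores, archive), reverse=True):
    print(f"{score:.2f}  {story}")
```

Word overlap alone will never capture an archetype like a fall from grace; that is exactly why we need systems that learn richer representations of our content and our readers.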
Just as systems can try to investigate the authority, expertise, and originality of sources for reporters, so do readers need these systems to help them sift through the ever-greater abundance of content that mass-media business models produce. Again, this is not as simple as presuming that a news brand equals authority or that a job title equals expertise. Neither, lord knows, is it as simple as trusting traffic, else TMZ would be covering President Trump.
And then there is the question of quality. Systems are getting ever better at identifying and hiding content that lacks quality — that is, spam. Obviously, quality is not simply everything that isn’t spam; much of what remains is repetition, overdose, and mediocrity. Quality will be much harder to define, but aggregators and curators will be able to learn to be more selective in what they serve us. If we’re smart, we in news will help them by giving them honest signals of quality and originality.
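Here is a minimal sketch of what I mean by signals, with hypothetical features and labels; the point is not this particular model but the idea that honest signals from publishers become features a curator can learn from:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: each row describes one article with a few
# publisher-supplied signals (originality score, share of duplicated text,
# reporting hours), labeled 1 for quality and 0 for spam or filler. A real
# curator would learn from far more signals and far more examples.
X = [
    [0.9, 0.05, 12.0],
    [0.2, 0.80, 0.5],
    [0.8, 0.10, 8.0],
    [0.1, 0.90, 0.2],
]
y = [1, 0, 1, 0]

model = LogisticRegression()
model.fit(X, y)

# Score a new article on the same signals.
new_article = [[0.7, 0.15, 6.0]]
print("estimated probability of quality:", model.predict_proba(new_article)[0][1])
```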
Note also that IBM Watson is getting good at writing headlines — that is to say, summarizing what matters.
I organized Geeks Bearing Gifts around new relationships, forms, and business models for news. I argue that news organizations must rebuild themselves around relationships with people as individuals and members of communities so we can deliver greater relevance. I’ve just outlined how I hope AI can help us do that. I believe building such relationships can help build greater value in advertising (more below), membership, commerce, and other business opportunities. So now to forms and how AI et al can help us there.
We already see that learning systems can get better and better at writing news stories automatically, especially if they are built on structured data such as that found in sports and business. Narrative Science was a pioneer in the field. In conveying such data to humans, narrative is just another and sometimes better form of data visualization than tables and charts. Now Narrative Science is selling its skills to companies to dig into their data and reveal knowledge inside through stories that people can grasp.
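A toy version of the idea, with made-up game data and a single hard-coded template, shows how little magic is needed at the simplest end of the spectrum (real systems choose angles and vary their language; this one does neither):

```python
# Structured data in, a readable sentence out. The data and template are
# hypothetical; the point is that a story can be another view of the numbers.
game = {
    "home": "Hoboken Nine", "home_score": 5,
    "away": "River City Reds", "away_score": 3,
    "standout": "J. Rivera", "standout_stat": "two home runs",
}

winner, loser = (("home", "away") if game["home_score"] > game["away_score"]
                 else ("away", "home"))

story = (f"The {game[winner]} beat the {game[loser]} "
         f"{game[winner + '_score']}-{game[loser + '_score']}, "
         f"led by {game['standout']} with {game['standout_stat']}.")
print(story)
```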
Natural language processing — which requires learning systems to get better and better — is useful not only in presenting information and narrative to us but also, clearly, in understanding our speech. Of course, this technology has given birth to Siri, Alexa, OK Google, et al.
I see a big future in delivery of news and information via conversation — not in what we will someday soon see as the crude form we are forced into with search engines, but in question-and-answer with bots that can interpret the nuances of our questions, that know what we know and don’t know, that can learn to serve us better with experience. News in this case is less a product and more a storehouse we can dig into at will and need. Quartz’s new app and Purple (from CUNY entrepreneurial grad Rebecca Harris) are first steps on the road to news-as-conversation. Microsoft CEO Satya Nadella calls conversation a platform. I am eager to see what emerges from the rumored launch of Facebook Messenger’s bot store at F8 next week.
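As a bare-bones sketch of news-as-conversation (hypothetical stories, with crude keyword matching standing in for real language understanding), the feature to notice is memory: the bot tracks what it has already told this reader and serves only what is new:

```python
# Hypothetical story store; keyword overlap stands in for real understanding.
stories = {
    1: "The city council approved the new transit budget last night.",
    2: "The transit budget adds two bus lines on the west side.",
    3: "Council members clashed over school funding in the same session.",
}

def answer(question, already_seen):
    words = set(question.lower().replace("?", "").split())
    # Rank unseen stories by how many question words they share.
    ranked = sorted(
        (sid for sid in stories if sid not in already_seen),
        key=lambda sid: len(words & set(stories[sid].lower().split())),
        reverse=True,
    )
    if not ranked:
        return "That's everything I have on that for now."
    already_seen.add(ranked[0])
    return stories[ranked[0]]

seen = set()
print(answer("What happened with the transit budget?", seen))
print(answer("Anything more on the transit budget?", seen))
```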
Translation will lag behind natural language processing but will be almost as important in unlocking information.
I wonder how else AI could be useful in modeling and presenting information to us. Ideas?
Feedback loops
I believe it is critical to include in this list of opportunities the ability of AI to learn from people’s use of news to give us greater insight into what is valuable to the public and what is worth spending resources on. The Guardian’s Ophan affords its editors insight into the sources of each story’s reach, and that’s just a start.
It is critical that we not see listening as the end of the process. Listening to the public we serve must be the beginning of the process of journalism. How can learning systems better understand the public’s information needs and help guide our work?
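Even a crude roll-up of usage data starts that loop. Here is a minimal sketch, with hypothetical pageview events, of the kind of source-of-reach summary a dashboard like Ophan surfaces (the real thing does far more):

```python
from collections import Counter

# Hypothetical pageview log of (article, referrer) pairs.
events = [
    ("council-budget", "search"), ("council-budget", "facebook"),
    ("council-budget", "facebook"), ("school-funding", "twitter"),
    ("school-funding", "direct"), ("school-funding", "facebook"),
]

# Roll up where each story's readers came from.
by_article = {}
for article, referrer in events:
    by_article.setdefault(article, Counter())[referrer] += 1

for article, sources in by_article.items():
    summary = ", ".join(f"{src}: {n}" for src, n in sources.most_common())
    print(f"{article} -> {summary}")
```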
And now a word from our sponsor
I can’t leave this topic without speculating on the incredible importance of AI and learning systems to advertising. They are already in use in programmatic advertising and in that not-very-smart retargeting that follows you around the internet for months after you dare to look at a pair of boots on Amazon. Advertising will support journalism for some time to come (there just isn’t enough charity or revenue from other streams to do it). The serving of advertising will depend more and more on exchanges backed by AI. The value of media’s environments and adjacencies will continue to be commoditized. This is why I argue we must build relationships — so we can build and act on our own first-party data. That data will be the only chip we have to play in this game.
This is just a stone-skipping overview of the landscape and opportunities. My friends at Columbia are doing very good research on data and journalism, as are Diakopoulos and others. When we encompass not just reporting but also relevant selection, presentation, listening, and advertising, this is a huge field that requires our attention. The first step is understanding the opportunities. The next will be finding partnerships.
I’m eager to hear your thoughts.