A closer look at Google’s plan to spotlight ‘original reporting’
Google’s VP of News explains the wider implications of the new search algorithm.
Last week, Google said it had made changes to its algorithm to highlight ‘original reporting’ and have it remain longer at the top of search results. GEN spoke with Richard Gingras, Google’s VP of News, to understand how the algorithm will work and what the role of humans will be in the process. More importantly, we spoke about the issues at stake and the potential unintended consequences — how does Google rank and rate news outlets? Could Google’s criteria for ‘reputable sources’ favor big news outlets and thus exacerbate inequalities among publishers?
GEN: News organisations have been asking for better transparency behind Google Search rankings for some time. Why did it take so long to change the search algorithm and prioritise ‘original reporting’? Why did you decide to make this change now?
Richard Gingras: Understanding and acknowledging original reporting has always been, and continues to be, a top priority at Google. What has changed in my ten years of working on news at Google is that we now have more technical sophistication than we had then. We learn. We discuss with journalists. We evolve new models. We experiment and test. It builds on all the changes Google Search has made in the past, and Search is always a work in progress.
We have the good fortune of teams filled with world-class engineers and product managers from around the world. They are passionate about their work and operate against defined objectives, principles and ethics. That is crucial given that our ranking judgements must be sound and defensible. Not unlike your newsrooms, we work against honed principles and ethics.
It’s also important to recognize that this is not a ‘problem’ that someone can simply fix — that this is not about a missing ‘if, then’ statement or a bug that needs quashing. Our efforts are an ongoing progression in our understanding of a news story and its evolution over time, and in using that output to properly serve our users at massive scale in near real-time — local, national, global.
The first step in addressing original reporting is to understand the changes in an evolving cluster of stories and the nature of those deltas — additional fact-based coverage? Analysis? Commentary? New media assets? And so on. What are the ‘signals’ that indicate an original report, or a component of original reporting? Does it have some or many quotes? Are the quotes unique or redundant with other stories? Is there further ‘factual’ information and/or relevant analysis? Are there attributions to other parties?
Some of these signals can generate opposing values. Is the attribution an acknowledgement or endorsement of another news source, or is it a protection against possible reporting errors by the third-party source? Is redundancy (‘22 dead’, ‘crash kills 22’) a lack of originality or pseudo-verification? One thing I’ve learned from my work at Google is that every signal that seems like gold can just as easily be fool’s gold. It’s very tricky stuff. A while back I asked one of journalism’s great editors how we might identify original reporting. He responded with the tiniest smirk, ‘It’s the stuff we spend a ton of money on!’ Of course that’s neither a detectable signal nor a guarantee of quality. In fact, even describing what original reporting is can be tricky. It means different things to different editors and newsrooms at different times. People say you know it when you see it, but how do you translate that to an algorithm?
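[To make the ‘signal’ idea concrete, here is a minimal sketch of one candidate signal, quote uniqueness within a story cluster. It is illustrative only; the heuristic, threshold, and function names are ours, not Google’s.]

```python
import re

# Rough heuristic: quoted spans of 20+ characters, straight or curly quotes.
QUOTE_PATTERN = re.compile(r'[“"]([^”"]{20,})[”"]')

def extract_quotes(text: str) -> set[str]:
    """Pull direct quotations out of an article's body text."""
    return {q.strip().lower() for q in QUOTE_PATTERN.findall(text)}

def quote_originality(article: str, cluster: list[str]) -> float:
    """Fraction of this article's quotes appearing in no other article
    in the same story cluster (1.0 means every quote is unique)."""
    own = extract_quotes(article)
    if not own:
        return 0.0
    seen_elsewhere: set[str] = set()
    for other in cluster:
        seen_elsewhere |= extract_quotes(other)
    return len(own - seen_elsewhere) / len(own)
```

[Even this toy signal shows the fool’s-gold problem described above: an article with no quotes at all scores zero whether it is verbatim wire copy or an exclusive built on paraphrased sourcing.]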
I go into this detail in an effort to close the gap in mutual understanding between the journalism community and those of us managing and evolving algorithmic systems, and to narrow the delta in how each of us defines success. An editor’s lens is about getting the great work of his or her team fully amplified. Ours is to provide the best and most helpful set of results to our users. Those objectives might be proximate but they are not fully overlapping.
It’s one thing to recognise original reporting, it’s another ‘to ensure it stays there longer’. How can Google algorithms ensure this in light of the endless flow of new stories?
In instances where a particularly seminal piece of original reporting is identified, we then need to determine how best to rank and present it. Context matters depending on how people access information. It’s very different in queryless feed environments (Google News, Discover, etc.) versus timely search queries (Top Stories), versus untimely search queries (organic results that are more long-standing). Notifications are a different surface as well. Each dictates different and complex variations of freshness and authoritativeness. How long the canonical story stays in a prominent position will vary with the time and progression of the story. Is the core subject matter of the story persistent, or is it drifting with the coverage of others? Is the story being attributed because its reporting is being endorsed or because it is being questioned?
For example, in Search (‘Top Stories’) the way we ensure visibility and discoverability for identified instances of original reporting is by explicitly countering the ‘freshness’ factor for these results (giving them more shelf life) and by explicitly increasing the visibility of their placement on the page (beyond what they would have normally received based on standard ranking considerations).
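[One way to picture ‘countering the freshness factor’ is as a decayed ranking score whose half-life is extended, and whose placement weight is boosted, when originality is detected. A minimal sketch; the constants, names, and functional form below are assumptions for illustration, not Google’s ranking code.]

```python
import math
from dataclasses import dataclass

@dataclass
class Story:
    base_relevance: float     # standard ranking score for the query
    age_hours: float          # time since publication
    is_original_report: bool  # flagged by an originality classifier

# Assumed constants, for illustration only.
STANDARD_HALF_LIFE = 6.0   # hours: ordinary coverage fades quickly
ORIGINAL_HALF_LIFE = 48.0  # hours: extended 'shelf life' for originals
ORIGINAL_BOOST = 1.5       # extra placement weight beyond normal ranking

def top_stories_score(story: Story) -> float:
    """Freshness-decayed score; original reporting decays more slowly
    and receives an explicit visibility boost."""
    half_life = ORIGINAL_HALF_LIFE if story.is_original_report else STANDARD_HALF_LIFE
    freshness = math.exp(-math.log(2) * story.age_hours / half_life)
    boost = ORIGINAL_BOOST if story.is_original_report else 1.0
    return story.base_relevance * freshness * boost
```

[Under these toy constants, an ordinary story loses half its freshness in six hours, while a flagged original report keeps most of its score for days.]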
As always, we will experiment and evaluate various approaches. But please note, while our recent announcement is another step forward we are not declaring mission accomplished. Given the constant evolution of the ecosystem, we always need to improve and evolve our systems over time — as we have been doing from the first days of Google Search.
In your blogpost, you point out that ‘there is no absolute definition of original reporting, nor is there an absolute standard for establishing how original a given article is.’ How will the new algorithm handle the ambiguity and what are Google’s ‘must-meet’ criteria for original reporting?
As we always do, we’ll explore what we think might be helpful signals, test them, and evaluate the results. We have had many discussions with editors and journalists to solicit their input and will continue to do so. We are very cautious about sharing information about specific signals, both to prevent gaming and because the system keeps evolving and improving. Beyond our own analysis, the effectiveness of our efforts will be judged by others, as it should be.
The guidelines for raters on how to recognise a reputable source highlight criteria such as ‘journalistic awards’ (in particular, the Pulitzer Prizes). Do you think Google’s new algorithm might inadvertently favor big, well-established outlets and thus exacerbate inequalities among publishers? How would you mitigate this risk?
Journalism awards are a criterion, but only one of many. They’ve been part of our rater guidelines for a long time and should not be interpreted as a definitive signal to our systems. The recent updates to the guidelines specifically add original reporting as another characteristic of quality journalism, independent of any award. Also, the raters are NOT rating stories in real time, nor are they rank-ordering them. I used the phrase ‘fair and equitable’. We believe passionately in the broadening of access that the Internet has enabled. We believe passionately in being fair and equitable to small publishers and large, to legacy publishers and digital natives, local as well as national. I welcome ongoing assessment of our work.
To emphasise what I noted earlier, this is not mission accomplished. Our systems will evolve and improve over time. The work that reporters and publishers do is of great importance and we are committed to helping our users get access to quality journalism in a way that is helpful to them — giving them a deeper understanding of a story or an issue to help them understand their communities and the conversations going on around them. My preferred definition of journalism is to give citizens the tools and information they need to be good citizens. Our objective is to connect users to the tools and information they need to be good citizens.
Google employs about 10,000 third-party humans called ‘search quality raters’, who will provide feedback on the new algorithm that will be used to improve it further. Can you tell us about the selection process and training of these raters? How do you minimise bias in rating?
First, let’s be very clear about what the search quality raters do and don’t do. Raters do not directly impact ranking. Raters are used to provide a general human assessment of whether our ranking systems are providing great results. And feedback from raters is also used in our machine learning systems as labels for example results.
In a rating task, raters assess how well results fulfill what someone was searching for and evaluate the quality of results based on the expertise, authoritativeness and trustworthiness of the content. Ratings are not used directly in our search systems to give particular pages or sites any type of “rating” or “score.” Instead, ratings help us understand how well our systems satisfy searches overall and as examples to learn from.
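[A common pattern consistent with this description is to use rater labels as an offline yardstick for ranking quality, for instance via NDCG, rather than as live per-page scores. A hypothetical sketch, not Google’s internal tooling:]

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain of a ranked list of rater labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(labels_in_system_order: list[float]) -> float:
    """Compare the system's ordering against the ideal ordering
    implied by the rater labels (1.0 means the system matched it)."""
    ideal = dcg(sorted(labels_in_system_order, reverse=True))
    return dcg(labels_in_system_order) / ideal if ideal else 0.0

# Example: raters scored the top four results 3, 1, 2, 0 on a 0-3 scale.
print(round(ndcg([3.0, 1.0, 2.0, 0.0]), 3))  # ~0.973
```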
As for the raters themselves, these are regular people who typically work from home. They are spread throughout the world and across nearly all US states (currently 48 of 50). They are recruited and employed by vendor companies we work with. We also use multiple vendors at the same time.
To be hired, raters must pass a vendor-administered test that shows they have a deep understanding of our 167-page rater guidelines. Vendors also regularly evaluate the raters to ensure they maintain an understanding of our guidelines and are working to them.
Our rater guidelines provide a common standard that all raters must apply. The guidelines are our definition of the goals of Search; they’re the product specification, if you will. And the rater guidelines are public for anyone to read, which provides transparency.
How does Google practice transparency around rating? Can news organisations know what their rating is and can they challenge it?
We practice transparency in three ways. First, we convey the policies and principles that guide our algorithmic work (the rater guidelines). Second, we explain our methodologies as thoroughly as practicable within the bounds of security and risks of manipulation. Third, our results are all there for people to evaluate: “we show our work every day.” And we also work with academic researchers in support of their analyses.
Google intentionally develops systems to eliminate the bias of any individual on our algorithmic results. No individual makes decisions as to who goes where. We create systems to protect from that. That’s why we have the rater program. That’s why we have our internal Honest Results Policy which prevents those of us who are involved in the work of Search and News from engaging with individual third parties on their ranking. We strive to maintain systems that are fair and equitable in presenting our users with the quality information they need.
If Google takes on the responsibility of rating news organisations, what will be the role of other initiatives, such as the Trust Project and the Journalism Trust Initiative? Why don’t you integrate press associations into the process?
Google’s signal collection and data modeling serves the purpose of powering Google Search results, not of being a way to inform users of journalistic characteristics of publications. That task is better handled by industry efforts.
Again, we use various methods and hundreds of signals in doing our work. I was a key participant in the creation of the Trust Project and believe strongly in its objective to drive more transparent attributes of the organization and its work. As I’ve discussed with the Trust Project, the Journalism Trust Initiative and others, the results of their efforts can be very helpful to the raters in growing a better understanding of the authors and the organizations behind them.
Will the change in the algorithm apply only to English language reporting or all languages? Would the articles highlighted by the search algorithm be automatically translated?
It’ll work for all languages. Depending on the user’s browsing tools they might be able to have the article automatically translated. Additionally, some publishers are embedding the option to translate articles in their own websites to make their content accessible to a wider audience outside their core language.
Facebook is also in the process of developing a rating system for quality sources (i.e. news organisations). Did you have discussions with other platforms in order to avoid developing different rating systems?
Google operates a search engine on the platform of the open web. We are not a proprietary social network. Our motivations and methods of operation are very different from a social network and we should be viewed and assessed differently as well.
Search engine optimization (SEO) is a way for publishers to engage new audiences, but their content risks not being featured if the publisher decides to set up some kind of paywall. How are you addressing this issue? What are some of the outcomes of your collaboration with publishers to understand the widely varying and changing subscription models?
A few years ago, we collaborated with several publishers to study the interaction between search users and publisher paywalls. As a result of that research, we ended our First Click Free system in favor of Flexible Sampling, wherein publishers decide how many sample articles they want to present to search users per month. So for the last few years, regardless of how much free content publishers provide to search users, all news publishers’ paywalled content is fully indexed in search.
We should also note that there are no ranking distinctions made between paywalled and non-paywalled content. While it is true that when publishers offer no free content at all, users may learn to avoid those results over time, we concluded from our research that it would be best to leave all decisions about free sampling to publishers themselves, rather than trying to create a one-size-fits-all solution for the industry. And certainly some publishers have appreciated the positive impact of this change.
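[Mechanically, indexing paywalled content relies on Google’s paywalled-content structured data: the publisher marks the article as not freely accessible and names the paywalled section, so indexed-but-gated text is not mistaken for cloaking. A minimal sketch generating that NewsArticle markup; the headline and CSS selector are placeholder values.]

```python
import json

def paywall_jsonld(headline: str, paywall_css_class: str) -> str:
    """Build schema.org NewsArticle markup flagging a paywalled section,
    per Google's paywalled-content structured data guidance."""
    markup = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "isAccessibleForFree": False,
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            "cssSelector": f".{paywall_css_class}",  # placeholder selector
        },
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(markup, indent=2)
            + "\n</script>")

print(paywall_jsonld("Example investigation", "paywall"))
```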
What are the next steps for the cooperation between Google and news organisations? Which other future initiatives can you mention?
Our primary objective is to do all we can to enable a strong, sustainable ecosystem for journalism. We are collaborating extensively with news organizations around the world and across every dimension of the journalistic model — from facilitating reader revenue with Subscribe with Google, to our experiments in new local news models, to the development of analytical research tools for reporters.
This was why we established the Google News Initiative — to do what we can to drive innovation across three key dimensions of the industry: elevating and strengthening quality journalism, evolving business models to drive sustainable growth and empowering news organizations through technological innovation.
One project I am really excited about is our Local Experiments Project, which involves partnering with news organizations to create digital sites and to experiment and innovate with everything from storytelling to business models and operational approaches. We recently announced a partnership with McClatchy, which is launching the first site in Youngstown, Ohio, where the local paper closed after 150 years. We have no input into the editorial operation, but we are interested in sharing what we learn from the business and operational side of the house. As part of this, we are also supporting the development of third-party “newsroom-in-a-box” platforms (such as Automattic’s Newspack). The Local Experiments Project aims to assemble the playbooks and platforms that can enable success in local news.
This is just one project of many the GNI is pursuing but key to the success of any of our efforts is the collaboration between ourselves and publishers. We have worked together for the last 15+ years and are committed to continuing these efforts long term.
Richard Gingras is Vice President of News at Google. In that role he guides Google’s strategy for how it surfaces news on Google Search, Google News, and its smart devices. He oversees Google’s efforts to enable a healthy, open ecosystem for quality journalism, including Accelerated Mobile Pages, Subscribe with Google, the Trust Project and other efforts to provide tools for journalists.
In March 2018, Gingras announced the Google News Initiative, a global effort including $300 million to elevate quality journalism, explore new models for sustainability, and provide technology to stimulate cost-efficiency in newsrooms.
Gingras is a member of the Knight Commission on Trust, Media, and Democracy.
Twitter handle — @richardgingras