Aggregating Signals of Quality in News

Announcing a new effort to help platforms and advertisers support the best in journalism

By Jeff Jarvis, Tow-Knight Center at CUNY’s Craig Newmark Graduate School of Journalism, & Marc Goldberg and Jesse Kranzler, Trust Metrics

We are undertaking a project to aggregate signals of quality about news sites from many sources. The goal is to make those signals more useful and to help platforms, ad networks, ad agencies, and others to make better, more informed decisions about ranking and ad purchases — which we hope will help drive both promotion and financial support to quality news and away from disinformation, misinformation, and junk.

The problems: With so much attention being paid to trying to limit the spread of disinformation, not enough attention and resources are being devoted to supporting quality in news. Platforms and advertisers can bring attention and revenue to quality news, but they need help deciding what quality news is. There are many good and independent efforts to create signals of quality, but we have heard from technology and ad companies that it is difficult for them to make use of data from so many sources. We saw a need to map the work of these independent providers and aggregate the signals they generate so that technology and ad companies can make better use of them. That is what we are doing. We are not creating a whitelist or a one-size-fits-all quality score but instead are trying to help companies make better use of the signals that exist so they can make better judgments themselves.

Here is an explanation of what we’re doing and why we’re doing it so we can get feedback from the ecosystems involved: social media, search, and other platforms; advertising networks, agencies, and brands; and news and media.

This project is being managed at the Tow-Knight Center for Entrepreneurial Journalism at CUNY’s Craig Newmark Graduate School of Journalism. The work has received funding from the Facebook Journalism Project and will seek funding from others. The work is assured independence and is not in the service of Facebook or any single company. The fruits of our efforts will be provided to platforms, ad companies, and others that can make use of it.

Again, we are not compiling a white/black list of news sources but are instead providing a wide range of signals of quality — or a lack thereof — so that the users of these signals can combine them with their own data to make more informed decisions based on their needs. We are also not starting yet another trust/quality/credibility project to compete with the many good efforts that already exist. On the contrary, we saw the need to aggregate the signals from all those efforts, making them more impactful by putting all this data in a form that is easier for technology and ad companies to ingest and use. The more these signals are used by platforms and advertisers, the more benefit can come to news organizations (through audience and revenue), and the more news organizations are motivated to subscribe to standards of quality being formulated by such efforts as the Trust Project and the Journalism Trust Initiative from Reporters Without Borders. That is the virtuous circle we hope to help enable.

Our process began at a meeting in San Francisco in March, convened by the World Economic Forum and underwritten by Craig Newmark, which brought together many of the independent efforts to rate quality in news — including the Trust Project, Reporters Without Borders, Internews, the Credibility Coalition, Storyful, Open Brand Safety, the W3C, and others — with representatives from the major technology companies, the ad industry, academia, and government to take stock of the overall effort to combat disinformation and support quality, and to determine what is needed.

The meeting identified five ongoing needs, summarized here:

  • Work together on a common data model to facilitate sharing of signals.
  • Continue convening the coalition of initiatives working on the problem of trust and credibility in news.
  • Encourage platforms and others to share data with researchers to inform this work.
  • Support networks to share alerts about targeted campaigns of disinformation.
  • Investigate opportunities to use blockchain to enable this work.

Our project will continue work especially on the first two items: bringing together the various initiatives so we can enable effective sharing of data about quality and credibility.

To do much of the work of aggregating signals, we are hiring Trust Metrics, a well-established company that already rates more than a million sites and apps against thousands of features for advertisers. Trust Metrics has the means and mechanisms — human and algorithmic — to deal with this task of rating and aggregating data about thousands of news sites. Importantly, Trust Metrics also deals not just at the low end of the quality spectrum — warning advertisers away from bad sites — but also the high end of that spectrum, driving ad dollars there. We will also be getting help managing the project from the Credibility Coalition.

It is, of course, important for platforms to recognize, prevent, and compensate for manipulation by bad actors, withdrawing support from such content and stemming its spread. When it comes to such content, in our opinion, neutrality is no longer an option and every major player in these ecosystems is forced to make judgments about sources, because some small but impactful number of those sources is attempting to manipulate technology and ad companies and ultimately the public conversation.

It is vital that we in journalism help platforms and ad companies identify and support quality news and media. Much attention and many resources are being devoted — and thousands of people are being hired — to find, deal with, compensate for, and if all goes well prevent the products of the low end of the quality spectrum: disinformation, manipulation, lies, bigotry, hate. These are difficult tasks that get more difficult when spread across diverse cultures from India to Sri Lanka, Brazil to the Philippines. That investment will only increase for the foreseeable future. But as a society, we should also take a step back and examine our allocation of resources today. Just as thousands are being hired to grapple with the low end of the quality spectrum, the population of journalists continues to shrink. More resources and more support must be given to quality news. That is why, in our first phase, we are directing more of our attention to the higher end of the spectrum.

To start, we have identified these categories of signals, a list we are sure will evolve as we discuss it with those who will use this data and others:

  • Observed and algorithmic: As a backbone of this project, we will have access to relevant Trust Metrics signals the company already gathers with human raters aided by algorithmic prediction and tracking. These will be combined with signals from similar work by other entities. See examples of these criteria in the latter portion of this paper.
  • Self-reported: The Trust Project worked with scores of news organizations — including The Washington Post, the Globe and Mail, La Repubblica, and Sky News — to determine eight indicators of journalistic quality and set standards these organizations will follow. Reporters Without Borders is also endeavoring to set standards for news organizations with its Journalism Trust Initiative. These self-reported signals will require some auditing of compliance against the standards. We hope organizations such as NewsGuard may be able to help provide some of that verification in the course of their efforts to rank sites, and there are other proposals afoot to provide auditing.
  • Endorsement: If a news organization has passed muster to join an industry or professional body with entrance standards, that can provide a signal of quality. Similarly, winning certain prizes is a positive signal.
  • Public opinion: Public opinion — like TV ratings and newsstand circulation — has limited value and indeed can lead to pandering to a market. But in a larger set of signals such as we are gathering, the public’s opinion must have a place. In this regard, we admire the work done by the Reuters Institute for the Study of Journalism in its annual report, which surveys the public across political viewpoints regarding their trust of specific news organizations.
  • Diversity: The greatest danger, particularly with endorsements and public opinion, is that diverse, specialized, and new sources of news will be disadvantaged in favor of large, well-known, legacy, and mainstream sources. We must seek out ways to prevent this. We plan to gather data about the particular and diverse communities various sites serve (e.g., in New York, that the Amsterdam News serves African-American communities and The Forward serves Jewish communities). Trust Metrics has a methodology of viewing sites against various “lenses,” which can include communities of various definitions. We will also seek out means for new sources to be brought to our attention for review. Diversity flags will include capturing information about sites that serve specific ethnicities, religions, localities, political viewpoints (across a wide spectrum, describing both a site and its audience), sexual orientations, social groups, demographics, interests, and professions.
  • Behavioral: If a site consistently fails fact-checks — in spite of other measures — that is certainly a valuable signal that should be accounted for. We want to work with the many good fact-check efforts underway to gather data about their findings at a site level. The ClaimReview process and schema adopted by fact checkers around the world is a model for collaboration.
  • Alerts of disinformation campaigns: Similarly, if a site regularly spreads disinformation, conspiracy theories, and incitement that has been identified not only by fact-checkers but by initiatives like Data & Society’s research into manipulation campaigns, that should be accounted for. As we all know, some sites bury such disinformation amid a larger volume of apparently legitimate content.
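
The ClaimReview markup mentioned above gives fact-checkers a shared, machine-readable format for publishing their verdicts, which is what makes site-level aggregation feasible. Below is a minimal sketch of what such a record looks like and how an aggregator might extract the reviewed publication from it; the URLs, claim text, and rating are invented for illustration, while the field names follow the schema.org ClaimReview type:

```python
from urllib.parse import urlparse

# Hypothetical ClaimReview record as a fact-checker might publish it in
# JSON-LD. All URLs, names, and the claim itself are invented examples.
claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "datePublished": "2018-08-01",
    "url": "https://factchecker.example/reviews/12345",
    "claimReviewed": "Example claim made by a publication",
    "itemReviewed": {
        "@type": "Claim",
        "appearance": {
            "@type": "CreativeWork",
            "url": "https://news-site.example/story",
        },
    },
    "author": {"@type": "Organization", "name": "Example Fact Checker"},
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": 1,
        "bestRating": 5,
        "worstRating": 1,
        "alternateName": "False",
    },
}

def site_of_review(review: dict) -> str:
    """Extract the domain of the publication whose claim was checked."""
    return urlparse(review["itemReviewed"]["appearance"]["url"]).netloc

print(site_of_review(claim_review))  # news-site.example
```

An aggregator working at the publication level, as we propose, would group many such records by that extracted domain.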

So we come to the Infowars test. As we said, we will not be ranking sites or putting them in white or black lists. But to be sure, if Infowars — with its aggressive disregard for facts and its chronic spread of incitement, hate, and disinformation — looks as if it could pass for quality in our aggregation of signals about news and journalism, then we will have failed. The public discussion about Infowars was a process of society negotiating its standards, norms, and limits, and a broad consensus formed around the conclusion that the site’s behavior was unacceptable. Much discussion is needed to set the rules around a decision like that and to identify the signals that can flag sites with similar characteristics and behavior. We plan to convene discussions around these questions.

Our first task will be to analyze the methodology, standards, and output of the many initiatives that are generating signals and standards of quality in news. Then we will look at the signals Trust Metrics already uses and should use in this project. Finally, we can see what is missing and look at how those new signals can be created.

That work begins with the American market — including, according to our schedule, Spanish-language media. In the second year, if our work proves to be useful, we will expand to other languages and countries. Here we will need to seek advice from organizations that have experience in these markets.

For perspective, this project tackles only a very small slice of a much bigger problem around credibility, trust, and quality that is being addressed in many ways by journalists, academics, technology companies, publishers, government, civil society, and citizens themselves.

The problem is hardly new. The Nature of the Book, Adrian Johns’ book about the early days of printing and movable type, recounts many difficulties for credibility and trust caused by practices of piracy, inaccuracy, forgery, and impersonation. “The arena of public opinion, if it were to operate successfully, required printed materials to be reliable,” Johns writes. “Such reliability depended on a stable régime of printing and bookselling…. In this circularity originated some of the most profound problems of the public sphere in practice, as its participants wrestled with the lack of any firm foundation for credit, authorship, and knowledge.” That sounds familiar today as many sectors — technology companies, advertising companies, and publishers — grapple with how to provide signals of authority and authenticity to the public and now also to the platforms and algorithms that serve them. Here we are undertaking a small bit of that work, providing information that platforms and advertisers can use.

That is a brief overview of what we plan to do and why. Now here are more details on how we will undertake these tasks.


BACKGROUND AND METHODOLOGY

Why Trust Metrics

Trust Metrics was founded on the principles of print publishing and the idea that high-quality publishers were best suited to host the highest-quality ads. Beginning about a decade ago, programmatic buying was threatening to devalue the hard work that many legacy publishers put into their editorial, so Trust Metrics sought to reward good content online by informing brand advertisers where to find it and defining exactly what made it superior. Trust Metrics is now bringing its skills and data to rating news sites against a set of quality metrics.

Trust Metrics achieves this work through crawler-based technology. Crawlers visit the sites the company scores and pull the source code from around 100 pages per site. Machine-learning algorithms evaluate the freshness of the content, page design, depth and breadth of editorial, keywords, and hundreds of other unique features to differentiate sites along 14 model outputs measuring safety, UGC, language, context, and overall quality. Each machine-generated output is validated by a human QA team whose scores feed back into the algorithms as training data. These ratings provide much-needed insight for advertisers who don’t have the resources to evaluate and monitor tens of thousands of domains on their own.

Over time, the issues that have plagued digital advertising have varied. The initial concerns all revolved around brand safety: early on that meant pornography, violence, profanity, and other unsavory content; later it came to include so-called “fake news.” Through all these challenges, the Trust Metrics quality score and its basis in traditional advertising principles has remained an efficient solution. The bad guys have yet to figure out how to fake quality, safe content.

Now Trust Metrics is working with Tow-Knight on analyzing and collecting quality signals. Acting as an aggregator for all third-party signals and presenting that information in an easily digestible form will allow platforms and other end-users to make informed decisions about the content they allow, promote, and help support.

We plan to collaborate with a number of other organizations as well, including at the start the Credibility Coalition, W3C, the Trust Project, Reporters Without Borders, and more.

Process

To gather observed signals (see above), Trust Metrics will crawl each domain across a sample of pages to collect features that feed into machine-learning algorithms to determine quality and safety metrics. Relevant features will be analyzed and outputs will be generated regarding overall site quality, ad environment, safety (hate speech, violence, profanity, pornography, and an overall safety measurement), user-generated content, and context (site category and a number of news- and politics-specific subcategories). Each output will be verified and validated by a human QA team and fed back into the system as additional training data. Future crawls of the same domain will look for contradictory model outputs for further review by the QA team.

As described above, Trust Metrics will be aggregating signals from a growing number of third parties around site traffic, fact-checking, public opinion, awards, press association memberships, and so on. These third-party signals will also serve as feedback to Trust Metrics’ algorithms to help determine how other signals should be weighted.

Some good sites will contain negative signals and vice versa. This is why it is crucial to understand the context in which each signal is generated and ultimately judged. Take profanity as an example: evaluating it seems easy, but there are nuanced questions:

  • Does the profanity occur in the editorial content or user-generated content? If it is the latter, profanity would be more of an indication of poor UGC moderation rather than an indicator of site profanity.
  • Do certain words need to carry more weight than other words?
  • A site geared towards an audience of younger adults might have a higher tolerance for what others might regard as profanity given its target demographic (e.g. Vice), but the same density of certain language might not be acceptable on a site geared towards a different audience.
  • Profanity occurring in a quote or in the context of objective news is not an issue.
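
The questions above can be sketched as a context-sensitive weighting function. Everything here — the terms, severities, and multipliers — is an invented stand-in meant only to show how quote context, UGC context, and audience tolerance might modulate a single occurrence:

```python
# Hypothetical per-term severities; a real lexicon would be far larger
# and language-specific.
TERM_SEVERITY = {"mild_term": 0.2, "strong_term": 1.0}

def profanity_signal(term: str, *, in_ugc: bool, in_quote: bool,
                     audience_tolerance: float) -> float:
    """Weighted profanity contribution for one occurrence of a term.

    audience_tolerance is 0.0-1.0: higher for sites (e.g. aimed at younger
    adults) where the same language is within audience expectations.
    """
    severity = TERM_SEVERITY.get(term, 0.0)
    if in_quote:
        return 0.0        # quoted or objective-news usage is not penalized
    if in_ugc:
        severity *= 0.3   # points at UGC moderation, not editorial profanity
    return severity * (1.0 - audience_tolerance)
```

The same occurrence thus contributes nothing inside a quote, a discounted amount inside comments, and its full weight in editorial copy aimed at a low-tolerance audience.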

Conversely, just because a site contains potentially positive signals does not mean the site is acceptable. Indeed, some disinformation sites pack pages with innocuous content to hide misinformation buried within. There are certain features of a site that can make it a non-starter. These include (but are not limited to) the promotion and proliferation of conspiracy theories, overwhelming failure on fact-checks, hate speech, harassment, or bullying. So, while a site like Infowars may appear to be “higher quality” than a no-name amateur blog due to its professional layout, frequent updates, staffing, and consistent editorial tone, the cons will outweigh the pros.

This is just a snapshot of the issues we will face in this project when making sense of observed and third-party signals and presenting them in a digestible, usable format. For this exercise, we’ll look at one example signal for each of the seven signal categories.

Observed and algorithmic

This group of signals will be both machine-generated and human-verified as observed by Trust Metrics. These signals will focus largely on site quality from a traditional publishing standpoint, analyzing the depth and breadth of content, user experience, and content relevancy in addition to various safety and contextual concerns. Looking at a site’s frequency of updates as compared to other sites in the category and the internet as a whole is often a good indicator of a site’s hygiene. When a site is updated sporadically, it can often indicate that the site isn’t frequently maintained or run by a professional editorial staff. Particularly as it relates to news, where content is expected to be posted as stories break, posting infrequently decreases the site’s relevance for an audience looking for breaking news. These thresholds differ based on the type of news, since we’d expect many more updates from a major-market news site than from a small rural town’s paper, and we wouldn’t expect long-form informational analysis like Foreign Affairs (foreignaffairs.com) to post new content every minute.

Self-reported

A number of organizations, including the Trust Project, Reporters Without Borders, and NewsGuard, have established self-reported criteria to determine journalistic quality. Using the Trust Project as an example, sites subscribe to eight Trust Indicators, including proper citations, author expertise, and publication funding, among others, to determine whether or not the site is deserving of its Trust Mark. Note that these self-reported indicators will need to be audited for compliance (which leads to a later and much larger discussion about the market value of auditing both media sites and signal providers). We hope to encourage more and more sites to join efforts such as the Trust Project. We should also note that some sites may not join, and that in itself is not necessarily a negative signal.

Endorsement

A publication being well-respected among its peers is a good indicator of quality. Journalistic awards including the Pulitzer Prize, the Edward R. Murrow Award, and the Peabody Award help separate the wheat from the chaff and reward quality publications. Membership in a news association with its own standards can also be a positive signal. As with the self-reported signals, it’s important to note that, while a publication receiving such a prestigious award is a positive indicator, the lack of journalistic awards is not a negative indicator.

Public opinion

As mentioned above, we admire the work that the Reuters Institute for the Study of Journalism has done around public opinion. We are particularly interested in the notion of public opinions of “trustworthiness” and how it relates to audiences with various political standpoints. The Reuters Institute samples groups based on political leaning and by country, which could be helpful as we expand our work internationally.

Diversity

As we pointed out with the public opinion signals, many of the signals we are analyzing can be modified by diversity signals. In the case of religious or ethnic diversity signals, identifying that a publication is intended for an African-American audience might impact the weighting of particular signals around the presence of what in other places would be racially-charged keywords. Similarly, there are different expectations for different markets. For example, small local news sites often have higher ad density than more professional or major market news sites. It’s important in this case to compare apples to apples and understand how the site’s ad clutter stacks up relative to other sites serving similar markets rather than strictly comparing it on an absolute level.

Behavioral

We hope to work with a variety of fact-checkers, including PolitiFact, Poynter, and Duke, to gather behavioral signals for this project. Using the ClaimReview schema, fact-checkers work at the level of the fact, quote, and story while we work at the level of the publication. We envision aggregating all of the fact-checks performed at the story level into a larger site-level bucket and creating an overall percentage of passed fact-checks for each publication. While each publication is bound to make mistakes, having a high percentage of passed fact-checks will be an important signal in evaluating quality.
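
The story-to-site roll-up described above can be sketched in a few lines. The input shape is an assumption of ours: one record per checked claim, carrying the publication’s domain and whether the claim survived the check:

```python
from collections import defaultdict

def site_pass_rates(fact_checks: list) -> dict:
    """Aggregate per-claim verdicts into a passed-checks fraction per site."""
    totals = defaultdict(lambda: [0, 0])  # site -> [passed, checked]
    for fc in fact_checks:
        totals[fc["site"]][1] += 1
        if fc["passed"]:
            totals[fc["site"]][0] += 1
    return {site: passed / checked
            for site, (passed, checked) in totals.items()}

# Invented example input: three checked claims across two sites.
checks = [
    {"site": "news-a.example", "passed": True},
    {"site": "news-a.example", "passed": False},
    {"site": "news-b.example", "passed": True},
]
print(site_pass_rates(checks))  # {'news-a.example': 0.5, 'news-b.example': 1.0}
```

A real implementation would also weight by recency and by the number of checks, since a 100% pass rate over two checks says far less than 95% over two hundred.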

Alerts of disinformation campaigns

Certain signals across this project will be treated as non-starters: signals that automatically mark a site as poor quality regardless of its other attributes. These will include, but not be limited to, knowingly spreading disinformation, promoting conspiracy theories, and creating false news. These signals will come not only from fact-checkers but also from the anti-manipulation work of Data & Society and First Draft. We can also trace known disinformation through various social channels and flag publications that share the same stories.

Other thoughts

Many of these signals bring caveats or exceptions. This is why it is important to incorporate a large breadth of signals to weed out signals that might be misleading for a particular site. Human review will help us identify which combinations of signals are meaningful and which signals can be ignored on a case by case basis. Below are a few examples of signals that can have a variable impact on an overall site impression depending on context.

  • Recent updating of a site is generally a good indicator of quality. However, it’s important to note that time stamps can be faked, so that signal can’t hold too much weight on its own.
  • On their own, the presence of lots of content recommendation ads and sponsored links is not a negative feature. However, when these links appear all over a low-quality site, this signal is more meaningful.
  • Having a “Contact Us” or “About Us” page is often a sign of professional journalism. However, we see many examples with no content on these pages, or with only standard lorem ipsum copied and pasted from other domains.

Conclusion

We are just beginning this work. We need input from the ecosystem — social media, search, advertising, media companies as well as government, academics, and NGOs — to define needs and solutions. Please begin by commenting here.