End of year thoughts
Hi friends 👋
This is an effort to offer an end-of-year post based on my notes for a talk I gave in September at the amazing Hacks/Hackers Buenos Aires Media Party.
Mozilla was kind enough to send me to Argentina to talk with several hundred of the finest hacks and hackers in South America about our recent work at Meedan — specifically our work with Hacks/Hackers to create the Credibility Coalition.
Since this talk we have established the W3C Credibility Community Group as the gathering point for the technical work on the indicators developed through the Credibility Coalition. For those who were there, this remix bears only a glancing resemblance to the _actual_ talk.
I am going to talk a lot about context. So might as well begin with a small piece of my context. I grew up in Minnesota, but my grandmother spent most of her life in Ames, Iowa. She said grace before every meal, but before every grace she gave a more fundamental offering, the disclaimer. She would perform a self-assessment of various subtle failings we should expect in the food. For this graduate of the first home economics class at Columbia University, this ritual was as important as the grace. For her grandson, the religion of the disclaimer stuck in a way that Christianity did not, alas.
Before I get into the central disclaimer of my talk — that we must approach the work of developing third party credibility indicators for the web knowing that the truth is often impossibly complex — a mashup of socially mediated semantics run through the intentions and interpretations of authors and listeners, placed into contested and complex historical, cultural, and linguistic frames and evolving with each node jump between sharers, likers, and remixers — I want to offer a direct disclaimer to the audience.
This is the disclaimer every speaker should offer to her audience at the outset of a talk. It makes explicit the pure asymmetry between speaker and the spoken to and the inherent debt that is accrued by a speaker in the form of the time invested by the listener and the agency afforded the speaker; the implicit demand that you will follow my ideas down the path that I chose to walk, and the cultural expectation that you don’t leave your seat or start a conversation with your neighbor. Cultural norms aside, every speaker should reflect on the fact that listening represents an attention investment decision. Especially at the current moment in the history of attention. The opportunity cost of listening to me is measured in relief against watching the Rolling Stones at Glastonbury or the second episode of the first season of Chef’s Table or just Snapping VR rabbits to your friends.
So, I thank you deeply for inviting me to your beautiful country and investing some of your attention in my meandering thoughts.
Another big-time midwestern value Grandma Barnard taught me was saving and reusing. So, since I have invested so much of your time into this initial disclaimer, I am going to scrape the leftovers into a pan, add some cream of mushroom soup, and repurpose this disclaimer as a metaphor (or what we in Minnesota call a hotdish) for our current conundrum: our current information disorder represents a bad exchange of value between us and the Internet.
While you are on the hook for the next 20 minutes or so with me, with 3.9 billion internet users and an average of 2 hours per day (just on social media according to Statista), humanity is spending about 325 million person days on the social Internet every 24 hours (for scale, that is about 9 Great Pyramids per day). 2017 is one long, surreal argument that the Internet owes us, the audience, aka half the inhabitants of this warming and warring planet, a massive and growing debt.
If you consider Facebook engagement as a proxy for distribution and influence, and if you understand the power of political endorsements, and if you accept the research showing that we are less inclined to question the veracity of claims in social settings, then the societal implications of Craig Silverman’s reporting, distilled in the graph above, should be deeply troubling. Putting a sharper point on the argument: if you are in the voter-behaviors-should-be-grounded-in-valid-information camp, then the top performing / most engaged with / most influential of the top 20 election news stories from Craig’s piece should give you pause.
Game changer, indeed.
So, with this setting, the question is: where should we as journalists and technologists be spending our precious energies at a moment that so thoroughly challenges our beliefs in the functionality of the open web and the effectiveness of journalism alike?
I would assert that a productive starting point is to acknowledge that we have large scale infrastructure failure.
Working in open source collaborative verification and fact-checking technologies, it would be quite predictable and easy for me to assert that Meedan’s work is the answer to this question. Indeed, the typical and expected behavior of someone given the architectural and audio-visual advantages I currently enjoy would be to invoke my role as social technology spokesperson and pitch you on Check, the open source collaborative verification product we have been working on for five years. I would regale you with stories of how, working in Cairo from 2007 on open source translation and journalism tools, our experiences during the revolutions led us to begin work on media verification, building a tool that allows newsrooms or media collectives to collaboratively create a change log: a timestamped, multimedia citation composed of the links, notes, images, and videos they use to verify or debunk a link, claim, or source.
Yes, I might spend the next fifteen minutes showing you the crowdsourced Russian military vehicle geolocation work Bellingcat did using our tools. I might show you the power of the largest single-day collaborative journalism project ever undertaken, Electionland, which had 1,000 journalists and 300 media partners around the US monitoring the integrity of the US elections on election day, or the French election follow-up, CrossCheck, which tracked rumors in the lead-up to the vote. I might talk to you about First Draft, an organization we helped start, and the role it is playing in driving important research like the Fake News Handbook and innovating new workflows around these projects. Or about the Pop-Up Newsroom, an initiative we have started with Dig Deeper to bring some of the practices our team uses in daily software usability and design to developing one-off collaborative journalism events with partners like First Draft, AJ+, and Animal Politico.
But, as you may have guessed, I do not intend to do this.
However, I will trust that those of you who are working on media verification and fact-checking will attend my workshop tomorrow or stop by and talk at the product fair. And ffs, it is the open internet we are working on; those of you who don’t stop by will be savvy netizens, and if Check does a better job of meeting your workflow needs then you will come to us without my showing a lot of large screenshots of our app on the stage here today. And if we’re not meeting your use cases and you want to run your project on open source technology built by ethical and humble people from four continents working across 11 time zones, then please come to us with your use cases and workflows; let us know what you need to make your work succeed and we will make you the heroes on our user-driven roadmap for Check. That’s the end of the infomercial, and the end of the talk I will not give to you today.
Now back to the misinformation.
I’m here to tell you that the current casual apocalypse we inhabit will not be addressed by building a single tool, or by reinventing election monitoring, or by mobilizing a global resistance against all things inauthentic, hateful, and shallow, or even by taking down the personal data extraction industrial complex — though, by all means, let’s not stop working on all of these things.
Nope, we have to aim deeper in the stack.
Maybe we need to look at the web as it was first designed to understand where we went wrong.
This man, the late Doug Engelbart, was an early advisor to Meedan. I sat with him, and (in a hugely unbalanced exchange of value) listened to the person who envisioned hypertext (nod to Ted Nelson), video conferencing, and the mouse (no nod to Steve Jobs), describe the moment he gazed down a long brightly lit hallway and had the epiphany that the only work he would ever do was work in service of enabling human beings to more effectively collaborate. He imagined the hyperlink and the network would provide the path to a profound evolution in humanity.
During the early days of the blog-friendly internet there was a moment when it all felt deeply functional. Knowledge was networked through blogrolls, we built and controlled our own feeds with RSS, and FOAF gave us control over our social graph.
But it didn’t take long for the advertisers, spammers, and clickbaiters to devalue the hyperlink, and for a handful of platforms — as Hossein Derakhshan (@h0d3r) has very eloquently described — to remove, obfuscate, and redirect hyperlinks in the interest of keeping our attention bouncing off their garden walls like some epically dysfunctional Pong game of human attention.
The hyperlink asked for our trust, it said, ‘know that this underlined blue text has more to tell you.’ On the surface, the hyperlink is the path to deep meaning; it makes explicit the promise of the footnote, of grounding, deepening, contextualizing. In theory, anyway.
The problem is that this brilliant architecture for meaning making ran into the business model of the web. In its current form this model is fundamentally an attention real estate play: if you can convert attention on a 336x280-pixel banner ad into a click that serves a page occupied by 10 such banner ads, you have converted 1 into 10. Do this several billion times a day and you win the Internet. The promise of networked knowledge turned into the ultimate attention direction game, whose mathematics and incentives reflect the brilliant overlay of Ponzi and pyramid schemes. And we have always known how to gather attention — humans are wired to notice and investigate the incongruent; man bites dog gets more clicks than dog bites man. Fake news doesn’t just win because it is easier to produce, but because it has every implausible at its disposal. Yes, truth is sometimes stranger than fiction, but fiction has an unlimited set of possible stranges, and, unbound from service to a referent, invented news can be micro-targeted to a range of narratives.
The history of journalism can be understood as a continual, daily battle between reporting what is most newsworthy and reporting what is most likely to be read. In the previous era of journalism, when the daily product of work was bundled and sold as a single unit, the compromise was struck above the fold. There is still pressure to sensationalize the headline, but in a post-home-page world, every story is above the fold and so — you are not going to believe what this means next — every headline is attention-teasing and click-inducing.
From the earliest days of civilization we have had laws that prevent harm done through misrepresentation. It’s a pretty fundamental idea. The Code of Hammurabi recognized four thousand years ago that there were legal consequences to misrepresenting your ability to build a house. It is not contentious to assert that an Internet that enables false and dangerous information to influence real-world decisions and behaviors is doing real harm to society. But any effort to regulate what can be said, to filter or demote or otherwise ‘weed out’ the bad stuff, amounts to censorship. Our efforts ought instead to impact the fabric of the web. We don’t need a code, we need code.
Context is, almost always, the antidote to mis- and disinformation. We can in fact define misinformation as information for which supporting context is not available or is wrongly described, and/or for which refuting or invalidating context is available. Or, apologies, as information which is deployed into a setting where the supporting context does not pertain. Context is wide and deep. It can be human generated or machine generated. It changes over time through a dizzying blend of evolving and colliding social, political, cultural, and linguistic frames, with a complexity that renders climate science quaint. Like quantum observation, it is impacted by the gaze of the viewer — that is, if I make an assertion about the context of an event, that assertion inherits the credentials I bring to the event (credentials that might be totally irrelevant to a spelling correction and hyper-relevant to my feedback on a blog post titled ‘Everything you need to know about Ed Bice’).
A community of people (full disclosure: most of these people are our friends) have been working to add third party context to claims, images, and sources in journalism for many years now. The global community of online fact-checkers and open source investigators (IFCN, Snopes, Politifact, Chequeado, Storyful, Full Fact, Bellingcat, etc.) has been refining third party verification and fact-checking workflows. The challenge is the very simple fact that third party contributions are only successfully woven into the information fabric of a webpage in the rare case of a willing and responsive author or publisher who edits an article based on corrections shared through comments or outreach to a journalist, editor, or author. In other words, we know how to address dis- and misinformation; we just need a better way to scale the impact of these efforts.
Standardizing the means of creating and signaling the context that supports or refutes links, claims, and sources on the web might enable us to address these issues at a structural level. So, here is what I have crossed hemispheres to talk to you about, the Knight Foundation Prototype Fund has seeded a Meedan and Hacks/Hackers effort called the Credibility Coalition to develop, test, refine, and standardize third party credibility signals.
Before I describe the work this talented, diverse, and creative group has begun to develop these standards, I want to go off on a bit of a tangent. Apologies in advance, but nothing truly great is going to come from a geeky project unless you — the hacking hackers of the world — buy into the set of values that underlies the project. So, I am going to share a bit with you about what I think about ‘the truth’ and what role I think the platforms should be playing in a better version of the web.
Let me address what many of you might be thinking right now: if there is anything worse than a web of AI-generated, bot-distributed clickbait, it is a sanitized and censored Web of Truth.
I totally agree — I hate the idea of a single rating that cascades through the machinery of the web and cleans out every non-compliant opinion. In fact, we decided to take on this project precisely because we have spent the last fifteen years working on tools that amplify and champion contextuality. We even built a platform to curate, translate, annotate, and display side by side, in two languages, event-aligned but usually quite divergent narratives from Arabic and English language news sources and commentary. Of course, the information revealed in the artifacts of an event — the images, documents, and videos — will ground a somewhat common agreement on the who, when, and where. But the fact is that even when the details of an event are well documented, important aspects of the whys and hows will always be deeply contextual, and so will vary with contexts defined by language, location, culture, and ideology; to dismiss some of these as less truthful is, often, to deny the richness of humanity and the wide, deep quality of truth. Which is to say, we have drunk deeply from the well of contextuality, and we bring this to the work of helping to wire the internet for credibility. Our goal is to create a standard that will allow many teams, journalists, or fact-checkers to ground and support divergent, disagreeing assessments of articles, images, sources, and claims. The standards we are working on recognize the wide range of third party signals — from free-text human-generated assertions to AI-generated assessments of publishing patterns — and the need to accept multiple credibility annotations on the same content.
Let me also quickly address the rational assessment that this project is way too ambitious.
First, yes it is, and… we need to be too ambitious. This, as my friend Claire Wardle states, is an information war.
Second, I think there are four factors contributing to a perfect storm (I flew into the remnants of Irma yesterday, so bring on the climate change metaphors) that might make this the right moment to put a project like ours in motion:
1. The web annotation standard was approved earlier this year. Our effort fundamentally leverages the scaffolding of that standard, designing a vocabulary of structured credibility indicators on top of it; the Hypothesis team members who helped design and implement the standard are core contributors to our effort.
2. The W3C has a Working Group on Verifiable Claims. If, or when, they succeed, you will be able to deliver cryptographically verifiable identity and credentials via a web standard. (Personal identity and/or credentials are critical contextual data for many indicators, e.g., who is disputing the claim, and do they have a certifiable PhD and 30 years of teaching experience in the field?)
3. The platforms are embracing third party flagging, dispute, and related-links mechanisms — schema.org ClaimReview and Facebook dispute flags, to name two — and are motivated to figure out how to scale these efforts. And, most importantly,
4. Many brilliant, highly motivated journos, researchers, and computer scientists are drawn to the difficulty, complexity, and social benefit of this challenge. E.g., Che(ck) Guevara ;)
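To make the first and third factors concrete, here is a minimal sketch (all URLs, names, and ratings are hypothetical, and this is not the Coalition's actual vocabulary) of how a third party credibility signal might be expressed by wrapping a schema.org ClaimReview body inside a W3C Web Annotation:

```python
# A hypothetical credibility annotation: the W3C Web Annotation model
# supplies the "who annotated what" scaffolding, and a schema.org
# ClaimReview object carries the structured assessment itself.
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    # The target is the article being assessed (hypothetical URL).
    "target": "https://example.com/news/story-123",
    # The body is the credibility signal, in ClaimReview vocabulary.
    "body": {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "claimReviewed": "Example claim quoted from the article",
        "author": {"@type": "Organization", "name": "Example Fact-Checkers"},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": 1,
            "bestRating": 5,
            "alternateName": "Mostly false",
        },
    },
}

# Serializing to JSON-LD is what makes the signal machine-discoverable;
# round-tripping here simply confirms the structure is valid JSON.
serialized = json.dumps(annotation, indent=2)
parsed = json.loads(serialized)
print(parsed["body"]["@type"])  # prints: ClaimReview
```

Because both vocabularies are open standards, any platform or fact-checker can emit or consume a record like this without coordinating with the others in advance.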
Which is a nice way of bringing it back to the working group.
The rough recipe for what we are trying to do is to:
- Convene a diverse set of stakeholders. We want to work with the platforms on this effort because they are uniquely well placed to link these indicators to content and to make use of the open data we hope will be available. We will work closely with standards bodies (two members of the IPTC, one W3C member, and schema.org are in the working group) and explore the potential to bring our work into the W3C;
- Run an open process with that group over the next six months to develop a data model and working schema;
- Generate data and publish research on article markup and UX testing;
- Take lessons from open source software development and our own experiences with translation to enable multiple competing ‘writes’ to the system.
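The last step of that recipe, multiple competing 'writes', can be sketched roughly as an append-only log of annotations, so that disagreeing assessments of the same article coexist rather than overwrite each other. Every class, field, and value below is illustrative only, not the working group's actual data model:

```python
# A hypothetical append-only store for credibility annotations:
# new writes never replace earlier ones, so competing assessments
# of the same target are preserved side by side.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CredibilityAnnotation:
    target_url: str   # the article, image, or claim being annotated
    indicator: str    # e.g. "headline-accuracy" (illustrative name)
    value: str        # the annotator's assessment
    annotator: str    # identity/credential of the contributor
    note: str = ""    # free-text supporting context


class AnnotationStore:
    """Append-only: reads return every annotation ever written."""

    def __init__(self) -> None:
        self._log: Dict[str, List[CredibilityAnnotation]] = {}

    def write(self, ann: CredibilityAnnotation) -> None:
        self._log.setdefault(ann.target_url, []).append(ann)

    def read(self, target_url: str) -> List[CredibilityAnnotation]:
        return list(self._log.get(target_url, []))


store = AnnotationStore()
url = "https://example.com/news/story-123"
# Two annotators disagree; both assessments survive.
store.write(CredibilityAnnotation(url, "headline-accuracy", "misleading", "team-a"))
store.write(CredibilityAnnotation(url, "headline-accuracy", "accurate", "team-b"))
print(len(store.read(url)))  # prints: 2
```

The design choice mirrors version control and translation memory: disagreement is data, and downstream consumers decide how to weigh each annotator's credentials rather than the store picking a winner.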
The Credibility Coalition has come together to begin the process of compiling a set of these credibility indicators in a way that enables the development of an initial training data set and preserves teams’ abilities to collaborate in the future. Specifically, we hope to facilitate these efforts by enabling cooperation in (1) credibility evaluation (2) testing evaluations and (3) sharing data.
If there are standards geeks among you, or fact-checkers who want to be early adopters to standards, please speak with me over the next few days.
[these random thoughts were not included in my talk but they were in my meandering notes. think: bonus track]
I am going to offer some concluding remarks about agency and context — some speculative ideas for how I imagine the web we want.
I began this talk speaking from my own context and I’d like to conclude with a return to it. What is my context? What of it can I structure? What is my context worth, and who should own that value? And why does this matter? Well, my context right now can be described by a geo-coordinate, maybe even one that reflects that I am uniquely oriented to the south, that I am a few feet further above sea level than the rest of you, and that we are sharing the bounding box described by this beautiful room across a common time range. Then there are my credentials, which are not implied in this setting but are relevant to your interpretation of my talk — that I was once lucky enough to attend a liberal arts university, and once foolish enough to take a degree in philosophy, and once crazy enough to design and build houses out of straw bales, and once crushed then reborn under the weight of my eldest child’s neurological disorder, the logistics of which deposited my family in the Bay Area on September 11, 2001 and strangely led me to sixteen years of work on technology for peace and context. This is my context, and I would prefer a web where my context was an asset I might deploy rather than a value extracted by the services I use and sold to those who would like to gain, and perhaps shape, my attention.
These are all potentially actionable credentials that I might invoke to add value to an assertion I am making or a dispute I am raising. Specifically, as the founder and longtime CEO of Meedan, I am the world’s authority on certain aspects of Meedan. I can counsel any parent in the world on sudden-onset Lennox-Gastaut Syndrome, and I know more than I should about quixotic building techniques. I can, and should be able to, speak and annotate authoritatively on these issues, and the web would be a better place if my annotations carried the weight of these credentials in a manner that was machine and human discoverable.
The value embedded in these obvious observations is that we should have better ways to surface the signal of, say, a 20-year climate scientist’s annotation on a misinformed climate science article.
To me this value is fundamentally about agency on the internet.
This graph shows my imagining of the evolution of the distribution of value in an open system (assuming that the long arc of the internet bends toward openness). Contra the implied morality in the graph, I think Google and Facebook will be very large, profitable, and important companies 100 years from now. In the near term, they must play as outsized a role in the solution as they have played in the creation of our current dilemma.
I also believe that in due time these companies will shift their business models from extracting revenues from our attention, that is to say from our personal digital DNA, the paths of our attentions reflected in a range of digital glances, purchases, creations, and explorations — to serving the deeper value of preserving, indexing, and building AIs we can deploy over our personal, encrypted, they-don’t-get-to-have-it-or-sell-it data — to ourselves extract data, meaning, and value from our digital DNA — and to, if we choose, sell, rent, or donate that data as we would our own blood. In my opinion the platforms will need to move from extractive industries to storage, interface, and services industries.
This optimistic vision for the web is, I believe, inevitable because the value of the web is created not by two very large companies and a few dozen ad tech firms and some content farms and some Macedonians and a hundred actual reputable media companies. It is created, rather, by several billion unpaid and unwitting likers, hearters, curators, annotators, creators, and distributors. And the work of these billions should not be purely in service of ad tech companies that attempt to manufacture needs we don’t have. Following the work of Doc Searls, Kaliya Young, and many others, we need to invert the tables and imagine a web where our data, content, and annotations are owned, credentialed, and deployed by us and by trusted agents we permission to provide credentials or services on our behalf.
What will enable this next web is not another company, nor another tool; rather, it will come from us changing the broken infrastructure of the web. We think this is going to happen through wiring the web for context. The question is whether we can, with Engelbartian ambition, tweak the very infrastructure of the web to enable us — the collective us — to create our own layer of structured machine and human annotations on top of the web, to improve our ability to show and display context, or to openly question the lack thereof. These values are fundamentally openness, transparency, and agency, manifest as a read/write web on which we are able to control and deploy our identity, attention, data, and content. They inform the work we do at Meedan, and they are the primary asset we bring to our work on the Credibility Coalition.
[great bot war of 2020]
Oh, yes, I almost forgot: the Great Bot War of 2020. If we get this work right, bots, which are and will continue to be a great and important part of the web, will not be credentialed to annotate, share, and like with the same context as a human being (yes, this is contentious), and so the Great Bot War of 2020, wherein the last remnants of journalism on the web are quashed, will be avoided.
Thanks for your time — let’s create the next, better version of the open web.