21,000 Miles of Translating Social Media

ed bice
Published in Words About Words
10 min read · Jul 1, 2015

On Translating Digital Stories for Out of Eden Walk, Paul Salopek’s Slow Journey Around the World

The International Multilingual User Group (IMUG) of Silicon Valley is a venerable community that has been gathering to discuss and share various GILTy interests since 1987. Joe Katz and Iris Oriss invited me to come to Facebook to talk about the social media translation work we are doing now with the National Geographic-sponsored Out of Eden Walk.

This post is an effort to put into writing some of that talk.

Facebook 6.18.15 — Son of Ping and Pong room Building 15 IMUG talk

Photo courtesy @I18Guy

Greetings, and thanks everyone for braving the traffic and making it out tonight.

Before I discuss Paul Salopek’s incredible journey I want to map the journey of this talk. An undergraduate degree in philosophy is a license to lose one’s way, to chase ideas around corners and, as Wittgenstein said, to encounter with your forehead the sturdy wall understanding finds at the limits of language (paraphrase my own).

This map, then, is my effort to justify — in the way that assigning a location to a series of words and drawing a line through them asserts a degree of order — the arc of this talk from a walk around the world, language preservation, social media translation, the Bridge app, and slow journalism.

In the spirit of Paul’s work, tonight’s talk will be an exercise in storytelling and a traverse across a metaphorical 21,000 miles. This journey reflects a migration from the origins of Meedan through one of the truly inspired projects of modern journalism to the yet-to-be-realized passage of open linguistic data repositories for all languages, and to the envisioned journey’s end, where our tools help to create a more cross-lingual internet. We look to a world where understanding — and the global conversation that transports it — is a bit wider, deeper, and more effective.

So, let us start with the project that guides our work. Paul Salopek is very quick to disarm efforts to center the walk on his story, and this is what makes him the perfect journalist to take on what is likely the first effort of a single human to follow all humans on the 50,000-year and 21,000-mile migratory path that spread our forebears from the Rift Valley to Tierra del Fuego.

“My approach has always been immersive. I don’t try to compete with big guys like The New York Times who go in, cover stories very thoroughly and then leave. I stay. I seek out quiet points of the world where there is no news. Usually, that means something is happening there but no one is interested yet.

I’ve always recorded current events by going to places, getting out of vehicles and using my body as the main tool of collecting information. It’s an anthropological way of doing journalism.”

- Paul Salopek

At this point you might be wondering where this beautiful project intersects with translation and the multilingual web.

Cristina Calderón is an 87-year-old woman who is the last full-blooded speaker of a language called Yaghán. The week before he started his journey, Paul traveled to the end point of his seven-year walk, Tierra del Fuego at the southern edge of South America, and visited with Cristina. Here is how Paul described the meeting:

“It was a symbolic, pre-walk pilgrimage to the finish line. I had heard about Cristina through my research. A lot of what I’m writing about is the reclamation of memory, so I wanted to meet her.

More than 5,000 languages are in peril. I want to carry this woman’s words with me metaphorically across the world as a small light.

I recorded her words and I hope she’s still there when I make it to that part of the world. You have to start a circle somewhere to close it. She was the start.”

Which brings us to Meedan’s role in Paul’s walk around the world: language.

There are estimated to be somewhere in the neighborhood of 7,100 languages in the world, while Facebook is currently localized into about ‘70-plus’. It is held that a human language dies every 14 days. At this point in history, the Internet is both the most remarkable hope for language preservation and a driver of linguistic homogenization. Current statistics support the view that our nascent global web does not reflect the linguistic diversity of our planet: according to Internet World Stats, in 2014 only 16% of Internet users accessed the Internet in a language outside the top ten.

However, projecting the data forward, we see a shift toward greater linguistic diversity on the Internet. As the next billion Internet users come on board, a simple projection of growth over the past several years suggests a doubling of the non-top-ten language community on the web, to the point that this community will nearly equal the combined number of web users who speak English or Mandarin, the Internet’s two leading languages.

Our Senior Advisor Steven Bird is working on field collection approaches to language preservation and it is our hope that a significant piece of our work over the next five years will help bring language preservation tools into the wider web in service of creating open data repositories for emerging language communities on the Internet. Steven agrees that the emergence of web-connected phones offers real hope for scaling the linguistic preservation work needed to save thousands of spoken human languages. This grand challenge sets the stage for our work until 2020 with Paul Salopek.

But, the observant audience member may be asking, what does this have to do with social media and slow journalism?

To make that connection we have to travel to the Egyptian town of Oxyrhynchus, a city in the desert south of Cairo where the dry climate kept intact scraps of papyrus buried nearly 2,000 years ago.

The discovery of these papyrus missives sent between friends and siblings hardly compares with the grand and glimmering archeological finds of Ramses and Tutankhamun, but these informal notes give us a glimpse of the real concerns and aspirations of common people living thousands of years ago. They also serve — ironically — as a guide for our work to stitch together something of the meaning of a place through the digital ephemera we are gathering for the Out of Eden Walk.

When Paul and I first spoke about his project he explained to me his philosophy of Slow Journalism, a spirit and philosophy that runs through our work.

“To be slow is an insult. An epithet. A flaw. If you are slow, you get scooped. You are left behind. You become irrelevant. You fail.

This adoration of speed has only intensified as the Web takes the lead in disseminating news. Thousands of micro-headlines bloom every day, every hour, across the globe. Watching breaking-news headlines churn online is like watching plankton surfacing and sinking ceaselessly, endlessly and in the end incomprehensibly, on a vast yet shallow digital sea. This is why I’m walking across the world.”

I have to thank Ethan Zuckerman from the MIT Media Lab. He is an advisor to our Bridge project and also to Out of Eden Walk. In conversation with Ethan, Paul, Meedan colleague An Xiao Mina, and other members of the Meedan and Out of Eden Walk teams, we came to the idea that we ought to design a project using our nascent social media translation platform to create this 21,000 mile long experiment in digital ethnography.

Our somewhat ambitious idea was to complement Paul’s ‘dispatches’ made at 100-mile waypoints by pulling a dataset of geo-bounded social media at each of these time/place points and then curating and translating a representative sampling of the digital conversations and observations that happen to coincide with Paul’s travels.

About 16 months ago, we took on contract work to build a tool to process social media translations, and that contract allowed us to build our initial prototype. Bridge is the name of this social media translation platform. In a classic example of the buried lede, I will mention that today, June 26, we are formally beginning our work with National Geographic (on a sub-award from the Knight Foundation), to carry on two years of development and translation support on Bridge for Out of Eden Walk.

The essential workflow involves three distinct components and a few wonderful partners. We solve the non-trivial task of pulling time- and geo-bounded queries of social media in partnership with the Dolly Project. The University of Kentucky and the Oxford Internet Institute have jointly pursued this project and have a gold mine of social media research data they are sharing with our team.

Next, we work with team members and volunteers to take these initial sets — which may number as many as 10,000 objects — and begin the work of parsing and curating to narrow down to 25–100 tweets/posts for translation by the amazing team at Translators Without Borders.
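For readers who think in code, the first step (a time- and geo-bounded pull that hands a candidate set to human curators) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Post` and `Milestone` types, the `posts_near_milestone` function, the half-degree box, and the three-day window are all hypothetical names and values chosen for the example, not the Dolly Project's interface or our production code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Post:
    """A single social media object (hypothetical shape)."""
    text: str
    lat: float
    lon: float
    posted_at: datetime


@dataclass(frozen=True)
class Milestone:
    """One of the walk's waypoints (hypothetical shape)."""
    name: str
    lat: float
    lon: float
    reached_at: datetime


def posts_near_milestone(
    posts: Iterable[Post],
    milestone: Milestone,
    box_degrees: float = 0.5,  # assumed half-width of the lat/lon box
    days: int = 3,             # assumed window before and after the stop
) -> List[Post]:
    """Keep posts inside a lat/lon box around the milestone that were
    posted within `days` before or after the milestone was reached."""
    window = timedelta(days=days)
    selected = []
    for post in posts:
        in_box = (
            abs(post.lat - milestone.lat) <= box_degrees
            and abs(post.lon - milestone.lon) <= box_degrees
        )
        in_time = abs(post.posted_at - milestone.reached_at) <= window
        if in_box and in_time:
            selected.append(post)
    return selected
```

A filter like this only produces the raw candidate set. Narrowing thousands of objects down to a few dozen is still the work of people reading and choosing.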

In terms of the product of this work, I will borrow from my colleague An’s telling of it: these translations “range from poignant reflections on life to prosaic observations about kid brothers, from deeply personal matters to regional and global events. Captured just a few days before and after each of Paul’s stops, they are simply fragments of the many musings, thoughts, conversations, and commentaries posted online. Yet each brief, shared journey brings its own range of surprises and insights, a stunning cross-section of life, like overheard conversations on a crowded subway car.”

In a single milestone we have an adoring fan of Justin Bieber saying, “1 universe, 8 planets, 204 countries, 809 Islands, 7 days, 7 billion people… and I still love only one Canadian,” alongside the words of Ali al Hajji, a 27-year-old Syrian farmer living at the Milestone, near Ghor al Safi, Jordan. Al Hajji said, “We want to go back to Hamā. It is on our minds all the time. We dream this. We don’t know when we can. It is the war.”

We are in the very early days of social media translation. The handful of big companies that generate global social media content are — we trust (ahem) — working to address the demands of global conversations that don’t always fit into a single, tidy language community. And we know they will address this because untapped language communities represent new trading markets for content, ideas, knowledge, and lolcats.

However, until these networks can prepare the (non-trivial) plumbing needed to easily move UGC between the zillion or so connections that are truly required to create seamless communications between ~450 written languages, Bridge enables users to move these digital papyri between language communities.
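To put a number on that "zillion or so": if every one of roughly 450 written languages had to connect directly to every other, the count of language pairs grows with the square of the number of languages. The small sketch below works through the arithmetic. The function names are my own for illustration, and 450 is simply the rough figure used above.

```python
def directed_pairs(n_languages: int) -> int:
    """Source-to-target pairs, counting A->B and B->A separately."""
    return n_languages * (n_languages - 1)


def unordered_pairs(n_languages: int) -> int:
    """Language pairs, counting A<->B once."""
    return directed_pairs(n_languages) // 2


WRITTEN_LANGUAGES = 450  # rough figure from the talk, not a precise count

print(directed_pairs(WRITTEN_LANGUAGES))   # 202050
print(unordered_pairs(WRITTEN_LANGUAGES))  # 101025
```

So "seamless" direct translation between 450 languages means on the order of two hundred thousand translation directions, which is why the plumbing is non-trivial.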

What does this look like? I will close the talk with a description of the current state of the Bridge application.

The basic logic is a backend that enables translators to choose a target language (for Out of Eden Walk, the target — for now — is English), submit a translation, attach an annotation (optional), and publish for review. The output from this work must be portable, so we have designed the system to generate ‘embeds’ from this work.
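As a rough illustration of that logic, here is a minimal sketch of a translation record that can be published for review and rendered as a portable embed. Everything named here (the `Translation` class, its fields, the `"draft"` and `"in_review"` states, the `render_embed` function, and the `bridge-embed` CSS class) is a hypothetical stand-in for the example, not Bridge's actual data model or markup.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class Translation:
    """One translated post (hypothetical shape)."""
    source_text: str
    source_lang: str
    target_lang: str          # chosen by the translator
    translated_text: str
    annotation: Optional[str] = None  # optional context note
    status: str = "draft"     # assumed states: "draft" -> "in_review"

    def publish_for_review(self) -> None:
        """Move a non-empty translation into the review queue."""
        if not self.translated_text.strip():
            raise ValueError("cannot publish an empty translation")
        self.status = "in_review"


def render_embed(t: Translation) -> str:
    """Return a self-contained HTML snippet for the translation.

    All user-supplied text is escaped so the snippet is safe to
    drop into another site's page.
    """
    parts = [
        '<blockquote class="bridge-embed">',
        f'<p lang="{escape(t.source_lang)}">{escape(t.source_text)}</p>',
        f'<p lang="{escape(t.target_lang)}">{escape(t.translated_text)}</p>',
    ]
    if t.annotation:
        parts.append(f'<p class="note">{escape(t.annotation)}</p>')
    parts.append("</blockquote>")
    return "\n".join(parts)
```

Keeping the source text, the translation, and the annotation together in one record is what makes the output portable: the embed can be regenerated anywhere without going back to the original network.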

In order to serve this content into the National Geographic site we generate a single large embed from each milestone (numbering 25–75 translations; yes, we have a lazy loading and caching strategy) and serve it onto a designated page for each milestone on the Out of Eden Walk site.

A view of the OOEW site with the Bridge embed

In the current version of the application (v 0.7 for those keeping score) we are able to serve up pages built from a series of Bridge channels — each one forming a column in the embed. This allows us to spin up translations of breaking news events, individuals and topics of interest for the global community.

Where are we going with Bridge?

Well, we are building out a tool that will enable communities to move content between languages. Along the way we will encourage our users to open source any linguistic data they can, opening research on the hundreds of languages that are entirely neglected on the Internet circa 2015. This, we believe, is the most productive work we can do to address the manifold and apparent ills of a world that is becoming more tightly connected, in functional/formal and social/informal terms, each day.

We are hoping to achieve this very large vision by taking on services work for companies that want to test new markets for their content and build new communities to help bring this content to new readers. We have a seven-year plan and a hope that we can burst the hype that surrounds the tech startup community with the same ethos, rhythm, and vision that Paul and his team bring to their walk.

If you want to join our global conspiracy to strengthen global journalism and translation please drop us a line — hello@meedan.com.

-eab


working every day to make the web a bit wider and more worldly with colleagues @meedan