Wikibabel: Equalizing Information Access on a Budget

Olya Irzak
Jun 14, 2018
Met these students 3 hours outside of Mbeya, Tanzania. One of them can’t stop asking questions. Wikibabel is for her.

Our mission

At Wikibabel, our mission is to equalize information access by bringing the most valuable Wikipedia content to the world’s underserved languages.

Case study: Swahili

In Tanzania, 70% of the population is rural and has little access to traditional educational materials. Many areas have mobile connectivity, through which Swahili speakers (some 50–100M humans) could reach a wealth of information, if only it were in a language they understood. One example of the discrepancy: the Swahili Wikipedia’s Math section contains 117 articles, compared to 31,444 in English. With advances in machine translation, we can close this information gap.

The problem of choice

Our approach uses Google Translate to make English Wikipedia articles accessible to underserved communities. But with a limited number of Translate credits, which of the 5.5 million English articles do we translate?

We’d love to estimate demand for information that has insufficient supply. In an ideal world, we’d have the ability to determine which Swahili search engine queries returned no useful results. Lacking that, we used Wikimedia’s list of 10k vital articles as a starting point, and developed prioritization principles.

Prioritization principles

  • Cultural universality. We want to be careful not to accidentally impose our culture and values. Therefore, categories that vary widely among cultures, such as politics, may not be the best use of our limited $s. Categories such as science, math and technology are fairly independent of culture, so those are in.
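To make the budget question concrete, here is a minimal sketch of how a selection pass could look. It is an illustration only, not our actual pipeline: the `Article` fields, the `pick_articles` helper, the sample titles and sizes, and the assumption that translation cost scales with character count are all hypothetical. It keeps only the culturally universal categories named above, then takes the cheapest articles first until the budget runs out.

```python
from dataclasses import dataclass

# Assumption: the categories treated as culturally universal, per the principle above.
UNIVERSAL_CATEGORIES = {"science", "math", "technology"}


@dataclass
class Article:
    title: str
    category: str
    char_count: int  # hypothetical cost proxy: longer articles use more credits


def pick_articles(candidates, char_budget):
    """Keep universal-category articles, cheapest first, while the budget lasts."""
    eligible = [a for a in candidates if a.category in UNIVERSAL_CATEGORIES]
    chosen, spent = [], 0
    for article in sorted(eligible, key=lambda a: a.char_count):
        if spent + article.char_count > char_budget:
            break  # sorted ascending, so every remaining article is too large as well
        chosen.append(article)
        spent += article.char_count
    return chosen, spent


# Made-up sample data, for illustration only.
candidates = [
    Article("Photosynthesis", "science", 40_000),
    Article("Prime number", "math", 30_000),
    Article("Political party", "politics", 20_000),
    Article("Internet", "technology", 60_000),
]
chosen, spent = pick_articles(candidates, char_budget=80_000)
print([a.title for a in chosen], spent)
```

Cheapest-first maximizes the number of articles translated per credit, which is only one possible goal. With a real demand signal, the sort key could instead rank articles by expected readers per character.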

So, here’s what we’ve got translated so far (we’re rate limited though, so more to come!).

We’re excited to find out all the ways in which we’re wrong and what information is truly valuable and interesting to our users. And we’re measuring that!

Join Us

Fundamentally, we want to make the information we use and love available to everyone. We can’t think of a better place to put our spare cycles. In the case of Swahili, Google Translate did the hard work of training amazing machine translation for a language with sparse training data; let’s use this to close some important gaps!

Got spare cycles? Catch us at close_the_gap@wikibabel.com