Map of Europe and the Mediterranean from the Catalan Atlas of 1375

“Wiki” — The First Truly Global Language?

The internet makes it possible for the first truly global language to emerge. This could be enabled through highly inclusive open standards bodies, crowdsourcing, and flexible delegation (e.g. Liquid Democracy). Radical inclusiveness is important for establishing the legitimacy of any truly global language: because regional languages carry an inherent bias toward the ethnicities, nations, and religions that used them first, later adopters are often treated as second-class participants whose cultural backgrounds and contributions may be considered less valuable.

Perhaps an open standards body for the first truly global language could be sponsored by the Wikimedia Foundation and organized along the lines of the very successful Internet Engineering Task Force. A tentative name that I like for such a global language is ‘Wiki’.

There are at least two major problems that must be dealt with when developing a global language:

  1. It is difficult to motivate people to learn and use a new language without a critical mass of interest.
  2. A new global language may cause traditional languages to disappear along with related cultures.

A solution to both of these problems may be to create an intermediate language that is easy for both humans and machines to work with. Providing an excellent translation service on the internet could attract enough interest to reach the critical mass required to make the language a success. Effective translators may also allow people to maintain their local languages while easily contributing to and accessing content in other languages.

Existing translation systems require a separate translator between every pair of supported languages; in the terminology of complexity theory, the number of translators needed is O(n^2). A functional intermediate language would allow people to build translators to and from the intermediate language only. Transitively, this would allow translation to every language that can be translated to and from the intermediate language, reducing the number of translators needed to O(n). Users of a particular language could then focus their effort on maintaining a single translator to and from the intermediate language. Similar to browser plugins, open standards could allow many competing translators to coexist and be installed and activated by the end user.
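
To make the counting argument concrete, here is a minimal sketch in Python (the function is illustrative, and the figure of roughly 7,000 living languages is only a rough estimate):

```python
def translators_needed(num_languages: int) -> tuple[int, int]:
    """Count translators for pairwise vs. hub (intermediate-language) designs."""
    n = num_languages
    pairwise = n * (n - 1)  # one directed translator per ordered pair: O(n^2)
    via_hub = 2 * n         # one translator to and one from the hub: O(n)
    return pairwise, via_hub

for n in (10, 100, 7000):   # roughly 7,000 living languages exist today
    pairwise, hub = translators_needed(n)
    print(f"{n:>5} languages: {pairwise:>10,} pairwise vs {hub:>6,} via a hub")
```

Even at ten languages the pairwise approach already needs 90 translators against 20 for the hub, and the gap widens quadratically from there.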

The open standards body could define things like a script (written and typed), the basic phonetics, a bootstrap vocabulary, metadata formats, and an inflection model. A crowdsourced dictionary could allow anyone to propose new words based on recommendations from the open standards group for how to pronounce, spell, and define new words. Periodically a vote could be held to determine which words will be included in the official dictionary. This should allow translators to be updated in manageable ways at regular intervals.
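
As a rough illustration of how one release cycle of such a crowdsourced dictionary might work (every field name, word, and threshold below is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class WordProposal:
    """A crowdsourced dictionary proposal; all fields are hypothetical."""
    spelling: str
    pronunciation: str  # written in the standards body's phonetic notation
    definition: str
    votes: int = 0

def official_update(proposals: list[WordProposal], threshold: int) -> list[WordProposal]:
    """Periodic vote: admit every proposal that clears the vote threshold."""
    return [p for p in proposals if p.votes >= threshold]

# One release cycle: only well-supported proposals enter the official dictionary.
pending = [WordProposal("sabir", "sa-BEER", "to know", votes=120),
           WordProposal("zumi", "ZOO-mee", "to skim quickly", votes=45)]
print([p.spelling for p in official_update(pending, threshold=100)])  # ['sabir']
```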

The language should play the role of connecting more traditional (or “grandparent”) cultures in ways that are nonthreatening, and that promote universal human rights and mutually beneficial cooperation. In time, the users of the language may develop their own approach to the humanities, but I recommend that the language continue to play an important role in enabling grandparent cultures to interface with one another in ways that are ethical by secular standards.

Crowdsourcing and Liquid Democracy could be used to encourage people with many different approaches to language (e.g. linguists, analytical philosophers, poets, novelists, scientists, software engineers, etc.) to help in shaping the language. Perhaps definitions should be semantically grounded using the SI system of units. Contributors could be encouraged to state their definitions in ways that refer to the set of SI base units and that demonstrate how a word’s meaning impacts each sensory modality. In addition, a web crawler could track every use of a word on the internet, grounding the word’s meaning in a set of real-world usage examples.
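
A semantically grounded entry might look something like the sketch below; the schema, the numeric range, and the demonstrations are all made up for illustration:

```python
# A sketch of a semantically grounded dictionary entry; the schema and all
# values are hypothetical, not part of any real standard.
entry = {
    "word": "warm",
    "si_grounding": {
        "base_unit": "kelvin",        # the SI base unit for temperature
        "rough_range_K": (293, 313),  # an illustrative numeric anchor
    },
    "sensory_demonstrations": {
        "touch": "hold a cup of tea that has cooled for ten minutes",
        "sight": "the color palette of late-afternoon sunlight",
    },
    "usage_examples": [],  # filled in over time by a web crawler tracking real uses
}
```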

It may be possible to build good translators to and from an engineered intermediate language since, unlike natural languages, we have the ability to design the language precisely to ease translation. There could be an official inflection model, including a metadata standard for encoding inflection, that developers could refer to when building their translators. The inflection metadata from the original language would travel with the document, whatever language it may be translated into. In this way, an intermediate language could use very little inflection and a relatively small, unambiguous vocabulary without dropping inflection information that could be valuable when performing a translation. Even if all inflections were removed when a natural language document is presented for reading in the intermediate language, the metadata would still be accessible to translator software when translating from the intermediate language back into a natural language. In addition to the inflection model, each language translator could include an idiom guide, and the idiom metadata could likewise follow a document into whatever language it may be translated.
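
The sketch below shows one way inflection metadata might travel with a document; the format, field names, and toy morphology are all hypothetical:

```python
# A sketch of inflection metadata traveling with a document; the format and
# field names are hypothetical, not part of any real standard.
document = {
    # Intermediate-language text with inflection stripped out.
    "tokens": ["she", "walk", "to", "market", "yesterday"],
    # Metadata preserved from the source language.
    "inflection": {1: {"tense": "past"}},  # token 1 ("walk") was past tense
}

def render_english(doc: dict) -> str:
    """Toy renderer: re-apply inflection metadata when targeting English."""
    out = []
    for i, token in enumerate(doc["tokens"]):
        if doc["inflection"].get(i, {}).get("tense") == "past":
            token += "ed"  # a real translator would use proper morphology
        out.append(token)
    return " ".join(out)

print(render_english(document))  # she walked to market yesterday
```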

The text should remain readable and understandable through any combination of translations that pass through the intermediate language; however, a simplified form of the natural languages may often result. Translation software could make it easy to file bug reports with the tracking community for whatever translator plugin one is using. Teams focusing on translators for their primary language (and specialist applications of their primary language) may produce excellent translators after several development cycles. Perhaps a system could check against a user-selectable language corpus (e.g. general news, sports, scientific research) to determine the likelihood of a particular phrase being used in a particular context. Alternative expressions could be presented so that developers could easily select a more probable phrase that maintains the intended meaning.
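
A corpus check of that kind might start out as simply as the following sketch, where the corpora and their phrase counts are invented for illustration:

```python
from collections import Counter

# Hypothetical user-selectable corpora; the phrase counts are made up.
corpora = {
    "sports": Counter({"scored a goal": 40, "achieved a goal": 2}),
    "business": Counter({"achieved a goal": 30, "scored a goal": 1}),
}

def rank_alternatives(candidates: list[str], corpus_name: str) -> list[str]:
    """Order alternative expressions by how often they appear in the corpus."""
    corpus = corpora[corpus_name]
    return sorted(candidates, key=lambda phrase: corpus[phrase], reverse=True)

print(rank_alternatives(["achieved a goal", "scored a goal"], "sports"))
# -> ['scored a goal', 'achieved a goal']
```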

In addition to making an intermediate language that people enjoy using and a translation system that functions well, it may be desirable to select a script and phonetic style that ease human-computer interaction (e.g. OCR, voice command, and brain-machine interfaces).

If such a system is possible and desirable, it seems inevitable that something like it will someday be implemented. Even so, there are many ethical concerns that need to be addressed. Thinking about how such a development would impact both global and local cultures is critical for ensuring that a new global language develops in an ethical, productive way. Promoting a global language and international culture could inspire hatred and paranoia amongst right-wing reactionaries. Increasing automated systems’ ability to interpret human speech and brain states could enable terrible tyrannies and malicious individuals to harm humanity in new ways. Open standards bodies with radical transparency could work against these threats, along with an explicit goal of accommodating “grandparent” cultures by enabling them to maintain their local languages while communicating more successfully with the rest of the world.

P.S. Something I’ve mentioned in other posts is that a new engineered language could be designed to include insights from recent work in linguistics and analytical philosophy. An engineered language could be designed to make it easier to express the provenance of information and the certainty one has in it. For instance, it could make it easier to explicitly express when one is talking about deductive abstractions known to be true as a consequence of logic, second-hand knowledge believed true based on an expressed degree of trust, and knowledge of one’s own direct sensory experiences. I’m not an expert in the field of linguistics or analytical philosophy. An expert’s recommendations may be very different.
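
One way to picture such provenance marking is the sketch below; the categories and field names are hypothetical, not drawn from any real linguistic proposal:

```python
from enum import Enum

class Evidence(Enum):
    """Hypothetical evidentiality categories an engineered language might mark."""
    DEDUCTION = "true as a consequence of logic"
    REPORTED = "second-hand, with an expressed degree of trust"
    DIRECT = "one's own sensory experience"

# A claim tagged with provenance and certainty (all field names illustrative).
claim = {
    "statement": "it rained in the valley last night",
    "evidence": Evidence.REPORTED,
    "trust": 0.7,  # the speaker's stated confidence in the source
}
print(f'{claim["statement"]} [{claim["evidence"].value}, trust={claim["trust"]}]')
```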

Another potential name for the language is “Neo-Sabir” or “New-Sabir”, after an old Mediterranean lingua franca that included contributions from Spanish, Portuguese, Berber, Turkish, French, Greek, and Arabic. The word “Sabir” is derived from a Romance base meaning “to know”. This neatly relates to the word “science”, which is derived from the Latin word “scientia”, meaning “knowledge”. Interestingly, “Sabir” also sounds similar to “cyber”.

In addition to all the benefits I’ve mentioned, I hope a new global language will also help cultures that have been divided by wars to reconnect on equal footing and to work together in creating a new global, liberal, egalitarian culture.