A brief and untold history of machine translation

The early days of machine translation were marked by the successes of code-breaking during the Second World War and the Georgetown-IBM experiment of the mid-50s. This was the first live demo of what machine translation could do, and it garnered a lot of recognition from the public. The experiment itself was very basic, translating 49 sentences from Russian into English. Even so, it opened up the field to broader exploration and stimulated international funding.

The research published during this era was contrastive and adopted statistical methods, and much of it appeared clumsy and overly perfectionistic. Scientists were preoccupied with solving technicalities around basic hardware, such as magnetic media and paper tape, more than with designing a functional machine translation system.

In 1956, at the Dartmouth Conference, the term artificial intelligence was coined, and translation was among the tasks researchers hoped machines would take on. The first experiments with neural networks (the Perceptron and the LMS algorithm) began in the late 50s and early 60s, the latter growing out of attempts to filter noise from telephone lines. Both approaches cemented the notion that machines might be able to learn and achieve near-human intelligence.

Optimism and AI winter

Scientists in the 50s and 60s were eager to prove that if machines could beat humans at chess, they were only a step away from understanding the core of human behaviour and translating that understanding into a new generation of AI. Both governments and industry started investing heavily in machines, in what turned out to be an overenthusiastic attempt to fund as many machine learning projects as possible.

The media craze and level of enthusiasm of the time are perfectly distilled in the views of Herbert Simon, one of AI's founding figures:

“there are now in the world machines that think, that learn, and that create. Moreover, their ability to do these things is going to increase rapidly until in a visible future the range of problems they can handle will be coextensive with the range to which the human mind has been applied.”

The late 60s and 70s put a stop to this extreme optimism. Huge amounts of cash and manpower had gone into attempts at translating Russian into English, which led scientists to conclude that MT was an unsuccessful endeavour. In 1966, the Automatic Language Processing Advisory Committee (ALPAC) was forced to report that: “we do not have useful machine translation [and] there is no immediate or predictable prospect of useful machine translation.”

What followed was the first AI winter, in which nearly all neural and AI development was killed off and researchers were denied funding. The effect was global, and a deeply AI-sceptical phase set in: the UK cut back its AI programme after concluding that the machines were useless, the US DARPA programme came to a halt, and even the Soviet Union got immensely bored with its attempts.

The only success in the field was registered in Canada, with a syntactic transfer system built for English-to-French translation. The project, Traduction Automatique de l’Université de Montréal (TAUM), had two lasting achievements: it created the Q-systems formalism, a computational metalanguage for manipulating linguistic trees, and it laid the foundation of what later became the Prolog programming language used in NLP. It also did this nifty thing where it translated the Météo weather forecasts, a service that has been operating ever since.

In came the 80s: many new operational systems appeared, the commercial market for MT systems of all kinds expanded, and MT research diversified in many directions.

The US picked up AI funding again thanks to some new AI tech known as expert systems, which showed remarkable results in automating human expertise. What did these systems do exactly? They were symbolic reasoning engines that allowed machines to reproduce chains of human reasoning through simple rule-based logic: “if (X) then (Y)”. The only problem? Expert systems proved super difficult to apply to new areas.
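
To make the “if (X) then (Y)” idea a little more concrete, here is a minimal sketch of forward chaining over hand-written rules, roughly the mechanism an expert system shell relied on. The facts, rules and the translation-flavoured scenario are invented purely for illustration and are not taken from any historical system.

```python
# Toy forward-chaining over "if (X) then (Y)" rules.
# All facts and rules below are invented for illustration only.

rules = [
    ({"sentence_is_russian", "domain_is_chemistry"}, "use_chemistry_glossary"),
    ({"use_chemistry_glossary", "term_in_glossary"}, "translate_term_from_glossary"),
]

def forward_chain(facts, rules):
    """Keep firing any rule whose conditions are all known facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)  # "then (Y)": record the conclusion as a new fact
                changed = True
    return facts

print(forward_chain({"sentence_is_russian", "domain_is_chemistry", "term_in_glossary"}, rules))
# The derived facts include 'use_chemistry_glossary' and 'translate_term_from_glossary'.
```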

During the 1980s, AI activity increased dramatically in Japan thanks to a new generation of computers that leapt over the competition with faster software and hardware. This enabled the Japanese to create new translation software that relied on an intermediary linguistic representation, involving morphological, syntactic, and semantic analysis.

Meanwhile in the US, AI entered its second winter because the machines failed to prove profitable in the long run. Another reason for the failure of expert systems was that they proved really good at niche tasks but couldn’t be practically applied to much else. During these two AI winters, the number of researchers declined, AI conferences became kind of a bummer, and work continued at a tiny scale.

The only noteworthy exception was Systran, which came to be the most successful system to date. It started out as a direct translation system, developed by Peter Toma. Its oldest version was used to translate Russian to English for the USAF Foreign Technology Division in Dayton, Ohio. After that success, versions were developed for English to French, French to English, English to Italian, and then for all of the European Communities and lastly the EU.

The original design was heavily modified and adapted to current needs: increased modularity and greater compatibility between the analysis and synthesis components of different versions permitted cost reductions when developing new language pairs. A number of companies and governmental institutions use their own modified versions of it, among them NATO, General Motors, Xerox and many others. Xerox modified its application to virtually eliminate post-editing by controlling the input language of the tech manuals it submitted for translation.

Third generation AI

During the 1990s, research led to a third generation of AI translation: corpus-based architectures, such as statistical and example-based designs. The statistical approach breaks the source text down into segments and then compares those segments against a bilingual corpus, using statistical evidence and distortion probabilities to choose the most appropriate translation. The example-based approach recombines examples of previously translated data stored in its database; for it to succeed, the database must contain close matches to the source text. This approach forms the basis of translation memory tools.
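
As a rough illustration of the statistical idea, here is a toy sketch that scores candidate translations of each source segment with probabilities that would, in a real system, be estimated from a bilingual corpus. The phrase table, its probabilities and the German example are made up for this sketch, and real statistical MT adds a language model and reordering, which are left out here.

```python
# Toy segment-by-segment statistical translation.
# The phrase table and its probabilities are invented for illustration only;
# a real system would estimate them from a bilingual corpus.

phrase_table = {
    "das haus": [("the house", 0.7), ("the home", 0.2), ("the building", 0.1)],
    "ist klein": [("is small", 0.8), ("is little", 0.2)],
}

def translate(segments):
    """Pick the highest-probability translation for each source segment."""
    output = []
    for segment in segments:
        candidates = phrase_table.get(segment, [(segment, 1.0)])  # pass unknown segments through
        best, _ = max(candidates, key=lambda pair: pair[1])
        output.append(best)
    return " ".join(output)

print(translate(["das haus", "ist klein"]))  # -> "the house is small"
```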

As the 1990s progressed, different types of MT systems were developed to cover growing translation needs. Among these were translation tools for professional translators, PC systems and solutions for occasional use, large MT solutions for corporations, the first translation software “apps” for monolingual users, and speech translators. As the demand for MT systems grew, the research broadened, which in turn opened up a whole new range of translation needs and solutions.

In part two of the history of machine translation we will look at the 90s and more recent years of development. So stay tuned for more and show us some ❤❤❤.

About Beluga

Beluga helps fast-moving companies translate their digital content. With more than a decade of experience, professional linguists in all major markets and the latest translation technology in use, Beluga is a stable partner to many of the most thriving enterprises in the technology sector. The business goal: to help fast-growing companies offer their international audiences an excellent and engaging user experience.