Modern machine translation is not immune to ridicule. Last month on The Tonight Show Starring Jimmy Fallon, singer Miley Cyrus covered popular songs that had been translated from English into Croatian and back with Google Translate.
Ed Sheeran’s Shape of You put through the engine resulted in choice lines like: “But my organ drops right out, I like that cadaver” (originally: “Although my heart is falling too, I’m in love with your body”).
Song lyrics, though – often nonsensical and lacking context to begin with – are an especially tricky test for machine translation, which has seen major improvements in recent years.
The latest neural machine translation techniques have resulted in accuracy scores close to those achieved by human translators. Google, a leader in the field, says it has halved the number of errors its translation engines make in the last year alone.
But even if machine translation can surpass humans in quality, there is still some way to go before the systems can produce perfect output.
To reach that goal, machine translators will need more than an extensive knowledge of language, says Google’s director of research Peter Norvig. They’ll also “need to understand the world”.
Machine translation has been widely available since 1997 with AltaVista’s Babelfish (named after the creature in Douglas Adams’ Hitchhiker’s Guide to the Galaxy, which users place in their ear to “instantly understand anything said to you in any form of language”).
Major providers like Babelfish, as well as Google Translate and Microsoft Translator, for many years used a method called Statistical Machine Translation (SMT). In Google’s case, this was a form called Phrase-Based Machine Translation.
The technique works by digesting a huge index of content that has already been translated by humans. The machine uses statistical analysis to discover patterns in that content, and in this way 'learns' a language.
As its name suggests, Phrase-Based Machine Translation works with blocks of word sequences, although ones far shorter than whole sentences.
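To make the idea concrete, here is a minimal sketch of the phrase-based approach in Python. It is a toy illustration and not how Google’s or any other provider’s engine is implemented: the English-to-French phrase pairs are made up for the example, the names `OBSERVED_PAIRS`, `build_phrase_table` and `translate` are hypothetical, and real systems add word alignment, reordering and a language model on top of this.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made phrase pairs standing in for the pairs a real
# system would extract from a huge corpus of human translations.
OBSERVED_PAIRS = [
    ("the house", "la maison"),
    ("the house", "la maison"),
    ("the house", "le logement"),
    ("is", "est"),
    ("small", "petite"),
    ("is small", "est petite"),
]


def build_phrase_table(pairs):
    """Estimate P(target | source) by relative frequency of observed pairs."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        table[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return table


def translate(sentence, table, max_len=3):
    """Greedy left-to-right decoding: try the longest known phrase first."""
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                # Pick the most probable target phrase for this source phrase.
                out.append(max(table[phrase], key=table[phrase].get))
                i += n
                break
        else:
            # Unknown word: copy it through unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)


PHRASE_TABLE = build_phrase_table(OBSERVED_PAIRS)
print(translate("The house is small", PHRASE_TABLE))
```

Because the sketch only ever looks at a few words at a time, it has no view of the sentence as a whole, which is the limitation the neural approach sets out to address.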
In more recent years, machine translation engines have begun using artificial neural networks, giving rise to Neural Machine Translation (NMT), which instead considers the entire input sentence as a unit for translation.
“Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference,” wrote Google’s Yonghui Wu et al in their 2016 research paper. “These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential.”
Providers are now beginning to overcome the practical hurdles of this method, chiefly the huge computational resources and time it requires.
Google launched its ‘Google Neural Machine Translation (GNMT) system’ last September, which utilises “state-of-the-art training techniques to achieve the largest improvements to date for machine translation quality”, the company wrote in an announcement.
In side-by-side comparisons of SMT and GNMT translations of sentences sampled from Wikipedia and news websites, rated by bilingual humans, GNMT reduced translation errors by between 55 per cent and 85 per cent in several major languages.
In November, Microsoft too adopted NMT, providing “major advances in translation quality” over SMT technology, the company said.
Amazon Web Services is also expected to make machine translation services available this November – according to a CNBC report – building on its AI push, and likely using NMT.
Facebook’s Mark Zuckerberg in May outlined his company’s machine translation research using convolutional neural networks, which work in a slightly different way. “With a new neural network, our AI research team was able to translate more accurately between languages, while also being nine times faster than current methods,” he wrote on his Facebook page.
There are still significant limitations. Microsoft said use of neural networks was “only a first step towards future improvements”. Google agreed, saying of its advances: “Machine translation is by no means solved.”