Many big companies like Google, Microsoft, Yahoo, Yandex, eBay and Amazon create and train general purpose Machine Translation (MT) systems, which use billions upon billions of datapoints (like the entire World Wide Web) in order to help make sense of online content in another language.
If you’re a keen observer of the MT world, you may have read that they all recently upgraded to “Neural MT.” However, General MT ≠ Business MT.
Quality has definitely improved. However, a general purpose MT system is like a fish out of water when used in a domain different from the one it was trained on (typically news articles, parliament proceedings, etc.). Plug one of them into email and chat conversations with international customers, or business-critical information like product descriptions, and things start to sound funny pretty quickly.
Formal and informal tones become mixed, entity names become mistranslated, brand terms that should stay the same become confused, and numerous other errors are automatically churned out that make the content not fit for purpose in an enterprise environment.
There’s no doubt that neural network technologies are improving MT considerably (most notably in terms of fluency), but there’s still a huge gulf between these technologies and the quality that is expected by multinational businesses today.
Unbabel’s Domain Adapted MT Performance
In order to meet the demanding standards of these businesses, we first accept the limitations of Machine Translation and make it fit our workflow of MT + human editors. There is evidence that better MT leads to less post-editing, and hence to faster, higher-quality translation in the end.
We recently conducted a set of experiments comparing Unbabel’s domain-adapted Machine Translation to general MT systems (with and without neural MT) across 5 popular language pairs (English to Spanish, French, Portuguese, Italian, and German).
Unbabel’s domain-adapted MT consistently achieves higher scores, sometimes rather substantially, confirming our hypothesis that training our machines on customer data is highly valuable. This isn’t news to the scientific community, but it may well be to many in the business world.
We also omit our Glossary Set-up features, where we calibrate our systems on a per-customer basis to ensure that style guides, brand terminology and other metadata are taken into account. For example, Pinterest does not want most mentions of the word “Pin” to be translated as “Alfiler” in Spanish.
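To make the idea of protecting brand terminology concrete, here is a minimal sketch of one common approach: swapping glossary terms for placeholder tokens before translation and restoring them afterwards. This is an illustration only, not a description of how Unbabel’s Glossary Set-up works. The function names `protect_terms` and `restore_terms`, the `__TERM0__` token format, and the hard-coded Spanish sentence standing in for MT output are all assumptions. It also masks every mention of a term, whereas a real system would need to decide which mentions to keep.

```python
import re

def protect_terms(text, glossary):
    """Replace each glossary term with a placeholder token (hypothetical format)."""
    mapping = {}
    for i, term in enumerate(glossary):
        token = f"__TERM{i}__"
        # Whole-word match only, so "Pin" does not match inside "Pinterest".
        pattern = re.compile(rf"\b{re.escape(term)}\b")
        if pattern.search(text):
            text = pattern.sub(token, text)
            mapping[token] = term
    return text, mapping

def restore_terms(text, mapping):
    """Put the original terms back in place of their placeholders."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text

masked, mapping = protect_terms("Save this Pin to your board", ["Pin"])

# Stand-in for the MT system's output on the masked sentence (assumed, not real output).
translated_masked = "Guarda este __TERM0__ en tu tablero"
restored = restore_terms(translated_masked, mapping)
print(restored)
```

The design choice here is that the translation step never sees the brand term at all, so it cannot turn “Pin” into “Alfiler.”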
That said, it’s worth reinforcing the following: this is just the start of delivering what our customers need. At Unbabel we truly believe that you can only “solve” translation by merging artificial intelligence with human effort.
With the machines’ job nearly done, the next step is to distribute these outputs to intelligently-selected batches of our 45,000 mobile linguists, who then post-edit the content to the human quality our customers expect. We’ll cover that in a separate post.
Unbabel’s Head of Research, André Martins, PhD, led the experiments with the help of Maria Braga and Catarina Cruz Silva.
- We took some basic steps to avoid common pitfalls, making sure that no sentence pair in this dataset overlaps with the training set for our MT system.
- Our comparison slightly favors Google’s system, since for these experiments the reference translations were obtained by human post-editing of Google Translate (hence the asterisk in the plots).
- We don’t report Google Neural MT for Italian, as it’s currently not supported in their premium API.
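
The overlap check described in the first note above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual procedure used in the experiments: the function names, the lowercase-and-whitespace normalization, and the toy sentence pairs are all hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())

def remove_overlap(test_pairs, train_pairs):
    """Keep only test pairs whose (source, target) is absent from the training data."""
    seen = {(normalize(src), normalize(tgt)) for src, tgt in train_pairs}
    return [
        (src, tgt)
        for src, tgt in test_pairs
        if (normalize(src), normalize(tgt)) not in seen
    ]

# Toy data, invented for illustration.
train = [("Hello world", "Hola mundo"), ("Thank you", "Gracias")]
test = [("hello  World", "Hola mundo"), ("Good morning", "Buenos días")]

clean_test = remove_overlap(test, train)
print(clean_test)
```

Here the first test pair is dropped because it matches a training pair after normalization, so only the unseen pair is left for evaluation.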