How Far Machine Translation Has Come (And Where It’s Going)

June 8, 2021

In just the past decade, artificial intelligence (AI) has enabled machine translation to make quantum leaps. Today, computers are capable of translating language with greater accuracy and efficiency than ever before. But even more intriguing is how machine translation has evolved to interact with humans and other technologies in symbiosis. This new development has democratized access to products, services, and knowledge so that people can get what they need no matter what language they speak.

In this blog post, we take a look at the journey of machine translation, its current state at the forefront of language technologies, and its potential ability to transform the future. 

Early machine translation frameworks 

In the late 1980s and early 1990s, machine translation made a significant advancement when research progressed from rule-based machine translation (RBMT) to statistical machine translation (SMT). This method works by analyzing similarities between parallel texts in different languages and noting the patterns that emerge. SMT research was accelerated by the release of more advanced computers with increased processing power. 

Although this technology has been refined through several iterations over the last few decades (word-based SMT, phrase-based SMT, syntax-based SMT), it still has its limitations. For example, in languages where the order of words is flexible, such as Portuguese, SMT engines struggle to produce accurate translations. 

Today, leading machine translation companies have switched gears from SMT to focusing on the vast possibilities that neural machine translation (NMT) has to offer. 

A breakthrough moment for machine translation

In 2014, a succession of research papers flipped the world of machine translation on its head. They were the first academic papers to propose that neural networks could take machine translation to entirely new heights. Several leading tech companies quickly got to work, including Google, which in 2016 announced the Google Neural Machine Translation system, an artificial neural network capable of deep learning that vastly improved the quality of Google Translate.

An NMT system trains itself on data fed to it by humans in order to progressively learn and improve translation quality. Instead of relying on many individually engineered components, NMT builds one large network capable of analyzing text and producing translations. Because of its all-encompassing nature, NMT is usually better at recognizing things like syntax and similarities between words than RBMT and SMT. 
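To make the "one large network" idea concrete, here is a minimal, self-contained sketch of an encoder-decoder NMT model in PyTorch. It is an illustrative toy (the architecture, dimensions, and class name are assumptions made for this post), not a description of any production system.

```python
# Toy sketch of the single-network idea behind NMT: one encoder-decoder model
# maps a source sentence directly to a target sentence, instead of chaining
# separately engineered components.
import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the whole source sentence into a hidden representation.
        _, h = self.encoder(self.src_emb(src_ids))
        # Decode the target sentence conditioned on the source encoding.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        # Predict a distribution over the target vocabulary at each position.
        return self.out(dec_out)
```

In practice, state-of-the-art systems use Transformer encoders and decoders with attention, but the end-to-end principle is the same: a single network is trained on parallel data to map source sentences directly to target sentences.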

Several members of our team have been highly involved in AI and machine translation research over the past decade, studying advanced topics in NMT and natural language processing (NLP). For example, when our CTO João Graca was performing his post-doctoral research at the University of Pennsylvania, he developed a new method that allowed for the insertion of descriptive knowledge during machine learning, unlocking previously unsolvable problems. 

Open-source machine translation frameworks

In just the past couple of years, cutting-edge machine translation models such as Google’s mT5 and Facebook’s XLM-R and M2M-100 have made waves by providing open-source technology for other AI experts to leverage. Organizations can now build upon these “pre-trained” models for their own specific purposes and use cases. For example, Unbabel has built translation quality estimation models on top of XLM-R and has contributed our machine translation research findings back to the AI community.
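As a concrete illustration of building on such a pre-trained model, the sketch below uses the open-source Hugging Face Transformers library to translate a sentence with a publicly released M2M-100 checkpoint. The checkpoint name and API follow the library's documentation at the time of writing and may change between releases.

```python
# Translate English to Portuguese with a pre-trained, open-source M2M-100 model.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "en"  # tell the tokenizer which language we are translating from
encoded = tokenizer("Machine translation has come a long way.", return_tensors="pt")

# Force the decoder to start generating in the target language (Portuguese).
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("pt"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```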

These breakthroughs in NMT and NLP have enabled and inspired Unbabel to release our own neural frameworks, OpenKiwi (open-source quality estimation) and COMET (Crosslingual Optimized Metric for Evaluation of Translation), for measuring the accuracy and quality of machine translations across many different languages. We believe that an advancement for one company is an advancement for all, which is why we chose to release OpenKiwi and COMET as open-source frameworks. 
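For readers who want to try it, here is a minimal sketch of scoring a machine translation with the open-source COMET package. Checkpoint names and exact call signatures differ across COMET releases, so treat this as an illustrative pattern rather than a fixed recipe.

```python
# Minimal COMET scoring sketch (assumes the unbabel-comet package is installed).
from comet import download_model, load_from_checkpoint

# Download a reference-based COMET checkpoint (the name may vary by release).
model_path = download_model("wmt20-comet-da")
model = load_from_checkpoint(model_path)

# Each example pairs a source sentence, a machine translation, and a reference.
data = [{
    "src": "A tradução automática percorreu um longo caminho.",
    "mt": "Machine translation has come a long way.",
    "ref": "Machine translation has come a long way.",
}]

# Higher scores indicate translations judged closer to human quality.
print(model.predict(data, batch_size=8, gpus=0))
```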

Machine translation and quality estimation

Machine translation quality is crucial because it can make or break a person’s experience and their outlook on the abilities of AI. When machine translation was in its primitive stages, shaky quality often made people doubt whether it would ever amount to anything useful. We’re glad that curious and determined academics, scientists, and engineers (including several of our own!) have put in the work to get us where we are today. 

It’s our job at Unbabel to help machine translation reach its full potential. Today, one of our primary areas of focus is pushing language translation quality estimation (QE) to be the best it can possibly be. Because of advancements like OpenKiwi and COMET, our AI technology can determine confidence in the accuracy of its own translations. If it thinks any parts need to be double-checked, those words or phrases will be reviewed by our multilingual community of over 100,000 editors. We think human-in-the-loop AI is the key to translation quality: Good AI QE makes life easier for human translators, and their feedback improves machine translation models moving forward. 
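Conceptually, this human-in-the-loop flow can be pictured as a simple routing rule: high-confidence segments pass straight through, while low-confidence ones go to human editors. The function names and the threshold below are hypothetical placeholders for illustration only, not Unbabel's actual pipeline.

```python
def send_to_editor_queue(source, mt_output):
    """Hypothetical stand-in for handing a segment to a human editor."""
    return mt_output  # a real system would return the editor's corrected text

def route_translation(source, mt_output, estimate_quality, threshold=0.8):
    """Route a machine-translated segment based on a QE confidence score."""
    score = estimate_quality(source, mt_output)  # e.g. an OpenKiwi/COMET-style QE model
    if score >= threshold:
        # Confident enough: deliver the machine translation directly.
        return {"text": mt_output, "reviewed_by_human": False, "qe_score": score}
    # Low confidence: flag for human review; editor feedback can later be
    # used to retrain and improve the underlying MT models.
    corrected = send_to_editor_queue(source, mt_output)
    return {"text": corrected, "reviewed_by_human": True, "qe_score": score}

# Example with a dummy QE function that always returns a mid-range score.
result = route_translation("Olá, mundo!", "Hello, world!", lambda s, t: 0.65)
print(result)
```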

The future: Enhancing operations through machine translation

One of the most exciting and powerful applications of AI-powered, human-refined machine translation technology is its ability to help organizations expand internationally. As online spending increases and more companies look to take their products and services into new markets, the potential for globalization is vast. Even amidst the ongoing pandemic, a recently released statement from the International Monetary Fund (IMF) forecast that the world economy will expand by a record-breaking 6% in 2021. 

To give any organization the ability to serve a global customer base and align a distributed workforce, we’re pioneering a new way to use AI: Language Operations (LangOps). LangOps leverages AI alongside existing tools in a company’s technology stack so that any person can communicate in any language. In the near future, we see more organizations building out LangOps teams. These groups will use machine translation technology to connect customer service, sales, marketing, product, and other teams across the company through language. 

We’re thrilled to be able to play a part in writing the history of machine translation. It’s an incredible time to be involved with this work, and we’re always looking for ways to support and learn from other organizations in this space. At the rate that this technology is progressing, it’s safe to say that the future of machine translation is bright. We can’t wait to see the new ways we’ll be using it together in the coming years.

About the Author

André Martins

André Martins is the VP of AI Research at Unbabel, an Associate Professor at IST, and a researcher at IT. He received a dual-degree PhD (2012) in Language Technologies from Carnegie Mellon University and IST. His PhD thesis received an Honorable Mention in CMU's SCS Dissertation Award and the Portuguese IBM Scientific Prize. His research interests include natural language processing (NLP), machine learning, structured prediction, and sparse modeling, in particular the use of sparse attention mechanisms to induce interpretability in deep learning systems. He co-founded and co-organizes the Lisbon Machine Learning School (LxMLS, 2011–2019). He received a best paper award at ACL 2009 and a best system demonstration paper award at ACL 2019. He recently won an ERC Starting Grant for his DeepSPIN project (2018–23), whose goal is to develop new deep learning models and algorithms for structured prediction in NLP applications.
