Last week, Facebook announced a new machine translation model that translates directly between any pair of 100 languages without relying on English as an intermediate step. And just this week, Google announced mT5, a new multilingual language model trained on 101 languages. This news is significant because it marks a milestone toward the long-standing goal of broadly available automated translation between any of the world’s languages. For the AI community, it indicates meaningful progress toward a single, common baseline multilingual machine translation model, which can then be further adapted into any number of specialized models.
Here’s why this news is promising for the machine translation community and the customer service industry as a whole.
2020: A big year for language-based AI
Between Facebook’s M2M-100 translation model, Google’s mT5, and OpenAI’s GPT-3 language model, there have been some big breakthroughs in language-centered AI this year. Many of these groundbreaking models have been released as “open source”, meaning they are freely available for both research and commercial applications. This is a big deal, since training these massive language models is extremely costly and requires immense computational resources. Very few companies have the resources to develop such models. But now, smaller expert AI companies such as Unbabel have the opportunity to adapt these existing, “pre-trained” models to our own specific use cases. This has already resulted in major technology advancements with significant impact on our customers. For example, we at Unbabel have built our own translation quality estimation models for customer service-specific use cases on top of XLM-R and other pre-trained language models, and in turn have contributed our own machine learning research back to the community.
One of my favorite things about machine learning is that most major research in recent years is open-sourced for the greater good of the community. So, an advancement for one company is an advancement for all. Even though many major corporations have become prominent players within AI research, the academic spirit of sharing our learnings carries on.
Advancing toward a common machine translation model
When looking at advancements like M2M-100, it’s important to keep in mind that there’s still quite a lot of room to grow when it comes to translation quality. Facebook reported that translation quality, as measured using BLEU scores, improved after removing English as an intermediate language, sometimes by up to 10 points. That’s a good outcome. However, many of the direct language pairs covered by the M2M-100 model still have BLEU scores in the teens or low 20s, which indicates a level of accuracy that is still insufficient for most practical commercial use cases. For context, understandable to reasonably good translations typically score in the 30 to 40 range, and quality improves as scores rise above that.
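To make the score ranges above more concrete, here is a minimal sketch of how a BLEU-style score is computed: clipped n-gram precision for n-gram lengths 1 through 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference, and no smoothing, and the names `ngram_counts` and `simple_bleu` are illustrative only. It is not the evaluation setup Facebook used; published BLEU scores are computed over whole test sets with standardized tokenization, so numbers from this toy version are not comparable to them.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU on a 0-100 scale (illustrative only).

    Assumptions: whitespace tokenization, one reference, no smoothing.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference,
        # so repeating a correct word does not inflate the score.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            # Without smoothing, one n-gram order with no matches zeroes the score.
            return 0.0
        log_precision_sum += math.log(matches / total)

    # The brevity penalty lowers the score of hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))

    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100.0 * brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat on a mat", reference))
```

Because the score rewards exact word-sequence overlap with a reference, a single changed word in a short sentence lowers it sharply, which is one reason BLEU is normally reported over a large test set rather than for individual sentences.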
Beyond BLEU scores, our team at Unbabel is currently working on a new neural framework for measuring the accuracy and quality of machine translations across many different languages. This open-source framework, called COMET, is designed to predict human judgements of machine translation quality at levels of accuracy unachievable by previous measurement methods. Many existing metrics, including BLEU, have limited value for measuring translation quality once systems reach high levels of accuracy, as many of the latest state-of-the-art neural systems now do.
All of this talk of accuracy doesn’t discount the fact that Facebook combined significant, large-scale engineering efforts with new model architectures and training methodologies to create this advancement. It’s nice to see the machine translation community get a step closer to a technological solution for machine translation based on a single multilingual model that supports translation between more than 100 languages. Such a “pre-trained” machine learning model could be the basis of many further-adapted, improved machine translation models for different language-pairs and use-cases. For example, in customer service, a universal, multilingual translation model could be extremely relevant in cases where there is no English-speaking agent.
Humans need to remain in the loop
One thing that I expect to continue into 2021 and beyond is the need for humans in the loop when delivering multilingual machine translation. Keeping humans in the loop is the best way to improve translation quality in enterprise use cases. In addition, humans provide the critical feedback that allows us to adapt and improve machine learning models like M2M-100, mT5, and others. In high-stakes situations, such as customer service, AI augments human agents and makes their jobs easier and more efficient. Humans, for the foreseeable future, will remain a critical part of the last mile of customer support.