The missing kiwi of machine translation: open sourced quality estimation

March 21, 2019

It was 1954, and the future was clear: human translators would be obsolete in a few years’ time.

At least that’s what the researchers at IBM proudly declared at the first public demonstration of their machine translation system.

Now we know how far from the truth that statement was and continues to be. But even early in the history of machine translation, during the post-war years, it wasn’t all unbridled optimism.

Indeed, in 1947, American scientist and MT pioneer Warren Weaver said:

One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’

A few years later, Weaver followed up with this: “No reasonable person thinks that a machine translation can ever achieve elegance and style.”

The thing is, translation requires more than a decoder ring. If you’ve ever tried to translate poetry or literature with an MT service, the output might look a lot like you’re decoding secret messages.

And this is why humans play a key role. Our unique knowledge about the world is crucial for translation. We humans understand the context of a conversation, the cultural background, the hidden meanings. Machines don’t yet have that kind of knowledge. As our CEO, Vasco Pedro, puts it: “Machine translation systems are trained to read parallel sentences, which is a bit like teaching a parrot to talk; the parrot may be able to do it, but they are never going to truly understand what they are saying.”

This explains why the extraordinary developments in MT in recent years have not yet reached the level where people are confident enough to allow it to proceed unchecked by humans. This is where quality estimation (QE) comes into play.

At Unbabel, we have been pushing the state of the art in QE to help deliver fast and accurate translations, at scale, to many of our customers, including companies like Microsoft, Skyscanner, and Pinterest.

Because QE plays such an important role in our business, I decided to write this article to explain why I believe QE is really the missing link in translation, and how OpenKiwi (our brand new open-source framework for QE) will help advance human-powered MT.

What is Translation Quality Estimation?

Before we go deeper into what QE can do to improve automated translations, it’s important to understand exactly what we’re talking about.

Quality Estimation is what we use to evaluate a translation system’s quality without access to reference translations. In other words, its goal is to figure out how good or bad a translation is, without human intervention.

It can be used for many different purposes:

  • To inform an end user about the reliability of translated content;
  • To decide if a translation is ready for publishing or if it requires human post-editing;
  • To highlight the words that need to be changed.
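As a hypothetical illustration (not Unbabel's actual system), a word-level QE model predicts, for each word of the MT output, the probability that it is wrong; from those per-word predictions you can tag words as OK or BAD and derive a sentence-level score. Here the probabilities are hard-coded, since a real model would compute them from the source/MT pair:

```python
# Hypothetical QE outputs: per-word BAD probabilities for one MT sentence.
# A real QE model predicts these from the source text and its machine
# translation; here they are hard-coded to show how the outputs are used.

def word_tags(bad_probs, threshold=0.5):
    """Tag each target word OK or BAD from its predicted BAD probability."""
    return ["BAD" if p > threshold else "OK" for p in bad_probs]

def sentence_score(bad_probs):
    """Sentence-level quality: average probability that a word is OK."""
    return sum(1.0 - p for p in bad_probs) / len(bad_probs)

mt_output = ["the", "cat", "sat", "on", "the", "tapis"]
bad_probs = [0.05, 0.10, 0.08, 0.07, 0.05, 0.92]  # "tapis" left untranslated

print(word_tags(bad_probs))                 # ['OK', 'OK', 'OK', 'OK', 'OK', 'BAD']
print(round(sentence_score(bad_probs), 2))  # 0.79
```

The word-level tags serve the third use case (highlighting words to change), while the sentence-level score serves the first two.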

At Unbabel, we use QE to guarantee that, if a translation is not good enough to be delivered, it gets reviewed by our bilingual community of over 100,000 translators. They can quickly correct the mistakes and provide high-quality translations to our customers. The more we translate, the more the system learns, and the fewer mistakes it will make in the future.

Therefore, good QE eases the burden on human translators. With an automated system that highlights mistakes before the human process even begins, the translators can zero in on the areas of a piece of content that most likely need attention.
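The deliver-or-review decision described above can be sketched as a simple threshold rule (the threshold value and function names are illustrative, not Unbabel's actual pipeline):

```python
def route(translation, quality_score, threshold=0.8):
    """Deliver high-confidence translations; send the rest to human editors.

    quality_score is a sentence-level QE score in [0, 1], where higher
    means the translation is estimated to be better.
    """
    if quality_score >= threshold:
        return ("deliver", translation)
    return ("human_review", translation)

print(route("Hallo Welt", 0.93))  # ('deliver', 'Hallo Welt')
print(route("Hallo Wlet", 0.41))  # ('human_review', 'Hallo Wlet')
```

In a real pipeline the threshold would be tuned per language pair and content type, trading off delivery speed against the cost of an occasional bad translation slipping through.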

Over the last few years, we’ve witnessed the remarkable things technology and community can accomplish together. We’d like to embed community even deeper in our technology and processes, with OpenKiwi, a collaborative framework for Quality Estimation.

Open Source Framework for Quality Estimation

OpenKiwi: an open-sourced framework for the Machine Translation community

At Unbabel, our machine translation models are running in production systems for 14 language pairs, with coverage and performance improving over time, thanks to the increasing amount of data produced by our human translators on a daily basis. This combination of AI and humans is what makes our translation pipeline better and faster.

However, our award-winning Quality Estimation systems weren’t available to external researchers, and this imposed a limit on what we could achieve together. At Unbabel, we strongly believe in reproducible and collaborative research. We want all of the AI research community to benefit from our findings, and we want to be able to build, thrive, and experiment together.

This inspired us to build OpenKiwi.

OpenKiwi is an open-source framework that implements the best Quality Estimation systems, making it really easy to experiment and iterate with these models under the same framework, as well as to develop new models. By combining these models we can achieve top results on word-level Quality Estimation.
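As a hypothetical sketch of the model-combination idea (the function names here are ours for illustration, not OpenKiwi's API), one simple way to ensemble word-level QE systems is to average the per-word BAD probabilities from several models before thresholding:

```python
def ensemble_bad_probs(per_model_probs):
    """Average per-word BAD probabilities across several QE models.

    per_model_probs: list of lists, one list of probabilities per model,
    all scoring the same MT output (same number of words).
    """
    n_models = len(per_model_probs)
    return [sum(word) / n_models for word in zip(*per_model_probs)]

def tags(bad_probs, threshold=0.5):
    return ["BAD" if p > threshold else "OK" for p in bad_probs]

# Three hypothetical QE models scoring the same 4-word MT output:
model_a = [0.2, 0.7, 0.1, 0.9]
model_b = [0.1, 0.6, 0.2, 0.8]
model_c = [0.3, 0.8, 0.1, 0.7]

avg = ensemble_bad_probs([model_a, model_b, model_c])
print(tags(avg))  # ['OK', 'BAD', 'OK', 'BAD']
```

Averaging tends to smooth out the idiosyncratic errors of any single model, which is one reason ensembles of QE systems usually outperform their individual members.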

The power of open-sourcing

Now, a lot of people may wonder what made us build an open-sourced framework, instead of keeping our QE technology to ourselves. If there’s anything we believe in, it’s collaboration.

Not long ago, the “barrier to entry” for even basic software projects was extremely high. It could take months to reproduce the results of one research paper, simply because the underlying code used in the project wasn’t readily available.

Open-sourcing software brings a set of benefits that far outstrips the perceived drawbacks. Allowing others to access what we’ve built not only brings a bigger community of experts to work with us, but also lets us make further and faster improvements together. In open-source solutions, even the smallest issues are noticed, flagged, and fixed faster.

Look at machine translation itself. As a field, MT has benefited tremendously from open-source software such as Moses, OpenNMT, and Marian, among many others. These projects managed to aggregate a large community of contributors who are advancing the state-of-the-art in machine translation, coming from both industry and academia. We contribute to some of these projects, too. This is great!

However, nothing equivalent existed in Quality Estimation. The few existing open-source initiatives were used only by a handful of groups in academia and never gained the same traction. This is the gap we are filling now with OpenKiwi.

By making OpenKiwi available to the community, I’m confident we’ll all contribute to a bigger picture and shape the future of translation.

For all the breakthroughs, machine translation remains highly mechanical — at least for now. But coupled with thoughtfully deployed data and human editors who know their language inside and out, machine translation is poised to increase access, improve consumer-business relationships, and create understanding the world over.


About the Author

André Martins

André Martins is the VP of AI Research at Unbabel, an Associate Professor at IST, and a researcher at IT. He received a dual-degree PhD (2012) in Language Technologies from Carnegie Mellon University and IST. His PhD thesis received an Honorable Mention in CMU's SCS Dissertation Award and the Portuguese IBM Scientific Prize. His research interests include natural language processing (NLP), machine learning, structured prediction, and sparse modeling, in particular the use of sparse attention mechanisms to induce interpretability in deep learning systems. He co-founded and co-organizes the Lisbon Machine Learning School (LxMLS 2011--2019). He received a best paper award at ACL 2009 and a best system demonstration paper award at ACL 2019. He recently won an ERC Starting Grant for his DeepSPIN project (2018-23), whose goal is to develop new deep learning models and algorithms for structured prediction, for NLP applications.