COMET (Crosslingual Optimized Metric for Evaluation of Translation) is a new neural framework for training multilingual machine translation (MT) evaluation models. It is designed to predict human judgments of MT quality, such as MQM scores. The resulting metric can be used to automate evaluation, making it faster and more efficient than relying on human review alone.


A metric that performs better

COMET takes advantage of recent breakthroughs in cross-lingual neural language modeling, resulting in multilingual, adaptable MT evaluation models. It takes a distinctive approach: it incorporates information from both the source input and the target-language reference translation to predict MT quality more accurately.
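To make the idea of using both the source and the reference more concrete, here is a minimal sketch of one way sentence embeddings for the three segments could be combined into a single feature vector before a quality score is predicted. It follows the kind of feature combination described for COMET's estimator model (concatenation, element-wise products, and absolute differences), but it is an illustration only: the function name `combine_embeddings` and the toy vectors are hypothetical, and this is not the released implementation.

```python
from typing import List

Vector = List[float]


def combine_embeddings(src: Vector, hyp: Vector, ref: Vector) -> Vector:
    """Build one feature vector from source, MT hypothesis, and reference embeddings.

    Hypothetical helper for illustration. In a real model the three vectors
    would come from a cross-lingual encoder; here they are plain lists.
    """
    if not (len(src) == len(hyp) == len(ref)):
        raise ValueError("all embeddings must have the same dimension")

    # Element-wise products highlight dimensions where the hypothesis agrees
    # with the source or with the reference.
    prod_src = [h * s for h, s in zip(hyp, src)]
    prod_ref = [h * r for h, r in zip(hyp, ref)]

    # Absolute differences highlight dimensions where they diverge.
    diff_src = [abs(h - s) for h, s in zip(hyp, src)]
    diff_ref = [abs(h - r) for h, r in zip(hyp, ref)]

    # Concatenate everything; a small feed-forward regressor would then map
    # this vector to a predicted quality score.
    return hyp + ref + prod_src + prod_ref + diff_src + diff_ref


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings", invented for this example.
    features = combine_embeddings(src=[1.0, 2.0], hyp=[3.0, 4.0], ref=[5.0, 6.0])
    print(len(features), features)
```

The point of the sketch is that the hypothesis is compared against the source and the reference at the same time, so the model is not limited to measuring overlap with a single reference translation.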

High-quality assurance

At Unbabel, we process large volumes of translations using highly specialized models built for customer service domains. Our goal is to provide our customers with the highest-quality translation possible. Using COMET allows our engineers to make well-informed modeling decisions without having to wait for human evaluation.

Cost reduction

Having humans verify each of our engine deployments with MQM annotation is not scalable. With COMET, we created a metric that correlates well with human quality judgments, so most model deployment decisions can be made based on COMET scores alone.

Unbabel’s MT development team already uses COMET as its primary metric for evaluating the quality and accuracy of our MT models. We are releasing this “ready to use” trained COMET model as open source to benefit the wider MT R&D community. You can find the code here.