How Unbabel’s language pipeline will translate everything to human quality
In a $38 billion-a-year market for translation and localization services, the largest player is a military contractor that makes $730 million a year. The rest of the Top 10 make between $80 and $430 million a year.
It’s a massively fragmented market, with a long tail of players who all operate very similar business models — marketplaces which process client briefs on one side, and large pools of professional translators on the other who bid for the work based on their skills, experience and rates. Once a bid is won, each translator will then set about applying their craft to one piece of content at a time. Need more languages? Hire more translators.
It’s how things have always worked. But recent history has shown that what isn’t scalable isn’t sustainable, and that once you do find a new way of doing things, huge markets of untapped potential are there for the grabbing.
For Unbabel, translation as it currently exists is part of the problem.
How do you translate everything?
What if you want to translate all of the world’s information into every language and do it near instantaneously? Not just legal docs, terms and conditions and product catalogues, but every email, every chat conversation, every subtitle, every piece of content imaginable.
To meet the challenge, you need to fundamentally rethink how translation works. Not as a “throw more people at it” problem, but as a software problem where a process can be defined and continually improved to produce a better quality output.
Some technology companies already see the world this way, but overlook what most businesses and organisations need: a way to bridge the huge gulf between the most advanced machine translation and translation that sounds like it was done “the old fashioned way”, i.e. by an actual human being.
To be taken seriously, modern businesses need a solution which can understand the surrounding context of tone and style and subtext and spot the errors skimmed over in the rush to fully automate.
Shoemakers vs. Producing Shoes
To understand what Unbabel is doing to translation, think of shoemaking in the 18th Century.
For all of human history, shoes were made one at a time by hand. On such a personal scale, each pair was crafted to its owner at great expense of time and energy, and was consequently a service taken up by only a small percentage of a population.
But by the mid-18th Century, shoemaking started to become commercialised on a scale never before seen. A cottage industry exploded to heel the people of the industrialising world, with thousands of shoemakers pushing themselves around the clock to keep up with demand (much like the long tail of translation agencies mentioned earlier).
The Napoleonic Wars provided a push for mechanisation of the production process, with famed English engineer Marc Brunel developing machinery for the mass-production of boots for soldiers in the British Army. A visitor to his factory in Battersea wrote the following:
“Every step in it is effected by the most elegant and precise machinery… all the details are performed by the ingenious application of the mechanic powers; and all the parts are characterised by precision, uniformity, and accuracy. As each man performs but one step in the process, which implies no knowledge of what is done by those who go before or follow him, so the persons employed are not shoemakers, but wounded soldiers, who are able to learn their respective duties in a few hours.”
If you want to remove language barriers entirely, if you want to enable everyone to understand and be understood in any language, in any medium, then you need to abstract the problem of translation to a much higher level than “more people”.
You need to divide it into a series of steps with precise, uniform and accurate work by machines, guided by human hands at key intervals, and not necessarily by the experts who once did this work entirely by themselves.
A bit like Brunel’s shoe factory.
Unbabel’s language pipeline
What guarantees quality at Unbabel is not the quality of the individual translators themselves, which can vary widely even at a professional level (translators being, er, human, they are naturally prone to error), but the quality of the pipeline that produces the work in precise, uniform and accurate steps.
Human work is still required, but at non-critical junctures where people correct and edit machine work, rather than being tasked with the entire job. This reduces dependency on any individual translator, vastly increases the value of each human correction to overall quality, and allows for a far higher throughput of content.
At a high level, Unbabel ingests text content in a source language at one end, and dispatches it to customers in one or all of 27 other target languages.
Looking closer, there are a number of other steps that happen between A and B.
An order is a piece of text that needs translating. It could be a customer service email in a platform like Salesforce, Zendesk or Freshdesk, or one of millions of product descriptions on a global ecommerce platform, or subtitles for hundreds of hours of video footage.
Each content type will have its own custom flow in the pipeline, placing different priority weightings on aspects like quality and speed, but the overall process is much the same for all text.
At this stage, Unbabel analyses the source text, detecting and determining a range of factors that will influence its journey through the pipeline.
First, a number of actions are taken based on the Unbabel customer the text comes from. Custom glossaries and style guides that are part of the onboarding process are automatically tagged to orders, and sensitive information like credit card numbers is hashed and anonymised.
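As a rough illustration of the anonymisation step, here is a minimal Python sketch. It is not Unbabel's implementation: the card-number pattern, the salt, the `[CARD_…]` placeholder format and the function name `anonymise_cards` are all assumptions made for the example.

```python
import hashlib
import re

# Assumed pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def anonymise_cards(text: str, salt: str = "demo-salt") -> tuple[str, dict[str, str]]:
    """Replace card-like numbers with hashed placeholders.

    Returns the masked text and a placeholder -> original mapping, which a
    real system would keep in secure storage (hypothetical design).
    """
    vault: dict[str, str] = {}

    def _mask(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group(0))
        token = hashlib.sha256((salt + digits).encode("utf-8")).hexdigest()[:12]
        placeholder = f"[CARD_{token}]"
        vault[placeholder] = match.group(0)
        return placeholder

    return CARD_PATTERN.sub(_mask, text), vault
```

Because the placeholder is derived from a hash of the digits, the same number always maps to the same placeholder, so the masked text can pass through machine and human steps without exposing the original value.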
Further analyses are then run on the source text, detecting hard-to-translate elements like locations, names and addresses, and estimating the overall difficulty of the text based on the vocabulary used, the lengths of sentences and other grammatical patterns.
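To make the idea of a difficulty estimate concrete, here is a toy Python sketch that combines average sentence length with the share of uncommon words. The word list, the 30-word cap and the equal weighting are illustrative assumptions, not details of Unbabel's model.

```python
import re

# Hypothetical list of very common words; a real system would use corpus frequencies.
COMMON_WORDS = {
    "the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "for",
    "my", "your", "it", "i", "you", "we", "not", "has", "have", "order", "please",
}


def estimate_difficulty(text: str) -> float:
    """Return a rough difficulty score between 0 (easy) and 1 (hard)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return 0.0
    avg_sentence_len = len(words) / len(sentences)
    rare_ratio = sum(w not in COMMON_WORDS for w in words) / len(words)
    # Assumed cap: an average of 30 or more words per sentence counts as maximally long.
    length_score = min(avg_sentence_len / 30.0, 1.0)
    # Assumed weighting: sentence length and vocabulary contribute equally.
    return round(0.5 * length_score + 0.5 * rare_ratio, 3)
```

A short support message made of everyday words scores low, while a single long sentence of legal vocabulary scores high, which is the kind of signal that could influence how a text travels through the pipeline.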
A model is built uniting this data with other insights on the document’s tone (formal vs. informal), as well as detecting its topic, which allows for smart routing of the content to certain editors by their stated interests (travel, sports, medical, entertainment, etc).
Unbabel’s Adapted Machine Translation
Once that preparation has taken place, the first work of translation is done, entirely by machine. To start, Unbabel checks its Translation Memory — a huge, dynamic store of data that ensures that if a full sentence has already been done for the same client or same domain, it is retrieved and reused, allowing for a potential improvement in speed of delivery and translation consistency (although it can still be changed later on by human editors if the context is incorrect).
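The Translation Memory lookup described above can be sketched as an exact-match store that prefers a client's own past translations and falls back to the wider domain. The class, its method names and the normalisation rule below are hypothetical and only illustrate the idea.

```python
from typing import Optional


class TranslationMemory:
    """Toy exact-match memory keyed by client first, then by domain."""

    def __init__(self) -> None:
        self._by_client: dict = {}
        self._by_domain: dict = {}

    @staticmethod
    def _normalise(sentence: str) -> str:
        # Assumed normalisation: case-insensitive, whitespace collapsed.
        return " ".join(sentence.lower().split())

    def store(self, source: str, target: str, lang: str, client: str, domain: str) -> None:
        key = (self._normalise(source), lang)
        self._by_client[(client, *key)] = target
        self._by_domain[(domain, *key)] = target

    def lookup(self, source: str, lang: str, client: str, domain: str) -> Optional[str]:
        key = (self._normalise(source), lang)
        hit = self._by_client.get((client, *key))
        if hit is not None:
            return hit  # the same client has translated this sentence before
        return self._by_domain.get((domain, *key))  # otherwise reuse from the domain, if any
```

A hit skips machine translation for that sentence; a miss (`None`) sends it on to the next step. Either way, human editors can still change the result later if the context is wrong.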
The next step is the Machine Translation Router, which chooses the best specialised MT engine based on content, domain and customer (a customer email and a product description for a handmade luxury watch have very different requirements).
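One simple way to picture the routing decision is a scoring function that prefers the most specific engine available. The `Engine` structure, the scoring weights and the engine names in the usage note are assumptions for illustration; the article does not describe how the real router is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    domains: frozenset
    customers: frozenset = frozenset()


def route(engines, domain: str, customer: str, default: Engine) -> Engine:
    """Pick the most specific engine for an order.

    Assumed rule: a domain match scores 1, a customer match within that
    domain adds 2, and the default engine is used when nothing matches.
    """
    best, best_score = default, 0
    for engine in engines:
        score = 0
        if domain in engine.domains:
            score += 1
            if customer in engine.customers:
                score += 2
        if score > best_score:
            best, best_score = engine, score
    return best
```

For example, with a generic default, a "support-chat" engine for customer service content and a "luxury-retail" engine tuned for one watchmaker's ecommerce catalogue, that watchmaker's product descriptions would go to "luxury-retail", a support email to "support-chat", and anything unmatched to the default.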
The machine-translated content then goes to the Automatic Post-Editor, where Unbabel is able to improve these translations automatically, by learning from what the human network has already done in the past. This produces a new version to be evaluated for quality (using Unbabel’s award-winning Quality Estimation system) and distributed to the right humans in Unbabel’s community.
Unbabel has a global community of 50,000 people who are tasked with reviewing the outputs of this adapted machine translation. But how to know who gets what task?
There are multiple criteria for editor selection, but the main ones come down to: who is available, how highly rated they are for certain types of content, and how urgent tasks needing completion are.
Task priority is judged by customer SLAs and other factors, and tasks are sorted into a green and a red queue. All tasks start in the green queue, and in an ideal world there is no red queue, but it is there for backup and redundancy, ensuring that deadlines are met.
Additionally, we have been running tests that show that editors who are paired with content they have expressed an interest in perform better on those tasks.
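The selection criteria and the two queues could be sketched as follows. This is a simplified reading, not the actual scheduler: the ten-minute escalation threshold, the data structures and the greedy matching are all assumptions, and the article does not say what moves a task from green to red.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    content_type: str
    seconds_to_deadline: float


@dataclass
class Editor:
    name: str
    available: bool
    ratings: dict = field(default_factory=dict)  # content_type -> rating (assumed 0 to 5)


RED_THRESHOLD_SECONDS = 600  # assumption: tasks this close to their deadline are escalated


def split_queues(tasks):
    """Return (green, red) queues, each ordered by most urgent first."""
    green = [t for t in tasks if t.seconds_to_deadline > RED_THRESHOLD_SECONDS]
    red = [t for t in tasks if t.seconds_to_deadline <= RED_THRESHOLD_SECONDS]
    by_urgency = lambda t: t.seconds_to_deadline
    return sorted(green, key=by_urgency), sorted(red, key=by_urgency)


def assign(tasks, editors):
    """Greedily give each task to the best-rated available editor, red queue first."""
    green, red = split_queues(tasks)
    free = [e for e in editors if e.available]
    assignments = {}
    for task in red + green:
        if not free:
            break  # nobody left; remaining tasks wait for the next round
        best = max(free, key=lambda e: e.ratings.get(task.content_type, 0.0))
        assignments[task.task_id] = best.name
        free.remove(best)
    return assignments
```

The sketch covers the three criteria named above (availability, rating for the content type, urgency) and nothing more; a production system would presumably weigh many other factors.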
Once the right people have got the right tasks, Unbabel then sets about helping them do the best job possible, in as little time as possible.
Unbabel’s Smartcheck is like a supercharged version of Grammar Correction found in common document editors. It checks a range of potential errors with helpful suggestions for one-tap corrections, including spelling, tone, lexical consistency (subject and verb agreeing; pronouns matching; gender, etc) and more specific rules related to the customer’s stated requirements.
It’s inefficient to make each editor read each customer’s style guides, so Smartcheck automatically overlays tips throughout the text, making it quick and easy to correct non-grammatical errors like subjective vs objective tonalities or writing numbers as words rather than digits.
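A single customer rule of this kind might look like the sketch below, which flags small numbers written as words and offers a one-tap replacement. The rule itself (this hypothetical customer prefers digits), the suggestion format and the function names are assumptions; Smartcheck's real rule engine is not described in this article.

```python
import re

# Hypothetical style-guide rule: this customer wants small numbers written as digits.
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}
NUMBER_PATTERN = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def check_numbers(text: str) -> list:
    """Return one suggestion per spelled-out number, with its position in the text."""
    return [
        {
            "start": m.start(),
            "end": m.end(),
            "found": m.group(0),
            "suggest": NUMBER_WORDS[m.group(0).lower()],
            "rule": "numbers-as-digits",
        }
        for m in NUMBER_PATTERN.finditer(text)
    ]


def apply_suggestions(text: str, suggestions: list) -> str:
    """Apply accepted suggestions from right to left so earlier offsets stay valid."""
    for s in sorted(suggestions, key=lambda s: s["start"], reverse=True):
        text = text[: s["start"]] + s["suggest"] + text[s["end"]:]
    return text
```

Keeping each suggestion as data (position, found text, proposed fix, rule name) is what would let an editor interface overlay it as a tip and apply it with a single tap.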
Unbabel’s Self-Learning Network
The magic of this whole process is that the more Unbabel translates, the better the outputs of the system become. Machine Translation engines can be retrained, Translation Memories can be enlarged, and the Automatic Post Editing improves with every pass of new text.
The more you throw at it, the better it gets.
International businesses like Pinterest, Skyscanner, Under Armour, Trello and Oculus VR trust Unbabel’s enterprise platform to open up and grow new markets.