How do we keep our customers’ data safe at Unbabel?

Every day at Unbabel we handle tens of thousands of translation requests across all kinds of content. Whether a request arrives directly via our API and customer order forms or through one of our platform integrations like Salesforce and Zendesk, and whether it is business-critical or a low-priority chat message, we must guarantee that all of that data stays private and safe.

To do so, we have built several layers of protection which are continuously monitored and improved upon.

Unbabel’s Anonymisation Pipeline

For the uninitiated, Unbabel’s translation pipeline combines domain-adapted machine translation with a global community of bilinguals who post-edit the outputs to human quality.

It’s incredible to see these flows of data being translated, disassembled, distributed, edited, reassembled and delivered super fast — AI and humans working together in a real symbiotic relationship.

However, this does bring additional challenges in terms of protecting our customers’ privacy, which is where our aptly named Anonymisation Pipeline comes in.

Unbabel does not make use of sensitive customer data of any kind; it is simply not required. To reduce any privacy-related risk in our human network’s post-editing work, we automatically remove sensitive, personally identifiable data from content before it is dispatched to editors.

Credit card numbers, Social Security numbers, URLs, dates and email addresses are all stripped out and replaced by an anonymised placeholder that names the type of content it hides. This lets our editors continue working without losing context, and ensures that private data is never put at risk.
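To make the idea concrete, here is a minimal sketch in Python of replacing sensitive spans with typed placeholders. It is an illustration only, not Unbabel’s actual implementation: the function name `anonymise`, the placeholder labels and the regular expressions are all assumptions, and real-world detection needs far more robust rules than these.

```python
import re

# Hypothetical patterns for illustration; production detection is much stricter.
# Order matters: emails and URLs are replaced before the numeric patterns run.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("URL", re.compile(r"https?://\S+")),
    ("CREDIT_CARD", re.compile(r"\b(?:\d[ -]?){12,15}\d\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
]


def anonymise(text):
    """Replace each sensitive span with a placeholder naming its type."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


message = (
    "Contact ana@example.com about card 4111 1111 1111 1111, "
    "SSN 123-45-6789, due 2018-03-14, see https://example.com/order/1 today"
)
print(anonymise(message))
```

Because each placeholder keeps the type of the hidden value, an editor still sees that the sentence mentions an email address or a date, which is the context-preserving behaviour described above.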

Stripping Training Data

We take the same precautions in internal processes that are not visible outside the company, such as feeding data into our AI systems and machine learning engines, which require continuous improvement and training.

Editor Vetting

When a new editor joins the Unbabel community, they are required to sign a Non-Disclosure Agreement. From then on they are continuously monitored and evaluated by internal tools and staff to make sure that every job they work on meets our quality standards and our privacy and security expectations.

Access

Unbabel’s products and services are all encrypted and protected by firewalls, and we keep strengthening our defences against intrusion, from physical to digital, as well as enforcing company-wide Two-Factor Authentication on communication, knowledge and administration systems.

Access to data storage systems is highly restricted, with encrypted application-level access and unique, non-historical credentials used to guarantee security isolation and prevent threat propagation.

All access credentials are segregated by work-group area (for instance, the Sales Department does not have access to Engineering systems), provided on a need-to-know basis, and routinely audited for internal security purposes and to ensure compliance with best practices, procedures and performance.
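As a rough illustration of segregation by work group, the sketch below uses a deny-by-default lookup in Python. The group names, system names and the `can_access` helper are hypothetical and do not describe Unbabel’s real access-control setup.

```python
# Hypothetical mapping from work group to the systems it may reach.
GROUP_SYSTEMS = {
    "sales": {"crm"},
    "engineering": {"source_control", "ci"},
}


def can_access(group, system):
    """Deny by default: allow only systems explicitly granted to the group."""
    return system in GROUP_SYSTEMS.get(group, set())


print(can_access("sales", "source_control"))
print(can_access("engineering", "source_control"))
```

Unknown groups and unlisted systems are refused automatically, which mirrors the need-to-know principle: access exists only where it has been granted explicitly.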

Compliance

On a regular basis, third-party vendors audit our applications and infrastructure to make sure new implementations and products are aligned with the required security and privacy best practices.

On a lower level, our Continuous Integration pipeline adds a further security layer through static code analysis: potentially unsafe code triggers an alert, which lets us address the risk before the code ships.
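For readers unfamiliar with static analysis, the following Python sketch shows the general principle: the code is parsed, not executed, and suspicious constructs are reported. The rule set (`eval` and `exec`) and the function `find_unsafe_calls` are assumptions chosen for illustration; the tools and rules Unbabel actually runs in CI are not specified in this post.

```python
import ast

# Hypothetical rule set: built-ins that execute arbitrary code.
UNSAFE_CALLS = {"eval", "exec"}


def find_unsafe_calls(source):
    """Return sorted (line number, function name) pairs for flagged calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in UNSAFE_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


snippet = "x = 1\ny = eval(input())\nprint(y)\n"
for lineno, name in find_unsafe_calls(snippet):
    print(f"line {lineno}: call to {name}()")
```

A CI job could fail the build whenever the list of findings is non-empty, so the alert is raised before the change is merged.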

Unbabel complies with EU regulation on the processing of personal data and on the free movement of such data, and is certified under the EU-US Privacy Shield.

Always evolving

It’s easy to think that compliance with privacy and security standards is a one-off task, but it has proven to be quite the opposite. Pushing the boundaries of what AI+Human can do also means that every day there’s a new project, integration, feature or research breakthrough that makes us reassess our policies and standards.

Going forward, GDPR is an important standard for all information technology companies. Although Unbabel already implements most of the regulatory mechanisms it contains, it reminds us that data is now one of the world’s most valuable resources. More than protected, it needs to be respected.

The post How do we keep our customers’ data safe at Unbabel? appeared first on Unbabel.
