Europe steps up its game to boost AI sovereignty with launch of ‘EuroLLM’

December 2, 2024

[Brussels, 02.12.24] Unbabel today announces the release of the EuroLLM-9B model – a large language model (LLM) created especially to support all 24 official EU languages. 

EuroLLM-9B was built from scratch and trained on extensive data on MareNostrum 5 at the Barcelona Supercomputing Center, leveraging advanced European HPC infrastructure for large-scale training. The model outperforms most global models of similar size and signals a win for Europe’s mission to accelerate the pace of homegrown AI innovation.

Europe is the only continent in the world to have a large public network of supercomputers, managed by the EuroHPC Joint Undertaking (EuroHPC JU). It has succeeded in holding its own in the global race for GPU access. In the latest Top500 ranking of the world’s fastest machines, two of its systems are in the Top 10 and more are within the top 200, a number set to increase soon with the upcoming launch of two new exascale computers.

The release of this highly advanced “EU-made” multilingual AI model marks a significant step in Europe’s drive to lead in multilingual AI innovation. It aims to set a new standard for multilingual LLMs with best-in-class, task-specific accuracy, efficiency, and speed.

EuroLLM is completely open, so anyone from individuals to startups, researchers, and beyond can build on top of it. This openness aims to serve as a flywheel for EU homegrown innovation by reducing barriers to entry for smaller enterprises, encouraging experimentation, and helping accelerate AI-led innovation in Europe.

While its initial focus is multilinguality, supporting all 24 official EU languages as well as 11 additional languages, the EuroLLM project has an ambitious roadmap, with new, larger models in the making and plans to expand its capabilities to encompass speech and vision.

EuroLLM was developed by a consortium of partners including Unbabel, Técnico, Instituto de Telecomunicações, University of Edinburgh, Paris-Saclay University, Aveni, Sorbonne University, Naver Labs, and University of Amsterdam, supported by Horizon Europe, the EU’s flagship research and development initiative. The initiative is also supported through a EuroHPC Extreme Scale Access call.

One of the major challenges in the development of large language models (LLMs) is the persistent English language bias. EuroLLM emerged from a pressing need to bridge gaps in language access across the EU and create a model tailored to the linguistic and cultural diversity of Europe.

Andre Martins, Unbabel’s VP of AI Research and Professor at Técnico, says: “We’re very proud to launch EuroLLM today. This model has come to life through our team working relentlessly to develop it at breakneck speed and ensuring the greatest quality through careful data filtering.

“We see this as an exciting first step toward closing the global innovation gap and strengthening Europe’s digital sovereignty, which is more important now than ever before. Our goal is that EuroLLM becomes a flywheel for innovation with the opportunity for anyone to use this EU homegrown LLM and develop on top of it. EuroLLM is also a success story for the European supercomputing network and how it can help advance AI: proof that amazing things can happen through open collaboration across multiple organizations. This model is fully open, so we actively encourage everyone to use it, improve it, and develop new technology on top of it.”

With major players like OpenAI, Google, and Meta dominating the AI landscape, reliance on their models poses significant risks, including limited openness and uncertain future availability. EuroLLM aims to counter this trend by offering an open and accessible alternative designed to serve Europe’s needs without compromising its independence.

By prioritizing transparency and accessibility, the EuroLLM Consortium has created a model that aligns with the EU’s core values, while ensuring that Europe retains control over its critical AI infrastructure. The ability to support all official EU languages and the potential of this model to drive inclusive innovation across the continent, from public services to private enterprise, were at the heart of its premise.

EuroLLM is available via Hugging Face today, where you can find more technical information and comparisons with other models on public benchmarks.

For more information or interview requests please contact farah.pasha.ext@unbabel.com

About the EuroLLM Consortium
The EuroLLM Consortium brings together some of Europe’s leading AI researchers from Unbabel, Técnico, Instituto de Telecomunicações, the University of Edinburgh, Paris-Saclay University, Aveni, Sorbonne University, Naver Labs, and the University of Amsterdam to create cutting-edge, ethical, and multilingual AI technologies. With a mission to strengthen Europe’s digital sovereignty, the consortium develops solutions that reflect the EU’s commitment to innovation, diversity, and independence.

About Unbabel’s Research Science Team
Made up of experts committed to advancing the frontiers of language technologies, the Unbabel Research team specializes in long-term multilingual NLP challenges, particularly in advancing Machine Translation (MT) and Quality Estimation (QE) technologies. Their work aims to improve language translation systems and enhance global communication and understanding. Currently, the team is focused on developing and refining multilingual large language models, taking us closer to Unbabel’s vision: creating a world without language barriers. The research team is also behind Unbabel’s latest product, Widn AI. Widn is a smart, straightforward Language AI solution built for businesses that want reliable, fast, and high-quality translations without the high cost.

About the Author

Content Team

Unbabel’s Content Team is responsible for showcasing Unbabel’s continuous growth and incredible pool of in-house experts. It delivers Unbabel’s unique brand across channels and produces accessible, compelling content on translation, localization, language, tech, CS, marketing, and more.
