Unbabel’s Head of Research, André Martins, has been awarded a prestigious European Research Council (ERC) Starting Grant for his proposed 5-year research project DeepSPIN (Deep Structured Prediction in Natural Language Processing).
With a multitude of new language interfaces like digital assistants, messenger apps, and customer service bots on the rise, André has rightly stated that these emerging technologies still have a long way to go:
Despite the many breakthroughs in Natural Language Processing, machine translation and speech recognition, deep neural networks are currently missing the key structural mechanisms for solving complex real-world tasks.
For example, current machine translation systems output one word at a time, which tends to propagate errors. And even when they generate fluent output, they still miss information from the source text too often. On top of this, current neural networks are not capable of complex reasoning, their training requires too much supervision, and their decisions are not interpretable by humans.
If we want AI and humans to work together, this needs to be fixed.
We need a new generation of machine learning models, methods and algorithms that consider the structure of natural language to enable deep reasoning about the world around us.
With €1.44 million in funding, André hopes to recruit three post-doctoral researchers and three PhD students to push the state of the art in deep learning, machine translation, and translation quality estimation over a period of five years.
André sat down with us to give some more detail on the project:
Q&A with Unbabel Head of Research, André Martins
You said that “Deep learning is revolutionizing the field of Natural Language Processing”. How so? Can you explain what deep learning is?
Sure! Deep learning is a suite of statistical learning methods that help machines learn from data and improve their performance as they see more of it. What distinguishes deep learning from other methods is its ability to learn internal representations.
Neural networks are the most popular example: they consist of multiple units (called artificial neurons) connected together in several layers; different layers capture different representation levels (from words to syntactic phrases to semantic concepts).
In the last 2-3 years, these models have achieved new breakthroughs in natural language processing tasks such as machine translation, speech recognition, and question answering.
In recent years we’ve seen the rise of digital assistants like Amazon’s Alexa or Apple’s Siri, customer service bots, and messenger apps. What do you think of the evolution of these technologies? How far are we from these tools being human-like? Why?
These technologies have evolved a lot in the last few years, to the point where they are now finally becoming useful. However, they’re still very, very far from being “human-like.”
They can often sound fluent, and they can automate and customise a lot of daily tasks, but it only takes a couple of rounds of interaction to realise that these tools are not capable of any deep reasoning to solve more intricate tasks. For the time being, you need to combine AI and humans for that.
You say that this research project will focus on bringing together deep learning and structured prediction to solve challenging tasks in natural language processing, including machine translation, quality estimation, and syntactic parsing. Can you explain what you mean by this?
Language is full of structure: we form words from morphemes, which we then combine into phrases, which in turn form sentences, and so on.
Understanding this syntactic structure is key to understanding the meaning of a sentence, as in “As she was eating the pizza arrived.” Since “eating the pizza” is such a common phrase, we can easily be trapped into a wrong interpretation. Despite the latest advances in deep learning for natural language processing, the existing methods still fall short of dealing with and understanding this kind of structure.
For example, machine translation systems typically generate words left to right, greedily, which makes them prone to this sort of “garden path” trap. In this project, we will develop a new generation of deep learning methods specifically designed to discover and deal with structure.
Moreover, we want these systems to be understandable by humans: when they estimate that a translation is low quality, we want them to provide a rationale for that decision, pinpointing the words that are translated incorrectly. This will make it easier for humans and AI to work collaboratively.
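To make the point about greedy, left-to-right generation concrete, here is a minimal sketch in Python. It is not taken from the DeepSPIN project or from any real translation system: the two-word "model", its probability tables (FIRST, SECOND), and the function names are all hypothetical and invented for illustration. It only shows how committing to the locally best first word can lock a system out of the best overall sequence.

```python
# Hypothetical toy model: every number below is made up for illustration.
# FIRST gives P(first word); SECOND gives P(second word | first word).
FIRST = {"the": 0.6, "a": 0.4}
SECOND = {
    "the": {"cat": 0.4, "dog": 0.35, "bird": 0.25},
    "a": {"cat": 0.9, "dog": 0.05, "bird": 0.05},
}


def greedy_decode():
    """Pick the most probable word at each step, never revisiting a choice."""
    first = max(FIRST, key=FIRST.get)
    second = max(SECOND[first], key=SECOND[first].get)
    return (first, second), FIRST[first] * SECOND[first][second]


def exhaustive_decode():
    """Score every two-word sequence and keep the most probable one."""
    best_seq, best_prob = None, 0.0
    for first, p1 in FIRST.items():
        for second, p2 in SECOND[first].items():
            if p1 * p2 > best_prob:
                best_seq, best_prob = (first, second), p1 * p2
    return best_seq, best_prob


if __name__ == "__main__":
    print("greedy:    ", greedy_decode())
    print("exhaustive:", exhaustive_decode())
```

In this toy setup the greedy decoder commits to "the" because it is the likelier first word, and ends up with a sequence of probability about 0.24, while scoring whole sequences finds "a cat" at about 0.36. Exhaustive search is only feasible here because the example is tiny; real systems cannot enumerate every output, which is part of why structure-aware methods are an open research problem.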
Can you give examples of the most challenging tasks in natural language processing?
Every task that requires language understanding (a stronger term than processing) is extremely challenging. These days we can do increasingly well in speech recognition. But how do you go from there to make machines understand humans and help them solve problems? For this we need machine translation (to eliminate language barriers), question answering (to assist humans in finding information), and goal-oriented dialogue systems (to work interactively with humans to solve tasks like booking a flight). These are really hard tasks.
What do you hope to achieve in 2023 when this study comes to an end?
I hope to have a suite of Deep Learning 2.0 methods that (i) can handle and identify structure in language, (ii) are interpretable to humans, and (iii) are data-efficient, including for low-resource languages. With these ingredients we really can take a quantum leap towards solving multilingual communication!
If you’re interested in applying to be a part of the project, check the full project description here.