A team of computer scientists has developed a more flexible kind of machine learning model, one that hinges on the ability to periodically “forget” information. The new method, devised to improve how AI systems understand language, represents a significant step forward in the field, according to Jea Kwon, an AI engineer at the Institute for Basic Science in South Korea.
Currently, language processing in AI relies heavily on artificial neural networks. Each network consists of mathematical functions known as neurons, which perform calculations and relay signals to one another across multiple layers. The web of connections isn’t very capable at the outset, but it grows more adept as the network adapts to its training data. Customarily, to construct a bilingual AI model, developers feed the system a huge volume of text from both languages so that it can adjust the connections between its neurons. Though effective, this method requires extensive computing resources and is difficult to adjust should new needs arise.
Mikel Artetxe, a coauthor of the new study and founder of the AI startup Reka, points out the challenge of adapting a multilanguage model: incorporating a new language is a time-consuming and resource-heavy task. To address this, Artetxe and his team proposed an alternative. They focused on a key part of the neural network called the embedding layer, where the foundational elements of words, known as tokens, are stored. The researchers trained a model in one language, wiped the learned tokens from the embedding layer, and then retrained the model on a new language, replenishing the embedding layer with tokens from that language.
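In practice, that wipe-and-retrain step amounts to re-initializing the embedding matrix while leaving the rest of the network’s weights in place. The sketch below is only an illustration of the idea, not the authors’ code: it assumes the Hugging Face transformers library, uses RoBERTa as the example model, treats the new-language tokenizer path as a placeholder, and freezing the deeper layers during retraining is an added assumption.

```python
# Illustrative sketch (not the authors' code): wipe a pretrained model's
# embedding layer before adapting it to a new language.
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Model whose body was pretrained on the source language (RoBERTa as an example).
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Tokenizer for the new language; this path is a hypothetical placeholder.
new_tokenizer = AutoTokenizer.from_pretrained("path/to/new-language-tokenizer")

# "Forget": resize the embedding table to the new vocabulary and re-initialize
# its weights, leaving the deeper transformer layers untouched.
model.resize_token_embeddings(len(new_tokenizer))
embeddings = model.get_input_embeddings()
embeddings.weight.data.normal_(mean=0.0, std=0.02)

# Optionally freeze the deeper layers so retraining only relearns the
# embeddings for the new language (an assumption of this sketch).
for param in model.base_model.encoder.parameters():
    param.requires_grad = False
```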
Even though the model now contained seemingly mismatched information, with an embedding layer from one language feeding into deeper layers trained on another, the retraining succeeded. This led the researchers, including lead author Yihong Chen, to conclude that while the embedding layer stores language-specific information, the deeper layers of the network capture more universal, abstract concepts that apply across human languages. Chen likens this to the way people use different words in different languages to describe the same underlying objects and ideas.
Taking the approach a step further, Chen proposed periodically resetting the embedding layer during the initial training itself. The goal is to accustom the whole model to continually learning and forgetting, which should streamline the process of adapting it to additional languages later. The team applied this revised training method to a widely used language model known as RoBERTa.
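Conceptually, this “forgetting” schedule can be folded into an ordinary pretraining loop. The following sketch is a rough illustration rather than the study’s training code: the reset interval, initialization scale, and optimizer are arbitrary choices, and the batches are assumed to already carry masked-language-modeling labels.

```python
# Rough illustration (not the study's training code) of periodically resetting
# the embedding layer during pretraining.
import torch

RESET_EVERY = 1000  # hypothetical number of training steps between resets

def reset_embeddings(model):
    """Re-initialize the input embeddings so the deeper layers must relearn to read them."""
    model.get_input_embeddings().weight.data.normal_(mean=0.0, std=0.02)

def pretrain_with_forgetting(model, dataloader, num_steps, lr=1e-4, device="cpu"):
    model.to(device)
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    step = 0
    for batch in dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # masked-language-modeling loss (labels assumed in batch)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        step += 1
        if step % RESET_EVERY == 0:
            reset_embeddings(model)  # the periodic "forgetting" step
        if step >= num_steps:
            break
```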
The team’s “forgetting” model was then pitted against a conventionally trained one. The forgetting model scored marginally lower at first (85.1 versus 86.1 on a measure of language accuracy), but when both models were adapted to other languages using substantially smaller datasets, it far outstripped its counterpart. Furthermore, when computational constraints were imposed during retraining, the forgetting model’s performance remained relatively stable, whereas the standard model’s dropped sharply.
Periodic forgetting appears to confer an advantage in language learning generally. According to Evgenii Nikishin, a researcher at Mila, a deep learning research institute, the process may make new knowledge easier to integrate, which suggests the model grasps language at a level deeper than mere word-to-word correspondence.
The concept mirrors how the human brain works, according to neuroscientist Benjamin Levy of the University of San Francisco. Humans are better at remembering the gist of an experience than its precise details, a trait that adaptive forgetting could let AI models mimic to achieve more flexible performance.
Artetxe envisions broad implications for the technology, which could make AI models accessible in a wider range of languages, including less common ones such as his native Basque. Chen anticipates wide-ranging benefits as well, especially for producing language models that can swiftly adapt to new domains, thereby helping to democratize AI innovation.
This promising work exemplifies the potential of marrying AI development with principles of human cognition, which could pave the way for more intelligent and adaptable language processing technologies. The continued exploration and understanding of these models not only provide practical applications but also offer deeper insights into the nature of language comprehension itself.