“Bigger works better” has been the attitude of those designing AI language models in recent years. Ever since OpenAI showed what its 175-billion-parameter GPT-3 could do, a host of other tech companies have jumped on the bandwagon, developing their own large language models and achieving similar boosts in performance.
But the assumption that scaling models up will lead to continual progress also means that fewer resources go into looking for promising alternatives.
The researchers at DeepMind first built their own large language model, called Gopher, which is more than 60 percent larger than GPT-3. Then they showed that a far smaller model, given the ability to look up information in a database, could compete well against Gopher and other large language models in comprehension and interaction.
The researchers have dubbed the smaller model RETRO, which stands for Retrieval-Enhanced Transformer. Transformers are the specific type of neural network used in most large language models.
Like other language models, RETRO generates text by predicting what should come next on the basis of its training. But it also looks up chunks of text in its database that are relevant to what it is working on and uses them to inform those predictions. As well as cutting down the amount of training required, the researchers point out that the ability to see which chunks of text the model consulted when making predictions could make it easier to explain how it reached its conclusions.
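A minimal sketch can make this retrieve-then-predict idea concrete. Everything below is a hypothetical stand-in rather than DeepMind’s implementation: `embed` is a toy bag-of-words encoder, `RetrievalDatabase` a brute-force nearest-neighbour lookup, and `generate` only shows what a real network would be conditioned on and which chunks it consulted.

```python
# Toy illustration of retrieval-enhanced generation (not DeepMind's RETRO code).
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for a real text encoder: a simple bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class RetrievalDatabase:
    """Holds text chunks and returns the ones most similar to a query."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def nearest(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
        return ranked[:k]


def generate(prompt, db):
    """Stand-in for the language model: a real network would attend to the
    retrieved chunks while predicting the next tokens; here we just return
    the conditioned input and the chunks that were consulted."""
    retrieved = db.nearest(prompt)
    conditioned = " ".join(retrieved) + " | " + prompt
    return conditioned, retrieved


db = RetrievalDatabase([
    "Gopher is DeepMind's 280-billion-parameter language model.",
    "RETRO pairs a smaller transformer with an external text database.",
    "Transformers process text with attention layers.",
])
output, consulted = generate("How does RETRO use its database?", db)
print("Conditioned input:", output)
print("Chunks consulted:", consulted)
```

Returning the consulted chunks alongside the output mirrors the transparency point above: a reader can see exactly which passages informed a prediction.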
The reliance on a database also opens up opportunities for updating the model’s knowledge without retraining it, simply by adding to or editing the corpus it retrieves from. The team also retrofitted existing pretrained transformers with the retrieval mechanism. These models easily outperformed the originals, and even got close to the performance of RETRO models trained from scratch.
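To make the same point in code, and reusing the hypothetical `RetrievalDatabase` and `generate` helpers from the sketch above, a knowledge update is nothing more than an edit to the chunk list; no weights change and nothing is retrained.

```python
# Continuing the toy sketch above: append new text to the database and the
# model can draw on it immediately, without retraining the network itself.
db.chunks.append("RETRO-style models can pick up new facts straight from an updated corpus.")
output, consulted = generate("Can the model learn new facts without retraining?", db)
print("Chunks consulted after the update:", consulted)
```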
It’s important to remember that RETRO is still a large model by most standards. In the Gopher paper, the researchers found that while increasing model size didn’t significantly improve performance on logical reasoning and common-sense tasks, the benefits were clear for things like reading comprehension and fact-checking.
Perhaps the most important lesson from RETRO is that scaling models up isn’t the only, or even the fastest, route to better performance.