As AI models get ever larger, the amount of money and energy required to train them has become a hot-button issue. A new approach that rewrites one of the fundamental building blocks of the discipline could provide a workaround.
Ever since GPT-3 demonstrated the significant jumps in performance achievable by simply increasing model size, leaders in the AI industry have been piling resources into training increasingly massive neural networks.
But this costs huge amounts of money, requires massive computing resources, and uses enormous amounts of power. That’s increasingly seen as a problem, not only because of the environmental implications, but also because it’s making it difficult for smaller AI outfits to compete, and as a result concentrating power in the hands of industry leaders.
Now, though, researchers from Oxford University have outlined a new approach that could cut training times by as much as half. They do so by rewriting one of the most fundamental ingredients in today’s neural network-based AI systems: backpropagation.
How a neural network processes data is governed by the strength of the connections between its neurons. So to get a network to do useful work, you first need to adjust these connections until it processes data the way you want it to. You do this by training the network on data relevant to the problem using a process called backpropagation, which is split into two phases.
The forward pass involves feeding data through the network and getting it to make predictions. In the backward pass, a measure of how far off these predictions are is used to work back through the network and calculate how the strength of each connection should be adjusted to improve performance. By repeating this process many times using lots of data, the network gradually converges on a configuration of connections that solves the problem at hand.
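To make the two phases concrete, here is a minimal Python sketch of backpropagation on a hypothetical two-weight network (prediction = w2 * tanh(w1 * x)) trained with squared error. The network, the function names `forward`, `backward` and `train`, the learning rate and the toy data are all illustrative assumptions, not anything from the Oxford work. Real networks have millions or billions of weights, but each training step follows the same forward-then-backward pattern.

```python
import math

# Hypothetical toy network with two weights: prediction = w2 * tanh(w1 * x).
# Squared error is used as the measure of how wrong a prediction is.


def forward(w1, w2, x, target):
    """Forward pass: run the input through the network and score the prediction."""
    h = math.tanh(w1 * x)          # hidden neuron's activation
    pred = w2 * h                  # network's prediction
    loss = (pred - target) ** 2    # squared error
    return h, pred, loss


def backward(w1, w2, x, target, h, pred):
    """Backward pass: apply the chain rule from the loss back to each weight.

    Reuses the intermediate values (h, pred) saved during the forward pass.
    """
    dloss_dpred = 2.0 * (pred - target)
    dloss_dw2 = dloss_dpred * h
    dloss_dh = dloss_dpred * w2
    dloss_dw1 = dloss_dh * (1.0 - h * h) * x  # d/du tanh(u) = 1 - tanh(u)^2
    return dloss_dw1, dloss_dw2


def train(w1, w2, data, lr=0.1, epochs=200):
    """Repeat forward and backward passes, nudging each weight against its gradient."""
    for _epoch in range(epochs):
        for x, target in data:
            h, pred, _loss = forward(w1, w2, x, target)
            g1, g2 = backward(w1, w2, x, target, h, pred)
            w1 -= lr * g1
            w2 -= lr * g2
    return w1, w2


# Usage: fit toy data generated by 2 * tanh(x), starting from small weights.
data = [(x, 2.0 * math.tanh(x)) for x in (-1.0, -0.5, 0.5, 1.0)]
w1, w2 = train(0.1, 0.1, data)
```

Note that the backward pass reuses values saved during the forward pass, which is why conventional training has to store intermediate results and then make a second trip through the network.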
This repetitive process is why it takes so long to train AI, but the Oxford researchers may have found a way to simplify things. In a pre-print posted on arXiv, they describe a new training approach that does away with the backward pass entirely. Instead, their algorithm estimates how the weights (the connection strengths) need to be altered during the forward pass itself, and it turns out these approximations are close enough to achieve performance comparable to backpropagation.
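The article doesn't spell out the algorithm. One way to get weight updates from a forward pass alone, and plausibly the kind of idea behind this work, is a so-called forward gradient: pick a random direction in weight space, use forward-mode automatic differentiation to measure how fast the loss changes along that direction while the forward pass runs, and scale the direction by that rate. The Python sketch below assumes this technique on a hypothetical two-weight network (prediction = w2 * tanh(w1 * x)). The `Dual` class, helper names, learning rate and data are illustrative assumptions; this is not the researchers' code and does not demonstrate their reported speedups.

```python
import math
import random

# Assumed technique: "forward gradient" via forward-mode autodiff (dual numbers).
# Hypothetical toy network and data; not the researchers' implementation.


class Dual:
    """A dual number val + tan*eps with eps*eps = 0.

    Carrying `tan` alongside each value lets one forward evaluation also
    produce the directional derivative of the function being computed.
    """

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    @staticmethod
    def lift(o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.tan + o.tan)

    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.tan - o.tan)

    def __mul__(self, o):
        o = Dual.lift(o)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (ab' + a'b)eps
        return Dual(self.val * o.val, self.val * o.tan + self.tan * o.val)

    __rmul__ = __mul__


def tanh(z):
    """tanh that also propagates the tangent when given a Dual."""
    if isinstance(z, Dual):
        t = math.tanh(z.val)
        return Dual(t, (1.0 - t * t) * z.tan)
    return math.tanh(z)


def loss(w1, w2, x, target):
    """Hypothetical toy network: squared error of w2 * tanh(w1 * x)."""
    err = w2 * tanh(w1 * x) - target
    return err * err


def forward_gradient(w, x, target, rng):
    """Estimate the loss gradient from one forward pass, with no backward pass.

    Draws a random direction v ~ N(0, I), seeds it as the tangent of the
    weights, reads off the directional derivative (grad . v), and returns
    (grad . v) * v, an unbiased but noisy estimate of the gradient.
    """
    v = [rng.gauss(0.0, 1.0) for _ in w]
    out = loss(Dual(w[0], v[0]), Dual(w[1], v[1]), x, target)
    return [out.tan * vi for vi in v], out.val


def train_forward_gradient(w, data, lr=0.05, epochs=500, seed=0):
    """Gradient descent that uses forward-gradient estimates instead of backprop."""
    rng = random.Random(seed)
    w = list(w)
    for _epoch in range(epochs):
        for x, target in data:
            g, _loss = forward_gradient(w, x, target, rng)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


# Usage: fit toy data generated by 2 * tanh(x) without any backward pass.
data = [(x, 2.0 * math.tanh(x)) for x in (-1.0, -0.5, 0.5, 1.0)]
w = train_forward_gradient([0.1, 0.1], data)
```

Averaged over many random directions, the estimate matches the true gradient, but any single estimate is noisy. The trade-off is that no intermediate values need to be kept for a backward pass. Whether that noise stays manageable as models grow is a separate question this toy example cannot answer.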
The researchers showed that the approach can be used to train a variety of different machine learning algorithms, and because it involves only a forward pass, it was able to cut training times by as much as half.
It’s a simple mathematical trick, Andrew Corbett from the University of Exeter in the UK told New Scientist, but it could help tackle one of the most pressing challenges facing AI today. “It’s a very, very important thing to solve, because it’s the bottleneck of machine learning algorithms,” he said.
How broadly applicable the approach is remains to be seen, though. In their paper, the researchers show that the gap in runtime between their method and backpropagation shrinks as the number of layers in a neural network increases, suggesting the technique may have diminishing returns with larger models.