At an MIT event in March, OpenAI cofounder and CEO Sam Altman said his team wasn’t yet training its next AI, GPT-5. But in a recent interview with the Financial Times, Altman said the company is now working to develop GPT-5. Though the article did not specify whether the model is in training (it likely isn’t), Altman did say it would need more data. That data would come from public online sources, which is how such algorithms, called large language models, have previously been trained, and from proprietary private datasets.
The rise of the large language models behind chatbots like ChatGPT was driven by ever-bigger algorithms consuming more data. Of the two factors, it’s possible that more data of higher quality can yield greater near-term gains.
Recent research suggests smaller models fed larger amounts of data perform as well as or better than larger models fed less. But having already scraped much of the internet to train GPT-4, OpenAI has likely picked the low-hanging fruit.
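For a rough sense of what that research implies, here is a minimal sketch of the compute-optimal scaling heuristic popularized by DeepMind’s 2022 “Chinchilla” paper, assuming the widely cited rule of thumb of roughly 20 training tokens per model parameter and the standard C ≈ 6ND approximation for training compute. The function names and figures below are illustrative assumptions, not anything OpenAI has confirmed about GPT-5.

```python
# Rough sketch of the compute-optimal scaling heuristic from DeepMind's
# "Chinchilla" paper (Hoffmann et al., 2022). The ~20 tokens-per-parameter
# ratio and the C ~= 6 * N * D compute estimate are rules of thumb, not
# OpenAI's actual training recipe.

def compute_optimal_tokens(num_parameters: float, tokens_per_param: float = 20.0) -> float:
    """Estimate how many training tokens a model of a given size 'wants'."""
    return num_parameters * tokens_per_param


def training_flops(num_parameters: float, num_tokens: float) -> float:
    """Approximate total training compute: C ~= 6 * N * D floating-point ops."""
    return 6.0 * num_parameters * num_tokens


if __name__ == "__main__":
    for n_params in (7e9, 70e9, 175e9):  # 7B, 70B, and 175B-parameter models
        tokens = compute_optimal_tokens(n_params)
        flops = training_flops(n_params, tokens)
        print(f"{n_params / 1e9:>4.0f}B params -> ~{tokens / 1e12:.1f}T tokens, ~{flops:.1e} FLOPs")
```

Under these assumptions, a 175-billion-parameter model would “want” roughly 3.5 trillion training tokens, which is why the supply of quality data, and not just model size, has become a bottleneck.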
Foundation models like OpenAI’s GPT-4 require vast supplies of graphics processing units (GPUs), a type of specialized computer chip widely used to train and run AI. Chipmaker Nvidia is the leading supplier of GPUs, and since the launch of ChatGPT, its chips have been the hottest commodity in tech.
Altman said OpenAI recently took delivery of a batch of Nvidia’s latest H100 chips, and he expects supply to loosen up further in 2024.
In MLPerf benchmark results released this week by the AI benchmarking organization MLCommons, the chips trained large language models nearly three times faster than the mark set just five months ago. Reading between the lines (which has become more challenging as the industry has grown less transparent), the GPT-5 work Altman is alluding to is likely more about assembling the necessary ingredients than training the algorithm itself.
The company is working to secure funding from investors (GPT-4 cost over $100 million to train), chips from Nvidia, and quality data from wherever it can be found. Altman didn’t commit to a timeline for GPT-5’s release, but even if training began soon, the algorithm wouldn’t see the light of day for a while.
It took the company eight months to polish and release GPT-4 after training. Though the competitive landscape is more intense now, it’s also worth noting GPT-4 arrived almost three years after GPT-3. Meanwhile, OpenAI recently announced GPT-4 Turbo, an enhanced algorithm that includes more up-to-date information (extending the knowledge cutoff from September 2021 to April 2023), can work with much longer prompts, and is cheaper for developers.
Google DeepMind is currently working on its next AI algorithm, Gemini, and big tech is investing heavily in other leading startups, like Anthropic and Character.AI. In the longer term, it’s not clear whether the shortcomings associated with large language models can be solved with more data and bigger algorithms or will require new breakthroughs.
At the MIT event in March, Altman said he thought the age of scaling was over and researchers would find other ways to make the algorithms better.
“Until we go train that model, it’s like a fun guessing game for us,” he told the FT. “We’re trying to get better at it, because I think it’s important from a safety perspective to predict the capabilities. But I can’t tell you here’s exactly what it’s going to do that GPT-4 didn’t.”