Predicting GPT-5's Emergent Capabilities: UC Berkeley's Breakthrough
Discover how UC Berkeley predicts emergent AI capabilities using checkpoints. Insights on model scaling, finetuning, and the Law of Emergence.
A fundamental challenge in scaling LLMs is that emergent capabilities are poorly understood. While the pretraining loss of language models is highly predictable via scaling laws, downstream capabilities are far less so: they sometimes exhibit sudden emergent jumps, making it difficult to forecast the abilities of future models.
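To make the contrast concrete, here is a minimal sketch of the predictable half of the picture: fitting a simple scaling-law curve to pretraining loss and extrapolating it to a larger run. The data points and functional form below are illustrative assumptions, not numbers from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical loss-vs-compute points illustrating a smooth scaling trend
# (values are made up for illustration, not taken from the paper).
log_compute = np.array([18.0, 19.0, 20.0, 21.0, 22.0])  # log10 of training FLOPs
loss = np.array([3.20, 2.80, 2.50, 2.30, 2.15])         # pretraining loss

# A power law in compute is an exponential in log-compute:
# L = a * exp(-k * (log10(C) - 18)) + irreducible.
def scaling_curve(log_c, a, k, irreducible):
    return a * np.exp(-k * (log_c - 18.0)) + irreducible

params, _ = curve_fit(scaling_curve, log_compute, loss, p0=[1.0, 0.5, 2.0])

# Extrapolating the smooth loss curve to a bigger run is straightforward;
# an emergent downstream capability offers no comparably smooth trend.
print(f"Predicted loss at 1e23 FLOPs: {scaling_curve(23.0, *params):.3f}")
```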
Recently, a research team from UC Berkeley proposed the task of emergent capability prediction: can we predict whether GPT-N+1 (the future model) will exhibit emergent capabilities by using only the checkpoints of GPT-N (the current model)?
The answer is provided in their paper, "Predicting Emergent Capabilities by Finetuning."
Notably, one of the authors of this paper is Sergey Levine, a prominent figure in reinforcement learning.
The study's key insight is that finetuning a model on task data shifts the point of emergence toward less capable models, and the size of this shift grows with the amount of finetuning data. The researchers therefore fit a parametric function, termed the "Law of Emergence," that models how the emergence point moves as the amount of finetuning data varies; extrapolating it back to the zero-data (few-shot) limit yields a prediction of when the capability will emerge for models that have not yet been trained.
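The sketch below illustrates that fitting procedure under stated assumptions: the data amounts, emergence points, and the saturating-logarithm functional form are all hypothetical choices for illustration, not the paper's exact parameterization.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (finetuning-data amount, observed emergence point) pairs.
# Each emergence point is the pretraining loss at which finetuned checkpoints
# start beating chance on the task; more data shifts emergence toward weaker
# models (i.e., higher pretraining loss).
n_examples = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])
emergence_loss = np.array([2.55, 2.62, 2.70, 2.76, 2.81])

# One plausible emergence-law form (an assumption, not necessarily the
# paper's parameterization): the emergence point shifts logarithmically
# with data and approaches a few-shot limit as the data amount goes to zero.
def emergence_law(n, fewshot_point, scale, n0):
    return fewshot_point + scale * np.log1p(n / n0)

params, _ = curve_fit(emergence_law, n_examples, emergence_loss,
                      p0=[2.45, 0.15, 300.0], maxfev=10000)
fewshot_point = params[0]

# Extrapolating to zero finetuning data predicts where the *few-shot*
# capability should emerge, i.e., how capable a future model must be.
print(f"Predicted few-shot emergence at pretraining loss ~{fewshot_point:.2f}")
```

The notable design choice is the direction of extrapolation: since finetuning data only ever shifts emergence toward weaker models, taking the fitted curve to the zero-data limit recovers the harder few-shot setting that a future model like GPT-N+1 will face.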
To validate this approach, the researchers used four standard NLP benchmarks: MMLU, GSM8K, CommonsenseQA, and CoLA. By fitting the Law of Emergence using only small-scale LLMs from before emergence, they were able to accurately predict the point of emergence on each task.
Finally, the study presents two practical case studies of emergent phenomena, demonstrating that the proposed Law of Emergence can be used to predict more complex capabilities.
Jason Wei, first author of the chain-of-thought prompting paper, praised the work:
"This is a very clever paper that predicts the downstream performance of pre-trained models. It’s incredibly valuable as it provides a way to predict and justify the capital investment for the next large model training run."