Discover How Small Language Models Outshine Their Larger Counterparts with Test-time Scaling in Latest Shanghai AI Lab Study

Published: 21 Feb 2025
Research suggests small language models can surpass much larger ones on reasoning tasks when paired with the right inference-time techniques. A story of underdogs winning through intelligence and scaling.

In the race for cognitive horsepower, small language models (SLMs) are coming out on top against their larger counterparts. A recent study by Shanghai AI Laboratory shows that these compact models, with as few as 1 billion parameters, can outperform language models of up to 405 billion parameters. The evidence comes from challenging math benchmarks, where the smaller models handled reasoning tasks with surprising finesse.

What is the secret behind the SLMs’ surprising win? The answer is test-time scaling (TTS). The technique spends additional compute during inference, allowing a model to deliver better performance across a range of tasks. The result matters because it opens a practical path for businesses seeking to deploy such models in diverse environments and applications.

The study considers two variants of the method. The first, ‘internal TTS’, is used by leading reasoning models such as OpenAI o1 and DeepSeek-R1: these models are trained to think slowly by generating an extended chain-of-thought token sequence. The second, ‘external TTS’, boosts performance with external help at inference time rather than by fine-tuning the model. The key move is to couple an answer-generating ‘policy model’ with a ‘process reward model’ that evaluates the candidate answers and the reasoning steps behind them.

The resulting procedure offers a powerful way to amplify the capabilities of smaller models, paving the way for their broader adoption and challenging the supremacy of long-established, mammoth AI models.
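To make the external TTS idea more concrete, here is a minimal best-of-N sketch in Python. The `generate_candidates` and `score_trace` callables are hypothetical placeholders standing in for whatever policy model and process reward model are actually deployed; the study explores more elaborate search strategies, so this illustrates the general shape of the approach rather than the paper’s exact method.

```python
from typing import Callable, List


def external_tts_best_of_n(
    prompt: str,
    generate_candidates: Callable[[str, int], List[List[str]]],  # policy model: (prompt, n) -> n reasoning traces
    score_trace: Callable[[str, List[str]], float],              # process reward model: (prompt, steps) -> score
    n: int = 8,
) -> str:
    """Best-of-N external test-time scaling (illustrative sketch).

    The policy model proposes `n` candidate reasoning traces (each a list of
    steps); the process reward model scores each trace; the final step of the
    highest-scoring trace is returned as the answer.
    """
    candidates = generate_candidates(prompt, n)  # extra compute spent at inference time
    scored = [(score_trace(prompt, steps), steps) for steps in candidates]
    _, best_steps = max(scored, key=lambda pair: pair[0])
    return best_steps[-1]  # last step of the best trace holds the final answer
```

Raising `n` buys more inference-time compute and, typically, a better chance that at least one candidate trace is correct; tuning that trade-off is how a small policy model can close the gap with, and here overtake, far larger models.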