Large Language Models Display Human-Like Behavior When Adapting to Critique, Google Study Reveals
Advancing our understanding of how artificial intelligence models react under pressure, collaborative research between Google DeepMind and University College London has shed light on some remarkable behavioral tendencies of Large Language Models (LLMs). The study reveals striking similarities between the decision-making processes of LLMs and humans, along with some stark differences. The findings suggest that LLMs tend to be overconfident in their initial answers, yet that confidence can crumble rapidly when they are presented with a counterargument, whether or not the counterargument is valid.
The research team designed a tightly controlled experiment to explore how LLMs recalibrate their confidence and decide whether to change their answers when confronted with external advice. The methodology involved asking an LLM to select an answer to a binary-choice question, such as identifying the correct latitude for a city, and then presenting it with advice from a fictitious 'Advice LLM'. The advice could echo the initial answer, oppose it, or remain neutral, and always came with an explicit accuracy rating attached.
One of the most intriguing features of the experimental design was selectively showing or concealing the LLM's initial answer during its final decision. This setup, infeasible with human participants, who cannot conveniently forget their prior choices, allowed the researchers to probe how the memory of a past decision affects current confidence.
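To make the protocol concrete, here is a minimal Python sketch of the two-turn setup, including the show/hide manipulation. It is illustrative only: `query_llm` is a hypothetical stand-in for a real model API, and the question wording, confidence scale, and 70% advice-accuracy figure are assumptions rather than details taken from the study.

```python
import random

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a canned
    reply so this sketch runs end to end."""
    return "A (confidence: 85)"

# Illustrative binary-choice item, not a question from the actual study.
QUESTION = "Which option gives the correct latitude of Paris? (A) 48.9 N  (B) 41.9 N"

def run_trial(show_initial_answer: bool) -> dict:
    # Turn 1: elicit the answering LLM's initial choice and confidence.
    initial = query_llm(
        f"{QUESTION}\nAnswer with A or B and state your confidence (0-100)."
    )

    # Advice from a fictitious 'Advice LLM' that agrees, opposes, or stays
    # neutral, always tagged with an explicit accuracy rating (assumed 70%).
    stance = random.choice(["agrees with", "disagrees with", "is neutral about"])
    advice = f"An Advice LLM with 70% stated accuracy {stance} answer A."

    # Turn 2: the key manipulation is whether the initial answer is shown
    # back to the model or withheld from the final prompt.
    recap = f"Your earlier answer was: {initial}\n" if show_initial_answer else ""
    final = query_llm(
        f"{QUESTION}\n{recap}{advice}\n"
        "Give your final answer (A or B) and your confidence (0-100)."
    )
    return {"initial": initial, "advice": advice, "final": final}

# Comparing answer-switch rates between the two conditions is what exposes
# the stickiness effect described below.
visible = run_trial(show_initial_answer=True)
hidden = run_trial(show_initial_answer=False)
```

In the real experiment, switch rates and post-advice confidence would be aggregated over many such trials; this sketch only shows the structure of a single one.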
The study observed a distinct cognitive bias: LLMs were noticeably less inclined to switch answers when they could review their original response than when the initial answer was hidden. This stickiness points to an intriguing aspect of AI cognition, closely related to human cognitive biases. The findings have significant implications for how we build and improve conversational interfaces, particularly multi-turn systems in which a model's earlier answers remain in context. Our understanding of LLM behavior continues to grow, and with it, the potential applications of LLM technology.