Salesforce Pioneers New Open-Source MCPEval Toolkit, Ushering In a New Era in AI Agent Evaluation
Companies have been gradually integrating the Model Context Protocol (MCP) into their systems, primarily to help agents identify and make better use of tools. Salesforce researchers have gone a step further, finding an additional application for MCP: aiding in the evaluation of AI agents themselves.
The team launched MCPEval, a new approach underpinned by an open-source toolkit built on the MCP architecture and designed to scrutinise the performance of tool-using agents. They noted that existing evaluation methods were limited, often relying on fixed, predefined tasks that fail to capture the interactive, multi-step nature of real-world agentic workflows.
The researchers explained that MCPEval goes beyond traditional success/failure metrics by systematically collecting task trajectories and protocol-level interaction data, giving unprecedented visibility into agent behaviour. Moreover, because both task generation and validation are automated, the resulting high-quality trajectories enable rapid fine-tuning and continual improvement of agent models.
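The article does not spell out the pipeline, but the core idea of automated task generation, verification, and trajectory logging can be sketched as follows. This is a minimal illustration only: the `llm` and `agent` callables, and every function and class name below, are hypothetical stand-ins, not MCPEval's actual API.

```python
# Hypothetical sketch of a generate -> verify -> execute loop as described
# above. All names here are illustrative stand-ins, not MCPEval's real API.
import json
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One task attempt: every tool call and result, plus the outcome."""
    task: str
    steps: list = field(default_factory=list)  # (tool_name, args, result)
    success: bool = False


def generate_task(llm, tool_specs):
    """Ask an LLM to propose a task exercising the given MCP tools."""
    prompt = f"Propose a realistic task using these tools:\n{json.dumps(tool_specs)}"
    return llm(prompt)


def verify_task(llm, task, tool_specs):
    """Ask a (possibly different) LLM to check the task is solvable."""
    prompt = (f"Can this task be solved with these tools? Answer yes or no.\n"
              f"Task: {task}\nTools: {json.dumps(tool_specs)}")
    return llm(prompt).strip().lower().startswith("yes")


def run_and_record(agent, task):
    """Run the agent on the task, logging each protocol-level tool call."""
    traj = Trajectory(task=task)
    for tool_name, args, result in agent.solve(task):  # agent yields its steps
        traj.steps.append((tool_name, args, result))
    traj.success = agent.succeeded()
    return traj


def build_dataset(llm, agent, tool_specs, n_tasks=50):
    """Collect verified tasks and full trajectories for later fine-tuning."""
    trajectories = []
    while len(trajectories) < n_tasks:
        task = generate_task(llm, tool_specs)
        if verify_task(llm, task, tool_specs):
            trajectories.append(run_and_record(agent, task))
    return trajectories
```

Because each step of each trajectory records the exact tool call and its result, the same data serves both purposes the researchers describe: fine-grained behavioural analysis and training data for fine-tuning.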
The MCPEval pipeline generates, verifies, and assesses tasks by leveraging multiple large language models (LLMs), giving users the flexibility to choose models they are comfortable with. In the process, it produces high-quality synthetic data and benchmark datasets for agents, and users can select exactly which MCP servers and specific tools to test an agent's performance against.
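To make that server, tool, and model selection concrete, here is a hypothetical configuration sketch. The keys, model names, and server commands below are illustrative assumptions, not MCPEval's documented schema.

```python
# Hypothetical evaluation configuration; the keys and values are
# illustrative assumptions, not MCPEval's documented schema.
eval_config = {
    # Which MCP servers to evaluate against, and which of their tools.
    "servers": [
        {"command": "npx @modelcontextprotocol/server-filesystem /tmp/sandbox",
         "tools": ["read_file", "write_file"]},
        {"command": "python weather_server.py",  # a local custom server
         "tools": ["get_forecast"]},
    ],
    # Different LLMs can fill each role in the pipeline.
    "task_generator_model": "gpt-4o",
    "task_verifier_model": "claude-3-5-sonnet",
    "agent_under_test_model": "my-finetuned-agent",
    "num_tasks": 100,
}
```

One plausible reason to assign different LLMs to the generator, verifier, and agent-under-test roles is to reduce self-evaluation bias: the model being tested never validates the tasks it is judged on.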
The unveiling of MCPEval marks an important step towards more effective and transparent evaluation of AI agents, and promises a significant boost for businesses aiming to harness AI's true potential.
- Open-source MCPEval makes protocol-level agent testing plug-and-play (venturebeat.com, 23-07-2025)