Alibaba Ups the AI Game with Open Source Qwen3-235B-A22B-2507, Outperforming Kimi K2

Published: 24 Jul 2025
Chinese e-commerce behemoth Alibaba has further cemented its position in artificial intelligence with the launch of the newest member of its Qwen model family.

The arrival of Alibaba’s latest Qwen generative AI model has the global tech and business landscapes abuzz. Following a series of high-scoring AI models since the release of the original Tongyi Qianwen LLM chatbot in April 2023, Alibaba is pushing the envelope once again.

The Qwen models, renowned for their strength in math, science, reasoning, and writing tasks, have established a firm foothold in the AI ecosystem under permissive open-source licensing terms. This has democratized access, letting enterprises download, customize, and deploy the models widely.

This week, Alibaba’s AI division, known as the ‘Qwen Team’, released the next iteration, Qwen3-235B-A22B-2507, which is quickly making waves across the AI landscape. The model surpasses the recently launched Kimi K2 from Chinese competitor Moonshot AI, marking a significant leap in capability.

The work does not stop there. The Qwen Team has also released an ‘FP8’ version, which reduces the numerical precision of the model’s weights and operations, saving memory and compute with little to no performance trade-off.
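A back-of-the-envelope sketch of why halving numerical precision roughly halves the weight-memory footprint. The parameter counts below are inferred from the model name (235B total, 22B active per token), not figures published by the Qwen Team:

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Raw weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 235e9   # total parameters (the "235B" in the name, assumed)
ACTIVE_PARAMS = 22e9   # parameters active per token (the "A22B"), via MoE routing

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # 16-bit baseline
fp8_gb = weight_memory_gb(TOTAL_PARAMS, 8)    # FP8 halves the footprint

print(f"BF16 weights: ~{bf16_gb:.0f} GB, FP8 weights: ~{fp8_gb:.0f} GB")
# → BF16 weights: ~470 GB, FP8 weights: ~235 GB
```

This counts only raw weight storage; activations, the KV cache, and runtime overhead add more, but the halving of the dominant term is what makes smaller deployments feasible.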

This optimization makes the model practical to deploy on smaller, more cost-effective hardware or in leaner cloud configurations. With faster response times and lower energy costs, it can scale without major infrastructure investment, making it attractive for environments under stringent latency or cost pressure.

Although the team hasn’t published exact efficiency figures, the reduced memory and compute requirements suggest the savings are substantial. The FP8 version allows Qwen3 to run on single-node GPU instances or local development machines rather than multi-GPU clusters, and it lowers the barrier to private fine-tuning and on-premise deployment where resources are constrained.