Mistral Small 4 Review: Impressive Specs, Shaky Launch — Is It Worth the Hype?

By CTOL Editors - Ken

Mistral has never been shy about ambition. But with Mistral Small 4, the French AI company has swung harder than ever — releasing a 119-billion-parameter model it insists on calling "small," one that collapses four specialized systems into a single, Apache 2.0-licensed package.

The architecture is genuinely striking. Built on a Mixture-of-Experts framework deploying 128 experts — four active per token — the model draws on the capabilities of Magistral for deep reasoning, Pixtral for multimodal vision, and Devstral for agentic coding. A 256,000-token context window and a configurable reasoning_effort parameter round out a feature set that reads like a wish list for enterprise AI teams.
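For readers unfamiliar with the "128 experts, four active per token" idea, the following is a minimal sketch of generic top-k expert routing. It is not Mistral's router, which the announcement does not describe here; the function name `route_token` and the random router scores are illustrative assumptions, and only the 128 and 4 come from the figures above.

```python
import math
import random

NUM_EXPERTS = 128  # total experts, per the figures quoted above
TOP_K = 4          # experts active per token, per the figures quoted above


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs. Weights are a softmax over the
    selected logits only, so they sum to 1. This is generic top-k gating,
    not a description of Mistral's actual implementation.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for a single token (random, for illustration).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts run for this token
```

Under this scheme only 4 of 128 experts (about 3%) run for any given token, which is how a model with a very large total parameter count can keep per-token compute far below that of a dense model of the same size.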

The efficiency numbers, if they hold, are hard to dismiss. Mistral claims a 40% reduction in end-to-end completion time and triple the throughput of its predecessor. On LiveCodeBench, the company reports that Small 4 outperforms GPT-OSS 120B while generating 20% shorter outputs. On competitive coding benchmarks, it reportedly matches the scores of Qwen models while producing 3.5 to 4 times less output.

But the launch cracked under its own weight.

Engineers at CTOL Digital Solutions, who evaluated the model at release, delivered a measured but pointed verdict. The efficiency story, they acknowledged, is compelling — shorter outputs mean lower API costs and better latency, metrics that matter enormously at scale. The unified architecture is a genuine feat of consolidation.
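The link between output length and API cost is simple arithmetic, sketched below. The price, reply length, and request volume are made-up placeholders, not Mistral's pricing; only the 20% reduction is taken from the company's reported LiveCodeBench figure.

```python
def output_cost_usd(tokens_per_reply, replies, usd_per_million_tokens):
    """Cost of generated tokens when billing is per million output tokens."""
    return tokens_per_reply * replies * usd_per_million_tokens / 1_000_000


# All three inputs below are hypothetical, chosen only for illustration.
PRICE_PER_MILLION = 0.60   # USD per million output tokens (assumed)
BASELINE_TOKENS = 1_000    # average reply length of a comparison model (assumed)
REPLIES = 1_000_000        # monthly request volume (assumed)

REDUCTION = 0.20           # "20% shorter outputs", as reported by Mistral

baseline = output_cost_usd(BASELINE_TOKENS, REPLIES, PRICE_PER_MILLION)
shorter = output_cost_usd(BASELINE_TOKENS * (1 - REDUCTION), REPLIES,
                          PRICE_PER_MILLION)
savings_fraction = (baseline - shorter) / baseline

print(f"baseline: ${baseline:,.2f}  shorter: ${shorter:,.2f}  "
      f"saved: {savings_fraction:.0%}")
```

Because per-token billing is linear, a 20% cut in output length carries straight through to a 20% lower output bill at any price or volume. Generation time also tends to scale with the number of tokens produced, which is why the same reduction matters for latency.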

Yet the cracks were difficult to ignore. Most critically, the flagship reasoning_effort parameter — the very feature that defines Small 4's identity — was non-functional and undocumented in the API on launch day, leaving early adopters unable to access the model's core differentiator. Without it, CTOL's evaluators found the coding performance "very, very bad" and at times completely broken.

Then there is the naming problem. Calling a 119B-parameter model requiring at minimum four NVIDIA H100 GPUs "Small" invites ridicule, and CTOL did not hold back. The label creates confusion and, more practically, puts local deployment beyond reach for the vast majority of developers — a painful irony for a model released under an open-source license.
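A back-of-the-envelope calculation shows why a model this size stays out of reach for most local setups. The sketch below counts weight memory only. The 80 GB per-card figure assumes the 80 GB H100 variant, the precisions listed are common choices rather than anything Mistral has specified, and the source does not say which precision or serving setup the four-GPU minimum assumes.

```python
import math

PARAMS = 119e9        # total parameters, per the article
H100_MEMORY_GB = 80   # assumes the 80 GB H100 variant

# Bytes per parameter at common precisions (assumed, not stated by Mistral).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def min_gpus_for_weights(params, bytes_per_param, gpu_gb=H100_MEMORY_GB):
    """Fewest cards whose combined memory holds the weights.

    Ignores KV cache, activations, and framework overhead, so real
    deployments need headroom beyond this lower bound.
    """
    return math.ceil(weight_memory_gb(params, bytes_per_param) / gpu_gb)


for name, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(PARAMS, nbytes)
    print(f"{name}: {gb:.1f} GB of weights, "
          f"at least {min_gpus_for_weights(PARAMS, nbytes)} card(s)")
```

At 16-bit precision the weights alone come to 238 GB, which already fills nearly three 80 GB cards before any memory is set aside for the KV cache of a long context window. That is at least consistent with a four-GPU minimum, and it is well beyond any single consumer card even at aggressive quantization.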

Internal benchmark comparisons further muddied the picture. CTOL's own testing showed Mistral Small 4 scoring below Qwen3.5 122B, and evaluators expressed skepticism about real-world performance at the full 256,000-token context length.

CTOL's conclusion was unambiguous: a meaningful step forward for European AI, and a genuine achievement in open-source development — but not good enough to claim the top position in its class, and far from the obvious choice for teams seeking a capable, deployable small model today.

Mistral has built something architecturally impressive. Whether it can deliver on that architecture — reliably, at launch, for real users — remains an open question. In AI, the distance between a benchmark and a product is where reputations are made or lost.

Sources: https://mistral.ai/news/mistral-small-4


