
Mira Murati’s First Startup Product Meets a Tough Crowd
Engineers doubt the payoff of her fine-tuning API as open-source rivals hold strong
SAN FRANCISCO — When Mira Murati walked out of OpenAI last fall after months of reported tension, the AI world held its breath. The former chief technology officer, long seen as one of the company’s most influential voices, had something new in the works. This week, her startup Thinking Machines finally pulled back the curtain. The debut product? Tinker — a managed API that promises to make fine-tuning large, open-weight language models far less painful.
Instead of applause, though, the launch landed with skepticism.
“Unsloth is way much better,” one engineer at CTOL.digital remarked in our internal Slack channel, a blunt verdict that sums up much of the industry’s first reaction. Our team’s analysis points to serious doubts about whether Tinker actually delivers anything new.
For Murati, the stakes couldn’t be higher. By rolling out a fine-tuning service instead of chasing the next big GPT-style model, she’s betting that the future of AI lies in customization. It’s a direct swipe at her former employer’s closed-box philosophy — and a gamble that could either validate Thinking Machines’ lofty valuation or expose it as overhyped.
The Promise: Simplifying the Hardest Part
On paper, Tinker makes a simple offer. It takes care of the messy infrastructure work (scheduling, resource allocation, failure recovery) while giving researchers control over their data and algorithms. Teams can move between models, from small builds to giants like Qwen3-235B-A22B, with a single line of code.
The system runs on Thinking Machines’ internal clusters and uses LoRA (Low-Rank Adaptation) to stretch compute resources across multiple training jobs, potentially cutting costs. To help developers get started, the company also released the “Tinker Cookbook,” an open-source library of modern post-training methods.
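LoRA itself is a well-established technique: rather than updating all of a model’s weights, it trains small low-rank adapter matrices attached to selected layers, which is what makes it cheap enough to multiplex many runs on shared hardware. The sketch below shows what that looks like in general terms using the open-source Hugging Face PEFT library; it illustrates the technique, not Tinker’s internals, and the base model and hyperparameters are placeholder choices.

```python
# Minimal LoRA setup with Hugging Face PEFT -- illustrates the general technique,
# not Tinker's managed service. Model name and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # placeholder base model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attach adapters to the attention projections
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's weights
```

Because only the small adapter weights change during training, many such runs can share the same frozen base model, which is the property Thinking Machines says it leans on to pool jobs across its clusters.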
Some heavyweight research groups have already kicked the tires. Teams at Princeton, Stanford, and Berkeley tested Tinker on projects spanning everything from math theorem proving to chemical reasoning. Redwood Research even used it to train Qwen3-32B on tricky AI control problems.
The Problem: Convincing Anyone It’s Different
Here’s the snag: none of that answers the question engineers keep asking — why switch from the open-source tools they already trust?
Our CTOL.digital engineering team’s review highlights two weak points. The first is simple doubt. Without published benchmarks comparing Tinker against proven systems like Unsloth or TRL, developers have no hard numbers to judge whether it’s faster, cheaper, or more stable. They want “clear, proven advantages.” So far, they haven’t seen any.
The second hits harder. Some engineers dismiss Tinker as “investor show,” a tool built to impress backers rather than serve real users. Once that perception takes hold, hand-waving about ease of use won’t fix it.
“We want transparent, reproducible results that beat current stacks on cost and performance,” one of our engineers says flatly. Until those appear, suspicion wins out.
What’s Missing: Proof That Stands Up
The biggest hole in Tinker’s debut is easy to spot: no independent benchmarks. Not a single training run has been published comparing it to alternatives on the metrics that actually matter — cost per token, throughput, training stability, time to convergence.
That silence leaves engineers guessing instead of evaluating. They can’t tell if Tinker’s managed infrastructure truly lightens the load, or if its LoRA trick really saves money compared with running Unsloth on rented GPUs.
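For readers wondering what such a comparison would even measure, a back-of-the-envelope calculation is enough to frame it: cost per trained token is essentially GPU price divided by training throughput. The sketch below uses invented placeholder numbers purely to show the arithmetic; nothing in it reflects measured performance of Tinker, Unsloth, or any other stack.

```python
# Back-of-envelope cost-per-token arithmetic. All figures are invented placeholders,
# not measurements of Tinker, Unsloth, or any other system.

def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Dollars to push one million tokens through training at a given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return (gpu_hourly_usd / tokens_per_hour) * 1_000_000

# Two imaginary setups, priced per GPU-hour on rented hardware.
setups = {
    "managed service (assumed)": cost_per_million_tokens(gpu_hourly_usd=2.50, tokens_per_second=8_000),
    "self-hosted stack (assumed)": cost_per_million_tokens(gpu_hourly_usd=2.00, tokens_per_second=6_500),
}

for name, usd in setups.items():
    print(f"{name}: ${usd:.2f} per million training tokens")
```

Published numbers of exactly this kind, alongside stability and convergence data, are what skeptical engineers say would settle the argument either way.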
Equally noticeable is what isn’t being said. The lack of detailed bug reports or failure analyses suggests most developers haven’t even invested serious time in testing it yet. Once beta access opens wider and users start sharing logs, configs, and reproducible errors, the feedback will either harden into sharp critiques or mellow into acceptance.
The Bigger Picture: A Bet Against the AGI Rush
Tinker’s launch also reveals something deeper about Murati’s outlook. By choosing fine-tuning infrastructure over frontier-model development, she’s signaling that she doesn’t expect a breakthrough leap toward artificial general intelligence anytime soon.
That view puts her in the company of other OpenAI alumni, like co-founder John Schulman and researchers Barret Zoph and Luke Metz, who’ve all pivoted toward open-weight models. Together, their moves suggest a shared belief: right now, tailoring open models offers more practical value than racing toward the next giant closed system.
The debate cuts to the heart of the AI industry. Does progress come from building ever-larger, tightly guarded models, or from inventing smarter ways to adapt the ones already out there?
The Road Ahead: Prove It or Fade Away
Thinking Machines is gradually moving users off the waitlist. The service is free during beta, but pricing will switch to a usage-based model soon. When asked about the frosty reception among engineers, the company declined to comment. It also didn’t share any benchmark data against rival systems.
That silence leaves only one path forward. To earn trust, Murati’s team needs to publish hard evidence: reproducible benchmarks, real-world cost savings, stability improvements, and productivity gains documented with actual training curves. Without them, Tinker risks being remembered as a flashy debut that failed to stick.
Some of CTOL.digital’s engineers put it bluntly: “Expect more substantive critiques once the beta expands and users publish configs, logs, and failure results. But wait, are investors crying now?”
Murati’s reputation from her OpenAI years still buys her attention. Whether she keeps it depends on what comes next — not in promises, but in proof.