🧑‍💻 Developer-First #195 - The AI Cost Reckoning
As enterprises move from AI experimentation to AI-native engineering, token economics is becoming the new infrastructure discipline.
Hello friend,
For the past two years, the AI industry has behaved as if compute was infinite. Most AI coding tools were designed around one objective: maximise capability by throwing more inference at the problem. Bigger context windows, more agents, more iterations, more generated code. In the experimentation phase, this made sense. The priority was proving what was possible.
But enterprises are now entering a very different phase. As AI coding tools spread across engineering organisations, token consumption is no longer an abstract infrastructure metric hidden behind an API bill. It is becoming an operational constraint discussed in boardrooms, budget reviews and procurement negotiations.
The industry is now transitioning from “Can AI generate software?” to “Can AI generate software economically at scale?” That is a fundamentally different question.
Because once AI moves from isolated prototypes to thousands of agents running every day, token economics starts looking a lot like cloud economics did a decade ago. Suddenly architecture matters. Context management matters. Workflow orchestration matters. Deterministic systems matter. Efficiency matters.
The winners will not necessarily be the companies with the biggest models or the largest context windows. They will be the platforms that deliver the highest business outcome for the lowest compute cost through better orchestration, smarter memory systems, tighter context engineering and more deterministic workflows.
Now, let’s dive into this week’s signals.
P.S.: I recently launched Rubbr, a controlled AI-driven software delivery platform for legacy systems. If you’re interested in talking about AI engineering in real-life setups, let’s talk!
The new token economics
A discussion around AI coding economics exploded last week after claims that Microsoft cancelled internal Claude Code licenses because token-based pricing had become too expensive, while other large enterprises reportedly started reassessing their AI coding budgets. I argued on LinkedIn that beyond the competitive dynamics between Microsoft, GitHub Copilot, Codex and Anthropic, the deeper signal is about enterprise AI economics. Most AI coding tools today optimise for raw capability and speed rather than token efficiency, often relying on massive context windows and brute-force inference. But enterprise adoption changes the equation. As companies move from experimentation to scaled deployment, token consumption becomes a real operational constraint. The next winners in enterprise AI will not simply be the companies with the best models, but the ones delivering the highest value per token through structured context, deterministic orchestration and smarter workflow design, especially as open-weight models and local inference become increasingly viable.
The comments on my post largely reinforced this idea while challenging some of the more dramatic interpretations around Microsoft “walking away” from AI coding altogether. Several engineers and CTOs pointed out that Microsoft’s move is likely as much about competitive positioning against Anthropic as cost reduction, with many expecting Microsoft to consolidate around its own Copilot ecosystem instead. But the broader discussion quickly shifted toward token economics as a new engineering discipline. Many commenters argued that runaway token spend is often a workflow problem rather than a model problem: vague prompts, oversized context windows and poorly orchestrated agents force models to spend expensive inference cycles reconstructing intent instead of executing well-scoped tasks.
Tokenmaxxing world record
The Internet went wild after Peter Steinberger, creator of OpenClaw and now at OpenAI, revealed he had spent more than $1.3M on tokens in just 30 days while pushing agentic software development to its limits. On LinkedIn, I argued that while these kinds of experiments are strategically valuable, they also expose a flawed assumption spreading across the AI ecosystem: that compute efficiency no longer matters. Many AI coding platforms today optimise primarily for raw generation speed and capability, often assuming that more tokens automatically mean better outcomes. But enterprise reality is very different. Most CTOs I speak with are already struggling to contain exploding AI budgets across engineering teams.
The comments reflected a broader tension emerging across the industry between “infinite compute” experimentation and operational reality. Some argued inference costs are naturally deflationary and that expensive experiments today help define the future frontier of software engineering before costs collapse. Others strongly disagreed, pointing out that frontier model costs are actually rising in many enterprise environments, especially for Western proprietary models. Several engineers questioned whether spending millions on tokens was producing proportional quality improvements at all, with some criticising OpenClaw itself as unstable, buggy and overly reliant on brute-force inference. A recurring theme was that throwing more tokens at software generation may improve short-term output quality, but often at the expense of maintainability, security and long-term system coherence.
Beyond source code
In a recent essay on Martin Fowler’s blog, Unmesh Joshi argues that code historically served two intertwined purposes: instructions for machines and conceptual models for humans. My personal view, which I shared on LinkedIn, is that this dual role may be starting to break apart in an AI-native world. Software engineering today is still heavily organised around the assumption that humans must continuously read implementation code: programming languages, frameworks, repositories and workflows are all designed around human interpretability. But as AI increasingly generates the implementation layer, generated code starts looking less like a durable human artifact and more like an intermediate representation optimised for machines. Replacing human-written code with AI-generated code that humans still manually review does not feel like the long-term equilibrium. Instead, the industry may need entirely new abstraction layers where humans operate above code itself, focusing on architecture, constraints, intent and system behaviour rather than reading thousands of generated files line by line.
The comments on my post revealed a deep divide between engineers who believe readable code will remain essential and those who think software development is moving toward entirely new human-machine interfaces. Some engineers argued that as long as AI systems remain probabilistic and non-deterministic, humans will still need to inspect at least parts of the generated codebase. Others proposed the opposite direction entirely: if human review becomes the bottleneck, perhaps AI should generate code that is even more readable and self-explanatory than what humans produce today, reviving ideas similar to Knuth’s “literate programming.” But most commenters eventually supported the broader thesis that source code itself may become less central over time.
The Changelog - Week of May 18th, 2026
Last week, 11 companies raised $1.57 billion in 5 countries. Europe-based companies attracted 3% of total funding vs 96% for North America-based companies and 1% for Israel-based companies. Two of these companies distribute or contribute to an open-source project. On the M&A side, 2 companies were acquired.
Funding Rounds
Zyphra, from San Francisco 🇺🇸, raised $500 million in Series B funding led by AMD. Zyphra is an open superintelligence research company developing open-weight AI models and cloud infrastructure services optimised for AMD hardware. (more)
Modal Labs, from New York 🇺🇸, raised $355 million in Series C funding led by General Catalyst and Redpoint. Modal Labs provides serverless cloud infrastructure purpose-built for AI workloads including inference, reinforcement learning, and AI agent runtimes. (more)
Decart, from San Francisco 🇺🇸, raised $300 million in Series C funding led by Radical Ventures. Decart is a frontier AI research lab building real-time world models and AI optimisation software that enables models to run efficiently across different chip architectures. (more)
Exa, from San Francisco 🇺🇸, raised $250 million in Series C funding led by Andreessen Horowitz. Exa builds an AI-native search engine and web retrieval API that provides structured real-time web access for AI agents and developers. (more)
Socket, from San Francisco 🇺🇸, raised $60 million in Series C funding led by Thrive Capital. Socket protects enterprises from malicious open-source dependencies through real-time software supply chain threat detection. (more)
Unframe, from Cupertino 🇺🇸, raised $50 million in Series B funding led by Highland Europe. Unframe delivers production-grade AI applications by integrating directly with enterprise systems and data without lengthy systems integration projects. (more)
Dust, from Paris 🇫🇷, raised $40 million in Series B funding led by Abstract. Dust is an enterprise multiplayer AI platform that enables employees and AI agents to share workflows, context, and organisational knowledge across collaborative workspaces. (more)
Greenpixie, from London 🇬🇧, raised $6.3 million in Seed funding led by VERBUND X Ventures. Greenpixie provides cloud and AI sustainability intelligence software that helps enterprises reduce waste from underused compute resources and optimise AI workloads for lower-carbon infrastructure. (more)
NanoCo, from Tel Aviv 🇮🇱, raised $12 million in Seed funding led by Valley Capital Partners. NanoCo develops NanoClaw, an open-source AI agent platform focused on secure containerised deployments and zero-trust orchestration. (more)
StitcherAI, from Redmond 🇺🇸, raised $3 million in Seed funding led by Founders Co-op. StitcherAI injects real-time cloud, AI, SaaS, and infrastructure spending data into engineering workflows and AI agent systems to optimise technology costs. (more)
Craci, from Helsinki 🇫🇮, raised $1.6 million in Seed funding led by Lifeline Ventures. Craci automates software supply chain compliance, vulnerability tracking, and security documentation generation for CI/CD pipelines. (more)
M&A Rounds
Stainless, from New York 🇺🇸, was acquired by Anthropic for $300 million. Stainless automates the creation and maintenance of SDKs, CLI tools, and MCP servers from API specifications across multiple programming languages. (more)
Emmi AI, from Linz 🇦🇹, was acquired by Mistral AI. Emmi AI develops AI-driven physics simulations that accelerate industrial engineering and manufacturing workflows in real time. (more)



