Why You're Burning Tokens in Three Turns

Prompting taught operators to be customers. The X anguish is the cost of running a kitchen from a customer’s seat.

“You’ve exceeded your token usage — buy more to continue.”

Every operator building something with AI tools (ChatGPT, Claude, Gemini) has seen some version of this. I scan my X feed every morning and the same shape keeps coming round: four turns into a session, watching the meter climb like a passenger in a black cab stuck in traffic, and then the wall. The session craters. They start again (same dish, same instructions, fresh wallet) and by the third reload they’re swearing at the model, the framework, the rent.

The replies underneath are the familiar litany — use a smaller model, chunk your prompts, try a different framework, switch providers — none of them wrong exactly, none of them touching what’s actually broken.

They’ve been taught the wrong physics, and nobody’s writing about that.

The model was never going to hold the world in its working memory — the world belongs on disk. For the operators bleeding tokens, it isn’t there yet. That’s the whole problem.

Prompting taught you to be a customer

The prompt-engineering culture that grew up around the first wave of LLMs taught operators a stance: the model is the intelligence, you are the user, your job is to ask well. Better prompts, better outputs. Iterate until the gap closes between what you wanted and what came back.

Inside that stance every session begins from nothing. The context the last session learned doesn’t carry. The project you’ve been inside for three weeks vanishes. You re-explain the codebase, re-state the constraints, re-paste the file, re-describe the architecture — all before any actual work begins. The model is brilliant and amnesiac, and you spend the first half of every session catching it up.

It isn’t the model failing. It’s the stance.

The operator’s standing in the wrong place, asking the wrong question.

A customer at a restaurant places an order, and every visit begins from the menu. The waiter may know the regular’s name, but the kitchen builds the dish from scratch each time — and rightly so, because the customer isn’t running the kitchen. They’re being served.

But the operators bleeding tokens on X aren’t coming to Claude to be served. They’re coming to build. The moment they cross that threshold, the moment they stop ordering and start cooking, the customer stance becomes the thing that’s costing them. The model treats them like customers because they’re standing where customers stand. The mismatch eats the budget.

The architect stance

There’s a different way to stand in front of the model. I’ve taken to calling it ingeniculture — the practice of providing the infrastructure for the AI to thrive. The operator isn’t a supplicant asking the oracle for guidance. The operator is an architect, laying floor before service starts.

I’ve been working this way for about a year, and the shift happens when you stop trying to make the model smarter and start making the substrate denser. In the architect stance, the intelligence isn’t in the model alone. It’s in what the model can read — the documents, the wiki, the routing files, the code, and the git history. Every decision you wrote down. Each session loads only what it needs and adds back what it learned, so the system grows denser, not lighter — and the density never turns ruinous, because no session ever loads the whole of it. The operator stops being the institutional memory because the institution has finally got one of its own.
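A minimal sketch of what "each session loads only what it needs" could look like in practice. The routing table, the file names, and the keyword matching are all assumptions made for illustration; nothing here is a prescribed layout, and a real setup might route on something richer than single words.

```python
from pathlib import Path

# Hypothetical routing table: keyword -> substrate file.
# These names are illustrative only.
ROUTES = {
    "billing": "wiki/billing.md",
    "auth": "wiki/auth.md",
    "deploy": "wiki/deploy.md",
}


def select_notes(task: str, routes: dict[str, str]) -> list[str]:
    """Return only the files whose keyword appears as a word in the task."""
    words = set(task.lower().split())
    return [path for key, path in routes.items() if key in words]


def build_context(task: str, routes: dict[str, str], root: Path) -> str:
    """Concatenate the selected notes that exist on disk, skipping the rest."""
    parts = []
    for rel in select_notes(task, routes):
        file = root / rel
        if file.exists():
            parts.append(f"## {rel}\n{file.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

Calling `select_notes("Fix the auth redirect after deploy", ROUTES)` picks the auth and deploy notes and leaves billing on disk, which is the point: the substrate can keep growing while each session reads only a slice of it.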

This isn’t a clever framework. It’s what running a kitchen has always required, dressed in different clothes. Marco Pierre White didn’t tell his cooks the recipe at the start of every service. The recipe had been learned, written down, drilled, internalised long before service ever started. By the time the first ticket landed, the kitchen had spent hours on the dish nobody had eaten yet. That wasn’t waste. That was the precondition for service.

Mise en place is tokens spent in advance

Every brick of substrate you lay down before service is what the session reads instead of what you’d otherwise type. The wiki entry written in February is what the model loads this morning, instead of you re-explaining the project at 9am. The principle named after last month’s cock-up isn’t a lesson anymore; it’s a rail. The atomic commit (one thing changed, named with the reason) is “what did we do here, and why?” already paid for. Bundled commits like “various fixes” force the next session to load the diff and re-derive intent; atomic ones don’t. That’s why the discipline exists. It isn’t aesthetics or engineering hygiene. It’s pre-paid context, substrate the next session reads cheaply. Reading the substrate still costs tokens. But it costs fewer tokens than re-derivation, and what gets read is better than what you’d type in the moment.
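One rough way to see which commits will force a later session to re-derive intent is to flag subjects that carry no reason. The sketch below is a crude heuristic under stated assumptions: the list of vague subjects and the three-word threshold are invented for illustration, and it will wrongly flag some perfectly good short subjects.

```python
# Hypothetical list of subjects that name no reason for the change.
VAGUE_SUBJECTS = {"various fixes", "fixes", "wip", "misc", "updates", "changes"}


def is_vague(subject: str) -> bool:
    """Treat a subject as vague if it is a known filler or under three words."""
    cleaned = subject.strip().lower().rstrip(".")
    return cleaned in VAGUE_SUBJECTS or len(cleaned.split()) < 3


def flag_vague(subjects: list[str]) -> list[str]:
    """Return the commit subjects a future reader would have to decode."""
    return [s for s in subjects if is_vague(s)]
```

Fed the subject lines from something like `git log --format=%s`, this gives a quick list of the commits where the diff will have to be loaded to recover the why.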

Skip the prep and you pay full retail — the whole world, reloaded from nothing, every session.

That’s what burning tokens in three turns is — re-loading the world from scratch because nothing was laid down to be re-loaded from. The token cost of running a kitchen with no prep is exactly what you’d expect: brutal, repetitive, and getting worse the longer you keep doing it.

A well-prepped kitchen runs differently. The cook walks in to find the mise on the board, the stock simmering, the lamb broken down and resting — the day’s work already half done before the first ticket calls. The model walks into a prepped session and finds the wiki loaded, the routing clear, the principles already read. The tokens still spend — but they spend on the actual work of the session, not on re-deriving the context that lets the session happen at all.

Mise en place, the pass, prep before service — each is a token-economics principle in disguise. A way of organising scarce attention against a service window that won’t wait.

What this changes

The prompting question changes. You stop asking “what’s the best prompt for this task?” and start asking “what should this session not need to ask in the first place?” Anything you keep re-explaining to the session is a brick that wants laying. The repeated question is the system telling you what to write down next.

The cost calculation changes. A session that takes longer because it pauses to lay a brick is not a slow session — it’s an investment. The brick lowers the cost of every session that follows. Compound interest works the same way on tokens as on money. The operators getting blown out in three turns are paying compound debt. The operators who keep going for hours are earning compound interest. The difference between them isn’t the model or the framework or the rent. It’s whether anyone bothered to write anything down.
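The arithmetic behind that claim can be sketched with a simple break-even calculation. Every number here is hypothetical, and the model is deliberately plain: it counts a flat saving per session for a single brick rather than true compounding, so treat it as a lower bound on the argument, not a measurement.

```python
import math


def break_even_sessions(write_cost: int, read_cost: int, reexplain_cost: int) -> int:
    """Sessions needed before writing a brick costs less than re-explaining."""
    saving = reexplain_cost - read_cost
    if saving <= 0:
        raise ValueError("reading must cost less than re-explaining to break even")
    return math.ceil(write_cost / saving)


def cumulative_cost(sessions: int, write_cost: int, read_cost: int,
                    reexplain_cost: int) -> tuple[int, int]:
    """Return (tokens without the brick, tokens with the brick) over N sessions."""
    without_brick = sessions * reexplain_cost
    with_brick = write_cost + sessions * read_cost
    return without_brick, with_brick
```

With made-up figures of 3,000 tokens to write the brick, 400 to read it, and 1,500 to re-explain the same thing by hand, `break_even_sessions(3000, 400, 1500)` gives 3, and over ten sessions `cumulative_cost(10, 3000, 400, 1500)` gives 15,000 tokens without the brick against 7,000 with it.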

The model isn’t an oracle. It’s staff.

The relationship to the model itself changes. You stop asking it to remember things; you start writing them down. The model gets brilliant at execution because the operator has finally built the institutional memory the institution didn’t have.

The first move

If you’re burning tokens in three turns, the move isn’t another framework, another tutorial, another opinion from someone on X. It’s to start listening differently to your own sessions.

Watch the next session for the moment you find yourself re-explaining something — not a one-off detail, but the stuff that’s true about the project itself: the stack, the convention, the decision you made three weeks ago, and the thing you keep clarifying. The moment you catch yourself doing it, stop. Write the thing down in plain language. Commit it. Then point the next session at it.
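As a sketch of that first move, here is one way to write a brick to disk and prepare the commit. The `notes/` directory, the slug scheme, and the function names are assumptions for illustration only; the git commands are returned rather than run, so nothing touches your repository until you choose to run them.

```python
import re
from pathlib import Path


def lay_brick(root: Path, title: str, body: str) -> Path:
    """Write one plain-markdown note under a hypothetical notes/ directory."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = root / "notes" / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"# {title}\n\n{body.strip()}\n", encoding="utf-8")
    return path


def commit_commands(path: Path, reason: str) -> list[list[str]]:
    """Return the git commands for an atomic commit of this one note."""
    return [
        ["git", "add", str(path)],
        ["git", "commit", "-m", reason],
    ]
```

Pointing the next session at the brick is then a matter of telling it to read the file path that `lay_brick` returned before it starts work.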

That’s the entry. You haven’t built an operating system. You haven’t subscribed to a framework. You haven’t credentialled into anyone’s methodology. You’ve laid one brick — the one your sessions were already telling you to lay.

The next brick names itself the first time you catch yourself re-explaining something else. And the one after that. Pretty soon you’ve got a wiki. Pretty soon you’ve got routing. Pretty soon the model walks into a session already knowing where it is, what it’s building, who you are, what you’ve decided, what’s drifted and what’s clean. Sessions stop collapsing in three turns because the world is on disk now, where it belongs, and the model is reading what you wrote, not re-deriving it from your prompts. Reading isn’t free. But you’d be paying anyway — and reading what you wrote costs less than re-deriving from scratch.

That’s the architect stance. It isn’t faster because the AI got better — it’s the same model anyone else can call this afternoon. It’s faster because you did the prep.

What I’m not selling you

This isn’t a course. It isn’t a framework you have to download. It isn’t a methodology with my name on the front that you have to credential into. The whole point is that it’s portable — plain markdown, plain git, plain text. Whatever you build with these ideas belongs to you. The substrate compounds for the operator who lays it, not for whoever first wrote down the moves.

And I’m not religious about any of it. There are still days the substrate doesn’t catch what I need and I’m prompting from scratch like everyone else. The discipline isn’t perfection. The discipline is that the next time the same gap opens, there’s a brick where the prompt used to be.

What I’m doing is naming the physics so it’s visible. Once you can see it, you can’t unsee it. Every session you run after reading this will feel different. You’ll notice where you’re paying retail, and where the prep has lowered the bill. You’ll notice the bricks you’ve been meaning to lay. You’ll notice the corrections you keep making that haven’t been written down yet.

The X anguish is real — operators bleeding tokens trying to build serious things with a machine that wasn’t designed to remember them, taught a stance that mistakes the asking for the architecture. The fix isn’t a smarter prompt. The fix is a kitchen.

Lay one brick. See what happens.
