Context Discipline Is the Skill

Every few weeks the same complaint surfaces on X. Someone pastes eight hundred lines of code into a chat window, asks a vague question, gets a four-thousand-token answer, does it again, and runs out of allowance before lunch. Then comes the verdict: the model is too expensive, the window is too small, the thing is unusable for real work.

I read these and recognise the shape, because I made the same mistake for months. The complaint is real. The diagnosis is wrong.

The model is fine. The window is enormous. The cost only spikes when you force the model to re-derive everything it needs every single turn instead of routing it to where the answer already lives. You’re not paying for the model. You’re paying for synthesis you could have externalised.

That’s not a model problem. It’s a discipline problem. And the discipline is teachable.

The reframe

Here’s the thing nobody selling AI workflows wants to say plainly: a fresh chat window is the most expensive way to think.

Every turn, the model holds whatever you’ve pasted into the conversation and reasons across all of it again. Paste eight hundred lines of code and ask three questions, and you’ve paid to load and re-read those eight hundred lines three times. The window isn’t the constraint. The habit is. You’re using the conversation as your memory, your filing system, and your reasoning surface all at once — and it’s the worst possible tool for the first two jobs.

Context discipline is the practice of keeping those jobs separate. The model reasons. Everything else lives somewhere it can be fetched, not carried.

The evidence

I keep a real measurement, so let me show you one. A session from earlier today — seven turns, one live deploy shipped, full operational arc start to finish.

TurnCostWork
”Tell me about Joan”60kPure synthesis — the outlier
”Show Joan’s briefing”12kTool call, paraphrase
”Client 360 on a site”33kMulti-document load and synthesis
”Pull GSC and ten content ideas”15kTwo data pulls, two tables
”Draft the title proposals”12kAnalysis behind a decision gate
”Execute, recover, file the bug”31kFour edits, a failed verify, a recovery, a logged finding
”Commit and push”9kAtomic commit, parallel-session check

The whole session cost roughly 172,000 tokens. Twenty-two per cent of a one-million-token window. And look at where the money went: the turn that actually shipped work — four edits, a failure caught, a recovery, a finding filed — cost 31k. The turn where I asked the model to explain a concept from scratch cost 60k. Nearly double.

172k
tokens for the entire session — strategic read, agentic work with recovery, atomic commit, push. The arc that produced a live deploy cost less than three of the explaining turns combined.

The expensive turns weren’t the ones that did things. They were the ones that asked the model to generate understanding it could have looked up. That gap is the whole article.

The Matthew principle, in token form

The seven turns are one session. Step back to the whole day and the compounding shows up in the bill.

Four hours of working time. Three Claude Code sessions running in parallel, a strategic conversation going alongside them. The time the model actually spent thinking across all of it: twenty-two minutes. The cost: $11.28 — six per cent of the weekly limit. Ninety-four per cent of the week still sitting there for tomorrow, Sunday, Monday.

$11.28
Four hours of work across three parallel sessions and a strategic conversation. Twenty-two minutes of model time. Six per cent of the weekly limit — the other ninety-four still on the table.

The wall-clock-to-thinking ratio is roughly ten to one. Four hours of me working compresses into twenty-two minutes of the model thinking, because the model spends most of the clock waiting — reading files, running commands, hitting Netlify — while I read the output, decide the next move, and name the gates. The substrate sets a pace a person can actually sustain.

Run the same four hours without the discipline and you’d be at thirty or forty per cent of the weekly limit, not six. Pasting code into chat, asking for explanations, reloading state every session, externalising nothing. Same model. Same window. A completely different bill.

That’s the Matthew principle in token form: to the one who spends the resource well, more is given. Spend it badly and you get locked into surface-skim work by your own bill — no room to run an agentic loop, no room to verify your own output, no room to recover from a failure or file a finding the moment it surfaces. Spend it well and all of that runs on the same allowance.

Same allowance. A completely different depth of work.

Claude Code’s own dashboard teaches the discipline almost in passing. “Longer sessions are more expensive even when cached. /compact mid-task, /clear when switching to new tasks.” That is context discipline by another name. “Each subagent runs its own requests. Be deliberate about spawning them.” That is route-don’t-carry — every subagent is another kernel load if you let it be. The mechanism is named in plain sight, for anyone who reads it.

To the one who has, more is given.

Three patterns

The discipline comes down to three habits. None of them are clever. All of them are the difference between two-turn burnout and a full day’s work inside a fifth of the window.

Route, don’t carry

The instruction file at the centre of my system doesn’t contain what it knows. It contains an index of where to find it. The answer is in this wiki page. The state is in this database. The history is in this commit. The model loads the index once and then knows where to look.

Contrast that with the X complaint. Paste eight hundred lines into chat and the model now carries all eight hundred, every turn, until the session dies — synthesising across the lot whether or not the current question touches it. The cost compounds with each exchange. Two turns, two hundred thousand tokens, and you’ve barely started.

Externalise memory, and give it structure

I run five surfaces for five kinds of state. A wiki for methodology. A routing file for what loads when. A working-memory file for accumulated corrections. Database models for live numbers. Git for trajectory. Each one does a single job, and each one lives outside the chat.

Use the conversation as your only memory and you’ve picked the most expensive surface there is — it reloads every turn, has no schema, can’t be searched, and forgets the moment the window closes. A database query costs almost nothing and returns the same answer every time. The chat costs everything and returns a slightly different answer each time you ask.

Let tools replace synthesis

When I want to know what’s on the floor, I don’t ask the model to reason about the state of the business. I run a command. It queries the database, assembles the structured answer, and hands the model something to translate rather than something to invent. That’s why the briefing turn cost 12k and the explain-it-to-me turn cost 60k. One ingested a structured result and paraphrased it. The other built understanding from nothing.

Externalise the computation. Have the model translate, not generate. The most expensive thing you can ask an LLM to do is think through state it could have queried.

The expensive shape

Look back at that table and notice which turn was the outlier. “Tell me about Joan” — 60k. Pure synthesis. No tool, no lookup, just the model building an explanation from the ground up.

Here’s the uncomfortable part: that’s the shape of the easiest prompt to write. “Explain this.” “Summarise that.” “Think through my situation.” Every one asks the model to generate from scratch what discipline would have routed to. The pure-synthesis prompt is the natural one to reach for and the expensive one to run. The allowance doesn’t burn on hard work. It burns on questions a lookup could have answered.

The cost I almost got wrong

I’ll own the cautionary tale, because it’s the honest centre of this. I spent real effort recently worried about how much context my system loads at boot — convinced the kernel was getting heavy, that the startup cost was the thing to cut.

Then I measured it. Boot context is small against a million-token window. The work the model does each turn dwarfs whatever the kernel costs to load. I’d been optimising the cheap thing. The kernel pays for itself many times over precisely because it routes instead of carries — and the people complaining about cost are paying for the absence of that routing, not the presence of a few thousand tokens of index.

That’s the line I’d staple to every “AI is too expensive” thread if I could. The model is a commodity, available to everyone on the same terms. What separates two turns from a full day’s shipped work isn’t the model. It’s whether you built somewhere for the context to live.

The door, not the pitch

I built an operating system on top of this — a named cast, a wiki, a routing layer, a boot sequence. But the system isn’t the point of this piece. The discipline is. Westworld is one expression of context discipline. There are others. You could build a far simpler version this afternoon with a folder of notes, a habit of routing instead of pasting, and a refusal to use the chat window as your filing cabinet.

The discipline is teachable. The substrate is just the proof that it works.


Related: The Imperial Rise of the Context Engineer names the practitioner this piece describes the daily habit of. The Five-Layer Stack is the externalised-memory architecture in full. Stop the Documentation Sprawl is route-don’t-carry applied to the knowledge base. The Correction Loop shows how the routing surface compounds, one fix at a time. Why You’re Burning Tokens in Three Turns is the companion piece — the same X-anguish read as the architect’s stance rather than the measured bill. Ingeniculture is the name for all of it.

← Back to Writing