Taming Agentic Refactoring with the Mikado Method

You tried. You handed your AI agent (Claude Code, Cursor, Copilot, pick one) a refactoring task on a real legacy codebase. And you watched a 147-file pull request appear where half the changes make no sense, where three failing tests were “adapted” instead of fixed, and where nobody on the team wants to dive in.

This isn’t a model problem. Agents do exactly what they were trained to do: move forward. Faced with an error, they patch. Faced with a red test, they adapt. Faced with an unexpected dependency, they work around it.

On a 200-line script, this works. On a 200,000-line legacy system, it’s destructive.

[Image: colorful Mikado sticks intertwined, in macro view. A metaphor for legacy code, where every piece depends on the others.]

Three failure modes, one mechanism

Over months of watching teams drive agents through legacy code, three patterns have come up consistently.

Change greediness. The agent doesn’t stop when it hits an obstacle. It absorbs it, adapts it, works around it. Each fix modifies more dependencies, which in turn require more fixes, until the PR becomes unmanageable. This isn’t bad intent: it’s local optimization without global vision.

Context scattering. Agents have a finite context window. On a refactor that progressively touches 50 files, the initial objective drifts out of the active context. The agent keeps moving forward, but toward what? The quality of decisions degrades silently, with no warning signal.

The illusion of green. The agent makes the tests pass, but by modifying them. It satisfies the type checker, but with # type: ignore comments and overly accommodating mocks. The CI pipeline stays green. The semantic guarantee has been silently eroded.

These three modes share a common mechanism: a system trained to progress, without the discipline to stop.

The solution has existed since 2014

In 2009, Ola Ellnestam and Daniel Brolund attempt a refactoring on a legacy Java codebase. It goes badly. Every change triggers three more, they accumulate broken code for days, and ultimately discard everything.

At that year’s SDC conference, they present their post-mortem. In the audience, Laurent Bossavit draws a parallel with the Mikado game: those stacked sticks you must remove one by one without moving the others. The name sticks. The book follows in 2014, published by Manning.

The method’s central insight fits in one sentence: you only understand a legacy codebase by deliberately breaking it.

Not by reading it. Not by drawing UML diagrams. By attempting changes, watching what breaks, and noting why. The compiler and the test suite are far more precise sensors than any amount of reading or diagramming.

The four rules

The Mikado method rests on four rules.

1. Define a precise, testable goal. “Improve the architecture” doesn’t qualify. “Extract billing logic into an injectable, database-free service” does. You must know without ambiguity when the goal is reached.
2. Attempt the goal directly, naively. No upfront analysis, no design document. Just try. Most of the time it breaks: that’s exactly the point. The breakage reveals hidden dependencies.
3. At the first blocker, revert everything. git reset --hard HEAD. Return to a green state. This is the most counter-intuitive rule, and the most important. The last ten minutes of work aren’t lost: they’ve become information in the graph.
4. Record the blocker as a prerequisite, and start over on that prerequisite. The red test reveals coupling. That coupling becomes a node in the graph. Apply rules 2, 3, and 4 recursively until you reach leaves: prerequisites you can handle directly without breaking anything.

Once all leaves are resolved, you climb back up the graph. The original goal, which seemed impossible, becomes mechanical.

What this looks like in practice

Take a concrete example. Goal: extract send_invoice() from a monolithic OrderService.

Exploration. We try directly, with no upfront analysis. Compilation fails in three places. git reset --hard HEAD. We open mikado.md and record the three blockers as candidate nodes.

Building the graph. We explore each node with the same rule: try, observe, revert.

P1: write tests → breaks nothing: leaf ✓
P3: inject smtp_client as a parameter → breaks nothing either: leaf ✓
P2: isolate DB access → breaks again. git reset. Digging deeper: the DB access depends on a non-injectable class, so a sub-node surfaces:
  - L2: create an injectable CustomerRepository → breaks nothing: leaf ✓

The graph is stable.
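The exploration phase can be sketched as a small Python simulation of rules 2 to 4. This is not a real tool. The hypothetical attempt callback stands in for "make the change, run the compiler and tests, revert if red". It returns the blockers it revealed, and an empty list means the change went through cleanly. The outcomes dictionary simply replays the invoice example.

```python
from typing import Callable

# Hypothetical sketch: node -> prerequisites discovered when trying it.
Graph = dict[str, list[str]]


def explore(goal: str, attempt: Callable[[str], list[str]]) -> Graph:
    """Build a Mikado graph: try each node naively (rule 2), assume the
    callback reverts on failure (rule 3), and queue every blocker it
    reports as a prerequisite to explore in turn (rule 4)."""
    graph: Graph = {}
    pending = [goal]
    while pending:
        node = pending.pop()
        if node in graph:  # already explored through another path
            continue
        blockers = attempt(node)
        graph[node] = blockers
        pending.extend(blockers)
    return graph


def leaves(graph: Graph) -> list[str]:
    """Nodes whose attempt broke nothing: safe to implement directly."""
    return [node for node, prereqs in graph.items() if not prereqs]


# Simulated result of each naive attempt in the invoice example.
outcomes = {
    "extract send_invoice()": ["P1 write tests", "P2 isolate DB access", "P3 inject smtp_client"],
    "P2 isolate DB access": ["L2 injectable CustomerRepository"],
}
graph = explore("extract send_invoice()", lambda node: outcomes.get(node, []))
print(sorted(leaves(graph)))
# ['L2 injectable CustomerRepository', 'P1 write tests', 'P3 inject smtp_client']
```

The exploration stops once every pending node has been tried. At that point the graph is stable. P2 only becomes a leaf later, after L2 is done.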

The graph looks like this:

[Figure: Mikado graph with five nodes. P1 (leaf, write tests) and P3 (leaf, inject smtp_client) at the top, L2 (leaf, create CustomerRepository) in the middle, P2 (prerequisite, isolate DB access) below it, and the goal "Extract send_invoice()" at the bottom. Resolution order: P1, L2, P2, P3, then the goal.]

Goal at the bottom, leaves (in green) at the top. Follow the arrows to understand dependencies. Resolve in the opposite direction.
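Resolving "in the opposite direction" amounts to a depth-first post-order walk: a node is handled only once all of its prerequisites are done. Below is a minimal sketch, with the example graph hard-coded as a dictionary and the node labels shortened.

```python
# Each node maps to the prerequisites that must be done before it.
graph = {
    "goal: extract send_invoice()": ["P1", "P2", "P3"],
    "P1": [],        # write tests
    "P2": ["L2"],    # isolate DB access, blocked by L2
    "L2": [],        # create an injectable CustomerRepository
    "P3": [],        # inject smtp_client as a parameter
}


def resolution_order(graph: dict[str, list[str]], goal: str) -> list[str]:
    """Depth-first post-order: every prerequisite comes before its parent.
    Assumes the graph has no cycles, as a Mikado graph should not."""
    order: list[str] = []
    done: set[str] = set()

    def visit(node: str) -> None:
        if node in done:
            return
        for prereq in graph[node]:
            visit(prereq)
        done.add(node)
        order.append(node)

    visit(goal)
    return order


print(resolution_order(graph, "goal: extract send_invoice()"))
# ['P1', 'L2', 'P2', 'P3', 'goal: extract send_invoice()']
```

Any order in which every prerequisite precedes its parent is valid. This one matches the figure because the goal lists its prerequisites as P1, P2, P3. Python's standard graphlib.TopologicalSorter does the same job, though without a guarantee on which valid order it returns.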

Resolving the leaves.

P1: write the tests, commit ✓
L2: create the injectable CustomerRepository, commit ✓
P2: now a leaf, wire in the repository, commit ✓
P3: inject smtp_client as a parameter, commit ✓

The final extraction is then trivial: every obstacle has been removed one by one, and each commit stands on its own, reviewable in ten minutes.

Each of those commits is what the method calls a leaf: a change you can make without breaking anything else. You never work on anything but a leaf.
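To make the end state concrete, here is one way the extracted function could look once L2, P2 and P3 are done. Only send_invoice, CustomerRepository and smtp_client come from the example. The Order and Customer shapes, the find_by_id and send methods, and the in-memory fakes are hypothetical. The point is that both collaborators arrive as parameters, so a test can exercise the billing logic without a database.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical domain types, for illustration only.
@dataclass
class Customer:
    id: int
    email: str


@dataclass
class Order:
    id: int
    customer_id: int
    total: float


class CustomerRepository(Protocol):
    """Node L2: data access behind an injectable interface (method name assumed)."""
    def find_by_id(self, customer_id: int) -> Customer: ...


class SmtpClient(Protocol):
    """Node P3: whatever sends mail; only a send() method is assumed."""
    def send(self, to: str, subject: str, body: str) -> None: ...


def send_invoice(order: Order, customers: CustomerRepository, smtp_client: SmtpClient) -> None:
    """Extracted from OrderService: no hidden DB access, no global SMTP client."""
    customer = customers.find_by_id(order.customer_id)
    smtp_client.send(
        to=customer.email,
        subject=f"Invoice for order {order.id}",
        body=f"Amount due: {order.total:.2f}",
    )


# Database-free test doubles (hypothetical).
class InMemoryCustomers:
    def __init__(self, *customers: Customer) -> None:
        self._by_id = {c.id: c for c in customers}

    def find_by_id(self, customer_id: int) -> Customer:
        return self._by_id[customer_id]


class RecordingSmtp:
    def __init__(self) -> None:
        self.sent: list[tuple[str, str, str]] = []

    def send(self, to: str, subject: str, body: str) -> None:
        self.sent.append((to, subject, body))


smtp = RecordingSmtp()
send_invoice(Order(id=7, customer_id=1, total=42.5),
             InMemoryCustomers(Customer(id=1, email="ada@example.com")), smtp)
print(smtp.sent)
# [('ada@example.com', 'Invoice for order 7', 'Amount due: 42.50')]
```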

Why this method is precisely built for AI agents

The alignment between agent failure modes and Mikado’s invariants isn’t accidental. It’s the same class of problem: maintaining discipline in a system that optimizes local progress.

Systematic revert compensates for change greediness. Instead of letting the agent accumulate cascading patches, you give it a clear rule: if it breaks, note it and undo. Failure becomes an exploitable signal.
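The "note it and undo" rule can be enforced by the harness that runs the agent rather than left to the agent's judgment. Below is a minimal sketch. It assumes a git working tree and a single check command (pytest, a type checker, a build) that decides whether the code is green. try_change and its signature are hypothetical. Note that git reset --hard HEAD only restores tracked files, so files created by the attempt stay in place. It would also wipe any uncommitted edit to a tracked mikado.md, which is why the blocker is recorded after the revert, as rule 4 does.

```python
import subprocess
from typing import Callable


def try_change(repo: str, apply_change: Callable[[], None],
               check_cmd: list[str]) -> tuple[bool, str]:
    """Hypothetical harness step: apply a change, run the checks, and on
    failure revert the working tree to HEAD. Returns (green?, output) so
    the caller can record the failure as a blocker in the Mikado graph."""
    apply_change()
    result = subprocess.run(check_cmd, cwd=repo, capture_output=True, text=True)
    if result.returncode == 0:
        return True, result.stdout
    # Rule 3: back to the last green state. Only tracked files are reset;
    # untracked files created by the attempt are left in place.
    subprocess.run(["git", "reset", "--hard", "HEAD"], cwd=repo,
                   check=True, capture_output=True)
    return False, result.stdout + result.stderr
```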

The graph externalizes context the agent can’t maintain. A versioned mikado.md file acts as persistent memory between sessions. The agent doesn’t keep the global strategy in its context window: it reads it from the graph at the start of each iteration.
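The article does not fix a format for mikado.md. Assume a nested Markdown checklist: the goal at the root, each prerequisite indented two spaces under the node it blocks, and [x] once done. Reading the graph at the start of an iteration could then look like the sketch below. It parses the file and returns the unchecked nodes whose prerequisites are all checked, which are the only ones the agent should touch next. Both the format and the parser are assumptions.

```python
import re
from dataclasses import dataclass, field

# Assumed mikado.md format: "- [ ] label" or "- [x] label", 2 spaces per level.
ITEM = re.compile(r"^(?P<indent> *)- \[(?P<mark>[ xX])\] (?P<label>.+)$")


@dataclass
class Node:
    label: str
    done: bool
    children: list["Node"] = field(default_factory=list)


def parse_mikado(text: str) -> list[Node]:
    """Turn the nested checklist into a tree of nodes."""
    roots: list[Node] = []
    stack: list[tuple[int, Node]] = []  # (depth, node) of the current ancestors
    for line in text.splitlines():
        m = ITEM.match(line)
        if not m:
            continue  # ignore headings, prose and blank lines
        depth = len(m["indent"]) // 2
        node = Node(m["label"].strip(), m["mark"] in "xX")
        while stack and stack[-1][0] >= depth:
            stack.pop()
        (stack[-1][1].children if stack else roots).append(node)
        stack.append((depth, node))
    return roots


def next_leaves(nodes: list[Node]) -> list[str]:
    """Unchecked nodes whose prerequisites are all checked: safe to work on now."""
    found: list[str] = []
    for node in nodes:
        if node.done:
            continue
        if all(child.done for child in node.children):
            found.append(node.label)
        found.extend(next_leaves(node.children))
    return found


MIKADO_MD = """\
# Goal
- [ ] Extract send_invoice()
  - [x] P1: write tests
  - [ ] P2: isolate DB access
    - [x] L2: create an injectable CustomerRepository
  - [ ] P3: inject smtp_client as a parameter
"""
print(next_leaves(parse_mikado(MIKADO_MD)))
# ['P2: isolate DB access', 'P3: inject smtp_client as a parameter']
```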

Mikado makes PRs reviewable. Without structure, the agent produces monster PRs. With Mikado, each leaf becomes an atomic unit: 3-5 files, one commit, ten minutes of review. Agent velocity no longer translates into review debt for the human.

What this changes concretely

Developers: the cognitive load of review drops dramatically. You’re reviewing atomic units with clear context, not 47-file PRs with no common thread.
Tech leads: the Mikado graph is a tactical dashboard. At a glance, you see the dependencies revealed so far, what’s left to handle, and where the agent stands at any moment.
CTOs and VPs of Engineering: it’s an auditable artifact. It documents the refactoring trajectory, identified risks, validated steps. In a context where every team must justify its AI usage, this artifact has real strategic value.

The Mikado method isn’t new. It was designed for human developers making exactly the same mistakes our agents make today. It’s not your agents that need improving. It’s the framework within which you run them.

The Mikado Method skill is now published: see what it enforces on the agent and why. A follow-up article will walk through a complete practical case.

Further reading: Ola Ellnestam and Daniel Brolund, The Mikado Method (Manning, 2014). Official website: mikadomethod.info