From Test-Driven to Loop-Driven Development
The evolution of AI coding from tests and prompts to agents, harnesses, and supervised loops.
Software engineering was always a loop. You change something, run it, check the result, and repeat. Test-driven development made this loop explicit: write a failing test, make it pass, then refactor. The loop is small, fast, and grounded in feedback.
Other development practices widened the same idea. Behaviour-Driven Development (BDD) moved the loop toward shared behavior and examples. Acceptance testing moved it toward user-visible definitions of done. CI moved it into the delivery pipeline. The shape stayed the same: define the next bit of intent, run the system, check the result, and tighten the design.
AI is widening the loop again. The thing inside one iteration used to be a line, a function, or a failing test. Now it can be a task, a pull request, a migration, or a recurring workflow. That is what I mean by loop-driven development: the engineer designs the trigger, goal, context, harness, verifier, and state around an agent loop.

This is not a claim that agents can safely own arbitrary software delivery. It is the opposite. The more autonomy you give the loop, the stronger the checks have to become. TDD did not remove engineering judgment. It pushed judgment into tests and refactoring. Loop-driven development does the same at a larger scale.
The progression is additive
The useful parts of each era do not disappear. Each era keeps the previous layer and adds a new control surface.
The progression looks like this:
code completion -> prompt loop -> repo context -> harness -> supervised loop
The unit of work keeps getting wider, and the engineer’s leverage point keeps moving up.
1. Autocomplete
2021-2022
Autocomplete put the model inside the editor. GitHub Copilot made this mainstream by drawing context from the code being edited and suggesting whole lines or functions. Cursor Tab belongs in the same era. It is the completion side of Cursor, where the model predicts the next edit and the developer accepts or rejects it while writing code.
The loop still lives mostly in the developer’s hands. You type, inspect the suggestion, accept or reject it, and continue. The benefit is speed: less boilerplate, fewer mechanical edits, and faster movement through familiar code. The limit is scope. The model helps with the next edit, but it does not own the task.
What got added:
Model
Local file context
Inline completion
This was the Autocomplete era.
2. Prompt Engineering
2022-2023
The next step was to move from completion to task steering. ReAct was not a coding assistant, but it gave agents an important primitive: reason, act, observe, repeat. The model could think about a step, call a tool, read the result, and continue.
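The reason, act, observe, repeat cycle can be sketched in a few lines. This is a minimal illustration, not the ReAct paper's implementation: `scripted_model`, the `TOOLS` registry, and the `word_count` tool are hypothetical stand-ins, and a real agent would replace `scripted_model` with a call to a language model.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function that takes one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def scripted_model(history: list[str]) -> dict:
    """Stand-in for a model call; a real agent would send `history` to an LLM."""
    if not any(line.startswith("observation:") for line in history):
        return {
            "thought": "I need the word count.",
            "tool": "word_count",
            "arg": "reason act observe repeat",
        }
    # The latest history entry is the tool observation; use it as the answer.
    return {"thought": "I have the count.", "answer": history[-1].split(": ", 1)[1]}


def react_loop(goal: str, model: Callable[[list[str]], dict], max_steps: int = 5) -> str:
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        step = model(history)                        # reason
        history.append(f"thought: {step['thought']}")
        if "answer" in step:                         # stop condition
            return step["answer"]
        result = TOOLS[step["tool"]](step["arg"])    # act
        history.append(f"observation: {result}")     # observe, then repeat
    raise RuntimeError("step budget exhausted without an answer")


print(react_loop("count the words", scripted_model))  # prints: 4
```

The `max_steps` budget is an assumption added for the sketch. It keeps the loop from running forever if the model never produces an answer.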
AutoGPT made the idea feel autonomous. Instead of asking for one answer, you gave the system a goal and let it prompt itself. That shift created the first native discipline of this era: prompt engineering. The developer was no longer only writing code. The developer was writing instructions that caused code to be written.
I still believe this skill never fully goes away. In one form or another, you have to know how to talk to these models well, and those techniques are what I have been collecting in my Prompt Patterns book.
The benefit was delegation at the task level. You could ask for a script, a test suite, an investigation, or a migration plan. The limit was convergence. A prompt loop without disciplined context and a reliable stop condition can drift, repeat itself, or optimize for the wrong thing.
What got added:
✓ Model
Tools
Goal
Prompt loop
This was the Prompt Engineering era.
3. Context Engineering
2024-2025
Once agents could act, the bottleneck became what they could see. A coding agent needs repo context, not only a prompt. It needs files, tests, logs, conventions, architecture notes, issue history, and the current state of the work.
This is where Cursor Agent, Devin, and Ralph-style loops fit. Cursor Agent moves beyond tab completion into autonomous coding tasks, terminal commands, and file edits. Devin is positioned as an autonomous software engineer that can write, run, and test code. Ralph made a narrower but important point: durable state should live in files and git, not only in the chat transcript.
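The point about durable state can be made concrete with a small sketch. It assumes a hypothetical `progress.json` file and a made-up task list, and it is not how Ralph itself is implemented. Each run reads the file, does one unit of work, and writes the file back, so a fresh run with an empty chat transcript still knows where to resume. Committing the file to git is left out to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read progress from disk, or start fresh if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    # Hypothetical starting plan, used only for this illustration.
    return {"done": [], "todo": ["write tests", "implement", "update docs"]}


def save_state(path: Path, state: dict) -> None:
    """Persist progress so the next run does not depend on chat history."""
    path.write_text(json.dumps(state, indent=2))


def run_one_iteration(path: Path) -> dict:
    """One agent run: pick the next task, mark it done, persist the result."""
    state = load_state(path)
    if state["todo"]:
        task = state["todo"].pop(0)
        state["done"].append(task)  # a real agent would do the work here
    save_state(path, state)
    return state


with tempfile.TemporaryDirectory() as tmp:
    progress = Path(tmp) / "progress.json"
    run_one_iteration(progress)                 # first run starts from the default plan
    print(run_one_iteration(progress)["done"])  # prints: ['write tests', 'implement']
```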
The benefit is scope. Agents can work across files, run commands, inspect failures, and make repo-aware changes. The limit is that context is not correctness. A well-contextualized agent can still finish the wrong task unless the environment can tell it what done means.
I wrote about the broader tool landscape in AI Coding Assistants Landscape. The pattern that matters here is the move from assistant-in-editor to agent-in-codebase.
What got added:
✓ Model
✓ Tools
✓ Goal
Repo context
Terminal / files
Tests
This was the Context Engineering era.
4. Harness Engineering
2025-2026
A harness is the environment a single agent runs inside. It includes the prompt, repo context, tools, sandbox, permissions, tests, linters, type checks, CI, evals, and review gates. The point of a harness is not to make the model magically correct. The point is to make the work observable, constrained, and checkable.
OpenAI Codex is a clear example. Codex runs in isolated cloud containers, works against the provided repository, edits files, runs commands, and can propose changes for review. OpenAI’s own harness engineering write-up describes the role shift directly: engineers design environments, specify intent, and build feedback loops that let agents do reliable work. Claude Code fits the same era from the terminal side: it understands a codebase, edits files, runs commands, and handles git workflows.
This is why I think the harness framing is more useful than another prompt taxonomy. In 12 Agentic Harness Patterns from Claude Code, I broke the harness into reusable patterns: persistent instructions, scoped context, memory tiers, tool permissions, lifecycle hooks, and workflow separation. In The Missing Quality Layer for AI Coding Agents, I argued that the next bottleneck is proving the diff is safe to review, not generating the diff.
The benefit is repeatability. The agent no longer just generates code. It runs inside a system that can reject bad work. Deterministic checks should come first: tests, builds, type checks, lint, contract tests, benchmarks, screenshots, traces, and CI. Model-based judging can help with subjective checks, but maker and checker should be separated. These checks matter most when they push back on the agent in the moment rather than after the fact, giving the loop backpressure so it self-corrects before a human has to step in.
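A deterministic check layer might look like the sketch below. The names `CheckResult`, `run_checks`, and `feedback_for_agent` are hypothetical, and the commands in `CHECKS` are placeholders that only simulate a build, a test run, and a linter so the example runs anywhere. A real project would substitute its own commands. The checks run in order, stop at the first failure, and turn that failure into text the agent can act on in the same loop.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: list[tuple[str, list[str]]]) -> list[CheckResult]:
    """Run each command in order and stop at the first failure."""
    results = []
    for name, command in checks:
        proc = subprocess.run(command, capture_output=True, text=True)
        result = CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)
        results.append(result)
        if not result.passed:
            break  # fail fast so feedback reaches the agent quickly
    return results


def feedback_for_agent(results: list[CheckResult]) -> str:
    """Summarize the first failing check as text to feed back into the loop."""
    failed = [r for r in results if not r.passed]
    if not failed:
        return "all checks passed"
    return f"check '{failed[0].name}' failed:\n{failed[0].output}"


# Placeholder commands (assumptions): each one simulates a real check.
CHECKS = [
    ("build", [sys.executable, "-c", "print('build ok')"]),
    ("tests", [sys.executable, "-c", "import sys; print('1 test failed'); sys.exit(1)"]),
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
]

print(feedback_for_agent(run_checks(CHECKS)))
```

Because the pass or fail decision comes from process exit codes rather than from the agent's own report, the checker stays separate from the maker.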
What got added:
✓ Model
✓ Tools
✓ Goal
✓ Repo context
Sandbox
Verifier
CI / eval harness
This was the Harness Engineering era.
5. Loop Engineering
Now
Once the harness is reliable, the next layer is the loop that runs it. A loop is not just an automation. An automation executes fixed steps. A loop has a decision inside it. It checks whether the goal is met and decides whether to continue.
A practical agent loop has five parts:
Trigger: human kickoff, schedule, or event
Goal: the desired end state
Harness: the environment the agent runs in
Verifier: the check that decides whether to continue
State: memory outside the current model call
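The five parts above can be sketched as one small structure. This is an illustration under stated assumptions, not the API of any tool mentioned in this post: `AgentLoop`, `fake_harness`, and the `max_iterations` budget are hypothetical names, and `fake_harness` stands in for a real agent run.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentLoop:
    goal: str                                   # the desired end state
    harness: Callable[[str, dict], str]         # runs one agent attempt, returns its output
    verifier: Callable[[str], bool]             # decides whether the goal is met
    state: dict = field(default_factory=dict)   # memory that outlives a single model call
    max_iterations: int = 5                     # budget so a bad loop cannot run forever

    def run(self) -> bool:
        """The trigger is whoever calls run(): a human, a scheduler, or an event handler."""
        for attempt in range(1, self.max_iterations + 1):
            output = self.harness(self.goal, self.state)
            self.state["attempts"] = attempt
            self.state["last_output"] = output
            if self.verifier(output):  # the decision that makes this a loop
                return True
        return False  # budget exhausted: escalate to a human


def fake_harness(goal: str, state: dict) -> str:
    """Stand-in for an agent run: pretends to fix one failing test per attempt."""
    remaining = state.get("failing_tests", 3) - 1
    state["failing_tests"] = remaining
    return f"{remaining} failing"


loop = AgentLoop(
    goal="all tests pass",
    harness=fake_harness,
    verifier=lambda output: output == "0 failing",
)
print(loop.run(), loop.state["attempts"])  # prints: True 3
```

The verifier sits outside the harness on purpose. The agent's output is only data, and the loop continues or stops based on a check the agent does not control.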
This is where current tools are converging. Codex has /goal for long-running work with a verifiable stopping condition and Automations for recurring tasks. Claude Code has /goal, /loop, and scheduled tasks for recurring work. MCP gives agents a standard way to connect to external tools and data sources. Addy Osmani’s Loop Engineering framing captures the same ingredient set: automations, worktrees, skills, connectors, sub-agents, and memory.
The benefit is leverage. A loop can watch CI, triage issues, update dependencies, fix flaky tests, chase review feedback, prepare PRs, and keep working until a condition holds. The risk grows with the leverage. A bad prompt wastes a turn. A bad loop can waste hours, mutate the repo, and generate a pile of plausible work that still needs human judgment.
Skills and playbooks become especially important here. A loop that has to rediscover project conventions every run is fragile. A loop that can call well-scoped skills has a better chance of doing the same work consistently. That is why I see skills as part of the loop substrate, not only better prompt files. I wrote more about that in 9 Principles That Separate Useful Skills from Markdown Essays.
What got added:
✓ Model
✓ Tools
✓ Goal
✓ Repo context
✓ Verifier
Automations
Worktrees
Skills / playbooks
Connectors (MCP)
Durable memory
Orchestration
This is the Loop Engineering era.
The Engineering Leverage Stack
What the stack below shows is not five tools but a single control point moving up. Each era lets you author less of the code directly and more of the system that produces it, trading fine-grained control for reach. The higher you stand, the more a single decision is worth, and the more it leans on the checks in the layers beneath it.

A harness is the environment for one agent run. A loop is the control system around that harness. A factory is a system of loops: one loop finds work, another implements it, another verifies it, another opens or updates the PR, and another escalates what needs human judgment.
That is the leverage shift: each layer wraps the one below it.
The important word is wrap. Prompting did not replace coding. Context did not replace prompting. Harnesses did not replace context. Loops do not replace harnesses. Each layer wraps the one below and changes where engineering judgment is applied.
What loop-driven development means
Loop-driven development is TDD at a larger unit of intent. In TDD, the loop wraps a unit of behavior: write the failing test, make it pass, refactor. In loop-driven development, the loop can wrap a task, a PR, a migration, or a recurring workflow.
The verifier is the difference between a loop and a vibe. Without a verifier, you have repeated prompting. With a verifier, the loop can converge. The verifier can be deterministic, like tests and builds, or probabilistic, like a separate reviewer model, but it has to exist outside the agent’s desire to be done.
This is also where the human role becomes more important, not less. The engineer chooses the goal, designs the context, sets the permissions, defines the checks, reviews the result, and decides what risks are acceptable. The loop can run faster than you can type, but it cannot decide what should matter.
The takeaway
Software was always written in a loop. TDD made the loop explicit around behavior. BDD and acceptance testing widened it toward product intent. AI is widening it again around agents, harnesses, and recurring workflows.
That is the shift from test-driven to loop-driven development.
Not because tests stop mattering. Because tests, evals, reviewers, sandboxes, worktrees, skills, memory, and CI are becoming parts of a larger loop.
Build the loop. Stay the engineer.