Last month I shipped 37 story points across two repos in a single day. I wrote maybe 20 lines of code. The rest was written, tested, reviewed, and merged by AI agents I configured and directed. This is not a future-state pitch. This is how I build software right now.

The Old Way Was Already Dead

Six months ago my workflow looked like that of most developers who had adopted AI tooling. I would open a file, write some code, lean on Copilot or Claude for autocomplete, review what it suggested, tweak it, test it, commit. AI was my assistant. I was still the one holding the keyboard.

That model caps out fast. You are still the bottleneck. You still context-switch between writing, reviewing, testing, and deploying. You still get pulled into the mechanical parts of software development that have nothing to do with solving the actual problem. I kept thinking: if the AI can write a function, why can't it write the whole feature? Why am I still the one copying file paths and running test suites?

So I stopped. I rebuilt my entire workflow around AI agents that own the full software development lifecycle, from spec to merged PR. I use this system daily to build OnPlane, an AI golf coaching app, across five repositories. Here is how it works.

Specs Become Agent-Ready: The /enrich-issue Process

Agents cannot work from vague tickets. "Add user preferences" means nothing to a system that needs exact file paths, test expectations, and acceptance criteria. So I built a skill called /enrich-issue that transforms rough human tickets into agent-ready specifications.

I write a one-line ticket: "Add club selection persistence to practice sessions." The enrich skill reads the codebase, identifies every file that will need to change, determines which services are involved, writes concrete test cases, defines acceptance criteria, and sizes the issue. What comes out is a structured spec that an agent can execute without asking me a single clarifying question.
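
To make that concrete, here is a sketch of what an enriched spec might look like for that ticket. The shape is illustrative, not the skill's actual output format, and every file path, test case, and field name below is a hypothetical stand-in:

```yaml
# Hypothetical /enrich-issue output; structure and paths are illustrative.
title: Add club selection persistence to practice sessions
size: M  # 5 story points, roughly 1-2 hours of agent time
files:
  - backend/services/practice_session.py      # add club_id to session model
  - backend/api/routes/sessions.py            # accept club_id on create/update
  - mobile/Features/Practice/ClubPicker.swift # persist last selection
tests:
  - creating a session with a club_id stores and returns it
  - omitting club_id falls back to the previous session's club
acceptance:
  - selected club survives app restart and appears in session history
```

The point is that every ambiguity a human would resolve by asking a question is resolved up front, in writing, before an agent ever sees the ticket.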

The sizing matters. I do not size issues for humans. An M-sized issue (5 story points) means 1-2 hours of agent time. That typically includes a database schema change, a new service layer, an API endpoint, and full test coverage. That is a day of focused human work compressed into an agent-sized unit.

The Execution Loop: /execute-issue End-to-End

Once a ticket is enriched, I run /execute-issue. This single skill runs the full SDLC autonomously:

  1. Create a git worktree off the main branch so the agent works in isolation
  2. Implement the feature following the enriched spec, writing production code and tests
  3. Run make quality before any commit — linting, type checking, test suite, the works
  4. Commit with conventional commits enforced by the rules system
  5. Create a pull request with a structured description
  6. Spawn a code-reviewer subagent that reviews the PR against project standards
  7. Address review feedback if the reviewer flags issues
  8. Merge once all checks pass
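
The control flow of the eight steps above can be sketched in a few lines. This is a simplified model, not the skill itself: the step functions stand in for agent calls, and a real run fixes code in place rather than restarting from scratch:

```python
# Sketch of the /execute-issue loop. implement, quality_gate, open_pr,
# review, and merge are hypothetical stand-ins for agent actions; only
# the retry-until-green control flow mirrors the real skill.

def execute_issue(implement, quality_gate, open_pr, review, merge,
                  max_attempts=3):
    """Run the SDLC loop: implement, gate, PR, review, merge.

    Returns True on a merged PR, False once the attempt budget runs out.
    """
    for attempt in range(max_attempts):
        implement()                # write code + tests in an isolated worktree
        if not quality_gate():     # make quality: lint, types, test suite
            continue               # agent reads the failure and tries again
        pr = open_pr()             # structured PR description
        feedback = review(pr)      # code-reviewer subagent
        if feedback:               # reviewer flagged issues
            continue               # address feedback, loop again
        merge(pr)                  # all checks green
        return True
    return False                   # budget exhausted, escalate to a human
```

The attempt cap is what keeps a stuck agent from looping forever; a human only gets pulled in when the loop gives up.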

The whole loop runs without me touching it. I get a Slack notification when the PR is ready for final review, and most of the time the code-reviewer subagent has already approved it. If something fails — a test breaks, CI rejects a check — the agent reads the error, fixes the code, and retries. I built in per-issue CI budgets, scaled to issue size, so agents do not burn through unlimited pipeline minutes on a stuck problem.
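
A budget like that can be as simple as a lookup keyed by issue size. The numbers here are illustrative placeholders, not my actual limits:

```python
# Hypothetical CI budget per issue size; the run counts are made up
# for illustration.
CI_BUDGET = {"S": 3, "M": 5, "L": 8}  # max pipeline runs per issue


def can_run_ci(size: str, runs_so_far: int) -> bool:
    """Allow another CI run only while the issue is under its budget.

    Unknown sizes fall back to the smallest budget, so a mis-sized
    issue fails safe rather than burning minutes.
    """
    return runs_so_far < CI_BUDGET.get(size, CI_BUDGET["S"])
```

When the budget is exhausted, the right move is a notification to a human, not another retry.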

Multi-Agent Coordination

Single agents hit limits on complex features that span multiple services. So I run multi-agent teams. A team lead agent reads the enriched spec and breaks it into sub-tasks: one for the backend agent, one for the mobile agent. Each specialist works in its own worktree, against its own repo, following repo-specific rules.
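
The decomposition step can be sketched as a simple grouping of the spec's file list by repo. This is an illustrative model of what a team lead agent produces, with hypothetical field names, not the actual implementation:

```python
# Hypothetical sketch of a team-lead agent splitting an enriched spec
# into per-repo sub-tasks. The spec shape and worktree naming are
# illustrative.

def split_spec(spec: dict) -> list[dict]:
    """Group a spec's file changes into one sub-task per repo."""
    by_repo: dict[str, list[str]] = {}
    for path in spec["files"]:
        repo = path.split("/", 1)[0]      # e.g. "backend" or "mobile"
        by_repo.setdefault(repo, []).append(path)
    return [
        {"repo": repo, "files": files, "worktree": f"../wt-{repo}"}
        for repo, files in by_repo.items()
    ]
```

Each resulting sub-task then goes to the matching specialist agent, which works in its own worktree under its own repo's rules.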

This works because of how I structure configuration. I use a layered .claude/ directory across every repo with shared rules symlinked from a central location. Conventional commits, OpenAPI standards, telemetry conventions — these are universal. Repo-specific rules layer on top: the backend repo enforces database migration patterns, the mobile repo enforces SwiftUI architecture conventions.
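
In filesystem terms, the layering might look like this. The directory names are illustrative stand-ins for my actual central rules location:

```shell
# Hypothetical layout of the layered .claude/ setup; "shared-rules" and
# the rule file names are illustrative.
mkdir -p shared-rules backend-repo/.claude/rules

# A universal rule lives in the central location...
echo "Use conventional commits." > shared-rules/commits.md

# ...and is symlinked into each repo's .claude/rules/
ln -sf "$(pwd)/shared-rules/commits.md" backend-repo/.claude/rules/commits.md

# Repo-specific rules layer on top as plain files
echo "Every schema change ships as a migration." \
  > backend-repo/.claude/rules/migrations.md
```

Editing the central file updates every repo at once; repo-specific rules never leak across repos because they are plain files, not links.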

The permission model is explicit. Each repo's settings.json has an allowlist and denylist for what agents can do. Backend agents can run database migrations. Mobile agents cannot. No agent can force-push to main. This is not trust — it is engineering.
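
A backend repo's settings.json might express that policy like this. The shape follows Claude Code's permissions format, but the specific allow and deny patterns are illustrative:

```json
{
  "permissions": {
    "allow": [
      "Bash(make quality)",
      "Bash(alembic upgrade *)",
      "Bash(git push origin feature/*)"
    ],
    "deny": [
      "Bash(git push --force*)",
      "Bash(git push origin main*)"
    ]
  }
}
```

The mobile repo's file would simply omit the migration entry from its allowlist, so the guardrail is enforced by configuration rather than by hoping the agent behaves.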

Quality Without Micromanagement

People hear "AI writes my code" and assume quality drops. The opposite happened. Every commit goes through make quality before it is even staged. Every PR gets reviewed by a code-reviewer subagent that checks against the same standards I would: test coverage, error handling, naming conventions, API contract compliance. If the reviewer rejects the PR, the implementing agent fixes the issues and resubmits.

The rules system is the backbone. My .claude/ configuration encodes years of engineering opinions: how to structure error responses, when to use transactions, what telemetry to emit, how to name database columns. Agents follow these rules every time. Humans forget. Humans get lazy on a Friday afternoon. The rules system does not.

I also use checkpoint files for context management across long sessions. When an agent finishes a task, it writes a checkpoint summarizing what was done, what state the codebase is in, and what comes next. If I need to restart a session or hand context to a different agent, the checkpoint file gets it up to speed in seconds instead of re-reading the entire codebase.
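
A checkpoint can be a short markdown file. Everything below, including the issue name and branch, is a hypothetical example of the format rather than a real checkpoint:

```markdown
<!-- Hypothetical checkpoint file; contents are illustrative -->
# Checkpoint: club selection persistence

## Done
- Added club_id column plus migration; backend tests green
- Service layer and API endpoint implemented

## Current state
- Worktree ../wt-mobile on branch feature/club-picker
- make quality passing; PR not yet opened

## Next
- Wire ClubPicker.swift to the new endpoint, then open the mobile PR
```

Any agent, or any future session, can read that file and resume exactly where the last one stopped.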

The Infrastructure That Makes It Work

Agents are only as good as the information they can access. I run MCP server integrations that give agents direct access to the systems they need: Grafana for observability, AWS for infrastructure state, Postgres for database schema introspection, Slack for notifications, Notion for project documentation. When a backend agent needs to understand the current database schema before writing a migration, it queries Postgres directly. No copy-pasting. No stale documentation.
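
Wiring that up is mostly declarative. A project-level MCP configuration might look like the fragment below; the server packages follow the naming of the public MCP reference servers, but the connection string and token handling here are illustrative:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://localhost/onplane"
      ]
    },
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": { "SLACK_BOT_TOKEN": "${SLACK_BOT_TOKEN}" }
    }
  }
}
```

Once declared, the servers' tools show up to the agent automatically; the agent decides when to query the schema or post to Slack, within the permission model described earlier.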

This connectivity is what separates a toy demo from a production workflow. An agent that can read your monitoring dashboards, check your infrastructure state, and query your actual database schema makes fundamentally better decisions than one working from a markdown file you updated three weeks ago.

What I Actually Do Now

My day looks nothing like a traditional developer's day. I spend mornings reviewing what agents shipped overnight — reading PRs, checking metrics, verifying that new features behave correctly in staging. Then I spend time on the things that actually require a human: talking to users, deciding what to build next, designing system architecture, writing the rough tickets that /enrich-issue will transform into agent-ready specs.

I still write code occasionally. When I am exploring a new architecture pattern, prototyping an interaction model, or debugging something genuinely novel, I will open an editor. But the mechanical work — the implementation, the boilerplate, the test writing, the PR cycle — that belongs to agents now.

The shift is not from "writing code" to "not writing code." It is from "I build software" to "I direct teams that build software." The team just happens to be AI.

What This Means for Engineering Leaders

If you manage engineers, this should change how you think about capacity planning. A single developer with a well-configured agent system can sustain the output of a small team. Not because the AI is magic, but because the human bottleneck — context switching, mechanical implementation, review cycles — gets removed.

The investment is in configuration, not in prompting. I spent weeks building my .claude/ rules, my skills, my MCP integrations. That is the moat. Anyone can type "build me a login page" into an AI chat. Very few people have built the layered configuration system that lets an agent autonomously implement a feature across multiple services, test it, review it, and ship it while respecting your team's specific engineering standards.

The developers who thrive in this model are the ones who think in systems. You need to understand architecture deeply enough to write the rules agents follow. You need to size and spec work in a way that agents can execute cleanly. You need to build quality gates that catch problems before they hit production. The job shifts from writing code to engineering the system that writes code.

I am not going back. The 37-point day was not an outlier — it was a Tuesday. This is just how I build software now.