Agents of productivity and chaos

May, 2026 ∙ 11 minute read

It’s been over a year since I published Creative Flow vs. Critical Review, mostly because much of my writing on AI has been internal. I’m hoping to publish a bit more here and wanted to start by sharing a little of my agent setup (as of May 2026). A version of this post was originally published on an internal discussion board under the title “Agents of productivity and chaos - multi-repo, multi-agent learnings and workflows”. For context, I’m mostly working in the GitHub Copilot App and the GitHub Copilot CLI, across a few dozen repos. A couple of skills have been doing a lot of heavy lifting for me lately: one for planning multi-agent projects and another for delegating work. They show up a few times below, so if you don’t read any further, check them out!

Agents multiply effort, but if you’re going in the wrong direction they don’t help at all; they make it worse! Nobody wants to go faster in the wrong direction. Agents are also prone to making messes. They tend towards more code and more chaos. Entropy and all that. If you want to code with agents, one of your primary tasks is to counter both of those behaviors. You need to:

The key is setting direction, steering, validating, and iterating on your processes. You’ve got to a) set them up for success, b) validate the outputs, and c) feed back what works and what doesn’t. I’m NOT perfect at this (see the failure modes section below), but I have learned a lot in the last few months. Here’s how I’m thinking about all this right now:

Failure modes

Not everything is golden in the world of agentic coding. Here are just a few of the things I’m struggling with right now, for discussion:

In some ways, our code and coding process have always had these problems. GitHub itself was coded by thousands of engineers over almost two decades: it is the very definition of legacy code. Agents mean you get legacy code like that in a few weeks, days, hours (?). There’s no silver bullet, but I expect there are better abstractions yet to be discovered.


Examples

A few real examples of this process in action…

Decomposing a 1,800-line function

A vibe-coded Rust service I work on had a match in handle_client_message that had grown to ~1,800 lines and 228 message-type arms — the next likely candidate in a string of tokio worker stack-overflow incidents we’d been dealing with. I turned on clippy::large_futures at 16 KiB during diagnosis; the function blew past it immediately, which gave me a concrete lint to anchor on.

The fix was relatively mechanical, but this code base moves forward so quickly that it was important to sequence out a small series of patches.

  1. Introduce the futures size lint as a hard backstop so that we get CI failures, not application crashes.
  2. Land some repo-level skill changes so that everyone else’s agents know how to avoid the bad pattern going forward.
  3. Refactor the large match in a series of 7 sequential PRs designed to minimize merge conflicts, immediately reduce the stack sizes in this code path, and be authored largely in parallel by agents.

handle_client_message went from 16,296 bytes to 664 bytes, and the chain fed a pile of lessons back into the planning-multi-agent-projects and delegating-plan-work skills (the same skills that helped plan and execute the work to begin with).
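To illustrate why a big match in an async function inflates its future, and one common way to shrink it, here is a minimal sketch. The names (`Msg`, `handle_ping`, `handle_upload`, `dispatch_inline`, `dispatch_boxed`) are hypothetical stand-ins, not the service’s real code. Boxing each arm with `Box::pin` is the remedy Clippy itself suggests for `large_futures`; it is shown here as an assumption and is not necessarily what the seven PRs did.

```rust
use std::mem::size_of_val;

// Hypothetical message type; the real function had 228 arms.
pub enum Msg {
    Ping,
    Upload,
}

async fn tick() {}

// Each hypothetical handler keeps a buffer alive across an await point,
// so the buffer has to be stored inside the handler's future.
async fn handle_ping() -> usize {
    let mut buf = [0u8; 4096];
    buf[0] = 1;
    tick().await;
    buf.iter().map(|&b| b as usize).sum()
}

async fn handle_upload() -> usize {
    let mut buf = [0u8; 8192];
    buf[0] = 1;
    tick().await;
    buf.iter().map(|&b| b as usize).sum()
}

// Awaiting handlers inline embeds their state in the dispatcher's future,
// so that future is at least as large as its biggest arm.
pub async fn dispatch_inline(msg: Msg) -> usize {
    match msg {
        Msg::Ping => handle_ping().await,
        Msg::Upload => handle_upload().await,
    }
}

// Boxing each arm moves the handler state to the heap; while waiting,
// the dispatcher's future only needs to hold a pointer.
pub async fn dispatch_boxed(msg: Msg) -> usize {
    match msg {
        Msg::Ping => Box::pin(handle_ping()).await,
        Msg::Upload => Box::pin(handle_upload()).await,
    }
}

// Sizes in bytes of the two (unpolled) dispatcher futures: (inline, boxed).
pub fn future_sizes() -> (usize, usize) {
    (
        size_of_val(&dispatch_inline(Msg::Ping)),
        size_of_val(&dispatch_boxed(Msg::Ping)),
    )
}
```

Calling `future_sizes()` makes the difference visible without a runtime, since the futures are measured but never polled. The trade-off is one heap allocation per boxed call. Moving arms into smaller functions can also help, but only if their futures are not awaited inline by the caller.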

Running Copilot CLI inside GitHub Actions

The GitHub Copilot App vendors the Rust copilot-sdk, and the vendored copy needs to track upstream multiple times a day. We do this with a mix of deterministic and agentic Actions workflows.

  1. A standard Actions workflow just syncs any upstream changes into our vendored copy (obeying some rules about additional code we have that we haven’t upstreamed yet).
  2. That job pushes and opens a PR, assigns some humans as reviewers, and CCR starts reviewing as well.
  3. If everything is green and there are no review comments: we’re done.
  4. If anything fails, another action running the Copilot CLI kicks off and does the work to fix any breaking changes. It addresses CI failures, review comments, etc., and pushes to the same PR when it’s done.
  5. Everything still requires a human review and approval before merging.
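The branch between steps 3 and 4 is a small decision rule, sketched below. This is only an illustration of the logic described above: the types (`CiStatus`, `SyncPr`, `NextStep`) and the `next_step` function are hypothetical, and the real workflows are Actions definitions, not Rust code.

```rust
// Hypothetical model of the sync PR's state after CI and review have run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CiStatus {
    Passing,
    Failing,
}

#[derive(Debug, Clone, Copy)]
pub struct SyncPr {
    pub ci: CiStatus,
    pub open_review_comments: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    // Nothing left to automate; a human still has to approve and merge.
    AwaitHumanApproval,
    // Hand the PR to the agentic workflow to fix failures and comments.
    RunAgentFix,
}

// Green CI and no review comments means the deterministic path was enough.
// Anything else triggers the agentic fix-up, which pushes to the same PR.
pub fn next_step(pr: SyncPr) -> NextStep {
    if pr.ci == CiStatus::Passing && pr.open_review_comments == 0 {
        NextStep::AwaitHumanApproval
    } else {
        NextStep::RunAgentFix
    }
}
```

Note that neither outcome merges anything: both paths end at a human review, matching step 5.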

A complexity-first rewrite

The vibe-coded diff viewing feature in the GitHub Copilot App had cost functions that grew with the size of the underlying diffs. To fix this, we didn’t need a faster diff algorithm or more caching (there were already too many layers of unnecessary caching from overeager agents trying to make things better). What we needed was some computer science fundamentals: complexity analysis and appropriate data structures.

We’re still untangling this, but the process so far: we wrote a multi-agent plan from the output of /research as a series of markdown documents laying out a phased approach to refactoring the full-stack feature. We wanted the complexity contract to be O(V), where V is the size of the viewport (the visible diff), not the size of the entire patch. We also developed a skill that holds agents to that contract and pairs with some deterministic tests to verify behavior. The skill and associated design documentation now live in the repo as markdown files for humans and agents to reference. Scrolling large diffs is much improved, but we’re still working through the misguided caching and unnecessary code complexity, moving carefully so we don’t break end users.
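To make the O(V) contract concrete, here is a minimal sketch of what a deterministic check could look like. The `DiffLines` type and its `viewport` method are hypothetical and much simpler than the real feature; the sketch assumes the diff is already flattened into one entry per line and counts how many lines a viewport request touches.

```rust
// Hypothetical flattened diff: one entry per rendered line.
pub struct DiffLines {
    lines: Vec<String>,
}

impl DiffLines {
    pub fn new(lines: Vec<String>) -> Self {
        Self { lines }
    }

    /// Returns the visible rows plus the number of lines touched to build
    /// them. Slicing by index means the work depends on `height` (V),
    /// not on the total number of lines in the patch.
    pub fn viewport(&self, first: usize, height: usize) -> (Vec<&str>, usize) {
        // Clamp the requested window to the bounds of the diff.
        let start = first.min(self.lines.len());
        let end = start.saturating_add(height).min(self.lines.len());

        let mut touched = 0;
        let rows: Vec<&str> = self.lines[start..end]
            .iter()
            .map(|line| {
                touched += 1;
                line.as_str()
            })
            .collect();
        (rows, touched)
    }
}
```

A test can then build a small patch and a very large one, request the same viewport height from each, and assert that the touched count is identical. That kind of check fails loudly if a later change starts scanning the whole patch again.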


  This post was written with the help of AI (Claude Opus 4.7). The vast majority of the text was handwritten, but I had an agent copy edit, fill in links, and resolve placeholders (e.g. TODO: grab this stat from datadog). See my ai attribution page for more.

Building GitHub since 2011, programming language connoisseur, Marin resident, aspiring surfer, father of two, life partner to @ktkates. All words by me, Tim Clem.