Rules of Engagement
How to maintain code quality and mastery when working with LLMs
Context
I’ve tried vibe coding, and I’m not sold. It seems great at first, but all of a sudden, you realize you’re dealing with brand new legacy code that you’re completely unfamiliar with. So I put together some rules of engagement: when to use, and when not to use, LLM-assisted coding.
The purpose of these rules is, in short, to use LLMs to make your life easier as a developer without compromising the quality of your code or your deep understanding of it.
Rules
Focus on what AI is good at: initial setup, reading a lot very quickly, and throwaway code.
DO offload large reading tasks.
You can hand off an absolutely insane quantity of code, documentation, etc. to an LLM and get useful information out of it almost instantly. You can also ask questions about it and get answers that are pretty accurate, although not perfect. Or ask for a summary of details specific to your use case.
However, while reading through documentation yourself takes much longer, I don’t generally find it to be a waste of time. You might accidentally learn something unrelated to your current goals, but like, you’re still learning. This is how you become knowledgeable about things—you explicitly give up some of that benefit when you read only AI summaries containing just what you need in the moment.
Do NOT use AI as a first-line reviewer.
Review things yourself first, whether it’s your own changes or someone (or something) else’s. Jumping to AI review first means you’ll be reading all of its findings blind, with limited ability to judge their correctness.
It can also direct focus to the wrong place, which you may not even know is happening. For example: AI brings up problems in file X. You look at that file, and after some back and forth, determine that there’s no real issue in practice. You chalk it up to a false positive, but the whole exchange implicitly hints to you that file Y is fine—but it isn’t! Another way to say this is that it adds noise, which makes false negatives by the AI harder to catch.
However, once you’ve reviewed something and determined that it LGTM, absolutely let an agent take a pass at it. You will find mostly false positives, which you can now easily dismiss, because you’re already familiar with the changes. Any true positives will be easy to spot, and easy wins.
DO let agents go wild in separate, easily deleted places.
If an agent’s output is entirely contained to a single file, or single directory, that can be deleted at any time, without affecting any other functionality: go nuts!
This is a strategy that enables some interesting behavior. Want a mockup of some design or functionality as a standalone piece? Have an agent make an isolated file with the entire thing. Have it make 5 versions, so you can compare. Try out a refactor by asking for a rewritten version of something in a separate file. Be vigilant not to fall for the sunk cost fallacy—even if this output is good, it’s meant to be thrown away. At most, use it as a reference when you make the changes yourself.
Do NOT let agents add code to your codebase.
This will probably be the controversial one. Yes, I actually think that, as a rule, you shouldn’t let a coding agent make changes to your code. There are obviously exceptions, like the initial setup of a new project, or filling in pseudocode (below). But overall, in my experience, letting agents make changes on your behalf is the root of all evil when it comes to (human) comprehension rot and cognitive debt.
Yes, seriously. Even if you pinky-promise to be very careful and review all the changes yourself. There’s something about opening a file, navigating to the right place, and typing the individual characters yourself that forces comprehension at a different level. It’s not magic; it just slows things down, which is the intention here.
Also, you’re not going to add 5,000 SLOC a day if you have to type them all. No one likes typing that much. That’s the point; if you’re doing that, you do not fully understand most of those lines anyway.
DO let AI fill in pseudocode for tests.
I feel differently about letting agents write tests for a few reasons:
- I’m lazy about tests. If I decide that I’m going to manually write all of them, then I just won’t write enough tests.
- There tends to be a lot of minutiae and ceremony specific to the tests themselves. They also tend to be details that I don’t care about, because the tests are “leaves”—there’s nothing downstream of them. As long as the test passes or fails correctly, then it just doesn’t matter that much.
- The actual process of writing “do this, then that, then assert this” is fucking boring.
That said, whenever I’ve fully delegated test-writing to AI, the results have been pretty awful. It tests a lot of things that don’t need testing, and seemingly skips tests that would be complex. These tools are trained on real code, so I guess that’s not too surprising.
I’ve had some success with prompts like “You’re the red team for testing, try to write failing tests that are still valid use cases”. (The phrase “red team” really seems to activate something in LLMs.) But ultimately, I think the best solution is to write out a test file in pseudocode, containing all the cases you can think of, and then have AI translate that into actual code.
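To make the pseudocode-to-tests workflow concrete, here’s a minimal sketch in Python. The `parse_price` function and all of the test cases are hypothetical, invented for illustration; the point is the shape of the handoff: you write the case list as comments, the agent fills in the mechanical parts.

```python
# Hand-written pseudocode test plan (the part YOU write):
#
#   test parse_price:
#     "19.99"  -> 19.99
#     "$19.99" -> 19.99 (currency symbol stripped)
#     "", "abc", "-5" -> ValueError
#
# ...then the agent translates it into real tests like the ones below.

def parse_price(raw: str) -> float:
    """Toy function under test, included so the sketch is self-contained."""
    cleaned = raw.strip().lstrip("$")
    if not cleaned:
        raise ValueError("empty price")
    value = float(cleaned)  # float() raises ValueError on junk like "abc"
    if value < 0:
        raise ValueError("negative price")
    return value

def test_plain_number():
    assert parse_price("19.99") == 19.99

def test_currency_symbol_stripped():
    assert parse_price("$19.99") == 19.99

def test_invalid_inputs_raise():
    for bad in ("", "abc", "-5"):
        try:
            parse_price(bad)
            assert False, f"expected ValueError for {bad!r}"
        except ValueError:
            pass  # expected

if __name__ == "__main__":
    test_plain_number()
    test_currency_symbol_stripped()
    test_invalid_inputs_raise()
```

Because the case list came from you, reviewing the agent’s output is just checking that each comment line became a faithful assertion—the boring ceremony is delegated, the thinking isn’t.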
Finally
These rules are not meant for agent-owned codebases, where no one truly knows what’s going on. They’re for human-owned code. The intent is that you, the human, can benefit from AI, without that detached feeling that comes from relinquishing too much control to AI.
Also, please note that none of these rules has the purpose of increasing speed. Where a rule does speed things up, that’s an accidental side effect of reducing uninteresting work.