Well said, I have been saying the same. Besides helping agents code, it helps us trust the outcome more. You can't trust untested code, and you can't read every line of code; that would be like walking a motorcycle. So tests (back pressure, deterministic feedback) become essential. You only know something works as well as its tests show.
What we often like to do in a PR, looking over the code and saying "LGTM", I call "vibe testing", and I think it is the truly bad pattern to use with AI. You can't commit your eyes to the git repo, and you are probably not doing as good a job as you would with actual test coverage. LGTM is just vibes.
My mental model is that AI coding tools are machines that take a set of constraints and turn them into a piece of code. The better you get at having it give itself those constraints accurately, the higher-level the tasks you can focus on.
E.g., compiler errors, unit tests, MCP, etc.
I've heard of these but haven't tried them yet.
https://github.com/hmans/beans https://github.com/steveyegge/gastown
Right now I spend a lot of “back pressure” on fitting the scope of the task into something that will fit in one context window (i.e., the useful computation, not the raw token count). I suspect we will see a large breakthrough when someone finally figures out a good system for having the LLM do this.
This jumps to proof assistants and barely mentions fuzzing. I've found that with a bit of guidance, Claude is pretty good at suggesting interesting properties to test and writing property tests to verify that invariants hold.
If you give Claude examples of good and bad property tests, and explain why, it gets much better than it was out of the box.
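As a rough illustration of the good-versus-bad distinction (a sketch, not taken from the comment): a weak property would only check that `rle_encode` returns a list, while a stronger one checks a round-trip invariant plus structural invariants. The run-length encoder is a hypothetical function under test, and the random-input loop is a hand-rolled stand-in for a property-testing library such as Hypothesis, which would also shrink failing inputs.

```python
import random
from typing import List, Tuple

Runs = List[Tuple[str, int]]

def rle_encode(s: str) -> Runs:
    """Hypothetical function under test: run-length encode a string."""
    runs: Runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))  # start a new run
    return runs

def rle_decode(runs: Runs) -> str:
    """Inverse of rle_encode."""
    return "".join(ch * n for ch, n in runs)

def check_rle_properties(trials: int = 1000, seed: int = 0) -> None:
    """Check invariants on many random inputs; raises AssertionError on the first failure."""
    rng = random.Random(seed)  # fixed seed keeps the check deterministic
    for _ in range(trials):
        # A small alphabet makes repeated characters (the interesting case) common.
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))
        runs = rle_encode(s)
        # Round trip: decoding the encoding gives back the input.
        assert rle_decode(runs) == s, f"round trip failed for {s!r}"
        # Structure: every run is non-empty and adjacent runs use different characters.
        assert all(n > 0 for _, n in runs), f"empty run for {s!r}"
        assert all(a[0] != b[0] for a, b in zip(runs, runs[1:])), f"unmerged runs for {s!r}"

check_rle_properties()
```

Showing the model a pair like this (a trivially-true property next to an invariant that can actually fail) is one way to give it the "good and bad examples" the comment describes.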
With Visual Studio and Copilot, I like that it runs a command, reads the output back, and automatically continues based on the error message. Say there's a compilation error or a failed test case: it reads it and feeds that back into the system automatically. Once the plan is satisfied, it marks it as completed.
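The loop described above (run a command, read the output, feed errors back, stop when the check passes) can be sketched roughly like this. It is not how Visual Studio or Copilot implement it; `propose_fix` is a hypothetical callback standing in for the model, and the demo uses a toy script in place of a real build or test suite.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, List, Tuple

def run_check(cmd: List[str]) -> Tuple[bool, str]:
    """Run a deterministic check (compiler, test runner, linter) and capture its output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def feedback_loop(cmd: List[str], propose_fix: Callable[[str], None], max_iters: int = 5) -> bool:
    """Re-run `cmd` until it passes, handing each failure's output to `propose_fix`.

    `propose_fix` is a hypothetical stand-in for the model: it receives the error
    text and is expected to edit the code before the next run.
    """
    for _ in range(max_iters):
        ok, output = run_check(cmd)
        if ok:
            return True
        propose_fix(output)
    # One last check after the final proposed fix.
    return run_check(cmd)[0]

def demo() -> Tuple[bool, List[str]]:
    """Toy run: a failing 'test' script gets fixed after one round of feedback."""
    with tempfile.TemporaryDirectory() as d:
        script = os.path.join(d, "check.py")
        with open(script, "w") as f:
            f.write("assert 1 + 1 == 3, 'math is broken'\n")
        seen: List[str] = []

        def fake_model(output: str) -> None:
            seen.append(output)  # the error text is what gets fed back
            with open(script, "w") as f:
                f.write("assert 1 + 1 == 2\n")

        ok = feedback_loop([sys.executable, script], fake_model)
        return ok, seen
```

The `max_iters` cap matters in practice: without it, a model that keeps producing the same broken fix would loop forever.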
Beyond linting and shell exec (gh, Playwright, etc.), what other tools have you found useful for your tasks, HN?
Most of my feedback that can be automated is done either by this or by fuzzing. Would love to hear about other optimisations y'all have found.
Running all sorts of tests (e2e, API, unit), and for web apps, using the Claude extension for Chrome to trigger web UI actions and observe the result. The last part helps a lot with frontend development.
Teaching them skills for running API and e2e tests, and how to filter those tests, so they can quickly check whether what they did works.
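One way to make the "filter those tests" part concrete is a small helper that turns the files an agent just touched into a narrow pytest command. This is a sketch under an assumed project layout (a module `foo.py` is tested by `tests/test_foo.py`); the helper name is hypothetical, while `-q`, `-x`, and `-k` are standard pytest options.

```python
from pathlib import PurePosixPath
from typing import List, Optional

def targeted_test_cmd(changed_files: List[str], keyword: Optional[str] = None) -> List[str]:
    """Build a narrow pytest command covering only the files an agent just touched.

    Assumes a hypothetical layout where a module `foo.py` is tested by `tests/test_foo.py`.
    """
    targets: List[str] = []
    for f in changed_files:
        p = PurePosixPath(f)
        if p.suffix != ".py":
            continue  # docs, configs, etc. have no mapped test file
        # A changed test file is run directly; a source file maps to its test file.
        candidate = str(p) if p.name.startswith("test_") else f"tests/test_{p.stem}.py"
        if candidate not in targets:
            targets.append(candidate)
    # -q: quiet output, -x: stop at the first failure so feedback arrives fast.
    cmd = ["pytest", "-q", "-x"]
    if keyword:
        cmd += ["-k", keyword]  # further filter by test-name expression
    # With no mapped targets, pytest uses its default discovery (the whole suite).
    return cmd + targets
```

A skill file could then tell the agent to run this narrow command after each edit and reserve the full suite for the end of the task.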
Back pressure is not a good name for this. You already listed one that makes more sense: “feedback”.
This use of the term “back pressure” is pretty confusing in a computer science context.
Yeah, I spent way too long trying to think of how what the author was talking about was related to back pressure... I had a very stretched metaphor going until I realized he wasn't talking about back pressure at all.
Others have pointed out the incongruity of back pressure here, I would have loved “feedback”.
I thought you were talking about the back pressure pipes in my housing complex.
I’ve been wondering why I can’t use it to generate electricity.
"Back pressure" is already a term widely used in computing for something entirely different: https://schmidscience.com/what-does-back-pressure-in-compute...
I am not sure if I am missing something, since many people have made this comment, but isn't this in some ways similar in shape to the traditional definition of back pressure, and not "entirely different"? A downstream consumer can't make its way through the queue of work to be done, so it pushes work back upstream, to you.
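For comparison, here is a minimal sketch of back pressure in the traditional computing sense: a bounded buffer between a fast producer and a slow consumer. When the buffer is full, the producer either blocks or is told to retry, so the downstream limit propagates upstream. The function names and parameters are illustrative, not from the linked article.

```python
import queue
import threading
import time
from typing import List

def try_submit(q: queue.Queue, item: int) -> bool:
    """Non-blocking submit: False means downstream is full and the caller must slow down."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False  # back pressure surfaced to the producer

def run_pipeline(items: int = 5, maxsize: int = 2, delay: float = 0.01) -> List[int]:
    """Fast producer, slow consumer, bounded buffer: blocking put() throttles the producer."""
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    consumed: List[int] = []

    def consumer() -> None:
        while True:
            item = q.get()
            if item is None:  # sentinel: producer is done
                break
            time.sleep(delay)  # simulate slow downstream work
            consumed.append(item)

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(items):
        q.put(i)  # blocks while the buffer is full, so the producer runs at the consumer's pace
    q.put(None)
    t.join()
    return consumed
```

Read this way, the analogy in the comment is that a failing test plays the role of `queue.Full`: the downstream check refuses the work, and the pressure lands back on whoever produced it.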
I have the same argument with “crypto”
And web 3? ;)