One weird trick is to tell the LLM to ask you questions about anything that’s unclear at this point. I tell it, e.g., to ask up to 10 questions. Often I do multiple rounds of these Q&A and I’m always surprised at the quality of the questions (w/ Opus). I get better results that way, just because it reduces the degrees of freedom in which the agent can go off in a totally wrong direction.
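A sketch of what that instruction can look like (exact wording is illustrative, not a magic formula):

  Before writing any code, ask me up to 10 clarifying questions about
  anything in this task that is ambiguous or underspecified. After I
  answer, ask follow-up questions if my answers raise new ones. Only
  start implementing once you have no open questions left.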
This is a little anthropomorphic. The faster option is to tell it to give you the full content of an ideal context for what you’re doing and adjust or expand as necessary. Less back and forth.
Can you give me the full content of the ideal context of what you mean here?
> One of the weird things I found out about agents is that they actually give up on fixing test failures and just disable tests. They’ll try once or twice and then give up.
It's important not to think in terms of generalities like this. How they approach this depends on your test framework, and even on the language you use. If disabling tests is easy and common in that language/framework, it's more likely to do it.
For testing a CLI, I currently use run_tests.sh, and never once has it tried to disable a test. Though that can be its own problem when it hits one it can't debug.
# run_tests.sh
# Handle multiple script arguments or default to all .sh files
scripts=("${@/#/./examples/}")
[ $# -eq 0 ] && scripts=(./examples/*.sh)
for script in "${scripts[@]}"; do
  "$script" || exit 1  # assumed loop body: run each script, stop on first failure
done
echo " OK"
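Usage is just ./run_tests.sh foo.sh bar.sh to run specific scripts out of examples/ (the names here are made up), or ./run_tests.sh with no arguments to run every .sh file in that directory.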
----
Another tip: for a specific task, don't bother with "please read file x.md". Claude Code (and others) accept the @file syntax, which puts that file into context right away.
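For example (file name hypothetical), instead of

  Please read docs/plan.md, then implement step 3.

you write

  Implement step 3 of @docs/plan.md

and the file contents land in the context immediately, instead of hoping the agent decides to go read it.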
If I paid for my API usage directly instead of the plan it'd be like a second mortgage.
To be fair, allocating some tokens for planning (recursively) helps a lot. It requires more hands-on work, but produces much better results. Clarifying the tasks and breaking them down is very helpful too; you just end up spending lots of time on it. On the bright side, Qwen3 30B is quite decent, and best of all, "free".
> If you are a heavy user, you should use pay-as-you-go pricing
If you're a heavy user, you should pay for a monthly subscription for Claude Code, which is significantly cheaper than API costs.
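Rough, illustrative numbers: heavy agent use can easily burn a few dollars per hour in Opus/Sonnet tokens at API rates, so $30-50 a day of that comes to roughly $1,000-1,500/month pay-as-you-go, versus a flat monthly subscription costing a small fraction of that.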
Am I alone in spending $1k+/month on tokens? It feels like the most useful dollars I've ever spent in my life. The software I've been able to build on a whim over the last 6 months is beyond my wildest dreams from a year or two ago.
> The software I've been able to build on a whim over the last 6 months is beyond my wildest dreams from a year or two ago.
If you don't mind sharing, I'm really curious - what kind of things do you build and what is your skillset?
Care to show what you've built?
This lines up with my own experience of learning how to succeed with LLMs. What really makes them work isn't so different from what leads to success in any setting: being careful up front, measuring twice and cutting once.
I spent much of the last several months using LLM agents to create software. I've written two blog posts about my experience; this is the second post that includes all the things I've learned along the way to get better results, or at least waste less money.
You should write more about your experience using LLMs. Is this solely using LLMs?
> I’m not a professional developer, just a hobbyist with aspirations
Stopped reading.
If you kept reading you'd realize the guy was just humblebragging.
> I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones.
Why?
I guess you need an active developer license to write blog posts
Or maybe this industry still trusts experienced software engineers to write well-maintained, robust software that is used by millions and makes money.
It's quite simple.
I prefer building and using software that is robust, heavily tested, and thoroughly reviewed by highly experienced software engineers who understand the code, can detect bugs, and can explain what each line of code they write does.
We are now in the phase where embracing mediocre LLM-generated code over heavily tested and scrutinized code is encouraged in this industry, because of the hype around 'vibe coding'.
If you can't even begin to explain the code or point out bugs generated by LLMs, or you off-load architectural decisions to them, you're going to have a big problem explaining that in code review situations or in a professional pair-programming scenario.
> I prefer building and using software that is robust, heavily tested, and thoroughly reviewed by highly experienced software engineers who understand the code, can detect bugs, and can explain what each line of code they write does.
That's amazing. By that logic you probably use like one or two pieces of software, max. No Windows, macOS, or GNOME for you.
LOL... I was going to say, after working in the tech industry, half the time it's a rat's nest in there.
There are excellent engineers, but there are also many not-so-great engineers, and once the sausage is made it usually isn't a pretty picture inside.
Usually it's only small, young projects, or maybe a beautiful component or two. Almost never an entire system/application.
Unfortunately, all of modern software depends on some random obscure dependency that is not properly reviewed: https://xkcd.com/2347/