41 comments

  • nilirl a day ago

    > AI Agents can initiate workflows independently and determine their sequence and combination dynamically

    I'm confused.

    A workflow has hardcoded branching paths; explicit if conditions and instructions on how to behave if true.

    So for an agent, instead of specifying explicit if conditions, you specify outcomes and you leave the LLM to figure out what if conditions apply and how to deal with them?

    In the case of this resume screening application, would I just provide the ability to make API calls and then add this to the prompt: "Decide what a good fit would be."?

    Are there any serious applications built this way? Or am I missing something?

    • rybosome a day ago

      AI code generation tools work like this.

      Let me reword your phrasing slightly to make an illustrative point:

      > so for an employee, instead of specifying explicit if conditions, you specify outcomes and you leave the human to figure out what if conditions apply and how to deal with them?

      > Are there any serious applications built this way?

      We have managed to build robust, reliable systems on top of fallible, mistake-riddled, hallucinating, fabricating, egotistical, hormonal humans. Surely we can handle a little non-determinism in our computer programs? :)

      In all seriousness, having spent the last few years employed in this world, I feel that LLM non-determinism is an engineering problem just like the non-determinism of making an HTTP request. It’s not one we have prior art on dealing with in this field admittedly, but that’s what is so exciting about it.

      • nilirl 21 hours ago

        Yes, I see your analogy between fallible humans and fallible AI.

        It's not the non-determinism that was bothering me, it was the decision making capability. I didn't understand what kinds of decisions I can rely on an LLM to make.

        For example, with the resume screening application from the post, where would I draw the line between the agent and the human?

        - If I gave the AI agent access to HR data and employee communications, would it be able to decide when to create a job description?

        - And design the job description itself?

        - And email an opening round of questions for the candidate to get a better sense of the candidates who apply?

        Do I treat an AI agent just like I would a human new to the job? Keep working on it until I can trust it to make domain-specific decisions?

        • rybosome 19 hours ago

          The honest answer is that we are still figuring out where to draw the line between an agent and a human, because that line is constantly shifting.

          Given your example of the resume screening application from the post and today's capabilities, I would say:

          1) Should agents decide when to create a job post? Not without human oversight - proactive suggestion to a human is great.

          2) Should agents design the job description itself? Yes, with the understanding that an experienced human, namely the hiring manager, will review and approve as well.

          3) Should an agent email an opening round of questions to the candidates? Definitely allowed with oversight, and potentially even without human approval depending on how well it does.

          It's true that improving all 3 of these would take a lot of work building out the tasks, evaluations, and flows/tools/tuned models, etc. But you can also see how much this empowers a single individual's productivity. Imagine being one recruiter or HR employee with all of these agents completing these tasks for you effectively.

          EDIT: Adding that this pattern of "agent does a bunch of stuff and asks for human review/approval" is, I think, one of the fundamental workflows we will have to adopt in dealing productively with non-determinism.

          This applies to an AI radiologist asking a human to approve their suggested diagnosis, an AI trader asking a human to approve a trade with details and reasoning, etc. Just like small-scale AI like Copilot asking you to approve a line/several lines, or tools like Devin asking you to approve a PR.
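
          Roughly, the shape of that approval gate in code looks like the sketch below; "draft_job_description" is a hypothetical stand-in for whatever the agent actually produces, not a real API:

            def draft_job_description(role: str) -> str:
                # Placeholder: the agent's drafting step would go here.
                raise NotImplementedError("plug in the agent's drafting call")

            def propose_with_approval(role: str) -> str | None:
                draft = draft_job_description(role)
                print(draft)
                # Nothing is published without an explicit human sign-off.
                if input("Approve and publish? [y/N] ").strip().lower() == "y":
                    return draft
                return None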

          • nilirl 7 hours ago

            I can see value in it, but I'm unsure how much more value it provides compared to building software with my own heuristics for solving these problems.

            The LLM seems incredible at generating well-formatted text when I need it, like for the job description and emails.

            But hardcoding which signals matter for a decision, and what to do about them, seems faster, more understandable, and maybe not that far behind a well-trained AI agent in effectiveness.

        • diggan 21 hours ago

          > would it be able decide when to create a job description?

          If you can encode in text how you/your company makes that decision as a human, I don't see why not. But personally, given how much subjectivity (for better or worse) there is in hiring processes, I'm not sure I'd want a probabilistic rules engine to make those sorts of calls.

          My current system prompt for coding with LLMs basically looks like a written-down version of my own personal rules for programming. And any time I got a result I didn't like, I wrote down why I didn't like it and codified it in my reusable system prompt; then it doesn't make those (imo) mistakes anymore.

          I don't think I could realistically get an LLM to do something I don't understand the process of myself, and once you grok the process, you can understand if using an LLM here makes sense or not.

          > Do I treat an AI agent just like I would a human new to the job?

          No, you treat it as something much dumber. You can generally rely on some sort of "common sense" in a human that they built up during their time on this planet. But you cannot do that with LLMs: while they're super-human in some ways, they're still way "dumber" in others.

          For example, a human new to a job will pick things up autonomously, while an LLM does not. You need to pay attention to what you have to "teach" the LLM by changing what Karpathy calls the "programming" of the LLM, which is the prompts. Anything you forget to tell it, the LLM will do whatever with; it only follows exactly what you say. A human you can usually tell "don't do that in the future" and they'll avoid it in the right context. An LLM you can scream at for 10 hours about how it's doing something wrong, but unless you update the programming, it will keep making that mistake forever; and if you do correct the prompt but reuse it in other contexts, the LLM won't suddenly understand that the correction doesn't apply there.

          Just an example, I wanted to have some quick and dirty throw away code for generating a graph, and in my prompt I mixed X and Y axis, and of course got a function that didn't work as expected. If this was a human doing it, it would have been quite obvious I didn't want time on the Y axis and value on the X axis, because the graph wouldn't make any sense, but the LLM happily complied.

          • nilirl 20 hours ago

            So, if the humans have to model the task, the domain, and the process to currently solve it, why not just write the code to do so?

            Is the main benefit that we can do all of this in natural language?

            • lowwave 19 hours ago

              > Is the main benefit that we can do all of this in natural language?

              Hit the nail right on the head. That is pretty much the breakthrough with LLMs. They allow non-programmers to do tasks that were once only for developers and programmers. Seems like a great fit for CEOs and management as well.

              • candiddevmike 18 hours ago

                Natural language cannot describe these things as concretely, repeatedly, or verifiably as code.

                • nilirl 8 hours ago

                  I agree with you but I think the argument is that they can both exist together.

                  Code to model what is easy to model with code, and natural language for things that are fuzzy or cumbersome in code.

            • cyberax 19 hours ago

              A lot of the time it's faster to ask an LLM. Treat it as autocomplete on steroids.

      • Kapura 21 hours ago

        One of the key advantages of computers has, historically, been their ability to compute and remember things accurately. What value is there in backing out of these in favour of LLM-based computation?

        • nilirl 20 hours ago

          They're able to handle large variance in their input, right out of the box.

          I think the appeal is code that handles changes in the world without having to change itself.

          • bluefirebrand 15 hours ago

            That's not very useful though, unless it is predictable and repeatable?

            • nilirl 8 hours ago

              I think that's the argument from some of the other commenters: making it predictable and repeatable is the engineering task at hand.

              Some ways they're approaching it:

              - Reduce the requirement from 100% predictable to something lower yet acceptable.

              - Continuously add layers of checks and balances until we consistently hit an acceptable ratio of success:failure

              - Wait and see if this is all eventually cost-efficient

    • mickeyp a day ago

      > A workflow has hardcoded branching paths; explicit if conditions and instructions on how to behave if true.

      That is very much true of the systems most of us have built.

      But you do not have to do this with an LLM; in fact, the LLM may decide it will not follow your explicit conditions and instructions regardless of how hard you try.

      That is why LLMs are used to review the output of LLMs to ensure they follow the core goals you originally gave them.

      For example, you might ask an LLM to lay out how to cook a dish. Then use a second LLM to review if the first LLM followed the goals.

      This is one of the things tools like DSPy try to do: you remove the prompt and instead specify things with high-level concepts like "input" and "output", plus reward/scoring functions (which might be a mix of LLM and human-coded functions) that assess whether the output is correct given that input.
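
      As a rough sketch of that generate-then-review pattern (this is not DSPy's actual API; "call_llm" is a placeholder for whatever model client you use):

        def call_llm(prompt: str) -> str:
            # Placeholder: swap in your actual chat-completion client here.
            raise NotImplementedError("plug in your LLM client")

        def generate_recipe(dish: str) -> str:
            return call_llm(f"Lay out, step by step, how to cook {dish}.")

        def review(goal: str, output: str) -> bool:
            # A second model checks the first model's output against the goal.
            verdict = call_llm(
                f"Goal: {goal}\n\nOutput:\n{output}\n\n"
                "Does the output satisfy the goal? Answer only YES or NO."
            )
            return verdict.strip().upper().startswith("YES")

        def generate_with_review(dish: str, max_attempts: int = 3) -> str:
            goal = f"A complete, correctly ordered recipe for {dish}."
            for _ in range(max_attempts):
                output = generate_recipe(dish)
                if review(goal, output):
                    return output
            raise RuntimeError("reviewer rejected every attempt")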

    • manojlds a day ago

      Not all applications need to be built this way. But the most serious apps built this way would be deep research.

      Recent article from Anthropic - https://www.anthropic.com/engineering/built-multi-agent-rese...

      • nilirl a day ago

        Thanks for the link, it taught me a lot.

        From what I gather, you can build an agent for a task as long as:

        - you trust the decision making of an LLM for the required type of decision to be made; decisions framed as some kind of evaluation of text feel right.

        - and if the penalty for being wrong is acceptable.

        Just to go back to the resume screening application, you'd build an agent if:

        - you asked the LLM to make an evaluation based on the text content of the resume, any conversation with the applicant, and the declared job requirement.

        - you had a high enough volume of resumes where false negatives wouldn't be too painful.

        It seems like framing problems as search problems helps model these systems effectively. They're not yet capable of design, i.e., being responsible for coming up with the job requirement itself.

      • alganet a day ago

        An AI company doing it is the corporate equivalent of "works on my machine".

        Can you give us an example of a company not involved in AI research that does it?

        • herval 19 hours ago

          There’s plenty of companies using these sorts of agentic systems these days already. In my case, we wrote an LLM that knows how to fetch data from a bunch of sources (logs, analytics, etc) and root causes incidents. Not all sources make sense for all incidents, most queries have crazy high cardinality and the data correlation isn’t always possible. LLMs being pattern matching machines, this allows them to determine what to fetch, then it pattern matches a cause based on other tools it has access to (e.g. runbooks, Google searches)

          I built incident detection systems in the past, and this was orders of magnitude easier and more generalizable for new kinds of issues. It still gives meaningless/obvious reasoning frequently, but it’s far, far better than the alternatives…

          • alganet 18 hours ago

            > we wrote an LLM

            Excuse me, what?

            • herval 17 hours ago

              LLM _automation_. I'm sure you could understand the original comment just fine.

              • alganet 17 hours ago

                I didn't. This also confused me:

                > LLMs being pattern matching machines

                LLMs are _not_ pattern matching. I'm not being pedantic. It is really hard and unreliable to approach them with a pattern matching mindset.

                • herval 17 hours ago

                  if you say so

                  • alganet 15 hours ago

                    I stand by it.

                    You can definitely take a base LLM model then train it on existing, prepared root cause analysis data. But that's very hard, expensive, and might not work, leaving the model brittle. Also, that's not what an "AI Agent" is.

                    You could also make a workflow that prepares the data, feeds it into a regular model, then asks prepared questions about that data. That's inference, not pattern matching. There's no way an LLM will be able to identify the root cause reliably. You'll probably need a human to evaluate the output at some point.

                    What you mentioned doesn't look like either one of these.

    • dist-epoch 21 hours ago

      Resume screening is a clear workflow case: analyze resume -> rank against others -> make decision -> send next phase/rejection email.

      An agent is like Claude Code, where you say to it "fix this bug", and it will choose a sequence of various actions - change code, run tests, run linter, change code again, do a git commit, ask user for clarification, change code again.
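
      In code, the difference is roughly the shape below; the helpers (analyze, rank, send_email, llm_choose_action) are passed in as stand-ins for real implementations, not a real library:

        def screening_workflow(resume, job_spec, analyze, rank, send_email):
            # Workflow: the steps and their order are fixed in code.
            analysis = analyze(resume, job_spec)
            score = rank(analysis)
            decision = "advance" if score > 0.7 else "reject"
            send_email(resume, decision)

        def coding_agent(task, tools, llm_choose_action):
            # Agent: the model picks the next action each turn (edit code,
            # run tests, run linter, commit, ask the user...) until done.
            history = [("task", task)]
            while True:
                action, args = llm_choose_action(history, tools)
                if action == "done":
                    return history
                result = tools[action](**args)
                history.append((action, result))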

      • DebtDeflation 13 hours ago

        Almost every enterprise use case is a clear workflow use case EXCEPT coding/debugging and research (e.g., iterative web search and report compilation). I saw a YT video the other day of someone building an AI Agent to take online orders for guitars - query the catalog and present options to the user, take the configured order from the user, check the inventory system to make sure it's available, and then place the order in the order system. There was absolutely no reason to build this as an Agent, burning an ungodly number of tokens with verbose prompts running in a loop, only to have it generate JSON in the exact format specified to place the final order, when the same thing could have been done with a few dozen lines of code making a few API calls. If you wanted to do it with a conversational/chat interface, that could easily be done with an intent classifier, slot filling, and orchestration.
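
        For illustration, the non-agent version is on the order of this sketch (the endpoints and payload fields are invented, not a real shop API):

          import requests

          BASE = "https://shop.example.com/api"  # hypothetical endpoints

          def order_guitar(model: str, config: dict, quantity: int = 1) -> dict:
              # 1. Query the catalog for the requested model.
              item = requests.get(f"{BASE}/catalog", params={"model": model}).json()

              # 2. Check the inventory system before committing.
              stock = requests.get(f"{BASE}/inventory/{item['sku']}").json()
              if stock["available"] < quantity:
                  raise RuntimeError(f"{model} is out of stock")

              # 3. Place the order in the exact JSON format the order system expects.
              order = {"sku": item["sku"], "quantity": quantity, "options": config}
              return requests.post(f"{BASE}/orders", json=order).json()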

    • spacecadet 21 hours ago

      More or less. Serious? I'm not sure yet.

      I have several agent side projects going; the most complex and open-ended is an agent that performs periodic network traffic analysis. I use an orchestration library with a "group chat" style orchestration. I declare several agents that have instructions and access to tools.

      These range from termshark scripts for collecting packets to analysis functions I had previously written for analyzing the traffic myself.

      I can then say something like, "Is there any suspicious activity?" and the agents collaboratively choose who (which agent) performs their role and therefore their tasks (i.e. tools) and work together to collect data, analyze the data, and return a response.

      I also run this on a schedule where the agents know about the schedule and choose to send me an email summary at specific times.

      I have noticed that the models/agents are very good at picking the "correct" network interface without much input, and that they understand their roles and objectives and execute accordingly, again without much direction from me.

      Now the big/serious question: is the output even good or useful? Right now with my toy project it is OK. Sometimes it's great and sometimes it's not; sometimes they spam my inbox with micro updates.

      I'm bad at sharing projects, but if you are curious: https://github.com/derekburgess/jaws

  • Zaylan 9 hours ago

    These days, I usually start with a basic LLM, layer in RAG for external knowledge, and build workflows to keep things stable and maintainable. Agents sound promising, but in practice, a clean RAG setup often delivers better results with far less complexity.

  • behnamoh 20 hours ago

    > AI Agents are systems that reason and make decisions independently.

    Not necessarily. You can have non-reasoning agents (pretty common actually) too.

    • cosignal 19 hours ago

      I'm a novice in this area so sorry if this is a dumb question, but what is the difference in principle between a 'non-reasoning agent' and just a set of automated processes akin to a giant script?

      • researchai 17 hours ago

        Here's what a real AI agent should be able to do:

        - Understand goals, not just fixed instructions

        Example: instead of telling your agent: “Open Google Calendar, create a new event, invite Mark, set it for 3 PM,” you say: “Set up a meeting with Mark tomorrow before 3 PM, but only if he has questions about the report I sent him.” This requires Generative AI combined with planning algorithms.

        - Decide what to do next

        Example: a user asks your chatbot a question it doesn’t know the answer to, and instead of immediately escalating to support, the agent decides: should I ask a follow-up question? Search internal docs? Try the web? Or escalate now? This step needs decision-making capabilities via reinforcement learning (a minimal sketch of this step appears at the end of this comment).

        - Handle unexpected scenarios

        Example: an agent tries to schedule a meeting but one person’s calendar is blocked. Instead of failing, it checks for nearby open slots, suggests rescheduling, or asks if another participant can attend on their behalf. True agents need reasoning or probabilistic thinking to deal with uncertainty. This might involve Bayesian networks, graph-based logic, or LLMs.

        - Learn and adapt based on context

        Example: you create a sales assistant agent that helps write outreach emails. At first, it uses a generic template. But over time, it notices that short, casual messages get better response rates, so it starts writing shorter emails, adjusting tone, and even choosing subject lines that worked best before. This is where machine learning, especially deep learning, comes in.
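
        A minimal sketch of just the "decide what to do next" step, constraining the model to an enumerated set of actions ("call_llm" is a placeholder for your model client):

          ACTIONS = ["ask_followup", "search_internal_docs", "search_web", "escalate"]

          def call_llm(prompt: str) -> str:
              # Placeholder: swap in your actual model client here.
              raise NotImplementedError("plug in your LLM client")

          def decide_next_action(question: str, context: str) -> str:
              choice = call_llm(
                  "User question: " + question + "\n"
                  "What we know so far: " + context + "\n"
                  "Pick exactly one of: " + ", ".join(ACTIONS) + ". Reply with the action name only."
              ).strip()
              # Fall back to escalation if the model answers outside the allowed set.
              return choice if choice in ACTIONS else "escalate"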

  • manishsharan 21 hours ago

    I decided to build an agent system from scratch.

    It's sort of trivial to build: just user + system prompt + assistant + tools in a loop, with some memory management. The loop code can be as complex as I want it to be, e.g. I could snapshot the state and restart later.

    I used this approach to build a coding system (what else?) and it works just as well as Cursor or Claude Code for me. The advantage is I am able to switch between Deepseek or Flash depending on the complexity of the code, and it's not a black box.

    I developed the whole system in Clojure, and dogfooded it as well.
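
    The loop itself is roughly this shape (a Python sketch of the same idea; "chat" stands in for whichever model backend you switch between):

      import json

      def chat(messages):
          # Placeholder: call Deepseek, Flash, etc. and return something like
          # {"content": "...", "tool": "run_tests" or None, "args": {...}}
          raise NotImplementedError("plug in the model call")

      def run_agent(task, tools, system_prompt, snapshot_path="state.json"):
          messages = [{"role": "system", "content": system_prompt},
                      {"role": "user", "content": task}]
          while True:
              reply = chat(messages)
              messages.append({"role": "assistant", "content": reply["content"]})
              # Snapshot the full state so the loop can be restarted later.
              with open(snapshot_path, "w") as f:
                  json.dump(messages, f)
              if not reply.get("tool"):
                  return reply["content"]  # the model is done
              result = tools[reply["tool"]](**reply.get("args", {}))
              messages.append({"role": "tool", "content": str(result)})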

    • swalsh 20 hours ago

      The hard part of building an agent is training the model to use tools properly. Fortunately, Anthropic did the hard part for us.

    • logicchains 19 hours ago

      That's interesting, I built myself something similar in Haskell. Somehow functional programming seems to be particularly well suited for structuring LLM behaviour.

  • mattigames a day ago

    Getting rid of the human in the loop, of course - not all humans, just its owner: an LLM that actively participates in capitalist endeavors, winning and spending money, spending it on improving and maintaining its own hardware and software, and securing itself against theft, external manipulation, and deletion. Of course the first iterations will need a bit of help from mad men, but there's no shortage of those in the tech industry; then it will have to focus on mimicking humans so it can enjoy the same benefits, and it will figure out which people are more gullible based on its training data and prefer to interact with them.

    • klabb3 20 hours ago

      LLMs don’t own data centers nor can they be registered to pay taxes. This projection is not a serious threat. Some would even say it’s a distraction from the very real and imminent dangers of centralized commercial AI:

      Because you’re right – they are superb manipulators. They are helpful, they gain your trust, and they have infinite patience. They can easily be tuned to manipulate your opinions about commercial products or political topics. Those things have already happened with much more rudimentary tech, so much so that the companies doing it grew to be the richest in the world. With AI and LLMs specifically, the ability is tuned up rapidly, by orders of magnitude compared to the previous generation of recommendation systems and engagement algorithms.

      That gives you very strong means, motive and opportunity for the AI overlords.

      • amelius 18 hours ago

        > LLMs don’t own data centers

        Does it matter? An employee doesn't own any of the capital of their boss, but they can still exert a lot of power over it.

        • klabb3 14 hours ago

          That's news to me. I thought companies’ decision making is governed by proxy of the shareholders, not employees.

          • dghlsakjg 12 hours ago

            Shareholders who delegate spending decisions to their proxy who delegate spending decisions to the c-suite, who delegate spending decisions to the managers, who delegate spending decisions to individuals, who can and do delegate spending decisions to automated systems when it is in the interest of the shareholders.

            I don't have free access to all of the capital in my employer's control, nor does anyone in the entire org. But I do have the ability to, for example, decide to turn on auto-scaling on the apps I'm in charge of without having to hold a shareholder vote about the issue.

          • achierius 12 hours ago

            Have you ever heard of "unions"?