These people really misunderstand how people like me use search. I don't want "an experience," I want a list of documents that match the search specifier.
I agree. I still search in Google, but Gemini is quite good, and most of the time the answer is correct or close enough to save a lot of my time.
> I search in Google
What I understand you to mean is: you ask Google to give you the most-advertised product and hope that it's what's best for you.
I skip the "sponsored" links at the top. For programming questions it used to link to the most relevant Stack Overflow question and the exact docs for that language (which are often difficult to navigate). It autocorrected typos and used synonyms like "pause" -> "sleep".
Author here, well specifically I'm using "experiences" to mean basically ranking / finding relevant results.
Wouldn't the "dumb API search" already be enough for users? I'm not sure my decades of experience as a Google search user have improved since they leaned heavily on ranking and started adding "relevant" things...
> once they leaned heavy on the ranking
Not sure what you mean here. Google started with PageRank, which is a decent, albeit gameable, ranking algorithm. They've never not been leaning heavily on ranking.
In the mid-2010s, Google began supporting conversational (non-keyword) searches, because a good number of queries are difficult to reduce to keywords. But it is an inherently harder problem. At the same time, the open web enshittified itself, filling up with SEOed blogspam, and a lot of user-generated content moved into walled gardens.
Well, at least you didn't use 'engagement'.
Edit: oops, you used it in another comment on here.
Absolutely. We need separate interfaces for work and play.
You might be onto something here. I notice what we've lost mostly when working. When I search for "Bizarre undocumented compile error B7102343. Rebooting Universe." I really want to see ONLY those pages that have these keywords. But when I'm trying to find a list of DMV offices near me, I want the "search" engine to interpret my question "where are DMV offices near me?"
Maybe it is two different products: index search and fuzzy search?
"Agents, however, come with the ability to reason."
Citation needed. Calling recursive bullshit "reason" does not make it so.
Well, call it what you want; I'm referring to the reasoning functionality of GPT-5, Claude, etc. It seems to work pretty well at this task. These tools prefer grep to some fancy search thingy.
> These tools prefer grep to some fancy search thingy.
Passing on the cost to you. The fancy search thingy could navigate the AST and catch typos.
It's more accurate to say the agents like simple, interpretable tools. Clear input / obvious output. That might be more an AST thing, keyword search, search with very straightforward filters, etc.
I don't think they do well with search that is built for human engagement, which is a more complex tool to reason about.
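A "simple, interpretable" tool in that sense can be as small as an all-keywords filter with an unambiguous contract. A hypothetical sketch (names and documents are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    text: str

def keyword_search(docs: list[Doc], keywords: list[str]) -> list[str]:
    """Return IDs of docs containing ALL keywords (case-insensitive).

    No ranking, no synonyms, no intent guessing: the output is fully
    determined by the input, which makes the tool easy for an agent
    to reason about.
    """
    needles = [k.lower() for k in keywords]
    return [d.doc_id for d in docs
            if all(n in d.text.lower() for n in needles)]

docs = [
    Doc("1", "Swift is a programming language from Apple"),
    Doc("2", "Taylor Swift released a new album"),
]
print(keyword_search(docs, ["swift", "language"]))  # ['1']
```

Clear input, obvious output: the agent can predict exactly what narrowing or widening the keyword list will do.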
It makes no difference to the agent. The response from grep is a list of matches, and so is the response from something more intelligent. A list is a list.
> I don't think they do well with search that is built for human engagement, which is a more complex tool to reason about
I agree! Structured outputs are best.
Under the hood, LLMs are vector analysis engines. They benefit from spaces where small changes in input produce smooth changes in results.
Adding levels of indirection and secondary reasoning to the search interface makes the results less smooth. This is one of the things humans complain about often when using e.g. Google: "I'm interested in searching for X, but all these results are Y." Yes, because X and Y are synonyms or close topics and Google is mixing in a popularity signal to deduce that, for example, your search for `tailored swift database` is probably about a corpus of Taylor Swift song lyrics and not about companies that build bespoke Swift APIs on top of data stored in Postgres.
If you're driving the process with an LLM, it's more of a problem for the LLM if it's searching the space and the search engine under it keeps tripping over the difference between "swift means a programming language" and "swift means a successful musician" as it explores the result space. A dumber API that doesn't try to guess and just returns both datasets blended together fits the space-search problem better.
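To make the contrast concrete, a "dumber" API of the kind described here returns every match, unranked and unguessed, and leaves disambiguation to the calling LLM. A toy sketch (the index contents are invented):

```python
def dumb_search(index: dict[str, list[str]], term: str) -> list[str]:
    """Return every document ID matching the term, unranked.

    No popularity signal and no intent guessing: a query for
    'swift' returns both the language docs and the musician docs,
    and the calling LLM filters by its own context.
    """
    return sorted(index.get(term.lower(), []))

index = {
    "swift": ["swift-lang-tutorial", "taylor-swift-lyrics"],
    "postgres": ["bespoke-swift-api-on-postgres"],
}
print(dumb_search(index, "Swift"))
# both senses come back blended; the LLM does the disambiguating
```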
If agents are making value judgments ostensibly on my behalf, why should I trust them to continue to be aligned with my values if in practice they're almost always running on someone else's hardware and being maintained on someone else's budget?
We'd be stupid to ignore the last 15+ years of big tech "democratization"-to-enshittification bait-and-switch.
That was never not the case. There are always value judgements. Even for keyword searches, there are often hundreds of exact matches, and one you want might not even be an exact match.
This article is about how Target can use LLMs to help you find patio furniture. I guess you could imagine the LLM upselling you on a more expensive set?
Unless you're running your own search engine for yourself, search indexes and vector databases already manage what data they want to ingest; they contain rank weights, keyword aliases, and prefiltering for steering search results toward the service provider's desired outcome. And these all run on someone else's hardware, maintained on someone else's budget.
Adding an LLM or agentic data flow and a tuned prompt to the mix does nothing to change who is in charge; it was never you.
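Those provider-side levers are easy to picture as a toy scorer: aliases rewrite the query, a prefilter drops documents before scoring, and boosts reorder whatever survives. All names here are invented for illustration:

```python
def provider_search(docs, query, *, aliases, boosts, prefilter):
    """Toy search showing where the provider's thumb rests on the scale.

    - aliases: provider-chosen query rewrites
    - prefilter: provider-chosen exclusion of documents
    - boosts: provider-chosen rank weights per document
    """
    term = aliases.get(query.lower(), query.lower())
    scored = [
        (boosts.get(d["id"], 1.0), d["id"])
        for d in docs
        if prefilter(d) and term in d["text"].lower()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

docs = [
    {"id": "organic", "text": "a plain sofa"},
    {"id": "sponsored", "text": "our premium sofa"},
]
print(provider_search(docs, "couch",
                      aliases={"couch": "sofa"},
                      boosts={"sponsored": 2.0},
                      prefilter=lambda d: True))
# the boost, not the user, picked the order
```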
Search engines currently do this for better or worse. But they still want you to buy products.
The bigger issue is that I'm not sure agents are trained to understand what users find engaging: what makes users click.
One interesting thing about the lack of click-stream feedback is that you can generate it synthetically. If your model evaluates the quality of search responses and adjusts its queries when there's a miss, you can capture that adjustment and use it to tune your search engine. With user clicks you need scale to tune search; now you can generate the signal. The only problem is that you need to trust your agent is doing the right thing as it keeps searching harder.
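The loop described above can be sketched as follows, with lambdas standing in for the real search engine and model calls:

```python
def tune_from_agent(queries, search, judge, reformulate, max_tries=3):
    """Capture (original_query, fixed_query) pairs as synthetic feedback.

    `search` runs a query, `judge` decides whether the results are
    good, and `reformulate` is the agent's retry. Every successful
    retry is recorded as a training pair for tuning the engine,
    standing in for the missing click-stream signal.
    """
    training_pairs = []
    for q in queries:
        current = q
        for _ in range(max_tries):
            results = search(current)
            if judge(current, results):
                if current != q:
                    training_pairs.append((q, current))
                break
            current = reformulate(current, results)
    return training_pairs

# Toy stand-ins for the real engine and model:
search = lambda q: ["doc-about-" + q.split()[0]]
judge = lambda q, results: "sleep" in q           # "good" iff query says sleep
reformulate = lambda q, results: q.replace("pause", "sleep")

pairs = tune_from_agent(["pause for 5 seconds", "sleep for 5 seconds"],
                        search, judge, reformulate)
print(pairs)  # [('pause for 5 seconds', 'sleep for 5 seconds')]
```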
Interesting approach. It might be helpful to give the agent more tools though. Some simple aggregations might give it a notion of what's there to query for in the catalog. And a combination of overly broad queries and aggregations (terms, significant terms, etc.) might help it zoom in on interesting results. And of course, relatively large responses are not necessarily as much of a problem for LLMs as they would be for humans.
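For example, a terms-style aggregation (in the Elasticsearch sense) exposed as a tool would give the agent a cheap overview of what the catalog contains before it issues narrower queries. A toy in-memory version:

```python
from collections import Counter

def terms_aggregation(docs: list[dict], field: str, size: int = 5):
    """Top values of `field` with counts, in the spirit of an
    Elasticsearch terms aggregation: a cheap map of the catalog."""
    counts = Counter(d[field] for d in docs if field in d)
    return counts.most_common(size)

catalog = [
    {"category": "patio"},
    {"category": "patio"},
    {"category": "lighting"},
]
print(terms_aggregation(catalog, "category"))  # [('patio', 2), ('lighting', 1)]
```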
I agree, my post only scratches the surface. I want to give more knobs to the agent. But not so many that it can't really experiment / reason about them.
Why do you expect an LLM to provide an accurate distance metric?
The doc string becoming part of the prompt is a nice touch.
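That pattern can be done with nothing but the standard library: the function's own docstring doubles as the description the model sees. A minimal sketch (not the post's actual code; the tool name is invented):

```python
import inspect

def search_products(query: str, max_results: int = 10) -> list:
    """Search the product catalog by keyword.

    Use exact product terms; results are unranked matches.
    """
    return []  # real implementation lives elsewhere

def to_tool_spec(fn) -> dict:
    """Build an LLM tool description from the function itself;
    the docstring becomes the prompt-visible description."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": list(sig.parameters),
    }

spec = to_tool_spec(search_products)
print(spec["name"], "-", spec["description"].splitlines()[0])
```

A nice side effect is that the documentation the model reasons from can never drift out of the codebase.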
It seems plausible and intuitive that simple tools dynamically called by agents would yield better results than complex search pipelines. But do you have any hard data backing this up?
I don't have any hard data, no. That will be the other-other side project I may or may not have time for :)
I think Comet's search is really nice and worth $20 a month, but not the $200 a month it currently costs; it is also a little slow. My experience is similar to this article's.
Keyword search is the compelling experience.
I want my machine to be deterministic and non-magical. I am so tired of search tools that won't let me actually search for what I want because it clearly thinks I meant something else.
Prompts for this:
- Turn this into a paragraph-sized prompt.
- Turn this into a document-length formal proposal.
- Then split that into paragraph-sized, token-optimized prompts.
Aren't "Attention is all you need" Transformers good at suggesting search terms?
“thick-daddy search API”…