These people really misunderstand how people like me use search. I don't want "an experience," I want a list of documents that match the search specifier.
I agree. I still search in Google, but Gemini is quite good, and most of the time the answer is correct or close enough to save a lot of my time.
> I search in Google
What I understand you to mean is: you ask Google to give you the most-advertised product and hope that it's what's best for you.
I skip the "sponsored" links at the top. For programming questions it used to link to the most relevant Stack Overflow question and the exact docs for that language (which are often difficult to navigate). It autocorrected typos and used synonyms like "pause" -> "sleep".
Author here, well specifically I'm using "experiences" to mean basically ranking / finding relevant results.
Wouldn't the "dumb API search" already be enough for users? I'm not sure my decades of experience as a Google search user have improved since they leaned heavily on ranking and started adding "relevant" things...
> once they leaned heavy on the ranking
Not sure what you mean here. Google started with PageRank, which is a decent, albeit gameable, ranking algorithm. They've never not been leaning heavily on ranking.
In the mid-2010s, Google began supporting conversational (non-keyword) searches, because a good number of queries are difficult to reduce to keywords. But it is an inherently harder problem. At the same time, the open web enshittified itself, filling up with SEOed blogspam, and a lot of user-generated content moved into walled gardens.
Well, at least you didn't use 'engagement'.
Edit: oops, you used it in another comment on here.
Absolutely. We need separate interfaces for work and play.
You might be onto something here. I notice what we've lost mostly when working. When I search for "Bizarre undocumented compile error B7102343. Rebooting Universe." I really want to see ONLY those pages that have these keywords. But when I'm trying to find a list of DMV offices near me, I want the "search" engine to interpret my question "where are DMV offices near me?"
Maybe it is two different products: index search and fuzzy search?
"Agents, however, come with the ability to reason."
Citation needed. Calling recursive bullshit "reason" does not make it so.
Well, call it what you want; I'm referring to the reasoning functionality of GPT-5, Claude, etc. It seems to work pretty well at this task. These tools prefer grep to some fancy search thingy.
> These tools prefer grep to some fancy search thingy.
Passing on the cost to you. The fancy search thingy could navigate the AST and catch typos.
It's more accurate to say the agents like simple, interpretable tools. Clear input / obvious output. That might be more an AST thing, keyword search, search with very straightforward filters, etc.
I don't think they do well with search that is built for human engagement, which is a more complex tool to reason about.
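A "simple, interpretable" tool in that sense can be as small as an all-keywords filter with an unambiguous contract. A hypothetical sketch (names and documents are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    text: str

def keyword_search(docs: list[Doc], keywords: list[str]) -> list[str]:
    """Return IDs of docs containing ALL keywords (case-insensitive).

    No ranking, no synonyms, no intent guessing: the output is fully
    determined by the input, which makes the tool easy for an agent
    to reason about.
    """
    needles = [k.lower() for k in keywords]
    return [d.doc_id for d in docs
            if all(n in d.text.lower() for n in needles)]

docs = [
    Doc("1", "Swift is a programming language from Apple"),
    Doc("2", "Taylor Swift released a new album"),
]
print(keyword_search(docs, ["swift", "language"]))  # ['1']
```

Clear input, obvious output: the agent can predict exactly what narrowing or widening the keyword list will do.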
It makes no difference to the agent. The response from grep is a list of matches, and so is the response from something more intelligent. A list is a list.
> I don't think they do well with search that is built for human engagement, which is a more complex tool to reason about
I agree! Structured outputs are best.
Under the hood, LLMs are vector analysis engines. They benefit from spaces where small changes in input produce smooth changes in results.
Adding levels of indirection and secondary reasoning to the search interface makes the results less smooth. This is one of the things humans complain about often when using e.g. Google: "I'm interested in searching for X, but all these results are Y." Yes, because X and Y are synonyms or close topics and Google is mixing in a popularity signal to deduce that, for example, your search for `tailored swift database` is probably about a corpus of Taylor Swift song lyrics and not about companies that build bespoke Swift APIs on top of data stored in Postgres.
If you're driving the process with an LLM, it's more of a problem for the LLM if it's searching the space and the search engine under it keeps tripping over the difference between "swift means a programming language" and "swift means a successful musician" as it explores the result space. A dumber API that doesn't try to guess and just returns both datasets blended together fits the space-search problem better.
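To make the contrast concrete, a "dumber" API of the kind described here returns every match, unranked and unguessed, and leaves disambiguation to the calling LLM. A toy sketch (the index contents are invented):

```python
def dumb_search(index: dict[str, list[str]], term: str) -> list[str]:
    """Return every document ID matching the term, unranked.

    No popularity signal and no intent guessing: a query for
    'swift' returns both the language docs and the musician docs,
    and the calling LLM filters by its own context.
    """
    return sorted(index.get(term.lower(), []))

index = {
    "swift": ["swift-lang-tutorial", "taylor-swift-lyrics"],
    "postgres": ["bespoke-swift-api-on-postgres"],
}
print(dumb_search(index, "Swift"))
# both senses come back blended; the LLM does the disambiguating
```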
If agents are making value judgments ostensibly on my behalf, why should I trust them to continue to be aligned with my values if in practice they're almost always running on someone else's hardware and being maintained on someone else's budget?
We'd be stupid to ignore the last 15+ years of big tech "democratization"-to-enshittification bait-and-switch.
That was never not the case. There are always value judgements. Even for keyword searches, there are often hundreds of exact matches, and one you want might not even be an exact match.
This article is about how Target can use LLMs to help you find patio furniture. I guess you could imagine the LLM upselling you on a more expensive set?
Unless you're running your own search engine for yourself, search indexes and vector databases already manage what data they want to ingest; they contain rank weights, keyword aliases, and prefiltering for steering search results toward the service provider's desired outcome. And these all run on someone else's hardware, maintained on someone else's budget.
Adding an LLM or agentic data flow and a tuned prompt to the mix does nothing to change who is in charge; it was never you.
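Those provider-side levers are easy to picture as a toy scorer: aliases rewrite the query, a prefilter drops documents before scoring, and boosts reorder whatever survives. All names here are invented for illustration:

```python
def provider_search(docs, query, *, aliases, boosts, prefilter):
    """Toy search showing where the provider's thumb rests on the scale.

    - aliases: provider-chosen query rewrites
    - prefilter: provider-chosen exclusion of documents
    - boosts: provider-chosen rank weights per document
    """
    term = aliases.get(query.lower(), query.lower())
    scored = [
        (boosts.get(d["id"], 1.0), d["id"])
        for d in docs
        if prefilter(d) and term in d["text"].lower()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

docs = [
    {"id": "organic", "text": "a plain sofa"},
    {"id": "sponsored", "text": "our premium sofa"},
]
print(provider_search(docs, "couch",
                      aliases={"couch": "sofa"},
                      boosts={"sponsored": 2.0},
                      prefilter=lambda d: True))
# the boost, not the user, picked the order
```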
Search engines currently do this for better or worse. But they still want you to buy products.
The bigger issue is that I'm not sure agents are trained to understand what users find engaging: what makes users click.
One interesting thing about the lack of click-stream feedback is that you can generate it synthetically. If your model evaluates the quality of search responses and adjusts its queries when there's a miss, you can capture that adjustment and use it to tune your search engine. With user clicks you need scale to tune search; now you can generate the signal. The only problem is that you need to trust your agent is doing the right thing as it keeps searching harder.
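The loop described above can be sketched as follows, with lambdas standing in for the real search engine and model calls:

```python
def tune_from_agent(queries, search, judge, reformulate, max_tries=3):
    """Capture (original_query, fixed_query) pairs as synthetic feedback.

    `search` runs a query, `judge` decides whether the results are
    good, and `reformulate` is the agent's retry. Every successful
    retry is recorded as a training pair for tuning the engine,
    standing in for the missing click-stream signal.
    """
    training_pairs = []
    for q in queries:
        current = q
        for _ in range(max_tries):
            results = search(current)
            if judge(current, results):
                if current != q:
                    training_pairs.append((q, current))
                break
            current = reformulate(current, results)
    return training_pairs

# Toy stand-ins for the real engine and model:
search = lambda q: ["doc-about-" + q.split()[0]]
judge = lambda q, results: "sleep" in q           # "good" iff query says sleep
reformulate = lambda q, results: q.replace("pause", "sleep")

pairs = tune_from_agent(["pause for 5 seconds", "sleep for 5 seconds"],
                        search, judge, reformulate)
print(pairs)  # [('pause for 5 seconds', 'sleep for 5 seconds')]
```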
Interesting approach. It might be helpful to give the agent more tools though. Some simple aggregations might give it a notion of what's there to query for in the catalog. And a combination of overly broad queries and aggregations (terms, significant terms, etc.) might help it zoom in on interesting results. And of course, relatively large responses are not necessarily as much of a problem for LLMs as they would be for humans.
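For example, a terms-style aggregation (in the Elasticsearch sense) exposed as a tool would give the agent a cheap overview of what the catalog contains before it issues narrower queries. A toy in-memory version:

```python
from collections import Counter

def terms_aggregation(docs: list[dict], field: str, size: int = 5):
    """Top values of `field` with counts, in the spirit of an
    Elasticsearch terms aggregation: a cheap map of the catalog."""
    counts = Counter(d[field] for d in docs if field in d)
    return counts.most_common(size)

catalog = [
    {"category": "patio"},
    {"category": "patio"},
    {"category": "lighting"},
]
print(terms_aggregation(catalog, "category"))  # [('patio', 2), ('lighting', 1)]
```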
I agree, my post only scratches the surface. I want to give more knobs to the agent. But not so many that it can't really experiment / reason about them.
Why do you expect an LLM to provide an accurate distance metric?
The doc string becoming part of the prompt is a nice touch.
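That pattern can be done with nothing but the standard library: the function's own docstring doubles as the description the model sees. A minimal sketch (not the post's actual code; the tool name is invented):

```python
import inspect

def search_products(query: str, max_results: int = 10) -> list:
    """Search the product catalog by keyword.

    Use exact product terms; results are unranked matches.
    """
    return []  # real implementation lives elsewhere

def to_tool_spec(fn) -> dict:
    """Build an LLM tool description from the function itself;
    the docstring becomes the prompt-visible description."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": list(sig.parameters),
    }

spec = to_tool_spec(search_products)
print(spec["name"], "-", spec["description"].splitlines()[0])
```

A nice side effect is that the documentation the model reasons from can never drift out of the codebase.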
It seems plausible and intuitive that simple tools dynamically called by agents would yield better results than complex search pipelines. But do you have any hard data backing this up?
I don't have any hard data, no. That will be the other-other side project I may or may not have time for :)
I think Comet's search is really nice and worth $20 a month, but not the $200 a month it currently costs; it is also a little slow. My experience is similar to this article's.
Keyword search is the compelling experience.
I want my machine to be deterministic and non-magical. I am so tired of search tools that won't let me actually search for what I want because it clearly thinks I meant something else.
Prompts for this:
- Turn this into a paragraph-sized prompt.
- Turn this into a document-length formal proposal.
- Then split that into paragraph-sized, token-optimized prompts.
Aren't "Attention is all you need" Transformers good at suggesting search terms?
“thick-daddy search API”…