OK, I like this. It’s an agent-based add-on to (for now) Gemini that aims at improving the quality of output through a more ‘human’ style of research: digging deeper, considering counterexamples, and fleshing out thin areas with more research.
I’d like to try it, but I just learned I need an Enterprise Agentic subscription of some sort from Google; no idea how much that costs.
That said, this seems like a real abuse of the term diffusion, as far as I can tell. I don’t think this thing is reversing any entropy on any latent space.
They published a paper, and this isn't something so complex that it would take a lot of work to implement. You could probably give Codex an example open-source deep-research project, then sic it on the paper and tell it to make a fork that uses this algorithm; I wouldn't be surprised if it could basically one-shot the implementation.
Yeah, good idea. A virtual lucidrains could reimplement it.
Interesting research, but I wish people would stick to the clearer term “inference-time computation” instead of the more ambiguous and confusing “test-time computation.”
Test/evaluation/inference are treated as almost synonymous because in academic research you almost exclusively run inference on a trained model in order to evaluate its performance on a test set. Of course in the real world, you will want to run inference in production to do useful work. But the language comes from research.
Literally everything you do during inference is inference-time, no?
Well, if all you're doing is accessing stuff that was pre-learned earlier, then it's not quite inference-time.
Huh, I never thought of the process of drafting while writing as similar to how diffusion models start from noise. Super cool for sure, though I'm curious whether this (and other similar research on making models think more at inference time) is showing that the best way for models to "think" is the exact same way humans do.
Does this share techniques with Gemini Diffusion? https://blog.google/technology/google-deepmind/gemini-diffus...
The way I read the paper, "diffusion" was more of a metaphor: you start with the LLM's own output as the overview (very much _not_ random noise) and then refine it over many steps. Seeing this, though, I wonder whether in-house they mean it more literally, or have actually tried a more literal version.
They reference a paper using initial noisy data as a key, mapping to a "jump-ahead" value of a previous example. I think this is very cool and clever, and does use a diffusion model.
But I don't see how this Deep Researcher actually uses diffusion at all. So is it really fair to call it "test-time diffusion" just because you liken an early text draft to noise in a diffusion model, then use RAG to retrieve a potential polished version of said draft?
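For what it's worth, here's roughly how I picture the "draft as noisy state, refine with retrieval" loop the paper's metaphor suggests. This is a minimal sketch under my own assumptions; `generate`, `critique`, and `search` are hypothetical stand-ins for LLM calls and a retrieval backend, not anything from the paper's actual implementation.

```python
# Sketch of a draft-then-refine research loop (my reading of the metaphor, not the paper's code).
def deep_research(question, generate, critique, search, steps=5):
    # Start from the model's own first-pass report, not random noise.
    draft = generate(f"Write a first-pass report answering: {question}")
    for _ in range(steps):
        # "Denoising" step: identify the thinnest, least-supported part of the draft.
        gap = critique(f"Which claim in this draft is weakest or least supported?\n\n{draft}")
        # Retrieve evidence targeted at that gap.
        evidence = search(gap)
        # Rewrite the draft so the retrieved material is folded in.
        draft = generate(
            f"Revise the report below. Strengthen this weak point: {gap}\n"
            f"Use this new evidence: {evidence}\n\nReport:\n{draft}"
        )
    return draft
```

Whether you call that diffusion or just iterative self-revision with RAG is exactly the naming question being argued here.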
Seems like a useful approach to coding assistants as well. Write some draft functionality, notice some patterns or redundancy with the existing code or in the change itself, search for libraries or alternative design patterns that could help out or create something that is targeted to the use case, reimplement in terms of those new components.
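A rough sketch of how that could look in an agent loop. The `llm` and `search_packages` helpers are purely illustrative assumptions, not any existing coding assistant's API.

```python
# Hypothetical sketch of the same draft-then-refine idea applied to code.
def draft_then_refine(task, llm, search_packages, rounds=3):
    # Write some draft functionality first.
    code = llm(f"Write a first draft implementing: {task}")
    for _ in range(rounds):
        # Notice repetition or reinvented wheels in the draft or surrounding change.
        pattern = llm(f"Name one redundant pattern or hand-rolled utility in:\n{code}")
        # Look for an existing library or design pattern that covers it.
        candidates = search_packages(pattern)
        # Reimplement the draft in terms of the better component.
        code = llm(
            f"Rewrite this code to use one of {candidates} instead of the "
            f"hand-rolled version of '{pattern}':\n{code}"
        )
    return code
```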
This is the first time I'm hearing about their https://cloud.google.com/products/agentspace