> Pricing is good, but profits may continue to be elusive; still no clear technical moat.
It's quite ironic that the technology that is displacing so many people from so many industries has yet to make a profit. I fear the "creative" part of their destruction will take longer to achieve than they advertise.
Nobody has lost their job yet because of AI. But lots of people have lost their jobs because of the money their CEOs spent on AI.
What is worse: terrible charts, terrible charts making it through any form of scrutiny, or terrible charts intentionally making it to the main stage?
> OpenAI conveniently forgot to include this comparison (ARC-AGI-2) in their livestream recital of benchmark progress, which left the livestream looking like marketing rather than science.
Yeah, but it was _supposed_ to be marketing, right? Like, of course a product video isn't science, in the same way a "hot take" post also isn't science.
Why is Grok so surprisingly decent? Does the lack of mainstream liberal-left censorship (replaced with Musky censorship) result in some sort of weird performance boost?
Is it decent, or does it game the tests? Really, I would love to know...
There's nothing weird about a model performing better when it is built to relate more closely to reality instead of an ideologically tainted version of it. I don't know how much Musk & Co. interfere with the fine-tuning of their models, but it is clear that this interference is far less heavy-handed than what the other actors do to theirs.
How are the other ones tainted?
Yes, actually.
Fewer fingers on the scale means the LLM gets to actually do its thing. GPT-4 with zero filtering was scary smart, according to the red teams that were testing it. The version the public got had a lobe tied behind its back.
Having only Grok 3 to compare against, and toying around with GPT-5... GPT-5 is pretty good.