This is a relief, honestly. A prior solution exists now, which means the model didn’t solve anything at all. It just regurgitated it from the internet, which we can retroactively assume contained the solution in spirit, if not in any searchable or known form. Mystery resolved.
This aligns nicely with the rest of the canon. LLMs are just stochastic parrots. Fancy autocomplete. A glorified Google search with worse footnotes. Any time they appear to do something novel, the correct explanation is that someone, somewhere, already did it, and the model merely vibes in that general direction. The fact that no human knew about it at the time is a coincidence best ignored.
The same logic applies to code. “Vibe coding” isn’t real programming. Real programming involves intuition, battle scars, and a sixth sense for bugs that can’t be articulated but somehow always validates whatever I already believe. When an LLM produces correct code, that’s not engineering, it’s cosplay. It didn’t understand the problem, because understanding is defined as something only humans possess, especially after the fact.
Naturally, only senior developers truly code. Juniors shuffle syntax. Seniors channel wisdom. Architecture decisions emerge from lived experience, not from reading millions of examples and compressing patterns into a model. If an LLM produces the same decisions, it’s obviously cargo-culting seniority without having earned the right to say “this feels wrong” in a code review.
Any success is easy to dismiss. Data leakage. Prompt hacking. Cherry-picking. Hidden humans in the loop. And if none of those apply, then it “won’t work on a real codebase,” where “real” is defined as the one place the model hasn’t touched yet. This definition will be updated as needed.
Hallucinations still settle everything. One wrong answer means the whole system is fundamentally broken. Human mistakes, meanwhile, are just learning moments, context switches, or coffee shortages. This is not a double standard. It’s experience.
Jobs are obviously safe too. Software engineering is mostly communication, domain expertise, and navigating ambiguity. If the model starts doing those things, that still doesn’t count, because it doesn’t sit in meetings, complain about product managers, or feel existential dread during sprint planning.
So yes, the Erdos situation is resolved. Nothing new happened. No reasoning occurred. Progress remains hype. The trendline is imaginary. And any discomfort you feel is probably just social media, not the ground shifting under your feet.
I suspect this is AI generated, but it’s quite high quality, and doesn’t have any of the telltale signs that most AI generated content does. How did you generate this? It’s great.
It's bizarre. The same account was previously arguing in favor of emergent reasoning abilities in another thread ( https://news.ycombinator.com/item?id=46453084 ) -- I voted it up, in fact! Turing test failed, I guess.
We need a name for the much more trivial version of the Turing test that replaces "human" with "weird dude with rambling ideas he clearly thinks are very deep"
I'm pretty sure it's like "can it run DOOM" and someone could make an LLM that passes this that runs on an pregnancy test
Well it's a bit of an identity crisis. As a developer on HN my entire identity is wrapped around my skill as a programmer. It's a badge of honor I wear and it's a career and I get paid a lot of money to do this.
All of that is going away so the best way to deal with it is to call it a stochastic parrot and deny reality.
Can anyone give a little more color on the nature of Erdos problems? Are these problems that many mathematicians have spend years tackling with no result? Or do some of the problems evade scrutiny and go un-attempted for most of the time?
EDIT:
After reading a link someone else posted to Terrance Tao's wiki page, he has a paragraph that somewhat answers this question:
> Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools. Unfortunately, it is hard to tell in advance which category a given problem falls into, short of an expert literature review. (However, if an Erdős problem is only stated once in the literature, and there is scant record of any followup work on the problem, this suggests that the problem may be of the second category.)
FWIW, I just gave Deepseek the same prompt and it solved it too (much faster than the 41m of ChatGPT). I then gave both proofs to Opus and it confirmed their equivalence.
The answer is yes. Assume, for the sake of contradiction, that there exists an \(\epsilon > 0\) such that for every \(k\), there exists a choice of congruence classes \(a_1^{(k)}, \dots, a_k^{(k)}\) for which the set of integers not covered by the first \(k\) congruences has density at least \(\epsilon\).
For each \(k\), let \(F_k\) be the set of all infinite sequences of residues \((a_i)_{i=1}^\infty\) such that the uncovered set from the first \(k\) congruences has density at least \(\epsilon\). Each \(F_k\) is nonempty (by assumption) and closed in the product topology (since it depends only on the first \(k\) coordinates). Moreover, \(F_{k+1} \subseteq F_k\) because adding a congruence can only reduce the uncovered set. By the compactness of the product of finite sets, \(\bigcap_{k \ge 1} F_k\) is nonempty.
Choose an infinite sequence \((a_i) \in \bigcap_{k \ge 1} F_k\). For this sequence, let \(U_k\) be the set of integers not covered by the first \(k\) congruences, and let \(d_k\) be the density of \(U_k\). Then \(d_k \ge \epsilon\) for all \(k\). Since \(U_{k+1} \subseteq U_k\), the sets \(U_k\) are decreasing and periodic, and their intersection \(U = \bigcap_{k \ge 1} U_k\) has density \(d = \lim_{k \to \infty} d_k \ge \epsilon\). However, by hypothesis, for any choice of residues, the uncovered set has density \(0\), a contradiction.
Therefore, for every \(\epsilon > 0\), there exists a \(k\) such that for every choice of congruence classes \(a_i\), the density of integers not covered by the first \(k\) congruences is less than \(\epsilon\).
> I then gave both proofs to Opus and it confirmed their equivalence.
You could have just rubber-stamped it yourself, for all the mathematical rigor it holds. The devil is in the details, and the smallest problem unravels the whole proof.
I don't see what's related there but anyway unless you have access to information from within OpenAI I don't see how you can claim what was or wasn't in the training data of ChatGPT 5.2 Pro.
On the contrary for DeepSeek you could but not for a non open model.
I find it interesting that, as someone utterly unfamiliar with ergodic theory, Dini’s theorem, etc, I find Deepseek’s proof somewhat comprehensible, whereas I do not find GPT-5.2’s proof comprehensible at all. I suspect that I’d need to delve into the terminology in the GPT proof if I tried to verify Deepseek’s, so maybe GPT’s is being more straightforward about the underlying theory it relies on?
"Very nice! ... actually the thing that impresses me more than the proof method is the avoidance of errors, such as making mistakes with interchanges of limits or quantifiers (which is the main pitfall to avoid here). Previous generations of LLMs would almost certainly have fumbled these delicate issues.
...
I am going ahead and placing this result on the wiki as a Section 1 result (perhaps the most unambiguous instance of such, to date)"
The pace of change in math is going to be something to watch closely. Many minor theorems will fall. Next major milestone: Can LLMs generate useful abstractions?
Seems like the someone dug something up from the literature on this problem (see top comment on the erdosproblems.com thread)
"On following the references, it seems that the result in fact follows (after applying Rogers' theorem) from a 1936 paper of Davenport and Erdos (!), which proves the second result you mention. ... In the meantime, I am moving this problem to Section 2 on the wiki (though the new proof is still rather different from the literature proof)."
I guess the first question I have is if these problems solved by LLMs are just low-hanging fruit that human researchers either didn't get around to or show much interest in - or if there's some actual beef here to the idea that LLMs can independently conduct original research and solve hard problems.
That's the first warning from the wiki : <<Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools.>> https://github.com/teorth/erdosproblems/wiki/AI-contribution...
There is still value on letting these LLMs loose on the periphery and knocking out all the low hanging fruit humanity hasn’t had the time to get around to. Also, I don’t know this, but if it is a problem on Erdos I presume people have tried to solve it atleast a little bit before it makes it to the list.
Is there though? If they are "solved" (as in the tickbox mark them as such, through a validation process, e.g. another model confirming, formal proof passing, etc) but there is no human actually learning from them, what's the benefit? Completing a list?
I believe the ones that are NOT studied are precisely because they are seen as uninteresting. Even if they were to be solved in an interesting way, if nobody sees the proof because they are just too many and they are again not considered valuable then I don't see what is gained.
Out of curiosity why has the LLM math solving community been focused on the Erdos problems over other open problems? Are they of a certain nature where we would expect LLMs to be especially good at solving them?
I guess they are at a difficulty where it's not too hard (unlike millennium prize problems), is fairly tightly scoped (unlike open ended research), and has some gravitas (so it's not some obscure theorem that's only unproven because of it's lack of noteworthiness).
I have 15 years of software engineering experience across some top companies. I truly believe that ai will far surpass human beings at coding, and more broadly logic work. We are very close
HN will be the last place to admit it; people here seem to be holding out with the vague 'I tried it and it came up with crap'. While many of us are shipping software without touching (much) code anymore. I have written code for over 40 years and this is nothing like no-code or whatever 'replacing programmers' before, this is clearly different judging from the people who cannot code with a gun to their heads but still are shipping apps: it does not really matter if anyone believes me or not. I am making more money than ever with fewer people than ever delivering more than ever.
We are very close.
(by the way; I like writing code and I still do for fun)
Both can be correct : you might be making a lot of money using the latest tools while others who work on very different problems have tried the same tools and it's just not good enough for them.
The ability to make money proves you found a good market, it doesn't prove that the new tools are useful to others.
> holding out with the vague 'I tried it and it came up with crap'
Isn't that a perfectly reasonable metric? The topic has been dominated by hype for at least the past 5 if not 10 years. So when you encounter the latest in a long line of "the future is here the sky is falling" claims, where every past claim to date has been wrong, it's natural to try for yourself, observe a poor result, and report back "nope, just more BS as usual".
If the hyped future does ever arrive then anyone trying for themselves will get a workable result. It will be trivially easy to demonstrate that naysayers are full of shit. That does not currently appear to be the case.
Wasn't transformer 2017? There's been constant AI hype since at least that far back and it's only gotten worse.
If I release a claim once a month that armageddon will happen next month, and then after 20 years it finally does, are all of my past claims vindicated? Or was I spewing nonsense the entire time? What if my claim was the next big pandemic? The next 9.0 earthquake?
I don't dispute that the situation is rapidly evolving. It is certainly possible that we could achieve AGI in the near future. It is also entirely possible that we might not. Claims such as that AGI is close or that we will soon be replacing developers entirely are pure hype.
They can only code to specification which is where even teams of humans get lost. Without much smarter architecture for AI (LLMs as is are a joke) that needle isn’t going to move.
This is crazy. It's clear that these models don't have human intelligence, but it's undeniable at this point that they have _some_ form of intelligence.
I don't think they will ever have human intelligence. It will always be an alien intelligence.
But I think the trend line unmistakably points to a future where it can be MORE intelligent than a human in exactly the colloquial way we define "more intelligent"
The fact that one of the greatest mathematicians alive has a page and is seriously bench marking this shows how likely he believes this can happen.
My take is that a huge part of human intelligence is pattern matching. We just didn’t understand how much multidimensional geometry influenced our matches
Yes, it could be that intelligence is essentially a sophisticated form of recursive, brute force pattern matching.
I'm beginning to think the Bitter Lesson applies to organic intelligence as well, because basic pattern matching can be implemented relatively simply using very basic mathematical operations like multiply and accumulate, and so it can scale with massive parallelization of relatively simple building blocks.
I don't think it's accurate to describe LLMs as pattern matching. Prediction is the mechanism they use to ingest and output information, and they end up with a (relatively) deep model of the world under the hood.
The "pattern matching" perspective is true if you zoom in close enough, just like "protein reactions in water" is true for brains. But if you zoom out you see both humans and LLMs interact with external environments which provide opportunity for novel exploration. The true source of originality is not inside but in the environment. Making it be all about the model inside is a mistake, what matters more than the model is the data loop and solution space being explored.
"Pattern matching" is not sufficiently specified here for us to say if LLMs do pattern matching or not. E.g. we can say that an LLM predicts the next token because that token (or rather, its embedding) is the best "match" to the previous tokens, which form a path ("pattern") in embedding space. In this sense LLMs are most definitely pattern matching. Under other formulations of the term, they may not be (e.g. when pattern matching refers to abstraction or abstracting to actual logical patterns, rather than strictly semantic patterns).
There's some nuance. IQ tests measure pattern matching and, in an underlying way, other facets of intelligence - memory, for example. How well can an LLM 'remember' a thing? Sometimes Claude will perform compaction when its context window reaches 200k "tokens" then it seems a little colder to me, but maybe that's just my imagination. I'm kind of a "power user".
what are you referring to? LLMs are neural networks at their core and the most simple versions of neural networks are all about reproducing patterns observed during training
As someone who doesn't understand this shit, and how it's always the experts who fiddle the LLMs to get good outputs, it feels natural to attribute the intelligence to the operator (or the training set), rather than the LLM itself.
Funny seeing silicon valley bros commenting "you're on fire!" to Neel when it appears he copied and pasted the problem verbatim into chatGPT and it did literally all the other work here
> no prior solutions found.
This is no longer true, a prior solution has just been found[1], so the LLM proof has been moved to the Section 2 of Terence Tao's wiki[2].
[1] - https://www.erdosproblems.com/forum/thread/281#post-3325
[2] - https://github.com/teorth/erdosproblems/wiki/AI-contribution...
Interesting that in Terrance Tao's words: "though the new proof is still rather different from the literature proof)"
And even odder that the proof was by Erdos himself and yet he listed it as an open problem!
Maybe it was in the training set.
I think that was Tao's point, that the new proof was not just read out of the training set.
This is a relief, honestly. A prior solution exists now, which means the model didn’t solve anything at all. It just regurgitated it from the internet, which we can retroactively assume contained the solution in spirit, if not in any searchable or known form. Mystery resolved.
This aligns nicely with the rest of the canon. LLMs are just stochastic parrots. Fancy autocomplete. A glorified Google search with worse footnotes. Any time they appear to do something novel, the correct explanation is that someone, somewhere, already did it, and the model merely vibes in that general direction. The fact that no human knew about it at the time is a coincidence best ignored.
The same logic applies to code. “Vibe coding” isn’t real programming. Real programming involves intuition, battle scars, and a sixth sense for bugs that can’t be articulated but somehow always validates whatever I already believe. When an LLM produces correct code, that’s not engineering, it’s cosplay. It didn’t understand the problem, because understanding is defined as something only humans possess, especially after the fact.
Naturally, only senior developers truly code. Juniors shuffle syntax. Seniors channel wisdom. Architecture decisions emerge from lived experience, not from reading millions of examples and compressing patterns into a model. If an LLM produces the same decisions, it’s obviously cargo-culting seniority without having earned the right to say “this feels wrong” in a code review.
Any success is easy to dismiss. Data leakage. Prompt hacking. Cherry-picking. Hidden humans in the loop. And if none of those apply, then it “won’t work on a real codebase,” where “real” is defined as the one place the model hasn’t touched yet. This definition will be updated as needed.
Hallucinations still settle everything. One wrong answer means the whole system is fundamentally broken. Human mistakes, meanwhile, are just learning moments, context switches, or coffee shortages. This is not a double standard. It’s experience.
Jobs are obviously safe too. Software engineering is mostly communication, domain expertise, and navigating ambiguity. If the model starts doing those things, that still doesn’t count, because it doesn’t sit in meetings, complain about product managers, or feel existential dread during sprint planning.
So yes, the Erdos situation is resolved. Nothing new happened. No reasoning occurred. Progress remains hype. The trendline is imaginary. And any discomfort you feel is probably just social media, not the ground shifting under your feet.
Pity that HN's ability to detect sarcasm is as robust as that of a sentiment analysis model using keyword-matching.
The problem is more that it's an LLM-generated comment that's about 20x as long as it needed to be to get the point across.
I suspect this is AI generated, but it’s quite high quality, and doesn’t have any of the telltale signs that most AI generated content does. How did you generate this? It’s great.
Your intuition on AI is out of date by about 6 months. Those telltale signs no longer exist.
It wasn't AI generated. But if it was, there is currently no way for anyone to tell the difference.
> But if it was there is currently no way for anyone to tell the difference.
This is false. There are many human-legible signs, and there do exist fairly reliable AI detection services (like Pangram).
I've tested some of those services and they weren't very reliable.
> It wasn't AI generated.
You're lying: https://www.pangram.com/history/94678f26-4898-496f-9559-8c4c...
Not that I needed pangram to tell me that, it's obvious slop.
It's bizarre. The same account was previously arguing in favor of emergent reasoning abilities in another thread ( https://news.ycombinator.com/item?id=46453084 ) -- I voted it up, in fact! Turing test failed, I guess.
(edit: fixed link)
I thought the mockery and sarcasm in my piece was rather obvious.
Poe's Law is the real Bitter Lesson.
We need a name for the much more trivial version of the Turing test that replaces "human" with "weird dude with rambling ideas he clearly thinks are very deep"
I'm pretty sure it's like "can it run DOOM" and someone could make an LLM that passes this that runs on an pregnancy test
(edit: removed duplicate comment from above, not sure how that happened)
the poster is in fact being very sarcastic. arguing in favor of emergent reasoning does in fact make sense
It's a formal sarcasm piece.
Why not plan for a future where a lot of non-trivial tasks are automated instead of living on the edge with all this anxiety?
Well it's a bit of an identity crisis. As a developer on HN my entire identity is wrapped around my skill as a programmer. It's a badge of honor I wear and it's a career and I get paid a lot of money to do this.
All of that is going away so the best way to deal with it is to call it a stochastic parrot and deny reality.
come out of the irony layer for a second -- what do you believe about LLMs?
If all of it is going away and you should deny reality, what does everything else you wrote even mean?
Yes, it is simply impossible that anyone could look at things and do your own evaluations and come to a different, much more skeptical conclusion.
The only possible explanation is people say things they don't believe out of FUD. Literally the only one.
Can anyone give a little more color on the nature of Erdos problems? Are these problems that many mathematicians have spend years tackling with no result? Or do some of the problems evade scrutiny and go un-attempted for most of the time?
EDIT: After reading a link someone else posted to Terrance Tao's wiki page, he has a paragraph that somewhat answers this question:
> Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools. Unfortunately, it is hard to tell in advance which category a given problem falls into, short of an expert literature review. (However, if an Erdős problem is only stated once in the literature, and there is scant record of any followup work on the problem, this suggests that the problem may be of the second category.)
from here: https://github.com/teorth/erdosproblems/wiki/AI-contribution...
FWIW, I just gave Deepseek the same prompt and it solved it too (much faster than the 41m of ChatGPT). I then gave both proofs to Opus and it confirmed their equivalence.
The answer is yes. Assume, for the sake of contradiction, that there exists an \(\epsilon > 0\) such that for every \(k\), there exists a choice of congruence classes \(a_1^{(k)}, \dots, a_k^{(k)}\) for which the set of integers not covered by the first \(k\) congruences has density at least \(\epsilon\).
For each \(k\), let \(F_k\) be the set of all infinite sequences of residues \((a_i)_{i=1}^\infty\) such that the uncovered set from the first \(k\) congruences has density at least \(\epsilon\). Each \(F_k\) is nonempty (by assumption) and closed in the product topology (since it depends only on the first \(k\) coordinates). Moreover, \(F_{k+1} \subseteq F_k\) because adding a congruence can only reduce the uncovered set. By the compactness of the product of finite sets, \(\bigcap_{k \ge 1} F_k\) is nonempty.
Choose an infinite sequence \((a_i) \in \bigcap_{k \ge 1} F_k\). For this sequence, let \(U_k\) be the set of integers not covered by the first \(k\) congruences, and let \(d_k\) be the density of \(U_k\). Then \(d_k \ge \epsilon\) for all \(k\). Since \(U_{k+1} \subseteq U_k\), the sets \(U_k\) are decreasing and periodic, and their intersection \(U = \bigcap_{k \ge 1} U_k\) has density \(d = \lim_{k \to \infty} d_k \ge \epsilon\). However, by hypothesis, for any choice of residues, the uncovered set has density \(0\), a contradiction.
Therefore, for every \(\epsilon > 0\), there exists a \(k\) such that for every choice of congruence classes \(a_i\), the density of integers not covered by the first \(k\) congruences is less than \(\epsilon\).
\boxed{\text{Yes}}
> I then gave both proofs to Opus and it confirmed their equivalence.
You could have just rubber-stamped it yourself, for all the mathematical rigor it holds. The devil is in the details, and the smallest problem unravels the whole proof.
I am not familiar with the field, but any chance that the deepseek is just memorizing the existing solution? Or different.
https://news.ycombinator.com/item?id=46664976
Sure but if so wouldn't ChatGPT 5.2 Pro also "just memorizing the existing solution?"?
No it's not, you can refer to my link and subsequent discussion.
I don't see what's related there but anyway unless you have access to information from within OpenAI I don't see how you can claim what was or wasn't in the training data of ChatGPT 5.2 Pro.
On the contrary for DeepSeek you could but not for a non open model.
I find it interesting that, as someone utterly unfamiliar with ergodic theory, Dini’s theorem, etc, I find Deepseek’s proof somewhat comprehensible, whereas I do not find GPT-5.2’s proof comprehensible at all. I suspect that I’d need to delve into the terminology in the GPT proof if I tried to verify Deepseek’s, so maybe GPT’s is being more straightforward about the underlying theory it relies on?
From Terry Tao's comments in the thread:
"Very nice! ... actually the thing that impresses me more than the proof method is the avoidance of errors, such as making mistakes with interchanges of limits or quantifiers (which is the main pitfall to avoid here). Previous generations of LLMs would almost certainly have fumbled these delicate issues.
...
I am going ahead and placing this result on the wiki as a Section 1 result (perhaps the most unambiguous instance of such, to date)"
The pace of change in math is going to be something to watch closely. Many minor theorems will fall. Next major milestone: Can LLMs generate useful abstractions?
Seems like the someone dug something up from the literature on this problem (see top comment on the erdosproblems.com thread)
"On following the references, it seems that the result in fact follows (after applying Rogers' theorem) from a 1936 paper of Davenport and Erdos (!), which proves the second result you mention. ... In the meantime, I am moving this problem to Section 2 on the wiki (though the new proof is still rather different from the literature proof)."
The erdosproblems thread itself contains comments from Terence Tao: https://www.erdosproblems.com/forum/thread/281
Has anyone verified this?
I've "solved" many math problems with LLMs, with LLMs giving full confidence in subtly or significantly incorrect solutions.
I'm very curious here. The Open AI memory orders and claims about capacity limits restricting access to better models are interesting too.
Terence Tao gave it the thumbs up. I don't think you're going to do better than that.
It's already been walked back.
Not in the sense of being a "subtly or significantly incorrect solution".
I guess the first question I have is if these problems solved by LLMs are just low-hanging fruit that human researchers either didn't get around to or show much interest in - or if there's some actual beef here to the idea that LLMs can independently conduct original research and solve hard problems.
That's the first warning from the wiki : <<Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools.>> https://github.com/teorth/erdosproblems/wiki/AI-contribution...
There is still value on letting these LLMs loose on the periphery and knocking out all the low hanging fruit humanity hasn’t had the time to get around to. Also, I don’t know this, but if it is a problem on Erdos I presume people have tried to solve it atleast a little bit before it makes it to the list.
Is there though? If they are "solved" (as in the tickbox mark them as such, through a validation process, e.g. another model confirming, formal proof passing, etc) but there is no human actually learning from them, what's the benefit? Completing a list?
I believe the ones that are NOT studied are precisely because they are seen as uninteresting. Even if they were to be solved in an interesting way, if nobody sees the proof because they are just too many and they are again not considered valuable then I don't see what is gained.
Out of curiosity why has the LLM math solving community been focused on the Erdos problems over other open problems? Are they of a certain nature where we would expect LLMs to be especially good at solving them?
I guess they are at a difficulty where it's not too hard (unlike millennium prize problems), is fairly tightly scoped (unlike open ended research), and has some gravitas (so it's not some obscure theorem that's only unproven because of it's lack of noteworthiness).
I have 15 years of software engineering experience across some top companies. I truly believe that ai will far surpass human beings at coding, and more broadly logic work. We are very close
HN will be the last place to admit it; people here seem to be holding out with the vague 'I tried it and it came up with crap'. While many of us are shipping software without touching (much) code anymore. I have written code for over 40 years and this is nothing like no-code or whatever 'replacing programmers' before, this is clearly different judging from the people who cannot code with a gun to their heads but still are shipping apps: it does not really matter if anyone believes me or not. I am making more money than ever with fewer people than ever delivering more than ever.
We are very close.
(by the way; I like writing code and I still do for fun)
Both can be correct : you might be making a lot of money using the latest tools while others who work on very different problems have tried the same tools and it's just not good enough for them.
The ability to make money proves you found a good market, it doesn't prove that the new tools are useful to others.
> holding out with the vague 'I tried it and it came up with crap'
Isn't that a perfectly reasonable metric? The topic has been dominated by hype for at least the past 5 if not 10 years. So when you encounter the latest in a long line of "the future is here the sky is falling" claims, where every past claim to date has been wrong, it's natural to try for yourself, observe a poor result, and report back "nope, just more BS as usual".
If the hyped future does ever arrive then anyone trying for themselves will get a workable result. It will be trivially easy to demonstrate that naysayers are full of shit. That does not currently appear to be the case.
What topic are you referring to? ChatGPT release was just over 3 years ago. 5 years ago we had basic non-instruct GPT-3.
Wasn't transformer 2017? There's been constant AI hype since at least that far back and it's only gotten worse.
If I release a claim once a month that armageddon will happen next month, and then after 20 years it finally does, are all of my past claims vindicated? Or was I spewing nonsense the entire time? What if my claim was the next big pandemic? The next 9.0 earthquake?
But the trend line is less ambiguous, models got better year over year, much much better.
I don't dispute that the situation is rapidly evolving. It is certainly possible that we could achieve AGI in the near future. It is also entirely possible that we might not. Claims such as that AGI is close or that we will soon be replacing developers entirely are pure hype.
Gotta make sure that the investors read this message in an Erdos thread.
They already do. What they suck at is common sense. Unfortunately good software requires both.
Most people also suck at common sense, including most programmers, hence most programmers do not write good software to begin with.
Even a 20 year old Markov chain could produce this banality.
Or is it fortunate (for a short period at least).
They can only code to specification which is where even teams of humans get lost. Without much smarter architecture for AI (LLMs as is are a joke) that needle isn’t going to move.
This is crazy. It's clear that these models don't have human intelligence, but it's undeniable at this point that they have _some_ form of intelligence.
If LLMs weren't created by us but where something discovered in another species' behaviour it would be 100% labelled intelligence
I don't think they will ever have human intelligence. It will always be an alien intelligence.
But I think the trend line unmistakably points to a future where it can be MORE intelligent than a human in exactly the colloquial way we define "more intelligent"
The fact that one of the greatest mathematicians alive has a page and is seriously bench marking this shows how likely he believes this can happen.
My take is that a huge part of human intelligence is pattern matching. We just didn’t understand how much multidimensional geometry influenced our matches
Yes, it could be that intelligence is essentially a sophisticated form of recursive, brute force pattern matching.
I'm beginning to think the Bitter Lesson applies to organic intelligence as well, because basic pattern matching can be implemented relatively simply using very basic mathematical operations like multiply and accumulate, and so it can scale with massive parallelization of relatively simple building blocks.
I don't think it's accurate to describe LLMs as pattern matching. Prediction is the mechanism they use to ingest and output information, and they end up with a (relatively) deep model of the world under the hood.
The "pattern matching" perspective is true if you zoom in close enough, just like "protein reactions in water" is true for brains. But if you zoom out you see both humans and LLMs interact with external environments which provide opportunity for novel exploration. The true source of originality is not inside but in the environment. Making it be all about the model inside is a mistake, what matters more than the model is the data loop and solution space being explored.
"Pattern matching" is not sufficiently specified here for us to say if LLMs do pattern matching or not. E.g. we can say that an LLM predicts the next token because that token (or rather, its embedding) is the best "match" to the previous tokens, which form a path ("pattern") in embedding space. In this sense LLMs are most definitely pattern matching. Under other formulations of the term, they may not be (e.g. when pattern matching refers to abstraction or abstracting to actual logical patterns, rather than strictly semantic patterns).
Yes, the world model building is achieved via pattern matching and happens during ingestion and training, but that is also part of the intelligence.
Which is even more true for humans.
Depends on what you mean by intelligence, human intelligence and human
It's pattern matching. Which is actually what we measure in IQ tests, just saying.
There's some nuance. IQ tests measure pattern matching and, in an underlying way, other facets of intelligence - memory, for example. How well can an LLM 'remember' a thing? Sometimes Claude will perform compaction when its context window reaches 200k "tokens" then it seems a little colder to me, but maybe that's just my imagination. I'm kind of a "power user".
I call it matching. Pattern matching had a different meaning.
what are you referring to? LLMs are neural networks at their core and the most simple versions of neural networks are all about reproducing patterns observed during training
As someone who doesn't understand this shit, and how it's always the experts who fiddle the LLMs to get good outputs, it feels natural to attribute the intelligence to the operator (or the training set), rather than the LLM itself.
This is showing as unresolved here, so I'm assuming something was retracted.
https://mehmetmars7.github.io/Erdosproblems-llm-hunter/probl...
I think that just hasn't been updated.
Funny seeing silicon valley bros commenting "you're on fire!" to Neel when it appears he copied and pasted the problem verbatim into chatGPT and it did literally all the other work here
https://chatgpt.com/share/696ac45b-70d8-8003-9ca4-320151e081...
Narrator: The solution had already appeared several times in the training data
This must be what it feels like to be a CEO and someone tells me they solved coding.