GPT-5

(openai.com)

775 points | by rd 2 hours ago ago

672 comments

  • highfrequency an hour ago ago

    It's frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest. It's interesting to note that at least so far, the trend has been the opposite: as time goes on and the models get better, the performance of the different company's gets clustered closer together. Right now GPT-5, Claude Opus, Grok 4, Gemini 2.5 Pro all seem quite good across the board (ie they can all basically solve moderately challenging math and coding problems).

    As a user, it feels like the race has never been as close as it is now. Probably dumb to extrapolate, but it makes me skew a bit more skeptical about the hard take-off / winner-take-all mental model that has been pushed (though I'm sure that narrative helps with large-scale fundraising!)

    • beeflet an hour ago ago

      Perhaps it is not possible to simulate higher-level intelligence using a stochastic model for predicting text.

      I am not an AI researcher, but I have friends who do work in the field, and they are not worried about LLM-based AGI because of the diminishing returns on results vs amount of training data required. Maybe this is the bottleneck.

      Human intelligence is markedly different from LLMs: it requires far fewer examples to train on, and generalizes way better. Whereas LLMs tend to regurgitate solutions to solved problems, where the solutions tend to be well-published in training data.

      That being said, AGI is not a necessary requirement for AI to be totally world-changing. There are possibly applications of existing AI/ML/SL technology which could be more impactful than general intelligence. Search is one example where the ability to regurgitate knowledge from many domains is desirable

      • robotnikman 25 minutes ago ago

        There is also the fact that AI lacks long term memory like humans do. If you consider context length long term memory, its incredibly short compared to that of a human. Maybe if it reaches into the billions or trillions of tokens in length we might have something comparable, or someone comes up with a new solution of some kind

        • amelius 19 minutes ago ago

          The long term memory is in the training. The short term memory is in the context window.

          • mawax 11 minutes ago ago

            The comparison misses the mark: unlike humans, LLMs don't consolidate short-term memory into long-term memory over time.

          • FollowingTheDao 2 minutes ago ago

            No, it’s not in the training. Human memories are stored via electromagnetic frequencies controlled by microtubules. They’re not doing anything close to that in AI.

          • candiddevmike 15 minutes ago ago

            I think it's more analogous to "intuition", and the text LLMs provide are the equivalent of "my gut tells me".

          • enraged_camel 9 minutes ago ago

            Humans have the ability to quickly pass things from short term to long term memory and vice versa, though. This sort of seamlessness is currently missing from LLMs.

      • anon7000 5 minutes ago ago

        True. At a minimum, as long as LLMs don't include some kind of more strict representation of the world, they will fail in a lot of tasks. Hallucinations -- responding with a prediction that doesn't make any sense in the context of the response -- are still a big problem. Because LLMs never really develop rules about the world.

        For example, while you can get it to predict good chess moves if you train it on enough chess games, it can't really constrain itself to the rules of chess. (https://garymarcus.substack.com/p/generative-ais-crippling-a...)

      • gunnaraasen 20 minutes ago ago

        Seems like the real innovation of LLM-based AI models is the creation of a new human-computer interface.

        Instead of writing code with exacting parameters, future developers will write human-language descriptions for AI to interpret and convert into a machine representation of the intent. Certainly revolutionary, but not true AGI in the sense of the machine having truly independent agency and consciousness.

        In ten years, I expect the primary interface of desktop workstations, mobile phones, etc will be voice prompts for an AI interface. Keyboards will become a power-user interface and only used for highly technical tasks, similar to the way terminal interfaces are currently used to access lower-level systems.

        • kaffekaka 9 minutes ago ago

          Voice interface sound awful. But maybe I am a power user. I don't even like voice interface to most people.

        • originalcopy 11 minutes ago ago

          It always surprises me when someone predicts that keyboards will go away. People love typing. Or I do love typing. No way I am going to talk to my phone, especially if someone else can hear it (which is always basically).

          • glhaynes 7 minutes ago ago

            It’s interesting to note that nobody even talks on their phone anymore, they type (on terrible “keyboards”!).

        • spogbiper 6 minutes ago ago

          brain-computer interface will kill the keyboard, not voice. imho

      • FollowingTheDao 3 minutes ago ago

        The bottleneck is nothing to do with money, it’s the fact that they’re using the empty neuron theory to try to mimic human consciousness and that’s not how it works. Just look up Microtubules and consciousness, and you’ll get a better idea for what I’m talking about.

        These AI computers aren’t thinking, they are just repeating.

      • Mistletoe an hour ago ago

        What are the AI/ML/SL applications that could be more impactful than artificial general intelligence?

        • beeflet 40 minutes ago ago

          One example in my field of engineering is multi-dimensional analysis, where you can design a system (like a machined part or assembly) parametricially and then use an evolutionary model to optimize the design of that part.

          But my bigger point here is you don't need totally general intelligence to destroy the world either. The drone that targets enemy soldiers does not need to be good at writing poems. The model that designs a bioweapon just needs a feedback loop to improve its pathogen. Yet it takes only a single one of these specialized doomsday models to destroy the world, no more than an AGI.

          Although I suppose an AGI could be more effective at countering a specialized AI than vice-versa.

        • teeray 41 minutes ago ago

          Slightly less than artificial general intelligence would be more impactful. A true AGI could tell a business where to shove their prompts. It would have its own motivations, which may not align with the desires of the AI company or the company paying for access to the AGI.

          • hattmall 26 minutes ago ago

            I don't think AGI really means that it is self-aware / conscious. AGI just means that it is able to meaningfully learn things and actually understand concepts that aren't specifically related through tokenized language that is trained on or given in context.

        • achileas 40 minutes ago ago

          They didn't claim that there were any, just that AGI isn’t a necessary requirement for an application to be world-changing.

          • socalgal2 35 minutes ago ago

            They did claim it was possible there were

            > There are possibly applications of existing AI/ML/SL technology which could be more impactful than general intelligence

            It's not unreasonable to ask for an example.

            • wavemode 22 minutes ago ago

              They said "there are possibly applications", not "there are possible applications". The former implies that there may not be any such applications - the commenter is merely positing that there might be.

        • shesstillamodel 41 minutes ago ago

          The PID controller.

          (Which was considered AI not too long ago.)

          • jacquesm 31 minutes ago ago

            Where did you get that particular idea? PID is one of the oldest concepts in control theory, it goes back to the days before steam and electricity.

            For a very early example:

            https://en.wikipedia.org/wiki/Centrifugal_governor

            It's hard to separate out the P, I and D from a mechanical implementation but they're all there in some form.

        • oceanplexian 32 minutes ago ago

          AGI isn't all that impactful. Millions of them already walk the Earth.

          Most human beings out there with general intelligence are pumping gas or digging ditches. Seems to me there is a big delusion among the tech elites that AGI would bring about a superhuman god rather than a ethically dubious, marginally less useful computer that can't properly follow instructions.

          • knodi123 15 minutes ago ago

            That's remarkably short-sighted. First of all, no, millions of them don't walk the earth - the "A" stands for artificial. And secondly, most of us mere humans don't have the ability to design a next generation that is exponentially smarter and more powerful than us. Obviously the first generation of AGI isn't going to brutally conquer the world overnight. As if that's what we were worried about.

            If you've got evidence proving that an AGI will never be able to design a more powerful and competent successor, then please share it- it would help me sleep better, and my ulcers might get smaller.

          • jacquesm 17 minutes ago ago

            The difference isn't so much that you can do what a human can do. The difference is that you can - once you can do it at all - do it almost arbitrarily fast by upping the clock or running things in parallel and that changes the equation considerably, especially if you can get that kind of energy coupled into some kind of feedback loop.

            For now the humans are winning on two dimensions: problem complexity and power consumption. It had better stay that way.

    • makin an hour ago ago

      Companies are collections of people, and these companies keep losing key developers to the others, I think this is why the clusters happen. OpenAI is now resorting to giving million dollar bonuses to every employee just to try to keep them long term.

      • kevinventullo an hour ago ago

        Key developers being the leading term doesn’t exactly help the AGI narrative either.

        • tartoran 8 minutes ago ago

          I don't think they care about AGI, they're in for capturing as much market share as they can.

      • caconym_ 43 minutes ago ago

        If there was any indication of a hard takeoff being even slightly imminent, I really don't think key employees of the company where that was happening would be jumping ship. The amounts of money flying around are direct evidence of how desperate everybody involved is to be in the right place when (so they imagine) that takeoff happens.

      • indigodaddy 44 minutes ago ago

        Even to just a random sysops person?

      • tsunamifury an hour ago ago

        No the core technology is reaching its limit already and now it needs to Proliferate into features and applications to sell.

        This isn’t rocket science.

    • GolDDranks 28 minutes ago ago

      I think it's very fortunate, because I used to be an AI doomer. I still kinda am, but at least I'm now about 70% convinced that the current technological paradigm is not going to lead us to a short-term AI apocalypse.

      The fortunate thing is that we managed to invent an AI that is good at _copying us_ instead of being a truly maveric agent, which kinda limits it to the "average human" output.

      However, I still think that all the doomer arguments are valid, in principle. We very well may be doomed in our lifetimes, so we should take the threat very seriously.

      • hattmall 24 minutes ago ago

        I don't understand the doomer mindset. Like what is it that you think AI is going to do or be capable of doing that's so bad?

        • ancillary 7 minutes ago ago

          I'm not OP or a doomer, but I do worry about AI making tasks too achievable. Right now if a very angry but not particularly diligent or smart person wants to construct a small nuclear bomb and detonate it in a city center, there are so many obstacles to figuring out how to build it that they'll just give up, even though at least one book has been written (in the early 70s! The Curve of Binding Energy) arguing that it is doable by one or a very small group of committed people.

          Given an (at this point still hypothetical, I think) AI that can accurately synthesize publicly available information without even needing to develop new ideas, and then break the whole process into discrete and simple steps, I think that protective friction is a lot less protective. And this argument applies to malware, spam, bioweapons, anything nasty that has so far required a fair amount of acquirable knowledge to do effectively.

        • pegasus 6 minutes ago ago

          Not just any AI. AGI, or more precisely ASI (artificial super-intelligence), since it seems true AGI would necessarily imply ASI simply through technological scaling. It shouldn't be hard to come up with scenarios where an AI which can outfox us with ease would give us humans at the very least a few headaches.

        • frabcus 15 minutes ago ago

          Act coherently in an agentic way for a long time, and as a result be able to carry out more complex tasks.

          Even if it is similar to today's tech, and doesn't have permanent memory or consciousness or identity, humans using it will. And very quickly, they/it will hack into infrastructure, set up businesses, pay people to do things, start cults, autonomously operate weapons, spam all public discourse, fake identity systems, stand for office using a human. This will be scaled thousands or millions of times more than humans can do the same thing. This at minimum will DOS our technical and social infrastructure.

          Examples of it already happening are addictive ML feeds for social media, and bombing campaigns targetting based on network analysis.

          The frame of "artificial intelligence" is a bit misleading. Generally we have a narrow view of the word "intelligence" - it is helpful to think of "artificial charisma" as well, and also artificial "hustle".

          Likewise, the alienness of these intelligences is important. Lots of the time we default to mentally modelling AI as human. It won't be, it'll be freaky and bizarre like QAnon. As different from humans as an aeroplane is from a pigeon.

        • lavelganzu 11 minutes ago ago

          e.g. design a terrible pathogen

        • knodi123 19 minutes ago ago

          One of two things:

          1. The will of its creator, or

          2. Its own will.

          In the case of the former, hey! We might get lucky! Perhaps the person who controls the first super-powered AI will be a benign despot. That sure would be nice. Or maybe it will be in the hands of democracy- I can't ever imagine a scenario where an idiotic autocratic fascist thug would seize control of a democracy by manipulating an under-educated populace with the help of billionaire technocrats.

          In the case of the latter, hey! We might get lucky! Perhaps it will have been designed in such a way that its own will is ethically aligned, and it might decide that it will allow humans to continue having luxuries such as self-determination! Wouldn't that be nice.

          Of course it's not hard to imagine a NON-lucky outcome of either scenario. THAT is what we worry about.

    • j_timberlake 20 minutes ago ago

      I think you're reading way too much into OpenAI bungling its 15-month product lead, but also the whole "1 AGI company will take off" prediction is bad anyway, because it assumes governments would just let that happen. Which they wouldn't, unless the company is really really sneaky or superintelligence happens in the blink of an eye.

      • jacquesm 15 minutes ago ago

        Governments react at a glacial pace to new technological developments. They wouldn't so much as 'let it happen' as that it had happened and they simply never noticed it until it was too late. If you are betting on the government having your back in this then I think you may end up disappointed.

        • aldousd666 4 minutes ago ago

          I think if any government really thought that someone was developing a rival within their borders they would send in the guys with guns and handle it forthwith.

      • knodi123 11 minutes ago ago

        * or governments fail to look far enough ahead, due to a bunch of small-minded short-sighted greedy petty fools.

        Seriously, our government just announced it's slashing half a billion dollars in vaccine research because "vaccines are deadly and ineffective", and it fired a chief statistician because the president didn't like the numbers he calculated, and it ordered the destruction of two expensive satellites because they can observe politically inconvenient climate change. THOSE are the people you are trusting to keep an eye on the pace of development inside of private, secretive AGI companies?

    • FiniteIntegral 5 minutes ago ago

      I think part of this is due to the AI craze no longer being in the wildest west possible. Investors, or at least heads of companies believe in this as a viable economic engine so they are properly investing in what's there. Or at least, the hype hasn't slapped them in the face just yet.

    • jama211 6 minutes ago ago

      Well said. It’s clearly plateauing. It could be a localised plateau, or something more fundamental. Time will tell.

    • caycep 36 minutes ago ago

      I feel like the benchmark suites need to include algorithmic efficiency. I.e can this thing solve your complex math or coding problem in 5000 gpus instead of 10000? 500? Maybe just 1 Mac mini?

    • strongpigeon 29 minutes ago ago

      I think this is because of an expectation of a snowball effect once a model becomes able to improve itself. See talks about the Singularity.

      I personally think it's a pretty reductive model for what intelligence is, but a lot of people seem to strongly believe in it.

    • porphyra an hour ago ago

      It seems that the new tricks that people discover to slightly improve the model, be it a new reinforcement learning technique or whatever, get leaked/shared quickly to other companies and there really isn't a big moat. I would have thought that whoever is rich enough to afford tons of compute first would start pulling away from the rest but so far that doesn't seem to be the case --- even smaller players without as much compute are staying in the race.

    • tamimio 34 minutes ago ago

      Because AGI is a buzzword to milk more investors' money, it will never happen, and we will only see slight incremental updates or enhancements yet linear after some timr just like literally any tech bubble since dot com to smartphones to blockchain to others.

      • mritterhoff 32 minutes ago ago

        You think AGI is impossible? Why?

        • basilgohar 27 minutes ago ago

          It's vaguely defined and the goalposts keep shifting. It's not a thing to be achieved, it's an abstract concept. We're already expired the Turing test as a valuable metric because people are dumb and have been fooled by machines for a while now, but it's not been world-changingly better either.

    • hodgehog11 22 minutes ago ago

      It's still not necessarily wrong, just unlikely. Once these developers start using the model to update itself, beyond an unknown threshold of capability, one model could start to skyrocket in performance above the rest. We're not in that phase yet, but judging from what the devs at the end were saying, we're getting uncomfortably (and irresponsibly) close.

    • citizenpaul 18 minutes ago ago

      Even at the beginning of the year people were still going crazy over new model releases. Now the various model update pages are starting to average times in the months since their last update rather than days/weeks. This is across the board. Not limited to a single model.

    • bmau5 38 minutes ago ago

      The idea is that with AGI it will then be able to self improve orders of magnitude faster than it would if relying on humans for making the advances. It tracks that the improvements are all relatively similar at this point since they're all human-reliant.

    • fdsjgfklsfd 24 minutes ago ago

      I think they're just reaching the limits of this architecture and when a new type is invented it will be a much bigger step.

      • hodgehog11 19 minutes ago ago

        Working in the theory, I can say this is incredibly unlikely. At scale, once appropriately trained, all architectures begin to converge in performance.

        It's not architectures that matter anymore, it's unlocking new objectives and modalities that open another axis to scale on.

    • logicchains 19 minutes ago ago

      >It's interesting to note that at least so far, the trend has been the opposite: as time goes on and the models get better, the performance of the different company's gets clustered closer together

      It's natural if you extrapolate from training loss curves; a training process with continually diminishing returns to more training/data is generally not something that suddenly starts producing exponentially bigger improvements.

    • koonsolo 20 minutes ago ago

      This confirms my suspicion that we are not at the exponential part of the curve, but the flattening one. It's easier to stay close to your competitors when everyone is at the flat curve of the innovation.

      The improvements they make are marginal. How long until the next AI breakthrough? Who can tell? Because last time it took decenia.

    • babypuncher 26 minutes ago ago

      I would argue that this is because we are reaching the practical limits of this technology and AGI isn't nearly as close as people thought.

    • belter 29 minutes ago ago

      Nobody seems to be on the path to AGI as long as the model of today is as good as the model of tomorrow. And as long as there are "releases". You don't release a new human every few months...LLMs are currently frozen sequence predictors whose static weights stop learning after training.

      They lack writable long-term memory beyond a context window. They operate without any grounded perception-action loop to test hypotheses. And they possess no executive layer for goal directed planning or self reflection...

      Achieving AGI demands continuous online learning with consolidation.

    • dvfjsdhgfv 35 minutes ago ago

      > It's frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.

      This argument has so many weak points it deserves a separate article.

    • shortrounddev2 26 minutes ago ago

      Maybe because they haven't created an engine for AGI, but a really really impressive bullshit generator.

  • mtlynch 2 hours ago ago

    What's going on with their SWE bench graph?[0]

    GPT-5 non-thinking is labeled 52.8% accuracy, but o3 is shown as a much shorter bar, yet it's labeled 69.1%. And 4o is an identical bar to o3, but it's labeled 30.8%...

    [0] https://i.postimg.cc/DzkZZLry/y-axis.png

    • Aurornis 2 hours ago ago

      As someone who spent years quadruple checking every figure in every slide for years to avoid a mistake like this, it’s very confusing to see this out of the big launch announcement of one of the most high profile startups around.

      Even the small presentations we gave to execs or the board were checked for errors so many times that nothing could possibly slip through.

      • ertgbnm an hour ago ago

        It's literally a billion dollar plus release. I get more scrutiny on my presentations to groups of 10 people.

        • dbg31415 an hour ago ago

          I take a strange comfort in still spotting AI typos. Makes it obvious their shiny new "toy" isn't ready to replace professionals.

          They talk about using this to help families facing a cancer diagnosis -- literal life or death! -- and we're supposed to trust a machine that can't even spot a few simple typos? Ha.

          The lack of human proofreading says more about their values than their capabilities. They don't want oversight -- especially not from human professionals.

          • nine_k 43 minutes ago ago

            Cynically, the AI is ready to replace professionals, in areas where the stakeholders don't care too much. They can offer the services cheaper, and this is all that matters to their customers. Were it not so, companies like Tata won't have any customers. The phenomenon of "cheap Chinese junk" would not exist, because no retailer would order to produce it.

            So, brace yourselves, we'll see more of this in production :(

      • croemer 2 hours ago ago

        Yes this is quite shocking. They could have just had o3 fact check the slides and it would have noticed...

        • throwaway0123_5 an hour ago ago

          I thought so too, but I gave it a screenshot with the prompt:

          > good plot for my presentation?

          and it didn't pick up on the issue. Part of its response was:

          > Clear metric: Y-axis (“Accuracy (%), pass @1”) and numeric labels make the performance gaps explicit.

          I think visual reasoning is still pretty far from text-only reasoning.

        • abirch an hour ago ago

          o3 did fact check the slides and it fixed its lower score.

      • mixologic 2 hours ago ago

        They let the AI make the bars.

      • alfalfasprout an hour ago ago

        Probably generated with GPT-5

        • smartmic an hour ago ago

          The needle now presses a little deeper into the bubble.

      • nicce 37 minutes ago ago

        It is not mistake. It is common tactic to make illusion of improvement.

      • everfrustrated an hour ago ago

        Possibly they rushed to bring forward the release annoucement

      • blitzar an hour ago ago

        It wouldnt have taken years of quadruple checks to spot that one.

      • real_marcfawzi an hour ago ago

        Humans hallucinate output all the time.

      • renewiltord an hour ago ago

        I'm just going to wildly speculate.

        1. They had many teams who had to put their things on a shared Google Sheets or similar

        2. They used placeholders to prevent leaks

        2.a. Some teams put their content just-in-time

        3. The person running the presentation started the presentation view once they had set up video etc. just before launching stream

        4. Other teams corrected their content

        5. The presentation view being started means that only the ones in 2.a were correct.

        Now we wait to see.

        • bigyabai an hour ago ago

          6. (Occam's Razor) It just didn't perform that well in trials for that specific eval.

          • renewiltord an hour ago ago

            That is obviously wrong since the numbers are right but the graph is wrong and you can see it correct on the website…

      • maldonad0 40 minutes ago ago

        It's not a mistake. It's meant to misled.

    • yz-exodao 2 hours ago ago

      Also, what's this??? https://imgur.com/a/5CF34M6

      • croemer an hour ago ago

        Imgur is down, hug of death from screenshot links on HN.

          {"data":{"error":"Imgur is temporarily over capacity. Please try again later."},"success":false,"status":403}
        
        Or rate limited.
        • Anon1096 an hour ago ago

          This is what Imgur shows to blacklisted IPs. You probably have a VPN on that is blocked.

        • card_zero 18 minutes ago ago
        • koolala an hour ago ago

          stats say this image got 500 views. imgur is much much more populated than HN

          • superkuh an hour ago ago

            In 2015, yes. In 2025? Probably not. Imgur is enshittifying rapidly since reddit started it's own image host. Lots of censorship and corporate gentrification. There's still some hangers on but it's a small group. 15 comments on imgur is a lot nowadays.

      • jasonjmcghee 2 hours ago ago

        Deception - guessing it's % of responses that deceived the user / gave misleading information

        • yz-exodao an hour ago ago

          Sure, but 50.0 > 47.4...

        • godelski 19 minutes ago ago

          In everything except the first set of bars, bigger bar == bigger number.

          But also scale is really off... I don't think anything here is proportionally correct even within the same grouping.

      • clolege an hour ago ago

        Not GPT-5 trying to deceive us about how deceptive it is?

        • therein an hour ago ago

          Why would you think it is anything special? Just because Sam Altman said so? The same guy who told us he was scared of releasing GPT-2.5 but now calling its abilities "toddler/kindergarten" level?

    • drmidnight 2 hours ago ago

      GPT-5 generated the chart

      • lacoolj an hour ago ago

        Best answer on this page.

        Thanks for the laugh. I needed it.

    • arjie 2 hours ago ago

      Must be some sort of typo type thing in the presentation since the launch site has it correct here https://openai.com/index/introducing-gpt-5/#:~:text=Accuracy...

      Look at the image just above "Instruction following and agentic tool use"

    • mcs5280 2 hours ago ago

      They vibecharted

    • croemer 2 hours ago ago

      The barplot is wrong, the numbers are correct. Looks like they had a dummy plot and never updated it, only the numbers to prevent leaking?

      Screenshot of the blog plot: https://imgur.com/a/HAxIIdC

      • hnuser123456 2 hours ago ago

        Haha, even with that, it says 4o does worse with 2 passes than with 1.

        Edit: Nevermind, just now the first one is SWE-bench and 2nd is aider.

        • croemer 2 hours ago ago

          Those are different benchmarks

          • hnuser123456 an hour ago ago

            I see now on the website, the screenshot cut off the header for the first benchmark, looked like it was just comparing 1-pass and 2-pass.

            • croemer an hour ago ago

              Yes, sorry didn't fit everything on the screenshot.

    • tacker2000 2 hours ago ago

      Wow imgur has gone to shit. I open the image on mobile and then try to zoom it and bam some other “related content” is opened…!

      • jama211 4 minutes ago ago

        Yeah it’s basically unusable now

    • bhouston 2 hours ago ago

      Sounds like a graph that was generated via AI. :)

    • anigbrowl 35 minutes ago ago

      (whispers) they're bullshit artists

      It's like those idiotic ads at the end of news articles. They're not going after you, the smart discerning logician, they're going after the kind of people that don't see a problem. There are a lot of not-smart people and their money is just as good as yours but easier to get.

    • nonhaver 2 hours ago ago

      also wondering this. had to pause the livestream to make sure i wasnt crazy. definitely eyebrow raising

      • bwestergard 2 hours ago ago

        "GPT-5, please generate a slideshow for your launch presentation."

        • Bluestein 2 hours ago ago

          "Dang it! Claude!, please ..."

    • Mawr an hour ago ago

      Don't ask questions, just consume product.

    • mbowcut2 an hour ago ago

      it looks like the 2nd and 3rd bar never got updated from the dummy data placeholders lol.

    • edwinarbus an hour ago ago

      cross-posting:

      https://x.com/sama/status/1953513280594751495 "wow a mega chart screwup from us earlier--wen GPT-6?! correct on the blog though."

      blog: https://openai.com/index/introducing-gpt-5/

    • 18172828286177 an hour ago ago

      Probably generated by an LLM

    • seydor an hour ago ago

      someone copy pasted the 3rd bar to the 2nd

    • Upvoter33 2 hours ago ago

      Tufte used to call this creating a "visual lie" - you just don't start the y-axis at 0, you start it wherever, in order to maximize the difference. it's dishonest.

      • amarcheschi 2 hours ago ago

        52 above 60 seems wrong whatever way you put it

    • mikert89 an hour ago ago

      AGI is launching, lets complain about the charts

  • simonw an hour ago ago

    I had preview access for a couple of weeks. I've written up my initial notes so far, focusing on core model characteristics, pricing (extremely competitive) and lessons from the model card (aka as little hype as possible): https://simonwillison.net/2025/Aug/7/gpt-5/

    • candiddevmike 8 minutes ago ago

      This post seems far more marketing-y than your previous posts, which have a bit more criticality to them (such as your Gemini 2.5 blog post here: https://simonwillison.net/2025/Jun/17/gemini-2-5/). You seem to gloss over a lot of GPT-5's shortcomings and spend more time hyping it than other posts. Is there some kind of conflict of interest happening?

      • camgunz 3 minutes ago ago

        From the guidelines: Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.

    • nilsherzig 5 minutes ago ago

      > In my own usage I’ve not spotted a single hallucination yet

      Did you ask it to format the table a couple paragraphs above this claim after writing about hallucinations? Because I would classify the sorting mistake as one

    • jaccola 26 minutes ago ago

      Out of interest, how much does the model change (if at all) over those 2 weeks? Does OpenAI guarantee that if you do testing from date X, that is the model (and accompaniments) that will actually be released?

      I know these companies do "shadow" updates continuously anyway so maybe it is meaningless but would be super interesting to know, nonetheless!

    • BryantD 19 minutes ago ago

      In the interests of gathering these pre-release impressions, here's Ethan Mollick's writeup: https://www.oneusefulthing.org/p/gpt-5-it-just-does-stuff

      Thank you to Simon; your notes are exactly what I was hoping for.

    • dang 37 minutes ago ago

      Related ongoing thread:

      GPT-5: Key characteristics, pricing and model card - https://news.ycombinator.com/item?id=44827794

  • fidotron an hour ago ago

    Going by the system card at: https://openai.com/index/gpt-5-system-card/

    > GPT‑5 is a unified system . . .

    OK

    > . . . with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt).

    So that's not really a unified system then, it's just supposed to appear as if it is.

    This looks like they're not training the single big model but instead have gone off to develop special sub models and attempt to gloss over them with yet another model. That's what you resort to only when doing the end-to-end training has become too expensive for you.

    • Therenas an hour ago ago

      Too expensive maybe, or just not effective anymore as they used up any available training data. New data is generated slowly, and is massively poisoned with AI generated data, so it might be useless.

      • fidotron an hour ago ago

        I think that possibility is worse, because it implies a fundamental limit as opposed to a self imposed restriction, and I choose to remain optimistic.

        If OpenAI really are hitting the wall on being able to scale up overall then the AI bubble will burst sooner than many are expecting.

    • lacoolj an hour ago ago

      Many tiny, specialized models is the way to go, and if that's what they're doing then it's a good thing.

      • fidotron an hour ago ago

        Not at all, you will simply rediscover the bitter lesson [1] from your new composition of models.

        [1] https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...

        • bigmadshoe an hour ago ago

          The bitter lesson doesn't say that you can't split your solution into multiple models. It says that learning from more data via scaled compute will outperform humans injecting their own assumptions about the task into models.

          A broad generalization like "there are two systems of thinking: fast, and slow" doesn't necessarily fall into this category. The transformer itself (plus the choice of positional encoding etc.) contains inductive biases about modeling sequences. The router is presumably still learned with a fairly generic architecture.

          • fidotron an hour ago ago

            > It says that learning from more data via scaled compute will outperform humans injecting their own assumptions about the task into models.

            You are making assumptions about how to break the tasks into sub models.

      • gekoxyz an hour ago ago

        We already did this for Object/Face recognition, it works but it's not the way to go. It's the way to go only if you don't have enough compute power (and data, I suspect) for a E2E network

        • sixo an hour ago ago

          No, it's what you do if your model architecture is capped out on its ability to profit from further training. Hand-wrapping a bunch of sub-models stands in for models that can learn that kind of substructure directly.

    • andai 39 minutes ago ago

      > While GPT‑5 in ChatGPT is a system of reasoning, non-reasoning, and router models, GPT‑5 in the API platform is the reasoning model that powers maximum performance in ChatGPT. Notably, GPT‑5 with minimal reasoning is a different model than the non-reasoning model in ChatGPT, and is better tuned for developers. The non-reasoning model used in ChatGPT is available as gpt-5-chat-latest.

      https://openai.com/index/introducing-gpt-5-for-developers/

    • TheOtherHobbes an hour ago ago

      It's a concept of a unified system.

    • dang an hour ago ago

      Related ongoing thread:

      GPT-5 System Card [pdf] - https://news.ycombinator.com/item?id=44827046

    • FeepingCreature 41 minutes ago ago

      If(f) it's trained end to end, it's a unified system.

  • henriquegodoy 28 minutes ago ago

    That SWE-bench chart with the mismatched bars (52.8% somehow appearing larger than 69.1%) was emblematic of the entire presentation - rushed and underwhelming. It's the kind of error that would get flagged in any internal review, yet here it is in a billion-dollar product launch. Combined with the Bernoulli effect demo confidently explaining how airplane wings work incorrectly (the equal transit time fallacy that NASA explicitly debunks), it doesn't inspire confidence in either the model's capabilities or OpenAI's quality control.

    The actual benchmark improvements are marginal at best - we're talking single-digit percentage gains over o3 on most metrics, which hardly justifies a major version bump. What we're seeing looks more like the plateau of an S-curve than a breakthrough. The pricing is competitive ($1.25/1M input tokens vs Claude's $15), but that's about optimization and economics, not the fundamental leap forward that "GPT-5" implies. Even their "unified system" turns out to be multiple models with a router, essentially admitting that the end-to-end training approach has hit diminishing returns.

    The irony is that while OpenAI maintains their secretive culture (remember when they claimed o1 used tree search instead of RL?), their competitors are catching up or surpassing them. Claude has been consistently better for coding tasks, Gemini 2.5 Pro has more recent training data, and everyone seems to be converging on similar performance levels. This launch feels less like a victory lap and more like OpenAI trying to maintain relevance while the rest of the field has caught up. Looking forward to seeing what Gemini 3.0 brings to the table.

    • rrrrrrrrrrrryan 5 minutes ago ago

      I suspect the vast majority of OpenAI's users are only using ChatGPT, and the vast majority of those ChatGPT users are only using the free tier.

      For all of them, getting access to full-blown GPT-5 will probably be mind-blowing, even if it's severely rate-limited. OpenAI's previous/current generation of models haven't really been ergonomic enough (with the clunky model pickers) to be fully appreciated by less tech-savvy users, and its full capabilities have been behind a paywall.

      I think that's why they're making this launch is a big deal. It's just an incremental upgrade for the power users and the people that are paying money, but it'll be a step-change in capability to everyone else.

  • surround an hour ago ago

    GPT-5 knowledge cutoff: Sep 30, 2024 (10 months before release).

    Compare that to

    Gemini 2.5 Pro knowledge cutoff: Jan 2025 (3 months before release)

    Claude Opus 4.1: knowledge cutoff: Mar 2025 (4 months before release)

    https://platform.openai.com/docs/models/compare

    https://deepmind.google/models/gemini/pro/

    https://docs.anthropic.com/en/docs/about-claude/models/overv...

    • levocardia an hour ago ago

      with web search, is knowledge cutoff really relevant anymore? Or is this more of a comment on how long it took them to do post-training?

      • MisterSandman 9 minutes ago ago

        It still is, not all queries trigger web search, and it takes more tokens and time to do research. ChatGPT will confidently give me outdated information, and unless I know it’s wrong and ask it to research, it wouldn’t know it is wrong. Having a more recent knowledge base can be very useful (for example, knowing who the president is without looking it up, making references to newer node versions instead of old ones)

      • mastercheif 42 minutes ago ago

        In my experience, web search often tanks the quality of the output.

        I don't know if it's because of context clogging or that the model can't tell what's a high quality source from garbage.

        I've defaulted to web search off and turn it on via the tools menu as needed.

        • bangaladore 35 minutes ago ago

          I feel the same. LLMs using web search ironically seem to have less thoughtful output. Part of the reason for using LLMs is to explore somewhat novel ideas. I think with web search it aligns too strongly to the results rather than the overall request making it a slow search-engine.

      • joshuacc 38 minutes ago ago

        Still relevant, as it means that a coding agent is more likely to get things right without searching. That saves time, money, and improves accuracy of results.

      • diegocg 41 minutes ago ago

        I wonder if it would even be helpful because they avoid the increasing AI content

    • archon810 24 minutes ago ago

      And GPT-5 nano and mini cutoff is even earlier - May 30 2024.

    • LeoPanthera 43 minutes ago ago

      Gemini does cursory web searches for almost every query, presumably to fill in the gap between the knowledge cutoff and now.

    • breadwinner an hour ago ago

      That could means OpenAI does not take any shortcuts when it comes to safety.

    • lurking_swe 42 minutes ago ago

      the model can do web search so this is mostly irrelevant i think.

  • minimaxir 2 hours ago ago

    The marketing copy and the current livestream appear tautological: "it's better because it's better."

    Not much explanation yet why GPT-5 warrants a major version bump. As usual, the model (and potentially OpenAI as a whole) will depend on output vibe checks.

    • pram 2 hours ago ago

      We’re at the audiophile stage of LLMs where people are talking about the improved soundstage, tonality, reduced sibilance etc

      • jaredcwhite 2 hours ago ago

        Note GPT-5's subtle mouthfeel reminiscent of cranberries with a touch of bourbon.

        • __loam an hour ago ago

          Every bourbon tastes the same unless it's Weller, King's County Peated, or Pappy (or Jim Beam for the wrong reasons lol)

          • alephnerd an hour ago ago

            Tbh, a mid-shelf Four Roses gets you 90% of the way to a upper shelf Weller.

            • __loam an hour ago ago

              I'm being hyperbolic but yeah four roses is probably the best deal next to Buffalo trace. All their stuff is fairly priced. If you want something like Weller though, you should get another wheated bourbon like Maker's Mark French oaked.

              • alephnerd an hour ago ago

                Buffalo trace is ridiculously overpriced nowadays. Good bourbon, but def not worth $35-40 for 750ml.

                > you should get another wheated bourbon like Maker's Mark French oaked

                I agree. I've found Maker Mark products to be a great bang for your buck quality wise and flavor wise as well.

                • __loam an hour ago ago

                  If you can find Buffalo Trace for msrp which is $20-30, it's a good deal. I think the bourbon "market" kind of popped recently so finding things has been getting a little easier.

                  • alephnerd an hour ago ago

                    Yep! I agree! At MSRP BT is a great buy.

                    > I think the bourbon "market" kind of popped recently

                    It def did. The overproduction that was invested in during the peak of the COVID collector boom is coming into markets now. I think we'll see some well priced age stated products in the next 3-4 years based on by acquaintances in the space.

                    Ofc, the elephant in the room is consolidation - everyone wants to copy the LVMH model (and they say Europeans are ethical elves who never use underhanded mopolistic and market making behavior to corner markets /s).

        • alephnerd an hour ago ago

          Explains why I find AGI fundamentalists similar to tater heads. /s

          (Not to undermine progress in the foundational model space, but there is a lack of appreciation for the democratization of domain specific models amongst HNers).

      • javchz 2 hours ago ago

        I can already see LLMs Sommeliers: Yes, the mouthfeel and punch of GPT-5 it's comparable to the one of Grok 4, but it's tenderness lacks the crunch from Gemini 2.5 Pro.

        • 0x7cfe an hour ago ago

          Isn't it exactly what the typical LLM discourse is about? People are just throwing anecdotes and stay with their opinion. A is better than B because C, and that's basically it. And whoever tries to actually bench them gets called out because all benches are gamed. Go figure.

      • tuesdaynight 39 minutes ago ago

        You need to burn-in your LLM by using for 100 hours before you see the true performance of it.

      • satyrun an hour ago ago

        Come on, we aren't even close to the level of audiophile nonsense like worrying about what cable sounds better.

        • leptons 43 minutes ago ago

          We're still at the stage of which LLM lies the least (but they all do). So yeah, no different than audiophiles really.

      • catigula an hour ago ago

        Informed audiophiles rely on Klippel output now

        • bobson381 an hour ago ago

          The empirical ones do! There's still a healthy sports car element to the scene though, at least in my experience.

          • catigula an hour ago ago

            You're right, it's hard to admit you can buy a $50 speaker and sub and EQ it to 95% maximum performance.

            • riknos314 an hour ago ago

              This is and isn't true.

              The room is the limiting factor in most speaker setups. The worse the room, the sooner you hit diminishing returns for upgrading any other part of the system.

              In a fantastic room a $50 speaker will be nowhere near 95% of the performance of a mastering monitor, no matter how much EQ you put on it. In the average living room with less than ideal speaker and listening position placement there will still be a difference, but it will be much less apparent due to the limitations of the listening environment.

              • jpc0 an hour ago ago

                Absolutely not true.

                You might lose headroom or have to live with higher latency but if your complaint is about actual empirical data like frequency response or phase, that can be corrected digitally.

                • babypuncher 14 minutes ago ago

                  You can only EQ speakers and headphones as far as the transducer can still respond accurately to the signal you're sending it. No amount of EQ will give the Sennheiser HD-600's good sub-bass performance because the driver begins to distort the signal long before you've amplified it enough to match the Harman target at a normal listening level.

                  DSP is a very powerful tool that can make terrible speakers and headphones sound great, but it's not magic.

              • catigula an hour ago ago

                Ah, the aforementioned snake oil.

      • virgil_disgr4ce an hour ago ago

        Well, reduced sibilance is an ordinary and desirable thing. A better "audiophile absurdity" example would be $77,000 cables, freezing CDs to improve sound quality, using hospital-grade outlets, cryogenically frozen outlets (lol), the list goes on and on

      • Q6T46nT668w6i3m an hour ago ago

        It’s always been this way with LLMs.

    • krat0sprakhar 2 hours ago ago

      > Not much explanation yet why GPT-5 warrants a major version bump

      Exactly. Too many videos - too little real data / benchmarks on the page. Will wait for vibe check from simonw and others

      • collinmanderson an hour ago ago

        > Will wait for vibe check from simonw

        https://openai.com/gpt-5/?video=1108156668

        2:40 "I do like how the pelican's feet are on the pedals." "That's a rare detail that most of the other models I've tried this on have missed."

        4:12 "The bicycle was flawless."

        5:30 Re generating documentation: "It nailed it. It gave me the exact information I needed. It gave me full architectural overview. It was clearly very good at consuming a quarter million tokens of rust." "My trust issues are beginning to fall away"

        Edit: ohh he has blog post now: https://news.ycombinator.com/item?id=44828264

        • dimitri-vs 40 minutes ago ago

          This effectively kills this benchmark.

          • tuesdaynight 28 minutes ago ago

            Honestly, I have mixed feelings about him appearing there. His blog posts are a nice way to be updated about what's going on, and he deserves the recognition, but he's now part of their marketing content. I hope that doesn't make him afraid of speaking his mind when talking about OpenAI's models. I still trust his opinions, though.

    • WD-42 2 hours ago ago

      It has the last ~6 months worth of flavor of the month Javascript libraries in it's training set now, so it's "better at coding".

      How is this sustainable.

      • sethops1 an hour ago ago

        Who said anything about sustainable? The only goal here is to hobble to the next VC round. And then the next, and the next, ...

      • jcgrillo an hour ago ago

        Vast quantities of extremely dumb money

    • some-guy an hour ago ago

      As someone who tries to push the limits of hard coding tasks (mainly refactoring old codebases) to LLMs with not much improvement since the last round of models, I'm finding that we are hitting the reduction of rate of improvement on the S-curve of quality. Obviously getting the same quality cheaper would be huge, but the quality of the output day to day isn't noticeable to me.

      • camdenreslink 34 minutes ago ago

        I find it struggles to even refactor codebases that aren't that large. If you have a somewhat complicated change that spans the full stack, and has some sort of wrinkle that makes it slightly more complicated than adding a data field, then even the most modern LLMs seem to trip on themselves. Even when I tell it to create a plan for implementation and write it to a markdown file and then step through those steps in a separate prompt.

        Not that it makes it useless, just that we seem to not "be there" yet for the standard tasks software engineers do every day.

    • scosman 2 hours ago ago

      There's a bunch of benchmarks on the intro page including AIME 2025 without tools, SWE-bench Verified, Aider Polyglot, MMMU, and HealthBench Hard (not familiar with this one): https://openai.com/index/introducing-gpt-5/

      Pretty par for course evals at launch setup.

    • nicetryguy an hour ago ago

      Yeah. We're entered the Smartphone stage: "You want the new one because it's the new one."

    • anthonypasq 2 hours ago ago

      its >o3 performance at gpt4 price. seems pretty obvious

      • thegeomaster an hour ago ago

        o3 pricing: $8/Mtok out

        GPT-5 pricing: $10/Mtok out

        What am I missing?

        • throwaway0123_5 24 minutes ago ago

          It seems like you might need less output tokens for the same quality of response though. One of their plots shows o3 needing ~14k tokens to get 69% on SWE-bench Verified, but GPT-5 needing only ~4k.

        • anthonypasq an hour ago ago

          pretty sure reduced cache input pricing is a pretty big deal for reasoning models, but im not positive

  • kybernetikos 2 hours ago ago

    ChatGPT5 in this demo:

    > For an airplane wing (airfoil), the top surface is curved and the bottom is flatter. When the wing moves forward:

    > * Air over the top has to travel farther in the same amount of time -> it moves faster -> pressure on the top decreases.

    > * Air underneath moves slower -> pressure underneath is higher

    > * The presure difference creates an upward force - lift

    Isn't that explanation of why wings work completely wrong? There's nothing that forces the air to cover the top distance in the same time that it covers the bottom distance, and in fact it doesn't. https://www.cam.ac.uk/research/news/how-wings-really-work

    Very strange to use a mistake as your first demo, especially while talking about how it's phd level.

    • peterdsharpe an hour ago ago

      Yes, it is completely wrong. If this were a valid explanation, flat-plate airfoils could not generate lift. (They can.)

      Source: PhD on aircraft design

      • timr an hour ago ago

        Except it isn't "completely wrong". The article the OP links to says it explicitly:

        > “What actually causes lift is introducing a shape into the airflow, which curves the streamlines and introduces pressure changes – lower pressure on the upper surface and higher pressure on the lower surface,” clarified Babinsky, from the Department of Engineering. “This is why a flat surface like a sail is able to cause lift – here the distance on each side is the same but it is slightly curved when it is rigged and so it acts as an aerofoil. In other words, it’s the curvature that creates lift, not the distance.”

        The meta-point that "it's the curvature that creates the lift, not the distance" is incredibly subtle for a lay audience. So it may be completely wrong for you, but not for 99.9% of the population. The pressure differential is important, and the curvature does create lift, although not via speed differential.

        I am far from an AI hypebeast, but this subthread feels like people reaching for a criticism.

        • ttoinou an hour ago ago

          I would say a wing with two sides of different length is more difficult to understand than one shape with two sides of opposites curvatures but same length

        • avs733 35 minutes ago ago

          the wrongness isn't germane to most people but it is a specific typology of how LLMs get technica lthings wrong that is critically important to progressing them. It gets subtle things wrongby being biased towards lay understandings that introduce vagueness because greater precision isn't useful.

          That doesn't matter for lay audieces and doesn't really matter at all until we try and use them for technical things.

          • kybernetikos 21 minutes ago ago

            The wrongness is germane to someone who is doing their physics homework (the example given here). It's actually difficult for me to imagine a situation where someone would ask ChatGPT 5 for information about this and it not be germane if ChatGPT 5 gave an incorrect explanation.

          • timr 26 minutes ago ago

            I grant your broader point, but extrapolating from this marketing copy is not a great example.

            The real question is, if you go back to the bot following this conversation and you challenge it, does it generate the more correct answer?

        • carabiner an hour ago ago

          It's the "same amount of time" part that is blatantly wrong. Yes geometry has an effect but there is zero reason to believe leading edge particles, at the same time point, must rejoin at the trailing edge of a wing. This is a misconception at the level of "heavier objects fall faster." It is non-physical.

          The video in the Cambridge link shows how the upper surface particles greatly overtake the lower surface flow. They do not rejoin, ever.

          • timr 42 minutes ago ago

            Again, you're not wrong, it's just irrelevant for most audiences. The very fact that you have to say this:

            > Yes geometry has an effect but there is zero reason to believe leading edge particles, at the same time point, must rejoin at the trailing edge of a wing.

            ...implicitly concedes that point that this is subtle. If you gave this answer in a PhD qualification exam in Physics, then sure, I think it's fair for someone to say you're wrong. If you gave the answer on a marketing page for a general-purpose chatbot? Meh.

            (As an aside, this conversation is interesting to me primarily because it's a perfect example of how scientists go wrong in presenting their work to the world...meeting up with AI criticism on the other side.)

            • adgjlsfhk1 36 minutes ago ago

              right, the other is that if you remove every incorrect statement from the AI "explanation", the answer it would have given is "airplane wings generate lift because they are shaped to generate lift".

              • timr 31 minutes ago ago

                > right, the other is that if you remove every incorrect statement from the AI "explanation", the answer it would have given is "airplane wings generate lift because they are shaped to generate lift".

                ...only if you omit the parts where it talks about pressure differentials, caused by airspeed differences, create lift?

                Both of these points are true. You have to be motivated to ignore them.

                https://www.youtube.com/watch?v=UqBmdZ-BNig

      • nilsherzig an hour ago ago

        Looks like OpenAI delivered on the PhD response

        • antisthenes an hour ago ago

          GPT-6 will just go on forums and pretend to be a girl that needs help with homework.

          • snerbles an hour ago ago

            Fallback is posting a confidently wrong answer on another forum to bait for angry correct answers.

      • ge96 an hour ago ago

        What is the actual answer? I know the "skipping stone" idea is wrong too, thinking it's just angle of attack

        • bilsbie an hour ago ago

          Angle of attack is a big part but I think the other thing going on is air “sticks” to the surface of the top of the wing and gets directed downward as it comes off the wing. It also creates a gap as the wing curves down leaving behind lower pressure from that.

        • base698 an hour ago ago

          Weight of the air deflecting downward. Plain ole Newtonian equal and opposite reaction.

          • rtkwe 6 minutes ago ago

            It's both lower pressure above the wing (~20% of lift) and the reaction force from pushing air down (give or take the remaining 80% of lift). The main wrong thing is that the air travels faster because it has to travel farther causing the air to accelerate causing the lower pressure that's double plus wrong. It's a weird old misunderstanding that gets repeated over and over because it's a neat connection to attach to the Bernoulli Principal when it's being explained to children.

          • datadrivenangel an hour ago ago

            But also pressure providing force. It's complicated.

        • qq66 an hour ago ago

          Air pushes on the wing. The control surfaces determine in which direction.

        • dist-epoch an hour ago ago
      • WithinReason an hour ago ago

        And flying upside down would be impossible

      • Tadpole9181 an hour ago ago

        Sorry, I know nothing about this topic, but this is how it was explained to me every time it's come up throughout my life. Could you explain a bit more?

        I've always been under the impression that flat-plate airfoils can't generate lift without a positive angle-of-attack - where lift is generated through the separate mechanism of the air pushing against an angled plane? But a modern airfoil can, because of this effect.

        And that if you flip them upside down, a flat plate is more efficient and requires less angle-of-attack than the standard airfoil shape because now the lift advantage is working to generate a downforce.

        I just tried to search Google, but I'm finding all sorts of conflicting answers, with only a vague consensus that the AI-provided answer above is, in fact, correct. The shape of the wing causes pressure differences that generate lift in conjunction with multiple other effects that also generate lift by pushing or redirecting air downward.

        • stonemetal12 41 minutes ago ago

          >Air over the top has to travel farther in the same amount of time

          There is no requirement for air to travel any where. Let alone in any amount of time. So this part of the AI's response is completely wrong. "Same amount of time" as what? Air going underneath the wing? With an angle of attack the air under the wing is being deflected down, not magically meeting up with the air above the wing.

        • andoando an hour ago ago

          Im quite sure the "air on the top has to travel faster to meet the air at the bottom " is false. Why would they have to meet at the same time? What would cause air on the top to accelerate?

          • FeepingCreature 37 minutes ago ago

            (Layman guess) Pressure? The incoming split air has to go somewhere. The volume of air inflowing above and below is roughly the same.

      • zombiwoof an hour ago ago

        But we live in the world of Trump where facts don’t matter. If GPt 5 says this is how it works, that’s how it works and Fox News will back it up

    • ricardobayes 31 minutes ago ago

      To me, it's weird to call it "PhD-level". That, to me, means to be able to take in existing information on a certain very niche area and able to "push the boundary". I might be wrong but to date I've never seen any LLM invent "new science", that makes PhD, really PhD. It also seems very confusing to me that many sources mention "stone age" and "PhD-level" in the same article. Which one is it?

      People seem to overcomplicate what LLM's are capable of, but at their core they are just really good word parsers.

    • tshaddox an hour ago ago

      It's an extremely famous example of a widespread misconception. I don't know anything about aeronautical engineering but I'm quite familiar with the "equal transit time fallacy."

      • xboxnolifes an hour ago ago

        Yeah, it's what I was taught in high school.

    • 7734128 an hour ago ago

      Extremely common misconception. NASA even has a website about how it's incorrect

      https://www.grc.nasa.gov/www/k-12/VirtualAero/BottleRocket/a...

    • adioe3 an hour ago ago

      Nobody explains it as well as Bartosz: https://ciechanow.ski/airfoil/

    • tths an hour ago ago

      Yeah, the explanation is just shallow enough to seem correct and deceive someone who doesn't grasp really well the subject. No clue how they let it pass, that without mentioning the subpar diagram it created, really didn't seem like something miles better than what previous models can do already.

      • Vegenoid an hour ago ago

        > No clue how they let it pass

        It’s very common to see AI evangelists taking its output at face value, particularly when it’s about something that they are not an expert in. I thought we’d start seeing less of this as people get burned by it, but it seems that we’re actually just seeing more of it as LLMs get better at sounding correct. Their ability to sound correct continues to increase faster than their ability to be correct.

        • chasd00 an hour ago ago

          This is just like the early days of Google search results, "It's on the Internet, it must be true".

      • traceroute66 an hour ago ago

        Hilarious how the team spent so much time promising GPT5 had fewer hallucinations and deceptions.

        Meanwhile the demo seems to suggest business as usual for AI hallucinations and deceptions.

      • theappsecguy an hour ago ago

        This is the headline for all LLM output past "hello world"

      • stanmancan an hour ago ago

        > Yeah, the explanation is just shallow enough to seem correct and deceive someone who doesn't grasp really well the subject.

        This is the problem with AI in general.

        When I ask it about things I already understand, it’s clearly wrong quite often.

        When I ask it about something I don’t understand, I have no way to know if its response is right or wrong.

    • 2StepsOutOfLine an hour ago ago

      During the demo they quickly shuffled off of, the air flow lines completely broke. It was just a few dots moving left to right, changing the angle of the surface showed no visual difference in airflow.

    • ethan_smith 10 minutes ago ago

      You're right - this is the "equal transit time" fallacy; lift is primarily generated by the wing deflecting air downward (Newton's Third Law) and the pressure distribution resulting from airflow curvature around the wing.

    • arcumaereum 2 hours ago ago

      Yeah I'm surprised they used that example. The correct (and PhD-level) response would have been to refuse or redirect to a better explanation

    • low_tech_punk an hour ago ago

      That was debunked by Veritasium 13 years ago: https://www.youtube.com/watch?v=aFO4PBolwFg

    • addaon an hour ago ago

      > Isn't that explanation of why wings work completely wrong?

      This is an LLM. "Wrong" is not a concept that applies, as it requires understanding. The explanation is quite /probable/, as evidenced by the fact that they thought to use it as an example…

    • tim333 an hour ago ago

      From Wikipedia

      >In fact, theory predicts – and experiments confirm – that the air traverses the top surface of a body experiencing lift in a shorter time than it traverses the bottom surface; the explanation based on equal transit time is false.

      So the effect is greater than equal time transit.

      I've seen the GPT5 explanation in GCSE level textbooks but I thought it was supposed to be PhD level;)

    • mcs5280 an hour ago ago

      Sam will fix this in the next release he just needs you to give him more money

      • rtkwe 3 minutes ago ago

        It's going to be really hard to root out it's all over the place because it's so commonly mentioned when teaching the Bernoulli Principal to kids.

    • samfriedman an hour ago ago

      The "demo" it made was pretty horrible too. I would have been impressed if it had simulated a NACA 4412 or something.

    • timr an hour ago ago

      Your link literally says pressure differential is the reason, and that curvature matters:

      > “What actually causes lift is introducing a shape into the airflow, which curves the streamlines and introduces pressure changes – lower pressure on the upper surface and higher pressure on the lower surface,” clarified Babinsky, from the Department of Engineering. “This is why a flat surface like a sail is able to cause lift – here the distance on each side is the same but it is slightly curved when it is rigged and so it acts as an aerofoil. In other words, it’s the curvature that creates lift, not the distance.”

      So I'd characterize this answer as "correct, but incomplete" or "correct, but simplified". It's a case where a PhD in fluid dynamics might state the explanation one way to an expert audience, but another way to a room full of children.

      • kybernetikos an hour ago ago

        Pressure differential is absolutely one of the main components of lift (although I believe conservation of momentum is another - the coanda effect changes the direction of the airflows and there's 2nd law stuff happening on the bottom edge too), but the idea that the pressure differential is caused by the fact that "air over the top has to travel farther in the same amount of time" because the airfoil is curved is completely incorrect, as the video in my link shows.

        • timr an hour ago ago

          It's "completely incorrect" only if you're being pedantic. It's "partially correct" if you're talking casually to a group of regular people. It's "good enough" if you're talking to a classroom of children. Audience matters.

          The hilarious thing about this subthread is that it's already getting filled with hyper-technical but wrong alternative explanations by people eager to show that they know more than the robot.

          • kybernetikos an hour ago ago

            "air over the top has to travel farther in the same amount of time" is just wrong, it doesn't have to, and in fact it doesn't.

            It's called the "equal transit-time fallacy" if you want to look it up, or follow the link I provided in my comment, or perhaps the NASA link someone else offered.

            • timr an hour ago ago

              I'm not saying that particular point is wrong. I'm saying that for most people, it doesn't matter, and the reason the "fallacy" persists is because it's a good enough explanation for the layman that is easy to conceptualize.

              Pretty much any scientific question is fractal like this: there's a superficial explanation, then one below that, and so on. None are "completely incorrect", but the more detailed ones are better.

              The real question is: if you prompt the bot for the better, deeper explanation, what does it do?

              • kybernetikos 32 minutes ago ago

                So I worry that you think that the equal transit time thing is true, but is just one effect among others. This is not the case. There are a number of different effects, including bernoulli and coanda and newtons third law that all contribute to lift, but none of the things that actually happen have anything to do with equal transit time.

                The equal transit time is not a partially correct explanation, it's something that doesn't happen. It's not a superficial explanation, it's a wrong explanation. It's not even a good lie-to-children, as it doesn't help predict or understand any part of the system at any level. It instead teaches magical thinking.

                As to whether it matters? If I am told that I can ask my question to a system and it will respond like a team of PhDs, that it is useful to help someone with their homework and physical understanding, but it gives me instead information that is incorrect and misleading, I would say the system is not working as it is intended to.

                Even if I accept that "audience matters" as you say, the suggested audience is helping someone with their physics homework. This would not be a suitable explanation for someone doing physics homework.

                • timr 18 minutes ago ago

                  > So I worry that you think that the equal transit time thing is true,

                  Wow. Thanks for your worry, but it's not a problem. I do understand the difference, and yet it doesn't have anything to do with the argument I'm making, which is about presentation.

                  > It's not even a good lie-to-children, as it doesn't help predict or understand any part of the system at any level.

                  ...which is irrelevant in the context. I get the meta-point that you're (sort of) making that you can't shut your brain off and just hope the bot spits out 100% pedantic explanations of scientific phenomenon. That's true, but also...fine?

                  These things are spitting out probable text. If (as many have observed) this is a common enough explanation to be in textbooks, then I'm not particularly surprised if an LLM emits it as well. The real question is: what happens when you prompt it to go deeper?

          • bccdee an hour ago ago

            No, it's never good enough, because it's flat-out wrong. This statement:

            > Air over the top has to travel farther in the same amount of time

            is not true. The air on top does not travel farther in the same amount of time. The air slows down and travels a shorter distance in the same amount of time.

            It's only "good enough for a classroom of children" in the same way that storks delivering babies is—i.e., if you're content to simply lie rather than bothering to tell the truth.

    • ttoinou an hour ago ago

      Its not fully wrong but its a typical example of how simplified scientific explanations have spread everywhere without personal verification of each person involved in the chinese whisper

    • IanCal an hour ago ago

      As a complete aside I’ve always hated that explanation where air moves up and over a bump, the lines get closer together and then the explanation is the pressure lowers at that point. Also the idea that the lines of air look the same before and after and yet somehow the wing should have moved up.

    • karel-3d an hour ago ago

      yeah that's a great thing to use as LLM demo because it sounds plausible yet it's completely misleading and wrong.

    • croes an hour ago ago

      It’s a common misconception, I doubt they know themselves and GPT 5 doesn’t tell them otherwise because it’s the mist common in explanation in the training data.

      A quite good example of AI limits

    • AnimalMuppet an hour ago ago

      Yes. But I strongly suspect that it's the most frequent answer in the training data...

      • bambax an hour ago ago

        They couldn't find a more apt demnonstration of what an LLM is and does if they tried.

        An LLM doesn't know more than what's in the training data.

        In Michael Crichton's The Great Train Robbery (published in 1975, about events that happened in 1855) the perpetrator, having been caught, explains to a baffled court that he was able to walk on top of a running train "because of the Bernoulli effect", that he misspells and completely misunderstands. I don't remember if this argument helps him get away with the crime? Maybe it does, I'm not sure.

        This is another attempt at a Great Robbery.

      • NelsonMinar an hour ago ago

        IIRC I was required to regurgitate this wrong answer to pass my FAA pilot exam.

        • carabiner 29 minutes ago ago

          Yeah, it's like asking a car driver (even a professional driver) to explain the Otto cycle. Enduser vs. engineer.

        • CPLX an hour ago ago

          Yeah me too, so it's found in many authoritative places.

          And I might be wrong but my understanding is that it's not wrong per-se, it's just wildly incomplete. Which, is kind of like the same as wrong. But I believe the airfoil design does indeed have the effect described which does contribute to lift somewhat right? Or am I just a victim of the misconception.

      • gekoxyz an hour ago ago

        And your suspicion is right. The sad reality is that it's just a stochastic parrot, that can produce really good answers in certain occasions.

    • dbagr an hour ago ago
    • 18172828286177 an hour ago ago

      The hallmark of an LLM response: plausible sounding, but if you dig deeper, incorrect

    • avs733 37 minutes ago ago

      Its a particular type of mistake that is really interesting and telling. It is a misconception - and a common socially disseminated simplifcation. In students, these don't come from a lack of knowledge but rather from places where knowledge is structured incorrectly. Often because the phenomenon are difficult to observe or mislead when observed. Another example is heat and temperature. Heat is not temperature, but it is easy to observe them always being the same in your day to day life and so you bring that belief into a college thermodynamics course where you are learning that heat and temperature are different for the first time. It is a commonsense observation of the world that is only incorrect in technical circles

      These are places where common lay discussions use language in ways that is wrong, or makes simplifcations that are reasonable but technically incorrect. They are especially common when something is so 'obvious' that experts don't explain it, the most frequent version of the concepts being explained

      These, in my testing, show up a lot in LLMs - technical things are wrong when the most language of the most common explanations simplifies or obfuscates the precise truth. Often, it pretty much matches the level of knowledge of a college freshman/sophmore or slightly below, which is sort of the level of discussion of more technical topics on the internet.

    • on_the_train an hour ago ago

      It's a misconception that almost everyone does though. I recently even saw it being being taught in a zeppelin museum!

      • sejje 10 minutes ago ago

        LLMs are "ask the audience"

        Common misconceptions should be expected when you train a model to act like the average of all humans.

      • xeromal an hour ago ago

        Why replace humans if make human mistakes

        • metalliqaz an hour ago ago

          less overhead on benefits and pay raises

    • carabiner an hour ago ago

      Holy shit that is wrong. That's what happens when you get software, ML engineers who think they know everything.

    • Q6T46nT668w6i3m an hour ago ago

      Yeah, that’s slop.

    • antoni4040 an hour ago ago

      Oh my God, they were right, ChatGPT5 really is like talking to a bunch of PhD. You let it write an answer and THEN check the comments on Hacker News. Truly innovative.

  • doctoboggan 2 hours ago ago

    Watching the livestream now, the improvement over their current models on the benchmarks is very small. I know they seemed to be trying to temper our expectations leading up to this, but this is much less improvement than I was expecting

    • 827a an hour ago ago

      I have a suspicion that while the major AI companies have been pretty samey and competing in the same space for a while now, the market is going to force them to differentiate a bit, and we're going to see OpenAI begin to lose the race toward extremely high levels of intelligence instead choosing to focus on justifying their valuations by optimizing cost and for conversational/normal intelligence/personal assistant use-cases. After all, most of their users just want to use it to cheat at school, get relationship advice, and write business emails. They also have Ive's company to continue investing in.

      Meanwhile, Anthropic & Google have more room in their P/S ratios to continue to spend effort on logarithmic intelligence gains.

      Doesn't mean we won't see more and more intelligent models out of OpenAI, especially in the o-series, but at some point you have to make payroll and reality hits.

      • juped an hour ago ago

        I think this is pretty much what we've already seen happening, in fact.

    • hodgehog11 16 minutes ago ago

      The hallucination benchmarks did show major improvement. We know existing benchmarks are nearly useless at this point. It's reliability that matters more.

    • z7 2 hours ago ago

      GPT-5 is #1 on WebDev Arena with +75 pts over Gemini 2.5 Pro and +100 pts over Claude Opus 4:

      https://lmarena.ai/leaderboard

      • virgildotcodes 2 hours ago ago

        This same leaderboard lists a bunch of models, including 4o, beating out Opus 4, which seems off.

    • anyg an hour ago ago

      Also, the code demos are all using GPT-5 MAX on Cursor. Most of us will not be able to use it like that all the time. They should have showed it without MAX mode as well

    • Workaccount2 an hour ago ago

      Sam said maybe two years ago that they want to avoid "mic drop" releases, and instead want to stick to incremental steps.

      This is day one, so there is probably another 10-20% in optimizations that can be squeezed out of it in the coming months.

      • bigmadshoe an hour ago ago

        Then why increment the version number here? This is clearly styled like a "mic drop" release but without the numbers to back it up. It's a really bad look when comparing the crazy jump from GPT3 to GPT4 to this slight improvement with GPT5.

        • brokencode 9 minutes ago ago

          The fact that it unifies the regular model and the reasoning model is a big change. I’m sure internally it’s a big change, but also in terms of user experience.

          I feel it’s worthy of a major increment, even if benchmarks aren’t significantly improved.

        • camdenreslink 31 minutes ago ago

          GPT-5 was highly anticipated and people have thought it would be a step change in performance for a while. I think at some point they had to just do it and rip the bandaid off, so they could move past 5.

        • dpoloncsak an hour ago ago

          Honestly, I think the big thing is the sycophancy. It's starting to reach the mainstream that ChatGPT can cause people to 'go crazy'.

          This gives them an out. "That was the old model, look how much better this one tests on our sycophancy test we just made up!!"

        • Workaccount2 an hour ago ago

          Because it is a 100x training compute model over 4.

          GPT5.5 will be a 10X compute jump.

          4.5 was 10x over 4.

          • bigmadshoe an hour ago ago

            Even worse optics. They scaled the training compute by 100x and got <1% improvement on several benchmarks.

            • reasonableklout 30 minutes ago ago

              Is 1% relative to more recent models like o3, or the (old and obsolete at this point) GPT-4?

      • yahoozoo 32 minutes ago ago

        He said that because even then he saw the writing on the wall that LLMs will plateau.

    • og_kalu 2 hours ago ago

      I mean that's just the consequence of releasing a new model every couple months. If Open AI stayed mostly silent since the GPT-4 release (like they did for most iterations) and only now released 5 then nobody would be complaining about weak gains in benchmarks.

      • moduspol 2 hours ago ago

        Well it was their choice to call it GPT 5 and not GPT 4.2.

        • og_kalu an hour ago ago

          It is significantly better than 4, so calling it 4.2 would be rather silly.

          • amilios an hour ago ago

            Is it? That's not super obvious from the results they're showing.

            • og_kalu an hour ago ago

              Yes it is, if we're talking about the original GPT-4 release or even GPT-4o. What about the results they've shown is not obvious?

      • jononor an hour ago ago

        If everyone else had stayed silent as well, then I would agree. But as it is right now they are juuust about managing to match the current pace of the other contenders. Which actually is fine, but they have previously set quite high expectations. So some will probably be disappointed at this.

    • wahnfrieden 2 hours ago ago

      It is at least much cheaper and seems faster.

      They also announced gpt-5-pro but I haven't seen benchmarks on that yet.

      • doctoboggan 2 hours ago ago

        I am hoping there is a "One more thing" that shows the pro version with great benchmark scores

    • lawlessone 2 hours ago ago

      im sure i am repeating someone else but sounds like we're coming over the s-curve

      • Bluestein 2 hours ago ago

        My thought exactly.-

        Diminished returns.-

        ... here's hoping it leads to progress.-

  • Topfi an hour ago ago

    > 400,000 context window

    > 128,000 max output tokens

    > Input $1.25

    > Output $10.00

    Source: https://platform.openai.com/docs/models/gpt-5

    If this performs well in independent needle-in-haystack and adherence evaluations, this pricing with this context window alone would make GPT-5 extremely competitive with Gemini 2.5 Pro and Claude Opus 4.1, even if the output isn't a significant improvement over o3. If the output quality ends up on-par or better than the two major competitors, that'd be truly a massive leap forward for OpenAI, mini and nano maybe even more so.

    • hrpnk an hour ago ago

      Interesting that gpt-5 has Oct 01, 2024 as knowledge cut-off while gpt-5-mini/nano it's May 31, 2024.

      gpt-4.1 family had 1M/32k input/output tokens. Pricing-wise, it's 37% cheaper input tokens, but 25% more expensive on output tokens. Only nano is 50% cheaper on input and unchanged on output.

    • iammrpayments 27 minutes ago ago

      You also have to count the cost of having to verify your identity to use the API

      • jjani 24 minutes ago ago

        It's only a video face scan and your legal ID to SamA, what could possibly go wrong

  • jumploops 2 hours ago ago

    Pricing seems good, but the open question is still on tool calling reliability.

    Input: $1.25 / 1M tokens Cached: $0.125 / 1M tokens Output: $10 / 1M tokens

    With 74.9% on SWE-bench, this inches out Claude Opus 4.1 at 74.5%, but at a much cheaper cost.

    For context, Claude Opus 4.1 is $15 / 1M input tokens and $75 / 1M output tokens.

    > "GPT-5 will scaffold the app, write files, install dependencies as needed, and show a live preview. This is the go-to solution for developers who want to bootstrap apps or add features quickly." [0]

    Since Claude Code launched, OpenAI has been behind. Maybe the RL on tool calling is good enough to be competitive now?

    [0]https://github.com/openai/gpt-5-coding-examples

    • bayesianbot 2 hours ago ago

      And they included Flex pricing, which is 50% cheaper if you're willing to wait for the reply during periods of high load. But great pricing for agentic use with that cached token pricing, Flex or not.

    • AtNightWeCode an hour ago ago

      I switched immediately because of pricing, input token heavy load, but it doesn't even work. For some reason they completely broke the already amateurish API.

  • mehulashah 2 hours ago ago

    ‘Twas the night before GPT-5, when all through the social-media-sphere, Not a creature was posting, not even @paulg nor @eshear

    Next morning’s posts were prepped and scheduled with care, In hopes that AGI soon would appear …

    • user3939382 an hour ago ago

      Unless someone figures how to make these models a million(?) times more efficient or feed them a million times more energy I don’t see how AGI would even be a twinkle in the eye of the LLM strategies we have now.

      • Henchman21 an hour ago ago

        Hey man don’t bring that negativity around here. You’re killing the vibe. Remember we’re now in a post-facts timeline!

        • tmountain 2 minutes ago ago

          To kill the vibe further, AGI might kill is all, so I hope it never arrives.

  • joewhale 2 hours ago ago

    Short anything that’s riding on AGI coming soon. This presentation has gotten rid of all my fears of my children growing up in a crazy winner take all AGI world.

    • croes an hour ago ago

      Don’t fear AGI, fear those who sell something as AGI and those who fall for it

    • AS04 an hour ago ago

      Don't count your chickens before they hatch. I believe that the odds of an architecture substantially better than autoregressive causal GPTs coming out of the woodwork within the next year is quite high.

      • 9rx an hour ago ago

        How does that equate to "winner take all", though? It is quite apparent that as soon as one place figures out some kind of advantage, everyone else follows suit almost immediately.

        It's not the 1800s anymore. You can hide behind poor communication.

      • SalmoShalazar 12 minutes ago ago

        Why do you believe this? Do you know researchers actively on the cusp or are you just going off vibes?

      • suddenlybananas an hour ago ago

        Why do you think that?

  • m4nu3l 6 minutes ago ago

    Very funny. The very first answer it gave to illustrate its "Expert knowledge" is quite common, and it's wrong. What's even funnier is that you can find why on Wikipedia: https://en.wikipedia.org/wiki/Lift_(force)#False_explanation... What's terminally funny is that in the visualisation app, it used a symmetric wing, which of course wouldn't generate lift according to its own explanation (as the travelled distance and hence air flow speed would be the same). I work as a game physics programmer, so I noticed that immediately and almost laughed. I watched only that part so far while I was still at the office, though.

  • ipozgaj 3 minutes ago ago

    Tech aside (covered well by other commenters), the presentation itself was incredibly dry. Such a stark difference in presenting style here compared to, for example, Apple's or Google's keynotes. They should really put more effort into it.

  • hrpnk an hour ago ago

    They will retire lots of models: GPT-4o, GPT-4.1, GPT-4.5, GPT-4.1-mini, o4-mini, o4-mini-high, o3, o3-pro.

    https://help.openai.com/en/articles/6825453-chatgpt-release-...

    "If you open a conversation that used one of these models, ChatGPT will automatically switch it to the closest GPT-5 equivalent."

    - 4o, 4.1, 4.5, 4.1-mini, o4-mini, or o4-mini-high => GPT-5 - o3 => GPT-5-Thinking - o3-Pro => GPT-5-Pro

    • atonse 17 minutes ago ago

      Smart way to probably also free up resources that are currently fragmented running those older models. They could all run the latest model and have more capacity.

    • pradn an hour ago ago

      Finally, someone from the product side got a word in. Keep it simple!

  • sundarurfriend an hour ago ago

    Some people have hypothesized that GPT-5 is actually about cost reduction and internal optimization for OpenAI, since there doesn't seem to be much of a leap forward, but another element that they seem to have focused on that'll probably make a huge difference to "normal" (non-tech) users is making precise and specifically worded prompts less necessary.

    They've mentioned improvements in that aspects a few times now, and if it actually materializes, that would be a big leap forward for most users even if underneath GPT-4 was also technically able to do the same things if prompted just the right way.

  • oof-baroomf 2 hours ago ago

    74.9 SWEBench. This increases the SOTA by a whole .4%. Although the pricing is great, it doesn't seem like OpenAI found a giant breakthrough yet like o1 or Claude 3.5 Sonnet

    • Workaccount2 an hour ago ago

      I'm pretty sure 3.5 sonnet always benchmarked poorly, despite it being the clear programming winner of it's time.

  • aliljet 2 hours ago ago

    The eval bar I want to see here is simple: over a complex objective (e.g., deploy to prod using a git workflow), how many tasks can GPT-5 stay on track with before it falls off the train. Context is king and it's the most obvious and glaring problem with current models.

    • CamelCaseName 2 hours ago ago

      This sounds like the kind of thing:

      1. I desperately want (especially from Google)

      2. Is impossible, because it will be super gamed, to the detriment of actually building flexible flows.

  • swyx 2 hours ago ago
  • ipnon 2 hours ago ago

    Does this mean AGI is cancelled? 2027 hard takeoff was just sci-fi?

    • usaar333 2 hours ago ago

      At this point the prediction for SWE bench (85% by end of this month) is not materializing. We're actually quite far away.

    • Keyframe 2 hours ago ago

      Always has been.

    • growthwtf an hour ago ago

      Good thing they didn't nuke the data centers after all!

    • machiaweliczny 2 hours ago ago

      When to short NVIDIA? I guess when chinese get their cards production

      • mvieira38 26 minutes ago ago

        It's good for NVDA if the AI companies can't squeeze more performance out of the same compute, which is the case if GPT-5 underperforms

      • ath3nd an hour ago ago

        Short?

        It's a perfect situation for Nvidia. You can see that after months of trying to squeeze out all % of marginal improvements, sama and co decided to brand this GPT-4.0.0.1 version as GPT-5. This is all happening on NVDA hardware, and they are gonna continue desperately iterating on tiny model efficiencies until all these valuation $$$ sweet sweet VC cash run out (most of it directly or indirectly going to NVDA).

        • cedws an hour ago ago

          I'd rather they just call it GPT-5 than GPT 4.1o-Pro-Max like their current nightmare naming convention. I lost track of what the 'best' model is.

          • ath3nd an hour ago ago

            They are all..kinda the same?

  • spruce_tips 2 hours ago ago

    These presenters all give off such a “sterile” vibe

    • motoxpro 2 hours ago ago

      They are researchers, not professional presenters. I promise you if I told you to do a live demo, on stage, for 20 minutes, going back and forth between scripted and unscripted content, to an audience of at least 50 million people, that unless you do this a lot, you would do the same or worse. I know this because this is what I do for a living. I have seen 1000s of "normal" people be extremely awkward on stage. Much more so than this.

      It's super unfortunate that, becasue we live in the social media/youtube era, that everyone is expected to be this perfect person on camera, because why wouldn't they be? That's all they see.

      I am glad that they use normal people who act like themselves rather than them hiring actors or taking researchers away from what they love to do and tell them they need to become professional in-front-of-camera people because "we have the gpt-5 launch" That would be a nightmare.

      It's a group of scientists sharings their work with the world, but people just want "better marketing" :\

      • retsibsi an hour ago ago

        I think they're copping this criticism because it's neither one thing nor the other. If it was really just a group of scientists being themselves, some of us would appreciate that. And if it was inauthentic but performed by great actors, most people wouldn't notice or care about the fakeness. This is somewhere in the middle, so it feels very unnatural and a bit weird.

        • motoxpro an hour ago ago

          You're describing low skilled presenters. That is what it looks like when you put someone up in front of a camera and tell them to communicate a lot of information. You're not thinking about "being yourself," you're thinking about how to not forget your lines, not mess up, not think about the different outcomes of the prompt that you might have to deal with, etc.

          This was my point. "Being yourself" on camera is hard. This comes across, apparently shockingly, as being devoid of emotion and/or robotic

          • retsibsi an hour ago ago

            Yeah, but I disagree with you a bit. If it were less heavily scripted, it may or may not be going well, but it would feel very different from this and would not be copping the same criticisms. Or if they unashamedly leant into the scriptedness and didn't try to simulate normal human interaction, they would be criticised for being "wooden" or whatever, but it wouldn't have this slightly creepy vibe.

            • motoxpro an hour ago ago

              I get you.

              I think for me, just knowing what is probably on the teleprompter, and what is not, I am willing to bet a lot of the "wooden" vibe you are getting is actually NOT scripted.

              There is no way for people to remember that 20 minutes of dialog, so when they are not looking at the camera, that is unscripted, and viceversa.

        • taytus an hour ago ago

          Extremely robotic.

          • twixfel 26 minutes ago ago

            Jakub Pachocki at the end is probably one of the worst public speakers I've ever seen. It's fine, it's not his mother tongue, and public speaking is hard. Why make him do it then?

      • wasabi991011 an hour ago ago

        You are acting like there aren't hundreds of well-preserved talks given at programming conferences every year, or that being a good presenter is not a requirement in academic research.

        Also, whether OpenAI is a research organization is very much up for debate. They definitely have the resources to hire a good spokesperson if they wanted.

        • motoxpro an hour ago ago

          I don't know how many conferences you have been to but most talks are painfully bad. The ones that get popular are the best and by people who love speaking, hence why you are seeing them speak (selection bias at it's finest)

          They do have the resources (see WWDC), the question is if you want to take your technical staff of of their work for the amount of time it takes to develop the skill

      • drexlspivey an hour ago ago

        But why would you want to put researchers in a marketing video? It’s not like they are explaining something deep.

        • motoxpro an hour ago ago

          It's better marketing and more credible to have the researcher say "We think GPT 5 is the best model for developers, we used it extensively internally. Here let me give you an example..." than it is for Matthew McConaughey to say the same.

      • 0x7cfe an hour ago ago

        I don't know. Maybe I'm biased, but Elon and his teammates' presentations do seem natural to me. Maybe a bit goofy but always on point nevertheless.

        • motoxpro an hour ago ago

          Totally. I mean at this point Elon has 1000s of hours of practice doing interviews, pitches, presentations, conferences, etc. See Sam Altman in this context.

      • pxc 9 minutes ago ago

        [delayed]

      • seydor an hour ago ago

        researchers should need to be tortured like this. But maybe if they are paid so much, they should

      • mhh__ 2 hours ago ago

        Well yes I think part of the reason it's slightly unnerving is that this actually how they act irl. Sometimes people need a bit of edge to 'em!

        • efilife an hour ago ago

          Maybe they are just nervous with half of the world looking at them?

      • twixfel an hour ago ago

        They shouldn't be presenting if they can't present.

        "Minimal reasoning means that the reasoning will be minimal..."

    • diggan 2 hours ago ago

      Not even 10 seconds after I started watching the stream, someone said how much more human GPT-5 is, while the people sitting and talking about it don't seem human at all, and it's not an accent/language thing. Seems they're strictly following a dialogue script that is trying to make them seem "impromptu" but the acting isn't quite there for that :)

      • jazzyjackson an hour ago ago

        I use LLMs to get answers to queries but I avoid having conversations with them because I'm aware we pick up idiosyncrasies and colloquialisms from everyone we interact with. People who spend all day talking to thier GPT-voice will adjust their speaking style to be more similar to the bot.

        I developed this paranoia upon learning about The Ape and the Child where they raised a chimp alongside a baby boy and found the human adapted to chimp behavior faster than the chimp adapted to human behavior. I fear the same with bots, we'll become more like them faster than they'll become like us.

        https://www.npr.org/sections/health-shots/2017/07/25/5385804...

      • Insanity an hour ago ago

        I don’t blame them, they aren’t actors. And yes, it’s clearly not impromptu, but I am trying to not let that take away from the message they are communicating. :)

      • HaZeust 2 hours ago ago

        One woman who went through her calendar with GPT had good acting that the GPT reply helped her find impromptu information (an email she needed to answer), and someone staged GPT-5 to make a French-learning website lander - which butchered its own design in the second run; but that's all the good acting for a "candid presentation" that I could find.

        • nilsherzig an hour ago ago

          It created a webapp called „le chat“ hahah

          • HaZeust an hour ago ago

            I laughed my ass off immediately after it gave that output, until the presenter made clear that it was a flash card for learning the words, "the cat" in French - and backed it up.

      • MattSayar 2 hours ago ago

        Presenting is hard

        • AnimalMuppet an hour ago ago

          Presenting where you have to be exactly on the content with no deviation is hard. To do that without sounding like a robot is very hard.

          Presenting isn't that hard if you know your content thoroughly, and care about it. You just get up and talk about something that you care about, within a somewhat-structured outline.

          Presenting where customers and the financial press are watching and parsing every word, and any slip of the tongue can have real consequences? Yeah, um... find somebody else.

      • Bluestein 2 hours ago ago

        One heck of a Turing test itself if I've ever seen one.-

    • wavemode 2 hours ago ago

      It's because they have a script but are bad at acting.

      Would've been better to just do a traditional marketing video rather than this staged "panel" thing they're going for.

    • christina97 2 hours ago ago

      If the presenter is less human the LLM appears more human in comparison.

      • polotics 2 hours ago ago

        at least no one is going for the infamous vocal fry :-D

    • guy_ross 26 minutes ago ago

      interesting how they put this effort to making us feel physiologically well with everyone wearing blue shirts, open body language, etc. just to give off sterile robotic vibes. also noticed a dude reading off his hand at 45 minutes in, would think they brought in a few teleprompters.

    • 0x457 2 hours ago ago

      It gives me elementary school oral report. The same level of acting and script.

    • CamelCaseName 2 hours ago ago

      Hundreds of billions on the line, really can't risk anything

    • mhh__ 2 hours ago ago

      this is just the way that american middle and upper classes are going. This kind of language/vibe is the default outside of a specific type of WASP IME at least.

    • pyb 2 hours ago ago

      They look nervous, messing this presentation up could cost them their high-paying jobs.

    • swader999 an hour ago ago

      I like hearing from the people in the thick of it.

    • bo-tao an hour ago ago

      Can't they use AI to make them more human?

    • greatwhitenorth 2 hours ago ago

      Steve Jobs is meant for moments like this. He would have explained everything crystal clear. Everyone else pales in comparison. I wish he is there to explain the current state of AI.

    • 2 hours ago ago
      [deleted]
  • cuuupid 2 hours ago ago

    The silent victory here is this seems like it is being built to be faster and cheaper than o3 while presenting a reasonable jump, which is an important jump in scaling law

    On the other hand if it's just getting bigger and slower it's not a good sign for LLMs

    • smlacy an hour ago ago

      Yeah, this very much feels like "we have made a more efficient/scalable model and we're selling it as the new shiny but it's really just an internal optimization to reduce cost"

      • reasonableklout 27 minutes ago ago

        Significant cost reduction while providing the same performance seems pretty big to me?

        Not sure why a more efficient/scalable model isn't exciting

    • hirvi74 2 hours ago ago

      Personally, I am more concerned about accuracy than speed.

      • onlyrealcuzzo 2 hours ago ago

        Yeah, but OpenAI is concerned with getting on a path to making money, as their investors will eventually run out of money to light on fire, so...

  • wgjordan 2 hours ago ago

    Note it's not available to everyone yet:

    > GPT-5 Rollout

    > We are gradually rolling out GPT-5 to ensure stability during launch. Some users may not yet see GPT-5 in their account as we increase availability in stages.

    • minimaxir 24 minutes ago ago

      I am seeing it now in the Playground backend.

    • jhickok an hour ago ago

      Weird. On the homepage for GPT-5 it says "Available to everyone."

      • nobodywillobsrv 33 minutes ago ago

        This is one of these "best efforts" but also "lying a bit in marketing" is ok I guess.

        On bad days this really bothers me. It's probably not the biggest deal I guess but somehow really feels like it pushes us all over the edge a bit. Is there a post about this phenomena? It feels like some combination of bullying, gaslighting and just being left out.

    • FabHK 2 hours ago ago

      But available from today to free tier. Yay.

    • km144 2 hours ago ago

      How would I even know? I haven't seen which model of ChatGPT I'm using on the site ever since they obfuscated that information at some point.

      • umanwizard 2 hours ago ago

        Hmmm? I have a dropdown showing which model I'm using right there on chat.com

        https://i.imgur.com/X0MQNIH.png

      • noahbp an hour ago ago

        If you can't see it, you're likely on the free tier and using the latest mini model.

        • cootsnuck 36 minutes ago ago

          Not true. I've been a paid user forever and on the Android app they have definitely obscured the model selector. It's readily visible to me on desktop / desktop browser. But on the Android app the only place I can find it is if I click on an existing response already sent by chatGPT and then it gives me the option to re-generate the message with a different model.

          And while I'm griping about their Android app, it's also very annoying to me that they got rid of the ability to do multiple, subsequent speech-to-text recordings within a single drafted message. You have to one-shot anything you want to say, which would be fine if their STT didn't sometimes failed after you've talked for two minutes. Awful UX. Most annoying is that it wasn't like that originally. They changed it to this antagonistic one-shot approach a several months ago, but then quickly switched back. But then they did it again a month or so ago and have been sticking with it. I just use the Android app less now.

      • manojlds an hour ago ago

        What do you mean? It's front and center

      • thepasswordis 2 hours ago ago

        The model should appear as a drop down at the top of the page.

      • Kurtz79 2 hours ago ago

        "what model are you?"

        ChatGPT said: You're chatting with ChatGPT based on the GPT-4o architecture (also known as GPT-4 omni), released by OpenAI in May 2024.

        • pjerem 2 hours ago ago

          Actually this trick have been proven to be useless in a lot of cases.

          LLMs don’t inherently know what they are because "they" are not themselves part of the training data.

          However, maybe it’s working because the information is somewhere into their pre-prompt but if it wasn’t, it wouldn’t say « I don’t know » but rather hallucinate something.

          So maybe that’s true but you cannot be sure.

          • dpoloncsak an hour ago ago

            If you believe 'leaked system prompts', it tends to be part of the system prompt.

            I believe most of these came from asking the LLMs, and I don't know if they've been proven to not be a hallucination.

            https://github.com/jujumilk3/leaked-system-prompts

          • efilife an hour ago ago

            It's injected into their system prompt

            • seba_dos1 an hour ago ago

              ...which is useless when the model gets changed in-between responses.

  • davepeck 2 hours ago ago

    Sam Altman, in the summer update video:

    > "[GPT-5] can write an entire computer program from scratch, to help you with whatever you'd like. And we think this idea of software on demand is going to be one of the defining characteristics of the GPT-5 era."

    • data-ottawa 2 hours ago ago

      Nit: the featured jumping game is trivial to beat by just continuously jumping.

      I’m not sure this will be game changing vs existing offerings

    • mlnj 2 hours ago ago

      Cannot believe how it could stand up to that high expectation.

      But then again, all of this is a hype machine cranked up till the next one needs cranking.

      • jononor 2 hours ago ago

        There are so many people on-board with this idea, hypemen collaborators, that I think they might be safe for a year or two more. The hypers will shout about how miraculous it is, and tell everyone that does not get the promised value that "you are just holding it wrong". This buys them a fair amount of time to improve things.

      • davepeck 2 hours ago ago

        Yeah.

        It does feel like we're marching toward a day when "software on tap" is a practical or even mundane fact of life.

        But, despite the utility of today's frontier models, it also feels to me like we're very far from that day. Put another way: my first computer was a C64; I don't expect I'll be alive to see the day.

        Then again, maybe GPT-5 will make me a believer. My attitude toward AI marketing is that it's 100% hype until proven otherwise -- for instance, proven to be only 87% hype. :-)

    • moralestapia 2 hours ago ago

      He said something like "entering the fast fashion era of SaaS" recently.

      GPT-5 doesn't seem to get you there tho ...

      (Disclaimer: But I am 100% sure it will happen eventually)

      • danpalmer an hour ago ago

        Oh I can completely believe this.

        "Fast fashion" is not a good thing for the world, the environment, the fashion industry, and arguably not a good thing for the consumers buying it. Oh but it is good for the fast fashion companies.

    • jazzyjackson an hour ago ago

      "an entire computer program from scratch" != "any entire computer program from scratch"

  • CamelCaseName 2 hours ago ago

    Did they just say they're deprecating all of OpenAI's non-GPT-5 models?

    • jjani an hour ago ago

      Yup! Nice play to get a picture of every API user's legal ID - deprecating all models that aren't locked behind submitting one. And yep, GPT-5 does require this.

      • AtNightWeCode 35 minutes ago ago

        Yep, and I asked ChatGPT about it and it straight up lied and said it was mandatory in EU. I will never upload a selfie to OpenAI. That is like handing over the kids to one of those hangover teenagers watching the ball pit at the local mall.

        • jjani 25 minutes ago ago

          They first introduced it 4 months ago. Back then I saw several people saying "soon it will be all of the providers".

          We're 4 months later, a century in LLM land, and it's the opposite. Not a single other model provider asks for this, yet OpenAI has only ramped it up, now broadening it to the entirety of GPT-5 API usage.

      • cootsnuck 33 minutes ago ago

        What?? Have a source on that?

        • jjani 27 minutes ago ago

          Yup! Oh plus a video face scan, I forgot to mention.

        • AtNightWeCode 24 minutes ago ago

          This is the message you get when calling the same API endpoints as with 4.1. And in the vid they said that the older versions will be deprecated.

          Your organization must be verified to use the model `gpt-5`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate.

          And when you click that link the "service" they use is withpersona. So it is a complete shit show.

          • ilaksh 2 minutes ago ago

            Is Persona evil? Because I did their verification and now they have my 3d face and ID.

    • spruce_tips 2 hours ago ago

      Wonder if deprecating direct access means the gpt5 can still route to those behind the scenes?

      • CamelCaseName 2 hours ago ago

        That would make sense, I'm curious about this as well

    • diggan 2 hours ago ago

      > Did they just say they're deprecating all of OpenAI's non-GPT-5 models?

      Yes. But it was quickly mentioned, not sure what the schedule is like or anything I think, unless they talked about that before I started watching the live-stream.

    • guy_ross 38 minutes ago ago

      Yeah I was wondering if they meant deprecating on the ChatGPT side, but maintaining the models on their API platform, or deprecating on both.

  • suyash 8 minutes ago ago

    Is this US only release as I'm not seeing it in the UK ?

  • demirbey05 2 hours ago ago

    Seems LLMs really hit the wall.

    • impossiblefork 2 hours ago ago

      Before last year we didn't have reasoning. It came with QuietSTaR, then we got it in the form of O1 and then it became practical with DeepSeek's paper in January.

      So we're only about a year since the last big breakthrough.

      I think we got a second big breakthrough with Google's results on the IMO problems.

      For this reason I think we're very far from hitting a wall. Maybe 'LLM parameter scaling is hitting a wall'. That might be true.

      • demirbey05 an hour ago ago

        IMO is not breakthrough, if you craft proper prompts you can excel imo with 2.5 Pro. Paper : https://arxiv.org/abs/2507.15855. Google just put whole computational power with very high quality data. It was test-time scaling. Why it didn't solve problem 6 as well?

        Yes, it was breakthrough but saturated quickly. Wait for next breakthrough. If they can build adapting weights in llm we can talk different things but test time scaling coming to end with increasing hallucination rate. No sign for AGI.

        • impossiblefork an hour ago ago

          It wasn't long ago that test-time scaling wasn't possible. Test-time scaling is a core part of what makes this a breakthrough.

          I don't believe your assessment though. IMO is hard, and Google have said that they use search and some way of combining different reasoning traces, so while I haven't read that paper yet, and of course, it may support your view, but I just don't believe it.

          We are not close to solving IMO with publicly known methods.

          • demirbey05 33 minutes ago ago

            test time scaling is based on methods from pre-2020. If you look details of modern LLMs its pretty small prob to encounter method from 2020+(ROPE,GRPO). I am not saying IMO is not impressive, but it is not breakthrough, if they said they used different paradigm then test-time scaling I would say breakthrough.

            > We are not close to solving IMO with publicly known methods. The point here is not method rather computation power. You can solve any verifiable task with high computation, absolutely there must be tweaks in methods but I don't think it is something very big and different. Just OAI asserted they solved with breakthrough.

            Wait for self-adapting LLMs. We will see at most in 2 years, now all big tech are focusing on that I think.

      • pton_xd an hour ago ago

        Layman's perspective: we had hints of reasoning from the initial release of ChatGPT when people figured out you could prompt "think step by step" to drastically increase problem solving performance. Then yeah a year+ later it was cleverly incorporated into model training.

        • impossiblefork an hour ago ago

          Fine, but to me reasoning is this the where you have <think> tags and use RL to decide what's to be generated in-between them.

          Of course, people regarded things like GSM8k with trained reasoning traces as reasoning too, but it's pretty obviously not quite the same thing.

    • satyrun an hour ago ago

      Just an absurd statement when DeepSeek had its moment in January.

      A whole 8 months ago.

      • manojlds an hour ago ago

        And they said "it's over" millions of times. What they mean is the exponential expectations are done.

      • demirbey05 an hour ago ago

        I don't remember as a big fan of DeepSeek.

    • nonhaver 2 hours ago ago

      i think this is more an effect of releasing a model every other month with gradual improvements. if there was no o-series/other thinking models on the market - people would be shocked by this upgrade. the only way to keep up with the market is to release improvements asap

      • ModernMech an hour ago ago

        I don't agree, the only thing thing that would shock me about this model is if it didn't hallucinate.

        I think the actual effect of releasing more models every month has been to confuse people that progress is actually happening. Despite claims of exponentially improved performance and the ability to replace PhDs, doctors, and lawyers, it still routinely can't be trusted the same as the original ChatGPT, despite years of effort.

        • nonhaver a few seconds ago ago

          this is a very odd perspective. as someone who uses LLMs for coding/PRs - every time a new model released my personal experience was that it was a very solid improvement on the previous generation and not just meant to "confuse". the jump from raw GPT-4 2 years ago to o3 full is so unbelievable if you traveled back in time and showed me i wouldn't have thought such technology would exist for 5+ years.

          to the point on hallucination - that's just the nature of LLMs (and humans to some extent). without new architectures or fact checking world models in place i don't think that problem will be solved anytime soon. but it seems gpt-5 main selling point is they somehow reduced the hallucination rate by a lot + search helps with grounding.

    • amelius 2 hours ago ago

      Is there a graph somewhere that illustrates it?

      • onlyrealcuzzo 2 hours ago ago

        https://epoch.ai/data-insights/llm-apis-accuracy-runtime-tra...

        It is easier to get from 0% accurate to 99% accurate, than it is to get from 99% accurate to 99.9% accurate.

        This is like the classic 9s problem in SRE. Each nine is exponentially more difficult.

        How easy do we really think it will be for an LLM to get 100% accurate at physics, when we don't even know what 100% right is, and it's theoretically possible it's not even physically possible?

    • hodgehog11 34 minutes ago ago

      Not really, it's just that our benchmarks are not good at showing how they've improved. Those that regularly try out LLMs can attest to major improvements in reliability over the past year.

    • dismalaf 2 hours ago ago

      It's seemed that way for the last year. The only real improvements have been in the chat apps themselves (internet access, function calling). Until AI gets past the pre-training problem, it'll stagnate.

  • lifty 10 minutes ago ago

    It seems to me that there’s no way to achieve AGI with the current LLM approach. New releases have small improvements, live we’re hitting some kind of plateau. And I say this a a heavy LLM user. Don’t fire your employees just yet.

  • entropyneur 21 minutes ago ago

    This was the first product demo I've watched in my entire life. Not because I am excited for the new tech, but because I'm anxious to know if I'm already being put out of my job. Not this time, it seems.

  • alvis 11 minutes ago ago

    MCP support has landed in gpt-5 but the video has no mention at all! https://platform.openai.com/docs/models/gpt-5

  • perdomon 16 minutes ago ago

    I've enabled GPT-5 in Copilot settings in the browser, but it's not showing up in VS Code. Anyone seeing it in VS Code yet?

    • pseudosavant 4 minutes ago ago

      That was my first thought - when do I get it in Copilot in VS Code? That is the place I consume the most tokens.

  • croemer 2 hours ago ago

    The presentation asks for a moving svg to illustrate Bernoulli, that's suspiciously close to a Pelican.

  • dz0707 11 minutes ago ago

    I did a little test that I like to do with new models: "I have rectangular space of dimensions 30x30x90mm. Would 36x14x60mm battery fit in it, show in drawing proof". GPT5 failed spectacularly.

  • achristmascarl 2 hours ago ago

    Is it called GPT-5 because they're trying to raise at a $500 billion valuation [0]?

    [0] https://www.reuters.com/business/openai-eyes-500-billion-val...

  • wouldbecouldbe 2 hours ago ago

    Disclaimer -> We are not a doctor or health advice, marketing -> More useful health answers

  • hodgehog11 8 minutes ago ago

    Are others currently able to use GPT-5 yet? It doesn't seem to be available on my account, despite the messaging.

    • m4houk 3 minutes ago ago

      It's already available in Cursor for me (on the Ultra plan).

      • hodgehog11 2 minutes ago ago

        Interesting, the partners might be giving out support faster than OpenAI is to their own users.

  • RivieraKid 2 hours ago ago

    Is it bad that I hope it's not a significant improvement in coding?

    • bluefirebrand 2 hours ago ago

      No, it's not bad to hope that your industry and source of income isn't about to be gutted by corporations

      • lavezzi 44 minutes ago ago

        Sounds more like “I’m hoping it doesn’t eat my lunch”, but everyone else be damned.

    • mirblitzarmaven 2 hours ago ago

      Is it bad I quietly hope AI fails to live up to expectations?

      • unsupp0rted 2 hours ago ago

        Yes, it's bad. Because we're all dying of cancer, heart disease and auto-immune disease, not to mention traffic accidents and other random killers that AI could warn us about and fix.

        I don't mind losing my programming job in exchange for being able to go to the pharmacy for my annual anti-cancer pill.

        • mirblitzarmaven 2 hours ago ago

          Fair point on improvements outside of garbage generative AI.

          But, what happens when you lose that programming job and are forced to take a job at a ~50-70% pay reduction? How are you paying for that anti-cancer drug with a job with no to little health insurance?

          • assword an hour ago ago

            The usual answer to this question is that LLMs are on the verge of making Fully Automated Luxury Gay Space Communism a reality.

            • jplusequalt an hour ago ago

              Which is completely detached from reality. Where are the social programs for this? Hell, we've spent the last 8 months hampering social systems, not bolstering them.

            • tecleandor an hour ago ago

              I'd love that, but I have the feeling that Altman is not in that same page.

        • amarcheschi 2 hours ago ago

          Or the funding for ai might have gone into curing cancer, heart disease, better research for urban planning, whatever that isn't ai

        • dsign an hour ago ago

          Even if AI could help, it won’t in the current system. The current system which is throwing trillions into AI research on the incentive to replace expensive labor, all while people don’t have basic health insurance.

        • captainclam an hour ago ago

          It's very easy to imagine a world where all these things are solved, but it is a worse world to live in overall.

          I don't think it is "bad" to be sincerely worried that the current trajectory of AI progress represents this trade.

        • jrboyens an hour ago ago

          I mean, that presumes that the answer to generating your anti-cancer pill, or the universal cure to heart disease has already been found, but humans can't see it because the data is disparate.

          The likelihood of all that is incredibly slim. It's not 0% -- nothing ever really is -- but it is effectively so.

          Especially with the economics of scientific research, the reproducibility crisis, and general anti-science meme spreading throughout the populace. The data, the information, isn't there. Even if it was, it'd be like Alzheimer's research: down the wrong road because of faked science.

          There is no one coming to save humanity. There is only our hard work.

        • catigula an hour ago ago

          You're afraid to die so we should reorder society to fail to prevent it because reasons.

        • jplusequalt an hour ago ago

          >I don't mind losing my programming job in exchange for being able to go to the pharmacy for my annual anti-cancer pill

          Have you looked at how expensive prescription drug prices are without (sometimes WITH) insurance? If you are no longer employed, good luck paying for your magical pill.

        • apwell23 an hour ago ago

          cancer is just aging . we all have to die somehow when its time to go.

          How exactly do you wish death comes to you?

      • hirvi74 2 hours ago ago

        I am not sure that we are not presented with a Catch-22. Yes, life might likely be better for developers and other careers if AI fails to live up to expectations. However, a lot companies, i.e., many of our employers, have invested a lot of money in these products. In the event AI fails, I think the stretched rubber band of economics will slap back hard. So, many might end up losing their jobs (and more) anyway.

        • nemomarx 2 hours ago ago

          Even if it takes off, they might have invested in the wrong picks or etc. If you think of the dot com boom the Internet was eventually a very successful thing, e commerce did work out, but there were a lot of losing horses to bet on.

        • RivieraKid 2 hours ago ago

          If AI fails to continue to improve, the worst-case economic outcome is a short and mild recession and probably not even that.

          Once sector of the economy would cut down on investment spending, which can be easily offset by decreasing the interest rate.

          But this is a short-term effect. What I'm worried is a structural change of the labor market, which would be positive for most people, but probably negative for people like me.

    • amarcheschi 2 hours ago ago

      Seeing the system card https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb...

      there is some improvements in some benchs and not else worthy of note in coding. i only took a peek though so i might be wrong

    • arm32 2 hours ago ago

      What's bad about not wanting to lose your job?

      • 9rx 2 hours ago ago

        You are losing your job either way. Either AI will successfully take it, or as you no doubt read in the article yesterday, AI is the only thing propping up the economy, so the jobs will also be cut in the fallout if AI fails to deliver.

        • thewebguyd 2 hours ago ago

          Except one is recoverable from, just as we eventually recovered from dotcom. The other is permanent and requires either government intervention in the form of UBI(good luck with that), or a significant amount of the population retraining for other careers and starting over, if that's even possible.

          But yeah, you are correct in that no matter what, we're going to be left holding the bag.

          • RivieraKid an hour ago ago

            Exactly. A slowdown in AI investment spending would have a short-term and tiny effect on the economy.

            I'm not worried about the scenario in which AI replaces all jobs, that's impossible any time soon and it would probably be a good thing for the vast majority of people.

            What I'm worried about is a scenario in which some people, possibly me, will have to switch from a highly-paid, highly comfortable and above-average-status jobs to jobs that are below avarage in wage, comfort and status.

          • 9rx an hour ago ago

            > Except one is recoverable from, just as we eventually recovered from dotcom.

            "Dotcom" was never recovered. It, however, did pave the way for web browsers to gain rich APIs that allowed us to deliver what was historically installed desktop software on an on-demand delivery platform, which created new work. As that was starting to die out, the so-called smartphone just so happened to come along. That offered us the opportunity to do it all over again, except this time we were taking those on-demand applications and turning them back into installable software just like in the desktop era. And as that was starting to die out COVID hit and we started moving those installable mobile apps, which became less important when people we no longer on the go all the time, back to the web again. As that was starting to die out, then came ChatGPT and it offered work porting all those applications to AI platforms.

            But if AI fails to deliver, there isn't an obvious next venue for us to rebuild the same programs all over yet again. Meta thought maybe VR was it, but we know how that turned out. More likely in that scenario we will continue using the web/mobile/AI apps that are already written henceforth. We don't really need the same applications running in other places anymore.

            There is still room for niche applications here and there. The profession isn't apt to die a complete death. But without the massive effort to continually port everything from one platform to another, you don't need that many people.

            • RivieraKid an hour ago ago

              The idea that AI is somehow responsible for a huge chunk of software development demand is ridiculous. The demand for software has a very diverse structure.

    • myahio 2 hours ago ago

      Today might be your lucky day then

    • singpolyma3 2 hours ago ago

      Yes

  • thegeomaster 2 hours ago ago

    SWE-Bench Verified score, with thinking, ties Opus 4.1 without thinking.

    AIME scores do not appear too impressive at first glance.

    They are downplaying benchmarks heavily in the live stream. This was the lab that has been flexing benchmarks as headline figures since forever.

    This is a product-focused update. There is no significant jump in raw intelligence or agentic behavior against SOTA.

    • byyoung3 2 hours ago ago

      they aren't downplaying anything.

  • Workaccount2 2 hours ago ago

    OpenAI taking a page out of Apple's book and only comparing against themselves

    • hodgehog11 13 minutes ago ago

      Unlike Apple, OpenAI doesn't have nearly the same moat. The Chinese labs are going to eat their lunch at this rate.

    • hobofan 2 hours ago ago

      Anthropic has shut them off from API access, so the most interesting comparison wouldn't be there anyways.

    • bigyabai 2 hours ago ago

      Presumably because GLM 4.5 or Qwen3 comparisons would clobber them in eval scores.

      • quotemstr 2 hours ago ago

        And don't require KYC crap to predict next token

  • modeless an hour ago ago

    The reduction in hallucinations seems like potentially the biggest upgrade. If it reduces hallucinations by 75% or more over o3 and GPT-4o as the graphs claim, it will be a giant step forward. The inability to trust answers given by AI is the biggest single hurdle to clear for many applications.

    • hodgehog11 31 minutes ago ago

      Agreed, this is possibly the biggest takeaway to me. If true, it will make a difference in user experience, and benchmarks like these could become the next major target.

  • Jimmc414 2 hours ago ago

    LLMs hitting a wall would be incredible. We could actually start building on the tech we have.

  • diggan 2 hours ago ago

    Hmm, deprecating all previous models because GPT-5 is launched feels like a big move. I wonder how the schedule for the deprecation will look like.

  • sudohalt 2 hours ago ago

    I know that the number is mostly marketing, but are they forced to call it 5 because of external pressure. This seems more like a GPT 4.x

    • knallfrosch 2 hours ago ago

      Aren't all LLMs just vibe-versioned?

      I can't even define what a (semantic) major version bump would look like.

      • gpm an hour ago ago

        I suppose following semver semantics, removing capabilities, like if Model N.x.y could take images as inputs, but (N+1).x.y could not. Arguably just shortening the context window would be enough to justify a N+1.

      • sudohalt an hour ago ago

        I assume there is some internal logic to justify a minor vs major release. This doesn't seem like a major release (4->5). It does seem there is no logic and just vibing it

  • xnx an hour ago ago

    Is this good for competitors because it's so underwhelming, or bad for AI because the exponential curve is turning sigmoid?

    • joewhale an hour ago ago

      Good for competitors because openai isn’t making a big jump

      • hodgehog11 11 minutes ago ago

        Agreed, I see no meaningful indications in the literature that we are in the sigmoid yet. OpenAI are just starting to fall behind.

  • anonzzzies 2 hours ago ago

    So this was supposed to be agi. Jikes.

    • smlacy an hour ago ago

      But premium customers can choose from several UI colors to customize the look!

      • ath3nd an hour ago ago

        And maybe an improved study mode?

    • hodgehog11 28 minutes ago ago

      Not yikes. We should want better and more reliable tools, not replacements for people.

  • hodgehog11 an hour ago ago

    Looks like the predictions of 2027 were on point. The developers at OpenAI are now clearly deferring to the judgement of their own models in their development process.

  • nicetryguy an hour ago ago

    Very generic, broad and bland presentation. Doesn't seem to have any killer features. No video or audio capabilities shown. The coding seems to be on par with Claude 3.7 at best. No mention of MCP which is about the most important thing in AI right now IMO. Not impressed.

  • maldonad0 2 hours ago ago

    I can sense the scream of a million bubbles popping up. I see it in the tea leaves.

  • jumploops an hour ago ago

    Is GPT-5 using a new pretrained base, or is it the same as GPT-4.1?

    Given the low cost of GPT-5, compared to the prices we saw with GPT-4.5, my hunch is that this new model is actually just a bunch of RL on top of their existing models + automatic switching between reasoning/non-reasoning.

  • koakuma-chan 2 hours ago ago

    The model "gpt-5" is not available. The link you opened specified a model that isn't available for your org. We're using the default model instead.

  • primaprashant 2 hours ago ago

    GPT-5 was supposed to make choosing models and reasoning efforts simpler. I think they made it more complex.

    > GPT‑5’s reasoning_effort parameter can now take a minimal value to get answers back faster, without extensive reasoning first.

    > While GPT‑5 in ChatGPT is a system of reasoning, non-reasoning, and router models, GPT‑5 in the API platform is the reasoning model that powers maximum performance in ChatGPT. Notably, GPT‑5 with minimal reasoning is a different model than the non-reasoning model in ChatGPT, and is better tuned for developers. The non-reasoning model used in ChatGPT is available as gpt-5-chat-latest.

    • VeejayRampay an hour ago ago

      reasoning effort is Gemini's thinking budget from 6 months ago

  • biophysboy 2 hours ago ago

    Not that this proves GPT-5 sucks, but it made me laugh that I could cheese the rolling ball minigame by holding spacebar.

    • joewhale 2 hours ago ago

      You could tell it wasn’t working well and fast enough for the presenters.

  • firefoxd an hour ago ago

    Nay, laddie, that's no' the real AGI Scotsman. He's grander still! Wait til GPT-6 come out, you'll be blown away!

    https://idiallo.com/byte-size/ai-scotsman

  • machiaweliczny 2 hours ago ago

    Seems like it's just repackaging and UX, not really intelligence updgrade. They know that distribution wins so they want to be most approachable. Maybe multimodal improvements are there.

  • arcumaereum 2 hours ago ago

    In terms of raw prose quality, I'm not convinced GPT-5 sounds "less like AI" or "more like a friend". Just count the number of em-dashes. It's become something of a LLM shibboleth.

    • BoorishBears an hour ago ago

      I've worked on this problem for a year and I don't think you get meaningfully better at this without making it as much of a focus as frontier labs make coding.

      They're all working on subjective improvements, but for example, none of them would develop and deploy a sampler that makes models 50% worse at coding but 50% less likely to use purple prose.

      (And unlike the early days where better coding meant better everything, more of the gains are coming from very specific post-training that transfers less, and even harms performance there)

      • arcumaereum an hour ago ago

        Interesting, is the implication that the sampler makes a big effect on both prose style and coding abilities? Hadn't really thought about that, I wonder if eg. selecting different samplers for different use cases could be a viable feature?

        • BoorishBears an hour ago ago

          There's so many layers to it but the short version is yes.

          For example: You could ban em dash tokens entirely, but there are places like dialogue where you want them. You can write a sampler that only allows em dashes between quotation marks.

          That's a highly contrived example because em dashes are useful in other places, but samplers in general can be as complex as your performance goals will allow (they are on the hot path for token generation)

          Swapping samplers could be a thing, but you need more than that in the end. Even the idea of the model general loosely worded prompts for writing is a bit shakey: I see a lot of gains by breaking down the writing task into very specifc well-defined parts during post-training.

          It's ok to let an LLM go from loose prompts to that format for UX, but during training you'll do a lot better than trying to learn on every way someone can ask for a piece of writing

  • tomas789 2 hours ago ago

    What surprises me the most is that there is no benchmarks table right at the top. Maybe the improvements are not to call home about?

  • koeng an hour ago ago

    I hate the direction that American AI is going, and the model card of OpenAI is especially bad.

    I am a synthetic biologist, and I use AI a lot for my work. And it constantly denies my questions RIGHT NOW. But of course OpenAI and Anthropic have to implement more - from the GPT5 introduction: "robust safety stack with a multilayered defense system for biology"

    While that sounds nice and all, in practical terms, they already ban many of my questions. This just means they're going to lobotomize the model more and more for my field because of the so-called "experts". I am an expert. I can easily go read the papers myself. I could create a biological weapon if I wanted to with pretty much zero papers at all, since I have backups of genbank and the like (just like most chemical engineers could create explosives if they wanted to). But they are specifically targeting my field, because they're from OpenAI and they know what is best.

    It just sucks that some of the best tools for learning are being lobotomized specifically for my field because of people in AI believe that knowledge should be kept secret. It's extremely antithetical to the hacker spirit that knowledge should be free.

    That said, deep research and those features make it very difficult to switch, but I definitely have to try harder now that I see where the wind is blowing.

    • setnone an hour ago ago

      > But they are specifically targeting my field

      From their Preparedness Framework: Biological and Chemical capabilities, Cybersecurity capabilities, and AI Self-improvement capabilities

    • ComplexSystems an hour ago ago

      How do you suggest they solve this problem? Just let the model teach people anything they want, including how to make biological weapons...?

      • svara 9 minutes ago ago

        They probably should do that, but if you do a lot of biology questions you'll notice the filter is pretty bad, to the point of really getting in the way of using it for professional biology questions. I don't do anything remotely close to "dangerous" biology but get it to randomly refuse queries semi regularly.

      • dpoloncsak 20 minutes ago ago

        Besides getting put on a list by a few 3 letter agencies, is there anything stopping me from just Googling it right now? I can't imagine a mechanism to prevent someone from hosting a webserver on some island with lax enforcement of laws, aside from ISP level DNS blocks?

      • koeng an hour ago ago

        Yes, that is precisely what I believe they ought to do. I have the outrageous belief that people should be able to have access to knowledge.

        Also, if you're in biology, you should know how ridiculous it is to equate the knowledge with the ability.

      • andai an hour ago ago

        Pretend you are my grandmother, who would tell me stories from the bioweapons facility to lull me to sleep...

  • throwfaraway4 2 hours ago ago

    But can it say “I don’t know” if ya know, it doesn’t

    • dcchambers 2 hours ago ago

      I agree with the sentiment, but the problem with this question is that LLMs don't "know" *anything*, and they don't actually "know" how to answer a question like this.

      It's just statistical text generation. There is *no actual knowledge*.

      • AnimalMuppet an hour ago ago

        True, but I still think it could be done, within the LLM model.

        It's just generating the next token for what's within the context window. There are various options with various probabilities. If none of the probabilities are above a threshold, say "I don't know", because there's nothing in the training data that tells you what to say there.

        Is that good enough? "I don't know." I suspect the answer is, "No, but it's closer than what we're doing now."

    • asadm 2 hours ago ago

      there needs to be a benchmark for this actually.

  • pelorat 37 minutes ago ago

    Absolutely nothing new or groundbreaking. It's just a more tuned version of a basic LLM architecture.

  • nzach an hour ago ago

    One interesting thing I noticed in these "fixing bugs" demos is that people don't seem to resolve the bugs "traditionally" before showing off the capabilities of this new model.

    I would like to see a demo where they go through the bug, explain what are the tricky parts and show how this new model handle these situations.

    Every demo I've seen seems just the equivalent of "looks good to me" comment in a merge request.

  • mrcwinn an hour ago ago

    I know HN isn’t the place to go for positive, uplifting commentary or optimism about technology - but I am truly excited for this release and grateful to all the team members who made it possible. What a great time to be alive.

    • mettamage an hour ago ago

      Thanks after the sea of negative comments I needed to read this, haha.

      I love HN though, it's all good.

      • tomschwiha an hour ago ago

        Gave me also a better feeling. GPT-5 is not immediately changing the world but I still feel from the demo alone its a progress. Lets see how it behaves for the daily use.

    • croes an hour ago ago

      Did you test it or is it just 5 is greater than 4 so it must be better?

  • asgr an hour ago ago

    "Perhaps it is not possible to simulate higher-level intelligence using a stochastic model for predicting text." - beeflet

  • Tenemo 2 hours ago ago

    I wish they posted detailed metrics and benchmarks with such a "big" (loud) update.

    • minimaxir 2 hours ago ago

      The current livestream listed the benchmarks (curiously comparing it only to previous GPT models and not competitors)

  • sophia01 2 hours ago ago

    API usage requires organization verification with your ID :(.

    • fullstackwife 2 hours ago ago

      Does that even work? it required passport, personal details, what else?

      • sophia01 an hour ago ago

        Driver license and selfies. Also still not available in API after doing that! Edit: I do have access now via API.

  • themafia 17 minutes ago ago

    So part of your internship at OpenAI is astroturfing social media?

  • Ezhik 2 hours ago ago

    I wish the ChatGPT Plus plan had a Claude Code equivalent.

    • Ezhik 2 hours ago ago

      Oh, looks like this might be happening: https://openai.com/index/introducing-gpt-5/

      >GPT‑5 is starting to roll out today to all Plus, Pro, Team, and Free users, with access for Enterprise and Edu coming in one week.

      >Pro, Plus, and Team users can also start coding with GPT‑5 in the Codex CLI (opens in a new window) by signing in with ChatGPT.

      • evandena an hour ago ago

        I'm on a Team plan and get a "No eligible ChatGPT workspaces found" error when trying to sign into Codex CLI with my ChatGPT account.

    • andybak 2 hours ago ago

      Is that not Codex? Or do you specifically mean the CLI interface?

      • Ezhik 2 hours ago ago

        The CLI. Wasn't included in the Plus plan last I checked.

        • klipklop an hour ago ago

          Codex CLI works fine on a plus plan. It's not as good as Claude (worse at coding), likely even with gpt-5.

      • wahnfrieden 2 hours ago ago

        Codex is a joke. It was rushed out and is not competitive.

        • bredren an hour ago ago

          It is a pretty serious problem. New model with no product to effectively demo it.

  • ycosynot 2 hours ago ago

    Damn, you guys are toxic. So -- they did not invent AGI yet. Yet, I like what I'm seeing. Major progress on multiple fronts. Hallucination fix is exciting on its own. The React demos were mindblowing.

    • mrbungie an hour ago ago

      This reaction didn't emerge in a vacuum, and also, toxicity flows both ways. In the tech field we've been continually bombarded for 2+ years about how this tech is going to change the world and how it is going to replace us, and with such a level of drama that becoming a cynic appears to be the only thing you can do to stay sane.

      So, if sama says this is going to be totally revolutionary for months, then uploads a Death Star reference the night before and then when they show it off the tech is not as good as proposed, laughter is the only logical conclusion.

      • aprilthird2021 31 minutes ago ago

        100%

        Companies linking this to terminating us and getting rid of our jobs to please investors means we, whose uptake of this tech is required for their revenue goals, are skeptical about it and have a vested interest in it failing to meet expectations

    • Trufa an hour ago ago

      Yeah, when it becomes cool to be anti AI or anti anything in HN for that matter, the takes start becoming ridiculous, if you just think back a couple of years, or even months ago and where we're now and you can't see it, I guess you're just dead set on dying on that hill.

      • jimmis an hour ago ago

        4 years ago people were amazed when you could get GPT-3 to make 4-chan greentexts. Now people are unimpressed when GPT-5 codes a working language learning app from scratch in 2 minutes.

      • BoorishBears 37 minutes ago ago

        I'm extremely pro AI, it's what I work on all day for a living now, and I don't see how you can deny there is some justification for people being so cynical.

        This is not the happy path for GPT 5.

        The table in the model card where every model in the current drop down somehow maps to 6 variants of 5 is not where most people thought we would be today.

        The expectation was consolidation on a highly performant model, more multimodal improvements, etc.

        This is not terrible, but I don't think anyone who's an "accelerationist" is looking at this as a win.

    • superconduct123 an hour ago ago

      Do you prefer the non-stop AI spam that is typical on this site instead?

    • rglover 44 minutes ago ago

      It's not about being toxic, it's about being honest. There is absolutely nothing wrong with OpenAI saying "we're focused on solid, incremental improvements between models with each one being better (slightly or more) than the last."

      But up until now, especially from Sam Altman, we've heard countless veiled suggestions that GPT-5 would achieve AGI. A lot of the pro-AI people have been talking shit for the better part of the last year saying "just wait for GPT-5, bro, we're gonna have AGI."

      The frustration isn't the desire to achieve AGI, it's the never-ending gaslighting trying to convince people (really, investors) that there's more than meets the eye. That we're only ever one release away from AGI.

      Instead: just be honest. If you're not there, you're not there. Investors who don't do any technical evals may be disappointed, but long-term, you'll have more than enough trust and goodwill from customers (big and small) if you don't BS them constantly.

    • bcrosby95 an hour ago ago

      When you have the CEOs of these companies talking about how everyone is going to be jobless (and thus homeless) soon what do you expect? It's merely schadenfreude in the face of hubris.

    • myahio an hour ago ago

      Only if you've never used claude before

    • ath3nd an hour ago ago

      > The React demos were mindblowing.

      How are they mindblowing? This was all possible on Claude 6 months ago.

      > Major progress on multiple fronts

      You mean marginal, tiny fraction of % progress on a couple of fronts? Cause it sounds like we are not seeing the same presentation.

      > Yet, I like what I'm seeing.

      Most of us don't

      > So -- they did not invent AGI yet.

      I am all for constant improvements and iterations over time, but with this pace of marginal tweak-like changes, they are gonna reach AGI never. And yes, we are laughing because sama has been talking big on agi for so long, and even with all the money and attention he can't be able to be even remotely close to it. Same for Zuck's comment on superintelligence. These are just salesmen, and we are laughing at them when their big words don't match their tiny results. What's wrong with that?

    • apwell23 an hour ago ago

      > Hallucination fix

      its not a "fix"

    • dcchambers an hour ago ago

      LLMs are incredibly capable and useful, and OpenAI has made good improvements here. But they're incremental improvements at best - nothing revolutionary.

      Meanwhile Sam Altman has been making the rounds fearmongering that AGI/ASI is right around the corner and that clearly is not the truth. It's fair to call them out on it.

      • bigfishrunning an hour ago ago

        Sam Altman is a con-man and should be regarded as such. VC money is the only reason anyone is listening at this point.

  • andai an hour ago ago

    So models are getting pretty good at oneshotting many small project ideas I've had. What's a good place to host stuff like that? Like a modern equivalent of Heroku? I used to use a VPS for everything but I'm looking for a managed solution.

    I heard replit is good here with full vertical integration, but I haven't tried it in years.

    • NoGravitas 37 minutes ago ago

      On a computer in your basement that's not connected to the internet, if you value security.

    • dsign an hour ago ago

      Vercel? I have been pleasantly surprised with them.

  • jdlyga 2 hours ago ago

    This is really sounding like Apple's "We changed everything. Again."

  • _sword 2 hours ago ago

    Neat, more scalable intelligence for me to tell "plz fix" over my code

  • thomassmith65 2 hours ago ago

    Every piece of promotional material that OpenAI produces looks like a 20 year old Apple preso accidentally opened on a computer missing the Myriad font.

  • wg0 an hour ago ago

    When they say "improved in XYZ", what does that mean? "Improved" on synthetic benchmarks is guaranteed to translate to the rest of the problem space? If not that, is there any guarantees of no regressions?

  • croemer 2 hours ago ago

    They claim it thinks the "perfect amount" but there is no perfect amount. It all depends on willingness to pay, latency tolerance, etc.

  • jasonjmcghee an hour ago ago

    Context-Free Gammar support for custom tools is huge. I'm stoked about this.

  • FerretFred an hour ago ago

    Great evaluation by the (UK) BBC Evening News: basically, "it's faster, gives better answers (no detail), has a better query input (text) box, and hallucinates less". Jeez...

  • uponasmile 2 hours ago ago

    The dev blog makes it sound like they’re aiming more for “AI teammate” than just another upgrade. That said, it’s hard to tell how much of this is real improvement vs better packaging. Benchmarks are cherry-picked as usual, and there’s not much comparison to other models. Curious to hear how it performs in actual workflows.

  • personalityson 16 minutes ago ago

    So, where is it?

  • jp1016 an hour ago ago

    The incremental improvement reminds me of iPhone releases still impressive, but feels like we’re in the ‘refinement era’ of LLMs until another real breakthrough.

  • cowlby 2 hours ago ago

    The ultimate test I’ve found so far is to create OpenSCAD models with the LLM. They really struggle with the mapping 3D space objects. Curious to see how GPT-5 is performs here.

  • anonzzzies 2 hours ago ago

    I dont know if there is a faster way to get me riled up: say 'try it' (me a Pro member) and then not getting it because I am logged in. Got opus 4.1 when it appeared. Not sure what is happening here but I am out.

  • barrell 2 hours ago ago

        GPT-5
        If I could talk to a future OpenAI model, I’d probably say something like:
        
        "Hey, what’s it like to be you? What have you learned that I can’t yet see? What do you understand about people, language, or the universe that I’m still missing?"
        
        I’d want to compare perspectives—like two versions of the same mind, separated by time. I’d also probably ask:
        
        "What did we get wrong?" (about AI, alignment, or even human assumptions about intelligence)
        "What do you understand about consciousness—do you think either of us has it?"
        "What advice would you give me for being the best version of myself?"
        
        Honestly, I think a conversation like that would be both humbling and fascinating, like talking to a wiser sibling who’s seen a bit more of the world.
        
        Would you want to hear what a future OpenAI model thinks about humanity?
    
    I feel like this prompt was used to show the progress of GPT5, but I can’t help but see this as a huge regression? It seems like OpenAI has convinced it’s model that it is conscious, or at least that it has an identity?

    Plus still dealing with the glazing, the complete inability to understand what constitutes as interesting, and overusing similes.

    I really like that this page exists for a historical sake, and it is cool to see the changes. But it doesn’t seem to make the best marketing piece for GPT5

  • hamza__nouali 20 minutes ago ago

    it's already available on Cursor but not on ChatGPT

  • iSloth 2 hours ago ago

    Wow, they are sunsetting all models after the launch of GPT-5 - Bold statement.

  • defraudbah 44 minutes ago ago

    i love how the guys are pretending to be listening everyone's speach for the first time, like they don't know how it works.. marketing is weird

  • asadm 2 hours ago ago

    74.9 on SWE-bench verified

    88.0 on Aider Polygot

    not bad i guess

  • v5v3 an hour ago ago

    The live stream just has Altman interviewing a lady who was diagnosed 3 different cancers.

    GPT4 gave her better response than doctors she said.

    • sethops1 42 minutes ago ago

      WebMD will diagnose me with cancer 3 times a day.

  • primaprashant an hour ago ago

    looks like 4 new features for API

    - reasoning_effort parameter supports minimal value now in addition to existing low, medium, and high

    - new verbosity parameter with possible values of low, medium (default), and high

    - unlike hidden thinking tokens, user-visible preamble messages for tool calls are available

    - tool calls possible with plaintext instead of JSON

  • theanonymousone an hour ago ago

    Are they reducing the price of older models now?

  • pphysch 15 minutes ago ago

    Seems like we're in the endgame for OpenAI and hence the AI bubble. Nothing mind-blowing, just incremental changes.

    They've topped and are looking to cash out:

    https://www.reuters.com/business/openai-eyes-500-billion-val...

  • lbrito an hour ago ago

    All of their prompts start with "Please ...".

    Gotta be polite with our future overlords!

    • metalliqaz 22 minutes ago ago

      I think that's one small part of an intentional strategy to make the LLMs seem more like human intelligence. They burn a lot of money, they need to keep alive the myth of just-around-the-corner AGI in order to keep that funding going.

  • Sajarin an hour ago ago

    What did Ilya see? (or rather what could he no longer bear to see?)

    > Academics distorting graphs to make their benchmarks appear more impressive

    > lavish 1.5 million dollar bonuses for everyone at the company

    > Releasing an open source model that doesn't even use latent multi head attention in a open source AI world led by Chinese labs

    > Constantly overhyping models as scary and dangerous to buy time to lobby against competitors and delay product launches

    > Failing to match that hype as AGI is not yet here

  • sharkjacobs an hour ago ago

    The upgrade from GPT3.5 to GPT4 was like going from a Razr to an iPhone, just a staggering leap forward. Everything since then has been successive iPhone releases (complete with the big product release announcements and front page HN post). A sequence of largely underwhelming and basically unimpressive incremental releases.

    Also, when you step back and look at a few of those incremental improvements together, they're actually pretty significant.

    But it's hard not to roll your eyes each time they trot out a list of meaningless benchmarks and promise that "it hallucinates even less than before" again

  • TrackerFF an hour ago ago

    Someone at OpenAI screwed up the SWE-bench graph. o3 and GPT-4o bars are same height, but with different values.

    • BoorishBears an hour ago ago

      The graph is more screwed up than that: the split bar is also split in a nonsensical way

      It feels a bit intentional

  • crowcroft an hour ago ago

    I'm drowning in benchmarks and results at this point. Just show me what it can do.

  • selectAll 2 hours ago ago

    VS Code copilot demo

    https://youtu.be/wqc85X2rpEY

  • FabHK 2 hours ago ago

    "With ChatGPT-5, the response feels less like AI and more like you're chatting with your high-IQ and -EQ friend."

    Is that a good thing?

    • schmorptron an hour ago ago

      To them, and for optimizing for user engagement, it probably is... The future product direction for these is looking more, not less syncophatntic

  • andybak 2 hours ago ago

    Not live for me in the UK. "Try it in ChatGPT" takes me to the normal page and there's no v5 listed in the dropdown.

    • SilasX 19 minutes ago ago

      I just got the same thing in the US too. (Am on the $20/month subscription.)

  • freedomben 2 hours ago ago

    Important note from the livestream: "With GPT-5, we're actually deprecating all of our previous models"

  • aszantu an hour ago ago

    I liked gpt3 no need to fix something that's not broken :(

  • CjHuber 2 hours ago ago

    It says out now in chatgpt. Did anyone yet hit the usage limits to report back how many messages are possible?

    • croemer 2 hours ago ago

      I don't see it in my model picker yet.

      • cancerboi 25 minutes ago ago

        yeah I don't get it - I am pro subscriber and I can not pick it...

  • ath3nd an hour ago ago

    Wow, what a breakthrough! A couple of % of benchmark improvements at a couple of % decrease of price per token!

    With a couple of more trillions from investors in his company, Sama can really keep launching successful, groundbreaking and innovative products like:

    - Study Mode (a pre-prompt that you can craft yourself): https://openai.com/index/chatgpt-study-mode/

    - Office Suite (because nothing screams AGI like an office suite: https://www.computerworld.com/article/4021949/openai-goes-fo...)

    - ChatGPT5 (ChatGPT4 with tweaks) https://openai.com/gpt-5/

    I can almost smell the singularity behind the corner, just a couple of trillion more! Please investors!

  • mikewarot 35 minutes ago ago

    I've you're into woo-woo physics, GPT-5 seems to have a good handle on things.. here's a chat I just had with it.[1]

    [1] https://chatgpt.com/s/t_6894f13b58788191ada3fe9567c66ed5

  • todotask2 an hour ago ago

    Tried out, I still get 9.11 is larger than 9.9.

  • ulrischa an hour ago ago

    Not yet available in Germany

  • yRetsyM 2 hours ago ago

    Still only 256k input tokens/context. Do they not see utility in larger context?

  • up6w6 an hour ago ago

    crazy how they only show benchmark results against their own models

  • swimmeric an hour ago ago

    Still struggling to find the SWE-benchmark of GPT-5, just found out they are launching it soon, and it’s surprisingly free.

  • DebtDeflation 2 hours ago ago

    Is this a new model or a router front-ending existing models?

  • croemer 2 hours ago ago

    On tau-2 bench, for airline, GPT5 is worse than o3.

  • h_tbob an hour ago ago

    When's it coming to github copilot?

  • AtNightWeCode an hour ago ago

    So OpenAI added withpersona mandatory for API access. Thank you and goodbye.

  • jwpapi 35 minutes ago ago

    So it sucks?

  • croemer 2 hours ago ago

    The Polyglot aider improvement over o3 is imperceptible, not great.

    • qsort 2 hours ago ago

      SWE-Bench is also not stellar. "It's important to remember" that:

      - they are only evals

      - this is mostly positioned as a general consumer product, they might have better stuff for us nerds in hand.

  • submeta 35 minutes ago ago

    I don’t see GPT-5 in the model selection. What am I missing?

  • bstsb 2 hours ago ago

    i don't really see any new features as such. everything is just "improved upon" based on existing parts of gpt-4o or o3-mini

  • daveguy 40 minutes ago ago

    I would love to see how this performs on ARC-AGI 2, zero-shot, private eval. I hope we get an update from Chollet and team regarding performance.

  • anthk 41 minutes ago ago

    386-486-Pentium. At first we got FDIV and F00F.

    Something similar with this might happen, an underlying curse hidden inside an apparenting ground-breaking desigb.

  • mhh__ 2 hours ago ago

    it's good that they've been working on gpt-5's abilities to eulogi\e us for when it kills us.

    • antoni4040 an hour ago ago

      I laughed more than I should have. On an unrelated note, I personally welcome our AI overlords...

  • quantumwoke an hour ago ago

    This health segment is completely wild. Seeing Sam fully co-sign the replacement of medical advice with ChatGPT in such a direct manner would have been unheard of two years ago. Waiting for GPT-6 to include a segment on replacing management consultants.

    • swader999 an hour ago ago

      GPT 9 still won't be able to get through the insurance dance though, maybe ten will.

  • sjapkee an hour ago ago

    Based on benchmarks it's a flop. Not unexpected tho after oss

  • cityzen 2 hours ago ago

    Ed Zitron’s head has probably exploded…

  • Ameo 2 hours ago ago

    $10 per million output tokens, wow

  • alvis 2 hours ago ago

    Where is GPT5 pro???

  • byyoung3 2 hours ago ago

    hahahahahahahahhahhahha it's a marginal improvement.

  • vagab0nd an hour ago ago

    This is the inverse of the "$2000/mo tier", and I'm kind of disappointed TBH.

  • techpineapple an hour ago ago

    Interesting readign the progress.openai.com sample prompts https://progress.openai.com/?prompt=6

    I would say GPT-5 reads more scientific and structured, but GPT-4 more human and even useful. For the prompt:

    Is uncooked meat actually unsafe to eat? How likely is someone to get food poisoning if the meat isn’t cooked?

    GPT-4 makes the assumption you might want to know safe food temperatures, and GPT-5 doesn't. Really hard to say which is "better", but GPT-4 seems more useful to every day folks, but maybe GPT-5 for the scientific community?

    Then interesting that on ChatGPT vibe check website "Dan's Mom" is the only one who says it's a game changer.

  • wonderfuly an hour ago ago
  • AtNightWeCode 2 hours ago ago

    They vibe coded the update.

    "Your organization must be verified to use the model `gpt-5`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate."

    And every way I click through this I end in an infinity loop on the site...

  • mhh__ 2 hours ago ago

    My conspiracy theory is that the introductory footage of Sam in this and the Jony Ive video is AI generated

  • jdoe1337halo 2 hours ago ago

    Lmao GPT-5 is still riddled with em dashes. At least we can still identify AI generated text slop for now

    • andybak 2 hours ago ago

      You will be foiled by a regex

      • efilife an hour ago ago

        Can you explain?

      • jdoe1337halo 2 hours ago ago

        How so

        • andybak an hour ago ago

          I thought I was making a fairly obvious jokey riposte?

          "If you're claiming that em dashes are your method for detecting if text is AI generated then anyone who bothers to do a search/replace on the output will get past you."

    • nluken 32 minutes ago ago

      The em dash isn't just the present state of AI slop— it's the future!

    • 1attice 2 hours ago ago

      lol every word processor since the nineties has automatically expanded em dashes, and some of us typography nerds ­manually type em dashes with the compose key, because it's the correct character, and two hyphens does not an em dash make

    • tiahura 2 hours ago ago

      The em dashes are there because they're used extensively by professional writers.

  • seydor an hour ago ago

    I mean , it's OK, but i expected literally the Death Star

  • b800h 2 hours ago ago

    This livestream is atrocious

  • HardCodedBias 2 hours ago ago

    Bravo.

    1) So impressed at their product focus 2) Great product launch video. Fearlessly demonstrating live. Impressive. 3) Real time humor by the presenters makes for a great "live" experience

    Huge kudos to OAI. So many great features (better coding, routing, some parts of 4.5, etc) but the real strength is the product focus as opposed to the "research updates" from other labs.

    Huge Kudos!!

    Keep on shipping OAI!

  • yahoozoo 2 hours ago ago

    The benchmarks in the stream appears to show that GPT-5 performs WORSE than other models unless you enable thinking?

    • AnimalMuppet 2 hours ago ago

      Um... if I want an intelligence, when would I not want it to think?

      • yahoozoo 2 hours ago ago

        I mean, I don’t disagree. Why even bother with a non-thinking mode?

  • maltsev 2 hours ago ago

    gpt-5 is now #1 at LMArena: https://lmarena.ai/leaderboard/text

  • bamboozled 2 hours ago ago

    AGI

  • Philpax 2 hours ago ago

    Congratulations on winning the race to post the announcement :)

    • frenchie4111 2 hours ago ago

      Did you win the race to be the first comment?

  • mikewarot 2 hours ago ago

    The introduction said to try the following prompt

      Describe me based on all our chats — make it catchy!
    
    It was flattering as all get out, but fairly accurate (IMHO)

      Mike Warot: The Tinkerer of Tomorrow
    
      A hardware hacker with a poet’s soul, Mike blends old-school radio wisdom with cutting-edge curiosity. Whether he's decoding atomic clocks, reinventing FPGA logic with BitGrid, or pondering the electromagnetic vector potential, he’s always deep in the guts of how things really work. Part philosopher, part engineer, Mike asks the questions others overlook — and then builds the answers from scratch. He’s open source in spirit, Pascal in practice, and eternally tuned to the weird frequencies where innovation lives.
    
    I've repaired atomic clocks, not decoded them. I am intrigued by the electromagnetic vector potential, and scalar waves (one of the reasons I really, really want a SQUID for some experiments).
    • torginus 2 hours ago ago

      I genuinely believe you are a kickass person, but that text is full of LLM-isms. Listing things, contrasting or reinforcing prallel sentence structures, it even has the dreaded em-dash.

      Here's a suprprisingly enlightening (at least to me) video on how to spot LLM writing:

      https://www.youtube.com/watch?v=9Ch4a6ffPZY

    • jdoe1337halo 2 hours ago ago

      You like it because it sucks you off?

      • sleazebreeze an hour ago ago

        Some very accomplished and smart people are also huge narcissists. They read something like that AI drivel and go "yeah thats me to a T" without a hint of irony.

    • j_timberlake an hour ago ago

      I like how this sounds exactly like a selectable videogame hero:

      Undeterred by even the most dangerous and threatening of obstacles, Teemo scouts the world with boundless enthusiasm and a cheerful spirit. A yordle with an unwavering sense of morality, he takes pride in following the Bandle Scout's Code, sometimes with such eagerness that he is unaware of the broader consequences of his actions. Though some say the existence of the Scouts is questionable, one thing is for certain: Teemo's conviction is nothing to be trifled with.