Whenever I see press on these new 'rack scale' systems, the first thing I think is something along the lines of: "man, I hope the BIOS and OSes and whatnot supporting these racks are relatively robust and documented/open sourced enough so that 40 years from now, when you can buy an entire rack system for $500, some kid in a garage will be able to boot and run code on these".
What's the power hookup to just boot one rack? I'd imagine that's more than you get anywhere in residential areas for a single house.
Hopefully in 40 years we'll all be running miniature cold fusion power or something, so we can avoid burning the planet to the ground.
Depends on the residence. I have personally seen a large house in Brooklyn with dual 200 amp 120/208 volt three-phase services (two meters, each feeding a panel). I have seen someone set up an old SGI rack-scale Origin 3000 system in their garage. I think they even had an electrician upgrade their service to accommodate it.
170 kW
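For a rough sense of scale, the two figures in this subthread can be compared directly. This is only a minimal sketch: it assumes a balanced three-phase load at unity power factor and ignores continuous-load derating and cooling, none of which are stated above.

```python
import math


def three_phase_kva(line_to_line_volts: float, amps: float) -> float:
    """Apparent power (kVA) of a balanced three-phase service: sqrt(3) * V_LL * I."""
    return math.sqrt(3) * line_to_line_volts * amps / 1000.0


# Figures quoted in this subthread.
SERVICE_VOLTS = 208   # line-to-line voltage of a 120/208 V service
SERVICE_AMPS = 200    # rating of each service
NUM_SERVICES = 2      # the "dual" services
RACK_KW = 170.0       # rack power figure quoted above

per_service_kva = three_phase_kva(SERVICE_VOLTS, SERVICE_AMPS)
total_kva = NUM_SERVICES * per_service_kva

# Assumption: unity power factor, so kVA and kW are directly comparable.
print(f"per service:   {per_service_kva:.1f} kVA")
print(f"both services: {total_kva:.1f} kVA")
print(f"covers a {RACK_KW:.0f} kW rack: {total_kva >= RACK_KW}")
```

Under those assumptions each service works out to roughly 72 kVA, so even both together fall short of 170 kW.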
100% this. But don't forget the garden hose running full blast so you can cool it! It's not impossible to get up and running for fun for an hour, but this isn't a run 24/7 kinda setup any more than getting an old mainframe running in one's garage is practical.
The firmware is UEFI and Vera should have good upstream support. The GPU driver is proprietary though, so you'll have to dig up the last supported version from 2036.
The blog post has more technical details and fewer quotes from customers: https://developer.nvidia.com/blog/inside-the-nvidia-rubin-pl...
That link was somewhat clearer, thanks.
As a software guy who follows chip evolution more at a macro level like: new design + process node enabling better cores/tiles/units/clocks + new architecture enabling better caches, busses, I/O == better IPC, bandwidth, latency and throughput at given budget (cost, watts, heat, space) - I've yet to find anything which gives a sense of Rubin's likely lift vs the prior generation that's grounded in macro-but-concrete specs (such as cores, tiles, units, clocks, caches, busses, IPC, bandwidth, latency, throughput).
Edit: I found something a bit closer after scrolling down on a sub-link from the page you linked (https://developer.nvidia.com/blog/inside-the-nvidia-rubin-pl...).
For dev info we'll need to wait for GTC 2026 March 16–19. CES is just hype.
They're intentionally drip-feeding information over time until the actual release.
If their new platform reduces inference token cost by 10x, does that play well or not well with the recently updated GPU depreciation schedules companies have been playing with to reduce projected cost outlays?
For context, my understanding is that companies have recently moved their expected GPU depreciation cycles from 3 years to as long as 6, which has a huge impact on projected expenditures.
I wonder what the step was for the Blackwell platform from the previous one. Is this step slower, which might indicate that the slower depreciation cycle is warranted, or faster?
No way you throw away Blackwell GPUs after just 3 years. Google runs 8 year old TPUs still at 100% utilization. Why would you depreciate them in just 3 years?
The conversation around GPU lifecycles seems to be conflating the various shear rates within the data center. My layman understanding is that the old 3 year replacement cycle had more to do with some component, not necessarily the memory or the processor, going wrong for half of the units by 3 years, at which point GPUs were cheap enough and advancing fast enough that it was more cost effective to upgrade than to fix. However, that calculus changes completely when the GPU and the HBM are orders of magnitude more expensive than the rest of the system. I suspect that we will see repairs being done on the various brittle bits of the system, and the actual core expensive components will continue to operate much longer than 3 years.
Companies are playing games with GPU depreciation.
Unsure why you were downvoted; I'm curious to understand this comment. Playing finance and accounting games I presume you mean.
Yes, they are depreciating GPUs over longer-than-usual time periods, like 6 years.
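To make the accounting effect concrete, here is a minimal sketch of straight-line depreciation. It assumes zero salvage value, and the fleet cost is a made-up round number for illustration, not a figure from any company's filings.

```python
def straight_line_annual_expense(purchase_cost: float, useful_life_years: int) -> float:
    """Annual depreciation expense under the straight-line method, zero salvage value."""
    if useful_life_years <= 0:
        raise ValueError("useful life must be a positive number of years")
    return purchase_cost / useful_life_years


# Hypothetical purchase cost, chosen only to make the arithmetic easy to read.
HYPOTHETICAL_FLEET_COST = 1_000_000_000.0

for years in (3, 6):
    expense = straight_line_annual_expense(HYPOTHETICAL_FLEET_COST, years)
    print(f"{years}-year schedule: ${expense:,.0f} expensed per year")
```

Doubling the assumed useful life halves the expense recognized each year. The cash spent is the same; only the timing on the income statement changes.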
But the tokens required for quality generation may increase just as much very soon.
Yeah, definitely a good point. Going to be interesting to see how it plays out. I definitely do not have the expertise to answer the question.
Extreme Codesign Across NVIDIA Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU and Spectrum-6 Ethernet Switch Slashes Training Time and Inference Token Generation Cost
Technical details available here https://developer.nvidia.com/blog/inside-the-nvidia-rubin-pl...
Their own CPU, too - 88 ARM cores.
So it's an all-NVidia solution - CPU, interconnects, AI GPUs.
Afaik MediaTek helped them with the CPU part.
... it took a couple searches to figure out that "extreme codesign" wasn't actually code-signing, but "co-design" like "stuff that was designed to work together"
Even << "co-design" like "stuff that was designed to work together" >> sounds strange to me. Typically when I read about co-design, it is stuff that was designed together, by more than 1 party.
Me too. Good style says to avoid creating words with dashes - it’s Un-American. But clarity matters more than rules.
Is there any American style guide that insists hyphens be avoided even when a closed compound would cause ambiguity? I follow Chicago, but I imagine other style guides also already emphasise clarity.
Wouldn't "code sign" be two words in English? And "code signing" rather than "code sign"?
Mostly yes, and I prefer it that way, but it does get smashed into a single word sometimes. "co-design" I've mostly only seen hyphenated, though I don't see it often enough or in broad enough contexts to really claim anything about the frequency in a general sense.
Maybe it's caused by `codesign` tools? Like `codesign --extreme` which probably requires two signers to sign one thing?
same I was so confused
does anyone know how well this 5x petaflop improvement translates to real world performance?
I know that memory bandwidth tends to be a big limiting factor, but I'm trying to understand how this factors into its overall perf, compared to Blackwell.
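One common way to reason about this question is the roofline model, where attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity. The sketch below is illustrative only: the 5x compute gain is the figure from the comment above, while the 2x bandwidth gain is a hypothetical placeholder and not a published spec. Everything is normalized so the baseline has peak compute 1.0 and bandwidth 1.0.

```python
def attainable_throughput(peak_compute: float, memory_bandwidth: float,
                          arithmetic_intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_compute, memory_bandwidth * arithmetic_intensity)


def speedup(arithmetic_intensity: float, compute_gain: float,
            bandwidth_gain: float) -> float:
    """Speedup over a baseline normalized to peak compute 1.0 and bandwidth 1.0."""
    baseline = attainable_throughput(1.0, 1.0, arithmetic_intensity)
    upgraded = attainable_throughput(compute_gain, bandwidth_gain, arithmetic_intensity)
    return upgraded / baseline


COMPUTE_GAIN = 5.0                  # the "5x" figure from the comment above
HYPOTHETICAL_BANDWIDTH_GAIN = 2.0   # placeholder assumption, not a real spec

# Intensity is in normalized units: 1.0 is the baseline's ridge point,
# below which a kernel is memory-bound on the baseline hardware.
for intensity in (0.5, 2.0, 10.0):
    gain = speedup(intensity, COMPUTE_GAIN, HYPOTHETICAL_BANDWIDTH_GAIN)
    print(f"intensity {intensity:>4}: {gain:.1f}x")
```

With these made-up ratios, a memory-bound kernel only sees the bandwidth gain, while a heavily compute-bound one sees the full compute gain, and workloads in between land somewhere in the middle.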
Rebuild all the data centers!
lol haven't even started building half the Blackwell datacenters yet
Elon's emoji-filled blurb for that press release is the most cringe thing I've seen this week.
I find all the blurbs weird. Do they usually include those? If not, why now? It doesn't look professional.
I think it is interesting. Is there any other company in a position today that could put together endorsement quotes from such high ranking people across tech?
Also: Tim Cook / Apple is noticeably absent.
That's because of financial links. They are so intertwined propping up the same bubble they are absolutely going to share quotes instantly. FWIW, I just skimmed through and the TL;DR sounds to me like "Look at the cool kid, we play together, we are cool too!" with obviously no information, nothing meaningful or insightful, just boring marketing BS.
> They are so intertwined propping up the same bubble they are absolutely going to share quotes instantly.
Reading this line, I had a funny image form of some NVidia PR newbie reflexively reaching out to Lisa Su for a supporting quote and Lisa actually considering it for a few seconds. The AI bubble really has reached a level of "We must all hang together or we'll surely hang separately".
Why is that interesting?
It could be an indicator that Apple is not so leveraged up on NVIDIA that it needs to provide a quote. Cook did make a special one-of-a-kind product for the current POTUS, so he is nothing if not pragmatic.
Quotes from known names in a boring corporate press release are absolutely standard. It gives journalists a hook to build a story. “Elon Musk says new Nvidia tech is…”
Because standing out gets attention?
I wonder what the significance of a green heart is, in Elon-world.
Riveting.