Nvidia DGX Spark

(nvidia.com)

133 points | by janandonly 3 days ago

133 comments

  • qwertox 20 minutes ago

    According to Wendell from Level1Techs, the now-launched Jetson Thor uses a Linux kernel built by Nvidia on top of Ubuntu 20.04 [0]. So I assume getting upgrades will have the same feel as Chinese SBCs from Radxa or cheap Android devices.

    I wonder if this also applies to this DGX Spark. I hope not.

    [0] https://www.youtube.com/watch?v=cgnKUUcCKcs&t=669s

  • hereme888 9 hours ago

    FP4-sparse (TFLOPS) | Price | $/TF4s

    5090: 3352 | 1999 | 0.60

    Thor: 2070 | 3499 | 1.69

    Spark: 1000 | 3999 | 4.00

    ____________

    FP8-dense (TFLOPS) | Price | $/TF8d (4090s have no FP4)

    4090 : 661 | 1599 | 2.42

    4090 Laptop: 343 | vary | -

    ____________

    Geekbench 6 (compute score) | Price | $/100k

    4090: 317800 | 1599 | 503

    5090: 387800 | 1999 | 516

    M4 Max: 180700 | 1999 | 1106

    M3 Ultra: 259700 | 3999 | 1540

    ____________

    Apple NPU TOPS (not GPU-comparable)

    M4 Max: 38

    M3 Ultra: 36
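
    For anyone extending the table, a quick sketch of the $/TFLOP math (prices are the MSRPs listed above):

        # $/TFLOP from the FP4-sparse figures above
        specs = {  # name: (FP4-sparse TFLOPS, MSRP in USD)
            "5090":  (3352, 1999),
            "Thor":  (2070, 3499),
            "Spark": (1000, 3999),
        }
        for name, (tflops, price) in specs.items():
            print(f"{name}: ${price / tflops:.2f}/TFLOP")
        # 5090: $0.60, Thor: $1.69, Spark: $4.00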

    • nabla9 2 hours ago

      Memory is the bottleneck: it limits the size of the models you can run, and it's what you're really paying for.

        Spark: 128 GB LPDDR5x, unified system memory
        5090 :  32 GB GDDR7
      
      Model sizes (parameter size)

        Spark: 200B 
        5090 :  12B (raw)
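
      As a rough sanity check on those sizes (a sketch; assumes weights dominate and leaves ~20% headroom for KV cache/activations):

          # max parameter count (in billions) from memory capacity
          def max_params_b(mem_gb, bytes_per_param, headroom=0.8):
              return mem_gb * headroom / bytes_per_param

          print(max_params_b(128, 0.5))  # Spark at FP4  -> ~205B
          print(max_params_b(32, 2.0))   # 5090 at FP16  -> ~13B
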
      • artemisart 18 minutes ago

        That's very true, and it's what segments the market, but I don't understand why you're saying the 5090 supports only a 12B model when it can go up to 50-60B (a bit less than 64B, to leave room for inference), since it supports FP4 as well.

    • Y_Y 8 hours ago

      You are doing god's work.

      In fact you're also doing the work Nvidia should have done when they put together their (imho) ridiculously imprecise spec sheet.

    • aurareturn 9 hours ago

      It's not good value when you put it like that. It doesn't have a lot of compute and bandwidth. What it has is the ability to run DGX software for CUDA devs I guess. Not a great inference machine either.

    • canucker2016 7 hours ago

      5090: 32GB RAM (Newegg & Amazon lowest prices seem to be ~$300 more than MSRP)

      4090: 24GB RAM

      Thor & Spark: 128GB RAM (probably at least 96GB usable by the GPU if they behave similarly to the AMD Strix Halo APU)

      • oliwary 2 hours ago

        True... It would be very interesting to compare various open models by token-generation speed on these platforms. Presumably, starting at some size, the larger accessible RAM wins out over raw speed with low VRAM? Although I suppose things like MoE and FP precision would also matter.

    • boulos 2 hours ago

      As long as you're going to add FP8 dense, you could do the same for the parts mentioned in the FP4 section. Divide by two going sparse => dense, and by another two going FP4 => FP8.

      That gives you 250 TFLOPS of FP8 for the Spark.
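
      Spelled out (same halvings, as a quick sketch):

          fp4_sparse = 1000            # Spark headline figure, TFLOPS
          fp4_dense  = fp4_sparse / 2  # drop 2:4 structured sparsity
          fp8_dense  = fp4_dense / 2   # halve again going FP4 -> FP8
          print(fp8_dense)             # 250.0 TFLOPS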

    • eurekin an hour ago

      So how many generation tokens/s can we expect for a dense model?

      I assume we can go up to ~120B parameters using FP8?
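
      (Back-of-envelope for the bandwidth-bound case - a sketch that assumes every weight is read once per generated token and ignores the KV cache:)

          bw_gbs   = 273   # Spark LPDDR5x bandwidth, GB/s
          model_gb = 120   # 120B dense params at FP8 ~= 120 GB of weights
          print(bw_gbs / model_gb)  # ~2.3 tok/s theoretical ceiling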

    • bjackman 3 hours ago

      Note you cannot actually get a 5090 for $1999; that's just the RRP. I believe they actually cost ~$4k.

      • IshKebab 2 hours ago

        I just googled it and the first result was one in stock for £2200. That's including tax. I assume $1999 is excluding tax. Without tax and converted to dollars it's $2470.

        From other less reliable sources like eBay they are more like £1800.

    • conradev 8 hours ago

      Where does an RTX Pro 6000 Blackwell fall in this? I feel like that's the next step up in performance (and about the same price as two Sparks).

      • qingcharles 7 hours ago

        I thought the 6000 was slightly lower throughput than the 5090, but obviously it has a shitload more RAM.

        • skhameneh 6 hours ago

          It has more throughput, but way less value, and there's still no NVLink on the 6000. Something like ~4x the price, ~20% more performance, 3x the VRAM.

          There are two models that go by "6000"; the RTX Pro 6000 (Blackwell) is the one that's currently relevant.

          • QQ00 2 hours ago

            The RTX Pro 6000 (Blackwell) does not have NVLink? If so, what the fuck, Mr. Leather Jacket.

            • kouteiheika an hour ago

              Of course it doesn't; artificial segmentation because they really want you to buy their even more expensive datacenter GPUs for AI training.

    • scosman 8 hours ago

      How does the process management comparison work for GPU vs full systems?

    • nodesocket 7 hours ago

      Once the updated Mac Studio with an M4/M5 Ultra comes out, it's pretty much going to make the DGX irrelevant, right?

      • thomasskis 4 hours ago

        I run 4 Mac Studio Ultras at work (they're pricey when maxed out) for local-first AI dev services, but there are a few things that make me want to switch to the Spark. Networking is the biggest one: the Macs have Thunderbolt and Ethernet, but if I run distributed inference with EXO over Thunderbolt, the drop in tokens/second is massive. These Sparks get RDMA and can stack nicely. The other big one is access to CUDA; MLX has come a long way, but being able to have CUDA and GPU access in containers would simplify the stack so nicely. If I had a USB-C/Thunderbolt backplane it might compare, but scaling with the Spark is likely a lot more straightforward.

        I call the stack with Mac Studios “MacAIver” because it feels like a duct tape solution, but the Spark equivalent would likely be more elegant.

      • wmf 7 hours ago

        Ultras are pretty expensive.

        • nodesocket 7 hours ago

          I mean, the Spark is $3,999 and the current M3 Ultra Studio (28-core CPU, 60-core GPU) is the same price. I would expect the refreshed Studio to stay around the same price.

          • KingOfCoders 6 hours ago

            In Germany the 96GB version is 5000 EUR and the 256GB version is 7000 EUR (no 128GB available as far as I can see).

            • stefanfisk an hour ago

              Are you comparing prices with or without taxes? US usually prices without and EU with.

            • spwa4 an hour ago

              At that point it's far better to fly to the US, buy one, and fly back. Hell, have a nice week in a hotel and bring two.

      • FirmwareBurner 4 hours ago

        If that were true, why aren't Mac sales banned in China instead of Nvidia GPUs?

        • saagarjha 3 hours ago

          Those are high-end GPUs that aren't comparable.

        • nevi-me 3 hours ago

          Because Tim bribed Trump with a golden calf. Or, more seriously, it's easier to ban a component and its manufacturer than broader systems.

          • nodesocket an hour ago

            Funny to hear people who haven't listened to any of the policy speeches pretend like they know what's going on.

          • actionfromafar 2 hours ago

            Not unprecedented, though; the PlayStation 2 had export restrictions.

            But it was a different time. Most policies had some connection to the subject at hand.

            Policies today are all about brand Trump and brand MAGA.

  • syntaxing 9 hours ago

    While at a completely different price point, I have a Jetson Orin Nano. Some people forget the kernels are more or less set in stone for products like these. I could rebuild my own JetPack kernel, but it's not that straightforward to update something like CUDA or any other module. Unless you're a business whose product relies on this hardware, I find it hard to buy this for consumer applications.

    • coredog64 8 hours ago

      Came in here to say the same thing. I've bought 3 Nvidia dev boards and never again, as you quickly get left behind. You're then stuck compiling everything from scratch.

    • larodi 6 hours ago

      My experience with the Jetson Nano was that its Ubuntu had to be debloated first (with a 3rd-party script) before we could get their NN library to run the image recognition designated for this device.

      These seem to be highly experimental boards, even though they are super powerful for their form factor.

  • cherioo 9 hours ago

    The mainstream options seem to be

    Ryzen AI Max 395+, ~120 tops (fp8?), 128GB RAM, $1999

    Nvidia DGX Spark, ~1000 tops fp4, 128GB RAM, $3999

    Mac Studio max spec, ~120 tflops (fp16?), 512GB RAM, 3x bandwidth, $9499

    The DGX Spark appears to potentially offer the most tokens per second, but it's less useful/valuable as an everyday PC.

    • lhl 3 hours ago

      RDNA3 CUs do not have FP8 support, and their INT8 runs at the same speed as FP16, so Strix Halo's max theoretical is basically 60 TFLOPS no matter how you slice it (well, it has double-rate INT4, but I'm unclear on how generally useful that is):

          512 ops/clock/CU * 40 CU * 2.9e9 clock / 1e12 = 59.392 FP16 TFLOPS
      
      Note, even with all my latest manual compilation whistles and the latest TheRock ROCm builds, the best I've gotten mamf-finder up to is about 35 TFLOPS, which is still not amazing efficiency (most Nvidia cards are at 70-80%), although a huge improvement over the single-digit TFLOPS you might get ootb.

      If you're not training, your inference speed will largely be limited by available memory bandwidth, so the Spark token generation will be about the same as the 395.

      On general utility, I will say that the 16 Zen5 cores are impressive: they beat my 24C EPYC 9274F in single- and multithreaded workloads by about 25%.

    • UncleOxidant 6 hours ago

      > Ryzen AI Max 395+, ~120 tops (fp8?), 128GB RAM, $1999

      Just got my Framework PC last week. It's easy to set up to run LLMs locally - you have to use Fedora 42, though, because it has the latest drivers. It was super easy to get qwen3-coder-30b (8-bit quant) running in LM Studio at 36 tok/sec.

      • alias_neo an hour ago

        I'm pretty new to this, so if I wanted to benchmark my current hardware and compare it to your results, what would be the best way to do that?

        I'm looking at going for a Framework Desktop and would like to know what kind of performance gain I'd get over my current hardware, which I so far only have a "feel" for from running Ollama and OpenWebUI - no hard numbers.

      • hasperdi 4 hours ago

        Hi, could you share whether you get decent coding performance (quality-wise) with this setup? I.e., is it good enough to replace, say, Claude Code?

      • pixelpoet 3 hours ago

        Very encouraging result, I'm waiting super anxiously for mine! How much memory did you allocate for the iGPU?

    • jauntywundrkind 9 hours ago

      The Nvidia Spark is $4,000. Or will be, supposedly, whenever it comes out.

      Also notably, Strix Halo and DGX Spark are both at ~275GB/s memory bandwidth. Not always, but in many machine-learning cases it feels like that's going to be the limiting factor.

    • rjzzleep 5 hours ago

      Maybe the real value of the DGX Spark is working on Switch 2 emulation: ARM + Nvidia GPU. Start with Switch 2 emulation on this machine and then optimize for others. (Yeah, I know, kind of an expensive toy.)

      • pta2002 4 hours ago

        I think you can get something a lot cheaper if that's all you want, e.g. something in the Jetson Orin line. That's also more similar to the Switch, since it's a Tegra CPU.

    • aurareturn 9 hours ago

        Mac Studio max spec, ~120 tflops (fp16?), 384GB RAM, 3x bandwidth, $9499
      
      512GB.

      The DGX has 273GB/s bandwidth, so it wouldn't offer the most tokens/s.

      • rz2k 9 hours ago

        Perhaps they are referring to the default GPU allocation, which is 75% of the unified memory, but it is trivial to increase it.

        • jauntywundrkind 9 hours ago

          The GPU memory allocation refers to how capacity is allotted, not bandwidth. Sounds like the same 256-bit/quad-channel 8000MHz LPDDR5X you can get today with Strix Halo.

          • rz2k 8 hours ago

            384GB is 75% of 512GB. The M3 Ultra's bandwidth is over 800GB/s, though potentially less in practice.

            Using an M3 Ultra, I think the performance is pretty remarkable for inference, and concerns about prompt processing being slow in particular are greatly exaggerated.

            Maybe the advantage of the DGX Spark will be for training or fine-tuning.

      • echelon 9 hours ago

        tokens/s/$ then.

  • nightski 10 hours ago

    Am I missing something, or does the comparably priced (technically cheaper) Jetson Thor have double the PFLOPS of the Spark with the same memory capacity and similar bandwidth?

    • Apes 9 hours ago

      My understanding is the DGX Spark is optimized for training / fine-tuning and the Jetson Thor is optimized for running inference.

      Architecturally, the DGX Spark has a far better cache setup to feed the GPU, and offers NVLink support.

      • AlotOfReading 9 hours ago

        There's a lot of segmentation going on in the Blackwell generation from what I'm told.

    • modeless 10 hours ago

      Also, Thor is actually getting sent out to robotics companies already. Has anyone outside Nvidia gotten a DGX Spark yet?

  • seanalltogether an hour ago

    Do we need new terms to distinguish "unified memory" where the CPU and GPU are still isolated from each other and memory must be allocated to one or the other, from "unified memory" where the CPU and GPU can both access the same addresses? Which systems use which?

  • KingOfCoders 6 hours ago

    I think it depends on your model size:

       Fits into 32GB: 5090
       Fits into 64GB - 96GB: Mac Studio
       Fits into 128GB: for now the 395+ on $/token/s;
         the Mac Studio if you don't care about $
         but don't have unlimited money for an Hxxx

    This could be great for models that fit in 128GB where you want the best $/token/s (if it is faster than a 395+).

    • timc3 4 hours ago

      The 395, although it can be supplied with 128GB, can't use all of that for VRAM (unless something has changed in the last couple of weeks).

      • lhl 3 hours ago

        In Linux, you can set it as high as you want, although you should probably have a swap drive and still be prepared for your system to die if you set it to 128GiB. Here's how you'd set it to 120GiB:

            # This is deprecated, but can still be referenced
            # (gttsize is in MiB: 122880 MiB = 120 GiB)
            options amdgpu gttsize=122880

            # This specifies GTT by # of 4KB pages:
            #   31457280 * 4KB / 1024 / 1024 = 120 GiB
            options ttm pages_limit=31457280
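
        (These are kernel module options, so they'd normally go in a file under /etc/modprobe.d/ - e.g. a hypothetical /etc/modprobe.d/gtt.conf - followed by an initramfs rebuild (update-initramfs -u on Ubuntu) and a reboot.)
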
      • KingOfCoders 2 hours ago

        From YouTube it seems models up to ~105GB on disk work, yes.

  • querez 3 hours ago

    "developers can prototype, fine-tune, and inference [AI models]"...

    Shouldn't it be "infer"?

    • myrmidon 2 hours ago

      It should be "run inference on" in my opinion, and would be best shortened IMO to just "prototype, fine-tune, and run".

      I'd argue that "inference" has taken on a somewhat distinct new meaning in an LLM context (loosely: running actual tokens through the model), and deviating from that base term to the generic verb form would make the sentence less clear to me.

    • killerstorm 2 hours ago

      No. It's quite common for technical slang to deviate from general vocabulary.

      Cf. "compute" is a verb for normal people, but for techies it is also "hardware resources used to compute things".

    • nharada 3 hours ago

      I don’t think either of those are right…

    • globular-toast 3 hours ago

      Perhaps "infer from"? I was also taken aback by how they just decided to make "inference" a verb, though. A decent writer would have rewritten the sentence to make it work, similar to how a software implementation sometimes just doesn't work out. But apparently that's too much to ask of Nvidia marketing.

      Funnily enough, things like this show that a human probably was involved in the writing; I doubt an LLM would have produced that. I've often thought about how future generations are going to signal that they are human, and maybe the way will be human language changing much more rapidly than it has done, maybe even mid-sentence.

  • apples_oranges 3 hours ago

    Question from a random consumer: Why not more RAM?

    • Strom an hour ago

      So they can sell you the next model which upgrades the RAM capacity.

      • nsteel an hour ago

        Do larger LPDDR5 chips exist yet? Isn't 32GB the max for a 32-bit package?

  • garyfirestorm 8 hours ago

    What did I miss? This was revealed in May; I don't see anything new at that link since then.

  • ComplexSystems 9 hours ago

    The RAM bandwidth is so low on this that you can barely train, run inference, or do anything on it. I think the only use case they have in mind is fine-tuning pretrained models.

    • wmf 9 hours ago

      It's the same as the Strix Halo and M4 Max that people are going gaga over, so either everyone is wrong or it's fine.

      • gardnr 8 hours ago

        Memory Bandwidth:

        Nvidia DGX: 273 GB/s

        M4 Max: (up to) 546 GB/s

        M3 Ultra: 819 GB/s

        RTX 5090: ~1.8 TB/s

        RTX PRO 6000 Blackwell: ~1.8 TB/s

      • 7thpower 9 hours ago

        The other ones are not framed as an “AI Supercomputer on your desk”, but instead are framed as powerful computers that can also handle AI workloads.

      • aurareturn 9 hours ago

        The M4 Max has more than double the bandwidth.

        Strix Halo has the same and I agree it’s overrated.

        • Rohansi 8 hours ago

          I would expect/hope that DGX would be able to make better use of its bandwidth than the M4 Max. Will need to wait and see benchmarks.

          • woooooo 4 hours ago

            Matrix-vector multiplication for the feed-forward layers accounts for most of the bandwidth, as I understand things. There's not really a way to do it "better"; it's just a bunch of memory-bound dot products.

            (Posting this comment in hopes of being corrected and learning something).
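
            To put numbers on it (a sketch, assuming batch size 1 and FP16 weights):

                # matrix-vector multiply during decode, batch=1:
                # each 2-byte weight is read once for one multiply-add
                flops_per_param = 2   # one multiply + one add
                bytes_per_param = 2   # FP16 weight
                print(flops_per_param / bytes_per_param)  # 1.0 FLOP/byte
                # vs. machine balance, e.g. Spark: ~250e12 FP8 FLOPS / 273e9 B/s
                print(250e12 / 273e9)  # ~916 FLOP/byte -> heavily memory-bound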

            • Rohansi 3 hours ago

              The problem is different parts of the SoC (CPU, GPU, NPU) may not actually be able to consume all of the bandwidth available to the system as a whole. This is why you'd need to benchmark - different chips may be able to feed the cores better than others.

          • aurareturn 4 hours ago

            It should. It has tensor cores, which should drastically improve prompt processing. It should also be highly optimized for most AI apps.

  • isusmelj 4 hours ago

    Is there any news about power consumption? I didn't even see a TDP mentioned.

    • lmeyerov 15 minutes ago

      One of the first things I looked at too...

  • monster_truck 5 hours ago

    Paper launch. The people I know there whom I've asked about it haven't even seen one yet.

    • fh973 5 hours ago

      Ordered one in spring. Delivery time was pushed from July to September. Apparently they had a bug in the HDMI output.

      • wtallis 4 hours ago

        That's eerily similar to what happened to Qualcomm's failed Snapdragon X Elite dev kit. That one eventually shipped in small quantities with a Type-C to HDMI dongle in the box to make up for the built-in HDMI port going missing. Then Qualcomm cancelled the whole project and refunded everyone, including people who had already received their hardware.

        • _zoltan_ 4 hours ago

          Because they realized it sucked.

  • numpad0 5 hours ago

    Does anyone know why the official pages don't mention FP16 performance (250 TFLOPS)?

  • maz1b 7 hours ago

    Dunno, doesn't seem that good to me. Granted, I recognize the pace of advancement, but FWIW, at the present time... yeah.

    I'd rather just get an M3 Ultra. Have an M2 Ultra on the desk, and an M3 Ultra sitting on the desk waiting to be opened. Might need to sell it and shell out the cash for the max ram option. Pricey, but seems worthwhile.

  • dirtyhand 9 hours ago

    I was considering getting an RTX 5090 to run inference on some LLMs, but now I'm wondering if it's worth paying an extra $2K for this option instead.

    • apitman 8 hours ago

      If you want to run small models fast get the 5090. If you want to run large models slow get the Spark. If you want to run small models slow get a used MI50. If you want to run large models fast get a lot more money.

    • Apes 9 hours ago

      RTX 5090 is about as good as it gets for home use. Its inference speeds are extremely fast.

      The limiting factor is going to be the VRAM on the 5090, but nvidia intentionally makes trying to break the 32GB barrier extremely painful - they want companies to buy their $20,000 GPUs to run inference for larger models.

    • skhameneh 8 hours ago

      RTX 5090 for running smaller models.

      Then the RTX Pro 6000 for running somewhat larger models (96GB VRAM, but only ~15-20% more perf than the 5090).

      Some suggest Apple Silicon only for running larger models on a budget because of the unified memory, but the performance won't compare.

    • BoorishBears 9 hours ago

      No. These are practically useless for AI.

      Their prompt processing speeds are absolutely abysmal: if you're trying to tinker from time to time, a GPU like a 5090 or renting GPUs is a much better option.

      If you're just trying to prep for impending mainstream AI applications, few will be targeting this form factor: it's both too strong compared to mainstream hardware, and way too weak compared to dedicated AI-focused accelerators.

      -

      I'll admit I'm taking a less nuanced take than some would prefer, but I'm also trying to be direct: this is not ever going to be a better option than a 5090.

      • aurareturn 9 hours ago

          Their prompt processing speeds are absolutely abysmal
        
        They are not. This is Blackwell with Tensor cores. Bandwidth is the problem here.
        • BoorishBears 9 hours ago

          They're abysmal compared to anything dedicated at any reasonable batch size, because of both bandwidth and compute. Not sure why you're wording this like it disagrees with what I said.

          I've run inference workloads on a GH200, which is an entire H100 attached to an ARM processor, and the moment offloading is involved, performance tanks to Mac Mini-like levels - and that is similarly mostly a toy when it comes to AI.

          • aurareturn 9 hours ago

            Again, prompt processing isn't the major problem here. It's bandwidth. 273GB/s of bandwidth (maybe ~210 in the real world) limits the tokens per second well before prompt processing does.

            Not entirely sure how your ARM statement matters here. This is unified memory.

  • wewewedxfgdf 9 hours ago

    It'll be stunted in some way - Nvidia always holds back some crucial feature that you need, to push you up to the next-highest-priced product line.

    • agnokapathetic 9 hours ago

      It uses LPDDR5x instead of the datacenter variant's HBM3e.

  • maddynator 7 hours ago

    So, a Raspberry Pi with a GPU?

    • _zoltan_ 3 hours ago

      The GH/GB line is also based around Arm. The CPU here doesn't matter.

  • sorrythanks 8 hours ago

    NVIDIA DGX Spark - 4TB

    $3,999

  • MBCook 9 hours ago

    I'm not in this space, so I don't know what's normal, but I guess I'm a little surprised to see only 10 gig Ethernet for high-speed connectivity.

    Yeah, it's miles better than WiFi. But if there was something I'd think would benefit from Thunderbolt, this would've been it.

    The ability to transfer large models or datasets that way just seems like it would be much faster and a real win for some customers.

    • coder543 9 hours ago

      This thing has a ConnectX-7, which gives it 2 x 200 Gbps networking. The 10 gig port is far from the fastest network interface on the Spark.

      • MBCook 7 hours ago

        But can you hook that up to a normal PC?

        • coder543 7 hours ago

          You were complaining about speed. Yes, a PC can have the same ports, and then you get much faster speeds than Thunderbolt can provide.

          Why would you ever want a DGX Spark to talk to a “normal PC” at 40+ Gbps speeds anyways? The normal PC has nothing that interesting to share with it.

          But, yes, the DGX Spark does have four USB4 ports which support 40Gbps each, the same as Thunderbolt 4. I still don’t see any use case for connecting one of those to a normal PC.

        • renewiltord 5 hours ago

          Yes. Just buy the Mellanox card. We had a bunch of ConnectX 5 hooked up through SFP. Needs cooling but fast.

        • _zoltan_ 3 hours ago

          Why criticize something in the first place when you clearly haven't even looked at the product?

    • x2tyfi 9 hours ago

      You're almost always going to bottleneck on your home internet or upstream ISP rather than this local interface. That being said, you aren't going to be waiting too long either way, depending on download speed. DeepSeek R1 is 671GB; multiply by 8 to get into bits: 5368Gb. At a full 10Gbps (which, again, you probably won't get): 5368Gb / 10Gbps = 537 seconds, and 537s / 60 = 8.95 minutes. Call it 10m with overhead.

      • _zoltan_ 3 hours ago

        I have 25Gbps symmetric Ethernet from my ISP (not XG-PON). They are talking about rolling out 100Gbps.

      • rightisleft 6 hours ago

        I think I interviewed you the other day and you didn’t get the job…

  • eadwu 8 hours ago

    Most people are missing the point. LLMs are not the be-all and end-all of AI.

    Even if you were to say memory bandwidth was the problem, there is no consumer-grade GPU that can run any SoTA LLM; no matter what, you'd have to settle for a more mediocre model.

    Outside of LLMs, 256GB/s is not as much of an issue, and many people have dealt with less bandwidth for real-world use cases.

    • gardnr 7 hours ago

      What other use cases would use 128GB VRAM but not require higher throughput to run at acceptable speeds?

      • AuryGlenz 4 hours ago

        Fine-tuning text-to-image/video models, perhaps?

        For the newest models, unless you quantize the crap out of them, even with a 5090 you're going to be swapping blocks, which slows things down anyway. At least you'd be able to train on them at full precision with a decent batch size.

        That said, I can’t imagine there’s enough of a market there to make it worth it.

  • DrNosferatu 7 hours ago

    Now we need a three-way benchmark between this DGX Spark, a maxed-out AMD Strix Halo, and the 512GB Mac.

  • DoctorOetker 10 hours ago

    Suppose 1/3 of memory is used to host a teacher network and 2/3 is used to host a student network; how long would knowledge distillation typically take?

  • lvl155 10 hours ago

    Is this worth getting vs AMD?

    • zxexz 10 hours ago

      What are you trying to do?

  • Y_Y 9 hours ago

    It's a bit disingenuous to claim 1 PFLOPS without making clear that's for FP4 (with "structured sparsity"?).

    • csunoser 9 hours ago

      It does say `Experience up to 1 petaFLOP of AI performance at FP4 precision with the NVIDIA Grace Blackwell architecture.` in the features section.

      But yeah, this should have been further up.

    • godelski 9 hours ago

      If you scroll down a little and see the chip icon, where it says "NVIDIA GB10 Superchip " it also says "Experience up to 1 petaFLOP of AI performance at FP4 precision with the NVIDIA Grace Blackwell architecture."

      Further down, in the exploded view it says "Blackwell GPU 1PetaFLOP FP4 AI Compute"

      Then further down in the spec chart they get less specific again with "Tensor Performance^1 1 PFLOP" and "^1" says "1 Theoretical FP4 TOPS using the sparsity feature."

      Also, if you click "Reserve Now" the second line below that redundant "Reserve Now" button says "1 PFLOPS of FP4 AI performance"

      I mean, I'll give you that they could be clearer and that it's not cool to just hype FP4 performance, but they aren't exactly hiding the context like they did during GTC. I wouldn't call this "disingenuous".

      • Y_Y 8 hours ago

        Even if that "sparsity feature" requires that two out of every four adjacent values in your array be zeros, and performance halves if you're not doing this?

        I think lots of children are going to be very disappointed running their BLAS benchmarks on Christmas morning and seeing barely tens of teraflops.

        (For reference, see how optimistic the numbers still are for the H200 when you use realistic datatypes:

        https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200... )

  • ls612 9 hours ago

    Is this the $3500 one?

    • gardnr 7 hours ago

      There are cheaper ASUS and MSI versions "coming soon" with the same chip and less storage / memory.

      • canucker2016 7 hours ago

        on the reserve now page for USA, there's:

          ASUS Ascent GX10 - 1TB $2,999
        
          MSI EdgeXpert MS-C931 - 4TB $3,999
        
        the 1TB/4TB seems to be the size of the included NVMe SSD.

        the reserve now page also lists

          NVIDIA DGX Spark Bundle
          2 NVIDIA DGX Spark Units - 4TB with Connecting Cable $8,049
        
      The DGX Spark specs list an NVIDIA ConnectX-7 SmartNIC, rated at 200GbE, for connecting to another DGX Spark - giving roughly double the memory for models.
    • x2tyfi 9 hours ago

      That was their Digits box.

  • senectus1 9 hours ago

    Power consumption : TBD

    ?? this seems more than a little disingenuous...

  • oracel 10 hours ago

    Can it run Crysis?