• @[email protected]
    20
    edit-2
    3 days ago

    The Nvidia GPUs in data centers are separate products from gaming GPUs (they’re even made on different process nodes, with different memory chips). The sole exception is the 4090/5090, which do see some use in data center form, but at low volumes. And this problem is pretty much nonexistent for AMD.

    …No, it’s just straight-up price gouging and anti-competitiveness. It’s just Nvidia being Nvidia, AMD being anticompetitive too (their CEOs are like cousins twice removed), and Intel unfortunately not getting traction, even though Battlemage is excellent.

    For local AI, the only things that get sucked up are 3060s, 3090s, and, for the rich/desperate, 4090s/5090s; anything else is a waste of money with too little VRAM. And this is a pretty small niche.
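
    For a rough sense of where that VRAM cutoff lands, here’s a back-of-envelope sketch (the quantization level, overhead factor, and model sizes are illustrative assumptions, not measurements):

    ```python
    # Back-of-envelope VRAM check for running a local LLM.
    # Assumption: weights dominate, plus roughly 15% overhead for KV cache
    # and activations at modest context lengths.

    def model_vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 0.15) -> float:
        """Rough VRAM needed to hold a dense model's weights plus overhead."""
        weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
        return weight_gb * (1 + overhead)

    cards = {"RTX 3060 12GB": 12, "RTX 3090 24GB": 24, "RTX 5090 32GB": 32}

    for name, params in [("7B", 7), ("13B", 13), ("34B", 34), ("70B", 70)]:
        need = model_vram_gb(params, bits_per_weight=4)  # 4-bit quantization
        fits = [card for card, vram in cards.items() if vram >= need] or ["nothing consumer"]
        print(f"{name} @ 4-bit needs ~{need:.0f} GB -> fits: {', '.join(fits)}")
    ```

    The point being that anything under ~24 GB caps you at fairly small models, which is why demand concentrates on the 3090-class cards.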

    • RejZoR
      40
      3 days ago

      Chip fabbing allocations are limited, and whatever capacity the AI datacenter chips take up, desktop GPUs don’t get made with. What’s left over gets sold as desktop chips for workstation AI use, like the RTX 5090 and even the RX 7900 XTX, because they have more memory. Meanwhile they still sell 8GB cards to gamers when that hasn’t been enough for a while. The whole situation is just absurd.

      • @[email protected]
        6
        3 days ago

        Fabbing is limited to keep prices high. Just like OPEC turning down oil extraction when the price gets too low.

      • @[email protected]
        3
        3 days ago

        Unfortunately, no one is buying a 7900 XTX for AI, and mostly not a 5090 either. The 5090 didn’t even work until recently and still doesn’t work with many projects, doubly so for the 7900 XTX.

        The fab capacity thing is an issue, but not as much as you’d think since the process nodes are different.

        Again, I am trying to emphasize: a lot of this is just Nvidia being greedy as shit. They are skimping on VRAM/buses and gouging gamers because they can.

    • @[email protected]
      8
      3 days ago

      I’m pretty sure the fabs making the chips for datacenter cards could be making more consumer-grade cards instead, but those are less profitable. And since fab capacity isn’t infinite, the price of datacenter cards is still going to affect consumer ones.
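
      Napkin math on why that allocation is so lopsided per wafer (the die sizes and prices below are made-up round numbers, purely to illustrate the opportunity cost):

      ```python
      import math

      # Revenue per 300mm wafer: a hypothetical datacenter die vs a hypothetical
      # consumer die. Die areas and prices are illustrative assumptions.
      WAFER_DIAMETER_MM = 300

      def dies_per_wafer(die_area_mm2: float) -> int:
          """Standard approximation for usable dies on a round wafer (ignores yield)."""
          d = WAFER_DIAMETER_MM
          return int(math.pi * (d / 2) ** 2 / die_area_mm2
                     - math.pi * d / math.sqrt(2 * die_area_mm2))

      chips = {
          "datacenter accelerator": (800, 25_000),  # (die area in mm^2, $ per chip)
          "consumer GPU":           (380, 1_000),
      }

      for name, (area, price) in chips.items():
          n = dies_per_wafer(area)
          print(f"{name}: ~{n} dies/wafer -> ~${n * price:,} per wafer")
      ```

      Even with fake numbers the gap is around an order of magnitude per wafer, so every consumer die is an expensive choice for them.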

      • @[email protected]
        3
        3 days ago

        Heh, especially for this generation I suppose. Even the Arc B580 is on TSMC and overpriced/OOS everywhere.

        It’s kinda their own stupid fault too. They could’ve used Samsung or Intel, and a bigger, slower die for each SKU, but didn’t.

        • @[email protected]
          1
          2 days ago

          TSMC is the only proven fab at this point. Samsung is lagging, and its current emerging tech isn’t meeting expectations. Intel might be back in the game with their next gen, but it’s still to be proven, and they aren’t scaled up to production levels yet.

          And the differences between the fabs mean that designing a chip to be made at more than one would be almost like designing entirely different chips for each fab. Not only are the gates themselves different dimensions (and require a different layout), they also have different performance and power profiles. So even if two chips are logically the same, and you could trade area efficiency for a more consistent higher-level layout (think two buildings with the same footprint but different room layouts), they’d still need different setups for things like buffers and repeaters. And even if they did design the same logical chip for both fabs, they’d end up being different products in the end.

          And with TSMC leading not just performance but also yields, the lower end chips might not even be cheaper to produce.

          Also, each fab requires NDAs and such, and it could even be the case that signing one NDA disqualifies you from signing another, so they might need entirely different teams to do the NDA-covered work rather than being able to share people across similar work.

          Not that I disagree with your sentiment overall; it’s just a gamble. Like, what if one company goes with Samsung for one SKU and their competition goes with TSMC for the competing SKU, and they end up with a whole bunch of inventory that no one wants because the performance gap is bigger than the price gap, making waiting for stock the no-brainer choice?

          But if Intel or Samsung do catch up to TSMC in at least some of the metrics, that could change.

          • @[email protected]
            2
            1 day ago

            Yeah you are correct, I was venting lol.

            Another factor is that the fab-choice decisions were made way before these GPUs launched, when everything you said (TSMC’s lead/reliability, in particular) rang even more true. Maybe Samsung or Intel could offer steep discounts to offset the lower performance (which Nvidia/AMD could translate into bigger dies), but that’s quite a fantasy, I’m sure…

            It all just sucks now.

    • @[email protected]
      10
      3 days ago

      There are still a limited number of wafers at the fabs. The chips going to datacenters could have been consumer stuff instead. Besides, they (Nvidia, Apple, AMD) are all fabricated at TSMC.

      Local AI benefits from platforms with unified memory that can be expanded. Watch platforms based on AMD’s Ryzen AI Max 300 chip (or whatever they call it) take off. Framework lets you configure a machine with that chip with up to 128 GB of RAM, iirc. I believe that’s the main reason Apple’s memory upgrades cost a ton: so they aren’t a financially viable option for local AI applications.

      • @[email protected]
        1
        edit-2
        3 days ago

        The chips going to datacenters could have been consumer stuff instead.

        This is true, but again, they do use different processes. The B100 (and I think the 5090) is TSMC 4NP, while the other chips use a lesser process. Hopper (the H100) was TSMC 4N, Ada Lovelace (RTX 4000) was TSMC N4. The 3000 series/A100 was straight up split between Samsung and TSMC. The AMD 7000 was a mix of older N5/N6 due to the MCM design.

        Local AI benefits from platforms with unified memory that can be expanded.

        This is tricky because expandable memory is orthogonal to bandwidth and power efficiency. Framework (ostensibly) had to use soldered memory for their Strix Halo box because it’s literally the only way to make the traces good enough: SO-DIMMs are absolutely not fast enough, and even LPCAMM apparently isn’t there yet.
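
        To put rough numbers on why bandwidth is the binding constraint: the usual rule of thumb is that single-stream token generation streams the whole (quantized) model through memory once per token, so bandwidth sets a hard ceiling. A quick sketch (the bandwidth figures are approximate/assumed):

        ```python
        # Rule-of-thumb ceiling: tokens/s ~= memory bandwidth / model size in bytes.
        # Real throughput is lower (attention, KV cache, other overheads).

        def rough_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
            return bandwidth_gb_s / model_gb

        platforms = {
            # approximate memory bandwidth in GB/s (assumed/rounded)
            "dual-channel DDR5 desktop": 90,
            "Strix Halo, soldered LPDDR5X (256-bit)": 256,
            "RTX 3090, GDDR6X": 936,
        }

        MODEL_GB = 18  # e.g. a ~34B model at 4-bit, roughly

        for name, bw in platforms.items():
            print(f"{name}: ceiling ~{rough_tokens_per_sec(bw, MODEL_GB):.0f} tok/s")
        ```

        Which is exactly why socketed DIMMs, with their slower signaling, defeat the purpose no matter how much capacity they add.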

        AMD’s Ryzen AI MAX 300 chip

        Funny thing is, the community is quite lukewarm on the AMD APUs due to poor software support. It works okay… if you’re a Python dev who can spend hours screwing with ROCm to get things fast :/ But it’s quite slow/underutilized if you just run popular frameworks like ollama or the old diffusion ones.
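
        As one concrete example of the “screwing with ROCm” part, the first thing to check is whether a ROCm build of PyTorch even sees the GPU, because a lot of frameworks silently fall back to CPU when it doesn’t (a minimal sanity check, nothing APU-specific):

        ```python
        # Sanity check for a ROCm build of PyTorch on an AMD GPU/APU.
        # ROCm builds reuse the "cuda" device API; torch.version.hip is set
        # instead of torch.version.cuda.
        import torch

        print("torch:", torch.__version__)
        print("HIP/ROCm version:", torch.version.hip)   # None on CUDA/CPU-only builds
        print("GPU visible:", torch.cuda.is_available())

        if torch.cuda.is_available():
            print("device:", torch.cuda.get_device_name(0))
        else:
            print("No GPU visible -> things will quietly run on the CPU.")
        ```

        If that all looks fine and it’s still slow, that’s the “underutilized” part.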

        It’s the main reason why I believe Apple’s memory upgrades cost a ton so that it isn’t a viable option financially for local AI applications.

        Nah, Apple’s been gouging memory way before AI was a thing. It’s their thing, and honestly it kinda backfired because it made them so unaffordable for AI.

        Also, Apple’s stuff is actually… Not great for AI anyway. The M chips have relatively poor software support (no pytorch, MLX is barebones, leaving you stranded with mostly GGML). They don’t have much compute compared to a GPU or even an AMD APU, and the NPU part is useless. Unified memory doesn’t help at all; it’s just that their stuff happens to have a ton of memory hanging off the GPU, which is useful.