There’s still limited wafer capacity at the fabs. The chips going to datacenters could have been consumer stuff instead. Besides, they (Nvidia, Apple, AMD) are all fabricated at TSMC.
Local AI benefits from platforms with unified memory that can be expanded. Watch platforms based on AMD’s Ryzen AI MAX 300 chip (or whatever they call it) take off. Framework lets you config a machine with that chip with 128 GB of RAM, IIRC. It’s the main reason, I believe, that Apple’s memory upgrades cost a ton: so that they aren’t a financially viable option for local AI applications.
> The chips going to datacenters could have been consumer stuff instead.
This is true, but again, they do use different processes. The B100 (and I think the 5090) is TSMC 4NP, while the other chips use lesser processes: Hopper (the H100) was TSMC 4N, Ada Lovelace (RTX 4000) was TSMC N4. The 3000 series/A100 generation was straight up split between Samsung and TSMC, and the AMD RX 7000 series was a mix of older N5/N6 due to its MCM design.
> Local AI benefits from platforms with unified memory that can be expanded.
This is tricky, because expandable memory is orthogonal to bandwidth and power efficiency. Framework reportedly had to use soldered memory for their Strix Halo box because it’s literally the only way to make the traces good enough: SO-DIMMs are absolutely not fast enough, and even LPCAMM apparently isn’t there yet.
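To make the bandwidth point concrete, here’s a back-of-envelope sketch (my own illustration, not from the thread): single-stream LLM decoding has to stream every active weight through the memory bus once per token, so memory bandwidth sets a hard ceiling on tokens per second. The bandwidth figures are nominal spec-sheet numbers; real throughput lands well below the ceiling.

```python
# Rough upper bound on LLM decode speed: each generated token has to
# stream every active weight through the memory bus, so tokens/sec is
# capped by bandwidth / model size. Numbers are illustrative.

def max_tokens_per_sec(bandwidth_gb_s: float, params_billions: float,
                       bytes_per_param: float) -> float:
    """Theoretical ceiling on single-stream decode throughput."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

# A 70B model at 4-bit quantization (~0.5 bytes/param):
for name, bw in [("dual-channel DDR5-5600 (SO-DIMM)", 89.6),
                 ("Strix Halo (256-bit LPDDR5X-8000)", 256.0),
                 ("RTX 4090 (GDDR6X)", 1008.0)]:
    print(f"{name}: ~{max_tokens_per_sec(bw, 70, 0.5):.1f} tok/s ceiling")
```

Run the numbers and the soldered-memory decision explains itself: socketed DDR5 caps out around 2–3 tok/s on a 70B model, while Strix Halo’s wide soldered bus roughly triples that.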
> AMD’s Ryzen AI MAX 300 chip
Funny thing is, the community is quite lukewarm on the AMD APUs due to poor software support. It works okay… if you’re a Python dev who can spend hours screwing with ROCm to get things fast :/ But it’s quite slow/underutilized if you just run popular frameworks like ollama or the older diffusion UIs.
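For flavor, this is roughly what the ROCm fiddling looks like in practice (a hedged sketch: PyTorch’s ROCm builds really do reuse the torch.cuda namespace, and HSA_OVERRIDE_GFX_VERSION is a common community workaround for unsupported APUs, but the right gfx value depends on your specific chip):

```python
# PyTorch's ROCm build reuses the torch.cuda namespace, and unsupported
# APUs often need an HSA override before anything runs on the GPU at
# all. The gfx version below is an example value, not a recommendation
# for any particular APU.
import os

# Common community workaround: pretend to be a supported gfx target.
# Must be set before torch initializes the ROCm runtime, hence before import.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

import torch

if torch.cuda.is_available():  # True on ROCm builds too, confusingly
    dev = torch.device("cuda")
    print("Running on:", torch.cuda.get_device_name(0))
else:
    dev = torch.device("cpu")
    print("ROCm not picked up; falling back to CPU")

x = torch.randn(1024, 1024, device=dev)
print((x @ x).sum().item())
```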
> It’s the main reason, I believe, that Apple’s memory upgrades cost a ton: so that they aren’t a financially viable option for local AI applications.
Nah, Apple has been gouging on memory since way before AI was a thing. It’s their thing, and honestly it kinda backfired, because it made their machines so unaffordable for AI.
Also, Apple’s stuff is actually… not great for AI anyway. The M-chips have relatively poor software support (PyTorch’s MPS backend is incomplete, MLX is barebones, leaving you stranded with GGML/llama.cpp mostly). They don’t have much compute compared to a discrete GPU or even an AMD APU, and the NPU is useless for this. Unified memory itself doesn’t help at all; it’s just that their chips happen to have a ton of memory hanging off the GPU, which is what’s actually useful.
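As a concrete example of the software-support gap (a sketch, not a claim about any particular workload): PyTorch does run on Apple Silicon via the MPS backend these days, but operator coverage is incomplete, so code tends to carry a CPU fallback anyway.

```python
# PyTorch has an MPS backend on Apple Silicon, but coverage is
# incomplete, so a CPU fallback (or the PYTORCH_ENABLE_MPS_FALLBACK=1
# env var, which offloads unsupported ops to CPU) is still part of
# daily life.
import torch

if torch.backends.mps.is_available():
    dev = torch.device("mps")
else:
    dev = torch.device("cpu")

x = torch.randn(1024, 1024, device=dev)
print(dev, (x @ x).sum().item())
```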