AMD Ryzen AI Max 400 Gorgon Halo pushes unified memory to 192GB with Zen 5 and RDNA 3.5
At a glance:
- AMD's Ryzen AI Max 400 'Gorgon Halo' refresh supports up to 192GB of unified memory; AMD claims it is the first x86 client SoC capable of running 300B+ parameter LLMs.
- The lineup includes three Pro SKUs — Max+ Pro 495, Max Pro 490, and Max Pro 485 — all built on Zen 5, RDNA 3.5, and XDNA 2 NPU architectures.
- The Ryzen AI Halo box starts at $3,999 with a Ryzen AI Max+ 395, 128GB unified memory, and 2TB storage; pre-orders open in June, with partner systems arriving in Q3 2026.
A minor refresh with a major memory bump
AMD is rolling out a modest but strategically important refresh of its large SoC lineup: the successor to the Ryzen AI Max 300 'Strix Halo' chips carries a new codename, Gorgon Halo. The Ryzen AI Max 400 series reuses the same core building blocks (Zen 5 CPU cores, RDNA 3.5 GPU cores, and an XDNA 2 NPU) but adds a single headline feature: support for up to 192GB of unified memory. That's a meaningful jump over the Strix Halo generation, which topped out at 128GB, and AMD is positioning the update around AI workloads that simply can't fit inside smaller memory pools.
The three chips in the Pro lineup are:
- Ryzen AI Max+ Pro 495: 16 cores / 32 threads, boost to 5.2 GHz, 80 MB cache, 55 NPU TOPS, Radeon 8065S (40 CUs), up to 192 GB unified memory (160 GB usable as VRAM)
- Ryzen AI Max Pro 490: 12 cores / 24 threads, boost to 5 GHz, 76 MB cache, 50 NPU TOPS, Radeon 8050S (32 CUs), up to 192 GB unified memory (160 GB usable as VRAM)
- Ryzen AI Max Pro 485: 8 cores / 16 threads, boost to 5 GHz, 40 MB cache, 50 NPU TOPS, Radeon 8050S (32 CUs), up to 192 GB unified memory (160 GB usable as VRAM)
The flagship 495 gets a 100 MHz clock bump over the outgoing 395, pushing its boost frequency to 5.2 GHz. Otherwise, the spec sheets are remarkably similar if you simply swap the "4" for a "3" in the model number.
Why 192GB unified memory matters
AMD says that up to 160GB of the unified memory pool can function as VRAM, with 32GB reserved for the system. That configuration makes the Ryzen AI Max 400 chips, the company claims, the first x86 client processors capable of running a 300B+ parameter large language model. It's a category of one for now — Intel doesn't produce a large SoC in this class, and Apple's offerings rely on ARM ISA rather than x86.
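To see why the 160GB figure matters for a 300B-parameter model, here is a rough back-of-the-envelope sketch. It counts model weights only and ignores the KV cache, activations, and runtime overhead. The article does not say which precision AMD assumed, so the bit widths below (and the function names) are illustrative assumptions, not AMD's methodology.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes.

    Assumption: every parameter is stored at the same bit width, with no
    extra space for quantization scales, KV cache, or activations.
    """
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float = 160.0) -> bool:
    """Check the weights against the 160GB VRAM ceiling quoted by AMD."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    # Hypothetical precisions: 16-bit, 8-bit, and 4-bit weights.
    for bits in (16, 8, 4):
        size = weight_footprint_gb(300, bits)
        verdict = "fits" if fits_in_vram(300, bits) else "does not fit"
        print(f"300B parameters at {bits}-bit: {size:.0f} GB, {verdict} in 160 GB")
```

Under these assumptions, only the 4-bit case (about 150GB of weights) squeezes under the 160GB ceiling, which suggests the 300B+ claim depends on aggressive quantization and leaves little headroom for long context windows.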
The memory advantage isn't just about raw capacity. Unified memory means the CPU, GPU, and NPU all share the same address space, eliminating costly data copies between discrete pools. For inference workloads running massive models locally, that architecture saves both latency and power. It also aligns the Ryzen AI Max 400 more closely with the memory-hungry demands of agentic AI frameworks, where token throughput and context window size directly drive hardware requirements.
The Ryzen AI Halo box and its positioning
The only confirmed system shipping with the new chips so far is the Ryzen AI Halo box, which can be configured with the Ryzen AI Max+ Pro 495. Pre-orders open in June, with a starting price of $3,999. That entry configuration, however, uses a Ryzen AI Max+ 395 (the existing Strix Halo chip) with 128GB of unified memory and 2TB of storage. AMD says additional configurations will be detailed closer to launch.
The Ryzen AI Halo measures just 5.9 x 5.9 x 1.7 inches, and its specs read like a mini workstation:
- Wi-Fi 7, Bluetooth 5.4, 10Gbps Ethernet
- HDMI 2.1b display output
- Three USB-C ports plus a fourth USB-C for power delivery
- Rated TDP up to 120W
AMD is comparing the Halo box to Nvidia's DGX Spark, which retails for $4,700 with 128GB unified memory, Nvidia's GB10 chip, and 4TB of storage. On Linux, AMD claims the Ryzen AI Halo delivers up to 14% higher tokens per second than the DGX Spark when running the GLM 4.7 Flash 30B model, and up to 4% higher tokens per second on Qwen 3.6 35B. The Halo also supports Windows, whereas the DGX Spark is Linux-only. AMD also benchmarks against the Mac Mini M4 Pro, showing roughly 4X scaling in AI workloads — though it concedes that a Mac Studio is a more appropriate point of comparison for the Halo's class of compute.
The 'token economy' pitch and cost of cloud vs. on-prem
AMD is leaning hard into the "token economy" narrative, arguing that running inference on-premises can save significant money compared to cloud API calls. The company estimates that a single Ryzen AI Halo box can save up to $750 per month over cloud compute, breaking even on cost after six months at a usage rate of six million tokens per day. Real-world anecdotes suggest heavy users can spend far more than that: OpenClaw developer Peter Steinberger recently reported racking up $1.3 million in OpenAI API usage over 30 days across a three-person team working on an agentic AI framework.
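AMD's break-even figure can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted above ($3,999 starting price, $750 per month, six million tokens per day). It assumes a 30-day month, ignores electricity and maintenance, and treats the full $750 as the avoided cloud bill; those simplifications and the function names are ours, not AMD's.

```python
import math


def break_even_months(hardware_cost: float, monthly_savings: float) -> float:
    """Months of savings needed to cover the up-front hardware cost."""
    return hardware_cost / monthly_savings


def implied_price_per_million_tokens(monthly_savings: float,
                                     tokens_per_day: float,
                                     days_per_month: int = 30) -> float:
    """Cloud price per million tokens implied by the quoted savings.

    Assumption: the whole monthly saving equals the avoided cloud spend.
    """
    monthly_tokens_millions = tokens_per_day * days_per_month / 1_000_000
    return monthly_savings / monthly_tokens_millions


if __name__ == "__main__":
    months = break_even_months(3_999, 750)
    price = implied_price_per_million_tokens(750, 6_000_000)
    print(f"Break-even: {months:.2f} months (paid off during month {math.ceil(months)})")
    print(f"Implied cloud price: ${price:.2f} per million tokens")
```

With these inputs the box pays for itself a little over five months in, consistent with AMD's "six months" figure, and the implied cloud rate works out to roughly $4 per million tokens. Teams paying less than that per token, or using fewer tokens per day, would take longer to break even.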
Whether that cost-saving narrative holds depends on workload consistency. Teams running sporadic or highly variable inference jobs may not see the same break-even timeline, but for organizations with sustained, high-volume token consumption, a local box with 128–192GB of unified memory could be a compelling alternative to perpetual cloud spend.
What's still unclear
Several details remain pending. AMD hasn't shared an exact pre-order date for June, nor has it named OEM partners for Gorgon Halo systems. An AMD spokesperson said that "several OEM partners have expressed excitement" and that "systems will be announced from our partners starting in Q3 2026," but no specific manufacturers or models were disclosed.
The consumer fate of Gorgon Halo is also uncertain. All three announced SKUs carry the "Pro" tag, which AMD says indicates enterprise-grade security, manageability, and reliability features. When asked about consumer variants, AMD didn't commit either way. The Strix Halo generation was a niche product, available in only a handful of machines — the Framework Desktop, ROG Flow Z13, and GMKtec EVO-X2 — so a similarly conservative rollout for Gorgon Halo is possible.
Notably, the 490 and 485 ship with 32 CUs, fewer than the 40 CUs used on the refreshed Strix Halo 385 and 390. It's possible a further refresh could bring 40 CUs to the lower-end Gorgon Halo SKUs, but AMD hasn't indicated that yet.
What to watch next
With pre-orders opening in June and partner systems expected in Q3 2026, the next few months should clarify pricing for the full Halo box lineup and reveal which OEMs plan to ship Gorgon Halo machines. Benchmarks from independent reviewers will also be critical — AMD's own token-per-second claims, while promising, need real-world validation across diverse model sizes and inference patterns. If the 300B+ parameter LLM capability holds up, the Ryzen AI Max 400 could carve out a genuine niche for on-premises, x86-based AI inference at a price point below Nvidia's DGX Spark.
Prepared by the editorial stack from public data and external sources.