On June 12 local time, U.S. chip design company AMD (NASDAQ: AMD) unveiled a series of hardware, software, and solutions at its “2025 Advancing AI” event, including its next-generation GPU lineup: the Instinct MI350 series.
The Instinct MI350 series comprises two GPUs and their corresponding platforms, the MI350X and the MI355X. According to CEO Lisa Su, the MI350 series delivers a fourfold increase in AI compute performance and a 35-fold improvement in inference capability over the previous generation.
Notably, the MI355X offers better cost-effectiveness than NVIDIA's competing GPUs. In AMD's internal testing, an eight-GPU Instinct MI355X platform was compared against an eight-GPU NVIDIA B200 HGX system, measuring text-generation inference throughput on Meta's open-source large language model Llama 3.1 405B. The results showed that AMD's GPUs delivered approximately 40% more tokens per dollar than NVIDIA's B200.
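To make the metric concrete, the sketch below shows how a tokens-per-dollar comparison of this kind can be computed. All throughput and cost figures are hypothetical placeholders chosen for illustration, not AMD's or NVIDIA's published numbers; only the structure of the calculation is the point.

```python
def tokens_per_dollar(throughput_tok_s: float, system_cost_usd: float,
                      amortization_s: float) -> float:
    """Tokens generated per dollar, amortizing the system's cost
    over a fixed usage window (here, its assumed service life)."""
    cost_per_second = system_cost_usd / amortization_s
    return throughput_tok_s / cost_per_second

# Hypothetical eight-GPU systems, amortized over three years of use.
# These numbers are invented for the example.
THREE_YEARS_S = 3 * 365 * 24 * 3600

mi355x = tokens_per_dollar(30_000, 2_000_000, THREE_YEARS_S)
b200 = tokens_per_dollar(28_000, 2_600_000, THREE_YEARS_S)

# Relative tokens-per-dollar advantage of the first system over the second.
advantage = mi355x / b200 - 1
```

With these placeholder inputs, a modest throughput deficit on the pricier system translates into a sizable per-dollar advantage for the cheaper one, which is the shape of the claim AMD is making.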
AMD also previewed its next-generation AI rack system, “Helios,” which will feature the upcoming Instinct MI400 series. This new generation is expected to deliver up to 10 times better performance for inference on mixture-of-experts models compared to the current generation.
Additionally, AMD announced that the Instinct MI350 series has surpassed the company's original five-year target, delivering a 38x improvement in the energy efficiency of AI training and high-performance computing (HPC) nodes between 2020 and 2025. Looking ahead to 2030, AMD aims to go further: a typical AI model that today requires 275 racks to train could be trained on less than a single rack, with total power consumption reduced by 95%.
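As a back-of-the-envelope check on what those 2030 figures imply, the snippet below works through the arithmetic under one simplifying assumption (not stated by AMD): that the time to train the model stays roughly the same, so energy scales with total power.

```python
# Figures from AMD's 2030 example; the equal-training-time assumption is ours.
baseline_racks = 275        # racks a typical AI model needs to train today
target_racks = 1            # fewer than one rack by 2030
power_reduction = 0.95      # total power consumption reduced by 95%

# With equal training time, energy used scales with total power draw:
energy_ratio = 1 - power_reduction            # remaining fraction of energy
energy_improvement = 1 / energy_ratio         # roughly a 20x energy reduction
rack_consolidation = baseline_racks / target_racks  # 275x fewer racks
```

Under this assumption, a 95% power cut corresponds to roughly 20x less energy per training run, alongside a 275x consolidation in physical footprint.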
In recent years, AMD has continuously ramped up its GPU development efforts with NVIDIA as its benchmark. In 2020, AMD launched its first-generation AI accelerator, the Instinct MI100 series, based on the CDNA architecture for HPC and AI. In 2023, it released the MI300 series for AI servers, helping its data center GPU revenue surpass $400 million. In 2024, AMD introduced the Instinct MI325X and claimed it delivered 1.3 times the compute performance of NVIDIA’s then-leading H200.
AMD has also been actively building out its ecosystem. At the 2025 event, it announced the global availability of the AMD Developer Cloud, specifically designed for fast, high-performance AI development. It offers a fully managed cloud environment, developer tools, and scalable infrastructure to help users quickly launch and scale AI projects.
Several major companies joined AMD onstage to show their support. OpenAI CEO Sam Altman appeared in person for a discussion with Lisa Su and revealed that OpenAI is working closely with AMD on AI infrastructure. Meta shared that it has widely deployed AMD’s Instinct MI300X GPUs for inference with its open-source Llama 3 and Llama 4 models. Other companies, including Oracle, Saudi AI company HUMAIN, and semiconductor design firm Astera Labs, also showcased their partnerships with AMD.
During the event, discussions also touched on the impact of AI development on computing demands. While demand from model training persists, the more significant recent shift has been toward inference, driven by increasingly powerful large models and a growing range of emerging AI application scenarios. In addition to cutting-edge models from OpenAI and Google, there has been an explosion of models from Meta, DeepSeek, and others in open-source communities, as well as specialized models in medicine, finance, programming, and scientific research. Lisa Su predicted that in the coming years, tens of thousands, or even millions, of models optimized for specific tasks, industries, or use cases will emerge.
“As AI takes on more complex reasoning tasks, and as agent capabilities improve, the demand for compute will rise significantly—which is a good thing for all of us,” said Su.
She explained that agent-based AI represents an entirely new kind of "user": always online, constantly accessing data, applications, and systems, making independent decisions, and executing tasks autonomously. She forecast that as agent-based AI activity scales up, it will generate a massive wave of traditional compute workloads, equivalent to adding billions of virtual users to global computing infrastructure. This will require tight coordination between GPUs and CPUs in an open ecosystem.