Nvidia unveiled Vera, its first CPU purpose-built for AI agent workloads, at GTC Taipei on June 1, per the company. The chip ships with 88 custom Arm cores on Nvidia's Olympus core design, 1.2 TB/s of memory bandwidth, and full Armv9.2 compatibility. Vera is in full production, with general availability targeted for autumn. Early adopters include OpenAI, Anthropic, SpaceX, ByteDance, CoreWeave, and Oracle Cloud Infrastructure.
The detail most coverage skipped is what "built for agents" actually means at the silicon level. Traditional server CPUs optimize for branchy code with deep cache hierarchies; the workloads they were designed around are database queries, microservices, and the orchestration glue between them. Agent workloads look different. They are bandwidth-bound rather than compute-bound, because the host CPU's job is to stream tokens between the GPU's HBM and the rest of the system fast enough to keep the accelerator from sitting idle. The 1.2 TB/s of memory bandwidth and the wide 88-core layout are tuned for that shape, per Nvidia's technical brief, and the 1.8x throughput improvement Nvidia claims over x86 on back-end agent pipelines is the number to watch.
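To make "bandwidth-bound" concrete, here is a minimal roofline-style sketch in Python. Only the 1.2 TB/s and 88-core figures come from the article. The function names are hypothetical, and the peak compute rate is a caller-supplied placeholder because no peak FLOP/s figure for Vera is given here. The rule it applies is the standard one: a workload is bandwidth-bound when it does fewer operations per byte moved than the machine's ratio of peak compute to memory bandwidth.

```python
# Figures quoted in the article.
VERA_MEM_BW_BYTES_PER_S = 1.2e12  # 1.2 TB/s of memory bandwidth
VERA_CORES = 88                   # custom Arm cores


def per_core_bandwidth(total_bw=VERA_MEM_BW_BYTES_PER_S, cores=VERA_CORES):
    """Average memory bandwidth available per core, in bytes per second."""
    return total_bw / cores


def ridge_point(peak_flops, mem_bw=VERA_MEM_BW_BYTES_PER_S):
    """Arithmetic intensity (FLOPs per byte) at which a roofline model
    switches from bandwidth-bound to compute-bound.

    peak_flops is an ASSUMPTION supplied by the caller, not a Vera spec.
    """
    return peak_flops / mem_bw


def is_bandwidth_bound(flops_per_byte, peak_flops,
                       mem_bw=VERA_MEM_BW_BYTES_PER_S):
    """True when the workload's intensity falls below the ridge point."""
    return flops_per_byte < ridge_point(peak_flops, mem_bw)


def min_stream_time(num_bytes, mem_bw=VERA_MEM_BW_BYTES_PER_S):
    """Lower bound, in seconds, on moving num_bytes at peak bandwidth."""
    return num_bytes / mem_bw


if __name__ == "__main__":
    print(f"per-core bandwidth: {per_core_bandwidth() / 1e9:.1f} GB/s")
    # Hypothetical 6 TFLOP/s peak, used only to illustrate the test.
    print("0.5 FLOP/byte is bandwidth-bound:", is_bandwidth_bound(0.5, 6e12))
```

Dividing the quoted bandwidth evenly across 88 cores gives roughly 13.6 GB/s per core. Under this model, a token-shuffling loop that does little arithmetic per byte sits well below any plausible ridge point, which is why memory bandwidth rather than core speed would set its throughput.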
The OEM lineup is the second tell. Dell, HPE, Lenovo, and Supermicro will build standalone Vera systems, and the same OEMs handle the Vera-Rubin superchip that pairs one Vera CPU with two Rubin GPUs, per Tom's Hardware. That puts Nvidia in direct competition with Intel's Xeon and AMD's Epyc lines for the AI server socket, not only the accelerator slot. The cloud-deployment list (AWS, Google Cloud, Microsoft Azure, and OCI confirmed for Vera-Rubin instances in the second half of the year) closes the loop.
Bottom Line
Vera is Nvidia's first credible attack on the data-center CPU socket. If your AI workload is bandwidth-bound and Arm-tolerant, the host CPU is no longer a default Intel or AMD pick.