NVIDIA GB10 Grace Blackwell combines an Arm CPU and a Blackwell GPU at 3 nm, with 31 TFLOPS and 600 GB/s of bandwidth
Nvidia used this year's Hot Chips 2025 to put one of its most ambitious developments on the table: the GB10 Grace Blackwell, a compact superchip designed to concentrate a good part of the power of a data center into something that fits in a desktop workstation. The idea is simple but forceful: bring extreme compute capacity to the desktop format, without the need for huge racks or runaway power consumption.
GB10 is a multi-die design in which the CPU part and the GPU part coexist in the same package. The chip is manufactured on TSMC's 3 nm process with 2.5D packaging, a solution that allows performance to scale without power or thermals getting out of hand.
A Grace CPU with MediaTek's stamp
The CPU part comes from MediaTek, which has collaborated with Nvidia on this architecture. In numbers, we are talking about 20 Arm v9.2 cores, divided into two clusters of ten. Each cluster relies on 16 MB of shared L3 cache, for a total of 32 MB, while each core keeps its own private L2 cache.
Memory also plays a key role: the subsystem is built around a 256-bit LPDDR5X-9400 bus capable of managing up to 128 GB of capacity, with a gross bandwidth of around 301 GB/s. This ensures that both CPU and GPU have enough throughput for heavy workloads without falling short on data transfer.
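The quoted figure lines up with simple back-of-the-envelope arithmetic: a 256-bit bus moves 32 bytes per transfer, and at 9,400 MT/s that works out to roughly 301 GB/s. A minimal sketch of that check:

```python
# Back-of-the-envelope check of the LPDDR5X-9400 bandwidth figure quoted above.
bus_width_bits = 256
transfers_per_second = 9_400_000_000  # 9400 MT/s

bytes_per_transfer = bus_width_bits // 8  # 32 bytes moved per transfer
bandwidth_gbps = bytes_per_transfer * transfers_per_second / 1e9

print(f"{bandwidth_gbps:.1f} GB/s")  # → 300.8 GB/s, rounded to ~301 GB/s
```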
As for connectivity, the CPU concentrates the high-speed I/O, while storage and peripherals are routed through PCIe. In fact, the configuration itself includes a PCIe Gen 5 x8 link for the ConnectX-7 network card, enabling multi-node scenarios over a high-capacity network.
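For context on what that link can sustain, its per-direction throughput can be estimated from the PCIe 5.0 specification's 32 GT/s per lane and 128b/130b line encoding (these values come from the PCIe standard, not from the article):

```python
# Rough per-direction throughput of a PCIe Gen 5 x8 link, using spec values.
gt_per_s_per_lane = 32           # PCIe Gen 5 raw rate per lane, in GT/s
lanes = 8
encoding_efficiency = 128 / 130  # 128b/130b line encoding overhead

throughput_gbytes = gt_per_s_per_lane * lanes * encoding_efficiency / 8
print(f"~{throughput_gbytes:.1f} GB/s per direction")  # → ~31.5 GB/s
```

Plenty of headroom, in other words, for a high-capacity network card like the ConnectX-7.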
Blackwell GPU in a compact format
On the graphics side, the die belongs to the Blackwell family, adapted to a low-power, compact-size format. Despite that reduction, the figures are still striking: up to 31 teraflops in FP32 and around 1,000 TOPS when using NVFP4, the reduced-precision format developed by NVIDIA itself for AI workloads.
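One practical effect of a 4-bit format like NVFP4 is memory footprint: weights stored at 4 bits take an eighth of the space of FP32. A rough sketch, using a hypothetical 70-billion-parameter model and ignoring NVFP4's per-block scale factors for simplicity:

```python
# Memory footprint of model weights at FP32 vs a 4-bit format such as NVFP4.
# Simplified: NVFP4's per-block scale factors are ignored, so real footprints
# are slightly larger than the 4-bit figure shown here.
params = 70_000_000_000  # hypothetical 70B-parameter model

fp32_gb = params * 32 / 8 / 1e9  # 4 bytes per weight
fp4_gb = params * 4 / 8 / 1e9    # 0.5 bytes per weight

print(f"FP32: {fp32_gb:.0f} GB, 4-bit: {fp4_gb:.0f} GB")  # → FP32: 280 GB, 4-bit: 35 GB
```

At 4 bits, a model of that size fits comfortably within the chip's 128 GB of memory, which it would not at FP32.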
The GPU also integrates a 24 MB L2 cache, which not only backs graphics computation but can also act as a cache level visible to the CPU. This creates a coherent memory hierarchy between the two dies, reducing the dependence on intermediate copies and gaining efficiency.

The C2C link between CPU and GPU reaches an aggregate bandwidth of about 600 GB/s, which allows communication between the two to be fluid and low-latency. All of this fits within a package that moves around 140 W of TDP, a figure that, while not modest, is remarkable for the level of integration it offers.
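Putting the headline numbers together gives a sense of the balance between compute and data movement: at 31 FP32 TFLOPS against roughly 301 GB/s of memory bandwidth, a workload needs on the order of 100 FP32 operations per byte fetched to stay compute-bound. A quick calculation:

```python
# Ratio of peak FP32 compute to memory bandwidth, i.e. the arithmetic
# intensity a workload needs to be compute-bound rather than memory-bound.
fp32_tflops = 31
mem_bandwidth_gbps = 301

flops_per_byte = fp32_tflops * 1e12 / (mem_bandwidth_gbps * 1e9)
print(f"~{flops_per_byte:.0f} FLOPs per byte")  # → ~103 FLOPs per byte
```

That ratio helps explain the emphasis on large caches and the coherent 600 GB/s C2C link: anything that keeps data on-package avoids the narrower trip to LPDDR5X.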
As if that were not enough, the chip also provides multiple video outputs, combining DisplayPort in Alt Mode and HDMI 2.1a, together with security and virtualization features designed for professional workloads.
DGX Spark, the first stop
Alongside the chip announcement, Nvidia presented the DGX Spark workstation, a system designed to give direct access to the DGX ecosystem from the desk. The Spark runs the DGX base software and the NVIDIA stack locally, but also lets you scale work out to full DGX systems or even the cloud, depending on need. The reference price for this system starts at around $3,999 (about €3,450).

The road to consumer: future N1/N1X
Beyond the professional sphere, the most interesting thing about the GB10 is what it opens up going forward. This superchip serves as the basis for future consumer N1/N1X SoCs, which, according to rumors, will retain the philosophy of combining a Grace CPU and a Blackwell GPU in a single chip.
The power shown by the GB10 suggests that a cut-down configuration could be perfectly viable for consumer laptops, offering AI and graphics capabilities that until now seemed reserved for workstations.
The Nvidia GB10 Grace Blackwell is a clear example of where the industry is heading: more integration, more shared bandwidth and fewer barriers between CPU and GPU, all in a compact format and with restrained power consumption for what it offers. Hot Chips 2025 was the stage that made it clear: the future of chips no longer distinguishes so sharply between desktop and data center.
