Liquid Cooling for GPU Data Centers: CDU, Direct-to-Chip, and Rack-Level Integration
TL;DR
Air cooling reaches its practical limit at 40-50kW per rack. Every current-generation NVIDIA GPU platform operating at rack scale exceeds this limit. Liquid cooling is now a prerequisite for any new GPU deployment at scale.
Why Liquid Cooling Is No Longer Optional
Air cooling reaches its practical limit at approximately 40-50kW per rack. Beyond that threshold, the volume of airflow required to dissipate heat exceeds what conventional hot aisle/cold aisle containment can deliver without unacceptable noise levels, energy costs, and physical space consumption.
Every current-generation NVIDIA GPU platform operating at rack scale exceeds this limit. The GB200 NVL72 draws approximately 120kW per rack. The GB300 NVL72 matches or exceeds that figure. Even HGX-based systems using B200 or B300 GPUs in dense configurations push well past the air cooling threshold. The NVIDIA roadmap makes this trajectory clear: the Rubin Ultra NVL576 platform targets roughly 600kW per rack in 2027.
Liquid cooling is now a prerequisite for any new GPU deployment at scale. The question is not whether to deploy liquid cooling, but which topology to use, how to integrate it with existing facility infrastructure, and how to maintain it over the operational life of the deployment.
Leviathan Systems integrates liquid cooling for GPU deployments across all current NVIDIA platforms. This guide covers the cooling architectures, components, and integration processes we use in the field.
Liquid Cooling Architectures
Direct-to-Chip (Cold Plate) Cooling
Direct-to-chip cooling places a metal cold plate directly on the GPU die (or the GPU package lid), with liquid coolant flowing through channels machined into the cold plate. Heat is transferred from the GPU through the thermal interface material (TIM) to the cold plate, then carried away by the coolant flow to a heat exchanger.
This is the primary cooling method used in NVIDIA's rack-scale systems (GB200 NVL72, GB300 NVL72) and in many HGX-based server platforms configured for liquid cooling. The cold plate approach provides the lowest thermal resistance between the heat source and the coolant, enabling the highest sustained GPU clock speeds and the longest component lifespan.
Cold plate cooling requires precise flow rate control at each GPU. Insufficient flow causes hot spots and thermal throttling. Excessive flow wastes pump energy and can cause pressure drops that affect other components in the cooling loop. The manifold design within the rack distributes coolant to each compute tray and balances flow across all cold plates.
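As a rough illustration of how per-GPU flow requirements fall out of the heat load and the allowed coolant temperature rise, the minimal sketch below applies Q = m_dot * cp * delta_T with assumed values; the heat load, coolant properties, and delta-T are illustrative, not platform specifications.

```python
# Minimal sketch: required coolant flow for one GPU cold plate from Q = m_dot * cp * dT.
# All values are illustrative assumptions, not platform specifications.

GPU_HEAT_W = 1200.0        # assumed per-GPU heat load (W)
CP_J_PER_KG_K = 3900.0     # approx. specific heat of a 25% propylene glycol mix
DENSITY_KG_PER_L = 1.02    # approx. coolant density (kg/L)
DELTA_T_K = 10.0           # allowed coolant temperature rise across the cold plate

mass_flow_kg_s = GPU_HEAT_W / (CP_J_PER_KG_K * DELTA_T_K)
volume_flow_lpm = mass_flow_kg_s / DENSITY_KG_PER_L * 60.0

print(f"Required flow per cold plate: {volume_flow_lpm:.2f} L/min")
# ~1.8 L/min with these assumptions; halving the allowed delta-T doubles the required flow.
```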
Rear-Door Heat Exchangers (RDHX)
Rear-door heat exchangers mount on the back of a standard rack and cool the exhaust air before it enters the hot aisle. Chilled water circulates through a coil in the rear door, absorbing heat from the server exhaust air stream. The cooled air can then be recirculated, reducing the load on the room-level cooling system.
RDHX systems are used in deployments where GPU servers are air-cooled at the component level but the aggregate rack heat load exceeds the data hall's air cooling capacity. They are effective for rack densities up to approximately 60kW, depending on the RDHX capacity and water supply temperature.
RDHX has the advantage of not requiring any modifications to the server hardware — the servers exhaust air normally, and the RDHX captures the heat. This makes RDHX suitable for retrofitting existing data centers with GPU racks that exceed the original cooling design capacity.
The limitation of RDHX is that it does not reduce GPU temperatures below what air cooling achieves inside the server. The server fans still run at air-cooling speeds, consuming significant power. For platforms that require liquid cooling at the component level (GB200, GB300), RDHX is not applicable.
Immersion Cooling
Immersion cooling submerges the entire server in a non-conductive dielectric fluid. Heat is transferred from all components to the fluid, which is then pumped through a heat exchanger to reject heat to the facility water loop. Single-phase immersion uses a fluid that remains liquid at operating temperatures. Two-phase immersion uses a fluid that boils at the component surface, providing extremely high heat transfer rates.
Immersion cooling is not the standard for current NVIDIA GPU platforms. The GB200 NVL72 and GB300 NVL72 are designed for direct-to-chip cold plates inside a conventional air-environment chassis, not for submersion in dielectric fluid. Some custom deployments have used immersion for HGX-based platforms, but these require significant engineering modifications and void standard warranty coverage.
Immersion cooling may become more relevant for future platforms operating above 200kW per rack, but for current GPU deployments, direct-to-chip with rack-level manifolds is the proven approach.
Key Components
Coolant Distribution Unit (CDU)
The CDU is the interface between the rack-level cooling loop and the facility water loop. It contains a heat exchanger, pumps, filters, sensors, and control electronics that regulate coolant flow, temperature, and pressure.
CDU capacity must be matched to the rack heat load with margin for thermal transients. A GB300 rack at 120kW requires a CDU capable of rejecting approximately 130-140kW (10-15% margin above nominal load). The CDU must maintain coolant supply temperature within the GPU manufacturer's specified range, typically 25-45°C depending on the platform and configuration.
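As a quick sizing check, the short sketch below applies the 10-15% transient margin to a nominal 120kW rack load; the figures are the ones quoted above, not vendor sizing guidance.

```python
# Minimal sketch: CDU capacity target = nominal rack load plus 10-15% transient margin.
RACK_LOAD_KW = 120.0
MARGIN_LOW, MARGIN_HIGH = 0.10, 0.15

low_kw = RACK_LOAD_KW * (1 + MARGIN_LOW)
high_kw = RACK_LOAD_KW * (1 + MARGIN_HIGH)
print(f"Target CDU capacity: {low_kw:.0f}-{high_kw:.0f} kW")  # 132-138 kW
```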
CDU placement options include in-row (between racks), overhead (above the rack row), and remote (in a separate mechanical room). In-row CDUs are the most common for GPU deployments because they minimize coolant loop length and pressure drop, but they consume rack positions that could otherwise hold IT equipment.
CDU redundancy is a design decision with significant implications. N+1 CDU configurations (one backup for every N active CDUs) provide protection against single CDU failure but increase capital cost and floor space. At minimum, the facility must be capable of gracefully shutting down GPU workloads if a CDU fails, which requires integration between the CDU monitoring system and the cluster management software.
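That integration can be as simple as a watcher that drains the affected rack when the CDU reports a fault. The sketch below is a hypothetical outline: the alarm codes, the CDU polling function, and the use of Slurm's scontrol drain command are all assumptions standing in for site-specific tooling.

```python
import subprocess
import time

# Hypothetical CDU alarm states; real units expose status over Modbus, SNMP, or Redfish.
FAULT_STATES = {"PUMP_FAILURE", "LOW_FLOW", "OVER_TEMP"}

def read_cdu_state() -> str:
    """Placeholder: poll the CDU and return its current alarm state."""
    return "OK"

def drain_rack(node_pattern: str) -> None:
    """Ask the scheduler to stop placing work and let running jobs finish or checkpoint."""
    # Example for a Slurm-managed cluster; substitute the site's own cluster tooling.
    subprocess.run(
        ["scontrol", "update", f"NodeName={node_pattern}", "State=DRAIN",
         "Reason=CDU_fault"],
        check=False,
    )

def watch_cdu(node_pattern: str, poll_s: float = 5.0) -> None:
    """Drain the rack's nodes as soon as the CDU reports a fault."""
    while True:
        if read_cdu_state() in FAULT_STATES:
            drain_rack(node_pattern)   # graceful drain beats thermal throttling or damage
            break
        time.sleep(poll_s)
```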
Manifold Systems
The manifold distributes coolant from the CDU to individual compute trays, GPU modules, or rack positions. In rack-scale systems like the GB300 NVL72, the manifold is an integral part of the rack design, with pre-engineered branch lines to each compute tray and NVLink switch tray.
Manifold design must ensure balanced flow distribution across all branches. Unbalanced flow results in some GPUs running hotter than others, which causes performance variation across the rack. Flow balancing valves or fixed orifices at each branch point maintain the required flow rate regardless of pressure variations in the main supply line.
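At commissioning, balance can be verified by comparing each branch's measured flow against the rack mean. The sketch below flags outliers under an assumed tolerance; the readings and the 10% threshold are illustrative.

```python
# Minimal sketch: flag manifold branches whose measured flow deviates from the rack mean.
branch_flow_lpm = {  # illustrative per-branch readings from the CDU flow monitoring
    "tray-01": 1.82, "tray-02": 1.79, "tray-03": 1.45, "tray-04": 1.84,
}
TOLERANCE = 0.10     # assumed +/-10% balance requirement

mean_flow = sum(branch_flow_lpm.values()) / len(branch_flow_lpm)
for branch, flow in branch_flow_lpm.items():
    deviation = (flow - mean_flow) / mean_flow
    if abs(deviation) > TOLERANCE:
        print(f"{branch}: {flow:.2f} L/min ({deviation:+.0%} vs mean), rebalance this branch")
```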
All manifold connections use quick-disconnect fittings to enable tray-level servicing without draining the entire cooling loop. Quick-disconnect fittings must be dripless designs that prevent coolant spillage during disconnection. Even small coolant leaks near electronic equipment can cause corrosion and eventual hardware failure.
Leak Detection Systems
Leak detection is not optional in liquid-cooled GPU deployments. A single undetected coolant leak can destroy hundreds of thousands of dollars of GPU hardware in minutes.
Leak detection systems use conductive sensing cables routed along every coolant pathway, under every manifold connection, and at the base of every rack. When coolant contacts the sensing cable, the system generates an immediate alarm to the building management system and can trigger automatic pump shutdown to isolate the leak.
Point sensors installed at every quick-disconnect fitting detect leaks at those known risk points faster than a cable run alone. The combination of cable sensing (for unexpected leak locations) and point sensing (for known risk points) provides comprehensive coverage.
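However the sensors are mixed, the alarm path should be automatic. The following minimal sketch shows the shape of that logic; the sensor names and the pump-stop hook are hypothetical placeholders, not a particular vendor's interface.

```python
from dataclasses import dataclass

@dataclass
class LeakSensor:
    name: str    # e.g. "rack12-manifold-supply-QD" (hypothetical naming)
    kind: str    # "cable" or "point"
    wet: bool    # True when the sensor reports coolant contact

def stop_cdu_pumps() -> None:
    """Placeholder hook: command the CDU to stop pumps and close isolation valves."""

def handle_leak(sensors: list[LeakSensor]) -> None:
    """Raise an alarm for every wet sensor and isolate the loop at the CDU."""
    tripped = [s for s in sensors if s.wet]
    if not tripped:
        return
    for s in tripped:
        print(f"ALARM: coolant detected at {s.name} ({s.kind} sensor)")  # forward to BMS
    stop_cdu_pumps()
```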
Leak detection systems must be tested regularly — a sensor that fails silently provides false confidence that is worse than no detection at all. Leviathan tests all leak detection sensors at commissioning and recommends quarterly testing thereafter.
Integration Process
Facility Water Loop Assessment
Before installing any rack-level cooling, the facility water loop must be assessed for capacity, temperature, flow rate, water quality, and redundancy. Many data centers that were built for air-cooled IT loads have water loops sized for perimeter cooling units (CRACs/CRAHs) that are inadequate for direct GPU cooling.
Key parameters to verify: total available cooling capacity in kilowatts, supply water temperature range (year-round, including seasonal variation), supply water pressure and flow rate at the point of connection, water quality (conductivity, pH, corrosion inhibitor levels, particulate count), and redundancy configuration (N+1 pumps, backup chillers, generator-backed cooling).
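One way to keep this assessment repeatable is to record measured values against required ranges and flag anything out of bounds. The sketch below does exactly that with assumed acceptance thresholds; substitute the platform's and the site's actual requirements.

```python
# Minimal sketch: compare measured facility-water parameters to required ranges.
REQUIRED = {                             # assumed acceptance ranges (min, max); None = unbounded
    "capacity_kw":        (140, None),
    "supply_temp_c":      (None, 32),
    "flow_lpm":           (250, None),
    "ph":                 (7.0, 9.0),
    "conductivity_us_cm": (None, 500),
}
measured = {"capacity_kw": 180, "supply_temp_c": 29, "flow_lpm": 300,
            "ph": 8.1, "conductivity_us_cm": 620}

for param, (lo, hi) in REQUIRED.items():
    value = measured[param]
    ok = (lo is None or value >= lo) and (hi is None or value <= hi)
    print(f"{param}: {value} -> {'PASS' if ok else 'FAIL'}")
```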
If the facility water loop is undersized, options include adding dedicated chillers for the GPU cooling circuit, connecting to the building chilled water system with isolation heat exchangers, or installing a dedicated outdoor dry cooler or cooling tower for the GPU cooling loop.
CDU Installation
CDU installation begins with positioning the unit in the designated location (in-row, overhead, or remote) and connecting the facility water side. Facility water connections typically use standard pipe fittings (grooved coupling, flanged, or threaded depending on pipe size and site standards).
After facility water connection, the CDU is powered up and commissioned: verify pump operation in both directions (some CDUs have reversible pumps for loop flushing), verify heat exchanger performance by measuring inlet and outlet temperatures on both the facility and IT loops, and calibrate all temperature and flow sensors against reference instruments.
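Heat exchanger performance can be cross-checked with a simple heat balance: the heat picked up on the IT loop should match the heat rejected to the facility loop within measurement error. The sketch below uses assumed flow and temperature readings and approximate fluid properties.

```python
# Minimal sketch: heat balance across the CDU heat exchanger using assumed readings.
CP_WATER = 4186.0   # J/(kg*K), facility loop (water)
CP_PG25 = 3900.0    # J/(kg*K), assumed IT loop coolant (25% propylene glycol mix)

def loop_heat_kw(flow_kg_s: float, cp: float, t_hot_c: float, t_cold_c: float) -> float:
    """Heat carried by one loop: Q = m_dot * cp * (T_hot - T_cold)."""
    return flow_kg_s * cp * (t_hot_c - t_cold_c) / 1000.0

it_kw = loop_heat_kw(flow_kg_s=3.2, cp=CP_PG25, t_hot_c=45.0, t_cold_c=35.2)
facility_kw = loop_heat_kw(flow_kg_s=2.9, cp=CP_WATER, t_hot_c=28.0, t_cold_c=18.1)

imbalance = abs(it_kw - facility_kw) / it_kw
print(f"IT loop: {it_kw:.0f} kW, facility loop: {facility_kw:.0f} kW, "
      f"imbalance {imbalance:.1%}")
# A large imbalance points to a miscalibrated sensor or flow meter, not physics.
```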
Rack Cooling Connection
With the CDU operational, connect the rack cooling loop. For rack-scale systems, this involves connecting the supply and return hoses from the CDU to the rack manifold using quick-disconnect fittings.
After connection, perform the following sequence: pressure test the complete loop at 1.5x operating pressure for 30 minutes, fill the loop with the specified coolant mixture (typically a propylene glycol/water mixture at the manufacturer's specified concentration), bleed air from all high points using manual bleed valves, verify flow rate at every manifold branch using the CDU flow monitoring system, and verify that no leaks are detected by the leak detection system during a 24-hour observation period.
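For the pressure test step, a simple decay check against an allowable drop makes the pass/fail criterion explicit. The sketch below uses assumed pressures and an assumed allowable decay; apply the manifold and CDU manufacturers' actual test limits.

```python
# Minimal sketch: pass/fail check for the pressure-hold test (assumed limits).
OPERATING_BAR = 3.0
TEST_BAR = 1.5 * OPERATING_BAR    # test at 1.5x operating pressure
MAX_DROP_BAR = 0.05               # assumed allowable decay over the 30-minute hold

start_bar, end_bar = 4.50, 4.47   # illustrative gauge readings at start and end of hold
drop = start_bar - end_bar
result = "PASS" if drop <= MAX_DROP_BAR else "FAIL: find the leak before filling"
print(f"Test at {TEST_BAR:.2f} bar, drop {drop:.2f} bar over 30 min: {result}")
```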
Thermal Commissioning
After the cooling loop is operational, perform thermal commissioning by running the GPU rack at full load while monitoring coolant temperatures, GPU temperatures, and CDU performance. Verify that all GPUs maintain operating temperature within the manufacturer's specification under sustained full-load conditions.
Thermal commissioning should be performed at the worst-case ambient conditions expected during the facility's operational life. If the deployment occurs during winter, the cooling system performance at summer peak conditions must be validated analytically using the CDU performance curves and facility water temperature data.
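The analytic check can be as simple as interpolating the CDU capacity curve at the expected summer facility-water supply temperature and comparing the result with the rack load plus margin. The curve points and temperatures in the sketch below are purely illustrative; use the manufacturer's published performance data.

```python
# Minimal sketch: project CDU capacity at summer facility-water temperature from a
# capacity curve. Curve points are illustrative; use the manufacturer's published data.
curve = [(18.0, 160.0), (24.0, 145.0), (30.0, 130.0), (36.0, 110.0)]  # (supply degC, kW)

def capacity_at(temp_c: float) -> float:
    for (t0, c0), (t1, c1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return c0 + (c1 - c0) * (temp_c - t0) / (t1 - t0)  # linear interpolation
    raise ValueError("temperature outside curve range")

summer_supply_c = 28.0   # assumed worst-case summer facility-water supply temperature
rack_load_kw = 120.0
cap_kw = capacity_at(summer_supply_c)
ok = cap_kw >= rack_load_kw * 1.10   # keep the same 10% minimum margin at worst case
print(f"Projected capacity at {summer_supply_c} degC: {cap_kw:.0f} kW "
      f"({'sufficient' if ok else 'insufficient'} margin for {rack_load_kw:.0f} kW)")
```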
Maintenance and Operations
Coolant Management
Coolant must be monitored and maintained throughout the life of the deployment. Key parameters include glycol concentration (checked quarterly with a refractometer), pH level (maintained within manufacturer specification to prevent corrosion), particulate count (checked quarterly to verify filter effectiveness), and conductivity (monitored continuously by the CDU to detect contamination).
Coolant should be replaced on the manufacturer's recommended schedule, typically every 2-3 years. Running coolant beyond its service life degrades corrosion protection and increases the risk of deposit formation that reduces heat transfer effectiveness.
Filter Maintenance
CDU filters capture particulates that would otherwise accumulate in cold plates and restrict coolant flow. Filters must be inspected and replaced on a regular schedule — monthly during the first six months of operation (when the system sheds initial construction debris) and quarterly thereafter.
A clogged filter restricts flow rate to all downstream cold plates simultaneously, causing a gradual increase in GPU temperatures across the entire rack. This failure mode is insidious because it develops slowly and may not trigger thermal alarms until multiple GPUs are already throttling.
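Because the drift is gradual, trending the rack's mean GPU temperature at steady full load against a commissioning baseline catches it earlier than absolute thermal alarms. The sketch below flags a slow rise under assumed thresholds and illustrative readings.

```python
# Minimal sketch: flag a slow rack-wide GPU temperature drift at steady full load,
# which often indicates a clogging CDU filter. Thresholds and readings are illustrative.
BASELINE_MEAN_C = 62.0    # mean GPU temperature recorded at thermal commissioning
DRIFT_ALARM_C = 3.0       # assumed drift that warrants a filter / pressure-drop check

gpu_temps_c = [65.1, 66.0, 64.8, 65.6, 66.2, 65.4, 64.9, 65.8]  # current readings
mean_now = sum(gpu_temps_c) / len(gpu_temps_c)
drift = mean_now - BASELINE_MEAN_C

if drift > DRIFT_ALARM_C:
    print(f"Rack mean GPU temp is {drift:.1f} degC above baseline: inspect CDU filters")
```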
Leak Response Procedures
Despite best-practice installation and leak detection, coolant leaks can occur. Response procedures must be documented, tested, and understood by all operations staff before the first rack is energized.
The leak response procedure includes: automatic pump shutdown triggered by leak detection, isolation of the affected rack's cooling circuit using manual valves, controlled shutdown of affected GPU workloads to prevent thermal damage, physical inspection to locate and characterize the leak, repair using the appropriate replacement fittings or hose assemblies, pressure testing of the repaired section before restoring coolant flow, and post-incident documentation including root cause analysis.
Liquid Cooling Services
Leviathan Systems provides complete liquid cooling integration for GPU deployments, including CDU installation, rack-level manifold routing, quick-disconnect fitting installation, leak detection system deployment, pressure testing, filling and bleeding, and thermal commissioning.
Our team has direct experience with the cooling systems used in NVIDIA GB200 NVL72 and GB300 NVL72 rack-scale platforms, as well as liquid-cooled HGX-based systems from Dell, Supermicro, and ASUS.
Contact us to discuss your liquid cooling requirements.