Data Center Site Assessment for GPU Deployment: What to Evaluate Before Signing a Lease
TL;DR
Most data centers cannot handle GPU racks. A site assessment before committing to a lease prevents the scenario where GPU racks arrive on site and cannot be energized because the power, cooling, or physical infrastructure is inadequate.
Most Data Centers Cannot Handle GPU Racks
The majority of existing data centers in the United States were built for enterprise IT workloads drawing 5-10kW per rack. GPU deployments require 30-120kW per rack. This mismatch means that most colocation facilities, enterprise data centers, and even some purpose-built facilities cannot support modern GPU infrastructure without significant upgrades.
A site assessment before committing to a lease or deployment prevents the scenario where GPU racks arrive on site and cannot be energized because the power, cooling, or physical infrastructure is inadequate. The cost of a thorough site assessment is negligible compared to the cost of discovering infrastructure gaps after equipment has been purchased and delivered.
This guide covers the key factors Leviathan evaluates during site assessments for GPU deployments.
Power Assessment
Available Capacity
The single most important question in a GPU site assessment is: how much power is actually available at the point of deployment?
Colocation providers advertise total facility power capacity, but the available capacity at the specific cabinets or cages allocated to a GPU deployment may be much lower. Power is distributed through a hierarchy of transformers, switchgear, busways, and panel boards, and bottlenecks can exist at any level.
Request the one-line electrical diagram (single-line diagram) for the facility and trace the power path from the utility entrance to the proposed rack locations. Identify the rated capacity and current loading at every point in the path. The available capacity is the minimum headroom at any point in the chain, not the facility's headline capacity.
Also verify that the power capacity is contractually committed, not just theoretically available. A colocation provider may have 10MW of total capacity but only 2MW unallocated, with the remainder committed to other tenants. Your deployment must fit within the unallocated capacity unless the provider agrees to infrastructure upgrades.
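As a minimal sketch of this calculation, the example below (with hypothetical ratings and loads, not figures from any specific facility) computes the deliverable capacity as the minimum headroom anywhere in the distribution chain.

```python
# Hypothetical power-chain headroom check: available capacity is the
# minimum (rated - currently loaded) value at any point in the path.
power_chain_kw = {
    "utility_feed":     {"rated": 10_000, "loaded": 6_500},
    "ups_system":       {"rated": 4_000,  "loaded": 2_900},
    "distribution_bus": {"rated": 1_600,  "loaded": 1_100},
    "panel_board":      {"rated": 400,    "loaded": 250},
}

headroom = {name: p["rated"] - p["loaded"] for name, p in power_chain_kw.items()}
bottleneck = min(headroom, key=headroom.get)

print(f"Available capacity: {headroom[bottleneck]} kW (limited by {bottleneck})")
# With these example numbers the panel board, not the facility's 10 MW
# headline figure, sets the deliverable capacity at 150 kW.
```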
Voltage and Phase Configuration
GPU server power supplies accept a range of input voltages, but efficiency and capacity vary. Most current-generation GPU servers are optimized for 200-240V single-phase input or 480V three-phase input. Running GPU servers at 120V single-phase (common in older US data centers) reduces power supply efficiency and capacity, potentially limiting the number of GPUs that can be powered per rack.
Verify the voltage and phase configuration at the proposed rack locations and confirm compatibility with the GPU server power supply specifications. If the facility distributes at a non-optimal voltage, determine whether step-up or step-down transformers can be installed and what lead time is required.
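To illustrate why distribution voltage matters, the sketch below compares the power a single branch circuit can deliver at common voltages, assuming the usual 80% continuous-load derating; the breaker sizes are hypothetical examples.

```python
import math

def circuit_kw(voltage, amps, phases=1, derate=0.8):
    """Approximate deliverable power for a branch circuit with an 80% continuous-load derating."""
    factor = math.sqrt(3) if phases == 3 else 1.0
    return voltage * amps * factor * derate / 1000

# Hypothetical example circuits for comparison
print(f"120 V / 20 A single-phase: {circuit_kw(120, 20):.1f} kW")
print(f"208 V / 60 A three-phase:  {circuit_kw(208, 60, phases=3):.1f} kW")
print(f"480 V / 60 A three-phase:  {circuit_kw(480, 60, phases=3):.1f} kW")
# Roughly 1.9 kW, 17.3 kW, and 39.9 kW respectively, which is why higher
# distribution voltages are strongly preferred for dense GPU racks.
```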
Redundancy Configuration
Determine the facility's redundancy level at every point in the power chain. Key questions include whether the facility has redundant utility feeds from different substations, whether the UPS system is N+1 or 2N, whether generator capacity covers the full IT and cooling load, whether automatic transfer switches (ATS) are present at the rack level or only at the facility level, and what the maximum expected switchover time is from utility to generator.
For GPU training workloads that run continuously for weeks, any power interruption longer than the UPS battery runtime causes a training job restart. Verify that the facility's power redundancy matches the availability requirements of the planned workload.
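As a simple sanity check with hypothetical timings, compare the UPS battery runtime at the planned GPU load against the expected generator switchover time:

```python
# Hypothetical figures: verify battery runtime covers generator switchover.
ups_runtime_min_at_gpu_load = 5.0       # minutes of battery at the planned GPU load
generator_start_and_transfer_min = 1.5  # utility loss until generator carries the load
margin = ups_runtime_min_at_gpu_load - generator_start_and_transfer_min

print(f"Runtime margin: {margin:.1f} minutes")
print("OK" if margin > 0 else "Training jobs will restart on any utility outage")
# UPS runtime shrinks as load grows; rate it at the planned GPU load,
# not the facility's current load.
```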
Cooling Assessment
Current Cooling Capacity
Determine the total cooling capacity available at the proposed deployment location, measured in kilowatts or tons of refrigeration. Air cooling capacity is typically specified as sensible cooling capacity (heat removed from the air) rather than total cooling capacity (which includes latent heat from humidity removal).
For air-cooled GPU deployments (H100, H200), the cooling capacity must exceed the total IT heat load with margin for ambient temperature variation. A rule of thumb is to provide 20% cooling headroom above the maximum expected IT heat load.
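A minimal sketch of that rule of thumb, assuming a hypothetical row of air-cooled racks:

```python
# Hypothetical air-cooled row: 8 racks at 40 kW each.
racks = 8
kw_per_rack = 40
it_load_kw = racks * kw_per_rack              # 320 kW of heat to reject

headroom = 0.20                               # 20% margin above max IT load
required_cooling_kw = it_load_kw * (1 + headroom)
required_tons = required_cooling_kw / 3.517   # 1 ton of refrigeration is about 3.517 kW

print(f"IT heat load:     {it_load_kw} kW")
print(f"Required cooling: {required_cooling_kw:.0f} kW (~{required_tons:.0f} tons)")
```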
For liquid-cooled GPU deployments (GB200, GB300), determine whether the facility has a chilled water loop suitable for CDU connection, the available chilled water capacity in kW, the supply water temperature and flow rate, and whether the chilled water system is on generator backup.
Chilled Water Availability for Liquid Cooling
Many existing data centers have chilled water systems that serve perimeter cooling units (CRACs/CRAHs) but do not have tap points available for rack-level CDUs. Adding chilled water tap points requires piping modifications, valve installation, and potentially chiller upgrades.
Key chilled water parameters to verify: supply temperature (GPU CDUs typically require 25-45°C supply water), return temperature (must be within the chiller's operating range), available flow rate in gallons per minute at the proposed tap point, pipe sizes and pressure available at the tap point, and water quality (filtration, corrosion inhibitor concentration, conductivity).
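To relate the flow rate and temperature figures to cooling capacity, a quick sanity check (with hypothetical numbers) applies Q ≈ ṁ × cp × ΔT to the proposed tap point:

```python
# Hypothetical tap point: 100 GPM available, 15°C rise across the CDUs.
gpm = 100
delta_t_c = 15                       # return minus supply temperature, °C

flow_lps = gpm * 3.785 / 60          # gallons/min -> liters/s
mass_flow_kg_s = flow_lps * 1.0      # about 1 kg per liter of water
cp_water = 4.186                     # kJ/(kg*K)

cooling_kw = mass_flow_kg_s * cp_water * delta_t_c
print(f"Heat removal at {gpm} GPM with {delta_t_c}°C rise: {cooling_kw:.0f} kW")
# About 396 kW here; compare this against the planned liquid-cooled rack load.
```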
If the facility does not have adequate chilled water for liquid cooling, evaluate the feasibility and lead time of installing dedicated cooling infrastructure: a standalone dry cooler or cooling tower for the GPU cooling loop, a dedicated chiller connected to the GPU CDUs, or connection to the building's condenser water loop with an isolation heat exchanger.
Hot Aisle Temperature
For air-cooled deployments, measure the hot aisle temperature at the proposed rack locations under current load conditions. Many facilities that appear to have adequate cooling capacity are already running hot aisles above 35°C, which indicates that the cooling system has little headroom left for GPU racks that generate significantly more heat than the existing IT equipment.
GPU servers are typically specified for ambient air intake temperatures up to 35°C (ASHRAE A1 class), but sustained operation at the upper end of this range reduces fan life, increases power consumption, and may trigger thermal throttling on the most power-dense components. Because recirculated hot aisle air raises intake temperatures in facilities without full containment, targeting a maximum hot aisle temperature of 30°C provides adequate margin for reliable GPU operation.
Physical Infrastructure
Floor Loading
Verify the floor loading capacity at every proposed rack location. A GPU rack can weigh 2,000-3,500 pounds depending on the platform. For raised floor installations, this weight must be supported by the floor tiles, pedestals, stringers, and the subfloor.
Request the floor loading specification from the facility and compare against the weight of the proposed GPU racks including all installed equipment, cables, and (for liquid-cooled systems) coolant. If the floor loading capacity is insufficient, options include installing reinforced pedestals and stringers at the affected positions or relocating the deployment to a slab-on-grade area of the facility.
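As a simple check, with hypothetical rack and floor figures, compare the rack's distributed load against the floor rating:

```python
# Hypothetical liquid-cooled rack: 3,200 lb total, footprint 24 in x 48 in.
rack_weight_lb = 3200
footprint_sqft = (24 / 12) * (48 / 12)        # 8 sq ft

distributed_load_psf = rack_weight_lb / footprint_sqft
floor_rating_psf = 250                        # example raised-floor uniform load rating

print(f"Rack distributed load: {distributed_load_psf:.0f} lb/sq ft")
print(f"Floor rating:          {floor_rating_psf} lb/sq ft")
print("OK" if distributed_load_psf <= floor_rating_psf else "Reinforcement or relocation needed")
# 400 lb/sq ft against a 250 lb/sq ft rating means this position needs
# reinforcement or a slab location. Point loads at the casters should also
# be checked against the tile's concentrated-load rating.
```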
Ceiling Height and Overhead Clearance
GPU racks are typically 42U (approximately 73 inches / 185cm) or 48U (approximately 84 inches / 213cm) tall. With casters and leveling feet, add 2-4 inches. Cable management, overhead cable trays, and lighting must clear the top of the tallest rack.
For facilities with raised floors, the effective ceiling height is reduced by the raised floor height (typically 12-36 inches). Verify that the remaining clearance accommodates the rack height plus overhead cable trays, lighting, fire suppression distribution piping, and HVAC ductwork.
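A quick clearance check, using hypothetical room dimensions:

```python
# Hypothetical room: 12 ft slab-to-slab, 24 in raised floor.
slab_to_slab_in = 12 * 12          # 144 in
raised_floor_in = 24

rack_height_in = 84                # 48U rack
casters_in = 3
cable_tray_in = 12                 # overhead ladder rack depth plus clearance

available_in = slab_to_slab_in - raised_floor_in
required_in = rack_height_in + casters_in + cable_tray_in

print(f"Available above floor: {available_in} in, required: {required_in} in")
print("Fits" if available_in >= required_in else "Insufficient overhead clearance")
# 120 in available versus 99 in required leaves 21 in for lighting, fire
# suppression piping, and ductwork; verify those against the facility drawings.
```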
Aisle Width
GPU racks, particularly the wider MGX-format racks used for GB200 and GB300, require wider aisles than standard 19-inch rack installations. Verify that hot and cold aisles provide sufficient width for rack installation (the rack must be moved into position through the aisle), equipment servicing (compute trays and switches must be extracted from the rack), and code compliance (fire code minimum aisle widths for means of egress).
NFPA 75 (Standard for the Fire Protection of Information Technology Equipment) requires minimum 36-inch aisles for personnel access and 42-inch aisles for equipment installation and removal. Many GPU platforms require wider aisles for tray extraction.
Cable Pathways
Evaluate the existing cable pathway infrastructure (overhead ladder rack, under-floor cable trays, vertical cable risers) for capacity to accommodate GPU cabling volumes. A single GPU rack can require 50-100 cables, and a row of 10 racks generates 500-1,000 cables that must be routed through the pathway system.
Measure the available space in overhead cable trays, accounting for existing cables from other tenants. Cable tray fill should not exceed 50% to allow for future additions and to maintain adequate airflow around cables. If existing pathways are insufficient, determine the feasibility and lead time of installing additional cable trays or conduit.
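A rough fill estimate, using hypothetical cable counts and tray dimensions:

```python
import math

# Hypothetical row: 10 racks x 80 cables, routed through one 24 in x 6 in tray.
cables = 10 * 80
cable_od_in = 0.25                             # assumed average jacket diameter
cable_area = math.pi * (cable_od_in / 2) ** 2  # cross-section per cable, sq in

tray_area = 24 * 6                             # usable tray cross-section, sq in
fill_pct = 100 * cables * cable_area / tray_area

print(f"Estimated tray fill: {fill_pct:.0f}% (target <= 50%)")
# About 27% here; add the existing tenant cables before concluding there is room.
```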
Network Infrastructure
Fiber Connectivity
Verify the availability of fiber connectivity between the proposed rack locations and the facility's meet-me room or network point of presence. GPU clusters require high-bandwidth connections to the internet, cloud providers, and potentially other data centers for data ingestion and model serving.
Determine the fiber type available (multimode vs. single-mode), the number of available fiber pairs, and whether new fiber must be pulled from the meet-me room to the deployment area. For large GPU deployments, dedicated fiber runs with sufficient strand count to support the cluster's external bandwidth requirements are typically required.
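A rough sizing sketch for the strand count, assuming a hypothetical bandwidth target and link speed:

```python
import math

# Hypothetical requirement: 800 Gbps of external bandwidth for the cluster,
# delivered over 100 Gbps duplex links (one fiber pair per link).
external_bw_gbps = 800
link_speed_gbps = 100
spare_factor = 2                     # spare pairs for growth and failed strands

pairs_needed = math.ceil(external_bw_gbps / link_speed_gbps) * spare_factor
print(f"Fiber pairs to request from the meet-me room: {pairs_needed}")
# 16 pairs (32 strands) in this example; round up to a standard trunk size.
```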
Cross-Connect Availability
If the GPU cluster requires connectivity to specific network providers, cloud on-ramps, or other tenants in the facility, verify that cross-connect services are available and determine the lead time for provisioning. Cross-connect provisioning can take days to weeks depending on the facility and the provider.
Contractual Considerations
Power Commitment
Colocation leases for GPU deployments should spell out the power commitment explicitly: the power density per rack (kW per cabinet), the total power for the deployment area, whether the commitment is contractually guaranteed or best-effort, and the penalty or remedy if the facility fails to deliver the committed power.
Cooling Commitment
For liquid-cooled deployments in colocation, the lease must address who provides and maintains the CDUs (tenant or provider), the chilled water supply temperature and flow rate commitment, the responsibility for cooling system maintenance and consumables (coolant, filters), and the process for cooling system failures and repair.
Scaling Terms
GPU deployments often scale over time as additional racks are added to the cluster. The lease should include options for additional rack positions with committed power and cooling, a timeline for infrastructure upgrades needed to support expansion, and pricing terms for incremental power and space.
Site Assessment Services
Leviathan Systems provides comprehensive site assessments for GPU deployments. Our assessment evaluates power availability and quality, cooling capacity and thermal margin, physical infrastructure (floor loading, clearance, aisle width), cable pathway capacity, network connectivity, and overall facility readiness for the specific GPU platform planned for deployment.
The assessment deliverable is a detailed report documenting the facility's current capabilities, any gaps relative to the deployment requirements, recommended remediation actions, estimated costs and lead times for remediation, and a go/no-go recommendation for the site.
Contact us to schedule a site assessment.