How to Write a GPU Deployment Statement of Work (SOW)
Template and guidance for writing a GPU infrastructure deployment SOW covering scope, platforms, testing, documentation, and acceptance criteria for NVIDIA deployments.
A well-written Statement of Work (SOW) is the foundation of any successful GPU infrastructure deployment. Unlike traditional data center installations, GPU deployments—especially modern platforms like NVIDIA GB200 and GB300—introduce complexity that standard SOW templates simply don't address. From NVLink topology routing to liquid cooling integration boundaries, the details matter more than ever.
This guide provides a comprehensive template and practical guidance for writing GPU deployment SOWs that protect both clients and deployment partners while ensuring project success.
Why SOWs Matter More for GPU Deployments
Traditional server deployments follow well-established patterns. Rack, cable, power on, validate. The SOW can be relatively straightforward because the industry has decades of standardized practices.
GPU infrastructure is different. The platform complexity of GB200 and GB300 systems requires specifying details that traditional SOWs never addressed:
- NVLink topology routing and validation requirements
- Liquid cooling integration scope and facility boundaries
- OTDR testing requirements for high-speed optical interconnects
- Platform-specific commissioning procedures
- Clear boundaries between deployment partner, mechanical contractor, and facility operations
Without these details explicitly defined, projects face scope creep, finger-pointing when issues arise, and acceptance criteria disputes that delay go-live dates.
Essential SOW Sections for GPU Deployments
A comprehensive GPU deployment SOW should include the following sections, each with specific details relevant to the platform and scope.
1. Project Scope Summary
The scope summary provides the high-level overview that frames the entire engagement. This section should specify:
- Total rack count and configuration (e.g., "24 racks of GB200 NVL72 systems")
- NVIDIA platform(s) being deployed (H100, H200, GB200, GB300)
- Hardware vendors (Supermicro, Dell, NVIDIA reference design, etc.)
- Facility location(s) and any site-specific constraints
- High-level services included (deployment, testing, commissioning)
This section sets expectations and provides context for the detailed specifications that follow.
2. Hardware Bill of Materials (BOM)
The hardware BOM should be a complete list provided by the client, including:
- Compute nodes with exact model numbers and quantities
- Network switches (typically Arista or similar) with port counts
- All cables (power, copper network, fiber optic) with types and lengths
- Liquid cooling components if applicable (CDUs, manifolds, hoses)
- Rack infrastructure (PDUs, cable management, labeling materials)
The BOM establishes what equipment the deployment partner will be working with and helps identify any missing components before work begins.
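Some teams make this reconciliation mechanical before crews arrive on site. A minimal sketch in Python, assuming the BOM and the receiving log are CSV exports with hypothetical `part_number` and `quantity` columns:

```python
import csv
from collections import Counter

def load_quantities(path):
    """Sum quantities per part number from a CSV with
    'part_number' and 'quantity' columns (hypothetical schema)."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["part_number"]] += int(row["quantity"])
    return counts

def reconcile(bom_path, received_path):
    """Report any part that was received short against the BOM."""
    bom = load_quantities(bom_path)
    received = load_quantities(received_path)
    for part, needed in sorted(bom.items()):
        on_hand = received.get(part, 0)
        if on_hand < needed:
            print(f"SHORT {part}: need {needed}, received {on_hand}")

if __name__ == "__main__":
    reconcile("bom.csv", "receiving_log.csv")
```

Running a check like this during equipment staging turns "helps identify missing components" into a concrete gate before rack assembly begins.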
3. Services in Scope
This is where specificity matters most. Clearly define what the deployment partner will and will not do:
- Rack assembly and equipment installation
- Power cabling from PDU to equipment (specify voltage and connector types)
- Network cabling with specific cable types (DAC, AOC, fiber with connector types)
- Liquid cooling integration if applicable (connection to facility CDUs, pressure testing, leak detection)
- Testing requirements including OTDR on all fiber connections
- Commissioning procedures (power-on sequence, BIOS configuration, initial validation)
Be explicit about what's excluded. For example: "Facility power distribution upstream of rack PDUs is not included" or "Network switch configuration is client responsibility."
4. Documentation Deliverables
Documentation requirements should be specified upfront to avoid disputes at project completion. Essential deliverables include:
- Per-connection test results for all network links
- OTDR trace files for all fiber optic connections
- Cable maps showing physical and logical connectivity
- Rack elevation drawings (as-built)
- Photographs of completed installation
- Summary report with test results and any deviations from plan
Specify the format and delivery method for each deliverable. Digital documentation should be provided in standard formats (PDF, Excel, CAD files) via secure file transfer or client portal.
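Per-connection deliverables are the easiest to drop, so it can pay to require a predictable file-naming convention in the SOW and verify coverage with a script. A minimal sketch, assuming each fiber link in the cable map has a `link_id` column and each OTDR trace is saved as `<link_id>.sor` (the naming scheme is an illustrative assumption, not a standard):

```python
import csv
from pathlib import Path

def missing_traces(cable_map_csv, trace_dir):
    """Return link IDs from the cable map that have no matching
    OTDR trace file in trace_dir. Assumes one .sor trace per link,
    named after the link ID (an illustrative convention)."""
    with open(cable_map_csv, newline="") as f:
        link_ids = {row["link_id"] for row in csv.DictReader(f)}
    traces = {p.stem for p in Path(trace_dir).glob("*.sor")}
    return sorted(link_ids - traces)

for link in missing_traces("cable_map.csv", "otdr_traces/"):
    print(f"Missing OTDR trace for link {link}")
```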
5. Timeline and Milestones
Break the project into phases with clear milestones and dependencies:
- Pre-deployment planning and site survey
- Equipment staging and inspection
- Rack assembly and equipment installation
- Power and network cabling
- Liquid cooling integration (if applicable)
- Testing and validation
- Commissioning and handoff
Include realistic timeframes based on rack count and complexity. A 24-rack GB200 deployment with liquid cooling typically requires 3-4 weeks of on-site work, not including pre-deployment planning.
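It also helps to show the schedule arithmetic in the SOW rather than assert a number. A rough estimator, where every rate below is a placeholder assumption to be replaced with your partner's actual throughput on the specific platform:

```python
# Back-of-the-envelope schedule estimate. Every rate here is a
# placeholder assumption, not a benchmark; substitute figures from
# your deployment partner's history with the specific platform.
RACKS = 24
CREW_DAYS_PER_RACK = 1.2    # install + cabling + cooling, assumed
PARALLEL_WORKSTREAMS = 2    # e.g., two 2-person teams from a 4-person crew
FIXED_DAYS = 4              # staging, commissioning, handoff

calendar_days = RACKS * CREW_DAYS_PER_RACK / PARALLEL_WORKSTREAMS + FIXED_DAYS
print(f"~{calendar_days:.0f} working days (~{calendar_days / 5:.1f} weeks)")
```

With these placeholder rates the estimate lands at roughly 18 working days, consistent with the 3-4 week figure above.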
6. Crew Requirements
Specify the deployment team composition and qualifications:
- Team size (e.g., "4-person crew for duration of deployment")
- Required certifications (BICSI, manufacturer-specific training)
- On-site leadership requirements (project manager or lead technician presence)
- Background check or security clearance requirements if applicable
For complex platforms like GB200, specify that the team must have prior experience with the specific platform or have completed manufacturer training.
7. Quality Control Standards
Define the quality control process and inspection requirements:
- QC process (peer review, independent inspection, client walkthrough)
- Phase-gate inspections (e.g., rack assembly complete before cabling begins)
- Industry standards referenced (TIA-942 for data centers, BICSI for cabling)
- Defect remediation process and timeline
Referencing industry standards provides objective criteria for quality and reduces subjective disputes about workmanship.
8. Acceptance Criteria
Define exactly what "done" means. Acceptance criteria should be objective and measurable:
- All equipment installed per rack elevation drawings
- 100% of network connections tested and passing
- OTDR results within specified loss budgets for all fiber
- Liquid cooling system pressure tested and leak-free for 24 hours (if applicable)
- All nodes power on and POST successfully
- Documentation deliverables provided in specified format
Clear acceptance criteria prevent scope creep and establish when payment milestones are triggered.
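The OTDR criterion is the easiest to make fully objective, because a link's loss budget is simple arithmetic over its construction. A sketch of that arithmetic; the default coefficients reflect commonly cited TIA-568 maximums for multimode fiber at 850 nm, but treat them as assumptions and state the actual figures in the SOW:

```python
def loss_budget_db(length_km, connectors, splices,
                   fiber_db_per_km=3.5, connector_db=0.75, splice_db=0.3):
    """Maximum allowed insertion loss for a fiber link.
    Defaults reflect commonly cited TIA-568 maximums for multimode
    fiber at 850 nm; confirm the figures your SOW actually specifies."""
    return (length_km * fiber_db_per_km
            + connectors * connector_db
            + splices * splice_db)

def link_passes(measured_db, **link):
    """Compare a measured OTDR loss against the computed budget."""
    budget = loss_budget_db(**link)
    return measured_db <= budget, budget

# Example: a 30 m in-rack jumper with two connectors, no splices.
ok, budget = link_passes(0.9, length_km=0.03, connectors=2, splices=0)
print(f"budget {budget:.2f} dB -> {'PASS' if ok else 'FAIL'}")
```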
9. Change Order Process
No deployment goes exactly as planned. Define how changes will be handled:
- What constitutes a change (scope additions, equipment substitutions, timeline extensions)
- Change request submission process
- Approval authority and timeline
- Pricing methodology for change orders
A clear change order process prevents disputes and keeps projects moving when adjustments are needed.
10. Warranty and Rework
Specify warranty terms for the deployment work:
- Warranty period (typically 90 days for workmanship)
- What's covered (installation defects, cable failures, connection issues)
- What's excluded (equipment failures, client-caused damage, normal wear)
- Response time for warranty issues
Note that equipment warranties are separate and provided by manufacturers, not the deployment partner.
Platform-Specific Considerations
The SOW requirements vary significantly based on the GPU platform being deployed.
Air-Cooled Platforms (H100, H200)
Air-cooled deployments are relatively straightforward compared to liquid-cooled systems. The SOW can focus on traditional data center deployment elements:
- No liquid cooling scope or facility integration complexity
- Standard rack assembly and equipment installation
- Power and network cabling per standard practices
- OTDR testing on fiber interconnects
- Basic commissioning (power-on, POST validation)
The primary complexity in H100 and H200 deployments lies in the high-speed networking and in validating the NVLink topology, but the physical installation is less involved than for liquid-cooled platforms.
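One way to make the NVLink validation step concrete in the SOW is to require a scripted per-GPU link check at commissioning. A minimal sketch that counts active links reported by `nvidia-smi nvlink -s`; the output format varies by driver version and the expected link count depends on the platform, so treat both as assumptions to confirm:

```python
import re
import subprocess

EXPECTED_LINKS = 18  # NVLink links per GPU on H100 SXM; adjust per platform

def active_links_per_gpu():
    """Parse `nvidia-smi nvlink -s` and count active links per GPU.
    Output format varies by driver version; treat this parsing as a sketch."""
    out = subprocess.run(["nvidia-smi", "nvlink", "-s"],
                         capture_output=True, text=True, check=True).stdout
    counts, gpu = {}, None
    for line in out.splitlines():
        m = re.match(r"GPU (\d+):", line)
        if m:
            gpu = int(m.group(1))
            counts[gpu] = 0
        elif gpu is not None and "Link" in line and "inactive" not in line.lower():
            counts[gpu] += 1
    return counts

for gpu, links in sorted(active_links_per_gpu().items()):
    status = "OK" if links == EXPECTED_LINKS else "CHECK"
    print(f"GPU {gpu}: {links}/{EXPECTED_LINKS} NVLink links active [{status}]")
```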
Liquid-Cooled Platforms (GB200, GB300)
Liquid-cooled deployments require significantly more detailed SOW specifications. Critical elements include:
- CDU provision: Specify whether CDUs are client-provided or deployment partner-provided
- Facility boundary: Define the demarcation point between facility cooling infrastructure and rack-level cooling (typically at the CDU)
- Pressure testing specifications: Define test pressure, duration, and acceptance criteria (e.g., "1.5x operating pressure for 24 hours with zero pressure drop"); a verification sketch follows this list
- Thermal commissioning criteria: Specify temperature targets and validation procedures under load
- Leak detection validation: Define inspection procedures and acceptance criteria
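To make the pressure-test criterion verifiable, the SOW can require a logged test plus a scripted check of the log. A minimal sketch, assuming readings are captured as timestamp,psi CSV rows and using the illustrative 1.5x-for-24-hours criterion from above:

```python
import csv
from datetime import datetime

OPERATING_PSI = 50           # illustrative operating pressure, not a spec
TEST_PSI = 1.5 * OPERATING_PSI
MIN_HOURS = 24
TOLERANCE_PSI = 0.0          # mirrors the "zero pressure drop" criterion

def pressure_test_passes(log_csv):
    """Check a (timestamp, psi) CSV log against the SOW criterion:
    held at or above test pressure, for at least MIN_HOURS, with no
    drop beyond tolerance over the test window."""
    with open(log_csv, newline="") as f:
        rows = [(datetime.fromisoformat(t), float(p))
                for t, p in csv.reader(f)]
    hours = (rows[-1][0] - rows[0][0]).total_seconds() / 3600
    drop = rows[0][1] - min(p for _, p in rows)
    return (hours >= MIN_HOURS
            and all(p >= TEST_PSI for _, p in rows)
            and drop <= TOLERANCE_PSI)

print("PASS" if pressure_test_passes("pressure_log.csv") else "FAIL")
```

Requiring the raw log as a deliverable alongside the pass/fail result gives both parties an artifact to audit if a leak appears later.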
The boundary between deployment partner and mechanical contractor must be crystal clear. Typically, the deployment partner handles rack-level cooling connections while the mechanical contractor handles facility-level infrastructure, but this must be explicitly stated.
Common SOW Mistakes to Avoid
Based on hundreds of GPU deployments, these are the most common SOW mistakes that lead to project issues:
Vague Scope Definitions
Phrases like "install and configure equipment" or "deploy GPU infrastructure" are too vague. Every service must be explicitly defined. Does "install" include unpacking? Firmware updates? BIOS configuration? If it's not written, it's not included.
Missing Acceptance Criteria
Without objective acceptance criteria, "done" becomes subjective. This leads to disputes at project completion and delayed final payments. Define measurable criteria that both parties can verify.
Undefined Liquid Cooling Boundaries
For liquid-cooled deployments, failing to define the boundary between deployment partner and facility/mechanical contractor is the single biggest source of project delays. Specify exactly where one party's responsibility ends and another's begins.
No Documentation Requirements
Assuming documentation will be provided without specifying what, when, and in what format leads to incomplete or unusable deliverables. List every document, its format, and delivery timeline.
Ignoring Change Order Process
Changes will happen. Without a defined process, every change becomes a negotiation that delays the project. Establish the process upfront so changes can be handled efficiently.
Working with Deployment Partners
A good deployment partner will help you write the SOW, not just respond to it. During the discovery phase, experienced partners can identify scope gaps, suggest realistic timelines, and help define appropriate acceptance criteria based on the specific platform and facility constraints.
The SOW should be a collaborative document that protects both parties and sets the project up for success. If a deployment partner is unwilling to help refine the SOW or pushes back on reasonable documentation and acceptance criteria, that's a red flag.
Leviathan Systems works with clients to define deployment scope during the discovery phase, ensuring SOWs are comprehensive and realistic. Our engineering team has executed deployments across H100, GB200, and GB300 platforms on Supermicro, Dell, and NVIDIA hardware with Arista switching. We can help scope your project and define SOW requirements within one business day.