How to Write a GPU Deployment Statement of Work (SOW)
Template and guidance for writing a GPU infrastructure deployment SOW covering scope, platforms, testing, documentation, and acceptance criteria for NVIDIA deployments.
A well-written Statement of Work (SOW) is the foundation of any successful GPU infrastructure deployment. Unlike traditional data center installations, GPU deployments—especially modern platforms like NVIDIA GB200 and GB300—introduce complexity that standard SOW templates simply don't address. From NVLink topology routing to liquid cooling integration boundaries, the details matter more than ever.
This guide provides a comprehensive template and practical guidance for writing GPU deployment SOWs that protect both clients and deployment partners while ensuring project success.
Why SOWs Matter More for GPU Deployments
Traditional server deployments follow well-established patterns. Rack, cable, power on, validate. The SOW can be relatively straightforward because the industry has decades of standardized practices.
GPU infrastructure is different. The platform complexity of GB200 and GB300 systems requires specifying details that traditional SOWs never addressed:
- NVLink topology routing and validation requirements
- Liquid cooling integration scope and facility boundaries
- OTDR testing requirements for high-speed optical interconnects
- Platform-specific commissioning procedures
- Clear boundaries between deployment partner, mechanical contractor, and facility operations
Without these details explicitly defined, projects face scope creep, finger-pointing when issues arise, and acceptance criteria disputes that delay go-live dates.
Essential SOW Sections for GPU Deployments
A comprehensive GPU deployment SOW should include the following sections, each with specific details relevant to the platform and scope.
1. Project Scope Summary
The scope summary provides the high-level overview that frames the entire engagement. This section should specify:
- Total rack count and configuration (e.g., "24 racks of GB200 NVL72 systems")
- NVIDIA platform(s) being deployed (H100, H200, GB200, GB300)
- Hardware vendors (Supermicro, Dell, NVIDIA reference design, etc.)
- Facility location(s) and any site-specific constraints
- High-level services included (deployment, testing, commissioning)
This section sets expectations and provides context for the detailed specifications that follow.
2. Hardware Bill of Materials (BOM)
The hardware BOM should be a complete list provided by the client, including:
- Compute nodes with exact model numbers and quantities
- Network switches (typically Arista or similar) with port counts
- All cables (power, copper network, fiber optic) with types and lengths
- Liquid cooling components if applicable (CDUs, manifolds, hoses)
- Rack infrastructure (PDUs, cable management, labeling materials)
The BOM establishes what equipment the deployment partner will be working with and helps identify any missing components before work begins.
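Some teams make this reconciliation mechanical before crews arrive on site. A minimal sketch in Python, assuming the BOM and the receiving log are CSV exports with hypothetical `part_number` and `quantity` columns:

```python
import csv
from collections import Counter

def load_quantities(path):
    """Sum quantities per part number from a CSV with
    'part_number' and 'quantity' columns (hypothetical schema)."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["part_number"]] += int(row["quantity"])
    return counts

def reconcile(bom_path, received_path):
    """Report any part that was received short against the BOM."""
    bom = load_quantities(bom_path)
    received = load_quantities(received_path)
    for part, needed in sorted(bom.items()):
        on_hand = received.get(part, 0)
        if on_hand < needed:
            print(f"SHORT {part}: need {needed}, received {on_hand}")

if __name__ == "__main__":
    reconcile("bom.csv", "receiving_log.csv")
```

Running a check like this during equipment staging turns "helps identify missing components" into a concrete gate before rack assembly begins.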
3. Services in Scope
This is where specificity matters most. Clearly define what the deployment partner will and will not do:
- Rack assembly and equipment installation
- Power cabling from PDU to equipment (specify voltage and connector types)
- Network cabling with specific cable types (DAC, AOC, fiber with connector types)
- Liquid cooling integration if applicable (connection to facility CDUs, pressure testing, leak detection)
- Testing requirements including OTDR on all fiber connections
- Commissioning procedures (power-on sequence, BIOS configuration, initial validation)
Be explicit about what's excluded. For example: "Facility power distribution upstream of rack PDUs is not included" or "Network switch configuration is client responsibility."
4. Documentation Deliverables
Documentation requirements should be specified upfront to avoid disputes at project completion. Essential deliverables include:
- Per-connection test results for all network links
- OTDR trace files for all fiber optic connections
- Cable maps showing physical and logical connectivity
- Rack elevation drawings (as-built)
- Photographs of completed installation
- Summary report with test results and any deviations from plan
Specify the format and delivery method for each deliverable. Digital documentation should be provided in standard formats (PDF, Excel, CAD files) via secure file transfer or client portal.
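Per-connection deliverables are the easiest to drop, so it can pay to require a predictable file-naming convention in the SOW and verify coverage with a script. A minimal sketch, assuming each fiber link in the cable map has a `link_id` column and each OTDR trace is saved as `<link_id>.sor` (the naming scheme is an illustrative assumption, not a standard):

```python
import csv
from pathlib import Path

def missing_traces(cable_map_csv, trace_dir):
    """Return link IDs from the cable map that have no matching
    OTDR trace file in trace_dir. Assumes one .sor trace per link,
    named after the link ID (an illustrative convention)."""
    with open(cable_map_csv, newline="") as f:
        link_ids = {row["link_id"] for row in csv.DictReader(f)}
    traces = {p.stem for p in Path(trace_dir).glob("*.sor")}
    return sorted(link_ids - traces)

for link in missing_traces("cable_map.csv", "otdr_traces/"):
    print(f"Missing OTDR trace for link {link}")
```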
5. Timeline and Milestones
Break the project into phases with clear milestones and dependencies:
- Pre-deployment planning and site survey
- Equipment staging and inspection
- Rack assembly and equipment installation
- Power and network cabling
- Liquid cooling integration (if applicable)
- Testing and validation
- Commissioning and handoff
Include realistic timeframes based on rack count and complexity. A 24-rack GB200 deployment with liquid cooling typically requires 3-4 weeks of on-site work, not including pre-deployment planning.
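It also helps to show the schedule arithmetic in the SOW rather than assert a number. A rough estimator, where every rate below is a placeholder assumption to be replaced with your partner's actual throughput on the specific platform:

```python
# Back-of-the-envelope schedule estimate. Every rate here is a
# placeholder assumption, not a benchmark; substitute figures from
# your deployment partner's history with the specific platform.
RACKS = 24
CREW_DAYS_PER_RACK = 1.2    # install + cabling + cooling, assumed
PARALLEL_WORKSTREAMS = 2    # e.g., two 2-person teams from a 4-person crew
FIXED_DAYS = 4              # staging, commissioning, handoff

calendar_days = RACKS * CREW_DAYS_PER_RACK / PARALLEL_WORKSTREAMS + FIXED_DAYS
print(f"~{calendar_days:.0f} working days (~{calendar_days / 5:.1f} weeks)")
```

With these placeholder rates the estimate lands at roughly 18 working days, consistent with the 3-4 week figure above.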
6. Crew Requirements
Specify the deployment team composition and qualifications:
- Team size (e.g., "4-person crew for duration of deployment")
- Required certifications (BICSI, manufacturer-specific training)
- On-site leadership requirements (project manager or lead technician presence)
- Background check or security clearance requirements if applicable
For complex platforms like GB200, specify that the team must have prior experience with the specific platform or have completed manufacturer training.
7. Quality Control Standards
Define the quality control process and inspection requirements:
- QC process (peer review, independent inspection, client walkthrough)
- Phase-gate inspections (e.g., rack assembly complete before cabling begins)
- Industry standards referenced (TIA-942 for data centers, BICSI for cabling)
- Defect remediation process and timeline
Referencing industry standards provides objective criteria for quality and reduces subjective disputes about workmanship.
8. Acceptance Criteria
Define exactly what "done" means. Acceptance criteria should be objective and measurable:
- All equipment installed per rack elevation drawings
- 100% of network connections tested and passing
- OTDR results within specified loss budgets for all fiber
- Liquid cooling system pressure tested and leak-free for 24 hours (if applicable)
- All nodes power on and POST successfully
- Documentation deliverables provided in specified format
Clear acceptance criteria prevent scope creep and establish when payment milestones are triggered.
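The OTDR criterion is the easiest to make fully objective, because a link's loss budget is simple arithmetic over its construction. A sketch of that arithmetic; the default coefficients reflect commonly cited TIA-568 maximums for multimode fiber at 850 nm, but treat them as assumptions and state the actual figures in the SOW:

```python
def loss_budget_db(length_km, connectors, splices,
                   fiber_db_per_km=3.5, connector_db=0.75, splice_db=0.3):
    """Maximum allowed insertion loss for a fiber link.
    Defaults reflect commonly cited TIA-568 maximums for multimode
    fiber at 850 nm; confirm the figures your SOW actually specifies."""
    return (length_km * fiber_db_per_km
            + connectors * connector_db
            + splices * splice_db)

def link_passes(measured_db, **link):
    """Compare a measured OTDR loss against the computed budget."""
    budget = loss_budget_db(**link)
    return measured_db <= budget, budget

# Example: a 30 m in-rack jumper with two connectors, no splices.
ok, budget = link_passes(0.9, length_km=0.03, connectors=2, splices=0)
print(f"budget {budget:.2f} dB -> {'PASS' if ok else 'FAIL'}")
```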
9. Change Order Process
No deployment goes exactly as planned. Define how changes will be handled:
- What constitutes a change (scope additions, equipment substitutions, timeline extensions)
- Change request submission process
- Approval authority and timeline
- Pricing methodology for change orders
A clear change order process prevents disputes and keeps projects moving when adjustments are needed.
10. Warranty and Rework
Specify warranty terms for the deployment work:
- Warranty period (typically 90 days for workmanship)
- What's covered (installation defects, cable failures, connection issues)
- What's excluded (equipment failures, client-caused damage, normal wear)
- Response time for warranty issues
Note that equipment warranties are separate and provided by manufacturers, not the deployment partner.
Platform-Specific Considerations
The SOW requirements vary significantly based on the GPU platform being deployed.
Air-Cooled Platforms (H100, H200)
Air-cooled deployments are relatively straightforward compared to liquid-cooled systems. The SOW can focus on traditional data center deployment elements:
- No liquid cooling scope or facility integration complexity
- Standard rack assembly and equipment installation
- Power and network cabling per standard practices
- OTDR testing on fiber interconnects
- Basic commissioning (power-on, POST validation)
The primary complexity in H100 and H200 deployments lies in the high-speed networking and in validating the NVLink topology, but the physical installation is less involved than for liquid-cooled platforms.
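One way to make the NVLink validation step concrete in the SOW is to require a scripted per-GPU link check at commissioning. A minimal sketch that counts active links reported by `nvidia-smi nvlink -s`; the output format varies by driver version and the expected link count depends on the platform, so treat both as assumptions to confirm:

```python
import re
import subprocess

EXPECTED_LINKS = 18  # NVLink links per GPU on H100 SXM; adjust per platform

def active_links_per_gpu():
    """Parse `nvidia-smi nvlink -s` and count active links per GPU.
    Output format varies by driver version; treat this parsing as a sketch."""
    out = subprocess.run(["nvidia-smi", "nvlink", "-s"],
                         capture_output=True, text=True, check=True).stdout
    counts, gpu = {}, None
    for line in out.splitlines():
        m = re.match(r"GPU (\d+):", line)
        if m:
            gpu = int(m.group(1))
            counts[gpu] = 0
        elif gpu is not None and "Link" in line and "inactive" not in line.lower():
            counts[gpu] += 1
    return counts

for gpu, links in sorted(active_links_per_gpu().items()):
    status = "OK" if links == EXPECTED_LINKS else "CHECK"
    print(f"GPU {gpu}: {links}/{EXPECTED_LINKS} NVLink links active [{status}]")
```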
Liquid-Cooled Platforms (GB200, GB300)
Liquid-cooled deployments require significantly more detailed SOW specifications. Critical elements include:
- CDU provision: Specify whether CDUs are client-provided or deployment partner-provided
- Facility boundary: Define the demarcation point between facility cooling infrastructure and rack-level cooling (typically at the CDU)
- Pressure testing specifications: Define test pressure, duration, and acceptance criteria (e.g., "1.5x operating pressure for 24 hours with zero pressure drop"); a verification sketch follows this list
- Thermal commissioning criteria: Specify temperature targets and validation procedures under load
- Leak detection validation: Define inspection procedures and acceptance criteria
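To make the pressure-test criterion verifiable, the SOW can require a logged test plus a scripted check of the log. A minimal sketch, assuming readings are captured as timestamp,psi CSV rows and using the illustrative 1.5x-for-24-hours criterion from above:

```python
import csv
from datetime import datetime

OPERATING_PSI = 50           # illustrative operating pressure, not a spec
TEST_PSI = 1.5 * OPERATING_PSI
MIN_HOURS = 24
TOLERANCE_PSI = 0.0          # mirrors the "zero pressure drop" criterion

def pressure_test_passes(log_csv):
    """Check a (timestamp, psi) CSV log against the SOW criterion:
    held at or above test pressure, for at least MIN_HOURS, with no
    drop beyond tolerance over the test window."""
    with open(log_csv, newline="") as f:
        rows = [(datetime.fromisoformat(t), float(p))
                for t, p in csv.reader(f)]
    hours = (rows[-1][0] - rows[0][0]).total_seconds() / 3600
    drop = rows[0][1] - min(p for _, p in rows)
    return (hours >= MIN_HOURS
            and all(p >= TEST_PSI for _, p in rows)
            and drop <= TOLERANCE_PSI)

print("PASS" if pressure_test_passes("pressure_log.csv") else "FAIL")
```

Requiring the raw log as a deliverable alongside the pass/fail result gives both parties an artifact to audit if a leak appears later.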
The boundary between deployment partner and mechanical contractor must be crystal clear. Typically, the deployment partner handles rack-level cooling connections while the mechanical contractor handles facility-level infrastructure, but this must be explicitly stated.
Common SOW Mistakes to Avoid
Based on hundreds of GPU deployments, these are the most common SOW mistakes that lead to project issues:
Vague Scope Definitions
Phrases like "install and configure equipment" or "deploy GPU infrastructure" are too vague. Every service must be explicitly defined. Does "install" include unpacking? Firmware updates? BIOS configuration? If it's not written, it's not included.
Missing Acceptance Criteria
Without objective acceptance criteria, "done" becomes subjective. This leads to disputes at project completion and delayed final payments. Define measurable criteria that both parties can verify.
Undefined Liquid Cooling Boundaries
For liquid-cooled deployments, failing to define the boundary between deployment partner and facility/mechanical contractor is the single biggest source of project delays. Specify exactly where one party's responsibility ends and another's begins.
No Documentation Requirements
Assuming documentation will be provided without specifying what, when, and in what format leads to incomplete or unusable deliverables. List every document, its format, and delivery timeline.
Ignoring Change Order Process
Changes will happen. Without a defined process, every change becomes a negotiation that delays the project. Establish the process upfront so changes can be handled efficiently.
Working with Deployment Partners
A good deployment partner will help you write the SOW, not just respond to it. During the discovery phase, experienced partners can identify scope gaps, suggest realistic timelines, and help define appropriate acceptance criteria based on the specific platform and facility constraints.
The SOW should be a collaborative document that protects both parties and sets the project up for success. If a deployment partner is unwilling to help refine the SOW or pushes back on reasonable documentation and acceptance criteria, that's a red flag.
Leviathan Systems works with clients to define deployment scope during the discovery phase, ensuring SOWs are comprehensive and realistic. Our engineering team has executed deployments across H100, GB200, and GB300 platforms on Supermicro, Dell, and NVIDIA hardware with Arista switching. We can help scope your project and define SOW requirements within one business day.