LEVIATHAN SYSTEMS

Field Guide_

GPU Cluster Commissioning Workflow_

Commissioning is what separates a powered-on cluster from an accepted one. This is the five-level (Cx L1–L5) workflow — factory witness, pre-functional verification, site acceptance, integrated systems testing under load, and production sign-off — with the gate that must pass before each next level and the as-built package the owner signs at the end.

L1

Factory Witness & Component Verification

FAT · OEM MOP

  • Witness or review factory acceptance test (FAT) records for racks, CDUs, PDUs, and switches before they ship.
  • Verify model, firmware, and configuration against the bill of materials and the design one-line — catch substitutions before they land on site.
  • Confirm impact/tilt indicators intact on receipt; document any transit damage before unboxing.
  • Gate: every major component verified against design and free of transit damage before it enters the white space.
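The BOM check above lends itself to a small script. This is a minimal sketch, assuming the bill of materials and the receiving records can each be keyed by serial number; the `Component` fields, the serials, and the model and firmware strings are hypothetical placeholders, not values from any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One line of the BOM or one received item (hypothetical schema)."""
    serial: str
    model: str
    firmware: str


def find_substitutions(bom, received):
    """Return human-readable mismatches between the BOM and what arrived."""
    issues = []
    for serial, expected in bom.items():
        actual = received.get(serial)
        if actual is None:
            issues.append(f"{serial}: missing from shipment")
            continue
        if actual.model != expected.model:
            issues.append(
                f"{serial}: model {actual.model} does not match BOM {expected.model}"
            )
        if actual.firmware != expected.firmware:
            issues.append(
                f"{serial}: firmware {actual.firmware} does not match BOM {expected.firmware}"
            )
    # Anything that arrived but is not on the BOM is also a finding.
    for serial in sorted(received.keys() - bom.keys()):
        issues.append(f"{serial}: not on BOM")
    return issues


# Illustrative data only.
bom = {
    "CDU-001": Component("CDU-001", "cdu-a", "1.4.2"),
    "PDU-007": Component("PDU-007", "pdu-b", "2.0.0"),
}
received = {
    "CDU-001": Component("CDU-001", "cdu-a", "1.4.2"),
    "PDU-007": Component("PDU-007", "pdu-c", "2.0.0"),  # substituted model
}

for issue in find_substitutions(bom, received):
    print(issue)
```

An empty result is what the L1 gate expects; any line printed is a substitution or shortage to resolve before the component enters the white space.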
L2

Component / Pre-Functional Verification (installed, de-energized)

Pre-functional checklists

  • Static checks on every installed element: rack level and anchored, trays seated, grounding bonded (<1 Ω to the common bonding network).
  • Power chain verified de-energized: whip landings, phase mapping, breaker sizing, PDU seating, polarity.
  • Cooling loop mechanically complete: quick-disconnects seated, manifolds routed, leak detection installed but not yet commissioned.
  • Gate: pre-functional checklist 100% complete and signed before any energization.
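The L2 gate is a strict all-or-nothing check, which can be sketched as follows. The 1 Ω bonding limit comes from the checklist above; the `RackPrecheck` fields and the sample values are assumptions for illustration, and a real checklist would carry many more items (phase mapping, breaker sizing, quick-disconnect seating, and so on).

```python
from dataclasses import dataclass

MAX_BOND_OHMS = 1.0  # from the checklist: bond must measure below 1 ohm


@dataclass
class RackPrecheck:
    """Static, de-energized checks for one rack (hypothetical subset)."""
    rack_id: str
    level_and_anchored: bool
    trays_seated: bool
    bond_ohms: float
    signed_by: str = ""  # stays empty until the checklist is signed


def blocking_items(rack):
    """List every reason this rack is not ready for energization."""
    items = []
    if not rack.level_and_anchored:
        items.append("rack not level and anchored")
    if not rack.trays_seated:
        items.append("trays not seated")
    if not rack.bond_ohms < MAX_BOND_OHMS:
        items.append(
            f"bond resistance {rack.bond_ohms} ohm is not below {MAX_BOND_OHMS} ohm"
        )
    if not rack.signed_by:
        items.append("checklist not signed")
    return items


def clear_to_energize(racks):
    """The gate passes only when no rack has a blocking item."""
    return all(not blocking_items(rack) for rack in racks)


# Illustrative data only.
racks = [
    RackPrecheck("R01", True, True, 0.4, signed_by="field lead"),
    RackPrecheck("R02", True, True, 1.3),  # high bond reading, unsigned
]
print(clear_to_energize(racks))
for rack in racks:
    for item in blocking_items(rack):
        print(rack.rack_id, item)
```

Listing the blocking items per rack, rather than returning a single pass/fail, keeps the output usable as a punch list for the crew closing out L2.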
L3

Functional / Site Acceptance Testing (energized, per system)

SAT · ATP

  • Energize per the OEM power-on MOP; verify all PSUs healthy and redundancy real (live feed pull-test).
  • Commission the liquid loop: flush/fill, pressure and leak test to OEM criteria with zero pressure decay, verify CDU flow and supply/return temperatures.
  • POST every compute and switch tray; resolve GPU/NVLink/port faults before fabric bring-up.
  • Certify the cable plant: fiber insertion/return-loss tested, labeled to TIA-606-C, as-built map complete.
  • Gate: every subsystem passes its site acceptance test individually before integration.
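The pressure-hold criterion in the liquid-loop step can be evaluated from logged gauge readings. The sketch below assumes readings are recorded as (minutes elapsed, pressure in kPa) pairs over the hold window; the test pressure, hold duration, and any allowance for gauge resolution are set by the OEM criteria, so the numbers here are placeholders. It also ignores temperature drift, which can move a sealed loop's pressure without any leak.

```python
def pressure_decay_kpa(readings):
    """Largest drop below the starting pressure seen during the hold window.

    `readings` is a list of (minutes_elapsed, pressure_kpa) pairs, oldest first.
    """
    if not readings:
        raise ValueError("no readings recorded")
    start = readings[0][1]
    return max(start - pressure for _, pressure in readings)


def passes_hold_test(readings, tolerance_kpa=0.0):
    """Pass when decay stays within tolerance; 0.0 means zero decay allowed."""
    return pressure_decay_kpa(readings) <= tolerance_kpa


# Illustrative data only: a loop that holds, and one that sags.
tight_loop = [(0, 300.0), (30, 300.0), (60, 300.0)]
leaky_loop = [(0, 300.0), (30, 300.0), (60, 299.5)]

print(passes_hold_test(tight_loop))
print(passes_hold_test(leaky_loop))
print(pressure_decay_kpa(leaky_loop))
```

Keeping the raw readings alongside the pass/fail result gives the L5 test report something to show besides a checkbox.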
L4

Integrated Systems Testing (the whole cluster, under load)

IST · failure-mode testing

  • Fabric bring-up: 100% of ports link at rate with no flapping; NCCL all-reduce and bandwidth tests across the full domain land within the expected range.
  • Thermal soak / burn-in under sustained load; verify no thermal trips and stable temperatures through the soak window.
  • Failure-mode tests: pull a power feed, fail a CDU pump, drop a link — verify redundancy and alarms behave as designed.
  • Correlate power draw, coolant ΔT, and compute load against the commissioning baseline.
  • Gate: the cluster sustains design load and survives the failure-mode matrix without unplanned trips.
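Two of the L4 checks above reduce to short formulas. The first converts an all-reduce result into bus bandwidth using the convention documented for the nccl-tests benchmarks, busbw = algbw × 2(n − 1)/n. The second estimates the heat carried by the coolant loop from flow and ΔT, Q = ṁ · c_p · ΔT, so it can be compared with measured electrical draw. This is a sketch under stated assumptions: the default density and specific heat are for plain water, while a glycol mix would have different properties, and all sample numbers are illustrative.

```python
def allreduce_bus_bandwidth(algbw_gb_s, n_ranks):
    """Bus bandwidth for all-reduce: algbw * 2 * (n - 1) / n."""
    if n_ranks < 2:
        raise ValueError("all-reduce needs at least two ranks")
    return algbw_gb_s * 2 * (n_ranks - 1) / n_ranks


def coolant_heat_kw(flow_l_min, delta_t_c, density_kg_l=1.0, cp_kj_kg_k=4.18):
    """Heat carried by the loop in kW: mass flow (kg/s) * cp * delta T.

    Defaults assume water; pass the actual coolant properties otherwise.
    """
    mass_flow_kg_s = flow_l_min * density_kg_l / 60.0
    return mass_flow_kg_s * cp_kj_kg_k * delta_t_c


def liquid_capture_fraction(electrical_kw, flow_l_min, delta_t_c):
    """Share of electrical load showing up as heat in the liquid loop."""
    return coolant_heat_kw(flow_l_min, delta_t_c) / electrical_kw


# Illustrative numbers only.
print(allreduce_bus_bandwidth(algbw_gb_s=100.0, n_ranks=8))
print(round(coolant_heat_kw(flow_l_min=120.0, delta_t_c=10.0), 1))
print(round(liquid_capture_fraction(100.0, 120.0, 10.0), 3))
```

A capture fraction that drifts from the commissioning baseline at the same compute load is a prompt to look at flow, sensors, or air-side heat rejection before blaming the workload.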
L5

Hand-Off & Production Sign-Off

As-built · ATP sign-off

  • Deliver the as-built package: cable map, test reports (power, fiber, fabric, thermal), commissioning baseline, and photos.
  • Walk the open-items / punch list to closure or documented acceptance with the owner.
  • Provide the labeling-scheme key and operational baseline so the operations team can run and troubleshoot the cluster cold.
  • Owner sign-off on the acceptance test plan — the formal transfer from deployment to production.

Gate at Each Level_

| Level | Pass Gate |
|---|---|
| L1: Factory | Components verified to BOM/design; no transit damage |
| L2: Pre-functional | Static checklist 100% complete + signed before energization |
| L3: Site acceptance | Each subsystem (power, cooling, POST, cabling) passes individually |
| L4: Integrated | Full-load soak + failure-mode matrix with no unplanned trips |
| L5: Sign-off | As-built package delivered + punch list closed + owner ATP sign-off |
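The gates are sequential: no level starts until every earlier gate has passed. A minimal sketch of that rule, assuming gate status is tracked as a simple mapping from level name to a pass flag (the mapping and function names are hypothetical, not part of any commissioning tool):

```python
LEVELS = ["L1", "L2", "L3", "L4", "L5"]


def may_start(level, gate_passed):
    """A level may start only when every earlier gate has passed."""
    index = LEVELS.index(level)
    return all(gate_passed.get(prev, False) for prev in LEVELS[:index])


def next_open_level(gate_passed):
    """Return the first level whose gate has not passed, or None if all have."""
    for level in LEVELS:
        if not gate_passed.get(level, False):
            return level
    return None


# Illustrative status: L1 and L2 are signed off, L3 is in progress.
status = {"L1": True, "L2": True, "L3": False}

print(next_open_level(status))
print(may_start("L3", status))
print(may_start("L4", status))
```

Treating a missing entry as "not passed" keeps the check conservative: a level with no recorded result blocks everything after it.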

Cx level naming varies by facility and standard (ASHRAE / BICSI / owner Cx plans). This is a working field reference; the governing document is always the project-specific commissioning plan. Leviathan executes L1–L5 on live GB200 and GB300 deployments.

Questions_

What are the levels of GPU data center commissioning?

Commissioning (Cx) for a GPU cluster runs five levels: L1 factory witness and component verification, L2 pre-functional verification of installed-but-de-energized systems, L3 functional/site acceptance testing per energized subsystem, L4 integrated systems testing of the whole cluster under load including failure-mode tests, and L5 hand-off with the as-built package and owner sign-off. Each level has a gate that must pass before the next begins.

When is a GPU cluster considered commissioned and accepted?

A cluster is commissioned and accepted when five conditions are met: it has sustained design load through a burn-in soak; it has passed the failure-mode matrix (power-feed pull, CDU pump failure, link drop) without unplanned trips; every subsystem test result is documented; the punch list is closed; and the owner has signed the acceptance test plan. Commissioning is not complete until that formal sign-off transfers the cluster from deployment to production.

Who commissions GPU clusters and provides acceptance sign-off?

Leviathan Systems commissions AI-scale GPU clusters across the United States through the full Cx L1–L5 workflow — power-on, liquid-loop flush/fill/leak-test, POST, fabric validation with NCCL/bandwidth testing, thermal burn-in, and failure-mode testing — delivering the as-built package and acceptance test plan sign-off.
