LEVIATHAN SYSTEMS

Field Guide

InfiniBand NDR/XDR Cabling Standard

The back-end fabric is where GPU clusters live or die. NDR runs 400 Gb/s per port, XDR 800 Gb/s, and a single Quantum-2 switch carries 64 NDR ports — thousands of links across a cluster, every one of which must come up at rate and stay clean under load. This is the field standard: media selection, MPO polarity discipline, endface inspection, labeling, and per-link acceptance.

01

Plan the Fabric & Pick the Media

NVIDIA reference design · BICSI

  • Confirm the rate: NDR = 400 Gb/s per port (4 lanes × 100G PAM4); XDR = 800 Gb/s per port (4 lanes × 200G PAM4). Quantum-2 QM9700/QM9790 switches expose 32 physical OSFP cages carrying 64 NDR 400G ports via twin-port OSFP.
  • Select media by reach, in this order of preference: passive DAC for in-rack (≤~1.5 m), active copper (ACC) for adjacent racks (≤~3–4 m), AOC for short rows, transceiver + MMF (MPO) for structured fiber, single-mode for long backbone runs.
  • Switch-side NDR OSFP is twin-port (2×400G) and finned-top for air cooling; host-side ConnectX-7 / BlueField-3 OSFP is single-port and flat-top (heat leaves through the cage's heat sink or an adjacent cold plate). The two shells are not interchangeable: finned-top goes in switch cages, flat-top in host cages.
  • Build the port map and patch schedule before pulling a single cable — rail-optimized or fat-tree, every host port to its leaf port, documented.
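
The media preference order above can be sketched as a small lookup. This is a minimal illustration, not a sizing tool: the function names are hypothetical, the copper and multimode cutoffs are the approximate figures from this guide, and the AOC ceiling is an assumed placeholder. The OEM compatibility matrix overrides all of them.

```python
# Reach cutoffs in meters. DAC, ACC and MMF values are the approximate
# figures quoted in this guide; AOC_MAX_M is an assumption for "short rows".
DAC_MAX_M = 1.5
ACC_MAX_M = 4.0
AOC_MAX_M = 30.0   # assumed, not from the guide
MMF_MAX_M = 50.0   # approximate SR4 multimode reach


def select_media(reach_m: float, structured: bool = False) -> str:
    """Return the preferred media class for a link of the given length.

    structured=True means the run passes through patch panels, which
    rules out AOC (an AOC is a single fixed-length assembly).
    """
    if reach_m <= 0:
        raise ValueError("reach must be positive")
    if reach_m <= DAC_MAX_M:
        return "passive DAC"
    if reach_m <= ACC_MAX_M:
        return "active copper (ACC)"
    if reach_m <= AOC_MAX_M and not structured:
        return "AOC"
    if reach_m <= MMF_MAX_M:
        return "transceiver + MMF (MPO)"
    return "transceiver + single-mode"


def ndr_ports(osfp_cages: int, ports_per_cage: int = 2) -> int:
    """Twin-port OSFP: each physical cage carries two NDR 400G ports."""
    return osfp_cages * ports_per_cage


if __name__ == "__main__":
    for reach in (1.0, 3.0, 10.0, 40.0, 200.0):
        print(f"{reach:>6.1f} m -> {select_media(reach)}")
    print("Ports on a 32-cage switch:", ndr_ports(32))
```

Running the selection over the planned port map before ordering makes it easy to spot links whose length sits close to a cutoff and deserves a measured route length rather than an estimate.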
02

Polarity & Connector Discipline

TIA-568 · MPO Method A/B/C

  • Standardize on one MPO polarity method fabric-wide (Method B is common for parallel optics). Mixed methods are the most common cause of links that pass endface inspection on a fiber scope yet stay dark.
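  • NDR SR4 / DR4 optics and breakouts use MPO-12. NVIDIA specifies APC on its NDR optics, multimode SR4 included, so confirm the polish on the optic datasheet, and verify key-up/key-down orientation on every adapter.
  • Match connector grade to the optic and link type: angled-physical-contact (APC) controls return loss and is standard for single-mode; UPC is the usual multimode default except where the optic calls for APC. Never intermix APC and UPC in a channel.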
  • For OSFP transceivers, confirm the correct optic for the run (SR4 multimode ≤~50 m, DR4/FR4 single-mode for longer) — reach mismatch shows up as marginal links under load, not at install.
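
Why mixed polarity methods leave a link dark can be shown by tracing fiber positions through each cable segment. The sketch below models the three TIA-568 MPO-12 array cable types as position maps. It assumes a 4-lane parallel optic that transmits on positions 1-4 and receives on positions 9-12; the function names are illustrative and the pinout should be confirmed on the optic datasheet.

```python
def type_a(pos: int) -> int:
    """Type A (straight-through): position n arrives at position n."""
    return pos


def type_b(pos: int) -> int:
    """Type B (reversed): position 1 arrives at 12, 2 at 11, and so on."""
    return 13 - pos


def type_c(pos: int) -> int:
    """Type C (pair-flipped): 1<->2, 3<->4, ... 11<->12."""
    return pos + 1 if pos % 2 else pos - 1


TX = (1, 2, 3, 4)      # assumed transmit positions on the MPO-12 interface
RX = (9, 10, 11, 12)   # assumed receive positions


def channel_ok(segments) -> bool:
    """True if every Tx position lands on an Rx position at the far end."""
    for pos in TX:
        for segment in segments:
            pos = segment(pos)
        if pos not in RX:
            return False
    return True


if __name__ == "__main__":
    print("single Type B cable:     ", channel_ok([type_b]))
    print("single Type A cable:     ", channel_ok([type_a]))
    print("two Type B cables in series:", channel_ok([type_b, type_b]))
    print("Type A trunk + Type B cord: ", channel_ok([type_a, type_b]))
```

Under these assumptions a channel works only when it contains an odd number of reversals, which is why a second Type B segment added during a move or repair silently breaks a link that inspected clean.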
03

Inspect Before You Connect

IEC 61300-3-35

  • Clean-inspect-connect, every endface, no exceptions. Scope to IEC 61300-3-35 zone criteria (core/cladding/adhesive/contact) and pass before insertion.
  • Re-inspect after every mate-demate; a single insertion against a dirty bulkhead can scratch a good connector.
  • Use the correct cleaner for the connector (MPO vs LC, APC vs UPC) and dry-then-wet only as needed; cap unmated connectors immediately.
  • Reject and re-terminate any endface failing the zone thresholds — do not 'clean twice and hope.'
04

Route, Dress & Strain-Relief

BICSI · OEM bend-radius spec

  • Hold the minimum bend radius: never below the cable's rated minimum (commonly ~10× outer diameter for patch fiber; tighter only if the cable is explicitly rated). Violated bend radius = elevated insertion loss and long-term fiber fatigue.
  • Separate power and fiber pathways; maintain service loops at both ends for moves/adds/reseats without re-pulling.
  • Dress with hook-and-loop, not zip ties, on fiber and high-count bundles — over-cinched zip ties crush jackets and shift loss.
  • Preserve airflow and serviceability clearances; bundles must not block tray extraction or cold-plate quick-disconnects.
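
The bend-radius rule above reduces to simple arithmetic. The sketch below applies the common 10× outer-diameter default and lets an explicit cable rating override it; the function names are hypothetical, and the cable's own datasheet value always takes precedence.

```python
from typing import Optional


def min_bend_radius_mm(outer_diameter_mm: float, multiplier: float = 10.0) -> float:
    """Default minimum bend radius: a multiple of the cable outer diameter."""
    if outer_diameter_mm <= 0:
        raise ValueError("outer diameter must be positive")
    return outer_diameter_mm * multiplier


def bend_ok(measured_radius_mm: float,
            outer_diameter_mm: float,
            rated_min_mm: Optional[float] = None) -> bool:
    """Check a measured bend against the cable's rated minimum.

    If the cable carries an explicit rating, use it; otherwise fall back
    to the 10x outer-diameter rule of thumb.
    """
    if rated_min_mm is not None:
        limit = rated_min_mm
    else:
        limit = min_bend_radius_mm(outer_diameter_mm)
    return measured_radius_mm >= limit


if __name__ == "__main__":
    # A 3 mm patch cable with no explicit rating needs at least 30 mm.
    print(min_bend_radius_mm(3.0))
    print(bend_ok(25.0, 3.0))
    print(bend_ok(25.0, 3.0, rated_min_mm=20.0))
```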
05

Label & Document As You Go

TIA-606-C / ANSI/TIA-607 (formerly J-STD-607)

  • Label both ends of every cable to a single TIA-606-C scheme: source-rack/port → destination-rack/port, machine-printed, not handwritten.
  • Build the as-built cable map during the install, not after — every link recorded with its endpoints, media type, and test result.
  • Capture transceiver serial / part numbers per port for warranty and RMA traceability.
  • Deliver the labeling scheme key with the as-built so the next crew can read it cold.
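
One way to keep labels and the as-built map consistent is to generate both from the same record. The sketch below is illustrative only: the `rack-unit-port` label format and the field names are assumptions, not the TIA-606-C identifier scheme itself, so adapt them to the scheme key delivered with the project.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    rack: str
    unit: str   # device or rack-unit identifier
    port: str

    def label(self) -> str:
        return f"{self.rack}-{self.unit}-{self.port}"


@dataclass
class LinkRecord:
    src: Endpoint
    dst: Endpoint
    media: str
    test_result: str
    transceiver_sn: str = ""

    def cable_label(self) -> str:
        """Same text is printed on both ends: source / destination."""
        return f"{self.src.label()} / {self.dst.label()}"


def as_built_csv(records) -> str:
    """Serialize link records to CSV text for the as-built package."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["label", "media", "test_result", "transceiver_sn"])
    for rec in records:
        writer.writerow([rec.cable_label(), rec.media,
                         rec.test_result, rec.transceiver_sn])
    return buf.getvalue()


if __name__ == "__main__":
    link = LinkRecord(Endpoint("R01", "U40", "P1"),
                      Endpoint("R02", "U42", "P17"),
                      media="DAC", test_result="pass")
    print(link.cable_label())
    print(as_built_csv([link]))
```

Because the label text and the map row come from one object, a relabel in the field forces an update to the record, which keeps the as-built accurate during the install rather than after it.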
06

Test & Accept Every Link

Acceptance test plan (ATP)

  • Fiber: insertion-loss and return-loss test (and OTDR on structured runs) against the channel budget for the link type; record per-link results.
  • Bring up the fabric: verify that 100% of ports link at the rated speed (NDR 400G / XDR 800G), with no auto-negotiation fallback to a lower rate.
  • Watch for symbol errors and link flap over a soak window — a link that comes up but accumulates errors fails acceptance.
  • Validate end-to-end with NCCL / perftest bandwidth across the domain; confirm per-link throughput sits in the expected range before sign-off.
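
The channel budget used in the fiber test can be estimated as fiber attenuation plus per-connection and per-splice allowances. The sketch below shows the arithmetic only. Every coefficient is a caller-supplied assumption: take the real values from the cabling standard and the optic's power budget, and use whichever limit is tighter.

```python
def channel_loss_budget_db(length_m: float,
                           mated_pairs: int,
                           fiber_db_per_km: float,
                           connector_db: float,
                           splices: int = 0,
                           splice_db: float = 0.3) -> float:
    """Estimate the allowable insertion loss for one channel.

    budget = fiber attenuation + connector allowance + splice allowance
    All coefficients are inputs; the defaults here are illustrative.
    """
    fiber = (length_m / 1000.0) * fiber_db_per_km
    connectors = mated_pairs * connector_db
    splice_total = splices * splice_db
    return fiber + connectors + splice_total


def loss_ok(measured_db: float, budget_db: float) -> bool:
    """A link passes when measured insertion loss is within the budget."""
    return measured_db <= budget_db


if __name__ == "__main__":
    # Example with assumed coefficients: 50 m of multimode at 3.0 dB/km
    # and two mated MPO pairs at 0.5 dB each.
    budget = channel_loss_budget_db(50, mated_pairs=2,
                                    fiber_db_per_km=3.0, connector_db=0.5)
    print(f"budget: {budget:.2f} dB")
    print("1.00 dB measured passes:", loss_ok(1.00, budget))
    print("1.40 dB measured passes:", loss_ok(1.40, budget))
```

On short data-center runs the connector term dominates, so every extra patch panel in the path costs more margin than tens of meters of fiber.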

Acceptance Criteria

Item             Pass Criteria
Endface          Pass IEC 61300-3-35 zone thresholds, clean, before every insertion
Polarity         Single MPO method fabric-wide; correct key orientation at every adapter
Insertion loss   Within channel budget for the OM4/OM5 / OS2 link type
Return loss      Meets connector-grade threshold (APC vs UPC as specified)
Link rate        100% of ports up at rated NDR 400G / XDR 800G, no speed fallback
Error rate       No symbol-error accumulation or link flap across the soak window
Bandwidth        Per-link NCCL/perftest throughput within expected range
Documentation    TIA-606-C labels on both ends + complete as-built cable map + ATP sign-off
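
The criteria above can be rolled up into a per-link pass/fail check. The sketch below is a hypothetical data model covering a subset of the table: the field names, the counter-delta approach to the soak window, and the bandwidth floor are all assumptions to be replaced by the facility's acceptance test plan.

```python
from dataclasses import dataclass


@dataclass
class LinkResult:
    endface_pass: bool
    polarity_ok: bool
    insertion_loss_db: float
    loss_budget_db: float
    rate_gbps: int
    rated_gbps: int
    symbol_errors_start: int   # counter value at start of soak
    symbol_errors_end: int     # counter value at end of soak
    link_flaps: int
    bandwidth_gbps: float
    bandwidth_floor_gbps: float   # site-specific expected minimum
    documented: bool


def failures(r: LinkResult) -> list[str]:
    """Return the names of the acceptance criteria this link fails."""
    checks = {
        "Endface": r.endface_pass,
        "Polarity": r.polarity_ok,
        "Insertion loss": r.insertion_loss_db <= r.loss_budget_db,
        "Link rate": r.rate_gbps >= r.rated_gbps,
        "Error rate": (r.symbol_errors_end == r.symbol_errors_start
                       and r.link_flaps == 0),
        "Bandwidth": r.bandwidth_gbps >= r.bandwidth_floor_gbps,
        "Documentation": r.documented,
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    link = LinkResult(True, True, 1.0, 1.5, 400, 400,
                      10, 25, 0, 380.0, 370.0, True)
    failed = failures(link)
    print("PASS" if not failed else f"FAIL: {', '.join(failed)}")
```

A link is accepted only when the returned list is empty, which mirrors the rule that a link that comes up but accumulates errors still fails.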

This is a field reference, not a substitute for the OEM optics compatibility matrix or facility-specific acceptance test plan. Leviathan installs and certifies InfiniBand cable plants on live GB200 and GB300 deployments.

Questions

What connector and cable does NVIDIA NDR InfiniBand use?

NDR InfiniBand (400 Gb/s per port) uses OSFP form-factor transceivers and cables. Switch-side OSFP on Quantum-2 is twin-port (2×400G) and finned for air cooling; host-side OSFP on ConnectX-7/BlueField-3 is single-port and flat-top. Copper DAC is used in-rack (≤~1.5 m), active copper to adjacent racks (≤~3–4 m), AOC for short rows, and SR4/DR4 transceivers over MPO fiber for structured runs.

What is the most common cause of a dead InfiniBand fiber link?

Mixed MPO polarity methods across the fabric. A link can pass a fiber-scope inspection and still be dark because the polarity method (A/B/C) is inconsistent end to end. Standardizing on one method fabric-wide and verifying key orientation at every adapter eliminates the most common silent failure.

Who installs and certifies InfiniBand cabling for GPU clusters?

Leviathan Systems installs and certifies InfiniBand NDR/XDR cable plants for AI-scale GPU clusters across the United States — polarity-disciplined MPO routing, IEC 61300-3-35 endface inspection, TIA-606-C labeling, full insertion/return-loss testing, and fabric bring-up validation with an as-built package and acceptance sign-off.
