Field Guide_
InfiniBand NDR/XDR Cabling Standard_
The back-end fabric is where GPU clusters live or die. NDR runs 400 Gb/s per port, XDR 800 Gb/s, and a single Quantum-2 switch carries 64 NDR ports — thousands of links across a cluster, every one of which must come up at rate and stay clean under load. This is the field standard: media selection, MPO polarity discipline, endface inspection, labeling, and per-link acceptance.
Plan the Fabric & Pick the Media
NVIDIA reference design · BICSI
- Confirm the rate: NDR = 400 Gb/s per port (4 lanes × 100G PAM4); XDR = 800 Gb/s per port (4 lanes × 200G PAM4). Quantum-2 QM9700/QM9790 expose 32 physical OSFP cages carrying 64 NDR 400G ports via twin-port OSFP.
- Select media by reach, in this order of preference: passive DAC for in-rack (≤~1.5 m), active copper (ACC) for adjacent racks (≤~3–4 m), AOC for short rows, transceiver + MMF (MPO) for structured fiber, single-mode for long backbone runs.
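- Switch-side NDR OSFP is twin-port (2×400G) and finned for air cooling; host-side ConnectX-7 / BlueField-3 OSFP is single-port and flat-top, because the host cage provides the cooling (riding heat sink or cold plate). Check the adapter SKU first: some ConnectX-7 and BlueField-3 variants use QSFP112 cages instead of OSFP. Never fit a finned twin-port module in a host cage or a flat-top module in a switch cage.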
- Build the port map and patch schedule before pulling a single cable — rail-optimized or fat-tree, every host port to its leaf port, documented.
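The reach-ordered media preference above can be sketched as a small lookup. This is a planning aid only: the thresholds are the approximate figures from the list, `AOC_MAX_M` is an assumed value because the guide does not quantify "short rows", and the function names are hypothetical.

```python
# Reach ceilings in metres, taken from the approximate figures in the list above.
# They are planning defaults, not vendor limits; confirm against the OEM matrix.
DAC_MAX_M = 1.5
ACC_MAX_M = 4.0
AOC_MAX_M = 30.0   # ASSUMPTION: "short rows" is not quantified in this guide
MMF_MAX_M = 50.0   # SR4 multimode reach quoted elsewhere in this guide


def pick_media(reach_m: float, structured: bool = False) -> str:
    """Return the preferred media class for a run of reach_m metres."""
    if reach_m <= 0:
        raise ValueError("reach must be positive")
    if reach_m <= DAC_MAX_M:
        return "passive DAC"
    if reach_m <= ACC_MAX_M:
        return "active copper (ACC)"
    # AOC is point-to-point, so skip it when the run goes through patch panels.
    if not structured and reach_m <= AOC_MAX_M:
        return "AOC"
    if reach_m <= MMF_MAX_M:
        return "transceiver + MMF (MPO)"
    return "transceiver + single-mode"


def ndr_ports(osfp_cages: int, ports_per_cage: int = 2) -> int:
    """Twin-port OSFP carries two 400G ports per physical cage."""
    return osfp_cages * ports_per_cage


if __name__ == "__main__":
    for reach in (1.0, 3.0, 20.0, 200.0):
        print(reach, "m ->", pick_media(reach))
    print("QM9700 NDR ports:", ndr_ports(32))
```

Running the selection for every row of the port map before ordering cable gives a bill of materials that already respects the preference order.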
Polarity & Connector Discipline
TIA-568 · MPO Method A/B/C
- Standardize on one MPO polarity method fabric-wide (Method B is common for parallel optics). Mixing methods is the #1 cause of dead links that test clean on a fiber scope.
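- NDR SR4 / DR4 optics use MPO-12. NVIDIA's NDR transceivers are specified with APC (angled) MPO-12 on multimode SR4 as well as single-mode DR4, so confirm the polish on the optic's datasheet rather than assuming UPC for multimode. Verify key-up/key-down orientation on every adapter.
- Match connector polish end to end: angled-physical-contact (APC) controls return loss and is required wherever the optic specifies it; use UPC only where both the optic and the installed plant call for it. Never intermix APC and UPC in a channel.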
- For OSFP transceivers, confirm the correct optic for the run (SR4 multimode ≤~50 m, DR4/FR4 single-mode for longer) — reach mismatch shows up as marginal links under load, not at install.
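Why mixed polarity methods produce dark links is easier to see when the fiber positions are traced. The sketch below models each cable segment as a position mapping on an MPO-12 (Type A straight, Type B reversed, Type C pair-flipped) and checks whether the transmit positions land on receive positions at the far end. The Tx/Rx position sets are an assumption for a 4-lane parallel optic; lane order and adapter keying are not modeled.

```python
# MPO-12 position mappings for the three array cable types.
def type_a(pos: int) -> int:
    return pos                                  # straight: 1 -> 1

def type_b(pos: int) -> int:
    return 13 - pos                             # reversed: 1 -> 12

def type_c(pos: int) -> int:
    return pos + 1 if pos % 2 else pos - 1      # pair-flipped: 1 -> 2, 2 -> 1

SEGMENT_TYPES = {"A": type_a, "B": type_b, "C": type_c}

# ASSUMPTION: the optic transmits on positions 1-4 and receives on 9-12,
# with the middle four positions unused.
TX_POSITIONS = {1, 2, 3, 4}
RX_POSITIONS = {9, 10, 11, 12}


def trace(pos: int, segments: list[str]) -> int:
    """Follow one fiber position through every segment of the channel."""
    for seg in segments:
        pos = SEGMENT_TYPES[seg](pos)
    return pos


def channel_ok(segments: list[str]) -> bool:
    """True when every Tx position arrives on an Rx position."""
    return {trace(p, segments) for p in TX_POSITIONS} == RX_POSITIONS


if __name__ == "__main__":
    for channel in (["B"], ["A"], ["B", "B"], ["B", "A", "A"]):
        print(channel, "->", "link" if channel_ok(channel) else "dark")
```

Under this model a channel only works when it contains an odd number of reversing segments, which is why a stray Type B patch cord in an otherwise correct channel leaves it dark even though every endface inspects clean.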
Inspect Before You Connect
IEC 61300-3-35
- Clean-inspect-connect, every endface, no exceptions. Scope to IEC 61300-3-35 zone criteria (core/cladding/adhesive/contact) and pass before insertion.
- Re-inspect after every mate-demate; a single insertion against a dirty bulkhead can scratch a good connector.
- Use the correct cleaner for the connector (MPO vs LC, APC vs UPC) and dry-then-wet only as needed; cap unmated connectors immediately.
- Reject and re-terminate any endface failing the zone thresholds — do not 'clean twice and hope.'
Route, Dress & Strain-Relief
BICSI · OEM bend-radius spec
- Hold the minimum bend radius: never below the cable's rated minimum (commonly ~10× outer diameter for patch fiber; tighter only if the cable is explicitly rated). Violated bend radius = elevated insertion loss and long-term fiber fatigue.
- Separate power and fiber pathways; maintain service loops at both ends for moves/adds/reseats without re-pulling.
- Dress with hook-and-loop, not zip ties, on fiber and high-count bundles — over-cinched zip ties crush jackets and shift loss.
- Preserve airflow and serviceability clearances; bundles must not block tray extraction or cold-plate quick-disconnects.
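The bend-radius rule above is simple arithmetic, and a small helper keeps it consistent across crews. This is a minimal sketch: the 10× multiplier is the common figure quoted above, the function names are hypothetical, and a cable's explicitly rated minimum overrides the rule of thumb.

```python
from typing import Optional


def min_bend_radius_mm(outer_diameter_mm: float,
                       multiplier: float = 10.0,
                       rated_min_mm: Optional[float] = None) -> float:
    """Minimum allowed bend radius for a cable, in millimetres."""
    if outer_diameter_mm <= 0:
        raise ValueError("outer diameter must be positive")
    # An explicit OEM rating always wins over the rule of thumb.
    if rated_min_mm is not None:
        return rated_min_mm
    return multiplier * outer_diameter_mm


def bend_ok(measured_radius_mm: float,
            outer_diameter_mm: float,
            rated_min_mm: Optional[float] = None) -> bool:
    """True when the installed bend is no tighter than the minimum."""
    limit = min_bend_radius_mm(outer_diameter_mm, rated_min_mm=rated_min_mm)
    return measured_radius_mm >= limit


if __name__ == "__main__":
    print(min_bend_radius_mm(3.0))                # 3 mm jacket, rule of thumb
    print(bend_ok(25.0, 3.0))                     # tighter than 10x OD
    print(bend_ok(25.0, 3.0, rated_min_mm=15.0))  # cable explicitly rated tighter
```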
Label & Document As You Go
TIA-606-C · TIA-607
- Label both ends of every cable to a single TIA-606-C scheme: source-rack/port → destination-rack/port, machine-printed, not handwritten.
- Build the as-built cable map during the install, not after — every link recorded with its endpoints, media type, and test result.
- Capture transceiver serial / part numbers per port for warranty and RMA traceability.
- Deliver the labeling scheme key with the as-built so the next crew can read it cold.
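One way to keep labels and the as-built map in step is to generate both from the same record. The sketch below assumes a hypothetical `rack/device/port` identifier format; it is not the TIA-606-C identifier syntax, and the class and field names are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    rack: str
    device: str
    port: str

    def tag(self) -> str:
        # HYPOTHETICAL identifier format; substitute the site's own scheme.
        return f"{self.rack}/{self.device}/{self.port}"


@dataclass
class Link:
    src: Endpoint
    dst: Endpoint
    media: str
    test_result: str = "untested"
    src_optic_sn: str = ""   # transceiver serials for RMA traceability
    dst_optic_sn: str = ""

    def labels(self) -> tuple[str, str]:
        """Near-end and far-end label text; each end reads 'here -> there'."""
        return (f"{self.src.tag()} -> {self.dst.tag()}",
                f"{self.dst.tag()} -> {self.src.tag()}")


def reused_ports(links: list[Link]) -> list[str]:
    """Ports that appear on more than one link, a sign of a port-map error."""
    counts = Counter(ep.tag() for link in links for ep in (link.src, link.dst))
    return sorted(tag for tag, n in counts.items() if n > 1)


if __name__ == "__main__":
    link = Link(Endpoint("R12", "GPU03", "P1"),
                Endpoint("R14", "LEAF01", "17/1"), media="AOC")
    for text in link.labels():
        print(text)
    print("reused:", reused_ports([link]))
```

Printing labels from the same `Link` records that hold the test results means the as-built cannot drift from what is physically on the cable.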
Test & Accept Every Link
Acceptance test plan (ATP)
- Fiber: insertion-loss and return-loss test (and OTDR on structured runs) against the channel budget for the link type; record per-link results.
- Bring up the fabric: verify 100% of ports link at the rated speed (NDR 400G / XDR 800G), with no auto-negotiation fallback to a lower rate.
- Watch for symbol errors and link flap over a soak window — a link that comes up but accumulates errors fails acceptance.
- Validate end-to-end with NCCL / perftest bandwidth across the domain; confirm per-link throughput sits in the expected range before sign-off.
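The insertion-loss check against a channel budget can be sketched as follows. The calculation sums fiber attenuation, mated connector pairs, and splices, then compares the measured loss with both that plant limit and the optic's own allowance. All numeric values in the example are placeholders, not values from any standard or datasheet; take the real ones from the ATP and the optic's specification.

```python
def loss_limit_db(length_m: float,
                  fiber_db_per_km: float,
                  mated_pairs: int,
                  db_per_pair: float,
                  splices: int = 0,
                  db_per_splice: float = 0.0) -> float:
    """Calculated loss limit for an installed channel, in dB."""
    if length_m < 0 or mated_pairs < 0 or splices < 0:
        raise ValueError("length and component counts must be non-negative")
    fiber = (length_m / 1000.0) * fiber_db_per_km
    return fiber + mated_pairs * db_per_pair + splices * db_per_splice


def insertion_loss_ok(measured_db: float,
                      plant_limit_db: float,
                      optic_max_db: float) -> bool:
    """Pass only if the measurement is inside both the plant and optic limits."""
    return measured_db <= min(plant_limit_db, optic_max_db)


if __name__ == "__main__":
    # PLACEHOLDER values for illustration only.
    limit = loss_limit_db(length_m=50, fiber_db_per_km=3.0,
                          mated_pairs=2, db_per_pair=0.35)
    print(f"plant limit: {limit:.2f} dB")
    print("pass:", insertion_loss_ok(measured_db=0.80,
                                     plant_limit_db=limit,
                                     optic_max_db=1.9))
```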
Acceptance Criteria_
| Item | Pass Criteria |
|---|---|
| Endface | Pass IEC 61300-3-35 zone thresholds, clean, before every insertion |
| Polarity | Single MPO method fabric-wide; correct key orientation every adapter |
| Insertion loss | Within channel budget for the link type (OM4/OM5 multimode or OS2 single-mode) |
| Return loss | Meets connector-grade threshold (APC vs UPC as specified) |
| Link rate | 100% of ports up at rated NDR 400G / XDR 800G, no speed fallback |
| Error rate | No symbol-error accumulation or link flap across the soak window |
| Bandwidth | Per-link NCCL/perftest throughput within expected range |
| Documentation | TIA-606-C labels both ends + complete as-built cable map + ATP sign-off |
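The table above maps naturally onto a per-link record with one check per row. The sketch below is one possible shape for that record, assuming a fiber link; the field names are hypothetical, return loss is treated as a positive dB figure where higher is better, and every threshold comes from the site's ATP rather than from this code.

```python
from dataclasses import dataclass


@dataclass
class LinkResult:
    name: str
    endface_pass: bool        # IEC 61300-3-35 result from the scope
    polarity_ok: bool
    il_db: float              # measured insertion loss
    il_limit_db: float        # channel budget from the ATP
    rl_db: float              # measured return loss (positive dB)
    rl_min_db: float          # connector-grade threshold from the ATP
    rate_gbps: int            # negotiated rate
    rated_gbps: int           # 400 for NDR, 800 for XDR
    symbol_errors_start: int  # counter at start of soak
    symbol_errors_end: int    # counter at end of soak
    flaps: int                # link-down events during soak
    bw_gbps: float            # measured NCCL/perftest throughput
    bw_min_gbps: float        # expected floor from the ATP
    documented: bool          # labels on both ends and as-built entry


def failures(r: LinkResult) -> list[str]:
    """Return the acceptance-table rows this link fails; empty means pass."""
    checks = [
        ("Endface", r.endface_pass),
        ("Polarity", r.polarity_ok),
        ("Insertion loss", r.il_db <= r.il_limit_db),
        ("Return loss", r.rl_db >= r.rl_min_db),
        ("Link rate", r.rate_gbps == r.rated_gbps),
        ("Error rate",
         r.symbol_errors_end == r.symbol_errors_start and r.flaps == 0),
        ("Bandwidth", r.bw_gbps >= r.bw_min_gbps),
        ("Documentation", r.documented),
    ]
    return [row for row, ok in checks if not ok]
```

Returning the failed row names rather than a single boolean gives the crew a punch list per link and makes the ATP sign-off auditable.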
This is a field reference, not a substitute for the OEM optics compatibility matrix or facility-specific acceptance test plan. Leviathan installs and certifies InfiniBand cable plants on live GB200 and GB300 deployments.
Questions_
What connector and cable does NVIDIA NDR InfiniBand use?
NDR InfiniBand (400 Gb/s per port) uses OSFP form-factor transceivers and cables. Switch-side OSFP on Quantum-2 is twin-port (2×400G) and finned for air cooling; host-side OSFP on ConnectX-7/BlueField-3 is single-port and flat-top. Copper DAC is used in-rack (≤~1.5 m), active copper to adjacent racks (≤~3–4 m), AOC for short rows, and SR4/DR4 transceivers over MPO fiber for structured runs.
What is the most common cause of a dead InfiniBand fiber link?
Mixed MPO polarity methods across the fabric. A link can pass a fiber-scope inspection and still be dark because the polarity method (A/B/C) is inconsistent end to end. Standardizing on one method fabric-wide and verifying key orientation at every adapter eliminates the most common silent failure.
Who installs and certifies InfiniBand cabling for GPU clusters?
Leviathan Systems installs and certifies InfiniBand NDR/XDR cable plants for AI-scale GPU clusters across the United States — polarity-disciplined MPO routing, IEC 61300-3-35 endface inspection, TIA-606-C labeling, full insertion/return-loss testing, and fabric bring-up validation with an as-built package and acceptance sign-off.