Super Switch (UFM concept)
A logical grouping of multiple physical 1U switches managed as a single entity through NVIDIA’s UFM software. Not a physical device — it’s a management abstraction that makes multiple switches appear as one “director-like” switch. Individual switches within the group are assigned line or spine roles.
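The grouping idea can be sketched as a small data model. This is only an illustration of "several physical switches, one logical entity, each member tagged line or spine". It is not UFM's API, and the names `Role`, `Switch`, `SuperSwitch`, and the `sw-01` style identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    """Role a physical switch plays inside the logical group."""
    LINE = "line"
    SPINE = "spine"


@dataclass(frozen=True)
class Switch:
    ident: str  # hypothetical identifier for one physical 1U switch
    role: Role


@dataclass
class SuperSwitch:
    """Logical group only: it holds references to switches, it is not a device."""
    name: str
    members: List[Switch] = field(default_factory=list)

    def add(self, switch: Switch) -> None:
        self.members.append(switch)

    def by_role(self, role: Role) -> List[Switch]:
        # Return the members that were assigned the given role.
        return [s for s in self.members if s.role is role]


# Three physical switches presented as one "director-like" group.
group = SuperSwitch("director-a")
group.add(Switch("sw-01", Role.SPINE))
group.add(Switch("sw-02", Role.LINE))
group.add(Switch("sw-03", Role.LINE))

line_count = len(group.by_role(Role.LINE))
spine_count = len(group.by_role(Role.SPINE))
```

The group object owns no ports or hardware of its own. Removing it would leave the three switches exactly as they were, which is what "management abstraction" means here.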
UFM (Unified Fabric Manager)
Software platform (not a dedicated hardware device) that runs on a standard server (or a chassis) connected to the InfiniBand fabric. Communicates with switches and adapters in-band over the fabric, using InfiniBand management datagrams (MADs) to collect telemetry; it is a management endpoint and does not sit in the forwarding path of application traffic. Requires one switch port for its host server’s ConnectX adapter.
Monitors network health (120+ counters per port: bandwidth, congestion, errors, latency, cable temperature).
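As a rough illustration of how a bandwidth figure can come from a per-port counter, the sketch below takes two samples of a cumulative transmit counter and converts the difference into Gb/s. It assumes the counter is expressed in 4-byte words, which is the convention of the InfiniBand `PortXmitData` counter. How UFM itself exposes or aggregates its counters is not covered by these notes, so `CounterSample` and `throughput_gbps` are hypothetical names.

```python
from dataclasses import dataclass

BYTES_PER_WORD = 4  # assumption: counter is in 4-byte words (PortXmitData convention)


@dataclass(frozen=True)
class CounterSample:
    timestamp_s: float    # time the sample was taken, in seconds
    xmit_data_words: int  # cumulative transmit counter value


def throughput_gbps(prev: CounterSample, curr: CounterSample) -> float:
    """Average transmit rate between two samples, in gigabits per second."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    words = curr.xmit_data_words - prev.xmit_data_words
    if words < 0:
        # A smaller later value means the counter was reset or wrapped.
        raise ValueError("counter went backwards; discard this interval")
    bits = words * BYTES_PER_WORD * 8
    return bits / elapsed / 1e9


# Two samples taken 2 seconds apart.
before = CounterSample(timestamp_s=0.0, xmit_data_words=0)
after = CounterSample(timestamp_s=2.0, xmit_data_words=6_250_000_000)
rate = throughput_gbps(before, after)
```

With these made-up samples, 6.25 billion words over 2 seconds is 200 Gbit in 2 s, so `rate` is 100 Gb/s. The backwards-counter check matters in practice because cumulative counters can be cleared between samples.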
Three tiers:
- Telemetry — data collection
- Enterprise — management + dashboard
- Cyber-AI — predictive analytics with ML
Does not monitor GPU health; that is handled by NVIDIA Data Center GPU Manager (DCGM).