Super Switch (UFM concept)

A logical grouping of multiple physical 1U switches that NVIDIA's UFM software manages as a single entity. It is not a physical device but a management abstraction that makes the member switches appear as one "director-like" switch. Each member switch is assigned a line or spine role, mirroring the line and spine modules inside a modular director switch.
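To make the abstraction concrete, here is a minimal Python sketch of how such a grouping could be modeled. The names `SuperSwitch`, `MemberSwitch` and `Role`, and the 64-port member size in the usage lines, are hypothetical illustrations, not UFM's API or data model. The point is only that one logical object aggregates several physical switches, each tagged with a line or spine role.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # Hypothetical role tags, analogous to modules in a director chassis.
    LINE = "line"
    SPINE = "spine"


@dataclass
class MemberSwitch:
    name: str
    ports: int
    role: Role


@dataclass
class SuperSwitch:
    """Hypothetical model: a named group of 1U switches shown as one unit."""
    name: str
    members: list[MemberSwitch] = field(default_factory=list)

    def add(self, switch: MemberSwitch) -> None:
        # A physical switch may appear only once in the group.
        if any(m.name == switch.name for m in self.members):
            raise ValueError(f"{switch.name} is already in {self.name}")
        self.members.append(switch)

    def by_role(self, role: Role) -> list[MemberSwitch]:
        return [m for m in self.members if m.role is role]

    def summary(self) -> dict[str, int]:
        # Single logical view over all member switches.
        return {
            "line_switches": len(self.by_role(Role.LINE)),
            "spine_switches": len(self.by_role(Role.SPINE)),
            "total_ports": sum(m.ports for m in self.members),
        }


# Usage with illustrative sizes: four line and two spine switches.
ss = SuperSwitch("pod-a")
for i in range(4):
    ss.add(MemberSwitch(f"line{i}", 64, Role.LINE))
for i in range(2):
    ss.add(MemberSwitch(f"spine{i}", 64, Role.SPINE))
print(ss.summary())  # {'line_switches': 4, 'spine_switches': 2, 'total_ports': 384}
```

Note that `total_ports` counts raw physical ports; in a real deployment some of them carry line-to-spine links and are not available to end nodes.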

UFM (Unified Fabric Manager)

Software platform (not a dedicated hardware device) that runs on a standard server (or a chassis) connected to the InfiniBand fabric. It collects telemetry from switches and adapters over the InfiniBand management plane, using management traffic that travels through the fabric but stays out of the application data path. It consumes one switch port for the ConnectX adapter in its host server.

Monitors network health (120+ counters per port: bandwidth, congestion, errors, latency, cable temperature).
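As an illustration of what per-port monitoring involves, the sketch below derives transmit bandwidth, a congestion indicator and error increments from two snapshots of one port's counters. The counter names follow the standard InfiniBand PortCounters attributes, where PortXmitData counts data in 4-byte units. The `Sample` type, the `analyze` helper and the choice of error counters to watch are assumptions for illustration, not how UFM itself computes its metrics. Counter wraparound is ignored.

```python
from dataclasses import dataclass

# PortXmitData counts data in units of 4 octets, so multiply by 4 for bytes.
OCTETS_PER_DATA_UNIT = 4

# Error counters that should normally stay flat; any increase is reported.
ERROR_COUNTERS = ("SymbolErrorCounter", "LinkDownedCounter", "PortRcvErrors")


@dataclass
class Sample:
    timestamp_s: float
    counters: dict[str, int]


def delta(prev: Sample, curr: Sample, name: str) -> int:
    # Missing counters are treated as 0 (assumption; no wraparound handling).
    return curr.counters.get(name, 0) - prev.counters.get(name, 0)


def analyze(prev: Sample, curr: Sample) -> dict:
    """Compare two snapshots of the same port (hypothetical helper)."""
    dt = curr.timestamp_s - prev.timestamp_s
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    # Transmit bandwidth in Gb/s from the 4-byte-unit data counter.
    bits = delta(prev, curr, "PortXmitData") * OCTETS_PER_DATA_UNIT * 8
    tx_gbps = bits / dt / 1e9
    # PortXmitWait grows while the port has data but cannot send (congestion).
    wait = delta(prev, curr, "PortXmitWait")
    errors = {n: delta(prev, curr, n) for n in ERROR_COUNTERS if delta(prev, curr, n) > 0}
    return {"tx_gbps": tx_gbps, "xmit_wait_delta": wait, "error_deltas": errors}


prev = Sample(0.0, {"PortXmitData": 0, "SymbolErrorCounter": 0})
curr = Sample(1.0, {"PortXmitData": 1_250_000_000, "SymbolErrorCounter": 3})
print(analyze(prev, curr))
```

A real monitoring service would poll every port on a fixed interval and compare each result against thresholds; this sketch only shows the per-pair arithmetic.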

Available in three product tiers:

  • Telemetry — data collection
  • Enterprise — management + dashboard
  • Cyber-AI — predictive analytics with ML

Does not monitor GPU health; that is handled by DCGM (NVIDIA Data Center GPU Manager).