NVIDIA DGX POD
The DGX POD is an optimized data center rack containing up to nine DGX-1 servers or three DGX-2 servers, storage servers, and networking switches to support single and multi-node AI model training and inference using NVIDIA AI software.
There are several factors to consider when planning a DGX POD deployment in order to determine whether more than one rack is needed per DGX POD. This reference architecture is based on a single 35 kW high-density rack to provide the most efficient use of costly data center floorspace and to simplify network cabling. As GPU usage grows, the average power per server and the power per rack continue to increase. However, older data centers may not yet be able to support the power and cooling densities required; hence the three-zone design, which allows the DGX POD components to be installed in up to three lower-power racks.

The DGX POD is designed to fit within a standard-height 42 RU data center rack. A taller rack can be used to include redundant networking switches, a management switch, and login servers. This reference architecture uses an additional utility rack for login and management servers, and has been sized and tested with up to six DGX PODs. Larger configurations of DGX PODs can be defined by an NVIDIA solution architect.

A primary 10 GbE (minimum) network switch is used to connect all servers in the DGX POD and to provide access to a data center network. The DGX POD has been tested with an Arista switch with 48 x 10 GbE ports and 4 x 40 GbE uplinks. VLAN capabilities of the networking hardware are used to allow the out-of-band management network to run independently from the data network while sharing the same physical hardware. Alternatively, a separate 1 GbE management switch may be used. While not included in the reference architecture, a second 10 GbE network switch can be used for redundancy and high availability. In addition to Arista, NVIDIA is working with other networking vendors who plan to release switch reference designs compatible with the DGX POD.

A 36-port Mellanox 100 Gbps switch is configured to provide four 100 Gbps InfiniBand connections to the DGX servers in the rack. This provides the best possible scalability for multi-node jobs. In the event of switch failure, multi-node jobs can fall back to using the 10 GbE switch for communications. The Mellanox switch can also be configured in 100 GbE mode for organizations that prefer to use Ethernet networking. Alternatively, by configuring two 100 Gbps ports per DGX server, the Mellanox switch can also be used by the storage servers.

With the DGX family of servers, AI and HPC workloads are fusing into a unified architecture. For organizations that want to utilize multiple DGX PODs to run cluster-wide jobs, a core InfiniBand switch is configured in the utility rack in conjunction with a second 36-port Mellanox switch.
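
The rack power and InfiniBand port figures above can be sanity-checked with simple arithmetic. The following is a minimal Python sketch and not part of the NVIDIA reference design: the function names are hypothetical, and the per-server peak power values (3,500 W for a DGX-1, 10,000 W for a DGX-2) are assumptions drawn from published system specifications rather than from this text.

```python
# Capacity checks for a single-rack DGX POD (illustrative sketch only).

RACK_POWER_W = 35_000   # 35 kW high-density rack (from the text)
IB_SWITCH_PORTS = 36    # 36-port Mellanox 100 Gbps switch (from the text)

# Assumed peak power per server; these values are NOT stated in the text.
ASSUMED_PEAK_POWER_W = {"DGX-1": 3_500, "DGX-2": 10_000}


def remaining_capacity(capacity: int, units: int, per_unit: int) -> int:
    """Return what is left of `capacity` after `units` consumers take
    `per_unit` each. Raises ValueError if demand exceeds capacity."""
    demand = units * per_unit
    if demand > capacity:
        raise ValueError(f"demand {demand} exceeds capacity {capacity}")
    return capacity - demand


def free_ib_ports(servers: int, ports_per_server: int) -> int:
    """InfiniBand switch ports left free for other devices."""
    return remaining_capacity(IB_SWITCH_PORTS, servers, ports_per_server)


def power_headroom_w(model: str, servers: int) -> int:
    """Rack power (in watts) left for storage servers and switches."""
    return remaining_capacity(RACK_POWER_W, servers, ASSUMED_PEAK_POWER_W[model])


if __name__ == "__main__":
    print("Free ports, 9 servers x 4 links:", free_ib_ports(9, 4))
    print("Free ports, 9 servers x 2 links:", free_ib_ports(9, 2))
    print("Headroom, 9 x DGX-1 (W):", power_headroom_w("DGX-1", 9))
    print("Headroom, 3 x DGX-2 (W):", power_headroom_w("DGX-2", 3))
```

Under these assumptions, nine DGX-1 servers with four connections each occupy all 36 switch ports, whereas two connections per server leave 18 ports free, which fits the note that the switch can then also serve the storage servers. The power headroom is the budget that remains for storage servers and switches in the same rack.
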
Technical Structure

Storage architecture is important for optimized DL training performance.
DGX POD Installation and Management
Deploying a DGX POD is similar to deploying traditional servers and networking in a rack. However, because of the high power consumption and corresponding cooling needs, the server weight, and the multiple networking cables per server, additional care and preparation are needed for a successful deployment. As with all IT equipment installation, it is important to work with the data center facilities team to ensure the DGX POD environmental requirements can be met.
NVIDIA AI Software
NVIDIA AI software running on the DGX POD provides a high-performance DL training environment for large-scale, multi-user AI software development teams. NVIDIA AI software includes the DGX operating system (DGX OS), cluster management and orchestration tools, NVIDIA libraries and frameworks, workload schedulers, and optimized containers from the NGC container registry. To provide additional functionality, the DGX POD management software includes third-party open-source tools recommended by NVIDIA that have been tested to work on DGX PODs with the NVIDIA AI software stack. Support for these tools can be obtained directly through third-party support structures.
This content is taken from the "DGX POD Reference Design Whitepaper" from NVIDIA.
If you have any questions, please do not hesitate to contact us.
We are available at any time by e-mail or by phone at +49 (0) 40-300-672-20.