This sponsored post from Asetek explores how increasing AI and HPC workloads, and the increasingly power-hungry CPUs and GPUs that run them, are heating up computing and bringing liquid cooling to the forefront. The growing need for HPC-style configurations in support of artificial intelligence (AI) workloads was on clear display at this year’s supercomputing conference in Dallas, Texas (SC18).
Liquid cooling is required for the highest-performance AI and HPC systems. A key driver of this trend is the accelerating evolution of AI workloads away from the traditional two-step architecture, in which a compute-intensive training phase is followed by a separate implementation phase. AI leaders are adopting a more dynamic model that incorporates real-time training, allowing algorithms to be further optimized on the fly while the system remains in ongoing use.
This all translates to HPC and the latest AI clusters running at 100 percent utilization for sustained periods, and these workloads are compute-bound throughout their execution. Cutting-edge AI and HPC clusters therefore require the highest-performance versions of the latest CPUs and GPUs, and with that throughput come high heat loads. NVIDIA’s Volta V100 GPU currently draws 300 watts, while Intel’s Xeon Scalable (Skylake) CPUs and Xeon Phi (Knights Mill) MIC-architecture processors have been publicly announced at 205 and 320 watts, respectively.
These chip wattages translate into substantially higher power densities at both the node level and the rack level, not simply because of the component wattages themselves, but also because signal distances between processors, GPUs and switches, both within and between cluster racks, must be kept as short as possible. Together, these factors are pushing rack power well beyond 50kW, toward 80kW or higher.
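As a rough illustration of how the published chip wattages add up, the back-of-the-envelope sketch below estimates node and rack power for a dense accelerator configuration. The node makeup, node count per rack and overhead figures are illustrative assumptions, not Asetek or vendor specifications.

```python
# Back-of-the-envelope rack power estimate (illustrative assumptions only).

CPU_W = 205   # Intel Xeon Scalable (Skylake) TDP cited above
GPU_W = 300   # NVIDIA Volta V100 TDP cited above

# Assumed dense node: 2 CPUs + 4 GPUs, with memory, NICs, storage and
# fans/VRMs rolled into a flat 400 W overhead (assumption).
node_w = 2 * CPU_W + 4 * GPU_W + 400      # ~2,010 W per node

# Assumed rack packed tightly to keep signal distances short:
# 30 such nodes plus roughly 2 kW of switching (assumption).
rack_w = 30 * node_w + 2_000              # ~62 kW per rack

print(f"Node: {node_w} W  Rack: {rack_w / 1000:.1f} kW")
```

With more GPUs per node or more nodes per rack, the same arithmetic quickly moves past 80kW.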
Asetek’s Direct-to-Chip (D2C) liquid cooling provides a distributed cooling architecture to address the full range of heat rejection scenarios. It is based on low pressure, redundant pumps and sealed liquid path cooling within each server node.
When facilities water is routed to the racks, Asetek’s 80kW InRackCDU D2C can capture 60 to 80 percent of server heat into liquid, reducing data center cooling costs by more than 50 percent and allowing 2.5x-5x increases in data center server density. Because the system cools with warm water (up to 40ºC), it does not require expensive HVAC capacity and can reject heat through inexpensive dry coolers.
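To make the heat-capture figures concrete, the short sketch below splits an assumed fully loaded 80kW rack between liquid and air at the 60 to 80 percent capture rates cited above; the 80kW rack load is an assumption chosen to match the InRackCDU rating, not a measured figure.

```python
# Split of rack heat between liquid and residual air cooling
# at the capture rates cited above (80 kW rack load is an assumption).

rack_kw = 80.0

for capture in (0.60, 0.80):
    to_liquid = rack_kw * capture    # heat carried off by the warm-water loop
    to_air = rack_kw - to_liquid     # remainder left for conventional room cooling
    print(f"{capture:.0%} capture: {to_liquid:.0f} kW to liquid, "
          f"{to_air:.0f} kW to air")
```

Every kilowatt shifted to the liquid side is heat that can be rejected through dry coolers instead of the room’s HVAC system.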
With InRackCDU, the collected heat is carried via a sealed liquid path to heat exchangers, where it is transferred into facilities water. InRackCDU mounts in the rack alongside the servers, occupying 4U, and connects to the nodes through Zero-U, PDU-style manifolds in the rack.
Asetek’s distributed pumping architecture at the server, rack, cluster and site levels delivers flexibility in the areas of heat capture, coolant distribution and heat rejection.