Xilinx, Inc., the leader in adaptive computing, today at the SC21 supercomputing conference introduced the Alveo U55C data center accelerator card and a new standards-based, API-driven clustering solution for deploying FPGAs at a massive scale. The Alveo U55C accelerator brings superior performance-per-watt to high-performance computing (HPC) and database workloads and easily scales through the Xilinx HPC clustering solution.
Purpose-built for HPC and big data workloads, the new Alveo U55C card is the company’s most powerful Alveo accelerator card ever, offering the highest compute density and HBM capacity in the Alveo accelerator portfolio. Together with the new Xilinx RoCE v2-based clustering solution, a broad spectrum of customers with large-scale compute workloads can now implement powerful FPGA-based HPC clustering using their existing data center infrastructure and network.
“Scaling out Alveo compute capabilities to target HPC workloads is now easier, more efficient and more powerful than ever,” said Salil Raje, executive vice president and general manager, Data Center Group at Xilinx. “Architecturally, FPGA-based accelerators like Alveo cards provide the highest performance at the lowest cost for many compute-intensive workloads. By introducing a standards-based methodology that enables the creation of Alveo HPC clusters using a customer’s existing infrastructure and network, we’re delivering those key advantages at a massive scale to any data center. This is a major leap forward for even broader adoption of Alveo and adaptive computing throughout the data center.”
Built for HPC and big data applications
The Alveo U55C card combines many key features that today’s HPC workloads require. It delivers more parallelism of data pipelines, superior memory management, optimized data movement throughout the pipeline, and the highest performance-per-watt in the Alveo portfolio.
The Alveo U55C card has a single-slot, full-height, half-length (FHHL) form factor with a low 150W maximum power. It offers superior compute density and doubles the HBM2 to 16GB compared to its predecessor, the dual-slot Alveo U280 card. The U55C provides more compute in a smaller form factor for creating dense Alveo accelerator-based clusters. It’s built for high-density streaming data, high-I/O math, and big compute problems that require scale-out, such as big data analytics and AI applications.
Leveraging RoCE v2 and data center bridging, coupled with 200 Gbps bandwidth, the API-driven clustering solution enables an Alveo network that competes with InfiniBand networks in performance and latency, with no vendor lock-in. MPI integration allows HPC developers to scale out Alveo data pipelining from the Xilinx Vitis™ unified software platform. By utilizing existing open standards and frameworks, it’s now possible to scale out across hundreds of Alveo cards, with shared workloads and memory, regardless of the server platforms and network infrastructure.
Software developers and data scientists can unlock the benefits of Alveo and adaptive computing through the high-level programmability of both the application and the cluster using the Vitis platform. Xilinx has invested heavily in the Vitis development platform and tools flow to make adaptive computing more accessible to software developers and data scientists without hardware expertise. Major AI frameworks like PyTorch and TensorFlow are supported, as well as high-level programming languages like C, C++ and Python, allowing developers to build domain solutions using specific APIs and libraries, or to use Xilinx software development kits, to easily accelerate key HPC workloads within an existing data center.
HPC customer use cases
CSIRO, an Australian national lab with the world’s largest radio astronomy antenna array, is utilizing Alveo U55C cards for signal processing in its Square Kilometre Array radio telescope. Deploying the Alveo cards as network-attached accelerators with HBM allows for massive throughput at scale across the HPC signal processing cluster. The Alveo accelerator-based cluster allows CSIRO to tackle the massive compute task of aggregating, filtering, preparing and processing data from 131,000 antennae in real time. The 460 GB/s of HBM2 bandwidth across the signal processing cluster is served by 420 Alveo U55C cards fully networked together across P4-enabled 100 Gb/s switches. The Alveo U55C cluster delivers an overall processing throughput of 15 Tb/s while requiring half the number of servers and less than half the power of commodity GPUs, for significant cost savings. CSIRO is now completing a reference design to help other radio astronomy users and adjacent industries achieve the same success.
Ansys LS-DYNA crash simulation software is used by nearly every automotive company in the world. The design of safety and structural systems hinges on the performance of models, which mitigate the costs of physical crash testing through computer-aided design finite element method (FEM) simulations. FEM solvers are the primary algorithms driving simulations with hundreds of millions of degrees of freedom. These enormous algorithms can be broken down into more rudimentary solvers such as preconditioned conjugate gradient (PCG), sparse matrix and incomplete Cholesky conjugate gradient (ICCG) solvers. By scaling out across many Alveo cards with hyperparallel data pipelining, LS-DYNA can accelerate performance by more than 5X compared to x86 CPUs. This results in more work per clock cycle in an Alveo pipeline, with LS-DYNA customers benefiting from game-changing simulation times.
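For readers unfamiliar with the solver family named above, the following is a minimal sketch of a preconditioned conjugate gradient (PCG) iteration. It is illustrative only: it is plain CPU Python, not Ansys or Xilinx code, and the function name pcg, the dense list-of-lists matrix and the Jacobi (diagonal) preconditioner are assumptions chosen for brevity. Production FEM solvers work on large sparse matrices with stronger preconditioners.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A.

    Illustrative sketch: A is a dense list of lists, and the
    preconditioner is Jacobi (divide by the diagonal), an assumption
    made here for simplicity.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n                      # initial guess
    r = list(b)                        # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]  # preconditioned residual
    p = list(z)                        # first search direction
    rz = dot(r, z)

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:     # converged: residual norm is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)        # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz             # keeps directions A-conjugate
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    # Small symmetric positive-definite test system.
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(pcg(A, b))
```

Each iteration is dominated by one matrix-vector product and a few dot products, which is the kind of regular, streaming arithmetic that lends itself to pipelined hardware.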
TigerGraph, the provider of a leading graph analytics platform, is using multiple Alveo U55C cards to cluster and accelerate the two most prolific algorithms that drive graph-based recommendation and clustering engines. Graph databases are a disruptive platform for data scientists. Graphs take data from silos and bring focus to the relationships between data. The next frontier for graphs is finding those answers in real time. Alveo U55C cuts query and prediction times for recommendation engines from minutes down to milliseconds. When multiple U55C cards are used to scale up analytics, the superior computational power and memory bandwidth accelerate graph queries by up to 45X compared to CPU-based clusters. The quality of scores is also increased by up to 35 percent, resulting in greater confidence and dramatically lowering false positives to low single digits.
Product availability and easy evaluations
The Alveo U55C card is currently available on Xilinx.com and through Xilinx authorized distributors. It’s also available for easy evaluation via public cloud-based FPGA-as-a-Service providers, as well as select colocation data centers for private previews. Clustering is available now for private previews, with general availability expected in the second quarter of next year.
Xilinx is showcasing the Alveo U55C accelerator card, along with partner solutions, at the SC21 conference taking place this week. Register at SC21 to visit the Xilinx virtual booth.