Introducing Azure HBv5 & Azure ND GB200 V6 Virtual Machines
Introducing Azure HBv5 Virtual Machines: An Advancement in HPC Memory Bandwidth
The next generation of purpose-built virtual machines for HPC on Azure
At Microsoft Ignite today, Satya Nadella introduced Azure HBv5, the newest CPU-based virtual machine for HPC customers and their applications. The new virtual machine (VM) is best suited for the most memory-intensive HPC applications, including computational fluid dynamics, automotive and aerospace simulation, weather modeling, energy research, molecular dynamics, computer-aided engineering, and more.
8x more memory bandwidth to overcome the largest constraint in HPC
For many HPC customers, memory performance from traditional server designs is the biggest obstacle to reaching the required levels of workload performance (time to insight) and cost-effectiveness. To get around this bottleneck, Microsoft and AMD have collaborated on a custom 4th Generation EPYC processor with high-bandwidth memory (HBM). Four of these processors work together in an Azure HBv5 virtual machine to deliver almost 7 TB/s of memory bandwidth.
For comparison, this is up to 35 times more memory bandwidth than a 4–5-year-old HPC server nearing the end of its hardware lifecycle, up to 8 times more than the newest bare-metal and cloud alternatives, and nearly 20 times more than Azure HBv3 and Azure HBv2 (3rd Gen EPYC with 3D V-Cache “Milan-X” and 2nd Gen EPYC “Rome”).
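The bandwidth figure quoted here is reported with the STREAM Triad benchmark (see the specifications below). As a rough illustration of what that measurement does, here is a minimal OpenMP Triad-style kernel in C; the array size, build flags, and timing approach are illustrative assumptions and not Microsoft's benchmark configuration.

```c
// triad.c -- minimal STREAM Triad-style bandwidth check (illustrative sketch).
// Build (assumption): gcc -O3 -fopenmp triad.c -o triad
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1UL << 27)          /* ~134M doubles per array (~1 GiB each) */
#define SCALAR 3.0

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) { fprintf(stderr, "allocation failed\n"); return 1; }

    /* First-touch initialization so pages land near the cores that use them. */
    #pragma omp parallel for
    for (size_t i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (size_t i = 0; i < N; i++)
        a[i] = b[i] + SCALAR * c[i];   /* Triad: one write, two reads per element */
    double t1 = omp_get_wtime();

    /* Three arrays of N doubles cross the memory bus per pass. */
    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("Triad: %.2f GB/s (%.4f s)\n", gbytes / (t1 - t0), t1 - t0);

    free(a); free(b); free(c);
    return 0;
}
```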
HPC advancements and enhancements throughout the technology stack
Memory bandwidth is the standout characteristic of Azure HBv5, but Microsoft and AMD have also co-engineered enhancements across the platform to give users a virtual machine (VM) that is secure, balanced, user-configurable, and highly performant for a range of HPC workloads.
Every Azure HBv5 virtual machine will have:
400–450 GB of HBM3 RAM with 6.9 TB/s of memory bandwidth (STREAM Triad)
Each core can have up to 9 GB of memory (customer configurable).
Up to 352 AMD EPYC “Zen4” CPU cores with peak frequencies of 4 GHz (customer configurable)
Infinity Fabric bandwidth between CPUs is doubled compared to other AMD EPYC server platforms.
Single-tenant only architecture with SMT disabled (1 VM per server)
800 Gb/s of NVIDIA Quantum-2 InfiniBand, balanced at 200 Gb/s per CPU SoC
Azure VMSS Flex support for scaling MPI applications to hundreds of thousands of HBM-powered CPU cores (see the MPI sketch after this list)
Azure Accelerated Networking at 160 Gbps with a second-generation Azure Boost NIC
14 TB of local NVMe SSD with up to 50 GB/s read and 30 GB/s write bandwidth
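MPI is the standard programming model for the kind of scale-out runs mentioned above. As a minimal sketch, assuming an MPI implementation (for example, Open MPI) is available on the VMs, the program below simply reports which rank runs on which host; launcher flags, core pinning, and VMSS Flex provisioning are environment-specific and omitted here.

```c
// hello_mpi.c -- minimal MPI sketch for a multi-node scale-out run (illustrative).
// Build/run (assumption): mpicc -O2 hello_mpi.c -o hello_mpi && mpirun -np <ranks> ./hello_mpi
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size, name_len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total ranks across all VMs */
    MPI_Get_processor_name(host, &name_len);

    /* Each rank reports where it landed; in practice ranks would be pinned
       to cores and spread across the VMSS Flex instances in the job. */
    printf("rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```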
Register for the Azure HBv5 Virtual Machine Preview
Registration is now open for the Azure HBv5 preview, which will launch in the first half of 2025. See Azure HBv5 and other Azure supercomputing solutions at the Microsoft Azure booth (#1905) at Supercomputing 2024 in Atlanta, Georgia, November 19–22, where you can also speak with experts about how this virtual machine can help your HPC workloads.
Microsoft powers the next AI supercomputing frontier with NVIDIA Blackwell
Microsoft is also announcing the first cloud private preview of the Azure ND GB200 V6 VM series, based on the NVIDIA accelerated computing architecture. This latest virtual machine is powered by the NVIDIA GB200 Grace Blackwell Superchip, pairing NVIDIA Grace CPUs with NVIDIA Blackwell GPUs to deliver remarkable AI supercomputing capabilities for training cutting-edge frontier models and accelerating generative inference.
The Azure ND GB200 V6 VM series is built on Microsoft’s custom NVIDIA Blackwell server, which carries two GB200 Grace Blackwell Superchips. Each GB200 Superchip links a Grace CPU with two powerful Blackwell GPUs over the NVIDIA NVLink-C2C interface, giving applications fast, coherent access to a unified memory space; this simplifies programming and supports the high-speed memory requirements of next-generation trillion-parameter large language models (LLMs).
With up to 72 Blackwell GPUs in a single NVLink domain, Microsoft’s ND GB200 V6 virtual machines can scale up to 18 compute servers via NVIDIA NVLink Switch trays. Because they are connected by the newest NVIDIA Quantum InfiniBand, these virtual machines can also scale out to tens of thousands of GPUs for unprecedented AI supercomputing performance.
Every Microsoft server with NVIDIA Blackwell will run the latest version of Azure Boost, a purpose-built solution that improves the server virtualization stack for greater robustness, manageability, and security. Azure Boost delivers optimal I/O performance for both CPU and GPU, supports 200 Gbps network speeds, and accelerates storage performance.
These capabilities deliver outstanding value to customers through better performance, scalability, and reliability. Being able to develop and deploy advanced AI models more quickly and efficiently helps commercial businesses stay ahead of the curve and improve business outcomes. Whether they are building intricate neural networks or adapting pre-existing models to their own datasets to make them more business-relevant, Azure customers can confidently take on their most ambitious AI projects with the compute power of the latest Azure VM series with NVIDIA GB200 Superchips and Microsoft’s optimized AI software stack.
Selected partners will have access to a restricted private preview of Azure ND GB200 V6 virtual machines, enabling co-validation and co-optimization.
Read more on govindhtech.com