Nvidia unveils new AI Blackwell chip, microservices and more
Nvidia today introduced new AI services and infrastructure for enterprises and cloud providers that need more computing power to build large language models and AI-powered apps.
The company released a raft of products during its annual GTC AI developer conference, all aimed at bolstering its dominance in the exploding GenAI market. Nvidia touted a list of major cloud and AI application providers that will use its latest AI chips.
As expected, the vendor introduced its new generation of Blackwell GPUs; Nvidia AI Enterprise 5.0, the next version of its AI platform, which includes the newly introduced Nvidia microservices; and new industry-specific applications for its Nvidia Omniverse platform.
New AI chips
The new Blackwell GPU architecture includes six technologies for AI computing. The B200 GPU is manufactured as two GPU dies connected by a 10 TB-per-second chip-to-chip link, according to Nvidia.
“Hopper is fantastic, but we need bigger GPUs,” Nvidia CEO Jensen Huang said during his keynote.
The Nvidia GB200 Grace Blackwell Superchip connects two Nvidia B200 Tensor Core GPUs to the Nvidia Grace CPU over an NVLink chip-to-chip interconnect. It is built to support larger compute workloads and model sizes with new inference capabilities.
Enterprises looking to build AI systems for the future should use the latest and best GPU technology they can access, and the Blackwell architecture provides a “compelling” option, Gartner analyst Chirag Dekate said.
He added that it enables users to deliver more compute in a smaller footprint.
However, the new chips are unlikely to be available for broad enterprise adoption until later this year, Dekate added. Cloud hyperscalers such as Google, Microsoft, AWS and Oracle will likely be the first to access them, as they order mass quantities of GPUs to build and train very large language models (LLMs).
They will also offer Nvidia’s compute power to customers through their cloud services. Oracle, for example, has already announced plans to add Nvidia Grace Blackwell to Nvidia DGX Cloud on Oracle Cloud Infrastructure, where customers will be able to access new GB200 NVL72-based instances for training and inference.
Hyperscalers will also likely be the first to use Blackwell because of the expense, Forrester Research analyst Alvin Nguyen said.
Most enterprises also don’t have the resources available to hyperscalers for training large GenAI models.
“The understandable way to get better with GenAI is to increase the data sets, increase the size of networks,” Nguyen said. “The problem is that the infrastructure has to be huge. Only hyperscalers can really do that.”
While Nvidia holds a leadership position in the AI hardware market, Intel’s upcoming Gaudi 3 AI accelerator and chips from other vendors such as SambaNova provide alternatives.
“You don’t necessarily have to be at the same speeds as long as you can meet customer demands and have close enough or decent enough speeds,” Nguyen said.
He added that companies running smaller LLMs might not need the highest-performing chips from Nvidia and could use lower-cost chips from other AI hardware providers.
AI microservices
In addition to Blackwell, Nvidia introduced GenAI microservices.
The microservices enable enterprises to create and deploy custom AI applications on their own platforms.
The microservices are built on Nvidia CUDA, a parallel computing platform and programming model that works across all of Nvidia’s GPUs.
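To ground the term, CUDA programs express work as kernels executed in parallel by thousands of GPU threads. The minimal sketch below adds two vectors on an Nvidia GPU using Numba’s Python CUDA bindings; the choice of Numba is an illustrative assumption, since the article names no specific CUDA toolkit or language.

```python
# Minimal CUDA sketch using Numba's Python bindings (an illustrative
# assumption; any CUDA-capable language would do). Requires an Nvidia
# GPU and the numba package.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)              # absolute index of this GPU thread
    if i < out.shape[0]:          # guard against out-of-range threads
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros(n, dtype=np.float32)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)  # Numba handles host-device copies
```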
Included in the microservices is Nvidia NIM (Nvidia Inference Microservices), which builds on Nvidia NeMo, a service introduced last year that lets developers customize LLMs and deploy them for inferencing.
NIMs enable optimized inference on more than two dozen AI models from Nvidia, AI21 Labs, Getty Images and Shutterstock, and open models from Google, Hugging Face, Meta, Microsoft, Mistral AI and Stability AI, according to Nvidia.
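In practice, a deployed NIM is consumed over a network API. The hypothetical sketch below posts a chat request to a locally hosted NIM; the endpoint URL, port, model name and response schema are illustrative assumptions rather than details confirmed by Nvidia, so consult the NIM documentation for the actual interface.

```python
# Hypothetical sketch: querying a locally deployed NIM over an
# OpenAI-style REST API. The URL, port, model name, and response
# schema are assumptions for illustration only.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",   # assumed local NIM endpoint
    json={
        "model": "meta/llama3-8b-instruct",        # assumed model identifier
        "messages": [
            {"role": "user", "content": "Summarize Nvidia's GTC announcements."}
        ],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```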
“It is a missing piece of the whole puzzle of GenAI,” Forrester Research analyst Charlie Dai said.
NIM is important in helping developers maximize the output of different GenAI models, Dai said.
“This kind of offering, microservices, will power both developers and the ops team, together with the data engineers in the world,” Dai continued.
Microservices are building blocks that enable enterprises to build and deploy AI applications from an inferencing perspective, Dekate said.
Nvidia is embedding preconfigured, precontainerized microservices into users’ inference applications. In doing so, the company is creating a dependency on Nvidia that extends from GPUs to the software used for inference and deployment, Dekate said.
“If you’re trying to leverage NIMs, you better be running Nvidia hardware for both training or for inference,” he said. “That is a stroke of genius, and no matter where you’re running it, whether it’s on premises, in the cloud or the edge.”
Cloud providers such as Google, Microsoft and AWS already offer similar services for training and inferencing AI, but Nvidia’s well-known H100 GPUs are now synonymous with the training ecosystem.
“If you’re developing cloud-native applications or on-premises applications or edge applications, you are going to find Nvidia everywhere — not just from a hardware perspective, but also from the microservices and software perspective,” Dekate said. “That cannot be said for all of Nvidia’s competitors. That’s the advantage that Nvidia is trying to build off.”
Enterprises can now deploy NIM inside Nvidia AI Enterprise 5.0, the newest version of its AI platform, on Nvidia-certified systems and cloud providers.
Nvidia also introduced Nvidia Edify, a multimodal architecture for visual GenAI.
Shutterstock will roll out early access to an API built on Edify that lets users generate 3D objects for virtual scenes from text prompts or images.
Nvidia also revealed that its metaverse platform, Nvidia Omniverse Cloud, will be available as APIs. The APIs are USD Render, USD Write, USD Query, USD Notify and Omniverse Channel.
Omniverse Cloud APIs will be available on Microsoft Azure later this year.
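The Omniverse APIs operate on OpenUSD (Universal Scene Description) scene data. As a rough local analogue of what USD Write- and USD Query-style calls act on, the sketch below builds and traverses a USD stage with the open source pxr Python bindings; this is not the Omniverse Cloud API itself, whose endpoints the article does not detail.

```python
# Illustrative local OpenUSD sketch (not the Omniverse Cloud API).
# Requires the usd-core package, which provides the pxr bindings.
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("scene.usda")           # create a local USD stage
sphere = UsdGeom.Sphere.Define(stage, "/World/Ball")
sphere.GetRadiusAttr().Set(2.0)                     # a "USD Write"-style edit
stage.GetRootLayer().Save()

# A "USD Query"-style traversal: list every prim in the scene
for prim in Usd.Stage.Open("scene.usda").Traverse():
    print(prim.GetPath(), prim.GetTypeName())
```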
Esther Ajao is a TechTarget Editorial news writer and podcast host covering artificial intelligence software and systems.