NVIDIA this week confirmed it has acquired Run:ai, a provider of a Kubernetes-based workload management and orchestration platform that enables IT teams to make more efficient use of scarce graphics processing unit (GPU) resources.
NVIDIA will continue to make the Run:ai platform available as a standalone offering. The company also plans to deepen integration between the Run:ai platform and NVIDIA DGX Cloud, a service through which NVIDIA manages cloud infrastructure on behalf of enterprise IT organizations that build and deploy artificial intelligence (AI) applications on top of NVIDIA GPUs and data processing units (DPUs).
The Run:ai platform is already integrated with NVIDIA DGX and also supports the NVIDIA DGX SuperPOD and NVIDIA Base Command platforms, NGC containers and NVIDIA AI Enterprise software.
Run:ai enables IT teams to take advantage of container orchestration, so they can schedule AI workloads across multiple GPUs by adding a single line of code to a Kubernetes platform. IT teams can then prioritize workloads to maximize GPU utilization.
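To make the general mechanism concrete, the sketch below uses the official Kubernetes Python client to submit a GPU-backed training pod to a cluster whose scheduling has been handed off to a GPU-aware scheduler. It is an illustrative assumption only, not Run:ai's documented interface: the scheduler name, container image tag and GPU count are placeholders for whatever a given deployment actually uses.

```python
# Hypothetical sketch: submitting a GPU-backed training pod via the official
# Kubernetes Python client. The scheduler name "runai-scheduler", the NGC image
# tag and the GPU count are assumptions for illustration, not a documented API.
from kubernetes import client, config


def submit_training_pod(namespace: str = "default") -> None:
    config.load_kube_config()  # use the local kubeconfig for cluster credentials

    container = client.V1Container(
        name="trainer",
        image="nvcr.io/nvidia/pytorch:24.03-py3",  # example NGC container image (assumed tag)
        command=["python", "train.py"],
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "2"},  # ask the NVIDIA device plugin for two GPUs
        ),
    )

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="ai-training-job"),
        spec=client.V1PodSpec(
            scheduler_name="runai-scheduler",  # delegate placement to the GPU-aware scheduler
            restart_policy="Never",
            containers=[container],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace=namespace, body=pod)


if __name__ == "__main__":
    submit_training_pod()
```

The point of the example is the small surface area: the workload itself is an ordinary pod spec, and routing it to a smarter scheduler is a one-line change, which is what lets the platform prioritize workloads and drive up GPU utilization without rewriting the jobs themselves.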
Paul Nashawaty, a practice lead at The Futurum Group, said the acquisition of Run:ai represents a strategic effort to address the complexity of AI. At a time when expensive GPUs are also challenging to find, there is a clear need to get more out of existing resources. By acquiring Run:ai, NVIDIA is addressing an issue that has proven especially vexing for data science teams building AI applications.
Kubernetes in recent years has emerged as the de facto platform for training AI models because of its ability to dynamically scale IT infrastructure resources. NVIDIA is now moving to make it simpler to invoke those capabilities via the Run:ai platform. That's critical: Many data science teams still find it challenging to master Kubernetes' inherent complexities, which limits the number of people who can train AI models. Organizations running AI workloads on Kubernetes have typically had to extend the platform themselves to accommodate workloads it was never designed to run.
In the long term, Kubernetes should also make it simpler to invoke multiple types of processors as additional advances are made. That may reduce the current dependency data science teams have on GPUs in an era when AI workloads will need to be deployed in the cloud, at the network edge and in on-premises IT environments, noted Nashawaty.
Ultimately, there will be templates based on a standard set of application programming interfaces (APIs) at the core of the Kubernetes platform. As more organizations look to operationalize AI technologies, it's only a matter of time before data scientists are added to DevOps teams to contribute their expertise to creating enterprise-class applications. The challenge, as always, is keeping the inevitable friction created any time different cultures clash to an absolute minimum, so that applications continue to be built and deployed as quickly as possible.
Photo credit: Claudiu Constantin on Unsplash