Google this week launched an enterprise edition of its managed Google Kubernetes Engine (GKE) service through which it will manage fleets of clusters in addition to applying customer configurations and policy guardrails.
GKE Enterprise was announced at the Google Cloud Next 2023 conference. In addition to making it easier to isolate workloads, GKE Enterprise provides security services, including workload vulnerability insights, governance and policy controls, and a managed service mesh based on the open source Istio software that Google originally co-developed.
GKE Enterprise has also been integrated with Google Distributed Cloud, a platform based on Google software and hardware that enables IT teams to deploy applications distributed across public clouds and on-premises IT environments.
Finally, Google announced that GKE Enterprise now also supports Cloud TPU v5e instances to make it simpler to deploy complex artificial intelligence (AI) models on Kubernetes clusters. Google has been making the case for adopting its TPU application-specific integrated circuits (ASICs) to run the neural networks used to train generative AI applications.
Google and NVIDIA this week also announced the general availability of Google's latest A3 instances, powered by NVIDIA H100 Tensor Core graphics processing units (GPUs).
Thomas Kurian, CEO of Google Cloud, told conference attendees that Google is already making extensive use of NVIDIA GPUs to train the foundation models that drive its generative AI applications.
NVIDIA CEO Jensen Huang added that generative AI is revolutionizing every layer of the computing stack. The two companies are working together to reengineer cloud infrastructure to optimize it for generative AI models, he added.
It’s not clear whether cloud-native computing platforms such as Kubernetes are at the core of building and deploying AI models. At the very least, almost every cloud-native application will need to invoke multiple AI models via application programming interfaces (APIs). Inevitably, that will introduce some orchestration challenges that platforms such as Kubernetes are meant to address.
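To make that pattern concrete, the sketch below (not drawn from the announcement) shows one way a cloud-native service might fan out a single request to several model-serving APIs running behind Kubernetes Services; the endpoint URLs, payload shape, and service names are hypothetical placeholders.

```python
# Minimal sketch, assuming hypothetical in-cluster model-serving endpoints
# exposed as Kubernetes Services. Illustrates the fan-out/orchestration
# concern described above; not an actual GKE or Google Cloud API.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoints, e.g. separate model-serving deployments per model.
MODEL_ENDPOINTS = {
    "summarize": "http://summarizer.models.svc.cluster.local/v1/predict",
    "classify": "http://classifier.models.svc.cluster.local/v1/predict",
}


def call_model(name: str, url: str, text: str, timeout: float = 5.0) -> dict:
    """POST the input text to one model API and return its JSON response."""
    body = json.dumps({"input": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return {name: json.load(resp)}


def handle_request(text: str) -> dict:
    """Invoke every model API concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=len(MODEL_ENDPOINTS)) as pool:
        futures = [
            pool.submit(call_model, name, url, text)
            for name, url in MODEL_ENDPOINTS.items()
        ]
        results = {}
        for future in futures:
            # A production service would add retries, backoff and fallbacks here.
            results.update(future.result())
        return results
```

Each of those calls brings its own timeouts, retries, scaling and routing concerns, which is exactly the kind of orchestration work platforms such as Kubernetes are positioned to absorb.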
In the meantime, each organization will need to decide whether it prefers to manage Kubernetes clusters itself or rely on a managed service that allows it to devote more resources to developing applications and AI models. The tradeoff, historically, has been the limitations managed services impose on DevOps workflows, which tend to be unique to each organization.
Of course, in the case of Kubernetes, the expertise required to manage these platforms is still hard to find and retain. As AI advances continue, it’s only a matter of time before more infrastructure management tasks are automated. It remains to be seen how quickly internal IT teams will benefit from the advances compared to a managed service provider that also happens to build and host the large language models (LLMs) used to create AI models.
Regardless of the outcome, it’s clear that many of the tasks that once made Kubernetes clusters challenging to manage are being automated due to higher levels of abstraction.