CAST AI today published a report that shines a light on the massive amount of infrastructure resources wasted by Kubernetes clusters. Organizations running clusters with 50 to 1,000 CPUs use, on average, only 13% of provisioned CPUs and 20% of provisioned memory, according to the report.
Based on an analysis of more than 4,000 clusters running on Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure between January 1 and December 31, 2023, the report finds overprovisioning of infrastructure resources coupled with low utilization of spot instances on cloud services is leading to a massive waste of resources.
Laurent Gil, chief product officer for CAST AI, said the larger the Kubernetes environment, the lower the waste. For example, the report finds large clusters with 1,000 to 30,000 CPUs use, on average, only 17% of provisioned CPUs.
Overall, the report finds CPU utilization rates vary little between AWS and Azure, with both at 11%. GCP fares slightly better, at 17%. Memory utilization is also relatively similar across GCP (18%), AWS (20%) and Azure (22%), the report finds.
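The flip side of those utilization figures is idle capacity. A back-of-the-envelope sketch (not from the report itself) of how much provisioned CPU sits unused at the utilization rates cited above:

```python
# Average CPU utilization by provider, per the report's figures.
UTILIZATION = {
    "AWS": 0.11,
    "Azure": 0.11,
    "GCP": 0.17,
}

def wasted_share(utilization: float) -> float:
    """Fraction of provisioned CPU capacity that sits idle."""
    return 1.0 - utilization

for provider, used in UTILIZATION.items():
    print(f"{provider}: {wasted_share(used):.0%} of provisioned CPU unused")
```

At an 11% utilization rate, roughly 89% of the CPU capacity an organization pays for goes unused.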
This is becoming a more significant issue as the total cost of cloud services continues to rise. For example, the report finds spot instance pricing across the six most popular instances for US-East and US-West regions (excluding Gov regions) increased 23% between 2022 and 2023.
The core issue is that organizations, especially finance teams, lack visibility into the consumption of IT resources that have become too complex for IT teams to manage manually, said Gil. It is an open secret that more compute resources are being provisioned in the cloud than are strictly required, resulting in a great deal of wasted IT spending, he noted.
CAST AI is making a case for using machine learning algorithms and other forms of AI to automatically rightsize IT environments. Given the complexity of modern IT environments, it’s not reasonable to expect IT professionals to cost-effectively optimize highly dynamic IT environments without help from AI, said Gil.
In theory, Kubernetes environments are supposed to scale up and down as application requirements change. In practice, developers are more concerned about ensuring application availability than cost, so there is a natural tendency to overprovision IT infrastructure.
As the number of Kubernetes clusters deployed in production environments continues to increase steadily, more attention is starting to be paid to cost optimization. IT teams are under considerable pressure to rein in costs in an uncertain economic environment, noted Gil.
Many of those IT teams are, as a result, embracing platform engineering as a methodology for applying DevOps best practices to rein in those costs.
Arguably, there would be greater adoption of Kubernetes if the platform were easier to manage. Kubernetes was designed by software engineers for other software engineers. Despite the rise of DevOps as a way to manage IT environments programmatically, most IT teams are still made up of administrators with limited programming expertise. The only way to make up for that gap is to rely more on AI to automate as many routine management tasks, including cost control, as possible.