One element growing in importance in the cloud-native stack is provisioning. As software delivery timelines accelerate, there is a greater need to quickly automate and configure computing environments. Applying standard configurations across a cluster of nodes can greatly reduce deployment and maintenance burdens. Thankfully, many tools now exist to help operators quickly configure computing resources.
Below, we continue our exploration of helpful Cloud Native Computing Foundation (CNCF) tools. This group of tools focuses on automation and configuration. Many of the CNCF tools in this area are Kubernetes-native, extending cloud-native automation to edge computing, bare metal and AI/ML workloads. As of 2022, nine incubating and sandbox CNCF projects fall into this category.
KubeEdge
Kubernetes-native edge computing framework (project under CNCF)
Edge computing is becoming more commonplace as businesses seek to reduce costly data egress over the internet. Organizations may prefer to handle computing at the edge for security reasons, too. KubeEdge, which became an incubating CNCF project in 2020, helps extend the cloud-native capabilities that operators have come to expect out to the edge. The framework can be used to build an edge cloud computing ecosystem while handling edge-specific constraints such as unreliable networks and limited resources on edge nodes. Using KubeEdge, you can deploy ML/AI applications at the edge or scale highly distributed edge architectures.
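Because KubeEdge registers edge nodes as ordinary Kubernetes nodes, scheduling a workload to the edge can be as simple as targeting those nodes with a standard manifest. Here is a minimal sketch, assuming your edge nodes carry a label such as node-role.kubernetes.io/edge (the deployment name and image below are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-inference              # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: edge-inference
  template:
    metadata:
      labels:
        app: edge-inference
    spec:
      nodeSelector:
        node-role.kubernetes.io/edge: ""   # assumed label on KubeEdge edge nodes
      containers:
        - name: inference
          image: registry.example.com/edge-inference:latest   # placeholder image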
Akri
A Kubernetes resource interface for the edge
Some operators may want to run Kubernetes across edge nodes. However, at the edge of a network, you may be supporting many devices that are too small to run Kubernetes themselves. These devices often have intermittent availability and use unique communication protocols. For example, ONVIF is a standard used by many IP cameras.
The Akri open source project is designed to help better discover and manage small edge devices, also known as leaf devices. Akri is built over the native Kubernetes Device Plugins framework. According to the documentation, Akri excels at “handling the dynamic appearance and disappearance of leaf devices.” At the time of writing, Akri is a sandbox project with the CNCF.
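Akri is driven by a Configuration resource that tells its agents which protocol to discover and what broker pod to run for each discovered device. The sketch below is loosely modeled on the ONVIF case; the field names and broker image are illustrative, and in practice the project's Helm chart is the documented way to generate this resource:

apiVersion: akri.sh/v0
kind: Configuration
metadata:
  name: akri-onvif
spec:
  discoveryHandler:
    name: onvif                # discovery handler that scans the network for ONVIF cameras
    discoveryDetails: ""
  brokerSpec:
    brokerPodSpec:
      containers:
        - name: onvif-broker
          image: ghcr.io/project-akri/akri/onvif-video-broker:latest   # illustrative broker image
  capacity: 1                  # how many nodes may schedule a broker against one device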
cdk8s
Cloud development kit for Kubernetes
One problem with Kubernetes is its complexity, which can cause a steep learning curve. Thus, being able to configure Kubernetes in the languages you are most familiar with is one way to decrease this barrier to entry.
Cloud Development Kit for Kubernetes (cdk8s) is a development toolkit that lets you define and configure Kubernetes applications in the same programming language you use to build them.
Using cdk8s, you can define applications in TypeScript, JavaScript, Python, Java and Go. It then produces YAML that describes your Kubernetes applications and can be applied to any cluster. This reduces the need to hand-write piles of YAML and copy and paste templates, a practice some call “YAML engineering.” The tool was developed at AWS and later open sourced for all to use. It’s now a sandbox project within the CNCF.
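The workflow is driven by a small project file, cdk8s.yaml, which tells the cdk8s CLI which language you’re using and how to run your app; cdk8s synth then executes the app and writes plain Kubernetes manifests to a dist/ folder. A minimal sketch for a Python project follows (the entrypoint and import list are assumptions about your setup):

# cdk8s.yaml, read by the cdk8s CLI
language: python
app: python main.py     # assumed entrypoint for the app that defines your charts
imports:
  - k8s                 # generate typed constructs for the core Kubernetes API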
Cloud Custodian
Apply standard rules and cost optimizations for the cloud
Cloud Custodian is a robust yet straightforward toolkit for applying standard policies across your cloud infrastructure. It provides a YAML domain-specific language (DSL) for defining policies that take management actions on cloud resources, and it supports the three major cloud service providers: AWS, GCP and Azure.
Using Cloud Custodian, you can replace ad hoc configurations with standard rules for things like security policies, access control, cloud cost optimization and more. The documentation showcases many example policies that could be applied to your environment. For example, this policy finds any service whose usage has reached 60% of its service limit and requests a 30% limit increase:
policies:
  - name: account-service-limits
    resource: account
    filters:
      - type: service-limit
        threshold: 60
    actions:
      - type: request-limit-increase
        percent-increase: 30
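The same DSL covers day-to-day governance tasks as well. As a rough sketch (the tag name here is illustrative), a policy that stops any EC2 instance missing an owner tag could look like this:

policies:
  - name: stop-untagged-ec2
    resource: aws.ec2            # target EC2 instances
    filters:
      - "tag:owner": absent      # match instances with no owner tag
    actions:
      - stop                     # stop everything that matched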
The Cloud Custodian application itself is written in Python and can be run on most operating systems. To try it out, you can read the getting started guides for AWS, Azure and GCP.
KubeDL
A utility to easily run deep learning models on Kubernetes
KubeDL is another open source CNCF project that can be used to configure and run your machine learning workloads more easily using Kubernetes. KubeDL supports popular deep learning frameworks, including TensorFlow, PyTorch, XGBoost, Mars and MPI. These can all be run from a single controller.
For example, here is a sample configuration for a training job with TensorFlow:
apiVersion: training.kubedl.io/v1alpha1
kind: "TFJob"
metadata:
  name: "mnist"
  namespace: kubedl
spec:
  cleanPodPolicy: None
  tfReplicaSpecs:
    Worker:
      replicas: 1
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: tensorflow
              image: kubedl/tf-mnist-with-summaries:1.0
              command:
                - "python"
                - "/var/tf_mnist/mnist_with_summaries.py"
                - "--log_dir=/train/logs"
                - "--learning_rate=0.01"
                - "--batch_size=150"
              volumeMounts:
                - mountPath: "/train"
                  name: "training"
              resources:
                limits:
                  cpu: 2048m
                  memory: 2Gi
                requests:
                  cpu: 1024m
                  memory: 1Gi
          volumes:
            - name: "training"
              hostPath:
                path: /tmp/data
                type: DirectoryOrCreate
Using KubeDL, you can manage models, track versions of models and auto-tune features to optimize how machine learning workloads are run in K8s. It also provides a way to store metadata from your projects, advanced scheduling features, the ability to sync files upon container launch and other capabilities. At the time of writing, KubeDL is a CNCF sandbox project.
Metal3.io
Bare metal host provisioning for Kubernetes
Metal3.io is a tool for provisioning bare metal hosts for Kubernetes. It offers a Kubernetes API to manage bare metal provisioning, and the provisioning stack itself runs on Kubernetes. Metal3.io uses a BareMetalHost resource to define the host’s desired state, surface hardware health status and capture provisioning details, such as the settings used to deploy an image.
Below is an example snippet taken from the documentation. Written in YAML, it’s a partial example of a BareMetalHost resource.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  creationTimestamp: "2019-09-20T06:33:35Z"
  finalizers:
    - baremetalhost.metal3.io
  generation: 2
  name: bmo-controlplane-0
  namespace: bmo-project
  resourceVersion: "22642"
  selfLink: /apis/metal3.io/v1alpha1/namespaces/bmo-project/baremetalhosts/bmo-controlplane-0
  uid: 92b2f77a-db70-11e9-9db1-525400764849
spec:
  bmc:
    address: ipmi://10.10.57.19
    credentialsName: bmo-controlplane-0-bmc-secret
  bootMACAddress: 98:03:9b:61:80:48
  consumerRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: Machine
    name: bmo-controlplane-0
    namespace: bmo-project
  externallyProvisioned: true
  hardwareProfile: default
  …
New features include pivoting as part of the CI workflow, which enables the movement of objects between clusters. At the time of writing, Metal3.io is a sandbox project within the CNCF.
OpenYurt
Extend K8s to the edge
OpenYurt is another tool to consider if you’re looking to bring cloud-native infrastructure like Kubernetes to the edge. It’s an extensible framework that brings cloud-native capabilities, such as elasticity, high availability, logging and DevOps practices, into edge environments.
For example, OpenYurt provides self-healing capabilities: if an edge node goes offline, it syncs automatically once its connection is restored. It provides this and many more capabilities for edge service orchestration and leaf device management.
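OpenYurt also groups edge nodes into pools so that workloads and configuration can be managed per region or site. Here is a minimal sketch of its NodePool resource; the apiVersion and fields reflect my reading of the project’s CRDs and may differ across releases:

apiVersion: apps.openyurt.io/v1alpha1
kind: NodePool
metadata:
  name: edge-site-beijing      # illustrative pool name
spec:
  type: Edge                   # marks the pool as edge nodes rather than cloud nodes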
Many companies have used OpenYurt to extend the native Kubernetes experience to edge environments across the logistics, transportation, IoT, CDN, retail and manufacturing spaces. At the time of writing, OpenYurt is a sandbox project within the CNCF.
SuperEdge
Container management for edge computing
SuperEdge is yet another framework that extends Kubernetes to edge environments. Its core feature set includes components like edge-health, which runs on edge nodes to detect their health, and lite-apiserver, a lightweight version of the Kubernetes API server that provides caching and authentication capabilities at the edge.
SuperEdge also uses a network tunnel to proxy requests between the cloud and the edge. Thanks to these proxies, the project bills itself as a non-intrusive tool for configuring edge devices. SuperEdge was created by Tencent Cloud and is now a CNCF sandbox project.
Tinkerbell
A workflow engine for provisioning bare metal
Another utility designed to help provision bare metal is Tinkerbell, the open source bare metal provisioning engine maintained by Equinix. It comprises several key microservices, including a network boot server, a metadata service, an operating system installation environment and a workflow engine. The workflow engine, called Tink, is the main provisioning engine; it communicates over gRPC and offers a CLI for developers to work with.
Tinkerbell is generic enough to work with any operating system and provides declarative APIs to control automation programmatically. And since Tinkerbell is backed by Equinix Metal, the project is likely to be actively maintained well into the future. Tinkerbell is a CNCF sandbox project. For more information, you can check out the project documentation.
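Provisioning in Tinkerbell is expressed as a template of tasks and actions that Tink executes against a target machine. The snippet below is only a rough sketch of that shape; the action image, environment variables and timeouts are illustrative rather than an exact recipe:

version: "0.1"
name: ubuntu-provisioning
global_timeout: 1800
tasks:
  - name: os-installation
    worker: "{{.device_1}}"             # template variable for the target machine
    actions:
      - name: stream-image-to-disk
        image: image2disk               # illustrative action that writes an OS image to disk
        timeout: 600
        environment:
          DEST_DISK: /dev/sda
          IMG_URL: http://203.0.113.10/ubuntu.raw.gz   # illustrative image URL
          COMPRESSED: "true"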
Final Thoughts
As you’ll notice, many of these tools can be run on Kubernetes itself, enabling you to manage your infrastructure in the same way you manage your applications. Since we’ve already seen so much investment in cloud configuration, much of the new development in this area is around configuring for alternative scenarios like bare metal, edge and IoT.
The Cloud Native Computing Foundation (CNCF) has become a hub of excellent tools to support operations across the cloud-native stack. And outside of configuration, there are plenty of packages for service proxies, persistent storage, scheduling, CI/CD, and more.