Persistent data storage is somewhat counterintuitive when it comes to containerization. A container is an ephemeral, short-lived computing environment where code isn’t stored forever. But you still need to store data on a physical disk somewhere!
The highly variable nature of containers is at odds with the need for stateful storage, a quandary that has introduced myriad workarounds. To enable a stateful Kubernetes approach, teams typically must rely on external tools and databases to hold and transmit this data.
The Cloud Native Computing Foundation (CNCF) has become an excellent host of open source technologies to support our cloud-infused world, and persistent data storage is no exception. CNCF hosts a wide range of tools that integrate with Kubernetes to help with the administrative tasks of managing persistent storage volumes. Below, we’ll review some of these tools. They range from providing cloud-native storage to offering a standard interface between client applications and storage to providing data backup and recovery options. Let’s dive in.
1. Rook
Storage orchestration for Kubernetes
Persistent storage systems need a good amount of upkeep to maintain operations. Rook is an open source cloud-native storage utility for Kubernetes that aims to automate some of the tasks of a storage administrator, such as programmatic storage, migration, disaster recovery, monitoring and resource management. Rook supports file, block and object storage types. As the project’s introduction video demonstrates, Rook leverages the very architecture of Kubernetes by using special K8s operators. As of 2022, Rook, a graduated CNCF project, supports three storage providers: Ceph, Cassandra and NFS. Developers can visit the Rook forum to keep up to date with the project and ask questions.
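To make the operator idea a little more concrete, here is a minimal Python sketch of the reconcile pattern that Kubernetes operators generally follow: compare the desired state with the observed state and work out the actions that close the gap. This is not Rook’s code. The reconcile function, the action tuples and the cluster names are all hypothetical and exist only for illustration.

```python
def reconcile(desired, observed):
    """Return the actions needed to move observed state toward desired state.

    Both arguments map a (hypothetical) storage cluster name to a replica count.
    """
    actions = []
    for name, replicas in sorted(desired.items()):
        current = observed.get(name)
        if current is None:
            # Declared but not running yet: create it.
            actions.append(("create", name, replicas))
        elif current != replicas:
            # Running with the wrong replica count: scale it.
            actions.append(("scale", name, replicas))
    for name in sorted(observed):
        if name not in desired:
            # Running but no longer declared: remove it.
            actions.append(("delete", name, 0))
    return actions


desired = {"ceph-a": 3, "ceph-b": 2}       # what the user declared
observed = {"ceph-a": 1, "ceph-old": 3}    # what is actually running
print(reconcile(desired, observed))
# [('scale', 'ceph-a', 3), ('create', 'ceph-b', 2), ('delete', 'ceph-old', 0)]
```

A real operator would run logic like this in a loop, triggered by changes to custom resources, and would call the Kubernetes API rather than return tuples.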
2. Longhorn
Cloud-native distributed storage built on and for Kubernetes
Longhorn is an open source tool for distributed block storage for Kubernetes. Using Longhorn, you can replicate storage for Kubernetes clusters and take advantage of built-in incremental backups of persistent volumes. You can make these snapshots recurring and back them up to secondary object storage. According to the documentation, this works by “partitioning a large block storage controller into a number of smaller storage controllers,” which helps alleviate the woes associated with storage for various container-based microservices. Longhorn is also compatible with non-cloud hosted K8s clusters and has a sleek graphical management UI that’s free to use. Similar to Rook, it is Kubernetes-native. Developed initially by Rancher, Longhorn is now an incubating project within the CNCF.
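The idea behind an incremental backup can be sketched in a few lines of Python: split two snapshots into fixed-size blocks, hash each block and upload only the blocks that changed. This is an illustration of the general technique, not Longhorn’s implementation. The function names and the tiny block size are assumptions chosen for readability.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    """Split a volume image into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous, current, block_size=BLOCK_SIZE):
    """Return the indexes of blocks in `current` that are new or differ from `previous`."""
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    return [i for i, digest in enumerate(new) if i >= len(old) or old[i] != digest]


snapshot_1 = b"AAAABBBBCCCC"
snapshot_2 = b"AAAAXXXXCCCCDDDD"  # block 1 rewritten, block 3 appended
print(changed_blocks(snapshot_1, snapshot_2))  # [1, 3]
```

Only blocks 1 and 3 would need to travel to secondary storage, which is why recurring incremental backups tend to stay cheap even for large volumes.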
3. CubeFS
Cloud-native distributed file system and object store
CubeFS, formerly known as ChubaoFS, is a distributed file system designed to support large-scale cloud-native architectures. One study found that CubeFS is around three times faster than Ceph. CubeFS works by having client applications hosted within a container cluster speak to volumes, which in turn communicate with a metadata subsystem and a data subsystem. These volumes can be deployed to various containers to enable simultaneous file sharing among many different clients. CubeFS’s metadata subsystem is itself distributed to increase performance and scalability. CubeFS could be used as a general-purpose storage engine for multi-tenant access or to ensure consistency for replicas of the same file. As the documentation notes, a distributed file system like CubeFS could especially aid the creation of machine learning models. At the time of writing, CubeFS is an incubating project within the CNCF.
4. K8up
Kubernetes and OpenShift backup operator
K8up, lovingly nicknamed “ketchup” by its creators, is a Kubernetes operator for performing backups. Conveniently distributable via a Helm chart, K8up is simple to deploy and customize for specific cloud-native backup use cases. K8up can automatically back up any persistent volume claim (PVC) marked as ReadWriteMany or carrying a custom label. You can also use K8up to initiate on-demand backups, schedule routine backups and long-term archival, as well as view and manage backups. K8up works with S3-compatible storage. At the time of writing, K8up is a sandbox project with the CNCF.
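The selection rule described above (back up a PVC if it is ReadWriteMany or carries a custom label) can be sketched in Python as a simple filter over PVC objects. This is not K8up’s code. The label key example.com/backup and the function name are hypothetical, while spec.accessModes and metadata.labels are standard fields on a Kubernetes PVC.

```python
BACKUP_LABEL = "example.com/backup"  # hypothetical label key, not K8up's own


def should_back_up(pvc):
    """Select a PVC if it is ReadWriteMany or carries the custom backup label."""
    access_modes = pvc.get("spec", {}).get("accessModes", [])
    labels = pvc.get("metadata", {}).get("labels", {})
    return "ReadWriteMany" in access_modes or labels.get(BACKUP_LABEL) == "true"


pvcs = [
    {"metadata": {"name": "shared-media"},
     "spec": {"accessModes": ["ReadWriteMany"]}},
    {"metadata": {"name": "db-data", "labels": {BACKUP_LABEL: "true"}},
     "spec": {"accessModes": ["ReadWriteOnce"]}},
    {"metadata": {"name": "scratch"},
     "spec": {"accessModes": ["ReadWriteOnce"]}},
]

selected = [pvc["metadata"]["name"] for pvc in pvcs if should_back_up(pvc)]
print(selected)  # ['shared-media', 'db-data']
```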
5. OpenEBS
Open source container-attached storage (CAS)
OpenEBS is another open source project that aims to simplify maintaining stateful workloads on cloud-native infrastructure. With OpenEBS, developers can use familiar K8s commands and APIs to control the storage of workloads for particular containers. The storage software itself is containerized and orchestrated by Kubernetes. The project refers to this setup as container-attached storage (CAS). Initially created and sponsored by MayaData, OpenEBS is a sandbox project at CNCF at the time of writing.
6. ORAS
OCI registry-as-storage
This one requires some explanation as it’s a bit more nuanced. You’re probably familiar with the Open Container Initiative (OCI), the group that sets industry standard formats for containers. One such format is the distribution specification, which defines a standard way to store, process and pull container images. Developers have begun to use OCI registries to store non-container content, too, and OCI artifacts were created to describe these arbitrary content types. OCI registry-as-storage (ORAS) is a utility that specifically helps push and pull these generic OCI artifacts to and from OCI registries. To date, ORAS has seen little adoption: the documentation only cites the Singularity and Helm projects as current implementations. ORAS is a sandbox project with CNCF.
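To see what “a generic artifact in an OCI registry” might look like, here is a minimal Python sketch that builds a manifest for an arbitrary blob, following the general layout of the OCI image manifest (a config descriptor plus a list of layer descriptors, each identified by media type, digest and size). It does not talk to a registry and it is not the ORAS tool itself. The application/vnd.example.* media types and the function names are hypothetical.

```python
import hashlib
import json


def descriptor(media_type, blob):
    """Build an OCI content descriptor: media type, digest and size of a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def artifact_manifest(artifact_type, layer_media_type, blob):
    """Describe a single non-container blob as an OCI manifest."""
    empty_config = b"{}"  # artifacts with no real config point at an empty JSON object
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": artifact_type,
        "config": descriptor("application/vnd.oci.empty.v1+json", empty_config),
        "layers": [descriptor(layer_media_type, blob)],
    }


report = b"quarterly numbers go here\n"  # any file content, not a container image
manifest = artifact_manifest(
    "application/vnd.example.report",       # hypothetical artifact type
    "application/vnd.example.report.text",  # hypothetical layer media type
    report,
)
print(json.dumps(manifest, indent=2))
```

Because the registry only sees descriptors and blobs, it can store this report the same way it stores image layers. That is the property a tool like ORAS builds on.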
7. Piraeus Datastore
High available datastore for Kubernetes
Piraeus is an open source cloud-native storage system designed to work with Kubernetes local persistent volumes. The utility provides features like dynamic provisioning, resource management and high availability, enabling a failover process for stateful workloads. Piraeus is pretty easy to use compared to others on this list, and getting started only takes a couple of commands. Piraeus is a good option to consider if your project is just working with local storage. At the time of writing, Piraeus is a sandbox project with the CNCF.
8. Vineyard
An in-memory immutable data manager
Unlike others on this list, Vineyard (v6d) focuses on in-memory data storage. Vineyard is suitable for large data systems because it uses zero-copy data sharing to avoid redundant copying and processing. It provides an abstracted way to work with multiple computation frameworks that might utilize graph databases. At the time of writing, Vineyard is a sandbox CNCF project.
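Zero-copy sharing is easier to appreciate with a small example. The Python sketch below uses the standard library’s memoryview, not Vineyard’s API, to show the difference between handing out a view onto an existing buffer and handing out a copy of it.

```python
data = bytearray(b"0123456789")

view = memoryview(data)      # a view onto the same buffer, no bytes copied
window = view[2:6]           # slicing a memoryview is also zero-copy
copied = bytes(data[2:6])    # slicing the bytearray makes an independent copy

data[2] = ord("X")           # mutate the underlying buffer in place

print(bytes(window))  # b'X345' (the view sees the change)
print(copied)         # b'2345' (the copy does not)
```

The view costs nothing extra in memory no matter how large the buffer is, which is the property that matters when several consumers need to read the same large dataset.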
Final Thoughts: Cloud-Native Storage Tools
To set up persistent storage in Kubernetes, you must define a persistent volume, of which there are many classes for various storage types. For example, you could use local storage and point to a specific folder on the host where Kubernetes is running, but this isn’t always a best practice, as you often need to share storage across nodes. Running an NFS server is one option, but most use cases will want to bake in cloud storage as a persistent volume.
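As a rough illustration of the local-storage case, the Python sketch below builds a persistent volume that points at a folder on the host, plus a claim that requests it, and prints both as JSON, which Kubernetes accepts alongside YAML. The resource names, the /mnt/data path, the 1Gi size and the “manual” storage class are all assumptions made for this example.

```python
import json


def host_path_volume(name, path, size):
    """A PersistentVolume backed by a folder on the node (single-node setups only)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "manual",  # assumed class name for this example
            "hostPath": {"path": path},
        },
    }


def volume_claim(name, size):
    """A PersistentVolumeClaim that a pod can reference to request that storage."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "manual",
            "resources": {"requests": {"storage": size}},
        },
    }


pv = host_path_volume("demo-pv", "/mnt/data", "1Gi")
pvc = volume_claim("demo-pvc", "1Gi")
print(json.dumps(pv, indent=2))
print(json.dumps(pvc, indent=2))
```

Swapping the hostPath block for an NFS or cloud-backed volume source is what moves this setup from a single node to storage that can be shared across nodes.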
No matter what infrastructure they’re working on, engineers and ITOps need easy access to store and retrieve data. And to reap the full benefits of a cloud-native ecosystem, it’s important for storage to be decoupled from the end node and smartly orchestrated throughout a container ecosystem. As we’ve seen above, there are many packages within the CNCF attempting to streamline the process of uniting Kubernetes with persistent, stateful storage.