Observability is increasingly critical for overseeing the health of large cloud-native software ecosystems. With better access to data like logs, metrics and traces, engineers can better maintain their systems and reduce mean-time-to-recovery (MTTR) when issues arise.
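To make the MTTR figure concrete, here is a minimal sketch of how it might be computed from incident records. The incident timestamps and the `mean_time_to_recovery` helper are hypothetical, invented purely for illustration. The sketch also assumes each incident is recorded as a start time and a restore time, which may not match how a given team tracks outages.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (time the issue began, time service was restored).
incidents = [
    (datetime(2023, 1, 5, 9, 0), datetime(2023, 1, 5, 9, 45)),
    (datetime(2023, 1, 12, 14, 30), datetime(2023, 1, 12, 16, 0)),
    (datetime(2023, 1, 20, 22, 15), datetime(2023, 1, 20, 22, 30)),
]


def mean_time_to_recovery(records):
    """Average the recovery durations across all incidents."""
    if not records:
        raise ValueError("need at least one incident")
    # Sum the per-incident durations, starting from a zero timedelta.
    total = sum((restored - began for began, restored in records), timedelta())
    return total / len(records)


mttr = mean_time_to_recovery(incidents)
print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")  # prints "MTTR: 50 minutes"
```

The three sample incidents last 45, 90 and 15 minutes, so the average is 50 minutes. Better logs, metrics and traces aim to shrink each of those durations.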
Thankfully, implementing observability is becoming more accessible. Many open source projects now exist, such as Prometheus, Jaeger and Fluentd, to help engineers introduce various aspects of observability into their workflows.
CNCF recently conducted a microsurvey, Cloud Native Observability: Hurdles Remain to Understanding the Health of Systems. The study polled 186 respondents on their use of cloud-native observability. Though the report has a small sample size, it nonetheless sheds some light on the status of observability within the larger industry.
Below, I’ll highlight the key takeaways from the study. We’ll examine the top tools in use today and consider common obstacles engineers face as they set their sights on making software systems more observable.
Trending Tools for Cloud-Native Observability
Prometheus is, hands down, the most-adopted tool for implementing cloud-native observability. The report found that 86% of respondents use Prometheus, a popular graduated CNCF project that powers monitoring and alerting systems in many large-scale production environments. It also offers a highly queryable time-series database.
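For readers who haven't seen what Prometheus actually collects, the following sketch renders a counter in the plain-text exposition format that Prometheus scrapes from an application's metrics endpoint. It uses only the Python standard library rather than an official client. The `render_counter` helper, the metric name and the sample values are all assumptions made for illustration, and the sketch skips details such as label-value escaping that a real client library handles.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter's value.
    Illustrative only: label values are not escaped here.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts, keyed by label set.
samples = {
    (("method", "GET"), ("code", "200")): 1027,
    (("method", "POST"), ("code", "500")): 3,
}

print(render_counter("http_requests_total", "Total HTTP requests handled.", samples), end="")
```

Once samples like these are stored, the time-series database can be queried with PromQL. For example, an expression such as `rate(http_requests_total[5m])` returns the per-second request rate over the last five minutes.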
Other popular cloud-native tools for observability include OpenTelemetry (49%), Fluentd (46%) and Jaeger (39%). Less common tools in use today include OpenTracing, Cortex and OpenMetrics.
The proliferation of observability has ushered in many different ways to integrate applications and track metrics. As such, most teams use multiple observability tools simultaneously for various purposes, such as monitoring or gathering logging and tracing data. In fact, 72% of respondents employ up to nine different tools to accomplish these goals. Over one-fifth of respondents report using between 10 and 15 tools.
Ongoing Challenges of Observability Projects
Engineers continue to rely mainly on open source projects within their cloud-native stack. For example, we’ve noted a rise in open source tooling running on Kubernetes. Yet setting up and maintaining technology for observability doesn’t come without its hang-ups, especially when open source is involved.
A top ongoing concern is the sheer complexity of observability projects. A full 41% of survey respondents say observability projects are too complex to understand or run. Other top problems include projects lacking sufficient documentation (36%), worries that open source projects may become inactive (26%) and installation difficulties (17%). All these issues underscore the need for observability software that is mature and continually supported by an active community.
The sheer number of tools also adds to the complexity of implementing observability. Just over half (51%) of respondents say that having engineers and teams spread across multiple tools is a top challenge. As the number of components rises, integration and interoperability become harder to handle. Other top roadblocks include a shortage of necessary skills (40%), silos between teams (36%) and a lack of resources (35%).
Interestingly, the option to purchase commercial support was ranked highest in importance when selecting observability tools. We may toot the open source horn all day, but the data suggests that teams like having the security blanket of licensed software with high SLAs within reach, at least where observability is concerned.
Deployment Patterns and Goals
The CNCF microsurvey also reveals deployment patterns related to observability. The most common way to deploy observability tools is to self-manage them on the public cloud, which 64% of organizations do. But many engineers also consume them as a service on the public cloud (44%) or run them as self-managed instances on-premises (40%).
The study also questioned respondents on their top priorities and overall DevOps goals. The top priority for most organizations is to continue developing best practices. Providing engineers with the tools they need to identify issues and quickly respond is also a pressing concern.
Observability tools are great for exposing all sorts of data. The study found that analytics, profiles, crash dumps and events rank as some of the most valuable data types to track. Just as necessary is data related to diagnostics, traces, alerts, logs, metrics and monitoring.
Lastly, with technology sprawl becoming a more prevalent issue, establishing a single, unified view of the technology stack is crucial to understanding how its many different components interact. Observability is a key piece of the puzzle, helping to centralize data analysis and telemetry from various applications and networks.
Final Thoughts
An emphasis on observability can help development and operations teams maintain healthier systems and increase the overall availability of distributed systems. Observability can direct recovery attempts, inform A/B testing and influence the daily life of an SRE.
Yet, as the CNCF microsurvey demonstrates, some hurdles remain when implementing an observability strategy. Hopefully, these barriers will lower as offerings mature and standards like OpenTracing and OpenTelemetry become more established.