Announcer: This is Techstrong TV.

Mitch Ashley: And we’re back at KubeCon 2023 here in Chicago with a great conversation. We’re going to take a little bit of a different track here. I’m joined by Morgan McLean, who’s director of product management with Splunk.

Morgan McLean: Yes.

Mitch Ashley: And we’re talking about some Splunk, but really talking about one of the roles that you have with the CNCF with OpenTelemetry.

Morgan McLean: Yeah, so I’m one of the co-founders of OpenTelemetry. I’ve been with the project, by definition, since its beginning, since 2019 when we announced it. OpenTelemetry was formed by the merger of two pre-existing open-source projects: OpenCensus and OpenTracing. Those are both deprecated now, and so really, I’ve been working on open-source solutions in this space since around 2016, 2017.

Mitch Ashley: It’s amazing that it’s been that long, because I remember reading about, oh, OpenTracing is now part of…

Morgan McLean: OpenTelemetry. Yes.

Mitch Ashley: And to make that announcement is one thing. To get to the point where it’s been deprecated…

Morgan McLean: Yes.

Mitch Ashley: That’s a huge, huge effort.

Morgan McLean: Massive. And it’s funny. At the time, we thought OpenCensus and OpenTracing were relatively big and successful. And they were, in their own way. But OpenTelemetry so rapidly dwarfed them in terms of market penetration and success and features and everything else that the fact that they were deprecated relatively recently is interesting. In the context of 2019 and the announcement, it’s the fulfillment of our original vision. And yet, in some ways, today, it’s kind of a small deal, because OpenTelemetry is so ubiquitous and so successful.

Mitch Ashley: So, rolling back history a little bit, what did you think was possible? What were you hoping for with OpenTelemetry when you were first starting out? I imagine it wasn’t like, oh, we’re going to have all these technology vendors contributing.

Morgan McLean: If you had told me then that within a year of announcing OpenTelemetry, it’d be the second most active project in the CNCF, so the second-highest number of monthly developers, I would’ve assumed you were lying. That a project just focused on data extraction could be on the same order of magnitude, in terms of contributions and code, as something like Kubernetes, this compute platform that powers billions of dollars of industry every month… it’s quite incredible, and it’s certainly been really exciting to be part of it.

Mitch Ashley: I think it’s a model, at least one model for open-source success, when the contributors will also say, “Okay. We know that. We do that, too, on our own stuff, but this is more valuable, for all of us to do the same thing.”

Morgan McLean: Exactly. And if you go back in time to that era, before OpenTelemetry got big, you had these established vendors in the space. Particularly in APM, because for APM, you require integrations with actual applications. Contrast that with, say, logging, or even system metrics, where you generally have integrations with Windows and Linux and maybe a handful of third-party apps that people run. To do distributed tracing for APM, you need integrations with every language runtime and every single client library that people use on those runtimes. There are hundreds of thousands of these integration points that people need.

No one company was ever going to be able to provide that, let alone maintain it, even if they built it in the first place. That’s where OpenTelemetry… its success has been very clear, not only for end users, the people who want deep visibility into their own services, who want to use OpenTelemetry and export the data, but even for the vendors who had existing offerings, or components of their offerings, and their own agents. They have very rapidly adopted OpenTelemetry, because it’s almost impossible to go it alone. You’ll have support for, say, Java and .NET, but maybe not Python or something else.

I think OpenTelemetry has really shown everybody the way, and it’s been lovely in that the project has been so collaborative, both between the end users and all the vendors in this space, because it’s scoped just on data extraction, data collection from infrastructure and services. No one’s commercial interests are really threatened by it, and it really just has helped everyone. It’s been a very honestly positive success story all around.

Mitch Ashley: When you can pick a domain where you say, “Is that really where we want to differentiate supporting another agent or another environment?”

Morgan McLean: Yes. And the answer is often no. In this case, it’s almost exclusively, across the industry, been no. Which is good for everybody.

Mitch Ashley: So you’re on the governance committee, governance board, right?

Morgan McLean: Yeah. So OpenTelemetry is divided into many different groups. At the lowest level… lowest might not be the right word, but that’s the most project-oriented level. We have SIGs, special interest groups, and those maintain the different parts of it. So there’s the instrumentation for Java, or for .NET, or for Python. Each of those is a different SIG. And then the Collector is our large agent for OpenTelemetry that has its own SIG.

The committees that sit above that, or distinct from that, are these: there’s the technical committee, formed of the most senior engineers, the people who can dedicate the most time to the project. They oversee the overall spec that says, this is what each of these language instrumentations should do. This is what the agent should do, or this is a metric. This is how you can operate on them. For a metric that describes, say, an HTTP request from a given service, this is the sort of data, the payload, it should have. The technical committee does that.

Then there’s the overall governance committee that just steers the direction of the project. I’ve been on that one since the beginning of the project, and it’s been very good.

Mitch Ashley: Plus, you get to see the whole thing, the whole picture, right?

Morgan McLean: Correct.

Mitch Ashley: Maybe help shape the evolution…

Morgan McLean: Yeah. Granted, there’s maintainers and other people in the community who also have very broad visibility. That’s not exclusive to the governance committee.

Mitch Ashley: To that point, it’s not like it’s a top-down effort. No open-source is, by definition, right?

Morgan McLean: Yeah. And it’s a funny thing. People will come up at KubeCon and say, “What’s the roadmap for OpenTelemetry?” And we do have an official roadmap. You can go on GitHub and see it. It’s there, and we can talk a bit about the things on it, but the roadmap in open-source is only as good as the willingness of the community to actually go and build those things.

It’s one thing for the governance committee, or me, because I’m generally the one driving a lot of this, to go and say, “The big focus next year, one of them will be adding profiling as the fourth signal type.” Or this year, logging as the third signal type. It’s one thing for us to have that vision and spell it out. And in a company, you hire people and have managers and chains of control that go implement that vision. But in open-source… OpenTelemetry is not a company, and it’s not a single-vendor project. There’s dozens of vendors deeply involved, which is a very positive thing, by the way.

So it’s one thing for me to say, “This is our vision,” but we need people to actually show up and do these. And so it’s always been interesting, building that roadmap. This mix of: This is what our end users want, our stakeholders want, but also, this is what the community and the developers are actually willing and able to deliver and implement on it.

Mitch Ashley: I almost equate it to a shared vision, right?

Morgan McLean: It is.

Mitch Ashley: Because no one person has the full…

Morgan McLean: Correct.

Mitch Ashley: Or creates the full thing, right?

Morgan McLean: And no one stakeholder, or cohort of stakeholders, even has that full…

Mitch Ashley: Other people wouldn’t participate if one or other company was like, “We’re it, and you all follow us.” Yeah, they’re not going to do that.

Morgan McLean: It’s an interesting balancing act. When the project started, I think there were concerns. Less for me, but I think externally, of… there’s a lot of vendors involved in OpenTelemetry, and there were early on, so people were concerned it would be this sort of vendorfest. That hasn’t really happened, and I think that’s because it’s so focused on data extraction.

Again, most of the companies are happy to get more data out of OpenTelemetry, so there hasn’t been any walking a tightrope between vendors. That’s actually been a really positive relationship. But still, you have to balance what people are actually willing to sit down and spend their time developing versus what you think the project needs overall. I think, generally, we’ve done a good job of achieving that.

Mitch Ashley: Well, I mean, to have the support you do, I think that’s a good testament that it’s working, right?

Morgan McLean: We set a milestone. We have 1100 monthly active developers in the community. So every month…

Mitch Ashley: How many?

Morgan McLean: 1100.

Mitch Ashley: 1100. Wow.

Morgan McLean: So every month, there’s at least 1100 people who are either checking in code, making pull requests, reviewing pull requests, activity like that. It goes beyond just basic comments.

And that’s huge. That’s almost exactly half the size of Kubernetes every month. And Kubernetes is, again, this massive, huge compute platform. To have achieved that for a project that fundamentally is just focused on data extraction, data collection, data pre-processing, is very, very impressive. I think it does speak to the community that we’ve built and the positivity and inclusiveness that we’ve…

Mitch Ashley: So what are the things that a governance committee deals with that couldn’t be handled or worked on by an individual group or component?

Morgan McLean: Honestly, not much. And I say this in a loving way. The governance…

Mitch Ashley: Which is a good thing, right?

Morgan McLean: Yeah.

Mitch Ashley: Otherwise, not everybody’s going to be happy.

Morgan McLean: If you had an open-source project where the governors of it were constantly meddling in everything or dictating things… there are some examples for something that’s so big where, yes, at times you do need to bring everyone together and drive a very specific direction. But in OpenTelemetry, as I’ve seen in other communities in CNCF, I think the fact that the governance committee generally doesn’t have to step in that much is actually really positive.

Again, it speaks to the collaborative nature of the community, as well as to the systems that we set up. If you’re maintaining the Java SDK, right? You have a spec that you come and implement, but that spec is not defined by the governance committee. It’s defined by the spec group, which is mostly technical committee members as well as all the people who are implementing it. The fact that it’s not some super top-down thing, where the governance committee can focus a bit more on project direction or end user feedback or showing up at conferences like this, I think it’s very positive.

Mitch Ashley: Going back a little ways, but with OpenTracing, bringing that in, adopting that, did that come bottom-up? As in, “Is this something we should do?” and then approved by the governance committee, or did it come…

Morgan McLean: Well, that was almost the other way around.

Mitch Ashley: You approached… sometimes that happens.

Morgan McLean: So OpenTelemetry only exists because OpenCensus and OpenTracing merged.

Mitch Ashley: That’s true. Good point.

Morgan McLean: It was literally a group of meetings where… I was working on OpenCensus. We had, like, Ben Sigelman and Ted from Lightstep who were working on OpenTracing. And various others, I don’t mean to diminish the group, were coming in and saying, “There’s two projects here. They’re maybe not technically in direct competition, but they effectively are.” And because there’s two different APIs, two different implementations, two different ways to extract this data from applications, no one’s building the integrations, the instrumentation needed on either of them to make it really successful.

Because, if you’re maintaining a database or a language runtime or something, and an end user comes and says, “Hey, can you implement tracing or metrics or some way for me to extract my data and send it anywhere?” And you see there’s two competing APIs or implementations of this. You’re often just going to stare at this and close the issue. It’s not that you’re going to even pick one, and that half the people will pick one and half pick the other. That would be bad, but it’s even worse where you just do nothing. You stare at it and go like, “This is kind of a joke. I don’t need this right now.” And you move on to doing something totally different.

And that’s where OpenTelemetry… the fact that it was generated out of both of those and created the single standard for instrumentation, for data collection, data extraction made it very, very successful. Yes, there’s the super positive community behind it. Yes, there’s the 1100 developers every month working on it. Those are all massive and big. But if nothing else, OpenTelemetry made sure there was a single sane option in this space that people could rely on. That alone accounts for a huge amount of the project’s success.

Mitch Ashley: So you’re wearing a couple different hats.

Morgan McLean: Yeah.

Mitch Ashley: In some ways, you could say, sometimes they might be in conflict, right? What you might want as the director of product management for Splunk might not be exactly what the rest of the community wants.

Morgan McLean: In theory, that’s very possible. And I mentioned there was some concern about that.

Mitch Ashley: I’m not accusing you of anything. I just, I could see a scenario where…

Morgan McLean: Yeah. I think one of the smartest things we did with OpenTelemetry is kept the project focused strictly on extracting data from applications and infrastructure.

OpenTelemetry is not a backend. It’s not a replacement for Prometheus or Splunk or Datadog or Dynatrace or Jaeger or anything else. It doesn’t do that. All it does is take your data out, ensure it’s nicely formatted, and you can send it to the locations you want. And because of that, the commercial interests are generally not threatened. Because the commercial interest for these companies and for the open-source solutions here, like Prometheus and Jaeger, is just like, “We want to make this as easy as possible for people to get their data out.” And we want it to be well-structured, right? We want it to be sensible. We want an HTTP metric from this service written in Java to be correlatable with this one for a service written in .NET, to have the same attributes and things on them. OpenTelemetry achieves that. That’s all those vendors wanted.
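To illustrate what “correlatable” means here, the following is a minimal Python sketch, not OpenTelemetry SDK code. It assumes two hypothetical services (`checkout-java`, `checkout-dotnet`) whose HTTP measurements carry the same attribute keys; the key names are modeled on OpenTelemetry’s HTTP semantic conventions but should be read as illustrative. Because the keys match, one aggregation works across both services with no per-language translation.

```python
from collections import defaultdict

# Illustrative measurements as plain dicts (not the OpenTelemetry SDK).
# Service names and values are hypothetical.
measurements = [
    {
        "service": "checkout-java",
        "attributes": {
            "http.request.method": "GET",
            "http.route": "/cart",
            "http.response.status_code": 200,
        },
    },
    {
        "service": "checkout-dotnet",
        "attributes": {
            "http.request.method": "GET",
            "http.route": "/cart",
            "http.response.status_code": 500,
        },
    },
    {
        "service": "checkout-dotnet",
        "attributes": {
            "http.request.method": "POST",
            "http.route": "/pay",
            "http.response.status_code": 200,
        },
    },
]


def error_rate_by_route(items):
    """Fraction of 5xx responses per route, across all services."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for item in items:
        attrs = item["attributes"]
        route = attrs["http.route"]
        totals[route] += 1
        if attrs["http.response.status_code"] >= 500:
            errors[route] += 1
    return {route: errors[route] / totals[route] for route in totals}


rates = error_rate_by_route(measurements)
```

If each service used its own key for the route or status code, the function would need a mapping per language or per team before it could aggregate anything.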

And the other thing is, I don’t think any of them were satisfied with their own offerings, historically. Right? If you wind the clock back to 2016, when I was getting into the space, the incumbent APM vendors, you’d go to them and if you had, say, a Java application or .NET application, they’d say, “Yep, we work with that.” And you’d buy their product and they’d still have to send a bunch of field people out to go make the integration work. And meanwhile, if you came to them with, say, PHP… I’m making up an example. They might’ve had good PHP support, but PHP, or for some of them, maybe Ruby. They would say, “You can’t use our products. It doesn’t work with that.”

So I don’t think any of them were particularly happy with the solutions they had built. There were customers that they had to turn away, or customers for whom the experience was always kind of bad.

And secondly, they were spending a ton of money on agentry. These were big teams of very experienced engineers doing automatic instrumentation for a Java application or… pick your language of choice. It’s not a trivial thing. You need to hire people with deep, intimate knowledge of the JVM, or the CLR for .NET, or the Python runtime for Python, and so on. So they’re spending a lot of money on this. It wasn’t working for them, and it was shrinking the overall market, or at least not letting the market grow.

OpenTelemetry has, in many ways, sort of uncorked the bottle. Those firms, in theory, don’t need to spend as much on instrumentation as they did in the past. The data comes out in a beautiful, well-structured way, so it’s always consistent, and better yet, their customers can quickly adopt it and they can switch solutions if they want. We did find a way, by scoping the project down, to make it so that the interests of some of the vendors who are backing it aren’t in conflict with it.

I will also say, with OpenTelemetry in the last few years… early on, it was Google, Microsoft, Splunk, Lightstep, a bunch of observability or cloud vendors with observability offerings. Some of the biggest contributors now are end users, like companies that are making money off OpenTelemetry, in that it makes their own services more observable internally. It makes them more reliable. It’s companies that are getting so much value out of OpenTelemetry that they’re putting engineers back on the project to make it even better. And we’ve seen more and more of that every single year.

Mitch Ashley: It’s interesting how it’s evolved to kind of an ops-centric model, if you will, from APM to security, the domain. And it’s widely adopted; Splunk’s had great success there.

Morgan McLean: It has, yeah.

Mitch Ashley: Domain-driven design in applications and architecture, adopting, using OpenTelemetry within your applications, through your stack.

Morgan McLean: Right.

Mitch Ashley: I’m curious. Do you see AI as another domain that can benefit from using OpenTelemetry?

Morgan McLean: Potentially. I am not an AI expert by any means, but when you’re analyzing reams of data, because you’re building a large language model or something, or any other type of AI, right? So much of our current ML technology is based on massive data being piped in for training. It helps a lot when that data is well-structured, so yeah. OpenTelemetry probably isn’t necessarily directly helping build a large language model that’s based off of human text on Wikipedia or Reddit or something, because that’s human-readable text. The whole point of that thing is to parse it. But for any ML model or AI work that’s modeling the type of data you would get from OpenTelemetry: Metrics, traces, logs, machine data, really. Then yeah, it’s incredibly beneficial, because that data is well-structured.

I’ll take logging as an example. We’re announcing tomorrow at this conference that OpenTelemetry logging is GA. This is the third major signal type in OpenTelemetry. This is a big deal because we have metrics already. We had traces already. Now we’re adding logs. Those are the three data types that most firms are focused on capturing from their services, which means they can just exclusively rely on OpenTelemetry if they want.

This is particularly important for logs, though, because logs, historically, are generally unstructured. And when we started working on logging in OpenTelemetry, we wanted to only go into it if we felt we could actually improve things. We talked to large companies that write a lot of logs, and to vendors. They had two major problems. One is the performance of the logging agents. Well, that’s somewhat fixable and not particularly relevant to the AI conversation. But the other is, logs often come in with wildly different formats. You have a service over here written by one team and one over here written by another team. Your timestamps come in differently. This one might have the right host information on it; this one doesn’t, or it’s structured differently.

Mitch Ashley: Level of specificity in what the message is about.

Morgan McLean: It’s all over the place. Or even someone structured the key for timestamp like time-dash-stamp, and the other is timestamp without the hyphen, so you have to do a ton of pre-processing to make that work. Or in many cases, you don’t, and you can’t query it properly.

Well, from an ops perspective, this is incredibly annoying, but at least there’s some fields where you can spend a bunch of money, do a ton of work either in your queries or pre-processing to format it consistently. But if you’re trying to train an AI system… and again, not like a large language model that’s trying to learn human language.

Mitch Ashley: Like machine learning or something, yeah.

Morgan McLean: Yeah. Some ML system that’s going to alert you to anomalies or something. Well, the fact that your data is wildly differently structured is really going to limit your effectiveness, right? With OpenTelemetry, this data’s coming in, even your logs are coming in, properly structured. It actually has a data model in OpenTelemetry. So not only is it more performant to capture and pre-process, if you even need to, but it’s coming in where all your timestamps, your machine IDs, your service information, all of the additional metadata you might have about certain interactions or events that generated those logs are all the same. So now you can have success much, much earlier, because you can actually focus on training that AI model, training that system, instead of spending all of your time formatting things and running out of time to actually do the real AI work.
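To make the pre-processing burden described above concrete, here is a minimal Python sketch. It is not OpenTelemetry code; the key aliases, the sample records, and the output field names (`timestamp`, `host`, `body`) are all hypothetical. It shows the kind of glue that has to exist when two teams log the same event with different keys and timestamp formats, and which a shared data model makes unnecessary.

```python
from datetime import datetime, timezone

# Hypothetical key aliases seen across teams; illustrative only.
TIMESTAMP_KEYS = ("timestamp", "time-stamp", "ts")
HOST_KEYS = ("host", "hostname")
BODY_KEYS = ("message", "msg")


def parse_timestamp(value):
    """Return a UTC datetime from epoch seconds or an ISO 8601 string."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption for this sketch: naive timestamps are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def first_present(record, keys):
    """Return the value of the first key found in the record, else None."""
    for key in keys:
        if key in record:
            return record[key]
    return None


def normalize(record):
    """Map one team's ad hoc log record onto a single common shape."""
    raw_ts = first_present(record, TIMESTAMP_KEYS)
    return {
        "timestamp": parse_timestamp(raw_ts) if raw_ts is not None else None,
        "host": first_present(record, HOST_KEYS),
        "body": first_present(record, BODY_KEYS) or "",
    }


# Two teams logging the same event in different shapes.
team_a = {"time-stamp": "2023-11-07T12:00:00+00:00", "hostname": "web-1", "msg": "login ok"}
team_b = {"timestamp": 1699358400, "message": "login ok"}

normalized = [normalize(team_a), normalize(team_b)]
```

After normalization both records carry the same timestamp value and the same keys, but the second still has no host, which is the “this one doesn’t have the right host information” problem: missing data cannot be fixed by renaming keys after the fact.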

So no, I think it’ll be very effective. Caveat: Again, I’m not some AI expert. It’s a massive benefit.

Mitch Ashley: That’s a whole other domain, but it seems like the common denominator for everything is everything produces log messages.

Morgan McLean: Well, there’s sort of examples I can draw. There was a big push a few years ago amongst various AI companies, nothing to do with cloud ops and stuff, but to do AI analytics of health records. To say, “We’ll look at these health records of a million people and we know the ones who had a certain type of cancer and the ones who didn’t. And so we’ll see if we can find a pattern in their health records where we could then, in the future, look at other records and say, ‘Hey, maybe you should get checked out for this certain type of cancer, because you have some similar symptoms or things you went to the doctor for.'” And my impression is that most of those efforts failed because the health records are written by people. They’re all over the place.

Mitch Ashley: Have you seen a doctor’s handwriting? More or less, what’s in it… [inaudible 00:19:45]

Morgan McLean: Even once you parse the handwriting, it’s short-form notes.

Mitch Ashley: The details of what’s put in… yeah.

Morgan McLean: My impression is, none of those efforts, or very few, were successful. At least the ones I know about. There’s a corollary there. It’s a similar problem where, if your data coming in isn’t particularly useful, either because it’s misshapen or it’s just missing things to drive insights, even with a very effective AI model, or even, frankly, people looking at it, it’s going to be really limited. So the same benefits that OpenTelemetry brings to users of Splunk or Prometheus or Grafana or whatever… choose your observability solution. The same benefits it brings to them, because of structured data making their queries actually work and everything else, will also apply to any AI or ML models built on top of this.

Mitch Ashley: Great. I think it’s pretty obvious, but if someone wasn’t quite clear, why would Splunk and all the other companies dedicate the amount of time and resource… not yourself, but there’s a lot more to it, right? It isn’t like you’re the sole contributor here to what’s happening.

Morgan McLean: No, neither I nor Splunk. OpenTelemetry is proudly a true open-source, open-license, open-governance project where there’s a huge number of different contributors. Splunk is probably one of the more prominent ones, but you’ve got Microsoft and Google and Dynatrace and Lightstep, or I guess ServiceNow now, and various other companies and vendors, Amazon, who are all putting in a ton of time and a ton of effort on the project. And that’s good. And the CNCF owns the license and trademark for it, so there’s not going to be any licensing shenanigans over it like we’ve seen at times in other communities.

Mitch Ashley: That’s nice.

Morgan McLean: Yeah, it is.

Mitch Ashley: Intellectual property issues, all that.

Morgan McLean: It’s all above board. It’s open-governance. We’re all elected. There’s no permanent positions. It is a true success story. A few years ago, that probably wasn’t that uncommon, but I think, given some of the changes in the industry recently, that’s a very nice thing, very stable thing to point at.

Mitch Ashley: Strange things are happening, yeah.

Morgan McLean: Yeah. And I think it goes back to what we were talking about earlier, where the commercial interests are almost perfectly aligned with what the end users want here, which is great.

Mitch Ashley: And it seems like carving out that lane, everybody’s clear, this is what we’re doing.

Morgan McLean: It’s very critical.

Mitch Ashley: We’re not going to venture into what we all do.

Morgan McLean: Yeah.

Mitch Ashley: Makes a lot of sense.

Morgan McLean: Because you see other projects like… you can take the extreme example of open-core projects where a company has an open-source version that’s kind of lightweight, and then the enterprise version that has more features. If you go to the open-source version and make submissions that’ll make it too good, they have an incentive to make those go away, because it competes with a commercial offering. With OpenTelemetry, not to belabor the point, but because it’s not doing analytics or anything else, that doesn’t really exist.

Mitch Ashley: Well, congrats on the success and longevity you’ve had with the project.

Morgan McLean: We’re expecting even more growth over the coming year. Like I mentioned, logging is going GA, or is GA now. For a lot of companies, I think that was… first off, OpenTelemetry is super successful in the industry, but I think there’s a lot for whom they looked at it and said, “There’s tracing and metrics. It doesn’t solve my full story for logging. Maybe I’ll adopt it for those two, but not the whole thing, but maybe I won’t.” With logging, it’s the clear solution. It’s the way you can extract all of that data you have today, and over the next year, that’s going to get further extended. Profiling is getting added, the fourth signal type that allows deep analytics into compute, into where you’re spending money on given function calls that might be costing you a lot of performance.

We’re also extending OpenTelemetry to more and more platforms. Projects historically have been focused on backend services and infrastructure, typically in cloud environments, but not exclusively. We’re adding support for client applications for Android and JavaScript for web browsers and others. It’s really going to extend OpenTelemetry out of just the data center and into client applications. Even, in theory, into embedded systems, and that’s very, very exciting.

There’s work that’s even started recently on mainframe support for OpenTelemetry. This, to me, is just really the full realization of the vision of, you want observability, you want insight into your entire estate that you’ve built. Well, that goes all the way from customer interactions to finding out why they’re slow or why they might be broken, all the way down to the smallest of backend services or infrastructure.

Mitch Ashley: It’s a great example of why constrain it to one person’s vision if you can get a shared vision?

Morgan McLean: Correct.

Mitch Ashley: It’s amazing what you can…

Morgan McLean: And fairness is always part of the vision, but yeah.

Mitch Ashley: Everybody does. Right. But this is a great example of when you have a shared vision, shared outcomes…

Morgan McLean: And it’s expertise. There’s a lot of people from IBM and Broadcom who want to work on the mainframe part. This is fantastic. They’re the experts at that.

Mitch Ashley: I’m getting the hook here from my guys, so…

Morgan McLean: There’s a lot of great things happening. OpenTelemetry has already been very successful. I think, over the next year and a bit, we’re going to see that grow even faster.

Mitch Ashley: We’re looking forward to more great things.

Morgan McLean: Yeah, likewise.

Mitch Ashley: Thanks for joining us. Morgan McLean, who is director of product management, as well as on the governing board…

Morgan McLean: Yes, and one of the co-founders of OpenTelemetry.

Mitch Ashley: And one of the co-founders, absolutely, of OpenTelemetry. I wanted to say Tracing. Thank you.

Morgan McLean: Thank you.

Mitch Ashley: I appreciate you joining us. We will be back with another great guest. I don’t know about as much longevity with an open-source project, but another great guest nonetheless. We’ll see you in a few minutes.