Hey HN, Nir, Gal and Tomer here. We’re open-sourcing a set of extensions we’ve built on top of OpenTelemetry that provide visibility into LLM applications, covering prompts, vector DBs and more. Here’s the repo:
https://github.com/traceloop/openllmetry.
There’s already a decent number of tools for LLM observability, some open-source and some not. But what we found missing in all of them was an open protocol: they are closed by design, locking you into their observability platform or their proprietary framework for running your LLMs.
It’s still early in the gen-AI space, so we think it’s the right time to define an open protocol for observability. So we built OpenLLMetry. It extends OpenTelemetry and provides instrumentations for LLM-specific libraries which automatically monitor and trace prompts, token usage, embeddings, etc.
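To make "automatically monitor and trace" a bit more concrete, here is a stdlib-only toy sketch of the general idea behind an instrumentation: wrap a client library's call so every invocation records the prompt, completion, and token usage without changing application code. `FakeLLMClient`, `instrument`, and `RECORDED_SPANS` are hypothetical names made up for illustration; this is not OpenLLMetry's code, and real instrumentations emit OpenTelemetry spans rather than plain dicts.

```python
import functools
import time


# Hypothetical stand-in for an LLM client library (not a real SDK).
class FakeLLMClient:
    def complete(self, prompt: str) -> dict:
        text = prompt.upper()
        # Word counts act as a crude placeholder for real token counts.
        return {
            "text": text,
            "usage": {
                "prompt_tokens": len(prompt.split()),
                "completion_tokens": len(text.split()),
            },
        }


# Finished "spans" collect here; a real setup would hand them to an exporter.
RECORDED_SPANS: list[dict] = []


def instrument(cls) -> None:
    """Patch cls.complete so every call records a span-like dict."""
    original = cls.complete

    @functools.wraps(original)
    def traced(self, prompt: str) -> dict:
        start = time.perf_counter()
        response = original(self, prompt)
        usage = response["usage"]
        RECORDED_SPANS.append({
            "name": "llm.complete",
            "prompt": prompt,
            "completion": response["text"],
            "prompt_tokens": usage["prompt_tokens"],
            "completion_tokens": usage["completion_tokens"],
            "total_tokens": usage["prompt_tokens"] + usage["completion_tokens"],
            "duration_s": time.perf_counter() - start,
        })
        return response

    cls.complete = traced


instrument(FakeLLMClient)
result = FakeLLMClient().complete("hello traced world")
print(RECORDED_SPANS[0]["total_tokens"])  # 6
```

Patching the library class, rather than editing call sites, is what lets this style of instrumentation work on existing code once it is installed and enabled.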
Two key benefits of OpenTelemetry are (1) you can trace your entire system execution, not just the LLM (so you can see how requests to DBs or other calls affect the overall result); and (2) you can connect to any monitoring platform, with no need to adopt new tools. Install the SDK and plug it into Datadog, Sentry, or both, or switch between them easily.
We’ve already built instrumentations for LLMs like OpenAI, Anthropic and Cohere, vector DBs like Pinecone, and LLM frameworks like LangChain and Haystack. And we’ve built an SDK that makes it easy to use all of these instrumentations in case you’re not too familiar with OpenTelemetry.
Everything is written in Python (with TypeScript around the corner) and licensed under Apache-2.0.
We’re using this SDK for our own platform (Traceloop), but our hope is that OpenLLMetry can evolve and thrive independently, giving everyone (including our users) the power of choice. We’ll be working with the OpenTelemetry community to get this to become a first-class citizen of OpenTelemetry.
Would love to hear your thoughts and opinions!
Check it out -
Docs: https://www.traceloop.com/docs/python-sdk/introduction
Github: https://github.com/traceloop/openllmetry
What do you think is the key differentiator between you and everyone else [0]? Is vendor lock-in really that huge of an issue?
[0] https://hegel-ai.com, https://www.vellum.ai/, https://www.parea.ai, http://baserun.ai, https://www.trychatter.ai, https://talc.ai, https://github.com/BerriAI/bettertest, https://langfuse.com