Show HN: OpenLLMetry – OpenTelemetry-based observability for LLMs (github.com/traceloop)
154 points by nirga on Oct 11, 2023 | 31 comments
Hey HN, Nir, Gal and Tomer here. We’re open-sourcing a set of extensions we’ve built on top of OpenTelemetry that provide visibility into LLM applications - prompts, vector DB calls, and more. Here’s the repo: https://github.com/traceloop/openllmetry.

There’s already a decent number of tools for LLM observability, some open-source and some not. But what we found missing in all of them is that they’re closed-protocol by design, locking you into their observability platform or their proprietary framework for running your LLMs.

It’s still early in the gen-AI space, so we think it’s the right time to define an open protocol for observability. That’s why we built OpenLLMetry. It extends OpenTelemetry and provides instrumentations for LLM-specific libraries that automatically monitor and trace prompts, token usage, embeddings, etc.

Two key benefits with OpenTelemetry are (1) you can trace your entire system execution, not just the LLM (so you can see how requests to DBs, or other calls affect the overall result); and (2) you can connect to any monitoring platform—no need to adopt new tools. Install the SDK and plug it into Datadog, Sentry, or both. Or switch between them easily.

We’ve already built instrumentations for LLM providers like OpenAI, Anthropic and Cohere, vector DBs like Pinecone, and LLM frameworks like LangChain and Haystack. And we’ve built an SDK that makes it easy to use all of these instrumentations in case you’re not too familiar with OpenTelemetry.
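
To give a feel for it, here’s a rough sketch of using the SDK with OpenAI (Traceloop.init and the workflow decorator come from our docs linked below; treat the exact signatures as illustrative):

    # Rough sketch; see the docs for the exact API.
    import openai
    from traceloop.sdk import Traceloop
    from traceloop.sdk.decorators import workflow

    # One init call sets up the LLM instrumentations and an OTLP exporter,
    # so traces can go to any OpenTelemetry-compatible backend.
    Traceloop.init(app_name="joke_app")

    @workflow(name="tell_joke")
    def tell_joke(topic: str) -> str:
        # The OpenAI instrumentation records the prompt, model and token
        # usage as span attributes automatically.
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"Tell me a joke about {topic}"}],
        )
        return response["choices"][0]["message"]["content"]

    print(tell_joke("OpenTelemetry"))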

Everything is written in Python (with TypeScript around the corner) and licensed under Apache-2.0.

We’re using this SDK for our own platform (Traceloop), but our hope is that OpenLLMetry can evolve and thrive independently, giving everyone (including our users) the power of choice. We’ll be working with the OpenTelemetry community to get this to become a first-class citizen of OpenTelemetry.

Would love to hear your thoughts and opinions!

Check it out -

Docs: https://www.traceloop.com/docs/python-sdk/introduction

GitHub: https://github.com/traceloop/openllmetry




LLM observability strikes me as an extremely, extremely crowded space. And YC has funded an enormous number of them [0].

What do you think is the key differentiator between you and everyone else? Is vendor lock-in really that huge of an issue?

[0] https://hegel-ai.com, https://www.vellum.ai/, https://www.parea.ai, http://baserun.ai, https://www.trychatter.ai, https://talc.ai, https://github.com/BerriAI/bettertest, https://langfuse.com


Note that these products aren't the same, even though they all fall under the category of observability - similarly to how you'd use Datadog AND Sentry, although both can be called "observability platforms".

I do think vendor lock-in is a key differentiator, which is one of the reasons OpenTelemetry succeeded in the first place. I know that my previous company switched to OpenTelemetry for exactly this reason. You get the flexibility of using any platform you want (since we're compatible with OpenTelemetry), so you're not locked into a specific platform with specific capabilities. Why use any of the ones you mention - maybe Datadog is enough if your use case is simple?

But there are more advantages: you get much more than observability of the LLM itself - you can see calls to vector DBs, network calls, DB queries, etc. This can be extremely useful IMO for RAG and autonomous agents, for example.


Just to add to Nir's answer here:

Let's say your application takes several steps to build up a prompt dynamically, such as a RAG pipeline. Depending on the application, you could end up with a different prompt for each user.

The result is you've likely increased the accuracy of the LLM, but at the expense of understanding the whole system's behavior by introducing more steps upstream of the LLM call. Those steps could be super simple, or they could be (like in our case) dozens of steps that could all potentially fail or have a bug or whatever.

And so how do you wrangle all of this in context? You need something like OpenLLMetry that treats a call to an LLM as one of several components that make up a request and/or user experience. Otherwise you're just throwing stuff at the wall, guessing at what could improve things (or guessing at what could make an eval score better).
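
Concretely, the shape is something like this with plain OpenTelemetry spans - one trace per user request, with each pipeline step as a child span (the functions here are just stand-ins for a hypothetical RAG pipeline):

    from opentelemetry import trace

    tracer = trace.get_tracer("rag_pipeline")

    # Stubs standing in for real retrieval / prompt-building / LLM code.
    def retrieve_documents(question: str) -> list:
        return ["some retrieved context"]

    def build_prompt(question: str, docs: list) -> str:
        return f"Context: {docs}\nQuestion: {question}"

    def call_llm(prompt: str) -> str:
        return "an answer"

    def answer_question(question: str) -> str:
        with tracer.start_as_current_span("answer_question"):
            with tracer.start_as_current_span("retrieve_documents"):
                docs = retrieve_documents(question)
            with tracer.start_as_current_span("build_prompt"):
                prompt = build_prompt(question, docs)
            with tracer.start_as_current_span("llm_call"):
                # With the LLM libraries instrumented, this span also carries
                # the prompt and token usage, so a bad answer can be traced
                # back to the step that caused it.
                return call_llm(prompt)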


Any thoughts of contributing this upstream directly or to CNCF?

We would be interested in hosting and supporting this type of work.

You can reach out to me via cra@linuxfoundation.org if you want to chat


Sure, would love to! I'll ping you.


What is the difference from using OpenLLMetry versus using OTel directly? Is the issue that there aren't conventions for the needed attributes?


2 differences:

1. With vanilla OTel you don't have instrumentations for libraries like OpenAI, LangChain, etc., so you need to manually open spans.

2. As you said, there are no semantic conventions for logging things like prompts and chains.

What we did is define a new set of semantic conventions and build the instrumentations. But we're using vanilla OpenTelemetry, so it's fully compatible with standard OpenTelemetry.
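
To illustrate the difference (the attribute keys below are just examples of the kind of conventions we defined; the real ones live in the repo):

    from opentelemetry import trace

    # Manual approach with vanilla OpenTelemetry: wrap every call yourself
    # and pick your own attribute keys.
    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("openai.chat") as span:
        span.set_attribute("llm.vendor", "OpenAI")
        span.set_attribute("llm.request.model", "gpt-3.5-turbo")
        span.set_attribute("llm.temperature", 0.7)
        # ... make the actual OpenAI call here and record token usage ...

    # With OpenLLMetry, the instrumentation creates those spans and
    # attributes automatically for every OpenAI call after this line
    # (package name as in the repo):
    from opentelemetry.instrumentation.openai import OpenAIInstrumentor

    OpenAIInstrumentor().instrument()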


Cool! It looks like you effectively do auto instrumentation. Have you found there to be interesting nuances between LLM providers? Tracing is great, and trace aggregates (with context!) cross-vendor would be even more awesome.


Wow, where do I start? The APIs are not that similar, but we're trying to use the same set of semantic conventions for everyone, so for example you'll always get the model version or the temperature in the same attribute. Which kinda means it's identical cross-vendor, at least on the o11y side.

Here are all the semantic conventions we've defined so far - https://github.com/traceloop/openllmetry/tree/main/packages/...


Pretty neat! I assume it's just measuring traces right now? Any plans to add some top level metrics like build times, prompt length, etc?


Yes, only traces for now. We do want to send out metrics for prompt length, token usage, etc. like you mentioned. Hopefully they'll be available soon (and we welcome contributions :) )


Hello,

Is it possible to use Traceloop's LLM instrumentations with an already existing OpenTelemetry implementation?


Yes, of course. The LLM instrumentations work just like all other OpenTelemetry instrumentations.
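
Something like this, assuming you already have a TracerProvider configured (the instrumentor class comes from our repo; the tracer_provider kwarg is the usual OpenTelemetry instrumentation pattern):

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
    from opentelemetry.instrumentation.openai import OpenAIInstrumentor

    # Whatever OpenTelemetry setup you already have.
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    # Point the LLM instrumentation at your existing provider; spans from
    # OpenAI calls then flow through the same processors and exporters as
    # everything else.
    OpenAIInstrumentor().instrument(tracer_provider=provider)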


Thank you,

Does it work on Azure OpenAI calls through LangChain? It seems it didn't work for me, or I'm missing something.


It should work, but LangChain has many quirks so it can depend on which syntax you're using. Ping us on Slack and we'll assist -

https://join.slack.com/t/traceloopcommunity/shared_invite/zt...



Great idea!

Observability (AKA a debug/proxy/statistics/logging/visualization layer) -- for LLMs (AKA chat AIs)...

Hmmm, you know, I would love something for ChatGPT (and other AI chatbots) -- where you could open a second tab or window -- and see (and potentially interact with) debug info and statistics from prompts given to that AI in its main input window, in realtime...

Sort of like what Unix's STDERR is for programs running on Unix -- but an "AI STDERR" AKA debug channel, for AIs...

I'm guessing (but not knowing) that in the future, there will be standards defined for debug interfaces to AIs, standards defined for the data formats and protocols traversing those interfaces, and standards defined for such things as error, warning, hint, and informational messages...

Oh sure, a given AI company could define its own interfaces, data protocols, and ways of interpreting that data.

But if so, that "AI debug interface" -- wouldn't be universal.

Of course, on the flip side, if a universal "AI debug interface" were ever established, perhaps such a thing would eventually suffer from the complexities, over-engineering and bloatedness that plague many "designed-by-committee" standards in today's world.

So, it will be interesting to see what the future holds...

To take an Elon Musk quote and twist it around (basically abuse it! <g>):

"Proper engineering of future designed-by-committee standards with respect to AI interfaces and protocols is NOT guaranteed -- but excitement is!"

:-) <g> :-)

Anyway, with respect to the main subject/article/authors, it's a very interesting and future-thinking idea what you're doing, you're breaking new ground, and I wish you all of the future success with your company, business, product and product ideas!


Thanks! Related to what you're saying, I was actually expecting some reactions from devs who'd ask "why is it a separate repo and not part of OpenTelemetry from day 1?".

And for that my answer would be that I think having a separate repo allows this to evolve in a more natural way, and faster (whereas OpenTelemetry, given its massive adoption already, evolves much slower, with committees etc.).

Then, at some point when this is stabilized and useful - we can merge.

Kind of like Tesla's NACS vs. CCS


Any smooth way to get this to work with JavaScript? Would love to use this in a project but my inference calls are all in JS.


Definitely! (Tomer from Traceloop here)

We've already started developing the typescript SDK. Would love to see exactly what your use case is, so we can prioritize specific instrumentation and collaborate on it. We'll ping you.


Nice!

Does Traceloop support the OpenTelemetry Protocol File Exporter?

I'm the maintainer of Insomnium (https://github.com/ArchGPT/insomnium) and I'm building a LanceDB-based prompt orchestration framework for automated software development that I'm integrating into Insomnium over the next few weeks. (The orchestration framework will also be open-source soon.) Traceloop cloud looks good, but I think for simple cases my users will prefer a 100% local solution.

Would be nice to have a simple API to export locally; thanks!


Yes, since we're using vanilla OpenTelemetry, you can set your exporter to whatever you want, including the OpenTelemetry Protocol File Exporter. But I'd still use some sort of dashboard, like Jaeger, or one of the open-source observability platforms you can run locally like SigNoz or HyperDX.
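
For example, a fully local setup could point a standard OTLP exporter at a locally running Jaeger (or SigNoz/HyperDX); the endpoint below is just the default OTLP gRPC port:

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

    # Everything stays on your machine: spans go to a local collector
    # (e.g. Jaeger with OTLP enabled on port 4317). Swap in a console or
    # file exporter if you don't want to run a server at all.
    provider = TracerProvider()
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
    )
    trace.set_tracer_provider(provider)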


Any plans for pgvector? Grafana Tempo?


Hey Gal from Traceloop here,

We definitely have pgvector on our roadmap (which tbh I think we'd better publish in the repo). For Grafana Tempo, it's just a matter of making sure that it works as a destination - we'll do it today/tomorrow.


Will vLLM be supported as well?


Hey it's Gal from Traceloop,

That's a good question tbh. I wonder whether we should implement instrumentations for LLM "hosting solutions" or for specific LLMs (e.g. LLaMA/Falcon) and ignore the hosting solution (not sure that's even possible though, as it sort of dictates the inference API).

wdyt?


Love it!


Would've preferred LLMetry, My Dear Watson.


but it's open! :)


Worst pun ever, starred.


> observability

I really don't like that word for some reason. It's abstracting away something simple. Logs? Graphs? Debug data? Telemetry data? There are way better words for "this".



