Show HN: You don't need to adopt new tools for LLM observability (github.com/traceloop)
102 points by tomerf2 11 months ago | 23 comments
If you've built any web-based app in the last 15 years, you probably used something like Datadog, New Relic, Sentry, etc. to monitor and trace your app, right?

Why should it be different when the app you're building happens to be using LLMs?

So today we're open-sourcing OpenLLMetry-JS. It's an open protocol and SDK, based on OpenTelemetry, that provides traces and metrics for LLM JS/TS applications and can be connected to any of the 15+ tools that already support OpenTelemetry. Here's the repo: https://github.com/traceloop/openllmetry-js
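Here's roughly what wiring it up looks like in a Node app - a sketch based on the getting-started docs; the package name and option names are from memory, so treat them as assumptions and double-check the docs:

    // Rough sketch - package/option names may differ from the current release.
    import * as traceloop from "@traceloop/node-server-sdk";
    import OpenAI from "openai";

    traceloop.initialize({
      appName: "my-llm-app",   // shows up as the service name in your tracing backend
      disableBatch: true,      // flush spans immediately; handy while testing locally
    });

    const openai = new OpenAI();

    // Once initialized, calls made through supported clients (like the openai
    // package) are traced automatically and exported over standard OpenTelemetry.
    const completion = await openai.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: "Tell me a joke about OpenTelemetry" }],
    });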

A few months ago we launched the Python flavor here (https://news.ycombinator.com/item?id=37843907) and we've now built a compatible one for Node.js.

Would love to hear your thoughts and opinions!

Check it out -

Docs: https://www.traceloop.com/docs/openllmetry/getting-started-t...

Github: https://github.com/traceloop/openllmetry-js https://github.com/traceloop/openllmetry




Fully agree - even as a founder of an ‘LLM observability company’. Observability does not need to be reinvented to get detailed traces/metrics/logs of the LLM part of an application.

LLM Observability usually means: prompts and completions, which model was used, errors and exceptions (rate limits, network errors), as well as metrics (latency, output speed, time to first token when streaming, USD/token and cost breakdowns). All of this is well suited to be captured in the existing observability stack. OpenLLMetry makes this really easy and interoperable - chapeau.
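Concretely, all of that maps onto plain OpenTelemetry spans. A rough sketch using the standard @opentelemetry/api - the attribute names are illustrative (not OpenLLMetry's exact conventions), and callModel is a stand-in for whatever LLM client you already use:

    import { trace, SpanStatusCode } from "@opentelemetry/api";

    // Stand-in for whatever LLM client call you already make.
    async function callModel(prompt: string): Promise<{ text: string; totalTokens: number }> {
      return { text: "...", totalTokens: 0 };
    }

    const tracer = trace.getTracer("llm-app");

    // Attribute names below are examples, not OpenLLMetry's exact conventions.
    async function tracedCompletion(prompt: string) {
      return tracer.startActiveSpan("llm.completion", async (span) => {
        const start = Date.now();
        try {
          const result = await callModel(prompt);
          span.setAttributes({
            "llm.request.model": "gpt-4",
            "llm.prompt": prompt,
            "llm.completion": result.text,
            "llm.usage.total_tokens": result.totalTokens,
            "llm.latency_ms": Date.now() - start,
          });
          return result;
        } catch (err) {
          // Rate limits, network errors, etc. show up as span errors.
          span.setStatus({ code: SpanStatusCode.ERROR });
          throw err;
        } finally {
          span.end();
        }
      });
    }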

In my view, observability is not the core value that solutions like Baserun, Athina, LangSmith, Parea, Arize, Langfuse (my project) and many others solve for. Developing a useful LLM application requires iterative workflows and tinkering. That's what these solutions help with and augment.

There are specific problems to building an LLM application such as managing/versioning of prompts, running evaluations, blending multiple different evaluation sources, collecting datasets to test/benchmark an application, helping with fine-tuning models on high-quality production completions, debugging root causes of quality/latency/cost issues, ...

Most solutions either replicate logs (LLM I/O) or traces at first, as they are a necessary starting point to then build solutions for the other workflow problems. As the observability piece gets more standardized over time, I can see how integrating with the standard makes a ton of sense. Always happy to chat about this.


Hey Marc :wave:

Would love to see you integrate and adopt this as soon as it makes sense to you. OpenTelemetry is a great and mature piece of technology and we should all be aligning around it now, while it’s still easy to do so.


I was looking to see what the actual metrics would be for a completion, to see if this is something of interest to me. So I tried to run the example here:

https://www.traceloop.com/openllmetry

Problem 1 (very minor): it's missing an `import os`

Problem 2: I need an API key.

Problem 3: The link that it tells me to go to for an API key is malformed: https://https//app.traceloop.com/settings/api-keys

Is there a way to see what the output is like without getting an account, and presumably also connecting to an observability platform like Grafana? I already made a venv and installed the package, so I'm not sure if I'm ready for even more steps just to see if this is something that might be useful to me.


Thanks for the issues - I'll fix them! :sweat_smile:

Re: Grafana and others - it's simple, just set the env vars - https://www.traceloop.com/docs/openllmetry/integrations/intr...
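Roughly like this - going from memory, so the variable names here are assumptions; double-check them against the integrations doc above:

    // Assumed env var names - verify against the integrations doc.
    // Set these before starting the app to ship spans to your own OTLP
    // endpoint (e.g. Grafana Cloud's OTLP gateway) instead of Traceloop:
    //
    //   TRACELOOP_BASE_URL="https://<your-otlp-endpoint>"
    //   TRACELOOP_HEADERS="Authorization=Basic <base64 credentials>"
    //
    import * as traceloop from "@traceloop/node-server-sdk";

    // No Traceloop API key needed when exporting to your own backend.
    traceloop.initialize({ appName: "my-llm-app" });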


Thanks for the effort of a reply; however, I'm not quite willing to jump through the hoops of creating accounts on two different services, plus the effort of integration, merely to see what the outputs might be. Shouldn't the benefit, the potential win to customers, be the selling point? Maybe others are willing to put in lots more effort than I am to see what the benefit is, but I've already spent a ton of time and am no closer to even understanding the concrete benefits of your product.

I might revisit if a trusted friend tells me it's useful, but it will take that sort of recommendation for me to spend more time on this. Seems like some example outputs would be the thing you should show, at least somewhere prominently.


You don't need to create an account with two services - you can just connect the SDK directly to Grafana - https://traceloop.com/docs/integration/grafana

Happy to assist if needed over slack - https://traceloop.com/slack


I thought Observability in this context means the ability to introspectively make sense of why the LLM output what it did, which is a difficult problem because the model parameters are effectively an unintelligible morass of numbers. Does this help with that and if so how?


Pretty sure this just structures logs for requests to common 3rd party LLM providers. Which I guess is useful, but it's not some kind of problem unique to LLMs.


Correct - the summary is misleading marketing. This is just normal system / service observability. What people mean by observability in the LLM context is specific.


I wouldn't call it misleading marketing - it is what it is, similar to what you can get today from tools like LangSmith, etc.: observability for the LLM part of your system, but using your existing tools. You can further extend that to monitor specific LLM outputs - but that's just another layer on top.


Not talking about just monitoring outputs though. I'm talking about monitoring the internals of the model as it reaches its output. The whole issue of interpretability / observability inside the LLM itself is the hard problem, one that considerable resources are being dedicated to solving - not simply hooking the public-facing APIs up to observability tools like any other service API. This is just conventional telemetry. Calling this LLM observability implies there is something special about it, unique to LLMs, that enhances introspection into the model itself, which is not true. The title is highly misleading, classic startup-bro fake-it-til-you-make-it hustling crap, and deserves to be called out.


What problem(s) does this solve? I have a ticket in my backlog. Your SDK unlocks the solution. What is that ticket's title? (I'm a bit thick, and need concrete examples for things to click.)


It's LLM specific OpenTelemetry tracing. What's going on inside your model isn't the focus. It's everything surrounding your model. How many prompts are people submitting? How long does each prompt take? Did certain prompts time out or return an error? What's the P95/P99 latency for your LLM? And so on.


Same ticket that gets you to install something like Sentry - you wanna see what's happening in production and get alerted when things go wrong


Re: Python - if we are already doing OTel, how would this interop? E.g., if we don't want to break our current imports, and want to control where the new instrumentation goes

(Fwiw, This is a great direction!)


Super easy - you can just use the standalone instrumentations directly - https://www.traceloop.com/docs/openllmetry/tracing/without-s...
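For reference, the same pattern in the JS flavor looks roughly like this - the instrumentation package and class names are my assumption from the repo layout, so verify against the docs:

    // Sketch for the JS flavor; the Python doc linked above follows the same idea.
    // Package and class names are assumptions - check the docs.
    import { NodeSDK } from "@opentelemetry/sdk-node";
    import { OpenAIInstrumentation } from "@traceloop/instrumentation-openai";

    // Keep your existing exporters/processors/imports; just add the LLM instrumentation.
    const sdk = new NodeSDK({
      instrumentations: [new OpenAIInstrumentation()],
    });
    sdk.start();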


... and we did, added to our Jaeger/Prometheus. Works great!


Cool! Two questions:

1. Where do you see this observability-for-LLMs thing going? What's the end game? Is it like traditional observability, where all formats will eventually converge to one (which OpenTelemetry is trying to be)? I feel it might be a bit early to tell, though

2. I noticed you do auto-detection of the framework used, like LlamaIndex et al. Except for annotations, is there a deeper connection to the LLM framework used? This is auto-instrumentation, so I assume you do most of the heavy lifting, but should users of such a framework expect some cool easter eggs when they look at their telemetry?


Thanks!

1. Huh, good question. Hopefully there will be convergence. We've started talking with other companies in this domain about supporting or even switching to OpenTelemetry.

2. Nothing specific, except - as you mentioned - being able to see a trace of a RAG pipeline automatically.


A quick one :)

While we're on the topic - how does Traceloop factor into all of this? What's the connection between the two? I assume Traceloop is the LLM observability platform (Datadog for LLMs?) and OpenLLMetry is your own auto-instrumentation thingie to supplement it?


Yes, Traceloop is kind of a Sentry for LLMs


It's priced way high - $500 for 50k LLM calls? 50k is not much at all.


The open source is free for all, ofc. Our platform provides capabilities for monitoring and detecting hallucinations, hence it costs more.



