Hey HN, we are Marc, Clemens, and Max – the founders of Langfuse. Langfuse leverages traces, evaluations, prompt management, and metrics to help developers debug and improve LLM applications. Here is a full walkthrough:
https://www.youtube.com/watch?v=2E8iTvGo9Hs
With Langfuse, you can instrument your app and start ingesting traces, which track LLM calls and other relevant logic in your app such as retrieval, embedding, or agent actions. Langfuse then helps you analyze those traces and use features such as evaluations or prompt management to improve your app.
You can sign up to try Langfuse Cloud (https://cloud.langfuse.com/ – we have a generous free tier) or self-host Langfuse (https://langfuse.com/self-hosting) within a couple of minutes.
In the 15 months since our “Show HN” (https://news.ycombinator.com/item?id=37310070), thousands of teams have adopted the project (including teams like Khan Academy, Twilio, and Samsara) and we hit all of the scaling limits that we anticipated in the original Show HN thread.
On our v1/v2 setup, we frequently exhausted IOPS on Postgres and had our Node.js container grind to a halt during tokenization. Since then, we migrated our Cloud infrastructure from Vercel/Supabase to Porter and then to AWS & ClickHouse.
Last week, we put the finishing touches on the Langfuse v3.0.0 release (https://github.com/langfuse/langfuse/releases/tag/v3.0.0). It unlocks the major scalability improvements we have made over the past half year, and we are happy to share it with the OSS ecosystem today.
Langfuse v3 addresses three challenges we encountered as an LLM observability platform: a) handling high ingestion throughput with large events (long strings, multimodal images/audio/video), b) providing fast analytical, table, and single-item reads across the product, and c) serving prompts quickly and reliably in the critical path of users’ applications. Langfuse is used by thousands of active self-hosting deployments, so at every point we needed to prioritize stability, fully automated migrations/upgrades, and use of infrastructure components that self-hosters can deploy freely on any cloud vendor.
The v3 release adds a ClickHouse database next to Postgres, blob storage for events, and a worker with queues and caches (Redis) for data ingestion.
The Langfuse SDKs were originally written to send updates to a single trace to our backend, which then upserted the tracing data in Postgres. Handling these updates while staying backwards compatible with older SDK versions was a challenge.
Our ingestion pipeline writes all events into S3 and sends a reference to the file via Redis to our worker container. From there, we read all events with the same id (including all previously ingested ones) and merge them into a final event. We insert the new row into ClickHouse which automatically replaces the existing data for the same ID. Re-merging all event updates enables us to keep a high-throughput pipeline by converting updates into new insert-only records.
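To make the merge step concrete, here is a minimal Python sketch of the idea. It is not our actual worker code: the field names (`id`, `timestamp`), the "later non-null value wins" rule, and the helper names `merge_events` and `build_rows` are assumptions for illustration, and an in-memory list stands in for the events read back from S3.

```python
from collections import defaultdict
from typing import Any


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold every update for one id into a single final record.

    Assumed rule: events are applied in timestamp order and a later
    non-None value overwrites an earlier one.
    """
    merged: dict[str, Any] = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        for key, value in event.items():
            if value is not None:
                merged[key] = value
    return merged


def build_rows(event_log: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group the full event history by id and emit one row per id.

    Each row is a complete record, so the database only ever sees
    inserts; the newest row for an id supersedes the older ones.
    """
    by_id: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for event in event_log:
        by_id[event["id"]].append(event)
    return [merge_events(group) for group in by_id.values()]


# Hypothetical history: a create followed by an update for the same id.
log = [
    {"id": "a", "timestamp": 1, "name": "llm-call", "output": None},
    {"id": "a", "timestamp": 2, "name": None, "output": "hello"},
    {"id": "b", "timestamp": 1, "name": "retrieval", "output": None},
]
rows = build_rows(log)
```

Because every emitted row is rebuilt from the full history, a late or repeated update just produces another complete row rather than an in-place mutation.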
We went through many iterations to optimize our sorting keys in ClickHouse and use skip indexes efficiently, and we rewrote almost all of our queries and API endpoints to make optimal use of the schema. Using a specialized, analytical database required a more database-centric application design than a Swiss-army-knife database like Postgres does.
The new infrastructure delivers dramatic performance gains: dashboards now respond within 400ms (95th percentile) instead of timing out on large projects and lookback windows, and tables load up to 90% faster - displaying data within 800ms even for the largest projects.
Finally, to serve prompts from prompt management with low latency and high availability, we use caches heavily and also decoupled our infrastructure. For sensitive paths, we use dedicated deployments to avoid “noisy neighbors” within the same server. We also improved client-side caching in our SDKs: they can now prefetch prompts and revalidate them in the background, so retrieving a cached prompt at runtime adds effectively zero latency.
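The client-side pattern is essentially stale-while-revalidate. Below is a minimal Python sketch of that pattern, not the actual SDK code: the class name `PromptCache`, the `fetch` callback, the TTL default, and the injectable `clock` are all hypothetical and only meant to show the control flow.

```python
import threading
import time
from typing import Callable, Dict, Optional, Set, Tuple


class PromptCache:
    """Serve cached prompts immediately; refresh stale ones in the background."""

    def __init__(
        self,
        fetch: Callable[[str], str],          # hypothetical: loads a prompt by name
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, expires_at)
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()
        self._refresh_thread: Optional[threading.Thread] = None

    def _refresh(self, name: str) -> None:
        try:
            value = self._fetch(name)
            with self._lock:
                self._entries[name] = (value, self._clock() + self._ttl)
        finally:
            # Always clear the flag so a failed fetch can be retried later.
            with self._lock:
                self._refreshing.discard(name)

    def get(self, name: str) -> str:
        with self._lock:
            entry = self._entries.get(name)
            start_refresh = False
            if entry is not None:
                stale = self._clock() >= entry[1]
                start_refresh = stale and name not in self._refreshing
                if start_refresh:
                    self._refreshing.add(name)
        if entry is None:
            # Cold miss: the only path that blocks on the network.
            value = self._fetch(name)
            with self._lock:
                self._entries[name] = (value, self._clock() + self._ttl)
            return value
        if start_refresh:
            # Stale hit: return the old value now, revalidate off the hot path.
            self._refresh_thread = threading.Thread(
                target=self._refresh, args=(name,), daemon=True
            )
            self._refresh_thread.start()
        return entry[0]
```

Prefetching at startup is then just calling `get` once per prompt name before serving traffic, so that later calls only ever hit the cached path. If a background refresh fails, the stale value keeps being served, which is the availability half of the trade-off.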
If you have any questions or feedback, please join us in this HN thread, or in future on our Discord and GitHub Discussions. While Langfuse v3 is scalable, we tried hard to make it easy to get started with Langfuse and self-host it in your own infrastructure (https://langfuse.com/self-hosting).
PS: Here (https://langfuse.com/blog/2024-12-langfuse-v3-infrastructure...) is a more in-depth blog post on how we built Langfuse V3.
PPS: if you find these problems exciting, we are hiring (https://langfuse.com/join-us) in Berlin!
reflections/thoughts on where this field goes next:
1. i wonder if there are new ops solutions for the realtime apis popping up
2. retries for instructor like structured outputs mess up the traces, i wonder if they can be tracked and collapsible
3. chatgpt canvas like "drafting" workflows are on the rise (https://www.latent.space/p/inference-fast-and-slow) and again its noisy to see in a chat flow
4. how often do people actually use the feedback tagging and then subsequently finetuning? i always feel guilty that i dont do it yet and wonder when and where i should.