$ ai-evals

Langfuse

Open-source LLM observability with evals, prompt management, and best-in-class tracing.

Score: 8.4
Tags: observability, LLM evals, prompt management, open-source · langfuse.com

Verdict

The default open-source pick. Self-hostable and batteries-included, with a reasonably priced cloud version if you don't want to run it yourself. Tracing is the strongest part of the product: closer to a real APM (application performance monitoring) tool for LLM apps than anything else we've used.

What it is

Langfuse is an open-source observability and eval platform for LLM apps. Trace inference calls, attach scores, define datasets, run experiments, and manage prompts — all in one self-hostable service. Free if you self-host; cloud starts at $29/month with usage-based pricing.
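Prompt management boils down to versioned templates that you resolve by label (for example "production") and fill with variables at call time. Langfuse's own prompts use `{{variable}}` placeholders and labels, but the sketch below is a hypothetical in-memory model of the idea, not the SDK's API or storage; `PromptRegistry`, `create`, `get`, and `compile` are illustrative names.

```typescript
// Hypothetical in-memory model of versioned prompts with labels.
// Not Langfuse's actual API; it only illustrates the concept.
interface PromptVersion {
  version: number;
  template: string; // uses {{variable}} placeholders
  labels: string[]; // e.g. "production", "staging"
}

class PromptRegistry {
  private prompts = new Map<string, PromptVersion[]>();

  // Append a new version; versions are numbered 1, 2, 3, ...
  create(name: string, template: string, labels: string[] = []): PromptVersion {
    const versions = this.prompts.get(name) ?? [];
    // A label points at one version at a time, so strip it from older versions.
    for (const v of versions) {
      v.labels = v.labels.filter((l) => !labels.includes(l));
    }
    const pv: PromptVersion = { version: versions.length + 1, template, labels };
    versions.push(pv);
    this.prompts.set(name, versions);
    return pv;
  }

  // Resolve a prompt by label, defaulting to "production".
  get(name: string, label = "production"): PromptVersion | undefined {
    return this.prompts.get(name)?.find((v) => v.labels.includes(label));
  }
}

// Replace {{name}} placeholders; placeholders without a value are left untouched.
function compile(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match,
  );
}
```

Resolving by label rather than by version number is what lets you promote a new prompt to production without redeploying the app that reads it.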

Developer experience

SDKs for Python and TypeScript, plus OpenTelemetry support and OpenAI/LangChain integrations that "just work." Drilling into a multi-step agent run works the way it would in an APM, which is more than most eval-first competitors offer.

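import { Langfuse } from "langfuse";

// Reads LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY (plus LANGFUSE_BASEURL when self-hosting) from the environment.
const lf = new Langfuse();
const trace = lf.trace({ name: "triage" });
const input = "Checkout fails after login"; // example ticket text
const gen = trace.generation({ name: "classify", model: "gpt-4o", input });
const output = "bug"; // stand-in for the actual model response
gen.end({ output });
await lf.flushAsync(); // events are sent in batches; flush before a short-lived script exits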
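Drilling into an agent run means viewing a trace as a tree of nested observations (spans, generations, events) with timings, much like an APM waterfall. The sketch below assumes a simplified, hypothetical `Observation` shape rather than the SDK's real types, and renders that tree with per-step durations.

```typescript
// Hypothetical, simplified observation shape; Langfuse's real schema has more fields.
type ObservationType = "SPAN" | "GENERATION" | "EVENT";

interface Observation {
  id: string;
  parentId: string | null; // null means a direct child of the trace
  type: ObservationType;
  name: string;
  startMs: number;
  endMs: number;
}

// Render observations as an indented tree, siblings ordered by start time.
function renderTree(observations: Observation[]): string[] {
  // Index children by parent ID so each level can be walked directly.
  const children = new Map<string | null, Observation[]>();
  for (const o of observations) {
    const list = children.get(o.parentId) ?? [];
    list.push(o);
    children.set(o.parentId, list);
  }

  const lines: string[] = [];
  const walk = (parentId: string | null, depth: number): void => {
    const kids = (children.get(parentId) ?? []).slice().sort((a, b) => a.startMs - b.startMs);
    for (const o of kids) {
      lines.push(`${"  ".repeat(depth)}${o.type} ${o.name} (${o.endMs - o.startMs} ms)`);
      walk(o.id, depth + 1); // recurse into this observation's children
    }
  };
  walk(null, 0);
  return lines;
}
```

The parent-ID index keeps rendering linear in the number of observations, which matters once a single agent run fans out into hundreds of tool calls.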

Where it shines

  • Self-hosting. Helm chart, docker-compose, and a SOC 2-compliant cloud offering — pick your flavor. This is the differentiator for teams that can't ship customer data to a third-party SaaS.
  • Tracing. Best-in-class for debugging real agent traffic. Session grouping connects related requests cleanly.
  • Pricing. The open-source version is free to self-host, and the cloud tier is reasonably priced.
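Session grouping works by tagging each trace with a shared session ID; in the TypeScript SDK you pass it when creating the trace, e.g. `lf.trace({ name: "triage", sessionId: "chat-123" })` (the ID value is made up). The sketch below shows the grouping idea over a hypothetical `TraceRecord` shape, not Langfuse's server-side implementation.

```typescript
// Hypothetical trace record; only the fields needed for grouping.
interface TraceRecord {
  id: string;
  sessionId?: string;
  timestamp: number; // epoch milliseconds
}

// Group traces by sessionId and order each session chronologically.
// Traces without a sessionId are collected under `noSessionKey`.
function groupBySession(
  traces: TraceRecord[],
  noSessionKey = "(none)",
): Map<string, TraceRecord[]> {
  const sessions = new Map<string, TraceRecord[]>();
  for (const t of traces) {
    const key = t.sessionId ?? noSessionKey;
    const list = sessions.get(key) ?? [];
    list.push(t);
    sessions.set(key, list);
  }
  // Sort in place so each session reads as a conversation timeline.
  sessions.forEach((list) => list.sort((a, b) => a.timestamp - b.timestamp));
  return sessions;
}
```

Keeping ungrouped traces under an explicit key, rather than dropping them, makes it easy to spot requests where the session ID was never set.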

Where it falls short

  • Evals UX. Functional but less opinionated than Braintrust. You'll spend more time wiring things together to get a CI-gated eval flow.
  • Scale ops. Self-hosting at high trace volume means operating ClickHouse, Langfuse's analytics store, at scale, so plan for that before you commit.
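The "wiring" a CI-gated eval flow needs is roughly: run each test case, score the output, record the score, and fail the build if the aggregate drops below a threshold. This is a minimal sketch under stated assumptions: `runEvalGate`, `EvalCase`, and `ScoreSink` are hypothetical names, exact match stands in for whatever metric you actually use, and `ScoreSink` is only meant to resemble something like a Langfuse trace, whose `score` method accepts `{ name, value }`.

```typescript
// Hypothetical sink with a score(...) method in the shape of trace.score({ name, value }).
interface ScoreSink {
  score(body: { name: string; value: number }): void;
}

interface EvalCase {
  input: string;
  expected: string;
}

// Run every case, record an exact-match score (1 or 0) per case, and report
// whether the mean clears the threshold. The caller decides how to fail CI.
async function runEvalGate(
  cases: EvalCase[],
  predict: (input: string) => Promise<string>,
  sinkFor: (c: EvalCase) => ScoreSink,
  threshold: number,
): Promise<{ mean: number; passed: boolean }> {
  let total = 0;
  for (const c of cases) {
    const output = await predict(c.input);
    const value = output.trim() === c.expected ? 1 : 0;
    sinkFor(c).score({ name: "exact_match", value });
    total += value;
  }
  const mean = cases.length === 0 ? 0 : total / cases.length;
  return { mean, passed: mean >= threshold };
}
```

A CI script could then set `process.exitCode = 1` when `passed` is false so the job fails; keeping the gate decision separate from score recording means the scores still land in the dashboard even on a failing run.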

Bottom line

If self-hosting matters or you want OSS, Langfuse is the obvious choice and a credible alternative to closed-source incumbents. If you'd pay anything to skip the ops work and want the most polished eval flow out of the box, look at Braintrust first — but Langfuse is the one we'd start with for almost any team that takes data control seriously.
