ai-evals.tools

Reviews of AI eval tools — written for developers.

We test, compare, and review the tools shaping how engineering teams measure LLMs and agents in production.

companies reviewed: 16
last updated: Apr 19, 2026

Browse all companies →

How we evaluate

Featured companies

all companies →

Braintrust

Eval-driven dev platform combining traces, datasets, scorers, and a playground in one product.

9.1

LLM evals · observability · prompt management · freemium

Fiddler

Enterprise ML governance platform extended to LLMs and generative AI, with audit-ready traces and in-environment evaluations.

7.2

AI governance · agent observability · enterprise

Galileo

Agent reliability platform with cheap, fast evaluators that can run on every request in production.

7.5

agent observability · LLM evals · freemium

Helicone

Proxy-based LLM observability — drop in by changing the base URL, no SDK changes needed.

7.5

observability · proxy / gateway · freemium

Langfuse

Open-source LLM observability with evals, prompt management, and best-in-class tracing.

8.4

observability · LLM evals · prompt management · open-source

Vellum

Visual workflow builder with built-in observability for low-code agent development.

7.0

prompt management · agent observability · freemium

Recent editorial

all editorial →

Listicle · Apr 26, 2026

The best prompt management tools (2026)

Seven prompt management tools, ranked by what they actually solve — from no-code editors to Git-style versioning to eval-first platforms.

Listicle · Apr 25, 2026

The best AI agent observability tools (2026)

Five tools we'd actually pick for monitoring multi-step agents in production — what they cover, where they break, and who each one is for.

Listicle · Apr 24, 2026

Arize AI alternatives (2026)

Five platforms to consider if Arize's ML-first architecture isn't the right fit for an LLM-only workflow — and one honest case for sticking with Arize.