$ ai-evals

Promptfoo

Open-source CLI for evaluating LLM prompts and red-teaming applications, with YAML/JSON configs that live next to your code.

Score: 7.4

Tags: LLM evals, red-teaming, open source · www.promptfoo.dev

Verdict

The strongest open-source eval CLI we've used, and the best red-teaming option in the category by a wide margin. If your team prefers config-as-code over a web UI and you care about prompt-injection / PII / jailbreak testing, this is the answer.

What it is

Promptfoo is an open-source command-line tool for evaluating and red-teaming LLM prompts and apps. Test cases live in YAML/JSON configs in your repo. Run batch evaluations across providers and prompt variants, with built-in security scanning for prompt injection, PII exposure, and jailbreak resistance.
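To make the config-as-code workflow concrete, here is a minimal sketch of a Promptfoo config. The prompt text, provider IDs, and test values are illustrative assumptions, not taken from any real project; see Promptfoo's docs for the full schema.

```yaml
# promptfooconfig.yaml — illustrative sketch; prompts, providers, and
# test values below are assumptions for demonstration only
description: "Summarization prompt eval"

prompts:
  - "Summarize the following in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-haiku-20241022

tests:
  - vars:
      text: "Promptfoo is an open-source CLI for evaluating LLM prompts."
    assert:
      - type: icontains
        value: "open-source"
```

Running `promptfoo eval` against a config like this executes every prompt variant against every provider and reports pass/fail per assertion.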

Free and open source under the MIT license. Enterprise pricing for team features and managed deployments.

Where it shines

  • OSS-first. Real open source — no feature gates, no "open core" trickery.
  • Red teaming. The built-in PII, jailbreak, and prompt-injection probes are genuinely useful, and most eval platforms offer nothing comparable.
  • Config-as-code. Test cases live in your repo, get reviewed in PRs, and are versioned alongside the code they test. The right model for engineering-led teams.
  • CI integration. Drop into GitHub Actions in minutes.
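As a sketch of the CI setup, a minimal GitHub Actions workflow could look like the following. The workflow name, trigger, and secret name are assumptions; the `promptfoo eval` command and `-c` config flag are the tool's standard CLI usage.

```yaml
# .github/workflows/promptfoo.yml — illustrative sketch, not an official template
name: LLM evals
on: [pull_request]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Run the eval suite defined in the repo's config file
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          # Provider API key supplied via repository secrets (name is an assumption)
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

A nonzero exit code on failed assertions is what makes this work as a PR gate: the job fails, and the regression is visible before merge.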

Where it falls short

  • No managed UI. You'll either live in the terminal or build your own dashboards.
  • No pre-built test suites. You write the tests. That's by design, but it's a real cost.
  • Non-engineer accessibility. PMs won't open a YAML file. If cross-functional iteration matters, look elsewhere.

Bottom line

For engineering teams that already think in config-as-code terms and need real red-teaming, Promptfoo is the obvious pick. Pair it with Braintrust or Langfuse for the human-facing parts (dashboards, playground, PM collaboration) and you've got a complete stack at minimal cost.
