$ ai-evals

Promptfoo

Open-source CLI for evaluating LLM prompts and red-teaming applications, with YAML/JSON configs that live next to your code.

Score: 7.4

Tags: LLM evals, red-teaming, open source · www.promptfoo.dev

Verdict

The strongest open-source eval CLI we've used, and the best red-teaming option in the category by a wide margin. If your team prefers config-as-code over a web UI and you care about prompt-injection / PII / jailbreak testing, this is the answer.

What it is

Promptfoo is an open-source command-line tool for evaluating and red-teaming LLM prompts and apps. Test cases live in YAML/JSON configs in your repo. Run batch evaluations across providers and prompt variants, with built-in security scanning for prompt injection, PII exposure, and jailbreak resistance.
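To make the config-as-code workflow concrete, here is a minimal sketch of a Promptfoo config. The prompt text, provider IDs, and test values are illustrative assumptions, not taken from any real project; see Promptfoo's docs for the full schema.

```yaml
# promptfooconfig.yaml — illustrative sketch; prompts, providers, and
# test values below are assumptions for demonstration only
description: "Summarization prompt eval"

prompts:
  - "Summarize the following in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-haiku-20241022

tests:
  - vars:
      text: "Promptfoo is an open-source CLI for evaluating LLM prompts."
    assert:
      - type: icontains
        value: "open-source"
```

Running `promptfoo eval` against a config like this executes every prompt variant against every provider and reports pass/fail per assertion.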

Free and open source under the MIT license. Enterprise pricing for team features and managed deployments.

Where it shines

  • OSS-first. Real open source — no feature gates, no "open core" trickery.
  • Red teaming. The built-in PII, jailbreak, and prompt-injection probes are genuinely useful, and most eval platforms offer nothing comparable.
  • Config-as-code. Test cases live in your repo, get reviewed in PRs, and are versioned alongside the code they test. The right model for engineering-led teams.
  • CI integration. Drop into GitHub Actions in minutes.
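As a sketch of the CI setup, a minimal GitHub Actions workflow could look like the following. The workflow name, trigger, and secret name are assumptions; the `promptfoo eval` command and `-c` config flag are the tool's standard CLI usage.

```yaml
# .github/workflows/promptfoo.yml — illustrative sketch, not an official template
name: LLM evals
on: [pull_request]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Run the eval suite defined in the repo's config file
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          # Provider API key supplied via repository secrets (name is an assumption)
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

A nonzero exit code on failed assertions is what makes this work as a PR gate: the job fails, and the regression is visible before merge.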

Where it falls short

  • No managed UI. You'll either live in the terminal or build your own dashboards.
  • No pre-built test suites. You write the tests. That's by design, but it's a real cost.
  • Non-engineer accessibility. PMs won't open a YAML file. If cross-functional iteration matters, look elsewhere.

Bottom line

For engineering teams that already think in config-as-code terms and need real red-teaming, Promptfoo is the obvious pick. Pair it with Braintrust or Langfuse for the human-facing parts (dashboards, playground, PM collaboration) and you've got a complete stack at minimal cost.
