What it is
Vellum is a visual builder for LLM workflows and agents, with observability and evaluation built into the canvas. You compose an agent as a graph of nodes — prompts, tools, conditionals — and the same view shows traces, scores, and A/B test results once the workflow is live. Free tier with 30 credits/month; paid plans start at $25/month.
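To make the graph-of-nodes model concrete, here is a minimal sketch of how a workflow built from prompt, tool, and conditional nodes might be structured. This is a generic illustration in plain Python under assumed names (`Node`, `Workflow`, `execute` are all hypothetical), not Vellum's actual SDK or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical model of a workflow-as-graph; names are illustrative,
# not Vellum's real API. Each node transforms a shared state dict and
# chooses the next node (a conditional edge) based on that state.
@dataclass
class Node:
    name: str
    run: Callable[[dict], dict]                      # transform shared state
    next: Callable[[dict], Optional[str]] = lambda state: None  # pick next node

@dataclass
class Workflow:
    nodes: Dict[str, Node] = field(default_factory=dict)
    entry: str = ""

    def add(self, node: Node, entry: bool = False) -> None:
        self.nodes[node.name] = node
        if entry:
            self.entry = node.name

    def execute(self, state: dict) -> dict:
        current: Optional[str] = self.entry
        while current is not None:
            node = self.nodes[current]
            state = node.run(state)
            current = node.next(state)
        return state

# A "prompt" node that classifies intent, wired to a "tool" node via a
# conditional edge. Stand-ins for real LLM and tool calls.
wf = Workflow()
wf.add(Node("classify",
            run=lambda s: {**s, "intent": "billing" if "invoice" in s["query"] else "other"},
            next=lambda s: "lookup" if s["intent"] == "billing" else None),
       entry=True)
wf.add(Node("lookup",
            run=lambda s: {**s, "answer": "invoice found: paid"}))

result = wf.execute({"query": "where is my invoice?"})
```

The same traversal that runs the graph is what a canvas-based builder renders visually, which is why design and debugging can share one view.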
Developer experience
The "developer experience" framing fits Vellum a bit awkwardly: a meaningful chunk of its appeal is making agent development less code-centric. Engineers can drop into custom code nodes when they need to, but the product is happiest when most of your workflow lives in the visual graph.
Where it shines
- Cross-functional collaboration. PMs can read and modify the same workflow engineers built; that productivity unlock is hard to overstate.
- Coherent debug-and-iterate loop. The graph used to design the agent is the same one you debug it in.
- Built-in evaluation. Online evals run against the same workflow; you don't need a separate eval product.
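Conceptually, an online eval of the kind described above is just a set of scoring functions attached to the workflow's live outputs, with scores logged next to the trace. A hedged sketch in generic Python (the scorer names and `score_output` helper are hypothetical, not Vellum's eval API):

```python
from typing import Callable, Dict, List

# A scorer maps (workflow inputs, workflow output) to a score in [0, 1].
Scorer = Callable[[dict, str], float]

def contains_citation(inputs: dict, output: str) -> float:
    # Example scorer: reward answers that cite a source.
    return 1.0 if "[source]" in output else 0.0

def within_length(inputs: dict, output: str) -> float:
    # Example scorer: penalize overlong answers.
    return 1.0 if len(output) <= 500 else 0.0

def score_output(inputs: dict, output: str, scorers: List[Scorer]) -> Dict[str, float]:
    # Run every scorer against one live output; in an online setup these
    # results would be attached to the corresponding trace.
    return {fn.__name__: fn(inputs, output) for fn in scorers}

scores = score_output({"query": "refund policy"},
                      "Refunds are issued within 14 days [source].",
                      [contains_citation, within_length])
```

The point of "built-in" is that these scorers run against the same deployed graph, so there is no separate eval harness to keep in sync.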
Where it falls short
- Code-first teams hit walls. If most of your agent is custom Python with state and side effects, the visual model fights you.
- Lock-in. Workflows live in Vellum. Migrating off is non-trivial.
- Niche. The teams it fits, it fits well. Outside that audience it's an awkward choice.
Bottom line
If your AI org includes meaningful PM or domain-expert participation in agent design, Vellum deserves a serious look. For pure-engineering teams shipping code-first agents, the SDK-based platforms (Braintrust, Langfuse) are a better fit.