What it is
MLflow is the de facto open-source standard for ML experiment tracking, model registry, and deployment, used by tens of thousands of teams. Databricks has extended it heavily into LLM territory: auto-tracing across the major frameworks, built-in LLM-as-judge scoring, prompt versioning, and evaluation runs that plug into the existing MLflow run/artifact model.
Free under the Apache 2.0 license; the hosted version is bundled with the Databricks platform.
Where it shines
- Footprint. MLflow is probably already running in your environment. That's not a feature, but it is the strongest argument for using it.
- Auto-tracing. Wrap your LLM calls with one decorator and get spans automatically (see the sketch after this list). The list of supported frameworks has grown faster than expected.
- Databricks integration. If your data and ML lifecycle already live in Databricks, extending MLflow to LLM work keeps everything consolidated on one platform.
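To make the auto-tracing claim concrete, here is a minimal sketch, assuming a recent MLflow (2.14+) with its OpenAI integration and an OPENAI_API_KEY in the environment; the model name and question are placeholders:

```python
import mlflow
import openai

# Auto-trace every OpenAI call: MLflow records each API
# request/response as a span, with no per-call code changes.
mlflow.openai.autolog()

# One decorator wraps your own function as a parent span,
# so the LLM call nests under it in the trace view.
@mlflow.trace
def answer(question: str) -> str:
    client = openai.OpenAI()  # reads OPENAI_API_KEY from the env
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

answer("What does auto-tracing capture?")
```

Run it once and the trace, with its nested spans, shows up in the MLflow UI under the active experiment.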
Where it falls short
- Origin shows. The data model is "runs and artifacts," which fits ML training elegantly and LLM workflows awkwardly. Prompt management and dataset workflows feel grafted on (see the sketch at the end of this list).
- Setup overhead. "Just install it and start evaluating prompts" involves more steps here than on Braintrust's free tier.
- Cross-functional gap. PMs do not open MLflow. The eval-first platforms have invested far more in non-engineer UX.
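To make the data-model mismatch concrete, here is roughly what prompt iteration looks like when pressed into run/artifact terms. This is a hypothetical sketch; the experiment name, parameters, and score are illustrative:

```python
import mlflow

mlflow.set_experiment("prompt-iteration")  # illustrative name

# Each prompt tweak becomes a new "run": the prompt text is an
# artifact file, its settings are params, and the judge score is
# a metric. Nothing here is a first-class prompt object; it's the
# training-era data model doing double duty.
with mlflow.start_run(run_name="prompt-v2"):
    mlflow.log_param("model", "gpt-4o-mini")
    mlflow.log_param("temperature", 0.2)
    mlflow.log_text("You are a concise assistant. ...", "prompt.txt")
    mlflow.log_metric("judge_score", 0.81)  # illustrative value
```

It works, and it versions cleanly, but comparing prompt v1 to v2 means diffing artifact files across runs rather than browsing a purpose-built prompt view.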
Bottom line
If your team already runs MLflow, extending it for LLM observability is the path of least resistance and a defensible choice. If you're starting fresh and your scope is LLM-only, the specialists (Braintrust, Langfuse, Opik) will deliver more value per dollar of attention.