LLM evals and observability company acquisitions

Fourteen months ago, picking an LLM observability tool meant choosing between a dozen or so independent companies pitching variations of the same workflow. As of May 2026, the field looks meaningfully thinner. Eight of those companies have been acquired since March 2025, with the most recent (Cisco's purchase of Galileo) closing today. The buyers split into three groups: foundation model labs, infrastructure vendors, and adjacent dev-tool companies. Each group tells a different story about what the category is worth.

This is the fastest consolidation any AI tooling sub-category has gone through. The outcomes also vary widely. Some products keep shipping under new owners. Others go into maintenance mode. A few fold into the buyer's existing stack. Knowing which pattern fits which buyer is now part of the diligence work.

Eight recent acquisitions

Date	Target	Buyer	Price	Outcome
Mar 2025	Weights & Biases (incl. W&B Weave)	CoreWeave	~$1.7B	Continues operating; bundled with GPU cloud
Mar 2025	Velvet	Arize AI	Est. $5–15M	Folded into Arize platform
Aug 2025	Humanloop	Anthropic	Est. $30–60M	Product wound down; team joined Anthropic
Sep 2025	Statsig	OpenAI	$1.1B all-stock	Operates independently; founder became OpenAI's CTO of Applications
Jan 2026	Langfuse	ClickHouse	Est. $100–200M	Continues as open-source product
Mar 2026	Promptfoo	OpenAI	Est. $100–150M	Tech being integrated into OpenAI Frontier
Mar 2026	Helicone	Mintlify	Est. $15–30M	Tech integrated into Mintlify; standalone product in maintenance
May 2026	Galileo	Cisco	Est. $400M–1B	Folding into Splunk Observability portfolio

Only two of the eight prices were disclosed. See pricing notes at the end of the piece for how each estimate was derived.

Three clear patterns emerge from how these companies were acquired.

Pattern 1: Foundation labs buying the eval layer

Anthropic's Humanloop acqui-hire, OpenAI's $1.1B Statsig acquisition, and OpenAI's Promptfoo deal share a thesis: if you sell the model, you should also own the workflow developers use to test it.

The logic is straightforward. Enterprise customers don't pick a model in isolation; they pick a model and the platform they evaluate, monitor, and gate releases on. Whoever owns the eval workflow owns where developers spend their time, which makes the model choice underneath feel more like a feature than a decision. A developer whose daily driver is OpenAI's testing surface runs fewer comparative evals against Claude than they otherwise would.

The three deals differ in shape. Humanloop was a pure acqui-hire. Anthropic took the team (CEO Raza Habib, CTO Peter Hayes, CPO Jordan Burgess, plus about a dozen engineers and researchers) but explicitly not the product or IP, and Humanloop's customers received a wind-down timeline. Statsig was the opposite: a $1.1B all-stock deal in which Statsig continues operating independently out of Seattle and founder Vijaye Raji became OpenAI's CTO of Applications. That isn't an acqui-hire; it's a bet that experimentation infrastructure (A/B tests, feature flags, online evals) is core to how OpenAI's applications business runs. Promptfoo sits in between. OpenAI is integrating the tech into OpenAI Frontier, its platform for building and operating AI coworkers; the open-source components will reportedly continue; and Promptfoo's footprint across 25% of the Fortune 500 comes along for the ride.

What ties the three together is the recognition that the eval layer is strategically valuable. A model lab that ships a great model but cedes the testing surface to a neutral third party has handed real leverage away. Both Anthropic and OpenAI have decided that's not a tradeoff they want to keep making.

Pattern 2: Infrastructure vendors absorbing the application layer

CoreWeave's $1.7B acquisition of Weights & Biases (closed May 2025), ClickHouse's acquisition of Langfuse (January 2026), and Cisco's acquisition of Galileo (closed May 2026) run the same play on three different infrastructure layers.

CoreWeave sells GPU compute. W&B Weave is the LLMOps product that runs on top of GPU compute. Owning both lets CoreWeave sell "GPU cloud with the eval and observability layer built in," a bundle that's harder for a pure compute reseller to match. The W&B brand stays, the product keeps shipping, and the customer list (OpenAI, Meta, NVIDIA, Snowflake, Toyota, Canva, Square) becomes CoreWeave's installed base.

ClickHouse is the literal version of the same pattern. Langfuse already ran entirely on ClickHouse, both in the cloud product and in self-hosted deployments. The acquisition is the database vendor recognizing that the application layer running on top of it has crossed the threshold from "customer" to "strategic asset worth owning." Langfuse keeps its open-source license and keeps shipping. ClickHouse picks up an LLM observability product with 2,000+ paying customers, 26M+ monthly SDK installs, and 19 of the Fortune 50 on the customer list, plus a credible foothold in the AI observability stack at the moment that workload is becoming the most valuable thing running on any analytical database.

Cisco fits the same template, one layer up the stack. Cisco owns Splunk, which gives them the leading position in general-purpose observability. Galileo's online evaluation product (Luna-2 evaluators cheap enough to score live traffic) plugs straight into the Splunk Observability portfolio and gives Cisco a real answer to "what does observability look like for an agentic system." This is the most enterprise-flavored version of the pattern. Cisco isn't buying Galileo to win the developer mindshare race; they're buying it because the Fortune 500 customers Splunk already serves need an answer for monitoring AI agents, and Galileo is that answer.

The infrastructure-vendor pattern is the most customer-friendly of the three. W&B Weave and Langfuse have continued shipping post-acquisition, with the buyer's resources accelerating the roadmap rather than killing the product, and Cisco's stated plan for Galileo is integration into Splunk Observability Cloud rather than wind-down. The risk for customers is feature direction tilting toward the buyer's strategic interests (more ClickHouse-native features in Langfuse, more Splunk-native UI in Galileo), but the core observability workflow has stayed intact in every case so far.

Pattern 3: Specialist consolidation and the middle getting eaten

The remaining two deals tell the messier story: what happens to the long tail of independent observability tools.

Arize buying Velvet in March 2025 is the cleanest specialist-on-specialist move. Velvet's LLM gateway and observability product got folded into Arize's platform, and Velvet's CTO Chris Hendel joined Arize to lead platform engineering. It's a category leader bulking up by absorbing a smaller player, the kind of consolidation move mature dev-tool categories see all the time. A positive signal about Arize's ambitions, but it doesn't change the competitive map much.

Mintlify acquiring Helicone in March 2026 has more interesting context behind it. Helicone was already running the AI infrastructure powering Mintlify's own product, processing millions of AI interactions per month on behalf of Mintlify's customers. The acquisition brings the Helicone team in-house to scale that layer as Mintlify's AI features grow, with the standalone Helicone product moving into maintenance while the underlying tech integrates into Mintlify's platform. It's the natural conclusion of a deep customer relationship rather than a wind-down.

There's still a market-structure point worth noticing across all eight deals. The biggest acquisitions (W&B, Langfuse, Statsig, Galileo) kept their products alive: W&B, Langfuse, and Statsig as standalone offerings, and Galileo as part of the Splunk Observability portfolio. The smaller ones (Humanloop, Velvet, Helicone) folded into the buyer's larger strategy. Both are real outcomes for the team. If you're picking a tool primarily for the standalone product roadmap, though, the smaller-deal pattern is the one to watch.

Who's still standing

The notable independents, now that the dust has settled:

Braintrust — Series A, no acquisition rumors, customer list (Notion, Stripe, Vercel, Airtable, Instacart, Zapier) still growing. Currently the strongest independent in the category, and an obvious acquisition target for any of the three buyer archetypes above. As of writing, nothing has been announced.
LangChain / LangSmith — Still independent, with the eval product riding on the framework's adoption. LangChain is large enough now that any acquisition would mean a buyer absorbing a major dev-tools brand, not a quiet talent buy. Different deal physics.
Comet (Opik) — Independent. Opik is the newer LLM-focused product on top of Comet's MLOps foundation. Plausibly attractive to an infra buyer following the ClickHouse or CoreWeave playbook.
Arize — Now a buyer rather than a target, post-Velvet.
PromptLayer, Patronus, Maxim AI, Fiddler — Smaller, more specialized, less obvious strategic fit for any of the active buyers. Most likely to keep operating independently, though also the most plausible candidates for the next round of acqui-hires if the cycle continues.

The shape of the market is fundamentally different from eighteen months ago. Three of the most popular open-source tools (Langfuse, Helicone, Promptfoo) now have new corporate parents. Two of the biggest foundation labs have acquired evaluation companies. The largest infra vendor in AI compute owns the most popular LLMOps product on its stack. Cisco owns the leading high-volume online eval tool. None of this was on anyone's radar at the start of 2025.

What this means if you're picking a tool right now

Three practical implications.

1. Acquisition outcome is now part of selection. Features, pricing, and roadmap aren't the whole picture anymore. It's also worth asking: if this company gets acquired in the next twelve months, what's the most likely shape of that outcome? For products with real revenue and customer lists (Braintrust, Comet/Opik), the outcome would probably look like Langfuse or W&B, with continued product and an accelerated roadmap. For smaller tools, outcomes range from Helicone-style absorption into the buyer's platform to Humanloop-style team-only acqui-hires where the product retires. Pick with that in mind.

2. Foundation labs are now competitors to neutral observability tools. OpenAI's Frontier (post-Promptfoo) and Anthropic's enterprise tooling (post-Humanloop) will push hard on lab-native eval workflows, with tight integration with the lab's own models and looser integration with everyone else's. If your eval suite needs to cover multiple labs comparatively, a neutral platform still has a real advantage. Expect the lab-native options to get genuinely good, though, and to be free or cheap for customers already paying the lab.

3. Open source matters more than it did six months ago. Langfuse stayed open source post-acquisition. Promptfoo's open-source components are continuing. Both commitments come from the current owner rather than the original founder, so it's worth checking where that commitment is binding and where it isn't. Tools where OSS is a load-bearing promise from the current owner are now scarcer than they used to be.

The deeper signal

The speed of this consolidation tells you how the buyers see the category. Foundation labs are spending eight to ten figures to own the eval layer. Infrastructure vendors are spending nine and ten figures to bundle it with compute, database, or general observability. Adjacent dev-tool companies are quietly absorbing the smaller players.

That's not what a commodity category looks like. It's what a strategically valuable one looks like. Two of the most disciplined buyers in tech (OpenAI and Anthropic) have decided they need to own pieces of it. The leading infrastructure vendors (CoreWeave, ClickHouse, Cisco) have spent significant sums to lock down the application layer above them.

If you're choosing eval and observability tooling right now, the implication is the opposite of what it would have been two years ago. This isn't a category to delay decisions on while waiting for the market to settle. It's settling now. Pick a tool with either the scale to be acquired and preserved, or the position to stay independent through the cycle. Both kinds still exist; the list is just shorter than it was a year ago.

Pricing notes

Only two of the eight deals had publicly disclosed prices. The rest are best-guess ranges anchored to the target's last known funding round, comparable acqui-hire pricing for AI talent in 2025–2026, and the strategic value of the buyer's positioning.

W&B / CoreWeave (~$1.7B) is from The Information's reporting, since picked up across coverage.
Statsig / OpenAI ($1.1B all-stock) is from OpenAI's announcement.
Velvet / Arize is a Y Combinator-stage acqui-hire with no publicly reported funding rounds beyond pre-seed. Comparable acqui-hires in 2025 ranged $5–15M; Chris Hendel joining Arize in a senior engineering role makes a single-digit-millions deal most likely.
Humanloop / Anthropic: Humanloop raised ~$8M total (Index, YC, LocalGlobe, AlbionVC; last priced in 2022). AI talent acqui-hires in 2025 priced at $2–4M per senior engineer-equivalent, so three founders plus ~12 engineers (about 15 people) puts this in the $30–60M range. Anthropic took no IP, so this is pure talent.
Langfuse / ClickHouse: Langfuse had raised only ~$4.5M in seed funding but reported 2,000+ paying customers, 26M+ monthly SDK installs, and 19 Fortune 50 logos at acquisition, well past Series A traction without ever raising a Series A. ClickHouse buying its largest application-layer tenant comps to enterprise SaaS acquisitions at 8–15x ARR; the $100–200M range assumes modest ARR for a freemium OSS product.
Promptfoo / OpenAI: Promptfoo was valued at $86M after its July 2025 round (PitchBook, via TechCrunch). Strategic acquisitions usually clear 1.2–2x the last priced round, with OpenAI willing to pay a premium for Promptfoo's footprint across 25% of the Fortune 500.
Helicone / Mintlify: Helicone raised ~$5M at a $25M valuation. Mintlify (~$66.7M total raised) is a strategic buyer absorbing existing infrastructure rather than a model lab or compute giant writing a check. With the standalone Helicone product moving into maintenance, the price likely lands close to or only modestly above the last priced round.
Galileo / Cisco: Galileo raised a $45M Series B in October 2024 and reported 834% YoY revenue growth with 6 Fortune 50 customers at the time. Cisco's recent enterprise acquisitions have priced at 10–20x ARR; even on conservative ARR assumptions for a fast-growing Series B, $400M–1B is the defensible range, with the upper end more likely given Cisco's pattern of paying for strategic enterprise positioning (Isovalent, Splunk itself).
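
Two of the estimates above are simple enough to check by hand: the Humanloop range is headcount times a per-person price, and the Promptfoo range is a multiple of the last priced valuation. The sketch below reproduces that arithmetic using only the figures stated in the notes. The function names are hypothetical and exist purely for illustration, and counting the three founders at the same per-person price as the engineers is an assumption.

```python
def headcount_range(people, low_per_head_musd, high_per_head_musd):
    """Acqui-hire estimate: headcount times a per-person price band, in $M."""
    return people * low_per_head_musd, people * high_per_head_musd


def multiple_range(last_valuation_musd, low_multiple, high_multiple):
    """Strategic-deal estimate: last priced valuation times a multiple band, in $M."""
    return last_valuation_musd * low_multiple, last_valuation_musd * high_multiple


# Humanloop: three founders plus ~12 engineers, priced at $2-4M per person.
# Assumption: founders are counted at the same per-person price as engineers.
humanloop = headcount_range(people=3 + 12, low_per_head_musd=2, high_per_head_musd=4)

# Promptfoo: $86M last priced valuation, at a 1.2-2x strategic multiple.
promptfoo = multiple_range(last_valuation_musd=86, low_multiple=1.2, high_multiple=2.0)

print(f"Humanloop: ${humanloop[0]:.0f}M to ${humanloop[1]:.0f}M")
print(f"Promptfoo: ${promptfoo[0]:.0f}M to ${promptfoo[1]:.0f}M")
```

The Humanloop result lands on the $30–60M in the table. The Promptfoo multiple works out to roughly $103–172M, somewhat wider than the $100–150M listed, so that table entry appears to reflect judgment beyond the raw multiple. The ARR-based estimates (Langfuse, Galileo) can't be checked the same way because no ARR figure is stated.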