LLM Observability API
See what your LLM app is actually doing
Observability for the LLM calls your application makes, powered by the Brainiall LLM Observability engine. POST a trace for every model call (the prompt, the completion, the tokens, latency, cost and status) and Brainiall stores it, aggregates latency, token, cost and error-rate statistics across any time window, and scores responses with built-in heuristic evals. It is provider-agnostic: it works with whatever model your app calls, with no SDK to adopt and nothing in your request path. Priced per operation, self-serve from the first call; reading stats is always free.
How we compare
LLM observability is sold as an open-source platform you self-host or its hosted cloud, as a proxy that sits in front of your traffic, or as a tracing tool tied to one orchestration framework — and the hyperscalers fold it into a broad application monitoring suite. Brainiall is a plain REST ingest API: POST a trace, GET aggregate stats, POST an eval. It is framework-agnostic, adds nothing to your request path, and shares one API key and one bill with the rest of the catalog.
| Provider | Shape | Pricing model | Approx. price | Onboarding |
|---|---|---|---|---|
| Brainiall LLM Observability | REST: /traces /stats /evals/run — ingest, aggregate, score | Per operation | $0.0002 / operation | Self-serve, instant API key |
| Langfuse | Open-source platform + hosted cloud, instrumented through an SDK | Per usage unit + per seat | Usage + seat fees | Self-host or cloud sign-up |
| Helicone | Proxy placed in front of your LLM traffic | Per request + per seat | Request + seat fees | Re-point your base URL |
| LangSmith | Tracing tied to one orchestration framework | Per trace + per seat | ~$0.0005 / trace | Self-serve, account |
Prices are list-price approximations for orientation, not quotes — vendors meter observability by the usage unit, the request, the trace or the seat, so a per-operation figure is not directly comparable. A proxy intercepts your traffic and can add latency; Brainiall's ingest model never sits in your request path. Always check each vendor's current terms.
Pricing
One per-operation price for ingesting a trace, querying traces or running an eval. Reading aggregate stats is always free, so a dashboard can poll without burning quota. The free tier is enough to instrument a real app end to end.
Free
$0/mo
1,000 operations/month · heuristic evals included · stats polling always free
Starter
$19/mo
~100,000 operations/month · filter traces by model, status and time
Pro
$79/mo
~500,000 operations/month · priority ingest · 99.5% SLA
Business
$299/mo
~2,000,000 operations/month · dedicated capacity · email + Slack support
PAYG: $0.0002 / operation (Brainiall LLM Observability engine). An operation is one trace ingested, one trace query or one eval run; reading aggregate stats is never metered. No minimum spend, no contract — the same single API key and usage-based billing as the rest of the catalog.
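To compare pay-as-you-go with the monthly tiers, the arithmetic can be sketched as below. The prices and allowances are copied from the list above; the function names are illustrative, the "~" allowances are treated as exact, and how overage or a mix of plan and PAYG is billed is not stated here, so this is a rough orientation only.

```python
# Prices copied from the tiers above; the comparison logic is illustrative only.
PAYG_USD_PER_OP = 0.0002

# plan name -> (monthly fee in USD, approximate included operations per month)
PLANS = {
    "Free": (0, 1_000),
    "Starter": (19, 100_000),
    "Pro": (79, 500_000),
    "Business": (299, 2_000_000),
}


def payg_cost(operations: int) -> float:
    """Pay-as-you-go cost in USD for a month with this many metered operations."""
    return round(operations * PAYG_USD_PER_OP, 2)


def cheapest_covering_plan(operations: int):
    """Return (name, fee) of the cheapest plan whose allowance covers the volume, else None."""
    covering = [
        (fee, name)
        for name, (fee, included) in PLANS.items()
        if included >= operations
    ]
    if not covering:
        return None
    fee, name = min(covering)
    return name, fee


# Stats reads are never metered, so count only ingests, trace queries and eval runs.
for ops in (1_000, 100_000, 500_000, 2_000_000):
    print(ops, payg_cost(ops), cheapest_covering_plan(ops))
```

At the listed allowances, PAYG works out to $20, $100 and $400 per month against plan fees of $19, $79 and $299.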
Three building blocks
# Authenticate every call: Authorization: Bearer brnl-...
# 1. Ingest a trace — one record per LLM call your app makes
POST https://api.brainiall.com/v1/observability/traces
body: {"name": "chat-completion", "model": "assistant-v2",
       "input": "Summarize the Q3 report.",
       "output": "Q3 revenue rose 12% on strong API demand.",
       "input_tokens": 480, "output_tokens": 95,
       "latency_ms": 1340, "cost_usd": 0.0021, "status": "ok",
       "metadata": {"user": "u_8821", "session": "s_4f1c"}}
-> {"trace_id": "62a398c73c1649c7", "status": "recorded",
    "engine": "Brainiall LLM Observability engine"}
# 2. Stats — aggregate health over any window (free, never metered)
GET https://api.brainiall.com/v1/observability/stats?window_s=86400
-> {"trace_count": 14820, "error_count": 63, "error_rate": 0.0043,
    "latency_ms": {"p50": 910, "p95": 2480, "max": 8800},
    "input_tokens": 6820400, "output_tokens": 1390250,
    "cost_usd": 28.74, "traces_by_model": {"assistant-v2": 14102}}
# 3. Evals — score a response with fast, deterministic heuristic checks
# groundedness here = lexical content-word overlap, not a semantic check
POST https://api.brainiall.com/v1/evals/run
body: {"input": "Summarize the Q3 report.",
       "output": "Q3 revenue rose 12% year over year.",
       "reference": "Q3 revenue increased 12% YoY.",
       "checks": ["relevance", "groundedness", "non_refusal", "pii_safe"]}
-> {"checks": {"groundedness": {"score": 0.6,
    "reason": "response/reference content-word overlap is 60% (lexical-overlap heuristic, not a semantic check)"}},
    "overall_score": 0.73, "checks_run": ["relevance", "..."]}
# List + filter: GET /v1/observability/traces?model=assistant-v2&status=error
# Fetch one: GET /v1/observability/traces/{trace_id}
Every field on a trace is optional: log only the prompt and the latency, or the full call with tokens and cost. The trace list filters by model, status and a since timestamp, so a dashboard can drill straight into the errored calls of one model.
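As a rough illustration of the ingest call, here is a minimal Python sketch using only the standard library. The URL, the Bearer header and the field names come from the example above; `traced_call`, `build_trace_request`, `send_trace`, the `sink` parameter, the `Content-Type` header and the `error` key inside `metadata` are assumptions made for this sketch, not part of a Brainiall client.

```python
import json
import time
import urllib.request

TRACES_URL = "https://api.brainiall.com/v1/observability/traces"


def traced_call(call_model, prompt, model, sink, name="chat-completion"):
    """Run call_model(prompt), time it, and hand a trace dict to sink.

    call_model is a hypothetical stand-in for your own model client;
    sink is any callable that accepts the trace (list.append, queue.put, ...).
    """
    started = time.perf_counter()
    trace = {"name": name, "model": model, "input": prompt, "status": "ok"}
    try:
        output = call_model(prompt)
        trace["output"] = output
        return output
    except Exception as exc:
        trace["status"] = "error"
        # Assumption: error details are stored as a free-form metadata tag.
        trace["metadata"] = {"error": repr(exc)}
        raise
    finally:
        # Runs on success and on failure, so every call leaves a trace.
        trace["latency_ms"] = round((time.perf_counter() - started) * 1000)
        sink(trace)


def build_trace_request(api_key, trace):
    """Build (but do not send) the POST request for one trace."""
    return urllib.request.Request(
        TRACES_URL,
        data=json.dumps(trace).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not stated above
        },
        method="POST",
    )


def send_trace(api_key, trace, timeout_s=5.0):
    """Send one trace and return the decoded JSON response."""
    request = build_trace_request(api_key, trace)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Local demonstration with a fake model; nothing is sent over the network.
collected = []
answer = traced_call(lambda p: p.upper(), "hello", "assistant-v2", collected.append)
request = build_trace_request("brnl-...", collected[0])
print(request.get_method(), request.full_url)
```

In a live service the sink could be the `put` method of a `queue.Queue` drained by a background worker that calls `send_trace`, which keeps the POST off the path of the user-facing request.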
How LLM observability works
Three plain REST calls — no SDK to adopt, no proxy to route your traffic through. You stay in control of every model call; Brainiall stores and explains what happened.
- Trace every call. After each LLM call your app makes, POST a trace with whatever you want recorded: prompt, completion, token counts, latency, cost, an "ok" or "error" status and free-form metadata tags.
- Aggregate the health. One stats call rolls every trace in a window into trace count, error rate, latency percentiles, total tokens, total cost and a per-model breakdown: the numbers a dashboard or an alert needs.
- Score the responses. The eval endpoint runs fast, deterministic heuristic checks (relevance, lexical overlap with a reference, non-refusal, valid JSON, length and a PII-safety scan) and returns a score and a plain-English reason for each. The groundedness check here is a lexical-overlap heuristic (shared content words), not a semantic hallucination detector; for semantic groundedness use the Content Safety Pro /v1/safety/groundedness endpoint.
- Nothing in your request path. Brainiall never proxies your model traffic, so it cannot add latency or become a dependency of a live call; you POST traces after the fact, on your own schedule.
- Provider-agnostic. A trace is just data, so it works with whatever model or provider your application calls; there is no framework or vendor to match.
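To make the difference between lexical overlap and semantic groundedness concrete, here is a small local sketch of a content-word overlap score. It is not Brainiall's implementation: the tokenizer, the stop-word list and the choice of the reference as the denominator are all assumptions made for illustration.

```python
import re

# Illustrative stop-word list; the service's real list is not documented here.
STOP_WORDS = {
    "a", "an", "and", "by", "for", "in", "is",
    "of", "on", "or", "the", "to", "was", "with",
}


def content_words(text: str) -> set:
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def lexical_overlap(output: str, reference: str) -> float:
    """Fraction of the reference's content words that also appear in the output."""
    reference_words = content_words(reference)
    if not reference_words:
        return 0.0
    return len(reference_words & content_words(output)) / len(reference_words)


reference = "Q3 revenue increased 12% YoY."

# Shares q3, revenue and 12 with the reference; misses increased and yoy.
faithful = lexical_overlap("Q3 revenue rose 12% year over year.", reference)

# Contradicts the reference, yet shares q3, revenue, 12 and yoy.
contradiction = lexical_overlap("Q3 revenue did not increase 12% YoY.", reference)

print(round(faithful, 2), round(contradiction, 2))  # 0.6 0.8
```

Under these assumptions the contradictory sentence scores higher than the faithful paraphrase, which is why a lexical score is a cheap regression signal rather than a hallucination detector.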
What it's for
- Debug production LLM apps: keep a searchable record of every call, then filter straight to the errored or slow ones when something looks wrong.
- Cost & token tracking: roll token counts and per-call cost into a running total per model, so an LLM bill never arrives as a surprise.
- Latency monitoring: watch p50, p95 and max latency over a window and alert when a model or a prompt change pushes the tail out.
- Regression testing & evals: score model responses with fast, deterministic heuristic checks in CI, and fail the build when relevance or reference overlap drops.
- RAG overlap checks: pass the retrieved context as the reference and measure how many of its content words an answer reuses, a fast, deterministic lexical signal. For a semantic check of whether the answer is actually supported by the context (with the supporting span and any contradictions called out), use the Content Safety Pro /v1/safety/groundedness endpoint.
- One bill, one key: same Brainiall API key and usage-based billing as the rest of the catalog, with no separate observability vendor to procure.
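The monitoring and CI use cases above come down to comparing response fields against thresholds. A minimal sketch, assuming the stats and eval responses have the shapes shown in the examples earlier; the function names and threshold values are hypothetical, and the sample dictionaries are copied from those examples rather than fetched from the API.

```python
def health_alerts(stats: dict, max_error_rate: float, max_p95_ms: float) -> list:
    """Return human-readable alerts for a stats response that breaches a threshold."""
    alerts = []
    if stats["error_rate"] > max_error_rate:
        alerts.append(
            f"error rate {stats['error_rate']:.2%} exceeds {max_error_rate:.2%}"
        )
    p95 = stats["latency_ms"]["p95"]
    if p95 > max_p95_ms:
        alerts.append(f"p95 latency {p95} ms exceeds {max_p95_ms} ms")
    return alerts


def eval_passes(result: dict, min_overall: float, min_per_check: float = 0.0) -> bool:
    """True when the overall score and every returned check meet their floors."""
    if result["overall_score"] < min_overall:
        return False
    return all(check["score"] >= min_per_check for check in result["checks"].values())


# Sample responses copied from the examples earlier on this page.
sample_stats = {
    "trace_count": 14820, "error_count": 63, "error_rate": 0.0043,
    "latency_ms": {"p50": 910, "p95": 2480, "max": 8800},
}
sample_eval = {
    "checks": {"groundedness": {"score": 0.6}},
    "overall_score": 0.73,
}

alerts = health_alerts(sample_stats, max_error_rate=0.01, max_p95_ms=2000)
print(alerts)
print(eval_passes(sample_eval, min_overall=0.7))
```

In CI, a failing `eval_passes` can be turned into a non-zero exit code to fail the build; because stats reads are not metered, the alert check can be polled as often as a dashboard needs.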
Press kit & resources
What reviewers, integrators and procurement teams typically ask for.
OpenAPI spec
The machine-readable OpenAPI 3.1 definition for the whole catalog, including every observability and eval endpoint and its schemas.
View spec
API reference
OpenAPI spec, the trace and eval schemas, the stats window, error codes and rate limits.
Read docs →
Compare the catalog
How Brainiall's specialty APIs line up against AWS, Azure and the specialists, use case by use case.
See the comparison →
More specialty APIs
Same single API key, same usage-based pricing, different problem solved.



