Agnos Research · v51h

Know when your language model is making things up.

AgnosLogic uses a Qwen3-32B-based probe stack that reads hidden state geometry in a single forward pass. No judge LLM. No retrieval augmentation. No extra inference calls.

Single API call · Sub-100ms warm latency (≤192 tokens) · 15 free queries/day
92.6% Recall · HaluEval QA · n=9,200
0.948 AUROC · HaluEval QA (±0.003)
91% Pair-scoring · Legal contracts · n=100 (±5.6pp)
46ms Compute · H100 · ≤192 tokens
4-dim Uncertainty · Factual · Logical · OOD · Compositional
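
The ±5.6pp margin quoted for the legal-contracts figure is consistent with a 95% normal-approximation (Wald) interval for a proportion of 0.91 at n=100. The page does not state how the margin was computed, so this is an assumption rather than the published method, and `wald_margin` is an illustrative name.

```python
import math

def wald_margin(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Pair-scoring figure reported on the legal-contracts set: 91% at n=100.
margin = wald_margin(0.91, 100)
print(f"±{margin * 100:.1f}pp")  # prints "±5.6pp"
```

At n=100 the interval is wide: under this assumption the true rate could plausibly sit anywhere from roughly 85% to 97%.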
The difference

Every other tool calls a second LLM to judge the first.

LLM-as-judge approaches and retrieval grounding both add latency, cost, and another source of hallucination. AgnosLogic instead reads the geometry of hidden states directly: one forward pass, no second model, and a calibrated risk score returned with the response.

See full comparison →

# Pair-scoring: response vs reference
POST /v1/score
{
  "text_a": "Vendor warrants 90-day uptime SLA.",
  "text_b": "Vendor disclaims all uptime warranties."
}

# Response
{
  "gap": 4.9,
  "verdict": "a_more_factual",
  "latency_ms": 87
}
How it works

Hidden state geometry, not vibes.

AgnosLogic augments a frozen language model with three auxiliary probes that read the model's internal representations. The geometry of truthful and fabricated content separates cleanly — we learn that separation once and score every query in a single forward pass.

i.

Submit a response + reference

Send your LLM's output alongside a reference: a source document, ground truth answer, or alternative response. Works for RAG pipelines, eval workflows, and A/B testing.

ii.

Three independent probes

AgnosLogic scores the pair via three auxiliary probes — FAH, CWMI, ESR — each measuring a different dimension of uncertainty in a single forward pass. No second LLM. No retrieval.

iii.

Calibrated verdict returned

You receive a verdict naming the more factual text, a confidence gap, and per-probe scores. Median pair-scoring latency is 87ms on an H100 (≤192 tokens). A dedicated endpoint is available.
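
On the client side, the three steps above might be handled as in the sketch below. It assumes `text_a` holds the LLM response and `text_b` the reference, and it reads only the three fields shown in the sample response (`gap`, `verdict`, `latency_ms`). `PairResult`, `parse_result`, `flag_response`, and the `MIN_GAP` cutoff are hypothetical names and values, not part of the API; per-probe scores are left out because their field names are not shown on this page.

```python
from dataclasses import dataclass

# Hypothetical cutoff: gaps smaller than this are treated as inconclusive.
MIN_GAP = 1.0

@dataclass
class PairResult:
    verdict: str       # "a_more_factual" or "b_more_factual"
    gap: float         # confidence gap from the response body
    latency_ms: float  # server-reported latency

def parse_result(payload: dict) -> PairResult:
    """Pull the three fields shown in the sample response out of the JSON body."""
    return PairResult(
        verdict=payload["verdict"],
        gap=float(payload["gap"]),
        latency_ms=float(payload["latency_ms"]),
    )

def flag_response(result: PairResult, min_gap: float = MIN_GAP) -> str:
    """Map a verdict to an action, assuming text_a is the LLM response
    and text_b is the reference."""
    if abs(result.gap) < min_gap:
        return "inconclusive"
    if result.verdict == "b_more_factual":
        return "review"  # the reference looks more factual than the response
    return "pass"

sample = {"gap": 4.9, "verdict": "a_more_factual", "latency_ms": 87}
print(flag_response(parse_result(sample)))  # prints "pass"
```

A suitable cutoff depends on the data and on how costly a missed hallucination is, so `MIN_GAP` should be tuned on a labeled sample rather than taken from this sketch.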

Live API

Three endpoints. Zero complexity.

RESTful JSON. Bearer-token auth. Under 10 lines of code in any language.

# Python · Pair-scoring in a single call
import requests

response = requests.post(
    "https://agnoslogic.com/v1/score",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "text_a": "Vendor warrants 90-day uptime SLA.",
        "text_b": "Vendor disclaims all uptime warranties.",
    },
    timeout=10,  # seconds; requests applies no timeout by default
)
response.raise_for_status()  # surface HTTP errors before parsing

data = response.json()
print(data["verdict"])     # "a_more_factual" | "b_more_factual"
print(data["gap"])         # confidence gap (float)
print(data["latency_ms"])  # ~87ms on H100 (≤192 tokens)
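
For environments without `requests`, the same call can be sketched with the standard library alone. The endpoint, headers, and payload fields come from the example above; the function names, the 10-second timeout, and the error handling are assumptions, since the API's error format is not described on this page.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://agnoslogic.com/v1/score"  # endpoint from the example above

def build_request(text_a: str, text_b: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"text_a": text_a, "text_b": text_b}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def score_pair(text_a: str, text_b: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send one pair and return the decoded JSON body.

    Raises RuntimeError on an HTTP error status; the response body is
    passed through as text because its format is not documented here.
    """
    request = build_request(text_a, text_b, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"score request failed with HTTP {err.code}: {detail}") from err
```

Calling `score_pair("Vendor warrants 90-day uptime SLA.", "Vendor disclaims all uptime warranties.", "YOUR_API_KEY")` would then return the parsed response dictionary. Keeping `build_request` separate lets the payload and headers be checked without network access.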
Full API reference →
Comparison

How AgnosLogic compares.

AgnosLogic · LLM-as-judge · RAG grounding
Single forward pass
Sub-100ms latency (≤192 tokens, H100)
No second LLM required
Works without retrieval corpus
Calibrated risk score
Pricing

Pay for what you use.

Start free. Upgrade when you're ready.

Explorer
Try AgnosLogic on your data
$0 / month
15 queries per day
  • All three API endpoints
  • Full verdict breakdown
  • Community support
Get free API key
Enterprise
Custom deployment + SLA
Custom
Unlimited queries
  • Everything in Builder
  • Dedicated endpoint (private workers, no shared infrastructure)
  • Custom thresholds + integration support
  • SLA + dedicated support
Contact sales

Ready to build trustworthy AI?

Sign up in 30 seconds. Free API key. No credit card required.
