Agnos Research · v51h

Know when your language model is making things up.

AgnosLogic uses a Qwen3-32B-based probe stack that reads hidden state geometry in a single forward pass. No judge LLM. No retrieval augmentation. No extra inference calls.

Try the live demo · Read the docs

Single API call · Sub-100ms warm latency (≤192 tokens) · 15 free queries/day

92.6% · Recall · HaluEval QA · n=9,200

0.948 · AUROC · HaluEval QA (±0.003)

91% · Pair-scoring · Legal contracts · n=100 (±5.6pp)

46ms · Compute · H100 · ≤192 tokens

4-dim · Uncertainty · Factual · Logical · OOD · Compositional
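The ±5.6pp margin on the legal-contracts figure is consistent with a 95% normal-approximation (Wald) interval for a proportion of 0.91 at n=100. The sketch below reproduces that arithmetic. The interval method is an assumption, since the page does not say how the margin was computed, and `wald_margin` is a hypothetical helper name.

```python
import math


def wald_margin(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) interval for a proportion.

    Assumption: the published margin is a 95% Wald interval (z = 1.96).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# 91% pair-scoring accuracy on n=100 legal-contract pairs.
margin_pp = 100 * wald_margin(0.91, 100)
print(f"{margin_pp:.1f}pp")  # prints "5.6pp"
```

With only 100 pairs the interval spans roughly 85% to 97%, so the legal-contracts number is best read as a range rather than a point estimate.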

The difference

Most other tools call a second LLM to judge the first.

LLM-as-judge approaches and retrieval grounding both add latency, cost, and another source of hallucination. AgnosLogic is different — we read the geometry of hidden states directly. One forward pass. No second model. Calibrated risk score returned with the response.

See full comparison →

# Pair-scoring: response vs reference
POST /v1/score
{
  "text_a": "Vendor warrants 90-day uptime SLA.",
  "text_b": "Vendor disclaims all uptime warranties."
}

# Response
{
  "gap": +4.9,
  "verdict": "a_more_factual",
  "latency_ms": 87
}

How it works

Hidden state geometry, not vibes.

AgnosLogic augments a frozen language model with three auxiliary probes that read the model's internal representations. The geometry of truthful and fabricated content separates cleanly — we learn that separation once and score every query in a single forward pass.

i.

Submit a response + reference

Send your LLM's output alongside a reference: a source document, ground truth answer, or alternative response. Works for RAG pipelines, eval workflows, and A/B testing.

ii.

Three independent probes

AgnosLogic scores the pair via three auxiliary probes — FAH, CWMI, ESR — each measuring a different dimension of uncertainty in a single forward pass. No second LLM. No retrieval.

iii.

Calibrated verdict returned

The response identifies which text is more likely hallucinated and includes a confidence gap and per-probe scores. Median pair-scoring latency is 87ms on an H100 (≤192 tokens). A dedicated endpoint is available.
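To make the idea of a probe over hidden states concrete, here is a minimal sketch of a generic linear probe trained on synthetic vectors. It is not the FAH, CWMI, or ESR probe, whose internals this page does not describe. The data, the 16-dimensional size, the learning rate, and the names `risk_score` and `direction` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # hypothetical size; real hidden states have thousands of dimensions

# Synthetic stand-ins for frozen-model hidden states (NOT real activations):
# two Gaussian clusters offset in opposite directions along one axis.
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
truthful = rng.normal(size=(200, dim)) + 2.0 * direction
fabricated = rng.normal(size=(200, dim)) - 2.0 * direction

X = np.vstack([truthful, fabricated])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 1 = fabricated


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Fit a logistic-regression probe with plain gradient descent.
# The "language model" stays frozen: only w and b are learned.
w = np.zeros(dim)
b = 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)


def risk_score(hidden_state):
    """Score in [0, 1]; higher means closer to the 'fabricated' cluster."""
    return float(sigmoid(hidden_state @ w + b))


accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(f"probe accuracy on synthetic data: {accuracy:.2f}")
```

Once the weights are learned, scoring a new vector is a single dot product, which is why a probe of this kind adds little cost on top of the forward pass that produced the hidden state.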

Live API

Three endpoints. Zero complexity.

RESTful JSON. Bearer-token auth. Under 10 lines of code in any language.

# Python · Pair-scoring in a single call
import requests

response = requests.post(
    "https://agnoslogic.com/v1/score",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "text_a": "Vendor warrants 90-day uptime SLA.",
        "text_b": "Vendor disclaims all uptime warranties."
    }
)

data = response.json()
print(data["verdict"])     # "a_more_factual" | "b_more_factual"
print(data["gap"])         # confidence gap (float)
print(data["latency_ms"])  # ~87ms on H100 (≤192 tokens)
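In a pipeline you will usually want to turn the verdict and gap into a decision. The sketch below shows one way to do that offline, using only the three response fields shown on this page. The helper name `interpret_score` and the `min_gap` threshold are hypothetical: the page does not document the scale of `gap`, so a real cutoff would have to be tuned on your own data.

```python
from typing import Any, Mapping

VALID_VERDICTS = ("a_more_factual", "b_more_factual")


def interpret_score(payload: Mapping[str, Any], min_gap: float = 1.0) -> str:
    """Reduce a /v1/score response to 'a', 'b', or 'uncertain'.

    min_gap is an illustrative threshold, not a documented API value.
    The absolute gap is used because the sign convention is not documented.
    """
    verdict = payload.get("verdict")
    gap = payload.get("gap")
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    if isinstance(gap, bool) or not isinstance(gap, (int, float)):
        raise ValueError(f"unexpected gap: {gap!r}")
    if abs(gap) < min_gap:
        return "uncertain"
    return "a" if verdict == "a_more_factual" else "b"


# Response body taken from the example on this page.
example = {"gap": 4.9, "verdict": "a_more_factual", "latency_ms": 87}
print(interpret_score(example))  # prints "a"
```

Keeping the thresholding in your own code, rather than relying on the verdict alone, lets you route low-gap pairs to human review instead of forcing a binary call.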

Full API reference →

Comparison

How AgnosLogic compares.

	AgnosLogic	LLM-as-judge	RAG grounding
Single forward pass	✓	—	—
Sub-100ms latency (≤192 tokens, H100)	✓	—	—
No second LLM required	✓	—	✓
Works without retrieval corpus	✓	✓	—
Calibrated risk score	✓	—	—