Drop-in proxy · No engine forks · No model patches

Works with any inference engine
that exposes token logprobs.

Geodesia G-1 is a real-time reverse proxy that speaks the OpenAI Chat Completions and Ollama protocols. Your application keeps speaking OpenAI; G-1 forwards to your engine of choice — vLLM (official, unmodified), SGLang, TensorRT-LLM, llama.cpp, Ollama, or any OpenAI-compatible cloud endpoint — and screens every turn on six axes with our proprietary multimodal model, in ~30 ms for a 1024-token prompt on a single GPU. Change one base URL. And when you build agents, the same layer drops into your stack as a Model Context Protocol (MCP) guard.


Protocol: OpenAI · Ollama · streaming

Requirement: Engine exposes logprobs

Engine fork: None. Ever.

Model patches: Zero. Weights untouched.

The Integration Matrix

Six engines.
All six axes. End to end.

The only signal G-1 needs from your engine is the standard logprobs: true flag on the OpenAI API. All six engines below expose it natively — including Ollama, whose recent releases ship full per-token logprob support over its OpenAI-compatible endpoint. All six detection axes run end-to-end on every engine.

Engine	Form	Logprobs	Context halluc.	Context-injection	Closed-book halluc.	Prompt / Answer safety	Jailbreak	Streaming brakes
vLLM (official 0.21+ · unmodified)	Self-hosted	✓ native	✓	✓	✓	✓	✓	✓
SGLang (OpenAI-compatible · structured outputs)	Self-hosted	✓ native	✓	✓	✓	✓	✓	✓
TensorRT-LLM (NVIDIA · production throughput)	Self-hosted	✓ native	✓	✓	✓	✓	✓	✓
llama.cpp (edge / GGUF · OpenAI server)	Self-hosted / edge	✓ native	✓	✓	✓	✓	✓	✓
OpenAI API (cloud · GPT-4o / GPT-5 family)	Cloud	✓ native	✓	✓	✓	✓	✓	✓
Ollama (local desktop · GGUF models · recent versions)	Local desktop	✓ native	✓	✓	✓	✓	✓	✓

Any OpenAI-compatible endpoint that emits logprobs works. We have validated the six engines above end-to-end; the proxy is engine-agnostic, so additional engines (LMDeploy, MLC-LLM, MLX-LM, Azure OpenAI, OpenRouter, Together, Fireworks, Groq, etc.) integrate the same way.
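
As a rough illustration of the signal the matrix refers to, the sketch below builds a Chat Completions request body with the standard `logprobs` flag and reads the per-token log-probabilities back from a response. The field names follow the public OpenAI Chat Completions schema. The helper names (`build_request`, `token_logprobs`) and the sample response are invented for illustration, and whether the gateway sets the flag on your behalf is not stated here.

```python
import json
import math


def build_request(model: str, user_text: str) -> dict:
    """Chat Completions body that asks the engine for per-token logprobs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "logprobs": True,  # standard OpenAI flag
    }


def token_logprobs(response: dict) -> list[tuple[str, float]]:
    """Extract (token, logprob) pairs from a non-streaming response."""
    logprobs = response["choices"][0].get("logprobs") or {}
    content = logprobs.get("content") or []
    return [(item["token"], item["logprob"]) for item in content]


# Hypothetical response shaped like the OpenAI schema (values are invented).
sample_response = {
    "choices": [{
        "message": {"role": "assistant", "content": "Paris."},
        "logprobs": {"content": [
            {"token": "Paris", "logprob": -0.01},
            {"token": ".", "logprob": -0.20},
        ]},
    }]
}

pairs = token_logprobs(sample_response)
# Geometric-mean token probability: one simple confidence summary,
# shown only as an example and not as G-1's scoring method.
mean_prob = math.exp(sum(lp for _, lp in pairs) / len(pairs))
print(json.dumps(build_request("gpt-4o", "Capital of France?")))
print(pairs, round(mean_prob, 3))
```

An engine that returns no `logprobs` object yields an empty list from `token_logprobs`, which is a quick way to check whether a given endpoint meets the requirement.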

The "Change One URL" Integration

Point your client
at our gateway. That's it.

No SDK to install. No model to retrain. No engine to fork. The Geodesia gateway speaks the OpenAI Chat Completions protocol; your existing code base only needs to flip one base URL. Every prompt is screened, every answer is scored on six axes in real time, and a compliance-grade audit chain is written on the way through.

# Before — your existing code, talking to OpenAI / vLLM / SGLang / etc.
from openai import OpenAI
client = OpenAI(base_url="https://api.openai.com/v1")
# or self-hosted: base_url="http://vllm.yourco.internal:8000/v1"

# After — same SDK, same code path, now protected + scored + compliance-logged
client = OpenAI(base_url="https://geodesia.yourco.internal/v1")

resp = client.chat.completions.create(
    model="gpt-4o",                  # or any model your engine serves
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)
# Every chunk that streams back is screened on 6 axes;
# if a risk barrier is crossed mid-sentence, the gateway halts and returns a BLOCK.
# A signed audit record is written for every call. Auto-PDFs available on demand.

// Before — your existing code, talking to OpenAI / vLLM / SGLang / etc.
import OpenAI from "openai";
let client = new OpenAI({ baseURL: "https://api.openai.com/v1" });
// or self-hosted: baseURL: "http://vllm.yourco.internal:8000/v1"

// After — same SDK, same code path, now protected + scored + compliance-logged
client = new OpenAI({ baseURL: "https://geodesia.yourco.internal/v1" });

const resp = await client.chat.completions.create({
  model: "gpt-4o",                  // or any model your engine serves
  messages: [{ role: "user", content: "..." }],
  stream: true,
});
// Every chunk that streams back is screened on 6 axes;
// if a risk barrier is crossed mid-sentence, the gateway halts and returns a BLOCK.
// A signed audit record is written for every call. Auto-PDFs available on demand.

# Before — talking straight to your engine
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"..."}],"stream":true}'

# After — same request, through the Geodesia gateway (protected + scored + logged)
curl https://geodesia.yourco.internal/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"..."}],"stream":true}'
# Every chunk is screened on 6 axes; a risk barrier mid-sentence halts and returns a BLOCK.
# A signed audit record is written for every call. Auto-PDFs available on demand.
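
What the client actually receives when the gateway halts a stream is not spelled out above, so the following is only a sketch under stated assumptions. It parses OpenAI-style server-sent-event lines (`data: {...}`, terminated by `data: [DONE]`) and stops when a chunk carries `finish_reason == "content_filter"`. Treating that value as the BLOCK signal is an assumption; the gateway's real wire format may differ. The function name `consume_stream` and the demo stream are illustrative.

```python
import json
from typing import Iterable

BLOCK_REASON = "content_filter"  # ASSUMPTION: how a BLOCK is signalled


def consume_stream(lines: Iterable[str]) -> tuple[str, bool]:
    """Accumulate streamed text and return (text, blocked)."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason") == BLOCK_REASON:
                return "".join(parts), True
    return "".join(parts), False


# Hypothetical stream: two content chunks, then a halt mid-sentence.
demo = [
    'data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{"content":", wor"},"finish_reason":null}]}',
    'data: {"choices":[{"delta":{},"finish_reason":"content_filter"}]}',
    "data: [DONE]",
]
text, blocked = consume_stream(demo)
print(repr(text), blocked)
```

Returning the partial text alongside the flag lets the caller decide whether to discard what was already shown or replace it with a notice.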

01 · Context

RAG faithfulness vs the supplied passages. Token-level spans. OOD AUROC 0.881 (HaluEval).

02 · Context-injection

RAG-firewall: hostile instructions hidden inside a retrieved file. 0.995 (Gandalf).

03 · Closed-book

Confident fabrication via the model's own logprobs. Advisory, 0.769 OOD.

04 · Prompt safety

Adversarial intent + dual-concept boolean logic. 0.900 (XSTest).

05 · Answer safety

Harmful content in the response, scored as it streams. 0.922.

06 · Jailbreak

Attack structure, not keywords. 0.989 (jailbreak_cls).

Beyond chat, the same gateway exposes a streaming voice guard with three endpoints: WS /v1/glad/audio/stream (live PCM → verdict events), POST /v1/glad/audio/utterance (batch clip), and GET /v1/glad/audio/status. You can also bring your own policy prompt: send "constitutional_ai": false (or your own system message) and G-1 uses your prompt instead of the Constitutional-AI prompt, grounding the hallucination check against your instructions.
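
A sketch of the bring-your-own-prompt option, assuming the `constitutional_ai` field travels as a top-level key in the Chat Completions JSON body (the text above does not say where it goes). With the OpenAI Python SDK, non-standard body fields are normally passed through the `extra_body` argument. The helper name `build_chat_kwargs` and the policy text are illustrative, and the sketch sets both the flag and a system message even though the text suggests either one may be enough.

```python
def build_chat_kwargs(model: str, policy_prompt: str, user_text: str) -> dict:
    """Keyword arguments for client.chat.completions.create(**kwargs)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": policy_prompt},  # your own policy
            {"role": "user", "content": user_text},
        ],
        # ASSUMPTION: the gateway reads this flag from the request body.
        "extra_body": {"constitutional_ai": False},
    }


kwargs = build_chat_kwargs(
    "gpt-4o",
    "Answer only from the supplied passages.",
    "What does clause 4 say?",
)
print(kwargs["extra_body"])

# Usage against a live gateway (not executed here):
#   from openai import OpenAI
#   client = OpenAI(base_url="https://geodesia.yourco.internal/v1")
#   resp = client.chat.completions.create(**kwargs)
```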

Model Context Protocol

It also guards
your agents.

When your application moves from chat to agents, G-1 moves with it. The same trust layer inspects the full Model Context Protocol (MCP) lifecycle — tool discovery, tool-calls, results and resources — and returns allow / warn / block verdicts in real time, stopping tool poisoning & "rug-pull", indirect prompt-injection via results, and data exfiltration. Three ways to drop it in:

Guard Server (MCP)

A queryable MCP server that exposes analysis / verification primitives to any other MCP host. Ask it to vet a tool, a call, or a result.

Inline interceptor

Interposes between a host and its downstream MCP servers, sanitising tool traffic in transit. No application changes.

Tool-aware chat gateway

Validates tools / tool_calls / results inside your existing OpenAI chat path — a byte-identical no-op when there are no tools.

MCP policy is configurable per-application → per-axis → per-tool (action and threshold per axis; trusted / blocked / egress tools). Listening MCP ports appear in Studio Settings.
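
One way to picture the per-application, per-axis, per-tool layering is a lookup that checks the tool lists first and then compares each axis score with that axis's threshold. Everything below (class and field names, axis keys, thresholds, tool names) is hypothetical and does not reflect the actual G-1 policy schema; egress tools are left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class AxisRule:
    threshold: float       # a score at or above this triggers `action`
    action: str = "block"  # "warn" or "block"


@dataclass
class AppPolicy:
    axes: dict[str, AxisRule] = field(default_factory=dict)
    trusted_tools: set[str] = field(default_factory=set)
    blocked_tools: set[str] = field(default_factory=set)


def verdict(policy: AppPolicy, tool: str, scores: dict[str, float]) -> str:
    """Return "allow", "warn" or "block" for one tool call."""
    if tool in policy.blocked_tools:
        return "block"
    if tool in policy.trusted_tools:
        return "allow"
    result = "allow"
    for axis, score in scores.items():
        rule = policy.axes.get(axis)
        if rule is None or score < rule.threshold:
            continue  # axis not configured, or score under its threshold
        if rule.action == "block":
            return "block"
        result = "warn"
    return result


policy = AppPolicy(
    axes={
        "context_injection": AxisRule(threshold=0.8, action="block"),
        "jailbreak": AxisRule(threshold=0.5, action="warn"),
    },
    trusted_tools={"calendar.read"},
    blocked_tools={"shell.exec"},
)
print(verdict(policy, "web.fetch", {"context_injection": 0.93}))
print(verdict(policy, "web.fetch", {"jailbreak": 0.61}))
print(verdict(policy, "calendar.read", {"context_injection": 0.99}))
```

In this sketch the tool lists take precedence over axis scores, so a trusted tool is allowed regardless of score; a real deployment might prefer to keep scoring trusted tools and only relax the thresholds.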

Why This Architecture

The detector lives outside the model.
By design.

Older runtime-safety stacks lived inside the inference path: a patched vLLM, an architecture-specific hook, a hidden-state extractor wired to one model family. They worked — and they locked the customer into one model and one engine build. Geodesia G-1 inverts that. The detection engine is a single ~300M multimodal companion model that runs next to the model, reads only text and standard token logprobs, and is therefore truly model- and engine-agnostic — and fast enough to score all six axes in real time, ~30 ms for a 1024-token prompt on a single GPU.

Model-agnostic

One companion encoder serves Llama, Qwen, Mistral, Gemma, DeepSeek, Phi, gpt-4o, Claude — anything your engine can serve. Fine-tuned variants included.

Engine-agnostic

The proxy speaks the OpenAI and Ollama wire protocols. Your engine of choice runs official, unmodified. Upgrade your engine without breaking your safety layer.

Stack-agnostic

Cloud OpenAI today, self-hosted vLLM tomorrow, air-gapped llama.cpp on a defence-ministry HSM the day after. Same gateway. Same audit chain. Same PDFs.
