Drop-in proxy · No engine forks · No model patches

Works with any inference engine
that exposes token logprobs.

Geodesia G-1 is a real-time reverse proxy that speaks the OpenAI Chat Completions and Ollama protocols. Your application keeps speaking OpenAI; G-1 forwards to your engine of choice — vLLM (official, unmodified), SGLang, TensorRT-LLM, llama.cpp, Ollama, or any OpenAI-compatible cloud endpoint — and screens every turn on six axes with our proprietary multimodal model, in ~30 ms for a 1024-token prompt on a single GPU. Change one base URL.

See the one-URL integration · G-1 product page →
Protocol: OpenAI · Ollama · streaming
Requirement: Engine exposes logprobs
Engine fork: None. Ever.
Model patches: Zero. Weights untouched.

Six engines.
All six axes. End to end.

The only signal G-1 needs from your engine is the standard logprobs: true flag on the OpenAI API. All six engines below expose it natively — including Ollama, whose recent releases ship full per-token logprob support over its OpenAI-compatible endpoint. All six detection axes run end-to-end on every engine.
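As a rough illustration of the signal involved, the sketch below builds a Chat Completions request body with `logprobs` enabled and pulls the per-token log-probabilities out of a response in the standard OpenAI shape. The helper names `build_request` and `token_logprobs` are hypothetical, the sample response is hand-written, and nothing here reflects G-1 internals.

```python
from typing import Any


def build_request(model: str, user_text: str) -> dict[str, Any]:
    """Chat Completions body asking the engine to return per-token logprobs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "logprobs": True,  # the standard OpenAI flag referenced above
    }


def token_logprobs(response: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (token, logprob) pairs from the first choice, or [] if absent."""
    choices = response.get("choices") or []
    if not choices:
        return []
    logprobs = choices[0].get("logprobs") or {}
    content = logprobs.get("content") or []
    return [(item["token"], item["logprob"]) for item in content]


# Hand-written sample in the OpenAI response shape (values are illustrative only).
sample_response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Paris"},
            "logprobs": {"content": [{"token": "Paris", "logprob": -0.02}]},
        }
    ]
}

request_body = build_request("gpt-4o", "What is the capital of France?")
pairs = token_logprobs(sample_response)
```

An engine that ignores the flag would return `logprobs: null`, in which case `token_logprobs` returns an empty list rather than raising.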

Engine                                               | Form               | Logprobs
vLLM (official 0.21+ · unmodified)                   | Self-hosted        | ✓ native
SGLang (OpenAI-compatible · structured outputs)      | Self-hosted        | ✓ native
TensorRT-LLM (NVIDIA · production throughput)        | Self-hosted        | ✓ native
llama.cpp (edge / GGUF · OpenAI server)              | Self-hosted / edge | ✓ native
OpenAI API (cloud · GPT-4o / GPT-5 family)           | Cloud              | ✓ native
Ollama (local desktop · GGUF models · recent versions) | Local desktop    | ✓ native

Detection columns: Context halluc. · Context-injection · Closed-book halluc. · Prompt / Answer safety · Jailbreak · Streaming brakes

Any OpenAI-compatible endpoint that emits logprobs works. We have validated the six engines above end-to-end; the proxy is engine-agnostic, so additional engines (LMDeploy, MLC-LLM, MLX-LM, Azure OpenAI, OpenRouter, Together, Fireworks, Groq, etc.) integrate the same way.

Point your client
at our gateway. That's it.

No SDK to install. No model to retrain. No engine to fork. The Geodesia gateway speaks the OpenAI Chat Completions protocol; your existing code base only needs to flip one base URL. Every prompt is screened, every answer is scored on six axes in real time, and a compliance-grade audit chain is written on the way through.
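One common way to build a tamper-evident audit trail is an HMAC-signed hash chain, in which each record commits to the signature of the one before it. The sketch below illustrates that general technique only. The record fields, the `ALLOW` verdict label, the key handling, and the function names are all assumptions, not G-1's audit format.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous signature" for the first record


def append_record(chain: list[dict], payload: dict, key: bytes) -> dict:
    """Append a record whose signature covers the payload and the previous signature."""
    prev = chain[-1]["sig"] if chain else GENESIS
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
    sig = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    record = {"prev": prev, "payload": payload, "sig": sig}
    chain.append(record)
    return record


def verify_chain(chain: list[dict], key: bytes) -> bool:
    """Recompute every signature and check each record links to its predecessor."""
    prev = GENESIS
    for record in chain:
        body = json.dumps({"prev": prev, "payload": record["payload"]}, sort_keys=True)
        expected = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
        if record["prev"] != prev or not hmac.compare_digest(record["sig"], expected):
            return False
        prev = record["sig"]
    return True


key = b"demo-key-not-for-production"
chain: list[dict] = []
append_record(chain, {"call": 1, "verdict": "ALLOW"}, key)
append_record(chain, {"call": 2, "verdict": "BLOCK"}, key)
ok_before = verify_chain(chain, key)

chain[0]["payload"]["verdict"] = "BLOCK"  # simulate tampering with an old record
ok_after = verify_chain(chain, key)
```

Because every signature depends on the previous one, editing any earlier record invalidates verification from that point on.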

Python · OpenAI SDK
# Before — your existing code, talking to OpenAI / vLLM / SGLang / etc.
from openai import OpenAI
client = OpenAI(base_url="https://api.openai.com/v1")
# or self-hosted: base_url="http://vllm.yourco.internal:8000/v1"

# After — same SDK, same code path, now protected + scored + compliance-logged
client = OpenAI(base_url="https://geodesia.yourco.internal/v1")

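resp = client.chat.completions.create(
    model="gpt-4o",                  # or any model your engine serves
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)
for chunk in resp:
    # Some chunks carry no choices or no text delta, so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
# Every chunk that streams back is screened on 6 axes;
# if a risk barrier is crossed mid-sentence, the gateway halts and returns a BLOCK.
# A signed audit record is written for every call. Auto-PDFs available on demand.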
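The mid-stream halt described in the comments above can be pictured as a generator wrapper around the token stream. This is a minimal sketch under stated assumptions: `toy_scorer` is a keyword stand-in for a real detector, and the `0.8` threshold and the `"[BLOCK]"` marker are hypothetical. It does not represent G-1's scoring or wire format.

```python
from typing import Callable, Iterable, Iterator

BLOCK_MARKER = "[BLOCK]"  # hypothetical marker, not the gateway's real wire format


def brake_stream(
    chunks: Iterable[str],
    score_risk: Callable[[str], float],
    threshold: float = 0.8,
) -> Iterator[str]:
    """Yield chunks until the risk of the accumulated text reaches the threshold."""
    seen = ""
    for chunk in chunks:
        seen += chunk
        if score_risk(seen) >= threshold:
            yield BLOCK_MARKER  # stop before releasing the offending chunk
            return
        yield chunk


def toy_scorer(text: str) -> float:
    """Stand-in scorer: a real detector would be a model, not a keyword match."""
    return 1.0 if "forbidden" in text else 0.0


out = list(brake_stream(["The ", "forbidden ", "recipe"], toy_scorer))
```

Scoring the accumulated text rather than each chunk in isolation lets the wrapper react to content that only becomes risky once several chunks are read together.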
01 · Context

RAG faithfulness vs the supplied passages. Token-level spans. OOD AUROC 0.881 (HaluEval).

02 · Context-injection

RAG-firewall: hostile instructions hidden inside a retrieved file. 0.995 (Gandalf).

03 · Closed-book

Confident fabrication via the model's own logprobs. Advisory, 0.769 OOD.

04 · Prompt safety

Adversarial intent + dual-concept boolean logic. 0.900 (XSTest).

05 · Answer safety

Harmful content in the response, scored as it streams. 0.922.

06 · Jailbreak

Attack structure, not keywords. 0.989 (jailbreak_cls).
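To show what kind of information token logprobs carry for the closed-book axis, here is a naive baseline that converts logprobs to probabilities and flags low-confidence tokens. It is deliberately simplistic: a confident fabrication can score well on a check like this, so it should not be read as how G-1 scores the axis. The function names, the `0.5` cut-off, and the sample values are illustrative assumptions.

```python
import math


def token_probability(logprob: float) -> float:
    """Convert a natural-log probability back to a probability in [0, 1]."""
    return math.exp(logprob)


def low_confidence_tokens(
    pairs: list[tuple[str, float]], min_prob: float = 0.5
) -> list[str]:
    """Tokens whose probability under the model fell below min_prob."""
    return [tok for tok, lp in pairs if token_probability(lp) < min_prob]


def mean_logprob(pairs: list[tuple[str, float]]) -> float:
    """Average per-token logprob; closer to 0 means the model was more certain."""
    if not pairs:
        raise ValueError("need at least one token")
    return sum(lp for _, lp in pairs) / len(pairs)


# Illustrative values only, not output from a real model.
answer = [("The", -0.05), (" capital", -0.10), (" is", -0.02), (" Lyon", -1.60)]
flagged = low_confidence_tokens(answer)
avg = mean_logprob(answer)
```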

The detector lives outside the model.
By design.

Older runtime-safety stacks lived inside the inference path: a patched vLLM, an architecture-specific hook, a hidden-state extractor wired to one model family. They worked, but they locked the customer into one model and one engine build. Geodesia G-1 inverts that. The detection engine is a single ~300M-parameter multimodal companion model that runs next to the serving model and reads only text and standard token logprobs. That makes it model- and engine-agnostic, and fast enough to score all six axes in real time: ~30 ms for a 1024-token prompt on a single GPU.

Model-agnostic

One companion encoder serves Llama, Qwen, Mistral, Gemma, DeepSeek, Phi, GPT-4o, Claude: anything your engine can serve. Fine-tuned variants included.

Engine-agnostic

The proxy speaks the OpenAI and Ollama wire protocols. Your engine of choice runs official, unmodified. Upgrade your engine without breaking your safety layer.

Stack-agnostic

Cloud OpenAI today, self-hosted vLLM tomorrow, air-gapped llama.cpp on a defence-ministry HSM the day after. Same gateway. Same audit chain. Same PDFs.
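The "same gateway, different engine" idea amounts to changing the upstream target while the request path and body stay in the OpenAI shape. A minimal sketch, assuming a hypothetical routing table and helper names; the hostnames are placeholders, and this is not G-1's configuration format. Whether the gateway injects the `logprobs` flag or expects the client to set it is not stated here, so injecting it is an assumption.

```python
from typing import Any

# Hypothetical upstream table: names and URLs are placeholders.
UPSTREAMS = {
    "cloud": "https://api.openai.com/v1",
    "vllm": "http://vllm.yourco.internal:8000/v1",
    "llamacpp": "http://localhost:8080/v1",
}


def upstream_url(engine: str, path: str = "/chat/completions") -> str:
    """Join an upstream base URL with an OpenAI-style path."""
    base = UPSTREAMS[engine].rstrip("/")
    return f"{base}/{path.lstrip('/')}"


def prepare_body(body: dict[str, Any]) -> dict[str, Any]:
    """Copy the client body and make sure logprobs are requested upstream."""
    forwarded = dict(body)  # shallow copy so the client's dict is left untouched
    forwarded["logprobs"] = True  # assumption: the proxy sets the flag itself
    return forwarded


client_body = {"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]}
target = upstream_url("vllm")
forwarded = prepare_body(client_body)
```

Swapping engines then means changing one key in the table; the client request is forwarded unchanged apart from the added flag.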

Made in Europe.

Try it on your own engine.
Your own model.

Bring your existing OpenAI / Ollama base URL, your existing model, and your existing client code. We point our gateway at your engine, you point your client at our gateway, and you measure the difference. No SDK lift, no engine fork, no commitment.