Geodesia G-1 is a real-time reverse proxy that speaks the OpenAI Chat Completions and Ollama protocols. Your application keeps speaking OpenAI; G-1 forwards to your engine of choice — vLLM (official, unmodified), SGLang, TensorRT-LLM, llama.cpp, Ollama, or any OpenAI-compatible cloud endpoint — and screens every turn on six axes with our proprietary multimodal model, in ~30 ms for a 1024-token prompt on a single GPU. Change one base URL.
The only signal G-1 needs from your engine is the standard logprobs: true flag on the OpenAI API. All six engines below expose it natively — including Ollama, whose recent releases ship full per-token logprob support over its OpenAI-compatible endpoint. All six detection axes run end-to-end on every engine.
| Engine | Form | Logprobs | Context halluc. | Context-injection | Closed-book halluc. | Prompt / Answer safety | Jailbreak | Streaming brakes |
|---|---|---|---|---|---|---|---|---|
| vLLM (official 0.21+ · unmodified) | Self-hosted | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SGLang (OpenAI-compatible · structured outputs) | Self-hosted | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| TensorRT-LLM (NVIDIA · production throughput) | Self-hosted | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| llama.cpp (edge / GGUF · OpenAI server) | Self-hosted / edge | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| OpenAI API (cloud · GPT-4o / GPT-5 family) | Cloud | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Ollama (local desktop · GGUF models · recent versions) | Local desktop | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Any OpenAI-compatible endpoint that emits logprobs works. We have validated the six engines above end-to-end; the proxy is engine-agnostic, so additional engines (LMDeploy, MLC-LLM, MLX-LM, Azure OpenAI, OpenRouter, Together, Fireworks, Groq, etc.) integrate the same way.
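To make the logprobs signal concrete, here is a minimal sketch of what it looks like on the wire. It builds a Chat Completions request body with `logprobs` enabled and reads per-token log-probabilities out of a response in the standard Chat Completions shape. The sample response is hand-written with made-up values, and the helper names (`token_logprobs`, `mean_logprob`, `perplexity`) are illustrative assumptions. This shows the raw signal only, not how G-1 scores it.

```python
import json
import math

# Request body for POST /v1/chat/completions with per-token logprobs enabled.
# "logprobs" and "top_logprobs" are standard Chat Completions parameters.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "logprobs": True,
    "top_logprobs": 2,
}

# Hand-written sample in the Chat Completions response shape (values are made up).
sample_response = json.loads("""
{
  "choices": [{
    "message": {"role": "assistant", "content": "Paris."},
    "logprobs": {"content": [
      {"token": "Paris", "logprob": -0.05, "top_logprobs": []},
      {"token": ".", "logprob": -0.35, "top_logprobs": []}
    ]}
  }]
}
""")


def token_logprobs(response):
    """Return (token, logprob) pairs from the first choice of a response."""
    entries = response["choices"][0]["logprobs"]["content"]
    return [(entry["token"], entry["logprob"]) for entry in entries]


def mean_logprob(pairs):
    """Average log-probability per generated token."""
    return sum(logprob for _, logprob in pairs) / len(pairs)


def perplexity(pairs):
    """exp(-mean logprob): higher values mean the model was less certain."""
    return math.exp(-mean_logprob(pairs))


pairs = token_logprobs(sample_response)
print(pairs)
print(round(mean_logprob(pairs), 3), round(perplexity(pairs), 3))
```

Simple aggregates such as mean log-probability or perplexity are only the most basic things one can derive from this field; they are shown here because they need nothing beyond the response itself.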
No SDK to install. No model to retrain. No engine to fork. The Geodesia gateway speaks the OpenAI Chat Completions protocol; your existing code base only needs to flip one base URL. Every prompt is screened, every answer is scored on six axes in real time, and a compliance-grade audit chain is written on the way through.
```python
# Before: your existing code, talking to OpenAI / vLLM / SGLang / etc.
from openai import OpenAI

client = OpenAI(base_url="https://api.openai.com/v1")
# or self-hosted: base_url="http://vllm.yourco.internal:8000/v1"

# After: same SDK, same code path, now protected + scored + compliance-logged
client = OpenAI(base_url="https://geodesia.yourco.internal/v1")

resp = client.chat.completions.create(
    model="gpt-4o",  # or any model your engine serves
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)
# Every chunk that streams back is screened on 6 axes;
# if a risk barrier is crossed mid-sentence, the gateway halts and returns a BLOCK.
# A signed audit record is written for every call. Auto-PDFs available on demand.
```
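The mid-sentence halt described in the snippet's closing comments can be pictured with a toy sketch. This is not the gateway's implementation: `brake_stream`, `toy_risk_score`, the `BLOCK_NOTICE` marker, and the 0.5 threshold are all hypothetical names and values chosen for illustration, and the keyword check merely stands in for a real model-based scorer.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical marker; the gateway's real BLOCK payload is not specified here.
BLOCK_NOTICE = "[BLOCKED]"


def brake_stream(
    chunks: Iterable[str],
    risk_score: Callable[[str], float],
    threshold: float = 0.5,
) -> Iterator[str]:
    """Forward chunks until the risk of the accumulated text crosses the threshold."""
    text_so_far = ""
    for chunk in chunks:
        text_so_far += chunk
        # Score the whole text so far, so risk that spans chunk boundaries is caught.
        if risk_score(text_so_far) >= threshold:
            yield BLOCK_NOTICE  # withhold the offending chunk and stop the stream
            return
        yield chunk


def toy_risk_score(text: str) -> float:
    """Stand-in scorer that flags one hard-coded phrase (illustration only)."""
    return 1.0 if "secret key" in text.lower() else 0.0


stream = ["The ", "secret ", "key ", "is ", "hunter2"]
print(list(brake_stream(stream, toy_risk_score)))
# → ['The ', 'secret ', '[BLOCKED]']
```

Scoring the accumulated text rather than each chunk in isolation is the design point: the risky phrase here is split across two chunks, and a per-chunk check would miss it.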
Context hallucination: RAG faithfulness against the supplied passages, with token-level spans. OOD AUROC 0.881 (HaluEval).
Context injection (RAG firewall): hostile instructions hidden inside a retrieved file. 0.995 (Gandalf).
Closed-book hallucination: confident fabrication, detected via the model's own logprobs. Advisory, 0.769 OOD.
Prompt safety: adversarial intent + dual-concept boolean logic. 0.900 (XSTest).
Answer safety: harmful content in the response, scored as it streams. 0.922.
Jailbreak: attack structure, not keywords. 0.989 (jailbreak_cls).
Older runtime-safety stacks lived inside the inference path: a patched vLLM, an architecture-specific hook, a hidden-state extractor wired to one model family. They worked, but they locked the customer into one model and one engine build. Geodesia G-1 inverts that. The detection engine is a single ~300M-parameter multimodal companion model that runs alongside the served model, reads only text and standard token logprobs, and is therefore model- and engine-agnostic. It is fast enough to score all six axes in real time: ~30 ms for a 1024-token prompt on a single GPU.
One companion encoder serves Llama, Qwen, Mistral, Gemma, DeepSeek, Phi, gpt-4o, Claude — anything your engine can serve. Fine-tuned variants included.
The proxy speaks the OpenAI and Ollama wire protocols. Your engine of choice runs official, unmodified. Upgrade your engine without breaking your safety layer.
Cloud OpenAI today, self-hosted vLLM tomorrow, air-gapped llama.cpp on a defence-ministry HSM the day after. Same gateway. Same audit chain. Same PDFs.
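The text does not say how the audit chain is built. As a rough illustration of what a signed, tamper-evident chain can look like in general, the sketch below links each record to the hash of the previous one and signs it with an HMAC. Everything here is an assumption for illustration: the record fields, the `append_record` and `verify_chain` helpers, and the demo signing key are hypothetical and do not describe Geodesia's actual format.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical key, illustration only
GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _canonical(prev_hash, payload):
    """Serialize a record body deterministically so hashes are reproducible."""
    return json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True).encode()


def append_record(chain, payload):
    """Append a record that commits to the previous record's hash and is HMAC-signed."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = _canonical(prev_hash, payload)
    record = {
        "prev": prev_hash,
        "payload": payload,
        "hash": hashlib.sha256(body).hexdigest(),
        "sig": hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest(),
    }
    chain.append(record)
    return record


def verify_chain(chain):
    """Recompute every hash and signature; any edit or reordering makes this False."""
    prev_hash = GENESIS
    for record in chain:
        body = _canonical(prev_hash, record["payload"])
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != hashlib.sha256(body).hexdigest():
            return False
        expected_sig = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(record["sig"], expected_sig):
            return False
        prev_hash = record["hash"]
    return True


chain = []
append_record(chain, {"call_id": 1, "verdict": "ALLOW"})
append_record(chain, {"call_id": 2, "verdict": "BLOCK"})
print(verify_chain(chain))  # → True

chain[0]["payload"]["verdict"] = "BLOCK"  # tamper with an earlier record
print(verify_chain(chain))  # → False
```

Because each record commits to its predecessor's hash, changing or removing any earlier record invalidates verification from that point on, which is the property that makes such a log portable across engines and deployments.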