How does the gateway handle authentication?

Server-side requests use x-api-key with gateway-issued keys (prefix tsg-). Browser-side requests use PKCE-flow exchange tokens so no provider keys are ever exposed in client code. Admin operations use a separate X-Admin-Key on the admin API.

What policy controls are available?

Per-org model allowlists, per-team rate limits, per-request budget caps, prompt logging policies, and data classification enforcement. Policies are evaluated server-side at request time and are immutable per request once committed.

LLM Gateway by ThreadSync · For Engineering Teams

One API.
Five frontier providers.

Multi-provider gateway for governed AI access. Route across Claude, GPT, Gemini, Grok, and Sonar through one governed access layer — with OpenAI-compatible request shapes, provider-key isolation, policy controls, per-request audit, and cost tracking. PKCE-flow browser sessions mean no provider keys ever leak into client code.

Request access → Book a 30-min walkthrough

Provider coverage

Governed access across the frontier model ecosystem.

Use provider diversity without creating five separate key-management, policy, cost, and audit surfaces. Route requests across leading AI providers through one governed access layer — with policy controls, per-request audit, cost tracking, and provider-key isolation. As provider model catalogs change, policies and routing can be updated centrally instead of forcing every application team to chase model-name changes.

Anthropic

Claude access governed by policy, keys, cost, and audit.

OpenAI

GPT-family routing with scoped access and spend visibility.

Google Gemini

Gemini access through the same governed request plane.

xAI

Grok-family access with central policy and audit controls.

Perplexity

Sonar access for search-grounded and reasoning workflows.

Provider availability, model access, and routing policies are configured per engagement and may vary by contract, region, and provider terms.

For platform teams

One endpoint, one policy layer, one integration path.

Add or swap providers without changing application code. Drop-in OpenAI-compatible request shape; provider routing is handled by the gateway, not by each service.

For security teams

Scoped keys, model allowlists, audit records, provider-key isolation.

API keys never reach client code. Per-team and per-environment policies. Audit trail per request with model, verdict, cost, and correlation IDs. Redaction and retention are configurable.

For finance teams

Per-request cost, budget caps, team and org allocation, usage export.

Spend visibility before procurement asks. Caps that stop runaway costs. Chargeback-ready usage exports across providers, teams, and environments.

Five providers. One audit gap.

Every team using AI right now has the same problem: one project on Anthropic, another on OpenAI, a third experiment with Gemini, a Slack bot calling xAI. Five vendor relationships, five sets of API keys, five different cost dashboards, no unified audit trail. When the CISO asks "what's our AI exposure?", you can't answer without piecing together five spreadsheets.

One endpoint. Five provider ecosystems. Governed.

LLM Gateway is OpenAI-compatible at the request shape, so existing code works with a single environment-variable change. What you gain on top of "switch base URL" is governance.

⚡

Policy-based routing + manual pinning

Policy-based routing can select models based on cost, capability, and availability rules. Pin a specific model when a workflow requires it. Failover behavior is configured per engagement.

🔑

Per-org keys + model allowlists

Issue scoped tsg-* keys per org or per team. Restrict which models each key can call. Rotate or revoke without touching upstream provider keys.

📊

Per-request cost tracking + budgets

Every request returns flattened token usage and cost. Cap monthly spend per key, per team, or per org. Usage dashboards where configured; no need to wait for provider invoices to understand spend.

🛡️

Per-request audit log

Records capture model, latency, cost, team, policy verdict, and correlation IDs. Prompt/response capture, retention, and SIEM export are configured per engagement. Hash-chained logs provide tamper-evident sequencing for audit records where configured.

🌐

PKCE browser-safe sessions

Browser apps exchange short-lived PKCE tokens server-side. Provider API keys never reach client code. Compliant with the way modern OAuth identity flows work.

🔁

Idempotent requests + memory

Optional idempotency keys deduplicate retries. Optional conversation memory persists context server-side so you don't pay to resend it on every turn.

Architecture at a glance

One request plane. Five provider ecosystems. Governance in the middle.

App / Workspace

LLM Gateway

Policy·Budget·Audit·Routing

Anthropic OpenAI Google Gemini xAI Perplexity

Drop-in compatible

Two equivalent request shapes. Pick whichever matches your existing code. The response shape is the same.

If your code already speaks OpenAI's chat-completions shape, send a system role message — exactly like you would to api.openai.com. Only the base URL and auth header change.

POST https://llmgateway.threadsync.io/v1/chat/completions

curl -X POST https://llmgateway.threadsync.io/v1/chat/completions \
  -H "x-api-key: tsg-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "max_tokens": 1024,
    "messages": [
      {"role": "system", "content": "You are a careful assistant."},
      {"role": "user",   "content": "Summarize this contract..."}
    ]
  }'

If you prefer the Anthropic-style shape (separate system field), the gateway accepts that too. Equivalent semantics; pick whichever your code already uses.

POST https://llmgateway.threadsync.io/v1/chat/completions

curl -X POST https://llmgateway.threadsync.io/v1/chat/completions \
  -H "x-api-key: tsg-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "system": "You are a careful assistant.",
    "messages": [
      {"role": "user", "content": "Summarize this contract..."}
    ]
  }'

Response (normalized across providers)

{
  "content": "...",
  "model": "claude-opus-4-7",
  "provider": "anthropic",
  "usage": {"input_tokens": 412, "output_tokens": 184, "cost_usd": 0.0093}
}

Model names are examples. Allowed providers and models are configured per engagement and may vary by provider terms, region, and customer policy.

Both modes are valid for Claude, GPT, Gemini, Grok, and Sonar. The gateway flattens response payloads so your code reads data.content regardless of provider.

Built on ThreadSync's governance engine

LLM Gateway is the public API for the same governance engine that powers Magic Runtime and the Lift workspace. Policy evaluation, audit logging, and access control are equivalent across all three; LLM Gateway is the developer-facing entry point for teams that need governed AI access without adopting a full workspace product. Security overview →

TLS 1.3 in transit (where supported) AES-256 at rest (where storage is involved) Per-request audit log Controls mapped to SOC 2 TSC Hash-chained logs

Pricing

LLM Gateway is delivered as part of a ThreadSync platform engagement. Scope, capacity, and terms are set per engagement — talk to us about access.

Request access → See the architecture

From engagement to first request — same day

Once partner access and provider credentials are provisioned, the developer flow can run same-day:

Step 1 · Provision keys

Issue `tsg-*` key + set policy

Use the admin API or the workspace UI to create a key, scope it to an org, and define which models and monthly budget it can use. Keys are hot-rotatable; revocation takes effect on the next request after propagation completes.

Step 2 · Wire upstream

Provide your provider keys to the gateway

Anthropic / OpenAI / Google / xAI / Perplexity keys live server-side in the gateway, never in your client code. Per-provider quota and routing rules are configured once.

Step 3 · Switch base URL

Point your code at `llmgateway.threadsync.io`

Existing OpenAI-compatible code works as-is — just swap the base URL and auth header. Use role-based system messages in OpenAI-compatible mode, or the top-level system field in ThreadSync normalized mode. Response is flattened to data.content.

Step 4 · Watch the dashboard

Cost, latency, audit — live

Every request shows provider, model, tokens, cost, latency, and policy verdict. Filter by key, team, or org. SIEM webhook and scheduled S3 export are available where configured per engagement.

How LLM Gateway fits in the platform

LLM Gateway is the developer-facing surface. Magic Runtime uses the gateway for AI calls inside its sandboxed execution layer. Lift exposes the gateway through a workspace UI for design-partner teams that prefer a UI to API code. All three share the same governance engine — engagement scope determines which surface a partner uses.

See how the products fit together →

One API. Five providers. Governed.

OpenAI-compatible. Provider-governed. Request-level audit records. Access is engagement-only.