OmniInference

OmniInference is an enterprise-grade, cloud-native AI gateway, abstraction layer, and observability proxy. It sits between application code and upstream LLM providers, where it routes, authenticates, observes, and fails over inference calls, so no vendor SDK has to be linked into your application.

Six Architectural Pillars

#	Pillar	Mechanism
1	Provider Agnostic	Apps call a single unified API; `core.Provider` interface is the only contract
2	Comprehensive Observability	Every call emits a structured JSON log — tokens, latency, model version, hashed input
3	Decoupled Auth	Bearer-token enforcement at the gateway edge; consuming apps never handle credentials
4	Explicit Data Paths	Wire ↔ internal translation happens in one place (`providers/openai`); API keys are never logged
5	Built-in Resiliency	Named route chains with automatic failover on rate-limit / timeout / 5xx errors
6	Cost Transparency	Per-request `omni_metadata` (team, feature, env) propagated through all telemetry
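As a rough illustration of Pillar 1, the sketch below shows what a single-method provider contract could look like. Only the `Complete(ctx, req)` call shape and the `InferenceRequest` / `InferenceResponse` names come from this README; the struct fields, the `Name()` method, `echoProvider`, and `completeOnce` are hypothetical stand-ins, not the project's actual definitions.

```go
package main

import (
	"context"
	"fmt"
)

// Message and the request/response fields below are illustrative assumptions.
type Message struct {
	Role    string
	Content string
}

type InferenceRequest struct {
	Model    string
	Messages []Message
}

type InferenceResponse struct {
	Content        string
	RoutedProvider string
}

// Provider is an assumed shape for the contract every adapter would satisfy.
type Provider interface {
	Name() string
	Complete(ctx context.Context, req InferenceRequest) (InferenceResponse, error)
}

// echoProvider is a hypothetical in-memory adapter that echoes the last message.
type echoProvider struct{ name string }

func (p echoProvider) Name() string { return p.name }

func (p echoProvider) Complete(ctx context.Context, req InferenceRequest) (InferenceResponse, error) {
	if err := ctx.Err(); err != nil {
		return InferenceResponse{}, err
	}
	if len(req.Messages) == 0 {
		return InferenceResponse{}, fmt.Errorf("%s: no messages", p.name)
	}
	last := req.Messages[len(req.Messages)-1]
	return InferenceResponse{Content: "echo: " + last.Content, RoutedProvider: p.name}, nil
}

// completeOnce sends a single user message through any Provider.
func completeOnce(p Provider, content string) (InferenceResponse, error) {
	req := InferenceRequest{Model: "gpt-4o", Messages: []Message{{Role: "user", Content: content}}}
	return p.Complete(context.Background(), req)
}

func main() {
	resp, err := completeOnce(echoProvider{name: "echo-local"}, "Hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.RoutedProvider, resp.Content)
}
```

Because callers depend only on the interface, swapping `echoProvider` for a real adapter would not change `completeOnce`.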

Component Diagram

graph TB
    subgraph Clients
        APP[Application / SDK]
    end

    subgraph OmniInference Gateway
        direction TB
        MW_RID[Middleware: RequestID]
        MW_AUTH[Middleware: Auth]
        MW_OBS[Middleware: ObservabilityLog]
        HANDLER[Handler: /v1/chat/completions]
        ROUTER[Router + Fallback Engine]

        MW_RID --> MW_AUTH --> MW_OBS --> HANDLER --> ROUTER
    end

    subgraph Core Domain
        TYPES[core/types.go<br/>InferenceRequest / Response]
        IFACE[core/provider.go<br/>Provider interface]
        ERRS[core/errors.go<br/>ProviderError taxonomy]
    end

    subgraph Provider Registry
        REG[providers/Registry]
        OAI[providers/openai<br/>OpenAI · Azure OAI · vLLM]
        FUTURE[providers/...<br/>Bedrock · Vertex · Anthropic]
    end

    subgraph Observability
        HASH[internal/observability<br/>SHA-256 input hash]
        LOGS[Structured JSON Logs<br/>slog — stdout]
    end

    subgraph Upstream LLMs
        LLM1[OpenAI / Azure OpenAI]
        LLM2[Local vLLM]
        LLM3[AWS Bedrock ·<br/>Vertex AI · Anthropic]
    end

    APP -->|POST /v1/chat/completions<br/>Bearer token| MW_RID
    ROUTER --> REG
    REG --> OAI
    REG -.->|future| FUTURE
    OAI --> LLM1
    OAI --> LLM2
    FUTURE -.-> LLM3
    HANDLER --> HASH
    HASH --> LOGS
    MW_OBS --> LOGS

    TYPES -.-> HANDLER
    IFACE -.-> OAI
    ERRS -.-> ROUTER
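The middleware order in the diagram (RequestID, then Auth, then ObservabilityLog, then the handler) can be sketched with plain `net/http` wrappers. This is a minimal sketch under assumptions: the function names (`requestID`, `auth`, `observe`, `newGateway`, `statusFor`), the ID format, and the log line are invented for illustration and are not taken from the gateway's source.

```go
package main

import (
	"context"
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

type middleware func(http.Handler) http.Handler

type requestIDKey struct{}

// requestID reuses an incoming X-Request-ID or generates one, then exposes it
// through the request context and the response header.
func requestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			buf := make([]byte, 8)
			_, _ = rand.Read(buf)
			id = hex.EncodeToString(buf)
		}
		w.Header().Set("X-Request-ID", id)
		ctx := context.WithValue(r.Context(), requestIDKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// auth rejects requests whose bearer token is not in tokens.
// An empty list disables auth, as described for OMNI_AUTH_TOKENS.
func auth(tokens []string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			got, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
			allowed := len(tokens) == 0
			for _, t := range tokens {
				if ok && subtle.ConstantTimeCompare([]byte(t), []byte(got)) == 1 {
					allowed = true
				}
			}
			if !allowed {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// observe records the start time and prints the elapsed time after the handler returns.
func observe(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		fmt.Printf("request path=%s latency=%s\n", r.URL.Path, time.Since(start))
	})
}

// newGateway wraps a placeholder handler so that requestID runs first and observe last.
func newGateway(tokens []string) http.Handler {
	var h http.Handler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, _ := r.Context().Value(requestIDKey{}).(string)
		fmt.Fprintf(w, `{"request_id":%q}`, id)
	})
	chain := []middleware{requestID, auth(tokens), observe}
	for i := len(chain) - 1; i >= 0; i-- {
		h = chain[i](h)
	}
	return h
}

// statusFor sends one in-memory request through h and returns the status code.
func statusFor(h http.Handler, token string) int {
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
	req.Header.Set("Authorization", "Bearer "+token)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	gw := newGateway([]string{"my-local-token"})
	fmt.Println(statusFor(gw, "my-local-token"), statusFor(gw, "wrong"))
}
```

In this ordering a rejected request never reaches `observe`, which matches the 401 short-circuit shown in the sequence diagram.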

Request Sequence Diagram

The sequence below shows a request that hits a rate limit on the primary provider and automatically fails over to a secondary provider.

sequenceDiagram
    autonumber
    actor Client
    participant GW as Gateway<br/>(HTTP Server)
    participant MW_RID as Middleware<br/>RequestID
    participant MW_AUTH as Middleware<br/>Auth
    participant MW_OBS as Middleware<br/>ObsLog
    participant Handler as Handler<br/>/v1/chat/completions
    participant Router as Router<br/>Fallback Engine
    participant Hash as Observability<br/>SHA-256 Hash
    participant P1 as Provider<br/>azure-openai-east
    participant P2 as Provider<br/>azure-openai-west

    Client->>GW: POST /v1/chat/completions<br/>Authorization: Bearer <token>

    GW->>MW_RID: forward request
    MW_RID->>MW_RID: generate / reuse X-Request-ID
    MW_RID-->>GW: inject ID into context + response header

    GW->>MW_AUTH: forward request
    MW_AUTH->>MW_AUTH: validate Bearer token
    alt Invalid token
        MW_AUTH-->>Client: 401 Unauthorized
    end

    GW->>MW_OBS: forward request
    MW_OBS->>MW_OBS: record start time

    MW_OBS->>Handler: forward request

    Handler->>Handler: decode JSON body<br/>build InferenceRequest
    Handler->>Hash: HashMessages(req.Messages)
    Hash-->>Handler: SHA-256 hex digest<br/>(no raw PII stored)

    Handler->>Router: Complete(ctx, req, omni_route, omni_provider)

    Router->>Router: resolve provider chain<br/>["azure-openai-east", "azure-openai-west"]

    Router->>P1: Complete(ctx, req) — attempt 1
    P1-->>Router: ProviderError{Kind: rate_limit, HTTP: 429}

    Note over Router: ErrKind.IsRetryable() == true<br/>advance to next provider

    Router->>P2: Complete(ctx, req) — attempt 2 (fallback)
    P2-->>Router: InferenceResponse{choices, usage, telemetry}

    Router->>Router: set FallbackOccurred=true<br/>RoutedProvider="azure-openai-west"
    Router-->>Handler: InferenceResponse

    Handler->>Handler: stamp GatewayLatency + InputHash
    Handler->>MW_OBS: emit InferenceLog<br/>{request_id, routed_provider, input_hash,<br/>prompt_tokens, completion_tokens,<br/>provider_latency, gateway_latency,<br/>fallback_occurred, metadata}

    Handler-->>MW_OBS: 200 OK + JSON response body
    MW_OBS->>MW_OBS: emit request log<br/>{method, path, status, gateway_latency}
    MW_OBS-->>Client: 200 OK + InferenceResponse<br/>X-Request-ID: <id>
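The failover step in the sequence above can be sketched as a loop over the provider chain that advances only on retryable errors. The names `ProviderError`, `ErrKind`, `IsRetryable`, `FallbackOccurred`, and `RoutedProvider` appear in this README; the constant names, the simplified `Complete` signature (a prompt string instead of a full request), `stub`, `completeWithFailover`, and `route` are assumptions made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type ErrKind string

// The kind values are assumed; the README only lists rate-limit, timeout, and 5xx as failover triggers.
const (
	ErrRateLimit ErrKind = "rate_limit"
	ErrTimeout   ErrKind = "timeout"
	ErrUpstream  ErrKind = "upstream_5xx"
	ErrBadInput  ErrKind = "invalid_request"
)

func (k ErrKind) IsRetryable() bool {
	return k == ErrRateLimit || k == ErrTimeout || k == ErrUpstream
}

type ProviderError struct {
	Kind ErrKind
	HTTP int
}

func (e *ProviderError) Error() string {
	return fmt.Sprintf("provider error: %s (http %d)", e.Kind, e.HTTP)
}

type InferenceResponse struct {
	RoutedProvider   string
	FallbackOccurred bool
}

type Provider interface {
	Name() string
	Complete(ctx context.Context, prompt string) (InferenceResponse, error)
}

// stub is a hypothetical provider that fails with err, or succeeds when err is nil.
type stub struct {
	name string
	err  error
}

func (s stub) Name() string { return s.name }

func (s stub) Complete(ctx context.Context, prompt string) (InferenceResponse, error) {
	if s.err != nil {
		return InferenceResponse{}, s.err
	}
	return InferenceResponse{}, nil
}

// completeWithFailover tries each provider in order. It moves to the next one
// only when the error is a retryable ProviderError; anything else is returned as is.
func completeWithFailover(ctx context.Context, chain []Provider, prompt string) (InferenceResponse, error) {
	lastErr := errors.New("empty provider chain")
	for i, p := range chain {
		resp, err := p.Complete(ctx, prompt)
		if err == nil {
			resp.RoutedProvider = p.Name()
			resp.FallbackOccurred = i > 0
			return resp, nil
		}
		lastErr = err
		var pe *ProviderError
		if !errors.As(err, &pe) || !pe.Kind.IsRetryable() {
			return InferenceResponse{}, err
		}
	}
	return InferenceResponse{}, lastErr
}

// route runs the chain with a background context and a fixed prompt.
func route(chain ...Provider) (InferenceResponse, error) {
	return completeWithFailover(context.Background(), chain, "Hello")
}

func main() {
	east := stub{name: "azure-openai-east", err: &ProviderError{Kind: ErrRateLimit, HTTP: 429}}
	west := stub{name: "azure-openai-west"}
	resp, err := route(east, west)
	fmt.Println(resp.RoutedProvider, resp.FallbackOccurred, err) // azure-openai-west true <nil>
}
```

Returning non-retryable errors immediately keeps a bad request from being replayed against every provider in the chain.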

Project Layout

OmniInference/
├── cmd/
│   └── omniinference/
│       └── main.go              # Entrypoint — reads env config, starts server
├── core/
│   ├── types.go                 # InferenceRequest, InferenceResponse, ModelRef, Usage, Telemetry
│   ├── provider.go              # Provider interface + InferenceStreamChunk
│   └── errors.go                # ProviderError, ErrKind taxonomy
├── providers/
│   ├── registry.go              # Registry factory (NewRegistry, NewRegistryFromProviders)
│   └── openai/
│       └── adapter.go           # OpenAI-compatible adapter (OpenAI · Azure OAI · vLLM)
├── gateway/
│   ├── config.go                # Config, ConfigFromEnv, MarshalSafe (redacts secrets)
│   ├── server.go                # Server — HTTP lifecycle + middleware chain assembly
│   ├── handler.go               # /v1/chat/completions handler
│   ├── middleware/
│   │   ├── request_id.go        # X-Request-ID injection
│   │   ├── auth.go              # Bearer-token gate (Pillar 3)
│   │   └── observability.go     # Structured slog request + inference telemetry
│   └── router/
│       └── router.go            # Router — named route chains, automatic failover (Pillar 5)
└── internal/
    └── observability/
        └── hash.go              # SHA-256 input hashing (Pillar 4)
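For `internal/observability/hash.go`, one plausible way to produce a SHA-256 hex digest of the input messages is to hash a canonical JSON encoding of them. The README confirms only the `HashMessages` name and the SHA-256 hex output; the `Message` type, the error return, and the choice of JSON as the canonical form are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Message is an assumed shape for one chat message.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// HashMessages returns the SHA-256 hex digest of the JSON-encoded messages,
// so logs can correlate identical inputs without storing the raw text.
func HashMessages(msgs []Message) (string, error) {
	canonical, err := json.Marshal(msgs)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	hello := []Message{{Role: "user", Content: "Hello"}}
	other := []Message{{Role: "user", Content: "Goodbye"}}
	a, _ := HashMessages(hello)
	b, _ := HashMessages(hello)
	c, _ := HashMessages(other)
	fmt.Println(len(a), a == b, a == c) // 64 true false
}
```

Encoding a slice of structs with `encoding/json` is deterministic, so the same messages always yield the same digest.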

Quick Start

Environment Variables

Variable	Required	Description
`OMNI_OPENAI_API_KEY`	yes	Bearer token for the OpenAI-compatible provider
`OMNI_OPENAI_BASE_URL`	no	Override endpoint (Azure OAI, vLLM, etc.) Default: `https://api.openai.com/v1`
`OMNI_AUTH_TOKENS`	no	Comma-separated list of valid client bearer tokens. Empty = auth disabled
`OMNI_DEFAULT_PROVIDER`	no	Provider name to use when no route is specified. Default: `openai`
`OMNI_PORT`	no	Listen port. Default: `8080`
`OMNI_LOG_LEVEL`	no	`debug` | `info` | `warn` | `error`. Default: `info`
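The table above maps naturally onto a small loader that applies the documented defaults. The sketch below is not the gateway's `ConfigFromEnv`; the `Config` field names, `configFrom`, and `redacted` are hypothetical, and only the variable names and default values are taken from the table.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// Config mirrors the documented variables; the field names are assumptions.
type Config struct {
	OpenAIAPIKey    string
	OpenAIBaseURL   string
	AuthTokens      []string // empty means auth is disabled
	DefaultProvider string
	Port            string
	LogLevel        string
}

// configFrom reads settings through getenv so tests can pass a fake environment.
func configFrom(getenv func(string) string) (Config, error) {
	orDefault := func(key, def string) string {
		if v := strings.TrimSpace(getenv(key)); v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		OpenAIAPIKey:    strings.TrimSpace(getenv("OMNI_OPENAI_API_KEY")),
		OpenAIBaseURL:   orDefault("OMNI_OPENAI_BASE_URL", "https://api.openai.com/v1"),
		DefaultProvider: orDefault("OMNI_DEFAULT_PROVIDER", "openai"),
		Port:            orDefault("OMNI_PORT", "8080"),
		LogLevel:        orDefault("OMNI_LOG_LEVEL", "info"),
	}
	if cfg.OpenAIAPIKey == "" {
		return Config{}, errors.New("OMNI_OPENAI_API_KEY is required")
	}
	// Split the comma-separated token list and drop empty entries.
	for _, tok := range strings.Split(getenv("OMNI_AUTH_TOKENS"), ",") {
		if tok = strings.TrimSpace(tok); tok != "" {
			cfg.AuthTokens = append(cfg.AuthTokens, tok)
		}
	}
	return cfg, nil
}

// redacted returns a copy that is safe to print: secrets are masked.
func (c Config) redacted() Config {
	c.OpenAIAPIKey = "[REDACTED]"
	masked := make([]string, len(c.AuthTokens))
	for i := range masked {
		masked[i] = "[REDACTED]"
	}
	c.AuthTokens = masked
	return c
}

func main() {
	cfg, err := configFrom(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg.redacted())
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the loader testable without mutating the process environment.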

Run Locally

export OMNI_OPENAI_API_KEY=sk-...
export OMNI_AUTH_TOKENS=my-local-token
go run ./cmd/omniinference

Send a Request

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer my-local-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "omni_metadata": {"team": "platform", "feature": "chat"}
  }'
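On the receiving side, a body like the one above could be decoded into a typed request. The JSON keys (`model`, `messages`, `omni_route`, `omni_provider`, `omni_metadata`) come from the examples in this README; the `ChatRequest` struct, the string-valued metadata map, and the validation rule are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest is an assumed wire shape for POST /v1/chat/completions.
type ChatRequest struct {
	Model        string            `json:"model"`
	Messages     []Message         `json:"messages"`
	OmniRoute    string            `json:"omni_route,omitempty"`
	OmniProvider string            `json:"omni_provider,omitempty"`
	OmniMetadata map[string]string `json:"omni_metadata,omitempty"`
}

// decodeChatRequest parses the body and rejects requests with no model or messages.
func decodeChatRequest(body []byte) (ChatRequest, error) {
	var req ChatRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return ChatRequest{}, fmt.Errorf("decode body: %w", err)
	}
	if req.Model == "" || len(req.Messages) == 0 {
		return ChatRequest{}, errors.New("model and messages are required")
	}
	return req, nil
}

func main() {
	body := []byte(`{
		"model": "gpt-4o",
		"messages": [{"role": "user", "content": "Hello"}],
		"omni_metadata": {"team": "platform", "feature": "chat"}
	}`)
	req, err := decodeChatRequest(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Model, len(req.Messages), req.OmniMetadata["team"])
}
```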

Route with Automatic Failover

Use `omni_route` to select a named route chain configured on the gateway:

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer my-local-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "omni_route": "primary-chain",
    "omni_metadata": {"team": "platform", "feature": "chat"}
  }'

Pin to a Specific Provider

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer my-local-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "omni_provider": "vllm-local"
  }'
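The `omni_route` and `omni_provider` fields shown above both end up selecting a provider chain. The sketch below shows one way that resolution could work. The precedence (a pinned provider wins over a named route, which wins over the default) is an assumption, since this README does not say what happens when both fields are set; `resolveChain` and the route table are hypothetical.

```go
package main

import "fmt"

// resolveChain returns the ordered list of provider names to try.
// Assumed precedence: omni_provider, then omni_route, then the default provider.
func resolveChain(routes map[string][]string, defaultProvider, route, provider string) ([]string, error) {
	switch {
	case provider != "":
		// A pinned provider yields a single-entry chain, so no failover happens.
		return []string{provider}, nil
	case route != "":
		chain, ok := routes[route]
		if !ok || len(chain) == 0 {
			return nil, fmt.Errorf("unknown or empty route %q", route)
		}
		return chain, nil
	default:
		return []string{defaultProvider}, nil
	}
}

func main() {
	routes := map[string][]string{
		"primary-chain": {"azure-openai-east", "azure-openai-west"},
	}
	for _, c := range []struct{ route, provider string }{
		{route: "primary-chain"},
		{provider: "vllm-local"},
		{},
	} {
		chain, err := resolveChain(routes, "openai", c.route, c.provider)
		fmt.Println(chain, err)
	}
}
```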

Health Check

curl http://localhost:8080/healthz
# {"status":"ok"}
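A handler that returns the body shown above takes only a few lines of `net/http`. This is a minimal sketch, not the gateway's implementation; `healthz` and `probe` are hypothetical names, and a real health check might also verify provider reachability.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// healthz reports liveness with a fixed JSON body.
func healthz(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}

// probe calls the handler in memory and returns the status code and trimmed body.
func probe() (int, string) {
	rec := httptest.NewRecorder()
	healthz(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	code, body := probe()
	fmt.Println(code, body) // 200 {"status":"ok"}
}
```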

Observability Output

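Every inference call emits two structured JSON log lines to stdout: a request log and an inference log. The latency fields in the samples below are integers that appear to be nanoseconds (`423000000` would be 423 ms).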

Request log (emitted by ObservabilityLog middleware):

{
  "time": "2026-06-25T12:00:00Z",
  "level": "INFO",
  "msg": "request",
  "request_id": "a3f1...",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status": 200,
  "response_bytes": 512,
  "gateway_latency": 423000000
}

Inference log (emitted by the handler after dispatch):

{
  "time": "2026-06-25T12:00:00Z",
  "level": "INFO",
  "msg": "inference",
  "request_id": "a3f1...",
  "routed_provider": "azure-openai-west",
  "input_hash": "e3b0c44298fc1c14...",
  "prompt_tokens": 42,
  "completion_tokens": 87,
  "total_tokens": 129,
  "provider_latency": 380000000,
  "gateway_latency": 423000000,
  "fallback_occurred": true,
  "metadata": "{\"team\":\"platform\",\"feature\":\"chat\"}"
}
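A log line of this shape can be produced with the standard `log/slog` JSON handler, which writes `time.Duration` values as integer nanoseconds. The sketch below assumes the gateway passes latencies as `time.Duration` and serializes metadata to a JSON string, as the sample suggests; the `InferenceLog` struct, `emit`, `render`, and `sampleLog` are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"log/slog"
	"os"
	"time"
)

// InferenceLog holds the fields shown in the sample; the struct itself is an assumption.
type InferenceLog struct {
	RequestID        string
	RoutedProvider   string
	InputHash        string
	PromptTokens     int
	CompletionTokens int
	ProviderLatency  time.Duration
	GatewayLatency   time.Duration
	FallbackOccurred bool
	Metadata         map[string]string
}

// emit writes one "inference" record. Metadata is embedded as a JSON string.
func emit(logger *slog.Logger, l InferenceLog) {
	meta, _ := json.Marshal(l.Metadata)
	logger.Info("inference",
		"request_id", l.RequestID,
		"routed_provider", l.RoutedProvider,
		"input_hash", l.InputHash,
		"prompt_tokens", l.PromptTokens,
		"completion_tokens", l.CompletionTokens,
		"total_tokens", l.PromptTokens+l.CompletionTokens,
		"provider_latency", l.ProviderLatency,
		"gateway_latency", l.GatewayLatency,
		"fallback_occurred", l.FallbackOccurred,
		"metadata", string(meta),
	)
}

// render logs one record into memory and decodes it, which makes the
// field encoding easy to inspect.
func render(l InferenceLog) (map[string]any, error) {
	var buf bytes.Buffer
	emit(slog.New(slog.NewJSONHandler(&buf, nil)), l)
	out := map[string]any{}
	err := json.Unmarshal(buf.Bytes(), &out)
	return out, err
}

// sampleLog returns values matching the sample inference log.
func sampleLog() InferenceLog {
	return InferenceLog{
		RequestID:        "a3f1...",
		RoutedProvider:   "azure-openai-west",
		InputHash:        "e3b0c44298fc1c14...",
		PromptTokens:     42,
		CompletionTokens: 87,
		ProviderLatency:  380 * time.Millisecond,
		GatewayLatency:   423 * time.Millisecond,
		FallbackOccurred: true,
		Metadata:         map[string]string{"team": "platform", "feature": "chat"},
	}
}

func main() {
	emit(slog.New(slog.NewJSONHandler(os.Stdout, nil)), sampleLog())
}
```

Dividing a logged latency by 1,000,000 gives milliseconds, so `423000000` corresponds to 423 ms under this encoding.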

Running Tests

go test ./...
go vet ./...

Roadmap

Additional provider adapters: AWS Bedrock, Vertex AI, Anthropic
Retry backoff with jitter (exponential, configurable per route)
Per-key / per-team rate limiting
Streaming SSE passthrough (/v1/chat/completions with stream: true)
YAML config file loader (supplement env-var config)
Prometheus metrics exporter
DB-backed audit log persistence
Admin API: live route config reload, provider health status
