Router API Reference

The router is the data-plane HTTP surface typically exposed through Envoy.


For control-plane endpoints such as health, config, and discovery, see Router Apiserver API.

Entry Points

| Surface | Default port | Purpose |
|---|---|---|
| Envoy public ingress | 8801 | Client-facing routed HTTP APIs |
| ExtProc gRPC | 50051 | Internal Envoy external-processing hook |
| Router apiserver | 8080 | Control and utility APIs such as /v1/models, /health, and /config/router |

Frontend API

| API surface | Public path | Status | Notes |
|---|---|---|---|
| OpenAI Chat Completions | POST /v1/chat/completions | Supported | Primary routed inference interface |
| OpenAI Responses API | POST /v1/responses | Supported | Internally translated to Chat Completions |
| OpenAI Responses API retrieval | GET /v1/responses/{id} | Supported | Requires the Response API service/store |
| OpenAI Responses API delete | DELETE /v1/responses/{id} | Supported | Requires the Response API service/store |
| OpenAI Responses API input items | GET /v1/responses/{id}/input_items | Supported | Requires the Response API service/store |
| OpenAI Models API | GET /v1/models | Supported on apiserver | Served on :8080; can be re-exposed through Envoy if desired |

Backend Model API

These are upstream model protocols the router can target after routing. They are backend-facing integrations, not necessarily public client ingress paths.

| Backend model API | Upstream path | Status | Notes |
|---|---|---|---|
| OpenAI-compatible Chat Completions | /chat/completions | Supported | Default family for OpenAI-compatible backends |
| Anthropic Messages API | /v1/messages | Supported | Router converts OpenAI-style requests to Anthropic format before forwarding |
| vLLM Omni Chat Completions | /chat/completions | Supported | Used for omni and image-generation backends such as vllm_omni |

Provider families with OpenAI-compatible chat-completions defaults include openai, azure-openai, bedrock, gemini, and vertex-ai.

Frontend Behavior

OpenAI Chat Completions

  • Public request path: POST /v1/chat/completions
  • This is the main router ingress for routed inference.
  • Works with explicit model names or the router auto-model name such as MoM or auto.

Minimal request:

{
  "model": "auto",
  "messages": [
    {
      "role": "user",
      "content": "What is the derivative of x^2?"
    }
  ]
}
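As a sketch of how a client might drive this endpoint, the following builds the minimal payload above and shows where it would be POSTed. The base URL and port are assumptions taken from the default Envoy ingress (8801); only stdlib is used.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style Chat Completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def post_chat(base_url: str, payload: dict) -> dict:
    """POST a payload to the router's public ingress and decode the JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request("auto", "What is the derivative of x^2?")
# To actually send it (assumes the Envoy ingress on localhost:8801):
# print(post_chat("http://localhost:8801", payload))
```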

OpenAI Responses API

  • Public request paths:
    • POST /v1/responses
    • GET /v1/responses/{id}
    • DELETE /v1/responses/{id}
    • GET /v1/responses/{id}/input_items
  • The router translates POST /v1/responses into Chat Completions internally, then translates the backend response back into Responses API format.
  • Retrieval and delete paths require the Response API service/store to be enabled.

Minimal request:

{
  "model": "auto",
  "input": "Summarize the benefits of retrieval-augmented generation."
}
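The internal translation described above can be pictured as a simple reshaping of the request body. This is a simplified illustration, not the router's actual implementation: it covers only the minimal string-`input` case, while the real translation also handles structured input item lists and other fields.

```python
def responses_to_chat(responses_req: dict) -> dict:
    """Simplified sketch: map a Responses API request onto a Chat Completions request.

    Only the minimal string-input case is handled here.
    """
    chat_req = {"model": responses_req["model"], "messages": []}
    user_input = responses_req.get("input")
    if isinstance(user_input, str):
        chat_req["messages"].append({"role": "user", "content": user_input})
    return chat_req
```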

Backend Behavior

Anthropic API

  • The router can target Anthropic-backed models when a model is configured with api_format: anthropic.
  • Anthropic support lives in the backend model API layer.
  • Client ingress is still OpenAI-style Chat Completions or Responses API, not POST /v1/messages.
  • The router converts the routed request into an Anthropic POST /v1/messages call and converts the response back to OpenAI-compatible output.
  • Streaming is not supported for Anthropic-backed routing.
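The shape of that conversion can be sketched as follows. This is an illustration of the general OpenAI-to-Anthropic reshaping, not the router's actual code: Anthropic's Messages API takes the system prompt as a top-level `system` field and requires `max_tokens`, so both are restructured; the default of 1024 tokens is an assumption.

```python
def openai_to_anthropic(chat_req: dict, default_max_tokens: int = 1024) -> dict:
    """Illustrative OpenAI Chat Completions -> Anthropic Messages conversion.

    System messages move to the top-level `system` field, and `max_tokens`
    (required by Anthropic) is filled with a default when absent.
    """
    messages = chat_req["messages"]
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    anthropic_req = {
        "model": chat_req["model"],
        "max_tokens": chat_req.get("max_tokens", default_max_tokens),
        "messages": [m for m in messages if m["role"] != "system"],
    }
    if system_parts:
        anthropic_req["system"] = "\n".join(system_parts)
    return anthropic_req
```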

vLLM Omni and Multimodal/Image Generation

  • The router supports multimodal/image-generation routing with omni models and image-generation backends.
  • vllm_omni is a supported image-generation backend type.
  • When a modality decision resolves to an omni model:
    • Chat Completions requests return the raw omni Chat Completions response.
    • Responses API requests are normalized into Responses API output items, including image_generation_call items when images are produced.
  • This is the path used for multimodal or image-generation decisions rather than a separate public protocol.
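The normalization step for Responses API requests can be pictured as lifting image payloads out of the omni Chat Completions response into Responses-style output items. This is a hypothetical sketch: the field names on the omni image payload (`images`, `b64_json`) are assumptions for illustration, not the router's actual schema.

```python
def normalize_omni_output(chat_response: dict) -> list[dict]:
    """Hypothetical sketch: turn an omni Chat Completions response into
    Responses API-style output items, including image_generation_call
    items when images are present.

    The `images` / `b64_json` field names are illustrative assumptions.
    """
    items = []
    for choice in chat_response.get("choices", []):
        message = choice.get("message", {})
        if message.get("content"):
            items.append({"type": "message", "content": message["content"]})
        for image in message.get("images", []):  # assumed omni field
            items.append({
                "type": "image_generation_call",
                "result": image.get("b64_json"),
            })
    return items
```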

Configuration Linkage

Upstream targets and provider-specific behavior come from the standard router config:

providers:
  models:
    - name: claude-sonnet
      api_format: anthropic
      pricing:
        currency: USD
        prompt_per_1m: 3.0
        completion_per_1m: 15.0
      backend_refs:
        - base_url: https://api.anthropic.com
          provider: anthropic

  • Upstream routing targets are configured under providers.models[].backend_refs[].
  • Optional cost-aware policies can use pricing:.
  • Response API behavior is configured under global.services.response_api.
  • Modality and image-generation behavior is configured through routing decisions and image-generation backends such as vllm_omni.
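For intuition on how a cost-aware policy might use the pricing: fields above, the arithmetic is a straightforward per-million-token rate. The helper below is an illustrative sketch, not the router's policy implementation.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_per_1m: float, completion_per_1m: float) -> float:
    """Per-request cost in the configured currency from per-1M-token rates."""
    return (prompt_tokens * prompt_per_1m
            + completion_tokens * completion_per_1m) / 1_000_000


# With the claude-sonnet rates above (3.0 prompt / 15.0 completion per 1M):
# 1000 prompt + 500 completion tokens -> 0.003 + 0.0075 = 0.0105
```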