Installation
This guide will help you install and run the vLLM Semantic Router. The router runs entirely on CPU and does not require a GPU for inference.
System Requirements
No GPU required - the router runs efficiently on CPU using optimized BERT models.
Requirements:
- Python: 3.10 or higher
- Container Runtime: Docker (required for running the router container)
Quick Start
1. Use the one-line installer (macOS/Linux)
curl -fsSL https://vllm-semantic-router.com/install.sh | bash
The installer:
- Detects Python 3.10 or newer
- Installs vllm-sr into ~/.local/share/vllm-sr
- Writes a launcher to ~/.local/bin/vllm-sr
- Prepares Docker for vllm-sr serve unless you opt out
- Starts vllm-sr serve automatically and opens the dashboard when possible
- Prints dashboard access and remote-server hints if a browser cannot be opened
Useful variants:
# Install only the CLI
curl -fsSL https://vllm-semantic-router.com/install.sh | bash -s -- --mode cli
# Skip runtime bootstrap and start Docker yourself later
curl -fsSL https://vllm-semantic-router.com/install.sh | bash -s -- --runtime skip
# Force the first launch onto the AMD/ROCm path
curl -fsSL https://vllm-semantic-router.com/install.sh | bash -s -- --platform amd
# Install without auto-starting serve + dashboard
curl -fsSL https://vllm-semantic-router.com/install.sh | bash -s -- --no-launch
If ~/.local/bin is not already on your PATH, the installer prints the export line to add it.
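If you need to add it yourself, a typical fix looks like the following (assuming a bash shell; use ~/.zshrc on zsh):

```shell
# Make the vllm-sr launcher reachable in the current session
export PATH="$HOME/.local/bin:$PATH"

# Persist it for future bash sessions
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
```

After this, `vllm-sr --version` should work from any new shell.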
Agent install (OpenClaw and similar agents)
Use the agent flow when another local agent should perform the install on your behalf. Keep the same supported installer, but tell the agent to use the no-launch path so it does not auto-start vllm-sr serve or open a browser during the handoff.
Preferred long-term path: publish or install the repo-managed openclaw-vsr-bridge skill from this repository (dashboard/backend/skillpacks/openclaw-vsr-bridge/SKILL.md). Until that registry entry exists, use the hosted prompt files below so homepage and agent copy stay short.
Hosted prompt files:
- CLI-only prompt: https://vllm-semantic-router.com/install/agent/vllm-sr-cli.md
- Install-and-bridge prompt: https://vllm-semantic-router.com/install/agent/openclaw-vsr-bridge.md
CLI-only prompt:
Fetch and follow https://vllm-semantic-router.com/install/agent/vllm-sr-cli.md.
Install-and-bridge prompt:
Fetch and follow https://vllm-semantic-router.com/install/agent/openclaw-vsr-bridge.md.
The hosted Markdown files still point back to the same supported installer. OpenClaw handoff still happens later through the bridge skill or vllm-sr config import --from openclaw, not through a second installer.
Windows users should use the manual PyPI flow below.
2. Manual PyPI install
# Create a virtual environment (recommended)
python -m venv vsr
source vsr/bin/activate # On Windows: vsr\Scripts\activate
# Install from PyPI
pip install vllm-sr
Verify installation:
vllm-sr --version
3. Restart vllm-sr later
vllm-sr serve
If you did not pass --no-launch, the installer has already started vllm-sr serve for you once.
If config.yaml does not exist yet in the current directory, vllm-sr serve bootstraps a minimal setup config and starts the dashboard in setup mode.
The router will:
- Automatically download required ML models (~1.5GB, one-time)
- Start the dashboard on port 8700
- Start the vllm-sr-sim sidecar on port 8810
- Start Envoy proxy on port 8888 after activation
- Start the semantic router service after activation
- Enable metrics on port 9190
4. Open the Dashboard
Open http://localhost:8700 in your browser.
If you ran the installer on a remote server and the browser did not open automatically, use the URL and SSH tunnel hint printed by the installer.
For first-run setup:
- Configure one or more models.
- Choose a routing preset or keep the single-model baseline.
- Activate the generated config.
After activation, config.yaml is written to the current directory and the router exits setup mode.
5. Test the Router
curl http://localhost:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "MoM",
"messages": [{"role": "user", "content": "Hello!"}]
}'
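If the call succeeds, the assistant's reply is in choices[0].message.content of the OpenAI-style response. A sketch of extracting it with python3 (shown here with a canned response standing in for the router's output; in practice, pipe the curl output instead):

```shell
# Canned chat-completions response (placeholder for real router output)
response='{"choices":[{"message":{"role":"assistant","content":"Hi there!"}}]}'

# Pull out the assistant message text
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```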
6. Optional: open the dashboard from the CLI
vllm-sr dashboard
Common Commands
# View logs
vllm-sr logs router # Router logs
vllm-sr logs envoy # Envoy logs
vllm-sr logs simulator # Fleet simulator sidecar logs
vllm-sr logs router -f # Follow logs
# Check status
vllm-sr status # Includes simulator sidecar state
# Stop the router
vllm-sr stop
Advanced Configuration
YAML-first workflow
If you prefer to edit YAML directly instead of using the dashboard setup flow:
# Validate your canonical config before serving
vllm-sr validate config.yaml
vllm-sr init was removed in v0.3. Create config.yaml directly with the canonical version/listeners/providers/routing/global layout, migrate an older file with vllm-sr config migrate --config old-config.yaml, or import supported OpenClaw model providers with vllm-sr config import --from openclaw.
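For orientation, the canonical top-level layout named above looks roughly like this. The five keys come from the text; every value shown is a placeholder, not a working configuration:

```yaml
version: ...      # config schema version
listeners: ...    # inbound listener settings
providers: ...    # model providers / backends
routing: ...      # routing rules or preset
global: ...       # global defaults
```

Run vllm-sr validate config.yaml after filling it in to catch layout mistakes before serving.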
HuggingFace Settings
Set environment variables before starting:
export HF_ENDPOINT=https://huggingface.co # Or mirror: https://hf-mirror.com
export HF_TOKEN=your_token_here # Only for gated models
export HF_HOME=/path/to/cache # Custom cache directory
vllm-sr serve
Custom Options
# Use custom config file
vllm-sr serve --config my-config.yaml
# Use custom Docker image
vllm-sr serve --image ghcr.io/vllm-project/semantic-router/vllm-sr:latest
# Control image pull policy
vllm-sr serve --image-pull-policy always
Kubernetes Deployment
For production deployments on Kubernetes or OpenShift, use the Kubernetes Operator: