vLLM Semantic Router
System-Level Intelligence for Mixture-of-Models (MoM): an intelligent routing layer that brings collective intelligence to LLM systems. Running as an Envoy External Processor (ExtProc), it combines a signal-driven decision engine with a plugin-chain architecture to capture missing signals, make better routing decisions, and secure your LLM infrastructure.
Project Goals
We are building System-Level Intelligence for Mixture-of-Models (MoM), bringing collective intelligence into LLM systems by answering:
- How to capture the missing signals in requests, responses, and context?
- How to combine the signals to make better decisions?
- How to make different models collaborate more efficiently?
- How to secure both the real world and LLM systems against jailbreaks, PII leaks, and hallucinations?
- How to collect valuable signals and build a self-learning system?
Core Architecture
Signal-Driven Decision Engine
Captures and combines 6 types of signals to make intelligent routing decisions:
| Signal Type | Description | Use Case |
|---|---|---|
| keyword | Pattern matching with AND/OR operators | Fast rule-based routing for specific terms |
| embedding | Semantic similarity using embeddings | Intent detection and semantic understanding |
| domain | MMLU domain classification (14 categories) | Academic and professional domain routing |
| fact_check | ML-based fact-checking requirement detection | Identify queries needing fact verification |
| user_feedback | User satisfaction and feedback classification | Handle follow-up messages and corrections |
| preference | LLM-based route preference matching | Complex intent analysis via external LLM |
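The two cheapest signals in the table, keyword and embedding, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the router's implementation: the `KeywordRule` class, its `operator` field, and the helpers `embed`, `cosine`, and `embedding_signal` are hypothetical names, the bag-of-words `embed` is a toy stand-in for a real embedding model, and the 0.5 similarity threshold is arbitrary.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class KeywordRule:
    """Hypothetical keyword signal: fires when the query matches its keywords."""
    keywords: list[str]
    operator: str = "OR"  # "AND": every keyword must appear; "OR": any one suffices
    case_sensitive: bool = False

    def matches(self, text: str) -> bool:
        flags = 0 if self.case_sensitive else re.IGNORECASE
        # \b word boundaries keep "sql" from matching inside "mysqldump"
        hits = [re.search(rf"\b{re.escape(k)}\b", text, flags) is not None
                for k in self.keywords]
        return all(hits) if self.operator == "AND" else any(hits)


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # missing terms in b count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def embedding_signal(query: str, candidates: list[str], threshold: float = 0.5) -> bool:
    """Fires if the query is similar enough to any reference phrase (hypothetical threshold)."""
    q = embed(query)
    return any(cosine(q, embed(c)) >= threshold for c in candidates)


if __name__ == "__main__":
    urgent = KeywordRule(["urgent", "asap"], operator="OR")
    print(urgent.matches("Please fix this ASAP"))  # True
    print(embedding_signal("how do I reset my password",
                           ["reset password", "forgot my password"]))  # True
```

Switching `operator` to `"AND"` would make the same query fail, since "urgent" never appears. A production embedding signal would compare dense model embeddings rather than word counts, but the thresholded-similarity shape is the same.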
How it works: Signals are extracted from requests, combined using AND/OR operators in decision rules, and used to select the best model and configuration.
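The flow described above can be sketched as a small rule evaluator. This is a hedged illustration, not the project's configuration schema or code: the names `Condition`, `Rule`, `Decision`, `evaluate`, and `route`, the nested AND/OR tree, the `priority` tie-breaker, and the model names are all hypothetical. It assumes signals have already been extracted into a dict keyed by `(signal_type, name)`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

# Hypothetical signal store: (signal_type, signal_name) -> did the signal fire?
Signals = dict[tuple[str, str], bool]


@dataclass
class Condition:
    """Leaf: true when the named signal fired for this request."""
    signal_type: str  # e.g. "keyword", "domain", "fact_check"
    name: str         # e.g. "urgent", "math"


@dataclass
class Rule:
    """Internal node combining child rules/conditions with AND or OR."""
    operator: str  # "AND" or "OR"
    children: list[Node] = field(default_factory=list)


Node = Union[Condition, Rule]


def evaluate(node: Node, signals: Signals) -> bool:
    if isinstance(node, Condition):
        # A signal that was never extracted counts as "did not fire"
        return signals.get((node.signal_type, node.name), False)
    results = (evaluate(child, signals) for child in node.children)
    if node.operator == "AND":
        return all(results)  # note: an empty AND is vacuously true
    if node.operator == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {node.operator}")


@dataclass
class Decision:
    name: str
    rule: Node
    model: str
    priority: int = 0  # hypothetical: higher wins when several decisions match


def route(decisions: list[Decision], signals: Signals, default_model: str) -> str:
    """Pick the highest-priority matching decision, else fall back to a default."""
    matched = [d for d in decisions if evaluate(d.rule, signals)]
    if not matched:
        return default_model
    return max(matched, key=lambda d: d.priority).model


if __name__ == "__main__":
    decisions = [
        Decision("math",
                 Rule("OR", [Condition("domain", "math"), Condition("keyword", "calculus")]),
                 model="math-expert-model", priority=10),
        Decision("verified-facts",
                 Rule("AND", [Condition("domain", "history"),
                              Condition("fact_check", "needs_fact_check")]),
                 model="rag-enabled-model", priority=20),
    ]
    signals = {("domain", "history"): True, ("fact_check", "needs_fact_check"): True}
    print(route(decisions, signals, default_model="general-model"))  # rag-enabled-model
```

Keeping rules as a tree rather than a flat list lets AND and OR nest arbitrarily; the explicit `priority` makes the outcome deterministic when more than one decision matches, instead of depending on declaration order.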