Milvus Semantic Cache
This guide covers deploying Milvus as the semantic cache backend for the Semantic Router in Kubernetes. Unlike the default in-memory cache, Milvus provides persistent, scalable vector storage.
Milvus is optional. The router works with the default memory backend out of the box. Use Milvus when you need persistence, horizontal scaling, or cache sharing across router replicas.
Deployment Options
Two approaches are available:
- Helm: Quick start and parameterized deployments
- Milvus Operator: Production-grade lifecycle management, rolling upgrades, health checks, and dependency orchestration
Prerequisites
- Kubernetes cluster with `kubectl` configured
- Default `StorageClass` available
- Helm 3.x installed
The default Helm values enable a ServiceMonitor for Prometheus metrics collection, which requires the Prometheus Operator to be installed first.
For testing without the Prometheus Operator, disable the ServiceMonitor with `--set metrics.serviceMonitor.enabled=false` (see the deployment commands below).
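If you are unsure whether the Prometheus Operator is installed, a quick read-only check is to look for its ServiceMonitor CRD:

```shell
# Succeeds only when the Prometheus Operator's ServiceMonitor CRD is installed
kubectl get crd servicemonitors.monitoring.coreos.com
```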
Deploy with Helm
Standalone Mode
Suitable for development and small-scale deployments:
```bash
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update
```
Without Prometheus Operator (for testing/development):
```bash
helm install milvus-semantic-cache milvus/milvus \
  --set cluster.enabled=false \
  --set etcd.replicaCount=1 \
  --set minio.mode=standalone \
  --set pulsar.enabled=false \
  --set metrics.serviceMonitor.enabled=false \
  --namespace vllm-semantic-router-system --create-namespace
```
With Prometheus Operator (production with monitoring):
```bash
helm install milvus-semantic-cache milvus/milvus \
  --set cluster.enabled=false \
  --set etcd.replicaCount=1 \
  --set minio.mode=standalone \
  --set pulsar.enabled=false \
  --namespace vllm-semantic-router-system --create-namespace
```
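After the install completes, you can verify the rollout before wiring up the router (standard kubectl commands; the Service name follows the Helm release name used above):

```shell
# All Milvus, etcd, and MinIO pods should reach Running/Ready
kubectl get pods -n vllm-semantic-router-system

# The gRPC endpoint the router will connect to (port 19530)
kubectl get svc milvus-semantic-cache -n vllm-semantic-router-system
```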
Cluster Mode
Recommended for production with high availability:
```bash
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update
```
Milvus 2.4+ uses Pulsar v3 by default. The values below disable the old Pulsar to avoid conflicts.
Without Prometheus Operator (for testing):
```bash
helm install milvus-semantic-cache milvus/milvus \
  --set cluster.enabled=true \
  --set etcd.replicaCount=3 \
  --set minio.mode=distributed \
  --set pulsar.enabled=false \
  --set pulsarv3.enabled=true \
  --set metrics.serviceMonitor.enabled=false \
  --namespace vllm-semantic-router-system --create-namespace
```
With Prometheus Operator (production with monitoring):
```bash
helm install milvus-semantic-cache milvus/milvus \
  --set cluster.enabled=true \
  --set etcd.replicaCount=3 \
  --set minio.mode=distributed \
  --set pulsar.enabled=false \
  --set pulsarv3.enabled=true \
  --namespace vllm-semantic-router-system --create-namespace
```
Deploy with Milvus Operator
1. Install Milvus Operator following the official instructions.
2. Apply the Custom Resource:
Standalone:
```bash
kubectl apply -n vllm-semantic-router-system -f - <<EOF
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: milvus-standalone
spec:
  mode: standalone
  components:
    disableMetrics: false
  dependencies:
    storage:
      inCluster:
        values:
          mode: standalone
        deletionPolicy: Delete
        pvcDeletion: true
    etcd:
      inCluster:
        values:
          replicaCount: 1
  config: {}
EOF
```
Cluster:
```bash
kubectl apply -n vllm-semantic-router-system -f - <<EOF
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: milvus-cluster
spec:
  mode: cluster
  components:
    disableMetrics: false
  dependencies:
    storage:
      inCluster:
        values:
          mode: distributed
        deletionPolicy: Retain
        pvcDeletion: false
    etcd:
      inCluster:
        values:
          replicaCount: 3
    pulsar:
      inCluster:
        values:
          broker:
            replicaCount: 1
  config: {}
EOF
```
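The operator reconciles the Custom Resource and its dependencies; you can watch progress with standard commands (the CR name matches the manifest you applied):

```shell
# Status becomes Healthy once all components and dependencies are up
kubectl get milvus -n vllm-semantic-router-system

# Inspect conditions and events if the CR stays unhealthy
kubectl describe milvus milvus-standalone -n vllm-semantic-router-system
```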
Configure Semantic Router
Apply Milvus Client Config
```bash
kubectl apply -n vllm-semantic-router-system -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: milvus-client-config
data:
  milvus.yaml: |
    connection:
      host: "milvus-semantic-cache.vllm-semantic-router-system.svc.cluster.local"
      port: 19530
      timeout: 60
      auth:
        enabled: false
      tls:
        enabled: false
    collection:
      name: "semantic_cache"
      description: "Semantic cache"
      vector_field:
        name: "embedding"
        dimension: 384
        metric_type: "IP"
    index:
      type: "HNSW"
      params:
        M: 16
        efConstruction: 64
    search:
      params:
        ef: 64
      topk: 10
      consistency_level: "Session"
    development:
      auto_create_collection: true
      verbose_errors: true
EOF
```
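The collection settings above determine how a lookup behaves: the router embeds the prompt into a 384-dimensional vector and searches by inner product (`IP`), which equals cosine similarity when embeddings are L2-normalized. A minimal, self-contained sketch of that lookup logic (plain Python, no Milvus client; the 0.85 threshold is illustrative, not a router default):

```python
import math
import random

DIM = 384  # must match collection.vector_field.dimension

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def ip(a, b):
    # Inner product; equals cosine similarity for unit vectors
    return sum(x * y for x, y in zip(a, b))

cache = []  # list of (normalized embedding, cached response)

def put(embedding, response):
    cache.append((normalize(embedding), response))

def lookup(embedding, topk=10, threshold=0.85):
    q = normalize(embedding)
    hits = sorted(((ip(q, e), r) for e, r in cache), reverse=True)[:topk]
    if hits and hits[0][0] >= threshold:
        return hits[0][1]  # cache hit: reuse the stored response
    return None            # miss: fall through to the LLM

random.seed(0)
base = [random.gauss(0, 1) for _ in range(DIM)]
put(base, "cached answer")
near = [x + random.gauss(0, 0.05) for x in base]   # paraphrase-like query
far = [random.gauss(0, 1) for _ in range(DIM)]     # unrelated query
print(lookup(near))  # near-duplicate scores ~0.99, so it hits
print(lookup(far))   # unrelated vectors score near 0, so it misses
```

Raising the threshold trades hit rate for answer fidelity; `topk` and `ef` only affect how many candidates the index returns, not the hit decision itself.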
Update Router Config
Ensure these settings in your canonical router configuration:
```yaml
global:
  stores:
    semantic_cache:
      enabled: true
      backend_type: "milvus"
      milvus:
        connection:
          # Use your Milvus Service DNS name, e.g.
          # milvus-semantic-cache.vllm-semantic-router-system.svc.cluster.local
          host: "milvus"
          port: 19530
        collection:
          name: "semantic_cache"
```
Networking and Security
Network Policy
Restrict access to Milvus:
```bash
kubectl apply -n vllm-semantic-router-system -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-router-to-milvus
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: milvus
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: vllm-semantic-router-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: semantic-router
      ports:
        - protocol: TCP
          port: 19530
EOF
```
TLS and Authentication
- Create Secrets for credentials and certificates:

```bash
# Auth credentials
kubectl create secret generic milvus-auth -n vllm-semantic-router-system \
  --from-literal=username="YOUR_USERNAME" \
  --from-literal=password="YOUR_PASSWORD"

# TLS certificates
kubectl create secret generic milvus-tls -n vllm-semantic-router-system \
  --from-file=ca.crt=/path/to/ca.crt \
  --from-file=client.crt=/path/to/client.crt \
  --from-file=client.key=/path/to/client.key
```
- Update the Milvus client configuration:

```yaml
connection:
  host: "milvus-cluster.vllm-semantic-router-system.svc.cluster.local"
  port: 19530
  timeout: 60
  auth:
    enabled: true
    username: "${MILVUS_USERNAME}"
    password: "${MILVUS_PASSWORD}"
  tls:
    enabled: true
```
Wire environment variables or projected Secret volumes to the router deployment and reference them in the config.
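One way to wire the Secrets in is a fragment like the following on the router Deployment (a sketch; the Deployment layout, container name, and mount path are assumptions for a typical semantic-router install):

```yaml
spec:
  template:
    spec:
      containers:
        - name: semantic-router
          env:
            - name: MILVUS_USERNAME
              valueFrom:
                secretKeyRef:
                  name: milvus-auth
                  key: username
            - name: MILVUS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: milvus-auth
                  key: password
          volumeMounts:
            - name: milvus-tls
              mountPath: /etc/milvus-tls
              readOnly: true
      volumes:
        - name: milvus-tls
          secret:
            secretName: milvus-tls
```

The `${MILVUS_USERNAME}`/`${MILVUS_PASSWORD}` placeholders in the client config then resolve from the injected environment, and the certificates are available under the mounted path.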
Storage
Ensure a default StorageClass exists. The Milvus Helm chart and the Milvus Operator automatically create the PVCs needed for etcd and MinIO.
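A quick read-only check that dynamic provisioning will work:

```shell
# Exactly one class should be annotated "(default)"
kubectl get storageclass
```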