Agentic Memory
Executive Summary
This document describes a proof of concept (POC) for Agentic Memory in the Semantic Router. Agentic Memory lets AI agents remember information across sessions, providing continuity and personalization.
⚠️ POC Scope: This is a proof of concept, not a production design. The goal is to validate the core memory flow (retrieve → inject → extract → store) with acceptable accuracy. Production hardening (error handling, scaling, monitoring) is out of scope.
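The retrieve → inject → extract → store flow can be sketched in a few lines. This is only an illustration under stated assumptions: `MemoryStore` is an in-memory stand-in for Milvus, word overlap stands in for embedding search, and a first-person-sentence rule stands in for LLM extraction. All names here are hypothetical and are not taken from the Semantic Router codebase.

```python
from __future__ import annotations

from collections import defaultdict


class MemoryStore:
    """In-memory stand-in for the Milvus-backed store (hypothetical)."""

    def __init__(self) -> None:
        self._by_user: dict[str, list[str]] = defaultdict(list)

    def retrieve(self, user_id: str, query: str) -> list[str]:
        # Stand-in for embedding search: keep memories sharing a word with the query.
        words = set(query.lower().split())
        return [m for m in self._by_user.get(user_id, []) if words & set(m.lower().split())]

    def store(self, user_id: str, facts: list[str]) -> None:
        for fact in facts:
            if fact not in self._by_user[user_id]:
                self._by_user[user_id].append(fact)


def inject(prompt: str, memories: list[str]) -> str:
    """Prepend retrieved memories to the prompt; leave it untouched if there are none."""
    if not memories:
        return prompt
    bullet_list = "\n".join(f"- {m}" for m in memories)
    return f"Known about this user:\n{bullet_list}\n\n{prompt}"


def extract(user_message: str) -> list[str]:
    """Stand-in for LLM extraction: treat first-person sentences as facts."""
    sentences = [s.strip() for s in user_message.split(".") if s.strip()]
    return [s for s in sentences if s.lower().startswith(("i ", "my "))]


def handle_turn(store: MemoryStore, user_id: str, user_message: str) -> str:
    memories = store.retrieve(user_id, user_message)   # retrieve
    prompt = inject(user_message, memories)            # inject
    store.store(user_id, extract(user_message))        # extract + store
    return prompt
```

Calling `handle_turn` twice for the same user shows the intended continuity: a fact stated in the first turn is injected into the prompt of a later, related turn, while a different `user_id` sees nothing.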
Core Capabilities
| Capability | Description |
|---|---|
| Memory Retrieval | Embedding-based search with simple pre-filtering |
| Memory Saving | LLM-based extraction of facts and procedures |
| Cross-Session Persistence | Memories stored in Milvus (survive restarts; production backup/HA not tested) |
| User Isolation | Memories scoped per user_id (see note below) |
⚠️ User Isolation - Milvus Performance Note:

| Approach | POC | Production (10K+ users) |
|---|---|---|
| Simple filter | ✅ Filter by user_id after search | ❌ Degrades: searches all users, then filters |
| Partition Key | ❌ Overkill | ✅ Physical separation, O(log N) per user |
| Scalar Index | ❌ Overkill | ✅ Index on user_id for fast filtering |

POC: Uses simple metadata filtering (sufficient for testing).
Production: Configure user_id as Partition Key or Scalar Indexed Field in Milvus schema.
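
To make the trade-off concrete, the sketch below contrasts the two strategies on a plain Python list. It does not use Milvus; `Memory`, `search_then_filter`, `build_partitions`, and `search_partition` are hypothetical names, and brute-force cosine similarity stands in for the vector index. The point it illustrates is that filtering after a global top-k search ranks every user's memories and can leave a user with fewer results than requested, while a per-user partition only ever touches that user's data.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    user_id: str
    text: str
    vector: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_then_filter(memories, query, user_id, top_k):
    """Simple filter: rank every user's memories, take top_k, then keep this user's."""
    ranked = sorted(memories, key=lambda m: cosine(m.vector, query), reverse=True)
    return [m for m in ranked[:top_k] if m.user_id == user_id]


def build_partitions(memories):
    """Group memories by user_id, mimicking physical separation per user."""
    partitions = defaultdict(list)
    for m in memories:
        partitions[m.user_id].append(m)
    return partitions


def search_partition(partitions, query, user_id, top_k):
    """Partitioned search: only this user's memories are ever ranked."""
    own = partitions.get(user_id, [])
    return sorted(own, key=lambda m: cosine(m.vector, query), reverse=True)[:top_k]


memories = [
    Memory("alice", "likes tea", (1.0, 0.0)),
    Memory("bob", "drinks espresso", (0.95, 0.05)),
    Memory("carol", "prefers window seats", (0.5, 0.5)),
]
query = (1.0, 0.0)

# Other users' memories crowd carol's out of the global top 2.
print(search_then_filter(memories, query, "carol", top_k=2))
# The partitioned search still finds carol's memory.
print(search_partition(build_partitions(memories), query, "carol", top_k=2))
```

With only a handful of test users this crowding-out effect is unlikely to matter, which is why the simple filter is adequate for the POC.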
Key Design Principles
- A simple pre-filter decides whether a query should search memory at all
- A context window taken from conversation history disambiguates the query
- An LLM extracts facts and classifies their type when saving
- Search results are filtered by a similarity threshold
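
The retrieval-side principles above can be sketched as three small functions. This is a minimal illustration, not the router's implementation: the greeting list, the three-turn window, and all function names are hypothetical, and the 0.6 threshold is the starting value from the assumptions table, which is expected to need tuning.

```python
from __future__ import annotations

from dataclasses import dataclass

SIMILARITY_THRESHOLD = 0.6   # starting point from the POC assumptions; tune from logs
CONTEXT_WINDOW_TURNS = 3     # hypothetical window size
GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "ok", "bye"}  # hypothetical


@dataclass
class Hit:
    text: str
    score: float  # similarity score; higher means more similar


def should_search_memory(query: str) -> bool:
    """Pre-filter: skip memory search for empty input and plain greetings."""
    normalized = query.strip().lower().rstrip("!.?")
    return bool(normalized) and normalized not in GREETINGS


def build_search_query(history: list[str], query: str,
                       window: int = CONTEXT_WINDOW_TURNS) -> str:
    """Prepend the most recent turns so references like 'it' can be resolved."""
    recent = history[-window:] if window > 0 else []
    return "\n".join([*recent, query])


def filter_by_threshold(hits: list[Hit],
                        threshold: float = SIMILARITY_THRESHOLD) -> list[Hit]:
    """Keep hits at or above the threshold, best match first."""
    kept = [h for h in hits if h.score >= threshold]
    return sorted(kept, key=lambda h: h.score, reverse=True)
```

Keeping the pre-filter as a cheap string check means a greeting never costs an embedding call or a vector search; a real deployment might replace it with a classifier.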
Explicit Assumptions (POC)
| Assumption | Implication | Risk if Wrong |
|---|---|---|
| LLM extraction is reasonably accurate | Some incorrect facts may be stored | Memory contamination (fixable via Forget API) |
| 0.6 similarity threshold is a starting point | May need tuning (too high misses relevant memories; too low includes irrelevant ones) | Adjustable based on retrieval quality logs |
| Milvus is available and configured | Feature disabled if down | Graceful degradation (no crash) |
| Embedding model produces 384-dim vectors | Must match Milvus schema | Startup failure (detectable) |
| History available via Response API chain | Required for context | Skip memory if unavailable |
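
Several of these assumptions translate into simple guards. The sketch below is one possible shape, under stated assumptions: `MemoryConfigError`, `validate_embedding_dim`, and `retrieve_or_skip` are hypothetical names, `search` is any callable wrapping the vector store, and Python's built-in `ConnectionError` stands in for whatever exception the real Milvus client raises when the service is down.

```python
from __future__ import annotations

from typing import Callable, Sequence

EXPECTED_DIM = 384  # embedding dimension assumed by the POC schema


class MemoryConfigError(RuntimeError):
    """Raised at startup when the embedding model and schema disagree."""


def validate_embedding_dim(vector: Sequence[float],
                           schema_dim: int = EXPECTED_DIM) -> None:
    """Fail fast if a probe embedding does not match the schema dimension."""
    if len(vector) != schema_dim:
        raise MemoryConfigError(
            f"embedding has {len(vector)} dims, schema expects {schema_dim}"
        )


def retrieve_or_skip(search: Callable[[str], list[str]],
                     query: str,
                     history: list[str] | None) -> list[str]:
    """Return memories, or an empty list when memory cannot be used safely."""
    if history is None:
        # History unavailable: skip memory rather than search without context.
        return []
    try:
        return search(query)
    except ConnectionError:
        # Store unreachable: degrade to a memory-less request instead of crashing.
        return []
```

Running `validate_embedding_dim` once at startup against a probe embedding surfaces a dimension mismatch immediately, while `retrieve_or_skip` keeps the request path alive when the store or the history is missing.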