Features
- Long-term Persistence: Stores conversations, facts, rules, and skills in vector databases (Pinecone, Upstash).
- Intelligent Retrieval:
  - Auto Mode: Fast, category-based retrieval for general context.
  - Conscious Mode: LLM-driven dynamic queries that filter by category, lifespan, importance, and status.
- High-Performance Caching:
  - Multi-Level Caching: In-memory (local) and Redis (shared) support.
  - Smart Invalidation: Caches rules, facts, skills, and context separately with configurable TTLs.
  - Parallel Fetching: Optimizes latency by fetching from cache and vector stores concurrently.
- Automatic Ingestion: Asynchronously processes conversation history to extract and store new memories.
Installation
Configuration
Ensure the following environment variables are set:
- OPENAI_KEY: Required for embeddings and memory processing.
- PINECONE_API_KEY / UPSTASH_VECTOR_REST_URL: Depending on your chosen vector service.
- REDIS_URL: (Optional) For shared caching.
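For reference, a minimal Go sketch (standard library only, not part of the Karma Memory API) that fails fast at startup if these variables are missing:

```go
// Sketch: verify required configuration before initializing Karma Memory.
package main

import (
	"log"
	"os"
)

func main() {
	if os.Getenv("OPENAI_KEY") == "" {
		log.Fatal("OPENAI_KEY is required for embeddings and memory processing")
	}
	// One of the two vector services must be configured.
	if os.Getenv("PINECONE_API_KEY") == "" && os.Getenv("UPSTASH_VECTOR_REST_URL") == "" {
		log.Fatal("set PINECONE_API_KEY or UPSTASH_VECTOR_REST_URL")
	}
	// REDIS_URL is optional and only needed for shared caching.
}
```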
Initialization
Initialize the memory system with a KarmaAI client, a user ID, and an optional scope (e.g., project ID, session ID).
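A minimal sketch of initialization. The constructor and option names below (NewClient, NewMemory, WithScope) are illustrative assumptions, not the confirmed API; check the package documentation for the exact signatures.

```go
// Assumes: import karma "<your karma module path>" (hypothetical)
client := karma.NewClient(os.Getenv("OPENAI_KEY")) // KarmaAI client

mem, err := karma.NewMemory(
	client,
	"user-42",                        // user ID
	karma.WithScope("project-alpha"), // optional scope: project ID or session ID
)
if err != nil {
	log.Fatal(err)
}
```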
Usage
1. Simple Usage (Managed Flow)
The easiest way to use Karma Memory is via the built-in ChatCompletion methods. These handle context retrieval, prompt augmentation, and history updates automatically.
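A sketch of the managed flow, reusing the `mem` instance from Initialization; the exact ChatCompletion signature shown here is an assumption.

```go
// Context retrieval, prompt augmentation, and history updates all
// happen inside this single call.
reply, err := mem.ChatCompletion(context.Background(),
	"What did we decide about the billing rollout?")
if err != nil {
	log.Fatal(err)
}
fmt.Println(reply)
```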
2. Advanced Usage (Manual Control)
For integration with existing chat loops or custom LLM calls, you can manually retrieve context and update history.
Step 1: Retrieve Context
Before sending a prompt to your LLM, fetch relevant context.
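A sketch of the manual flow. RetrieveContext and UpdateHistory are assumed method names used for illustration, and callYourLLM is a placeholder for your existing chat loop.

```go
ctx := context.Background()
prompt := "Summarize the open action items."

// Step 1: fetch relevant rules, facts, skills, and context for this prompt.
memCtx, err := mem.RetrieveContext(ctx, prompt)
if err != nil {
	log.Fatal(err)
}

// Augment your own LLM call with the retrieved context, then record the
// exchange so new memories can be ingested asynchronously.
answer := callYourLLM(memCtx, prompt) // placeholder for your LLM call
if err := mem.UpdateHistory(ctx, prompt, answer); err != nil {
	log.Fatal(err)
}
```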
Caching
Caching significantly reduces latency and vector database costs.
In-Memory Cache
Best for single-instance deployments.
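A sketch of enabling the local cache at initialization; WithInMemoryCache and its TTL argument are assumed names for illustration.

```go
mem, err := karma.NewMemory(
	client,
	"user-42",
	karma.WithInMemoryCache(5*time.Minute), // hypothetical option: local cache with a TTL
)
if err != nil {
	log.Fatal(err)
}
```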
Redis Cache
Best for distributed deployments where multiple instances share the same memory state.
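A sketch of enabling the shared cache; WithRedisCache is an assumed option name, and REDIS_URL comes from the Configuration section.

```go
mem, err := karma.NewMemory(
	client,
	"user-42",
	karma.WithRedisCache(os.Getenv("REDIS_URL")), // hypothetical option: shared cache
)
if err != nil {
	log.Fatal(err)
}
```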
Retrieval Modes
You can switch between retrieval modes based on your application’s needs (see the sketch after this list):
RetrievalModeAuto (Default):
- Fast and cost-effective.
- Always retrieves active rules, facts, skills, and context.
- Uses the raw user prompt for vector search.
RetrievalModeConscious:
- Smarter but slightly higher latency.
- Uses an LLM to analyze the user’s prompt and generate a dynamic search query.
- Filters memories by specific categories, lifespans, or importance levels relevant to the current query.
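A sketch of selecting a mode at initialization: the mode constants come from this guide, while WithRetrievalMode is an assumed option name.

```go
mem, err := karma.NewMemory(
	client,
	"user-42",
	karma.WithRetrievalMode(karma.RetrievalModeConscious), // default: karma.RetrievalModeAuto
)
if err != nil {
	log.Fatal(err)
}
```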