How to set up vector memory for agents

This guide shows you how to enable your Xaibo agents to store and retrieve information using vector embeddings, so they can recall and search relevant context from previous conversations and documents.

Install memory dependencies

Install the required dependencies for local embeddings:

pip install xaibo[local]

This includes sentence-transformers, tiktoken, and other memory-related packages.

Configure basic vector memory

Add vector memory to your agent configuration:

# agents/memory_agent.yml
id: memory-agent
description: An agent with vector memory capabilities
modules:
  - module: xaibo.primitives.modules.llm.OpenAILLM
    id: llm
    config:
      model: gpt-4.1-nano

  # Text chunker for splitting documents
  - module: xaibo.primitives.modules.memory.TokenChunker
    id: chunker
    config:
      window_size: 512
      window_overlap: 50
      encoding_name: "cl100k_base"

  # Embedder for converting text to vectors
  - module: xaibo.primitives.modules.memory.SentenceTransformerEmbedder
    id: embedder
    config:
      model_name: "all-MiniLM-L6-v2"

  # Vector index for storage and retrieval
  - module: xaibo.primitives.modules.memory.NumpyVectorIndex
    id: vector_index
    config:
      storage_dir: "./memory_storage"

  # Main vector memory module
  - module: xaibo.primitives.modules.memory.VectorMemory
    id: memory
    config:
      memory_file_path: "./agent_memory.pkl"

  - module: xaibo.primitives.modules.orchestrator.StressingToolUser
    id: orchestrator
    config:
      system_prompt: |
        You are a helpful assistant with memory capabilities.
        You can remember information from previous conversations.

exchange:
  # Connect memory components
  - module: memory
    protocol: ChunkerProtocol
    provider: chunker
  - module: memory
    protocol: EmbedderProtocol
    provider: embedder
  - module: memory
    protocol: VectorIndexProtocol
    provider: vector_index
  - module: orchestrator
    protocol: MemoryProtocol
    provider: memory
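
The exchange section wires these modules together: each entry names the provider module that fulfills a protocol another module consumes. As a rough illustration of that composition (the protocol and method names below are hypothetical, not Xaibo's actual interfaces), the wiring amounts to this:

from typing import Protocol

class ChunkerProtocol(Protocol):
    def chunk(self, text: str) -> list[str]: ...

class EmbedderProtocol(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class VectorIndexProtocol(Protocol):
    def add(self, vectors: list[list[float]], payloads: list[str]) -> None: ...
    def search(self, vector: list[float], k: int) -> list[str]: ...

class VectorMemory:
    """Consumes the three protocols, mirroring the exchange wiring above."""

    def __init__(self, chunker: ChunkerProtocol, embedder: EmbedderProtocol,
                 index: VectorIndexProtocol) -> None:
        self.chunker, self.embedder, self.index = chunker, embedder, index

    def store(self, text: str) -> None:
        chunks = self.chunker.chunk(text)      # ChunkerProtocol
        vectors = self.embedder.embed(chunks)  # EmbedderProtocol
        self.index.add(vectors, chunks)        # VectorIndexProtocol

    def recall(self, query: str, k: int = 5) -> list[str]:
        return self.index.search(self.embedder.embed([query])[0], k)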

Use OpenAI embeddings

Configure OpenAI embeddings for higher-quality vectors. First, install the OpenAI extra:

pip install xaibo[openai]

Then replace the local embedder module with the OpenAI one:

# Replace the embedder module with OpenAI
modules:
  - module: xaibo.primitives.modules.memory.OpenAIEmbedder
    id: embedder
    config:
      model: "text-embedding-3-small"
      api_key: ${OPENAI_API_KEY}
      dimensions: 1536

Set your API key:

export OPENAI_API_KEY=your_openai_api_key_here
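
For reference, this is roughly the call the OpenAIEmbedder makes; the sketch below uses the official openai Python package directly rather than Xaibo code:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Xaibo agents can store conversation snippets as vectors."],
    dimensions=1536,  # text-embedding-3-* models accept a dimensions override
)
vector = response.data[0].embedding
print(len(vector))  # 1536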

Configure Hugging Face embeddings

Use Hugging Face models for embeddings:

modules:
  - module: xaibo.primitives.modules.memory.HuggingFaceEmbedder
    id: embedder
    config:
      model_name: "sentence-transformers/all-mpnet-base-v2"
      device: "cuda"  # Use "cpu" if no GPU available
      max_length: 512
      pooling_strategy: "mean"

Popular Hugging Face embedding models:

  • sentence-transformers/all-mpnet-base-v2 - High quality, balanced
  • sentence-transformers/all-MiniLM-L6-v2 - Fast and lightweight
  • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 - Multilingual support
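
To make the max_length and pooling_strategy settings concrete, here is a minimal mean-pooling sketch using the transformers library directly (an approximation of what a "mean" pooling strategy implies, not Xaibo's implementation):

import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/all-mpnet-base-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

batch = tokenizer(["vector memory for agents"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (1, seq_len, 768)

# Mask out padding tokens, then average the remaining token embeddings.
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(1) / mask.sum(1)
print(sentence_embedding.shape)  # torch.Size([1, 768])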

Customize text chunking

Configure chunking strategy for your content:

modules:
  - module: xaibo.primitives.modules.memory.TokenChunker
    id: chunker
    config:
      window_size: 1024      # Larger chunks for documents
      window_overlap: 100    # More overlap for context
      encoding_name: "cl100k_base"  # GPT-4 tokenizer

Chunking strategies:

  • Small chunks (256-512 tokens): Better for precise retrieval
  • Medium chunks (512-1024 tokens): Balanced approach
  • Large chunks (1024-2048 tokens): Better context preservation
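
A minimal sliding-window chunker built on tiktoken shows how window_size and window_overlap interact; this is a sketch of the general technique, and TokenChunker's actual implementation may differ:

import tiktoken

def chunk(text: str, window_size: int = 512, window_overlap: int = 50,
          encoding_name: str = "cl100k_base") -> list[str]:
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    # Each window starts window_size - window_overlap tokens after the last,
    # so consecutive chunks share window_overlap tokens of context.
    step = window_size - window_overlap
    return [enc.decode(tokens[i:i + window_size])
            for i in range(0, max(len(tokens), 1), step)]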

Store memory persistently

Configure persistent storage for your vector memory:

modules:
  - module: xaibo.primitives.modules.memory.NumpyVectorIndex
    id: vector_index
    config:
      storage_dir: "/path/to/persistent/storage"

  - module: xaibo.primitives.modules.memory.VectorMemory
    id: memory
    config:
      memory_file_path: "/path/to/persistent/memory.pkl"

Create the storage directory:

mkdir -p /path/to/persistent/storage
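
The exact file layout under storage_dir is internal to NumpyVectorIndex; the sketch below only illustrates the general idea of flat-file vector persistence with NumPy:

from pathlib import Path
import numpy as np

storage_dir = Path("/path/to/persistent/storage")
storage_dir.mkdir(parents=True, exist_ok=True)

# Persist a batch of embedding vectors (384 dims matches all-MiniLM-L6-v2).
vectors = np.random.rand(10, 384).astype(np.float32)
np.save(storage_dir / "vectors.npy", vectors)

restored = np.load(storage_dir / "vectors.npy")
assert restored.shape == (10, 384)  # vectors survive a process restart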

Best practices

Embedding model selection

  • Use OpenAI embeddings for highest quality
  • Use local models for privacy and cost control
  • Choose model size based on performance requirements

Chunking strategy

  • Smaller chunks for precise retrieval
  • Larger chunks for better context
  • Adjust overlap based on content type

Storage management

  • Use persistent storage for production
  • Monitor storage size and performance
  • Implement cleanup strategies for old data

Performance optimization

  • Use GPU acceleration when available
  • Cache frequently accessed vectors
  • Batch process large document collections
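
For the batch-processing tip, sentence-transformers supports this directly; batch_size and device are the knobs to tune in the sketch below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")  # or "cpu"

texts = [f"document {i}" for i in range(10_000)]
vectors = model.encode(texts, batch_size=64, show_progress_bar=True,
                       normalize_embeddings=True)
print(vectors.shape)  # (10000, 384)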

Troubleshooting

Memory not persisting

  • Check file permissions for storage directories
  • Verify storage paths are absolute and accessible
  • Ensure sufficient disk space

Poor retrieval quality

  • Experiment with different embedding models
  • Adjust similarity thresholds
  • Review chunking strategy for your content

Performance issues

  • Monitor memory usage and optimize chunk sizes
  • Use faster embedding models for real-time applications
  • Consider GPU acceleration for large collections

Import errors

  • Verify all memory dependencies are installed (see the check after this list)
  • Check that storage directories exist
  • Ensure proper module configuration in agent YAML
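
A quick way to confirm the dependencies resolve (import names assumed from the packages listed under the local install):

import importlib

for name in ("sentence_transformers", "tiktoken", "numpy"):
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ImportError as exc:
        print(f"{name}: missing ({exc})")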