Agent Memory Providers Compared — Honcho, Mem0, Hindsight, and Five More

Eight pluggable backends for persistent agent memory.

Modern assistants still forget everything when you close the tab unless something persists beyond the context window. Agent memory providers are services or libraries that hold facts and summaries across sessions — often wired in as plugins so the framework stays thin while memory scales.

This guide compares eight backends that ship as Hermes Agent external memory plugins — Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover, and Supermemory — and explains how they fit into broader AI systems stacks. The same vendors appear in OpenClaw and other agent tooling via community or official integrations. The AI Systems Memory hub lists this article alongside Cognee and related guides.

For Hermes-specific bounded core memory (MEMORY.md and USER.md), freezing behaviour, and triggers, see Hermes Agent Memory System.


Hermes Agent lists eight external memory provider plugins for persistent, cross-session knowledge. Only one external provider can be active at a time. Built-in MEMORY.md and USER.md stay loaded alongside it — additive, not replacement.

External dependencies. Every external provider except Holographic requires at least one external service call — an LLM for memory extraction, an embedding model for semantic search, or a database like PostgreSQL for storage. These dependencies have direct implications for privacy, cost, and whether your memory stack can run fully self-hosted. Hindsight and ByteRover bundle or avoid most of these dependencies; Honcho, Mem0, and Supermemory require the most moving parts. Where a provider supports Ollama or any OpenAI-compatible endpoint, you can route LLM and embedding calls to a local model and keep data off third-party servers entirely.

Activation with Hermes Agent

hermes memory setup   # Interactive picker + configuration
hermes memory status  # Check what's active
hermes memory off     # Disable external provider

Or manually in ~/.hermes/config.yaml:

memory:
  provider: openviking  # or honcho, mem0, hindsight, holographic, retaindb, byterover, supermemory

Provider Comparison

| Provider | Storage | Cost | External Dependencies | Self-hostable | Unique Feature |
| --- | --- | --- | --- | --- | --- |
| Honcho | Cloud/Self-hosted | Paid/Free | LLM + embedding model + PostgreSQL/pgvector + Redis | Yes (Docker / K3s / Fly.io) | Dialectic user modeling + session-scoped context |
| OpenViking | Self-hosted | Free | LLM (VLM) + embedding model | Yes (local server; Ollama-native init wizard) | Filesystem hierarchy + tiered loading |
| Mem0 | Cloud/Self-hosted | Paid/Free OSS | LLM + embedding model + vector store (Qdrant or pgvector) | Yes (Docker Compose OSS; fully local possible) | Server-side LLM extraction |
| Hindsight | Cloud/Local | Free/Paid | LLM + bundled PostgreSQL + built-in embedder + built-in reranker | Yes (Docker or embedded Python; fully local with Ollama) | Knowledge graph + reflect synthesis |
| Holographic | Local | Free | None | Native (no infra required) | HRR algebra + trust scoring |
| RetainDB | Cloud | $20/mo | Cloud-managed (LLM + retrieval on RetainDB servers) | No | Delta compression |
| ByteRover | Local/Cloud | Free/Paid | LLM only (no embedding model, no DB) | Yes (local-first by default; Ollama supported) | File-based context tree; no embedding pipeline |
| Supermemory | Cloud | Paid | LLM + PostgreSQL/pgvector (enterprise Cloudflare deploy) | Enterprise plan only | Context fencing + session graph ingest |

Detailed Breakdown

Honcho

Best for: multi-agent systems, cross-session context, user-agent alignment.

Honcho runs alongside existing memory — USER.md stays as-is, and Honcho adds an additional layer of context. It models conversations as peers exchanging messages — one user peer plus one AI peer per Hermes profile, all sharing a workspace.

External dependencies: Honcho requires an LLM for session summarisation, user-representation derivation, and dialectic reasoning; an embedding model for semantic search across observations; PostgreSQL with the pgvector extension for vector storage; and Redis for caching. The managed cloud at api.honcho.dev handles all of this for you. For self-hosted deployments (Docker, K3s, or Fly.io), you supply your own credentials. The LLM slot accepts any OpenAI-compatible endpoint, including Ollama and vLLM, so inference can stay on-premises. The embedding slot defaults to openai/text-embedding-3-small but supports configurable providers via LLM_EMBEDDING_API_KEY and LLM_EMBEDDING_BASE_URL — any OpenAI-compatible embedding server works, including local options like vLLM with a BGE model.
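
For example, a self-hosted deployment could route embeddings to a local OpenAI-compatible server using just the two variables named above; the endpoint and placeholder value below are illustrative:

LLM_EMBEDDING_BASE_URL=http://localhost:8000/v1   # e.g. vLLM serving a BGE embedding model
LLM_EMBEDDING_API_KEY=unused-for-local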

Tools: honcho_profile (read/update peer card), honcho_search (semantic search), honcho_context (session context — summary, representation, card, messages), honcho_reasoning (LLM-synthesized), honcho_conclude (create/delete conclusions).

Key config knobs (sketch below):

  • contextCadence (default 1): Minimum turns between base layer refresh
  • dialecticCadence (default 2): Minimum turns between peer.chat() LLM calls (1-5 recommended)
  • dialecticDepth (default 1): .chat() passes per invocation (clamped 1-3)
  • recallMode (default ‘hybrid’): hybrid (auto+tools), context (inject only), tools (tools only)
  • writeFrequency (default ‘async’): Flush timing: async, turn, session, or integer N
  • observationMode (default ‘directional’): directional (all on) or unified (shared pool)
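
A minimal honcho.json sketch with these knobs spelled out (the flat key layout is an assumption based on the names above; the values shown are the listed defaults):

{
  "contextCadence": 1,
  "dialecticCadence": 2,
  "dialecticDepth": 1,
  "recallMode": "hybrid",
  "writeFrequency": "async",
  "observationMode": "directional"
}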

Architecture: Two-layer context injection — base layer (session summary + representation + peer card) + dialectic supplement (LLM reasoning). Automatically selects cold-start vs warm prompts.

Multi-peer mapping: The workspace is a shared environment across profiles. The user peer (peerName) is a single global human identity. There is one AI peer (aiPeer) per Hermes profile (hermes for the default profile, hermes.<profile> for others).

Setup:

hermes memory setup  # select "honcho"
# or legacy: hermes honcho setup

Config: $HERMES_HOME/honcho.json (profile-local) or ~/.honcho/config.json (global).

Profile management:

hermes profile create coder --clone  # Creates hermes.coder with shared workspace
hermes honcho sync                   # Backfills AI peers for existing profiles

OpenViking

Best for: self-hosted knowledge management with structured browsing.

OpenViking provides a filesystem hierarchy with tiered loading. It’s free, self-hosted, and gives you full control over your memory storage.

External dependencies: OpenViking requires a VLM (vision-language model) for semantic processing and memory extraction, and an embedding model for vector search — both are mandatory. Supported VLM providers include OpenAI, Anthropic, DeepSeek, Gemini, Moonshot, and vLLM (for local deployment). For embeddings, supported providers include OpenAI, Volcengine (Doubao), Jina, Voyage, and — via Ollama — any locally served embedding model. The interactive openviking-server init wizard can detect available RAM, recommend suitable Ollama models (e.g. Qwen3-Embedding 8B for embeddings, Gemma 4 27B for VLM), and configure everything automatically for a fully local, zero-API-key setup. No external database is required; OpenViking stores memory in the filesystem.

Tools: viking_search, viking_read (tiered), viking_browse, viking_remember, viking_add_resource.

Setup:

pip install openviking
openviking-server init   # interactive wizard (recommends Ollama models for local setup)
openviking-server
hermes memory setup  # select "openviking"
echo "OPENVIKING_ENDPOINT=http://localhost:1933" >> ~/.hermes/.env

Mem0

Best for: hands-off memory management with auto extraction.

Mem0 handles memory extraction server-side via an LLM call on every add operation — it reads the conversation, extracts discrete facts, deduplicates, and stores them. The managed cloud API handles all infrastructure. The open-source library and self-hosted server give you full control.

External dependencies: Mem0 requires an LLM for memory extraction (default: OpenAI gpt-4.1-nano; 20 providers supported, including Ollama, vLLM, and LM Studio for local models) and an embedding model for retrieval (default: OpenAI text-embedding-3-small; 10 providers supported, including Ollama and HuggingFace for local models). Storage uses Qdrant at /tmp/qdrant in library mode, or PostgreSQL with pgvector in self-hosted server mode — both can run locally. A fully local, zero-cloud Mem0 stack is achievable: Ollama for LLM, Ollama for embeddings, and a local Qdrant instance, all configured via Memory.from_config.
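
A sketch of that fully local stack with the open-source library; the model names and the embedding dimension are illustrative assumptions, while the overall config shape follows Mem0's documented from_config sections:

from mem0 import Memory

# everything local: Ollama serves the LLM and the embedder, Qdrant holds the vectors
config = {
    "llm": {"provider": "ollama", "config": {"model": "llama3.1:8b"}},
    "embedder": {"provider": "ollama", "config": {"model": "nomic-embed-text"}},
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333, "embedding_model_dims": 768},
    },
}

m = Memory.from_config(config)
m.add("Prefers concise answers and dark mode", user_id="hermes-user")
print(m.search("user preferences", user_id="hermes-user"))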

Tools: mem0_profile, mem0_search, mem0_conclude.

Setup:

pip install mem0ai
hermes memory setup  # select "mem0"
echo "MEM0_API_KEY=your-key" >> ~/.hermes/.env

Config: $HERMES_HOME/mem0.json (user_id: hermes-user, agent_id: hermes).

Hindsight

Best for: knowledge graph-based recall with entity relationships.

Hindsight builds a knowledge graph of your memory, extracting entities and relationships. Its unique reflect tool performs cross-memory synthesis — combining multiple memories into new insights. Recall runs four retrieval strategies in parallel (semantic, keyword/BM25, graph traversal, temporal), then merges and re-orders results using reciprocal rank fusion.

External dependencies: Hindsight requires an LLM for fact and entity extraction on retain calls, and for synthesis on reflect calls (default: OpenAI; supported providers include Anthropic, Gemini, Groq, Ollama, LM Studio, and any OpenAI-compatible endpoint). The embedding model and cross-encoder reranking model are bundled inside Hindsight itself — they run locally within the hindsight-all package and require no external API. PostgreSQL is also bundled with the embedded Python installation via a managed pg0 data directory; you can alternatively point Hindsight at an external PostgreSQL instance. For a fully local, zero-cloud setup, set HINDSIGHT_API_LLM_PROVIDER=ollama and point it at a local Ollama model — retain and recall work fully; reflect requires a tool-calling-capable model (e.g. qwen3:8b).
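
A zero-cloud sketch, assuming only the environment variable named above (the model pull is illustrative):

ollama pull qwen3:8b   # tool-calling model, so reflect works too
echo "HINDSIGHT_API_LLM_PROVIDER=ollama" >> ~/.hermes/.env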

Tools: hindsight_retain, hindsight_recall, hindsight_reflect (unique cross-memory synthesis).

Setup:

hermes memory setup  # select "hindsight"
echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env

Auto-installs hindsight-client (cloud) or hindsight-all (local). Requires >= 0.4.22.

Config: $HERMES_HOME/hindsight/config.json

  • mode: cloud or local
  • recall_budget: low / mid / high
  • memory_mode: hybrid / context / tools
  • auto_retain / auto_recall: true (default)

Local UI: hindsight-embed -p hermes ui start

Holographic

Best for: privacy-focused setups with local-only storage.

Holographic uses HRR (Holographic Reduced Representation) algebra for memory encoding, with trust scoring for memory reliability. No cloud dependency — everything runs locally on your own hardware.

External dependencies: None. Holographic requires no LLM, no embedding model, no database, and no network connection. Memory encoding is done entirely through HRR algebra running in-process, making it the only one of the eight providers that operates with zero external calls. The trade-off is that retrieval quality is lower than embedding-based semantic search, and there is no cross-memory synthesis like Hindsight's reflect. Where privacy and zero-dependency operation are non-negotiable, Holographic is the only option that delivers both unconditionally.
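
As a rough illustration of the underlying technique (generic HRR, not Holographic's actual implementation): binding a key to a value is circular convolution, and unbinding is correlation with the key's involution, so fixed-size vectors can hold superposed associations.

import numpy as np

def bind(a, b):
    # HRR binding: circular convolution via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(trace, key):
    # correlate with the involution of the key to approximately recover the value
    involution = np.concatenate(([key[0]], key[:0:-1]))
    return bind(trace, involution)

d = 1024
rng = np.random.default_rng(0)
key, value = rng.normal(0.0, 1.0 / np.sqrt(d), size=(2, d))
trace = bind(key, value)          # one stored association
recovered = unbind(trace, key)
cosine = np.dot(recovered, value) / (np.linalg.norm(recovered) * np.linalg.norm(value))
print(cosine)                     # well above chance, so the value is recoverable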

Tools: 2 tools for memory operations via HRR algebra.

Setup:

hermes memory setup  # select "holographic"

RetainDB

Best for: high-frequency updates with delta compression.

RetainDB uses delta compression to efficiently store memory updates and hybrid retrieval (vector + BM25 + reranking) to surface relevant context. It's cloud-based at $20/month, with all memory processing handled server-side.

External dependencies: RetainDB’s LLM calls, embedding pipeline, and reranking all run on RetainDB’s own cloud infrastructure — you supply only a RETAINDB_KEY. Memory extraction uses Claude Sonnet server-side. There is no self-hosting option and no local mode. All conversation data is sent to RetainDB servers for processing and storage. If data sovereignty or offline operation matters for your use case, this provider is not suitable.

Tools: retaindb_profile (user profile), retaindb_search (semantic search), retaindb_context (task-relevant context), retaindb_remember (store with type + importance), retaindb_forget (delete memories).

Setup:

hermes memory setup  # select "retaindb"
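
If the key is read from the shared env file like the other cloud providers (an assumption; the variable name comes from the paragraph above):

echo "RETAINDB_KEY=your-key" >> ~/.hermes/.env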

ByteRover

Best for: local-first memory with human-readable, auditable storage.

ByteRover stores memory as a structured markdown context tree — a hierarchy of domain, topic, and subtopic files — rather than embedding vectors or a database. An LLM reads source content, reasons about it, and places extracted knowledge into the right location in the hierarchy. Retrieval is MiniSearch full-text search with tiered fallback to LLM-powered search, with no vector database required.

External dependencies: ByteRover requires an LLM for memory curation and search (18 providers supported, including Anthropic, OpenAI, Google, Ollama, and any OpenAI-compatible endpoint via the openai-compatible provider slot). It requires no embedding model and no database — the context tree is a local directory of plain markdown files. Cloud sync is optional and used only for team collaboration; everything works fully offline by default. For a fully self-contained local setup, connect Ollama as the provider (brv providers connect openai-compatible --base-url http://localhost:11434/v1) and no data leaves your machine.
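
A fully offline sketch; the connect command is the one quoted above, while the model pull (and whether a model must be selected separately) is an assumption:

ollama pull qwen2.5:7b   # any local chat model for the curator LLM
brv providers connect openai-compatible --base-url http://localhost:11434/v1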

Tools: 3 tools for memory operations.

Setup:

hermes memory setup  # select "byterover"

Supermemory

Best for: enterprise workflows with context fencing and session graph ingest.

Supermemory provides context fencing (isolating memory by context) and session graph ingest (importing entire conversation histories). It automatically extracts memories, builds user profiles, and runs hybrid retrieval combining semantic and keyword search. The managed cloud API is the primary deployment target.

External dependencies: Supermemory’s cloud service handles all LLM inference and embedding server-side — you supply only a Supermemory API key. Self-hosting is available exclusively as an enterprise plan add-on and deploys to Cloudflare Workers; it requires you to provide PostgreSQL with the pgvector extension (for vector storage) and an OpenAI API key (mandatory, with Anthropic and Gemini as optional additions). There is no Docker-based or local self-hosting path — the architecture is tightly coupled to Cloudflare Workers edge compute. For users who need full data sovereignty without an enterprise contract, this provider is not the right choice.

Tools: 4 tools for memory operations.

Setup:

hermes memory setup  # select "supermemory"

How to Choose

  • Need multi-agent support? Honcho
  • Want self-hosted and free? OpenViking or Holographic
  • Want hands-off auto-extraction? Mem0
  • Want knowledge graphs? Hindsight
  • Want delta compression? RetainDB
  • Want a markdown context tree instead of a vector store? ByteRover
  • Want enterprise features? Supermemory
  • Want privacy (local only)? Holographic
  • Want fully local with zero external services? Holographic (no dependencies at all) or Hindsight/Mem0/ByteRover with Ollama
  • Want human-readable, auditable memory with no embedding pipeline? ByteRover

For full profile-by-profile provider configurations and real-world workflow patterns, see Hermes Agent production setup.

