Agent Memory Providers Compared — Honcho, Mem0, Hindsight, and Five More

Eight pluggable backends for persistent agent memory.

Modern assistants still forget everything when you close the tab unless something persists beyond the context window. Agent memory providers are services or libraries that hold facts and summaries across sessions — often wired in as plugins so the framework stays thin while memory scales.

This guide compares eight backends that ship as Hermes Agent external memory plugins — Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover, and Supermemory — and explains how they fit into broader AI systems stacks. The same vendors appear in OpenClaw and other agent tooling via community or official integrations. The AI Systems Memory hub lists this article alongside Cognee and related guides.

For Hermes-specific bounded core memory (MEMORY.md and USER.md), freezing behaviour, and triggers, see Hermes Agent Memory System.


Hermes Agent lists eight external memory provider plugins for persistent, cross-session knowledge. Only one external provider can be active at a time. Built-in MEMORY.md and USER.md stay loaded alongside it — additive, not replacement.

External dependencies. Every external provider except Holographic requires at least one external service call — an LLM for memory extraction, an embedding model for semantic search, or a database like PostgreSQL for storage. These dependencies have direct implications for privacy, cost, and whether your memory stack can run fully self-hosted. Hindsight and ByteRover bundle or avoid most of these dependencies; Honcho, Mem0, and Supermemory require the most moving parts. Where a provider supports Ollama or any OpenAI-compatible endpoint, you can route LLM and embedding calls to a local model and keep data off third-party servers entirely.

Activation with Hermes Agent

hermes memory setup   # Interactive picker + configuration
hermes memory status  # Check what's active
hermes memory off     # Disable external provider

Or manually in ~/.hermes/config.yaml:

memory:
  provider: openviking  # or honcho, mem0, hindsight, holographic, retaindb, byterover, supermemory

Provider Comparison

| Provider | Storage | Cost | External Dependencies | Self-hostable | Unique Feature |
| --- | --- | --- | --- | --- | --- |
| Honcho | Cloud/Self-hosted | Paid/Free | LLM + embedding model + PostgreSQL/pgvector + Redis | Yes (Docker / K3s / Fly.io) | Dialectic user modeling + session-scoped context |
| OpenViking | Self-hosted | Free | LLM (VLM) + embedding model | Yes (local server; Ollama-native init wizard) | Filesystem hierarchy + tiered loading |
| Mem0 | Cloud/Self-hosted | Paid/Free OSS | LLM + embedding model + vector store (Qdrant or pgvector) | Yes (Docker Compose OSS; fully local possible) | Server-side LLM extraction |
| Hindsight | Cloud/Local | Free/Paid | LLM + bundled PostgreSQL + built-in embedder + built-in reranker | Yes (Docker or embedded Python; fully local with Ollama) | Knowledge graph + reflect synthesis |
| Holographic | Local | Free | None | Native (no infra required) | HRR algebra + trust scoring |
| RetainDB | Cloud | $20/mo | Cloud-managed (LLM + retrieval on RetainDB servers) | No | Delta compression |
| ByteRover | Local/Cloud | Free/Paid | LLM only (no embedding model, no DB) | Yes (local-first by default; Ollama supported) | File-based context tree; no embedding pipeline |
| Supermemory | Cloud | Paid | LLM + PostgreSQL/pgvector (enterprise Cloudflare deploy) | Enterprise plan only | Context fencing + session graph ingest |

Detailed Breakdown

Honcho

Best for: multi-agent systems, cross-session context, user-agent alignment.

Honcho runs alongside existing memory — USER.md stays as-is, and Honcho adds an additional layer of context. It models conversations as peers exchanging messages — one user peer plus one AI peer per Hermes profile, all sharing a workspace.

External dependencies: Honcho requires an LLM for session summarisation, user-representation derivation, and dialectic reasoning; an embedding model for semantic search across observations; PostgreSQL with the pgvector extension for vector storage; and Redis for caching. The managed cloud at api.honcho.dev handles all of this for you. For self-hosted deployments (Docker, K3s, or Fly.io), you supply your own credentials. The LLM slot accepts any OpenAI-compatible endpoint, including Ollama and vLLM, so inference can stay on-premises. The embedding slot defaults to openai/text-embedding-3-small but supports configurable providers via LLM_EMBEDDING_API_KEY and LLM_EMBEDDING_BASE_URL — any OpenAI-compatible embedding server works, including local options like vLLM with a BGE model.
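
For example, a self-hosted deployment could route embeddings to a local OpenAI-compatible server using just the two variables named above; the endpoint and placeholder value below are illustrative:

LLM_EMBEDDING_BASE_URL=http://localhost:8000/v1   # e.g. vLLM serving a BGE embedding model
LLM_EMBEDDING_API_KEY=unused-for-local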

Tools: honcho_profile (read/update peer card), honcho_search (semantic search), honcho_context (session context — summary, representation, card, messages), honcho_reasoning (LLM-synthesized), honcho_conclude (create/delete conclusions).

Key config knobs (sketch below):

  • contextCadence (default 1): Minimum turns between base layer refresh
  • dialecticCadence (default 2): Minimum turns between peer.chat() LLM calls (1-5 recommended)
  • dialecticDepth (default 1): .chat() passes per invocation (clamped 1-3)
  • recallMode (default ‘hybrid’): hybrid (auto+tools), context (inject only), tools (tools only)
  • writeFrequency (default ‘async’): Flush timing: async, turn, session, or integer N
  • observationMode (default ‘directional’): directional (all on) or unified (shared pool)
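
A minimal honcho.json sketch with these knobs spelled out (the flat key layout is an assumption based on the names above; the values shown are the listed defaults):

{
  "contextCadence": 1,
  "dialecticCadence": 2,
  "dialecticDepth": 1,
  "recallMode": "hybrid",
  "writeFrequency": "async",
  "observationMode": "directional"
}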

Architecture: Two-layer context injection — base layer (session summary + representation + peer card) + dialectic supplement (LLM reasoning). Automatically selects cold-start vs warm prompts.

Multi-peer mapping: The workspace is a shared environment across profiles. The user peer (peerName) is a single global human identity. There is one AI peer (aiPeer) per Hermes profile (hermes for the default profile, hermes.<profile> for others).

Setup:

hermes memory setup  # select "honcho"
# or legacy: hermes honcho setup

Config: $HERMES_HOME/honcho.json (profile-local) or ~/.honcho/config.json (global).

Profile management:

hermes profile create coder --clone  # Creates hermes.coder with shared workspace
hermes honcho sync                   # Backfills AI peers for existing profiles

OpenViking

Best for: self-hosted knowledge management with structured browsing.

OpenViking provides a filesystem hierarchy with tiered loading. It’s free, self-hosted, and gives you full control over your memory storage.

External dependencies: OpenViking requires a VLM (vision-language model) for semantic processing and memory extraction, and an embedding model for vector search — both are mandatory. Supported VLM providers include OpenAI, Anthropic, DeepSeek, Gemini, Moonshot, and vLLM (for local deployment). For embeddings, supported providers include OpenAI, Volcengine (Doubao), Jina, Voyage, and — via Ollama — any locally served embedding model. The interactive openviking-server init wizard can detect available RAM, recommend suitable Ollama models (e.g. Qwen3-Embedding 8B for embeddings, Gemma 4 27B for VLM), and configure everything automatically for a fully local, zero-API-key setup. No external database is required; OpenViking stores memory in the filesystem.

Tools: viking_search, viking_read (tiered), viking_browse, viking_remember, viking_add_resource.

Setup:

pip install openviking
openviking-server init   # interactive wizard (recommends Ollama models for local setup)
openviking-server
hermes memory setup  # select "openviking"
echo "OPENVIKING_ENDPOINT=http://localhost:1933" >> ~/.hermes/.env

Mem0

Best for: hands-off memory management with auto extraction.

Mem0 handles memory extraction server-side via an LLM call on every add operation — it reads the conversation, extracts discrete facts, deduplicates, and stores them. The managed cloud API handles all infrastructure. The open-source library and self-hosted server give you full control.

External dependencies: Mem0 requires an LLM for memory extraction (default: OpenAI gpt-4.1-nano; 20 providers supported, including Ollama, vLLM, and LM Studio for local models) and an embedding model for retrieval (default: OpenAI text-embedding-3-small; 10 providers supported, including Ollama and HuggingFace for local models). Storage uses Qdrant at /tmp/qdrant in library mode, or PostgreSQL with pgvector in self-hosted server mode — both can run locally. A fully local, zero-cloud Mem0 stack is achievable: Ollama for LLM, Ollama for embeddings, and a local Qdrant instance, all configured via Memory.from_config.
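
A sketch of that fully local stack with the open-source library; the model names and the embedding dimension are illustrative assumptions, while the overall config shape follows Mem0's documented from_config sections:

from mem0 import Memory

# everything local: Ollama serves the LLM and the embedder, Qdrant holds the vectors
config = {
    "llm": {"provider": "ollama", "config": {"model": "llama3.1:8b"}},
    "embedder": {"provider": "ollama", "config": {"model": "nomic-embed-text"}},
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333, "embedding_model_dims": 768},
    },
}

m = Memory.from_config(config)
m.add("Prefers concise answers and dark mode", user_id="hermes-user")
print(m.search("user preferences", user_id="hermes-user"))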

Tools: mem0_profile, mem0_search, mem0_conclude.

Setup:

pip install mem0ai
hermes memory setup  # select "mem0"
echo "MEM0_API_KEY=your-key" >> ~/.hermes/.env

Config: $HERMES_HOME/mem0.json (user_id: hermes-user, agent_id: hermes).

Hindsight

Best for: knowledge graph-based recall with entity relationships.

Hindsight builds a knowledge graph of your memory, extracting entities and relationships. Its unique reflect tool performs cross-memory synthesis — combining multiple memories into new insights. Recall runs four retrieval strategies in parallel (semantic, keyword/BM25, graph traversal, temporal), then merges and re-orders results using reciprocal rank fusion.

External dependencies: Hindsight requires an LLM for fact and entity extraction on retain calls, and for synthesis on reflect calls (default: OpenAI; supported providers include Anthropic, Gemini, Groq, Ollama, LM Studio, and any OpenAI-compatible endpoint). The embedding model and cross-encoder reranking model are bundled inside Hindsight itself — they run locally within the hindsight-all package and require no external API. PostgreSQL is also bundled with the embedded Python installation via a managed pg0 data directory; you can alternatively point Hindsight at an external PostgreSQL instance. For a fully local, zero-cloud setup, set HINDSIGHT_API_LLM_PROVIDER=ollama and point it at a local Ollama model — retain and recall work fully; reflect requires a tool-calling-capable model (e.g. qwen3:8b).
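
A zero-cloud sketch, assuming only the environment variable named above (the model pull is illustrative):

ollama pull qwen3:8b   # tool-calling model, so reflect works too
echo "HINDSIGHT_API_LLM_PROVIDER=ollama" >> ~/.hermes/.env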

Tools: hindsight_retain, hindsight_recall, hindsight_reflect (unique cross-memory synthesis).

Setup:

hermes memory setup  # select "hindsight"
echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env

Auto-installs hindsight-client (cloud) or hindsight-all (local). Requires >= 0.4.22.

Config: $HERMES_HOME/hindsight/config.json

  • mode: cloud or local
  • recall_budget: low / mid / high
  • memory_mode: hybrid / context / tools
  • auto_retain / auto_recall: true (default)

Local UI: hindsight-embed -p hermes ui start

Holographic

Best for: privacy-focused setups with local-only storage.

Holographic uses HRR (Holographic Reduced Representation) algebra for memory encoding, with trust scoring for memory reliability. No cloud dependency — everything runs locally on your own hardware.

External dependencies: None. Holographic requires no LLM, no embedding model, no database, and no network connection. Memory encoding is done entirely through HRR algebra running in-process, making it the only one of the eight providers that operates with zero external calls. The trade-off is that retrieval quality is lower than embedding-based semantic search, and there is no cross-memory synthesis like Hindsight's reflect. Where privacy and zero-dependency operation are non-negotiable, Holographic is the only option that delivers both unconditionally.
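
As a rough illustration of the underlying technique (generic HRR, not Holographic's actual implementation): binding a key to a value is circular convolution, and unbinding is correlation with the key's involution, so fixed-size vectors can hold superposed associations.

import numpy as np

def bind(a, b):
    # HRR binding: circular convolution via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(trace, key):
    # correlate with the involution of the key to approximately recover the value
    involution = np.concatenate(([key[0]], key[:0:-1]))
    return bind(trace, involution)

d = 1024
rng = np.random.default_rng(0)
key, value = rng.normal(0.0, 1.0 / np.sqrt(d), size=(2, d))
trace = bind(key, value)          # one stored association
recovered = unbind(trace, key)
cosine = np.dot(recovered, value) / (np.linalg.norm(recovered) * np.linalg.norm(value))
print(cosine)                     # well above chance, so the value is recoverable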

Tools: 2 tools for memory operations via HRR algebra.

Setup:

hermes memory setup  # select "holographic"

RetainDB

Best for: high-frequency updates with delta compression.

RetainDB uses delta compression to efficiently store memory updates and hybrid retrieval (vector + BM25 + reranking) to surface relevant context. It's cloud-based at $20/month, with all memory processing handled server-side.

External dependencies: RetainDB’s LLM calls, embedding pipeline, and reranking all run on RetainDB’s own cloud infrastructure — you supply only a RETAINDB_KEY. Memory extraction uses Claude Sonnet server-side. There is no self-hosting option and no local mode. All conversation data is sent to RetainDB servers for processing and storage. If data sovereignty or offline operation matters for your use case, this provider is not suitable.

Tools: retaindb_profile (user profile), retaindb_search (semantic search), retaindb_context (task-relevant context), retaindb_remember (store with type + importance), retaindb_forget (delete memories).

Setup:

hermes memory setup  # select "retaindb"
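
If the key is read from the shared env file like the other cloud providers (an assumption; the variable name comes from the paragraph above):

echo "RETAINDB_KEY=your-key" >> ~/.hermes/.env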

ByteRover

Best for: local-first memory with human-readable, auditable storage.

ByteRover stores memory as a structured markdown context tree — a hierarchy of domain, topic, and subtopic files — rather than embedding vectors or a database. An LLM reads source content, reasons about it, and places extracted knowledge into the right location in the hierarchy. Retrieval is MiniSearch full-text search with tiered fallback to LLM-powered search, with no vector database required.

External dependencies: ByteRover requires an LLM for memory curation and search (18 providers supported, including Anthropic, OpenAI, Google, Ollama, and any OpenAI-compatible endpoint via the openai-compatible provider slot). It requires no embedding model and no database — the context tree is a local directory of plain markdown files. Cloud sync is optional and used only for team collaboration; everything works fully offline by default. For a fully self-contained local setup, connect Ollama as the provider (brv providers connect openai-compatible --base-url http://localhost:11434/v1) and no data leaves your machine.
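
A fully offline sketch; the connect command is the one quoted above, while the model pull (and whether a model must be selected separately) is an assumption:

ollama pull qwen2.5:7b   # any local chat model for the curator LLM
brv providers connect openai-compatible --base-url http://localhost:11434/v1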

Tools: 3 tools for memory operations.

Setup:

hermes memory setup  # select "byterover"

Supermemory

Best for: enterprise workflows with context fencing and session graph ingest.

Supermemory provides context fencing (isolating memory by context) and session graph ingest (importing entire conversation histories). It automatically extracts memories, builds user profiles, and runs hybrid retrieval combining semantic and keyword search. The managed cloud API is the primary deployment target.

External dependencies: Supermemory’s cloud service handles all LLM inference and embedding server-side — you supply only a Supermemory API key. Self-hosting is available exclusively as an enterprise plan add-on and deploys to Cloudflare Workers; it requires you to provide PostgreSQL with the pgvector extension (for vector storage) and an OpenAI API key (mandatory, with Anthropic and Gemini as optional additions). There is no Docker-based or local self-hosting path — the architecture is tightly coupled to Cloudflare Workers edge compute. For users who need full data sovereignty without an enterprise contract, this provider is not the right choice.

Tools: 4 tools for memory operations.

Setup:

hermes memory setup  # select "supermemory"

How to Choose

  • Need multi-agent support? Honcho
  • Want self-hosted and free? OpenViking or Holographic
  • Want hands-off auto-extraction? Mem0
  • Want knowledge graphs? Hindsight
  • Want delta compression? RetainDB
  • Want a markdown context tree instead of a vector store? ByteRover
  • Want enterprise features? Supermemory
  • Want privacy (local only)? Holographic
  • Want fully local with zero external services? Holographic (no dependencies at all) or Hindsight/Mem0/ByteRover with Ollama
  • Want human-readable, auditable memory with no embedding pipeline? ByteRover

For full profile-by-profile provider configurations and real-world workflow patterns, see Hermes Agent production setup.

