mcp-name: io.github.n24q02m/mnemo-mcp
Persistent AI memory with hybrid search and embedded sync. Open, free, unlimited.
- Hybrid search: FTS5 full-text + sqlite-vec semantic + reranking for precision
- Reranking: Dual-backend — Jina/Cohere cloud or Qwen3 local cross-encoder
- Knowledge graph: Automatic entity extraction and relation tracking across memories
- Importance scoring: LLM-scored 0.0-1.0 per memory for smarter retrieval
- Auto-archive: Configurable age + importance threshold to keep memory clean
- STM-to-LTM consolidation: LLM summarization of related memories in a category
- Duplicate detection: Warns before adding semantically similar memories
- Configurable temporal decay: Tune recency bias via `RECENCY_HALF_LIFE_DAYS`
- Zero config mode: Works out of the box with local embedding + reranking, no API keys needed
- Auto-detect providers: Set `API_KEYS` for cloud embedding/reranking, auto-fallback to local
- Embedded sync: rclone auto-downloaded and managed as a subprocess
- Multi-machine: JSONL-based merge sync via rclone (Google Drive, S3, etc.)
- Proactive memory: Tool descriptions guide AI to save preferences, decisions, facts
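The multi-machine feature merges JSONL snapshots from several machines. The exact merge rules aren't documented here; the sketch below assumes a last-write-wins merge keyed by memory id, and that each JSONL line carries `id` and `updated_at` fields (both assumptions, not confirmed by the project):

```python
import json

def merge_jsonl(local_lines: list[str], remote_lines: list[str]) -> list[str]:
    """Last-write-wins merge: keep the newest version of each memory by id."""
    merged: dict[str, dict] = {}
    for line in local_lines + remote_lines:
        rec = json.loads(line)
        prev = merged.get(rec["id"])
        if prev is None or rec["updated_at"] > prev["updated_at"]:
            merged[rec["id"]] = rec
    return [json.dumps(rec, sort_keys=True) for rec in merged.values()]

local = ['{"id": "m1", "content": "old", "updated_at": "2025-01-01T00:00:00Z"}']
remote = ['{"id": "m1", "content": "new", "updated_at": "2025-02-01T00:00:00Z"}']
merged = merge_jsonl(local, remote)  # keeps the newer remote version of m1
```

ISO 8601 timestamps in UTC compare correctly as strings, which keeps the sketch dependency-free.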
The recommended way to run this server is via uvx:

```shell
uvx mnemo-mcp@latest
```

Alternatively, you can use `pipx run mnemo-mcp`.
To run via Docker instead, use the following MCP client config:

```jsonc
{
  "mcpServers": {
    "mnemo": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-mnemo",
        "-v", "mnemo-data:/data",  // persists memories across restarts
        "-e", "LITELLM_PROXY_URL", // optional: pass-through from env below
        "-e", "LITELLM_PROXY_KEY", // optional: pass-through from env below
        "-e", "API_KEYS",          // optional: pass-through from env below
        "-e", "SYNC_ENABLED",      // optional: pass-through from env below
        "-e", "SYNC_INTERVAL",     // optional: pass-through from env below
        "n24q02m/mnemo-mcp:latest"
      ],
      "env": {
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",

        // -- optional: cloud embedding + reranking (Jina > Gemini > OpenAI > Cohere)
        // -- without this, uses built-in local Qwen3 ONNX models (CPU)
        "API_KEYS": "JINA_AI_API_KEY:jina_...",

        // -- optional: sync memories across machines via rclone
        "SYNC_ENABLED": "true", // optional, default: false
        "SYNC_INTERVAL": "300"  // optional, auto-sync every 5min (0 = manual only)
      }
    }
  }
}
```

Pre-download dependencies before adding to your MCP client config. This avoids slow first-run startup:
```shell
# Pre-download embedding model (~570MB) and validate API keys
uvx mnemo-mcp warmup

# With cloud embedding (validates API key, skips local download if cloud works)
API_KEYS="JINA_AI_API_KEY:jina_..." uvx mnemo-mcp warmup
```

Sync is fully automatic. Just set `SYNC_ENABLED=true` and the server handles everything:
- First sync: rclone is auto-downloaded, a browser opens for OAuth authentication
- Token saved: OAuth token is stored locally at `~/.mnemo-mcp/tokens/` (600 permissions)
- Subsequent runs: Token is loaded automatically, no manual steps needed
For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:
```jsonc
{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "dropbox", // rclone provider type
  "SYNC_REMOTE": "dropbox"    // rclone remote name
}
```

Advanced: You can also run `uvx mnemo-mcp setup-sync drive` to pre-authenticate before first use, but this is optional.
| Variable | Default | Description |
|---|---|---|
| `DB_PATH` | `~/.mnemo-mcp/memories.db` | Database location |
| `LITELLM_PROXY_URL` | — | LiteLLM Proxy URL (e.g. `http://10.0.0.20:4000`). Enables proxy mode |
| `LITELLM_PROXY_KEY` | — | LiteLLM Proxy virtual key (e.g. `sk-...`) |
| `API_KEYS` | — | API keys (`ENV:key,ENV:key`). Enables cloud embedding + reranking (SDK mode) |
| `EMBEDDING_BACKEND` | (auto-detect) | `litellm` (cloud API) or `local` (Qwen3). Auto: `API_KEYS` -> `litellm`, else `local` |
| `EMBEDDING_MODEL` | (auto-detect) | LiteLLM model name (optional) |
| `EMBEDDING_DIMS` | `0` (auto=768) | Embedding dimensions (0 = auto-detect, default 768) |
| `RERANK_ENABLED` | `true` | Enable reranking (improves search precision) |
| `RERANK_BACKEND` | (auto-detect) | `litellm` (cloud) or `local` (Qwen3). Auto: `API_KEYS` -> `litellm`, else `local` |
| `RERANK_MODEL` | (auto-detect) | LiteLLM reranker model name (optional) |
| `ARCHIVE_ENABLED` | `true` | Enable auto-archiving of old low-importance memories |
| `ARCHIVE_AFTER_DAYS` | `90` | Days before a memory is eligible for auto-archive |
| `ARCHIVE_IMPORTANCE_THRESHOLD` | `0.3` | Memories below this importance score are auto-archived |
| `DEDUP_THRESHOLD` | `0.9` | Similarity threshold to block duplicate memories |
| `DEDUP_WARN_THRESHOLD` | `0.7` | Similarity threshold to warn about similar memories |
| `RECENCY_HALF_LIFE_DAYS` | `7` | Half-life for temporal decay in search scoring |
| `LLM_MODELS` | `gemini/gemini-3-flash-preview` | LLM model for graph extraction, importance scoring, consolidation |
| `SYNC_ENABLED` | `false` | Enable rclone sync |
| `SYNC_PROVIDER` | `drive` | rclone provider type (`drive`, `dropbox`, `s3`, etc.) |
| `SYNC_REMOTE` | `gdrive` | rclone remote name |
| `SYNC_FOLDER` | `mnemo-mcp` | Remote folder |
| `SYNC_INTERVAL` | `300` | Auto-sync interval in seconds (0 = manual) |
| `LOG_LEVEL` | `INFO` | Log level |
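`RECENCY_HALF_LIFE_DAYS` tunes how strongly search favors recent memories. The exact scoring formula isn't documented here; a minimal sketch of a standard exponential half-life decay, where a memory loses half its recency weight every half-life:

```python
def recency_weight(age_days: float, half_life_days: float = 7.0) -> float:
    """Exponential decay: weight halves every `half_life_days` days."""
    return 0.5 ** (age_days / half_life_days)

# With the default 7-day half-life:
recency_weight(0.0)   # a fresh memory gets full weight (1.0)
recency_weight(7.0)   # one half-life old -> 0.5
recency_weight(14.0)  # two half-lives old -> 0.25
```

Raising `RECENCY_HALF_LIFE_DAYS` flattens the curve, so older memories stay competitive in search results for longer.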
Embedding and reranking are always available — local models are built-in and require no configuration.
Both embedding and reranking support two modes, resolved by priority:

| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy / SDK | `LITELLM_PROXY_URL` + `LITELLM_PROXY_KEY` or `API_KEYS` | Production or dev with cloud APIs |
| 2 | Local | Nothing needed | Offline, always available as fallback |
No cross-mode fallback — if proxy is configured but unreachable, calls fail (no silent fallback to direct API).
- Local mode: Qwen3-Embedding-0.6B + Qwen3-Reranker-0.6B, always available with zero config.
- GPU auto-detection: If a GPU is available (CUDA/DirectML) and `llama-cpp-python` is installed, GGUF models are used instead of ONNX for better performance.
- All embeddings stored at 768 dims (default). Switching providers never breaks the vector table.
- Override with `EMBEDDING_BACKEND=local` or `RERANK_BACKEND=local` to force local even with API keys.
`API_KEYS` supports multiple providers in a single string:

```shell
API_KEYS=JINA_AI_API_KEY:jina_...,GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...,COHERE_API_KEY:co-...
```
Jina AI is the recommended provider: a single `JINA_AI_API_KEY` enables both embedding (`jina-embeddings-v5-text-small`) and reranking (`jina-reranker-v3`), giving you the best search quality with one key.
Cloud embedding providers (auto-detected from API_KEYS, priority order):
| Priority | Env Var | Model | Native Dims | Stored |
|---|---|---|---|---|
| 1 | `JINA_AI_API_KEY` | `jina-embeddings-v5-text-small` | 1024 | 768 |
| 2 | `GEMINI_API_KEY` | `gemini/gemini-embedding-001` | 3072 | 768 |
| 3 | `OPENAI_API_KEY` | `text-embedding-3-large` | 3072 | 768 |
| 4 | `COHERE_API_KEY` | `embed-multilingual-v3.0` | 1024 | 768 |
Cloud reranking providers (auto-detected from API_KEYS, priority order):
| Priority | Env Var | Model |
|---|---|---|
| 1 | `JINA_AI_API_KEY` | `jina-reranker-v3` |
| 2 | `COHERE_API_KEY` | `rerank-multilingual-v3.0` |
All embeddings are truncated to 768 dims (default) for storage. This ensures switching models never breaks the vector table. Override with EMBEDDING_DIMS if needed.
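Since providers emit different native dimensions (1024 or 3072) but storage is fixed at 768, some dimensionality reduction happens at write time. A sketch of one common approach, head-truncation plus L2 re-normalization (Matryoshka-style); this is an assumption about the mechanism, not necessarily the project's actual code:

```python
import math

def truncate_embedding(vec: list[float], dims: int = 768) -> list[float]:
    """Keep the first `dims` components and re-normalize to unit length,
    so cosine similarity remains meaningful after truncation."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard against zero vectors
    return [x / norm for x in head]

v = truncate_embedding([0.1] * 1024)  # e.g. a 1024-dim Jina embedding
len(v)  # 768, ready for the fixed-width sqlite-vec table
```

Because every vector is stored at the same width and unit length, rows embedded by different providers can coexist in one vector table.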
API_KEYS format maps your env var to LiteLLM's expected var (e.g., GOOGLE_API_KEY:key auto-sets GEMINI_API_KEY). Set EMBEDDING_MODEL or RERANK_MODEL explicitly for other providers.
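A sketch of how such an `ENV:key,ENV:key` string could be parsed into environment-variable assignments; the alias table is an assumption extrapolated from the single `GOOGLE_API_KEY` -> `GEMINI_API_KEY` example above:

```python
def parse_api_keys(raw: str) -> dict[str, str]:
    """Parse 'ENV:key,ENV:key' into {env_var: key} assignments.

    Assumption: GOOGLE_API_KEY is the only alias remapped for LiteLLM.
    """
    aliases = {"GOOGLE_API_KEY": "GEMINI_API_KEY"}
    keys: dict[str, str] = {}
    for pair in raw.split(","):
        env, _, value = pair.strip().partition(":")  # split on the first ':' only
        keys[aliases.get(env, env)] = value
    return keys

parse_api_keys("JINA_AI_API_KEY:jina_x,GOOGLE_API_KEY:AIzaY")
# {'JINA_AI_API_KEY': 'jina_x', 'GEMINI_API_KEY': 'AIzaY'}
```

Splitting on the first `:` only keeps keys that themselves contain colons intact.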
| Action | Required | Optional |
|---|---|---|
| `add` | `content` | `category`, `tags` |
| `search` | `query` | `category`, `tags`, `limit` |
| `list` | — | `category`, `limit` |
| `update` | `memory_id` | `content`, `category`, `tags` |
| `delete` | `memory_id` | — |
| `export` | — | — |
| `import` | `data` (JSONL) | `mode` (merge/replace) |
| `stats` | — | — |
| `restore` | `memory_id` | — |
| `archived` | — | `limit` |
| `consolidate` | `category` | — |
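The actions above are dispatched through the single `memory` tool. An illustrative argument payload for an `add` call; the exact wire shape beyond the parameter names in the table is an assumption:

```json
{
  "action": "add",
  "content": "User prefers TypeScript with strict mode enabled",
  "category": "preferences",
  "tags": ["typescript", "coding-style"]
}
```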
| Action | Required | Optional |
|---|---|---|
| `status` | — | — |
| `sync` | — | — |
| `set` | `key`, `value` | — |
```python
help(topic="memory")  # or "config"
```
| URI | Description |
|---|---|
| `mnemo://stats` | Database statistics and server status |
| `mnemo://recent` | 10 most recently updated memories |
| Prompt | Parameters | Description |
|---|---|---|
| `save_summary` | `summary` | Generate a prompt to save a conversation summary as a memory |
| `recall_context` | `topic` | Generate a prompt to recall relevant memories about a topic |
```
MCP Client (Claude, Cursor, etc.)
              |
       FastMCP Server
        /     |     \
   memory  config   help
      |       |       |
 MemoryDB  Settings  docs/
  /   |   \
FTS5 sqlite-vec KnowledgeGraph
        |
EmbeddingBackend          LLM (entity extraction,
   /        \              importance, consolidation)
LiteLLM   Qwen3 ONNX
   |      (local CPU)     RerankerBackend
Jina / Gemini /             /        \
OpenAI / Cohere         LiteLLM   Qwen3 ONNX
                      Jina/Cohere (local CPU)

Sync: rclone (embedded) -> Google Drive / S3 / ...
```
```shell
# Install
uv sync

# Run
uv run mnemo-mcp

# Lint
uv run ruff check src/
uv run ty check src/

# Test
uv run pytest
```

| Server | Description | Install |
|---|---|---|
| better-notion-mcp | Notion API for AI agents | `npx -y @n24q02m/better-notion-mcp@latest` |
| wet-mcp | Web search, content extraction, library docs | `uvx --python 3.13 wet-mcp@latest` |
| better-email-mcp | Email (IMAP/SMTP) for AI agents | `npx -y @n24q02m/better-email-mcp@latest` |
| better-godot-mcp | Godot Engine for AI agents | `npx -y @n24q02m/better-godot-mcp@latest` |
| better-telegram-mcp | Telegram Bot API + MTProto for AI agents | `uvx --python 3.13 better-telegram-mcp@latest` |
- modalcom-ai-workers — GPU-accelerated AI workers on Modal.com (embedding, reranking)
- qwen3-embed — Local embedding/reranking library used by mnemo-mcp
See CONTRIBUTING.md
MIT - See LICENSE
Full uvx-based MCP client config:

```jsonc
{
  "mcpServers": {
    "mnemo": {
      "command": "uvx",
      "args": ["mnemo-mcp@latest"],
      "env": {
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",

        // -- optional: cloud embedding + reranking (Jina > Gemini > OpenAI > Cohere)
        // -- without this, uses built-in local Qwen3 ONNX models (CPU)
        // -- first run downloads ~570MB model per backend, cached for subsequent runs
        // -- Jina AI recommended: single key for both embedding and reranking
        "API_KEYS": "JINA_AI_API_KEY:jina_...",

        // -- optional: sync memories across machines via rclone
        // -- on first sync, a browser opens for OAuth (auto, no manual setup)
        "SYNC_ENABLED": "true", // optional, default: false
        "SYNC_INTERVAL": "300"  // optional, auto-sync every 5min (0 = manual only)
        // "SYNC_REMOTE": "gdrive",  // optional, default: gdrive
        // "SYNC_PROVIDER": "drive", // optional, default: drive (Google Drive)
      }
    }
  }
}
```