Reddit r/LocalLLaMA Post¶
Title: Open-source graph-based memory for AI coding agents — spreading activation instead of RAG (no API keys needed)
Body:
I've been working on an alternative approach to giving AI agents persistent memory. Instead of the usual RAG pipeline (embed → vector search → return chunks), I built a system that stores memories as a neural graph and retrieves them through spreading activation.
The problem with RAG for agent memory¶
RAG treats memory as a search problem: "find text similar to this query." It works, but it loses structure. When you ask "why did the outage happen?", RAG returns "JWT caused the outage" — but not why you chose JWT, who suggested it, or what it replaced.
Spreading activation approach¶
Neural Memory stores everything as typed neurons connected by typed synapses (CAUSED_BY, LEADS_TO, SUGGESTED_BY, RESOLVED_BY, etc.). Recall works by:
- Activate seed neurons matching your query
- Activation spreads through synapses (weighted, with decay)
- Most-activated neurons form the response context
This gives you multi-hop reasoning for free. "Why did the outage happen?" traces: outage ← CAUSED_BY ← JWT ← SUGGESTED_BY ← Alice ← DECIDED_AT ← Tuesday meeting.
No embedding API calls needed. Core recall is pure graph traversal with O(n) complexity on the local subgraph. Embeddings are optional (supports Ollama, sentence-transformers, Gemini free tier, OpenAI).
Technical details¶
- Storage: SQLite via aiosqlite (async), FTS5 for text search
- Graph: 14 neuron types, 24 synapse types, fiber bundles for episodic grouping
- Retrieval: Spreading activation with configurable decay, threshold, and max hops
- Consolidation: Memory lifecycle — decay, reinforcement, pruning of orphan nodes
- Extraction: Entity/keyword/temporal extraction, Vietnamese NLP support
- MCP server: 55 tools (incl. cognitive reasoning), stdio + HTTP transport, works with Claude Code, Cursor, etc.
- Tests: 3,976+, 67% coverage, CI with mypy + ruff + pytest
What makes it interesting for local LLM users¶
- Zero API cost: core recall doesn't need embeddings or LLM calls
- Ollama integration: if you want embeddings, use your local Ollama instance
- Works with any MCP client: not locked to Claude — any agent that speaks MCP
- Light: single SQLite file, ~23MB for 1000+ memories with full graph
Install¶
Config for Ollama embeddings:
# ~/.neuralmemory/config.toml
[embedding]
enabled = true
provider = "ollama"
model = "nomic-embed-text"
GitHub: https://github.com/nhadaututtheky/neural-memory Architecture docs: https://nhadaututtheky.github.io/neural-memory/
The codebase is MIT licensed. Contributions welcome — especially around alternative graph backends (Neo4j, FalkorDB adapters exist but are early).