# Cognitive Reasoning Guide
Neural Memory's Cognitive Reasoning Layer lets agents form hypotheses, gather evidence, make predictions, and verify outcomes — a structured scientific reasoning loop built on top of the memory graph.
## Quick Start

```text
nmem_hypothesize → nmem_evidence → nmem_predict → nmem_verify
       ↑                                              |
       └──────── nmem_schema (evolve) ←───────────────┘
```
The core loop:
- Hypothesize — Form a belief about something uncertain
- Evidence — Add supporting or contradicting evidence
- Predict — Make a falsifiable prediction based on a hypothesis
- Verify — Check if the prediction was correct or wrong
- Evolve — Update the hypothesis when your understanding changes
Supporting tools:
- nmem_cognitive — Dashboard: hot index + calibration score
- nmem_gaps — Track what you don't know
- nmem_explain — Trace connections between concepts
## Tool Reference

### nmem_hypothesize
Create, list, or inspect hypotheses.
Actions:
| Action | Parameters | Description |
|---|---|---|
| `create` | `content`, `confidence` (0.01-0.99, default 0.5), `priority` (0-10), `tags` | Create a new hypothesis |
| `list` | `status` (active/confirmed/refuted/superseded), `limit` (max 100) | List hypotheses |
| `get` | `hypothesis_id` | Get full state + all evidence |
Example:
```python
nmem_hypothesize(
    action="create",
    content="Redis session store is causing the 500ms latency spike on /api/users",
    confidence=0.6,
    tags=["performance", "redis", "api"]
)
# Returns: { hypothesis_id: "abc123", fiber_id: "...", neurons_created: 5 }
```
### nmem_evidence
Add evidence for or against a hypothesis. Each piece of evidence is a real memory neuron linked via synapse.
| Parameter | Required | Description |
|---|---|---|
| `hypothesis_id` | Yes | Which hypothesis to update |
| `content` | Yes | The evidence text |
| `type` | Yes | `"for"` or `"against"` |
| `weight` | No | Strength 0.1-1.0 (default 0.5) |
| `priority`, `tags` | No | Standard memory metadata |
Returns: confidence_before, confidence_after, confidence_delta, evidence counts, and auto_resolved if threshold was hit.
Example:
```python
nmem_evidence(
    hypothesis_id="abc123",
    content="Redis SLOWLOG shows 450ms KEYS command during spike window",
    type="for",
    weight=0.8
)
# Returns: { confidence_before: 0.6, confidence_after: 0.69, confidence_delta: +0.09, ... }
```
### nmem_predict
Make a falsifiable prediction, optionally linked to a hypothesis.
| Action | Parameters | Description |
|---|---|---|
| `create` | `content`, `confidence` (default 0.7), `deadline` (ISO datetime), `hypothesis_id` | Create a prediction |
| `list` | `status`, `limit` | List predictions with calibration stats |
| `get` | `prediction_id` | Get prediction details |
Example:
```python
nmem_predict(
    action="create",
    content="Replacing KEYS with SCAN will reduce p99 latency below 100ms",
    confidence=0.8,
    deadline="2026-03-15T00:00:00",
    hypothesis_id="abc123"
)
```
### nmem_verify
Verify a prediction outcome. Automatically propagates to linked hypothesis.
| Parameter | Required | Description |
|---|---|---|
| `prediction_id` | Yes | Which prediction to verify |
| `outcome` | Yes | `"correct"` or `"wrong"` |
| `content` | No | Observation/evidence text |
Propagation: If the prediction is linked to a hypothesis:
- correct → adds evidence_for (weight=0.6) to hypothesis
- wrong → adds evidence_against (weight=0.6) to hypothesis
Example:
```python
nmem_verify(
    prediction_id="pred456",
    outcome="correct",
    content="After SCAN migration, p99 dropped to 45ms. Confirmed via Grafana dashboard."
)
# Returns: { calibration_score: 0.75, propagated_to_hypothesis: { id: "abc123", new_confidence: 0.78 } }
```
### nmem_schema
Evolve hypotheses when your understanding changes.
| Action | Parameters | Description |
|---|---|---|
| `evolve` | `hypothesis_id`, `content`, `confidence`, `reason` | Create a new version, supersede the old |
| `history` | `hypothesis_id` | Walk the version chain |
| `compare` | `hypothesis_id`, `other_id` | Side-by-side comparison |
Example:
```python
nmem_schema(
    action="evolve",
    hypothesis_id="abc123",
    content="Latency spike is caused by KEYS + connection pool exhaustion together, not KEYS alone",
    reason="SCAN fix reduced latency but didn't eliminate spikes completely"
)
# Returns: { new_hypothesis_id: "def789", schema_version: 2, old_status: "superseded" }
```
### nmem_cognitive
Dashboard view of your cognitive state.
| Action | Description |
|---|---|
| `summary` | Hot index (top 20 items by urgency) + calibration + top gaps |
| `refresh` | Recompute hot index from scratch (O(n), use sparingly) |
### nmem_gaps
Track knowledge gaps — things you don't know.
| Action | Parameters | Description |
|---|---|---|
| `detect` | `topic`, `source`, `priority`, `related_neuron_ids` | Register a gap |
| `list` | `include_resolved`, `limit` | List unresolved gaps |
| `resolve` | `gap_id`, `resolved_by_neuron_id` | Mark a gap as resolved |
| `get` | `gap_id` | Get gap details |
Detection sources (with default priority):
| Source | Priority | When to use |
|---|---|---|
| `contradicting_evidence` | 0.8 | Two pieces of evidence conflict |
| `low_confidence_hypothesis` | 0.7 | A hypothesis is stuck near 0.5 |
| `user_flagged` | 0.6 | Agent or user explicitly marks an unknown |
| `recall_miss` | 0.5 | Recall returned no results for a topic |
| `stale_schema` | 0.4 | A hypothesis hasn't been updated in a long time |
## Bayesian Confidence Formula
Confidence updates use a surprise-weighted Bayesian-inspired formula:
```text
direction  = +1.0 (evidence_for) or -1.0 (evidence_against)
surprise   = 1.0 - confidence  if direction > 0   # confirming a strong belief = low surprise
           = confidence        if direction < 0   # contradicting a strong belief = high surprise
dampening  = 1.0 / (1.0 + 0.1 * total_evidence_count)
shift      = direction * weight * surprise * dampening * 0.3
new_confidence = clamp(confidence + shift, 0.01, 0.99)
```
Key properties:
| Property | Effect |
|---|---|
| Surprise weighting | Contradicting a strong belief moves confidence more than confirming it |
| Dampening | More evidence accumulated = smaller individual updates (posterior stability) |
| Soft scaling (0.3) | Prevents wild swings from single evidence |
| Bounds [0.01, 0.99] | Confidence never reaches 0 or 1 — always revisable |
Worked example:
```text
Starting: confidence=0.5, for=0, against=0

Add evidence_for (weight=0.7):
  surprise  = 1.0 - 0.5 = 0.5
  dampening = 1.0 / (1.0 + 0.1 * 0) = 1.0
  shift     = 1.0 * 0.7 * 0.5 * 1.0 * 0.3 = 0.105
  new_confidence = 0.605

Add evidence_for (weight=0.8):
  surprise  = 1.0 - 0.605 = 0.395
  dampening = 1.0 / (1.0 + 0.1 * 1) = 0.909
  shift     = 1.0 * 0.8 * 0.395 * 0.909 * 0.3 = 0.086
  new_confidence = 0.691

Add evidence_against (weight=0.9):
  surprise  = 0.691   # contradicting a strong belief = high surprise
  dampening = 1.0 / (1.0 + 0.1 * 2) = 0.833
  shift     = -1.0 * 0.9 * 0.691 * 0.833 * 0.3 = -0.155
  new_confidence = 0.536
```
Notice how the single against evidence (weight=0.9) nearly undid two for pieces — that's surprise weighting in action.
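The update rule can be sketched in Python. This is an illustrative reimplementation of the formula, not the library's actual code; it reproduces the worked-example numbers:

```python
def update_confidence(confidence, weight, evidence_type, total_evidence_count):
    """Surprise-weighted, dampened confidence update (illustrative sketch)."""
    direction = 1.0 if evidence_type == "for" else -1.0
    # Confirming a strong belief is unsurprising; contradicting it is surprising.
    surprise = (1.0 - confidence) if direction > 0 else confidence
    # Accumulated evidence dampens each individual update.
    dampening = 1.0 / (1.0 + 0.1 * total_evidence_count)
    shift = direction * weight * surprise * dampening * 0.3
    return min(max(confidence + shift, 0.01), 0.99)

# Reproduces the worked example:
c = update_confidence(0.5, 0.7, "for", 0)       # → 0.605
c = update_confidence(c, 0.8, "for", 1)         # → ≈0.691
c = update_confidence(c, 0.9, "against", 2)     # → ≈0.536
```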
## Auto-Resolution

A hypothesis auto-resolves when both conditions are met:

| Status | Confidence | Evidence count |
|---|---|---|
| `confirmed` | >= 0.9 | evidence_for >= 3 |
| `refuted` | <= 0.1 | evidence_against >= 3 |

Requiring both prevents false positives from a single high-weight piece of evidence.
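The dual-threshold check reduces to a few lines. This is an illustrative sketch of the rule stated above, not the actual library logic:

```python
def auto_resolve(confidence, evidence_for, evidence_against):
    """Return a resolved status, or None if the hypothesis stays active (sketch)."""
    if confidence >= 0.9 and evidence_for >= 3:
        return "confirmed"
    if confidence <= 0.1 and evidence_against >= 3:
        return "refuted"
    return None

auto_resolve(0.95, 3, 0)   # → "confirmed"
auto_resolve(0.95, 2, 0)   # → None (high confidence alone is not enough)
```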
## Hypothesis Lifecycle

```text
     create
       |
       v
   [active] ──── add evidence ────┐
       |                          |
       |                   (auto-resolve?)
       |                     /         \
       |              confirmed       refuted
       |             (conf >= 0.9    (conf <= 0.1
       |              & for >= 3)     & against >= 3)
       |
  schema evolve
     /      \
[superseded]  [new active v2]
```
Valid statuses: active, confirmed, refuted, superseded, pending, expired
## Hot Index Scoring
The hot index ranks items by urgency. nmem_cognitive(action="summary") returns the top 20.
Hypothesis score:
```text
confidence_interest = 1.0 - abs(confidence - 0.5) * 2.0   # Mid-confidence = most interesting
evidence_factor     = min(evidence_count / 5.0, 1.0)      # More evidence = more developed
recency             = 1.0 / (1.0 + age_days / 30.0)       # Recent = more relevant
score = confidence_interest * 3 + evidence_factor * 4 + recency * 3
# Range: ~[0, 10]
```
Prediction score:
```text
if overdue: score = 10.0                                  # Overdue = most urgent
else:       score = 10.0 / (1.0 + days_until_deadline / 3.0)
```
Calibration score: correct_count / total_resolved (0.5 if no data).
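The two scoring rules can be sketched in Python. This is an illustrative version of the formulas above; representing an overdue prediction as a negative `days_until_deadline` is an assumption of this sketch:

```python
def hypothesis_score(confidence, evidence_count, age_days):
    """Urgency score for a hypothesis, range roughly [0, 10] (sketch)."""
    confidence_interest = 1.0 - abs(confidence - 0.5) * 2.0  # mid-confidence = most interesting
    evidence_factor = min(evidence_count / 5.0, 1.0)         # more evidence = more developed
    recency = 1.0 / (1.0 + age_days / 30.0)                  # recent = more relevant
    return confidence_interest * 3 + evidence_factor * 4 + recency * 3

def prediction_score(days_until_deadline):
    """Urgency score for a prediction; negative days = overdue (sketch)."""
    if days_until_deadline < 0:
        return 10.0                                          # overdue = most urgent
    return 10.0 / (1.0 + days_until_deadline / 3.0)

hypothesis_score(0.5, 2, 0)   # → 7.6 (maximally uncertain, fresh, 2 pieces of evidence)
prediction_score(3)           # → 5.0 (deadline three days out)
```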
## End-to-End Examples

### Example 1: Debugging a Performance Issue
```python
# 1. Form hypothesis
nmem_hypothesize(
    action="create",
    content="The /api/orders endpoint is slow because of N+1 queries in the OrderSerializer",
    confidence=0.6,
    tags=["performance", "api", "database"]
)
# → hypothesis_id: "h1"

# 2. Add evidence supporting it
nmem_evidence(
    hypothesis_id="h1",
    content="Django Debug Toolbar shows 47 SQL queries for a single /api/orders?limit=10 request",
    type="for",
    weight=0.8
)
# → confidence: 0.6 → 0.69

# 3. Add evidence against it
nmem_evidence(
    hypothesis_id="h1",
    content="Adding select_related('customer') reduced queries to 12 but response time only improved by 15%",
    type="against",
    weight=0.6
)
# → confidence: 0.69 → 0.60

# 4. Make a testable prediction
nmem_predict(
    action="create",
    content="If I add prefetch_related for order_items and apply pagination, response time will drop below 200ms",
    confidence=0.7,
    hypothesis_id="h1",
    deadline="2026-03-10T00:00:00"
)
# → prediction_id: "p1"

# 5. Implement the fix, then verify
nmem_verify(
    prediction_id="p1",
    outcome="wrong",
    content="prefetch_related helped (350ms→180ms for small pages) but large pages still 800ms — the real bottleneck is JSON serialization of nested objects, not DB queries"
)
# → prediction refuted, hypothesis h1 gets evidence_against, confidence drops

# 6. Evolve the hypothesis with new understanding
nmem_schema(
    action="evolve",
    hypothesis_id="h1",
    content="The /api/orders slowness is primarily caused by deep JSON serialization of nested OrderItem objects, with N+1 queries as a secondary factor",
    reason="DB optimization helped but didn't solve it — profiling shows 60% time in serializer"
)
# → h1 superseded, new hypothesis h2 (version 2) created

# 7. Track what we still don't know
nmem_gaps(
    action="detect",
    topic="Best serialization strategy for deeply nested order data",
    source="low_confidence_hypothesis"
)
```
### Example 2: Architecture Decision
```python
# 1. Hypothesis about tech choice
nmem_hypothesize(
    action="create",
    content="Moving from REST to GraphQL will reduce mobile app data fetching by 60% because clients can request exactly the fields they need",
    confidence=0.5,
    tags=["architecture", "graphql", "mobile"]
)
# → hypothesis_id: "h_gql"

# 2. Research evidence
nmem_evidence(
    hypothesis_id="h_gql",
    content="Analyzed 50 mobile API calls — 38 of them fetch >5 unused fields, averaging 40% payload waste",
    type="for", weight=0.7
)
nmem_evidence(
    hypothesis_id="h_gql",
    content="GraphQL introduces N+1 at resolver level — DataLoader needed, adds complexity",
    type="against", weight=0.5
)
nmem_evidence(
    hypothesis_id="h_gql",
    content="Team has zero GraphQL experience — estimated 3-week learning curve from senior dev",
    type="against", weight=0.6
)

# 3. Check cognitive dashboard
nmem_cognitive(action="summary")
# → Shows h_gql with mid-confidence (still uncertain), 3 evidence pieces

# 4. Make prediction before committing
nmem_predict(
    action="create",
    content="A proof-of-concept GraphQL endpoint for /orders will show >=40% payload reduction in 2 days of work",
    confidence=0.7,
    hypothesis_id="h_gql",
    deadline="2026-03-12T00:00:00"
)

# 5. After PoC...
nmem_verify(
    prediction_id="...",
    outcome="correct",
    content="PoC showed 52% payload reduction. But took 4 days not 2 (DataLoader complexity)."
)
# → h_gql confidence increases, but we note the timeline was off

# 6. Evolve with nuance
nmem_schema(
    action="evolve",
    hypothesis_id="h_gql",
    content="GraphQL reduces payload by ~50% but implementation cost is 2x estimated due to DataLoader complexity — worth it only for high-traffic endpoints",
    reason="PoC confirmed payload savings but revealed hidden complexity cost"
)
```
### Example 3: Tracking Prediction Accuracy
```python
# After multiple predict/verify cycles, check calibration
nmem_predict(action="list")
# Returns:
# {
#   calibration: {
#     score: 0.67,        # You're right 67% of the time
#     correct: 4,
#     wrong: 2,
#     total_resolved: 6,
#     pending: 3
#   },
#   predictions: [...]
# }

# If calibration is low, register a gap
nmem_gaps(
    action="detect",
    topic="Improving prediction accuracy for performance estimates",
    source="user_flagged",
    priority=0.7
)

# Review all active cognitive items
nmem_cognitive(action="refresh")   # Recompute rankings
nmem_cognitive(action="summary")   # View dashboard
```
## Synapse Types
The cognitive layer creates these synapse types automatically:
| Synapse Type | Direction | Created by |
|---|---|---|
| `EVIDENCE_FOR` | evidence → hypothesis | `nmem_evidence(type="for")` |
| `EVIDENCE_AGAINST` | evidence → hypothesis | `nmem_evidence(type="against")` |
| `PREDICTED` | prediction → hypothesis | `nmem_predict(hypothesis_id=...)` |
| `VERIFIED_BY` | prediction → observation | `nmem_verify(outcome="correct")` |
| `FALSIFIED_BY` | prediction → observation | `nmem_verify(outcome="wrong")` |
| `SUPERSEDES` | new hypothesis → old | `nmem_schema(action="evolve")` |
These synapses are traversable via nmem_explain — you can trace the full reasoning chain from evidence through hypothesis to prediction to verification.
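As a toy illustration of what such a traversal does (this models synapses as plain tuples and is not the actual nmem_explain implementation), a reasoning chain can be collected by walking typed edges in both directions:

```python
# Toy synapse graph: (source, synapse_type, target) tuples. The node names
# here are hypothetical examples, not real neuron IDs.
synapses = [
    ("evidence:slowlog", "EVIDENCE_FOR", "hypothesis:h1"),
    ("prediction:p1", "PREDICTED", "hypothesis:h1"),
    ("prediction:p1", "VERIFIED_BY", "observation:grafana"),
]

def trace(start, edges):
    """Collect every edge reachable from `start`, following links both ways."""
    chain, frontier, seen = [], [start], {start}
    while frontier:
        current = frontier.pop()
        for src, kind, dst in edges:
            if current in (src, dst):
                other = dst if current == src else src
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
                    chain.append((src, kind, dst))
    return chain

chain = trace("hypothesis:h1", synapses)
# Finds all three edges: evidence → hypothesis → prediction → observation
```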
## Best Practices
- Start at 0.5 confidence unless you have prior knowledge. This gives equal room to move in either direction.
- Use `weight` to express evidence strength. A log file showing the exact error = weight 0.9. A hunch from a teammate = weight 0.3.
- Make predictions falsifiable. "The app will be faster" is bad. "Response time will drop below 200ms after adding an index" is good.
- Set deadlines on predictions. This creates urgency in the hot index and prevents forgotten predictions.
- Evolve, don't delete. When a hypothesis is partially wrong, use `nmem_schema(evolve)` instead of creating a new one. This preserves the reasoning chain.
- Use `nmem_gaps` proactively. When you notice uncertainty, register it. Gaps surface in `nmem_cognitive(summary)` so they don't get forgotten.
- Check calibration regularly. If your prediction accuracy is below 50%, you may be overconfident — lower your default confidence.
- Refresh the hot index after a batch of evidence/verification updates. It's O(n), so don't call it after every single operation.