Cognitive Reasoning Guide

Neural Memory's Cognitive Reasoning Layer lets agents form hypotheses, gather evidence, make predictions, and verify outcomes — a structured scientific reasoning loop built on top of the memory graph.

Quick Start

nmem_hypothesize → nmem_evidence → nmem_predict → nmem_verify
       ↑                                              |
       └──────── nmem_schema (evolve) ←───────────────┘

The core loop:

Hypothesize — Form a belief about something uncertain
Evidence — Add supporting or contradicting evidence
Predict — Make a falsifiable prediction based on a hypothesis
Verify — Check if the prediction was correct or wrong
Evolve — Update the hypothesis when your understanding changes

Supporting tools:

nmem_cognitive — Dashboard: hot index + calibration score
nmem_gaps — Track what you don't know
nmem_explain — Trace connections between concepts

Tool Reference

nmem_hypothesize

Create, list, or inspect hypotheses.

Actions:

Action	Parameters	Description
`create`	`content`, `confidence` (0.01-0.99, default 0.5), `priority` (0-10), `tags`	Create a new hypothesis
`list`	`status` (active/confirmed/refuted/superseded), `limit` (max 100)	List hypotheses
`get`	`hypothesis_id`	Get full state + all evidence

Example:

nmem_hypothesize(
  action="create",
  content="Redis session store is causing the 500ms latency spike on /api/users",
  confidence=0.6,
  tags=["performance", "redis", "api"]
)
# Returns: { hypothesis_id: "abc123", fiber_id: "...", neurons_created: 5 }

nmem_evidence

Add evidence for or against a hypothesis. Each piece of evidence is a real memory neuron linked via synapse.

Parameter	Required	Description
`hypothesis_id`	Yes	Which hypothesis to update
`content`	Yes	The evidence text
`type`	Yes	`"for"` or `"against"`
`weight`	No	Strength 0.1-1.0 (default 0.5)
`priority`, `tags`	No	Standard memory metadata

Returns: confidence_before, confidence_after, confidence_delta, evidence counts, and auto_resolved if an auto-resolution threshold was hit.

Example:

nmem_evidence(
  hypothesis_id="abc123",
  content="Redis SLOWLOG shows 450ms KEYS command during spike window",
  type="for",
  weight=0.8
)
# Returns: { confidence_before: 0.6, confidence_after: 0.696, confidence_delta: +0.096, ... }

nmem_predict

Make a falsifiable prediction, optionally linked to a hypothesis.

Action	Parameters	Description
`create`	`content`, `confidence` (default 0.7), `deadline` (ISO datetime), `hypothesis_id`	Create prediction
`list`	`status`, `limit`	List predictions with calibration stats
`get`	`prediction_id`	Get prediction details

Example:

nmem_predict(
  action="create",
  content="Replacing KEYS with SCAN will reduce p99 latency below 100ms",
  confidence=0.8,
  deadline="2026-03-15T00:00:00",
  hypothesis_id="abc123"
)

nmem_verify

Verify a prediction outcome. The result automatically propagates to the linked hypothesis, if there is one.

Parameter	Required	Description
`prediction_id`	Yes	Which prediction to verify
`outcome`	Yes	`"correct"` or `"wrong"`
`content`	No	Observation/evidence text

Propagation: If the prediction is linked to a hypothesis:

correct → adds evidence_for (weight=0.6) to the hypothesis
wrong → adds evidence_against (weight=0.6) to the hypothesis

Example:

nmem_verify(
  prediction_id="pred456",
  outcome="correct",
  content="After SCAN migration, p99 dropped to 45ms. Confirmed via Grafana dashboard."
)
# Returns: { calibration_score: 0.75, propagated_to_hypothesis: { id: "abc123", new_confidence: 0.78 } }

nmem_schema

Evolve hypotheses when your understanding changes.

Action	Parameters	Description
`evolve`	`hypothesis_id`, `content`, `confidence`, `reason`	Create new version, supersede old
`history`	`hypothesis_id`	Walk version chain
`compare`	`hypothesis_id`, `other_id`	Side-by-side comparison

Example:

nmem_schema(
  action="evolve",
  hypothesis_id="abc123",
  content="Latency spike is caused by KEYS + connection pool exhaustion together, not KEYS alone",
  reason="SCAN fix reduced latency but didn't eliminate spikes completely"
)
# Returns: { new_hypothesis_id: "def789", schema_version: 2, old_status: "superseded" }

nmem_cognitive

Dashboard view of your cognitive state.

Action	Description
`summary`	Hot index (top 20 items by urgency) + calibration + top gaps
`refresh`	Recompute hot index from scratch (O(n), use sparingly)

nmem_gaps

Track knowledge gaps — things you don't know.

Action	Parameters	Description
`detect`	`topic`, `source`, `priority`, `related_neuron_ids`	Register a gap
`list`	`include_resolved`, `limit`	List gaps (unresolved only, unless `include_resolved` is set)
`resolve`	`gap_id`, `resolved_by_neuron_id`	Mark gap as resolved
`get`	`gap_id`	Get gap details

Detection sources (with default priority):

Source	Priority	When to use
`contradicting_evidence`	0.8	Two pieces of evidence conflict
`low_confidence_hypothesis`	0.7	Hypothesis stuck at ~0.5
`user_flagged`	0.6	Agent or user explicitly marks unknown
`recall_miss`	0.5	Recall returned no results for a topic
`stale_schema`	0.4	Hypothesis hasn't been updated in a long time

Bayesian Confidence Formula

Confidence updates use a surprise-weighted Bayesian-inspired formula:

direction = +1.0 (evidence_for) or -1.0 (evidence_against)

surprise = (1.0 - confidence)    if direction > 0    # confirming strong belief = low surprise
           confidence             if direction < 0    # contradicting strong belief = high surprise

dampening = 1.0 / (1.0 + 0.1 * total_evidence_count)    # count before the new evidence is added

shift = direction * weight * surprise * dampening * 0.3

new_confidence = clamp(confidence + shift, 0.01, 0.99)

Key properties:

Property	Effect
Surprise weighting	Contradicting a strong belief moves confidence more than confirming it
Dampening	More evidence accumulated = smaller individual updates (posterior stability)
Soft scaling (0.3)	Prevents wild swings from a single piece of evidence
Bounds [0.01, 0.99]	Confidence never reaches 0 or 1 — always revisable

Worked example:

Starting: confidence=0.5, for=0, against=0

Add evidence_for (weight=0.7):
  surprise = 1.0 - 0.5 = 0.5
  dampening = 1.0 / (1.0 + 0.1 * 0) = 1.0
  shift = 1.0 * 0.7 * 0.5 * 1.0 * 0.3 = 0.105
  new_confidence = 0.605

Add evidence_for (weight=0.8):
  surprise = 1.0 - 0.605 = 0.395
  dampening = 1.0 / (1.0 + 0.1 * 1) = 0.909
  shift = 1.0 * 0.8 * 0.395 * 0.909 * 0.3 = 0.086
  new_confidence = 0.691

Add evidence_against (weight=0.9):
  surprise = 0.691 (contradicting strong belief = high surprise)
  dampening = 1.0 / (1.0 + 0.1 * 2) = 0.833
  shift = -1.0 * 0.9 * 0.691 * 0.833 * 0.3 = -0.155
  new_confidence = 0.536

Notice how the single piece of evidence against (weight=0.9) nearly undid both supporting pieces, pulling confidence from 0.691 back to 0.536. That is surprise weighting in action.
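
The formula and worked example above can be reproduced with a few lines of Python. This is a sketch written directly from the formula as documented here, not the library's source. The function name `update_confidence` is hypothetical, and `total_evidence_count` is taken to be the count before the new evidence is added, as in the worked example.

```python
def update_confidence(confidence, weight, evidence_type, total_evidence_count):
    """Apply one surprise-weighted update (hypothetical helper).

    evidence_type is "for" or "against"; total_evidence_count is the
    number of evidence items recorded before this one.
    """
    direction = 1.0 if evidence_type == "for" else -1.0
    # Confirming evidence surprises less the stronger the belief already is;
    # contradicting evidence surprises more.
    surprise = (1.0 - confidence) if direction > 0 else confidence
    dampening = 1.0 / (1.0 + 0.1 * total_evidence_count)
    shift = direction * weight * surprise * dampening * 0.3
    # Clamp so the hypothesis always stays revisable.
    return min(max(confidence + shift, 0.01), 0.99)


# Replay the worked example: two supporting items, then one contradicting item.
confidence = 0.5
steps = [("for", 0.7), ("for", 0.8), ("against", 0.9)]
for count, (kind, weight) in enumerate(steps):
    confidence = update_confidence(confidence, weight, kind, count)
    print(f"{kind:8s} weight={weight} -> {confidence:.3f}")
```

The three printed confidences are 0.605, 0.691, and 0.536, matching the worked example.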

Auto-Resolution

A hypothesis auto-resolves when both conditions are met:

Status	Confidence	Evidence count
`confirmed`	>= 0.9	`evidence_for` >= 3
`refuted`	<= 0.1	`evidence_against` >= 3

Requiring both prevents false positives from a single high-weight piece of evidence.
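
The two-condition rule can be expressed as a small check. This is a sketch under the thresholds listed in the table above; the function name `auto_resolve` and its return convention (`None` meaning "stay active") are assumptions made for illustration.

```python
def auto_resolve(confidence, evidence_for, evidence_against):
    """Return "confirmed", "refuted", or None if neither rule applies.

    Hypothetical helper: both the confidence threshold and the evidence
    count must be satisfied for a status change.
    """
    if confidence >= 0.9 and evidence_for >= 3:
        return "confirmed"
    if confidence <= 0.1 and evidence_against >= 3:
        return "refuted"
    return None


# High confidence alone is not enough: only two supporting items so far.
print(auto_resolve(0.93, evidence_for=2, evidence_against=0))
# Both conditions met.
print(auto_resolve(0.93, evidence_for=3, evidence_against=0))
```

The first call returns `None` because the evidence count is too low, while the second returns `"confirmed"`.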

Hypothesis Lifecycle

                    create
                      |
                      v
                   [active] ──── add evidence ────┐
                      |                           |
                      |                    (auto-resolve?)
                      |                     /          \
                      |              confirmed       refuted
                      |             (conf>=0.9      (conf<=0.1
                      |              & for>=3)       & against>=3)
                      |
                 schema evolve
                   /        \
            [superseded]   [new active v2]

Valid statuses: active, confirmed, refuted, superseded, pending, expired

Hot Index Scoring

The hot index ranks items by urgency. nmem_cognitive(action="summary") returns the top 20.

Hypothesis score:

confidence_interest = 1.0 - abs(confidence - 0.5) * 2.0    # Mid-confidence = most interesting
evidence_factor = min(evidence_count / 5.0, 1.0)            # More evidence = more developed
recency = 1.0 / (1.0 + age_days / 30.0)                     # Recent = more relevant

score = confidence_interest * 3 + evidence_factor * 4 + recency * 3
# Range: ~[0, 10]

Prediction score:

if overdue:    score = 10.0                                  # Overdue = most urgent
else:          score = 10.0 / (1.0 + days_until_deadline / 3.0)

Calibration score: correct_count / total_resolved (0.5 if no data).
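
The three scores above translate directly into Python. The sketch below follows the formulas as written in this section. The function names are hypothetical, and "overdue" is modelled as a negative `days_until_deadline`, which is an assumption about how the deadline is represented.

```python
def hypothesis_score(confidence, evidence_count, age_days):
    """Urgency score for a hypothesis (hypothetical helper)."""
    confidence_interest = 1.0 - abs(confidence - 0.5) * 2.0  # peaks at 0.5
    evidence_factor = min(evidence_count / 5.0, 1.0)         # saturates at 5 items
    recency = 1.0 / (1.0 + age_days / 30.0)                  # halves after 30 days
    return confidence_interest * 3 + evidence_factor * 4 + recency * 3


def prediction_score(days_until_deadline):
    """Urgency score for a prediction; negative days means overdue (assumption)."""
    if days_until_deadline < 0:
        return 10.0
    return 10.0 / (1.0 + days_until_deadline / 3.0)


def calibration_score(correct, wrong):
    """Share of resolved predictions that were correct; 0.5 with no data."""
    total_resolved = correct + wrong
    if total_resolved == 0:
        return 0.5
    return correct / total_resolved


print(hypothesis_score(confidence=0.5, evidence_count=5, age_days=0))
print(prediction_score(days_until_deadline=3))
print(round(calibration_score(correct=4, wrong=2), 2))
```

A brand-new, mid-confidence hypothesis with five or more pieces of evidence reaches the maximum of 10.0, a prediction due in three days scores 5.0, and four correct out of six resolved predictions gives a calibration of 0.67.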

End-to-End Examples

Example 1: Debugging a Performance Issue

# 1. Form hypothesis
nmem_hypothesize(
  action="create",
  content="The /api/orders endpoint is slow because of N+1 queries in the OrderSerializer",
  confidence=0.6,
  tags=["performance", "api", "database"]
)
# → hypothesis_id: "h1"

# 2. Add evidence supporting it
nmem_evidence(
  hypothesis_id="h1",
  content="Django Debug Toolbar shows 47 SQL queries for a single /api/orders?limit=10 request",
  type="for",
  weight=0.8
)
# → confidence: 0.6 → ≈0.696

# 3. Add evidence against it
nmem_evidence(
  hypothesis_id="h1",
  content="Adding select_related('customer') reduced queries to 12 but response time only improved by 15%",
  type="against",
  weight=0.6
)
# → confidence: ≈0.696 → ≈0.582

# 4. Make a testable prediction
nmem_predict(
  action="create",
  content="If I add prefetch_related for order_items and apply pagination, response time will drop below 200ms",
  confidence=0.7,
  hypothesis_id="h1",
  deadline="2026-03-10T00:00:00"
)
# → prediction_id: "p1"

# 5. Implement the fix, then verify
nmem_verify(
  prediction_id="p1",
  outcome="wrong",
  content="prefetch_related helped (350ms→180ms for small pages) but large pages still 800ms — the real bottleneck is JSON serialization of nested objects, not DB queries"
)
# → prediction refuted, hypothesis h1 gets evidence_against, confidence drops

# 6. Evolve the hypothesis with new understanding
nmem_schema(
  action="evolve",
  hypothesis_id="h1",
  content="The /api/orders slowness is primarily caused by deep JSON serialization of nested OrderItem objects, with N+1 queries as a secondary factor",
  reason="DB optimization helped but didn't solve it — profiling shows 60% time in serializer"
)
# → h1 superseded, new hypothesis h2 (version 2) created

# 7. Track what we still don't know
nmem_gaps(
  action="detect",
  topic="Best serialization strategy for deeply nested order data",
  source="low_confidence_hypothesis"
)

Example 2: Architecture Decision

# 1. Hypothesis about tech choice
nmem_hypothesize(
  action="create",
  content="Moving from REST to GraphQL will reduce mobile app data fetching by 60% because clients can request exactly the fields they need",
  confidence=0.5,
  tags=["architecture", "graphql", "mobile"]
)
# → hypothesis_id: "h_gql"

# 2. Research evidence
nmem_evidence(
  hypothesis_id="h_gql",
  content="Analyzed 50 mobile API calls — 38 of them fetch >5 unused fields, averaging 40% payload waste",
  type="for", weight=0.7
)

nmem_evidence(
  hypothesis_id="h_gql",
  content="GraphQL introduces N+1 at resolver level — DataLoader needed, adds complexity",
  type="against", weight=0.5
)

nmem_evidence(
  hypothesis_id="h_gql",
  content="Team has zero GraphQL experience — estimated 3-week learning curve from senior dev",
  type="against", weight=0.6
)

# 3. Check cognitive dashboard
nmem_cognitive(action="summary")
# → Shows h_gql with mid-confidence (still uncertain), 3 evidence pieces

# 4. Make prediction before committing
nmem_predict(
  action="create",
  content="A proof-of-concept GraphQL endpoint for /orders will show >=40% payload reduction in 2 days of work",
  confidence=0.7,
  hypothesis_id="h_gql",
  deadline="2026-03-12T00:00:00"
)

# 5. After PoC...
nmem_verify(
  prediction_id="...",
  outcome="correct",
  content="PoC showed 52% payload reduction. But took 4 days not 2 (DataLoader complexity)."
)
# → h_gql confidence increases, but we note the timeline was off

# 6. Evolve with nuance
nmem_schema(
  action="evolve",
  hypothesis_id="h_gql",
  content="GraphQL reduces payload by ~50% but implementation cost is 2x estimated due to DataLoader complexity — worth it only for high-traffic endpoints",
  reason="PoC confirmed payload savings but revealed hidden complexity cost"
)

Example 3: Tracking Prediction Accuracy

# After multiple predict/verify cycles, check calibration
nmem_predict(action="list")
# Returns:
# {
#   calibration: {
#     score: 0.67,         # You're right 67% of the time
#     correct: 4,
#     wrong: 2,
#     total_resolved: 6,
#     pending: 3
#   },
#   predictions: [...]
# }

# If calibration is low, register a gap
nmem_gaps(
  action="detect",
  topic="Improving prediction accuracy for performance estimates",
  source="user_flagged",
  priority=0.7
)

# Review all active cognitive items
nmem_cognitive(action="refresh")  # Recompute rankings
nmem_cognitive(action="summary")  # View dashboard

Synapse Types

The cognitive layer creates these synapse types automatically:

Synapse Type	Direction	Created by
`EVIDENCE_FOR`	evidence → hypothesis	`nmem_evidence(type="for")`
`EVIDENCE_AGAINST`	evidence → hypothesis	`nmem_evidence(type="against")`
`PREDICTED`	prediction → hypothesis	`nmem_predict(hypothesis_id=...)`
`VERIFIED_BY`	prediction → observation	`nmem_verify(outcome="correct")`
`FALSIFIED_BY`	prediction → observation	`nmem_verify(outcome="wrong")`
`SUPERSEDES`	new hypothesis → old	`nmem_schema(action="evolve")`

These synapses are traversable via nmem_explain — you can trace the full reasoning chain from evidence through hypothesis to prediction to verification.
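
To make the directions in the table concrete, here is a toy traversal over a hand-written edge list. It only illustrates how the synapse types connect nodes. It is not how nmem_explain is implemented, and the node ids, the triple representation, and the `reasoning_chain` function are all hypothetical.

```python
from collections import defaultdict

# (source, synapse_type, target) triples following the directions in the table.
SYNAPSES = [
    ("ev1", "EVIDENCE_FOR", "h1"),
    ("ev2", "EVIDENCE_AGAINST", "h1"),
    ("p1", "PREDICTED", "h1"),
    ("p1", "FALSIFIED_BY", "obs1"),
    ("h2", "SUPERSEDES", "h1"),
]


def reasoning_chain(hypothesis_id, synapses):
    """Group nodes attached to a hypothesis by synapse type.

    Predictions are followed one hop further to their observations.
    """
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for source, kind, target in synapses:
        incoming[target].append((kind, source))
        outgoing[source].append((kind, target))

    chain = defaultdict(list)
    for kind, source in incoming[hypothesis_id]:
        chain[kind].append(source)
        if kind == "PREDICTED":
            for out_kind, target in outgoing[source]:
                if out_kind in ("VERIFIED_BY", "FALSIFIED_BY"):
                    chain[out_kind].append(target)
    return dict(chain)


for kind, nodes in reasoning_chain("h1", SYNAPSES).items():
    print(kind, nodes)
```

For `h1` this groups the supporting and contradicting evidence, the linked prediction together with the observation that falsified it, and the newer hypothesis that superseded it.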

Best Practices

Start at 0.5 confidence unless you have prior knowledge. This gives equal room to move in either direction.
Use weight to express evidence strength. A log file showing exact error = weight 0.9. A hunch from a teammate = weight 0.3.
Make predictions falsifiable. "The app will be faster" is bad. "Response time will drop below 200ms after adding an index" is good.
Set deadlines on predictions. This creates urgency in the hot index and prevents forgotten predictions.
Evolve, don't delete. When a hypothesis is partially wrong, use nmem_schema(action="evolve") instead of creating a separate hypothesis from scratch. This preserves the reasoning chain.
Use nmem_gaps proactively. When you notice uncertainty, register it. Gaps surface in nmem_cognitive(action="summary") so they don't get forgotten.
Check calibration regularly. If your prediction accuracy is below 50%, you may be overconfident — lower your default confidence.
Refresh the hot index after a batch of evidence/verification updates. It's O(n) so don't call it after every single operation.