LocalDiplomacy/MEMORY_SYSTEM_PLAN.md

# LocalDiplomacy Memory System Plan

## Goal

LocalDiplomacy should support deep NPC roleplay, persistent save-specific memory, world events, lore awareness, backstories, and background NPC activity while remaining usable with local 4B-32B models.

The core design rule is:

```text
Store the world outside the model. Retrieve only the tiny slice needed for the current turn.
```

The C# Bannerlord mod should stay focused on game integration:

- collect current game state
- send conversation/world/action-result packets to Python
- receive assistant text and proposed game actions
- validate and execute game actions

The Python agent should own:

- memory storage
- save/playthrough scoping
- lore indexing
- NPC profiles and generated backstories
- world events
- task records
- background debate summaries
- prompt dossier construction
- semantic retrieval

## Architecture

Use both SQLite and Qdrant.

```text
SQLite = source of truth
Qdrant = semantic search index
Ollama embedding model = turns text into searchable vectors
Ollama dialogue model = roleplay and reasoning over retrieved context
```

SQLite remains mandatory because it is easy to inspect, migrate, back up, and query exactly. Qdrant should be rebuildable from SQLite at any time.

Qdrant should never be the only copy of important data.
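
If SQLite is the source of truth, a rebuild can be a plain loop over its rows. The following is a minimal sketch, not the repo's actual `MemoryStore` code. It assumes a `memories` table with `id`, `save_id`, and `text` columns; `embed` and `upsert` are hypothetical callables standing in for the embedding model and the vector index wrapper.

```python
import sqlite3
from typing import Callable, Sequence


def rebuild_index(
    conn: sqlite3.Connection,
    embed: Callable[[str], Sequence[float]],
    upsert: Callable[[int, Sequence[float], dict], None],
) -> int:
    """Re-embed every memory row and push it into the vector index.

    Returns the number of rows that were re-indexed.
    """
    count = 0
    rows = conn.execute("SELECT id, save_id, text FROM memories ORDER BY id").fetchall()
    for row_id, save_id, text in rows:
        # The payload carries just enough metadata to filter and to find the row again.
        payload = {"sqlite_table": "memories", "sqlite_id": row_id, "save_id": save_id}
        upsert(row_id, embed(text), payload)
        count += 1
    return count
```

Because the function only reads from SQLite, deleting the vector index and calling it again should restore the index without touching canonical data.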

## Qdrant Operating Modes

LocalDiplomacy should not require Docker for normal users.

Support these modes:

```text
disabled
embedded
managed_server
```

### disabled

Use SQLite and SQLite FTS5 only.

This mode is useful for early development, tests, and users who want the simplest possible install.

### embedded

Use Qdrant through the Python client local mode:

```python
from qdrant_client import QdrantClient
client = QdrantClient(path="./data/qdrant")  # local on-disk mode, no server process
```

This should become the default vector mode. It persists the vector index to disk without a separate Qdrant server process.

Benefits:

- no Docker required
- no extra server setup
- easy mod install story
- good enough for many local campaigns

SQLite remains the source of truth. The embedded Qdrant index can be deleted and rebuilt from SQLite.

### managed_server

Python starts and supervises a bundled or user-installed `qdrant.exe` process.

This mode is for larger campaigns or heavier background simulation where a real Qdrant server process is useful.

Responsibilities:

- check whether Qdrant is already reachable on the configured host/port
- start `qdrant.exe` when `autostart` is enabled
- pass a local storage/config path
- wait for health check success
- stop the child process when the Python agent exits
- fall back to embedded or SQLite-only mode if configured to do so

This mode should still be local-first and should not require Docker.
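
The fallback rule across the three modes can be expressed as a small pure function. This is a sketch under assumptions: `resolve_vector_mode` is a hypothetical name, and `server_reachable` is assumed to be the result of whatever health check the process manager performs.

```python
VALID_MODES = ("disabled", "embedded", "managed_server")
VALID_FALLBACKS = ("embedded", "disabled")


def resolve_vector_mode(requested: str, server_reachable: bool, fallback: str = "embedded") -> str:
    """Return the mode the agent should actually run in."""
    if requested not in VALID_MODES:
        raise ValueError(f"unknown vector_index mode: {requested!r}")
    if fallback not in VALID_FALLBACKS:
        raise ValueError(f"unknown fallback mode: {fallback!r}")
    # Only managed_server depends on an external process, so only it can fall back.
    if requested == "managed_server" and not server_reachable:
        return fallback
    return requested
```

Keeping this decision separate from process management makes the fallback behavior easy to test without starting `qdrant.exe`.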

## Data Ownership

### C# Mod

The C# side should not manage AI memory. It should send enough facts for Python to update memory and make decisions.

Responsibilities:

- send `campaign_id`, `save_id`, current day, player, NPC, scene, nearby parties, settlements, kingdom state, and recent game diffs
- execute validated `GameAction` proposals
- report action results back to Python
- send world ticks and important state changes

### Python Agent

The Python agent should be an always-on local service.

Responsibilities:

- persist all AI memory under the current `save_id`
- retrieve relevant facts before model calls
- import and index world lore markdown files
- generate first-meeting NPC profiles
- summarize conversations and background debates
- decide what memories should be stored
- return compact prompts to local models

## Save-Scoped Storage

All playthrough-specific records must include:

```text
save_id
campaign_id
```

This prevents one campaign's Derthert, Caladog, or custom mod NPC from leaking into another playthrough.

Suggested SQLite path:

```text
data/localdiplomacy.sqlite3
```
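
Save scoping is easiest to enforce when every read helper requires both IDs. The sketch below uses a trimmed-down `memories` table (only a few of the planned columns) and hypothetical helper names to show the isolation rule.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create a reduced memories table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY,
               save_id TEXT NOT NULL,
               campaign_id TEXT NOT NULL,
               subject_character_id TEXT NOT NULL,
               text TEXT NOT NULL
           )"""
    )
    return conn


def add_memory(conn, save_id: str, campaign_id: str, character_id: str, text: str) -> int:
    cur = conn.execute(
        "INSERT INTO memories (save_id, campaign_id, subject_character_id, text) "
        "VALUES (?, ?, ?, ?)",
        (save_id, campaign_id, character_id, text),
    )
    conn.commit()
    return cur.lastrowid


def memories_for(conn, save_id: str, campaign_id: str, character_id: str) -> list:
    """Return memory texts for one character, restricted to one playthrough."""
    rows = conn.execute(
        "SELECT text FROM memories "
        "WHERE save_id = ? AND campaign_id = ? AND subject_character_id = ? "
        "ORDER BY id",
        (save_id, campaign_id, character_id),
    )
    return [row[0] for row in rows]
```

With this shape, a query for `lord_derthert` in one save cannot return rows written under another save, even though the `character_id` is identical.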

## SQLite Schema Plan

### saves

Tracks known playthroughs.

```text
id
save_id
campaign_id
name
mod_profile
active_lore_source_id
created_at
last_seen_at
metadata_json
```

### characters

Stores known game characters for a save.

```text
id
save_id
campaign_id
character_id
name
clan_id
kingdom_id
culture_id
occupation
traits_json
last_seen_day
last_seen_at
metadata_json
```

### npc_profiles

Stores generated and evolving NPC identity.

```text
id
save_id
campaign_id
character_id
backstory
personality_json
speech_style
goals_json
fears_json
loyalties_json
relationship_to_player_json
known_history_summary
created_day
updated_day
created_at
updated_at
```

### memories

Stores durable character/world facts.

```text
id
save_id
campaign_id
subject_character_id
related_character_id
player_id
kingdom_id
location_id
category
importance
confidence
visibility
text
summary
tags_json
created_day
created_at
last_accessed_at
qdrant_point_id
metadata_json
```

Memory categories should include:

```text
conversation
promise
secret
known_info
relationship
event
personality
backstory
speech_pattern
romance
death_history
visit
mentioned_entity
lie_detection
debate
task
```

### world_events

Stores objective, rumored, or localized world events.

```text
id
save_id
campaign_id
event_type
title
summary
location_id
actor_character_id
target_character_id
actor_faction_id
target_faction_id
importance
visibility
known_by_character_id
known_by_faction_id
created_day
expires_day
created_at
updated_at
qdrant_point_id
metadata_json
```

Visibility examples:

```text
private
local
faction
global
rumor
```

### tasks

Stores NPC commitments and ongoing assignments.

```text
id
save_id
campaign_id
task_id
assignee_character_id
issuer_character_id
task_type
target_id
status
priority
created_day
due_day
completed_day
summary
constraints_json
result_json
created_at
updated_at
```

Task statuses:

```text
proposed
active
completed
failed
cancelled
rejected
expired
```
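
The status list implies a lifecycle, which can be guarded with a transition table. The allowed transitions below are an assumption made for illustration (the plan lists the statuses but not the edges between them), and `advance_task` is a hypothetical helper.

```python
# Assumed lifecycle: proposed tasks get accepted or dropped, active tasks finish.
TRANSITIONS = {
    "proposed": {"active", "rejected", "cancelled", "expired"},
    "active": {"completed", "failed", "cancelled", "expired"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
    "rejected": set(),
    "expired": set(),
}


def advance_task(current: str, new: str) -> str:
    """Validate a status change and return the new status."""
    if current not in TRANSITIONS:
        raise ValueError(f"unknown task status: {current!r}")
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot move task from {current!r} to {new!r}")
    return new
```

Rejecting illegal transitions in Python keeps a confused model from reopening a completed task by proposing a contradictory update.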

### conversation_turns

Stores raw audit/debug conversation data.

```text
id
save_id
campaign_id
turn_id
player_id
npc_id
location_id
player_message
assistant_text
created_day
created_at
metadata_json
```

Raw turns should usually stay out of prompts; only the most recent few turns belong in the dialogue window.

### conversation_summaries

Stores compressed relationship/context history.

```text
id
save_id
campaign_id
player_id
npc_id
summary
turn_count
last_turn_day
updated_at
qdrant_point_id
```

### lore_sources

Stores available lore files.

```text
id
source_key
name
path
content_hash
active
created_at
updated_at
metadata_json
```

Examples:

```text
base_bannerlord
realm_of_thrones
ancient_greece
```

### lore_chunks

Stores indexed markdown chunks.

```text
id
lore_source_id
chunk_key
heading_path
title
text
summary
tags_json
entities_json
qdrant_point_id
created_at
updated_at
```

### background_debates

Stores summaries of NPC-to-NPC reasoning or faction debate.

```text
id
save_id
campaign_id
debate_id
topic
participants_json
faction_ids_json
location_id
summary
outcome
importance
created_day
created_at
qdrant_point_id
metadata_json
```

## SQLite Indexes

Create indexes for exact filters first.

```sql
CREATE INDEX idx_memories_scope
ON memories(save_id, campaign_id, subject_character_id);

CREATE INDEX idx_memories_related
ON memories(save_id, related_character_id);

CREATE INDEX idx_memories_faction
ON memories(save_id, kingdom_id);

CREATE INDEX idx_memories_location
ON memories(save_id, location_id);

CREATE INDEX idx_memories_category
ON memories(save_id, category);

CREATE INDEX idx_world_events_scope
ON world_events(save_id, campaign_id);

CREATE INDEX idx_world_events_location
ON world_events(save_id, location_id);

CREATE INDEX idx_world_events_factions
ON world_events(save_id, actor_faction_id, target_faction_id);

CREATE INDEX idx_tasks_assignee
ON tasks(save_id, assignee_character_id, status);

CREATE INDEX idx_profiles_character
ON npc_profiles(save_id, character_id);
```

Use SQLite FTS5 for fast keyword search:

```text
memories_fts
world_events_fts
lore_chunks_fts
conversation_summaries_fts
background_debates_fts
```

FTS should index compact searchable text, not huge JSON blobs.
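
One way to index only compact text is an external-content FTS5 table over a single `summary` column, kept in sync by a trigger. This is a sketch with a reduced schema; it assumes the Python build ships SQLite with FTS5 enabled, and it shows only the insert trigger (update and delete triggers would be needed in practice).

```python
import sqlite3


def open_fts_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE memories (
            id INTEGER PRIMARY KEY,
            save_id TEXT NOT NULL,
            summary TEXT NOT NULL,
            metadata_json TEXT
        );
        -- Index the compact summary only; metadata_json stays out of the FTS table.
        CREATE VIRTUAL TABLE memories_fts
            USING fts5(summary, content='memories', content_rowid='id');
        CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
            INSERT INTO memories_fts(rowid, summary) VALUES (new.id, new.summary);
        END;
        """
    )
    return conn


def search_memories(conn, save_id: str, query: str, limit: int = 5) -> list:
    """Keyword search restricted to one save, best bm25 match first."""
    sql = """
        SELECT m.id, m.summary
        FROM memories_fts
        JOIN memories AS m ON m.id = memories_fts.rowid
        WHERE memories_fts MATCH ? AND m.save_id = ?
        ORDER BY bm25(memories_fts)
        LIMIT ?
    """
    return conn.execute(sql, (query, save_id, limit)).fetchall()
```

Note that `MATCH` takes FTS5 query syntax, so raw player text would need escaping or tokenizing before being passed in.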

## Qdrant Collections

Use Qdrant for semantic retrieval once exact filters and keyword search alone no longer surface the right records.

Suggested collections:

```text
localdiplomacy_memories
localdiplomacy_world_events
localdiplomacy_lore
localdiplomacy_conversation_summaries
localdiplomacy_background_debates
```

Each point payload should contain enough metadata for filtering:

```json
{
  "sqlite_table": "memories",
  "sqlite_id": 123,
  "save_id": "save_abc",
  "campaign_id": "campaign_001",
  "character_id": "lord_derthert",
  "kingdom_id": "kingdom_vlandia",
  "location_id": "town_sargot",
  "category": "promise",
  "importance": 8,
  "created_day": 72.4
}
```

Search pattern:

```text
1. Embed current query.
2. Search Qdrant with metadata filters.
3. Return candidate SQLite IDs.
4. Load full records from SQLite.
5. Rerank with local scoring.
6. Build compact prompt dossier.
```
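
Steps 1 to 5 of the search pattern can be sketched without a running Qdrant instance. In the sketch below, `points` is a plain list of `(vector, payload)` pairs standing in for a Qdrant collection, `embed` is a hypothetical embedding callable, and the "rerank" step simply preserves similarity order; none of this is the real Qdrant client API.

```python
import math
import sqlite3
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_lookup(
    conn: sqlite3.Connection,
    points: list,
    embed: Callable[[str], Sequence[float]],
    query: str,
    save_id: str,
    limit: int = 5,
) -> list:
    query_vec = embed(query)  # step 1: embed the current query
    hits = [
        (cosine(query_vec, vec), payload["sqlite_id"])
        for vec, payload in points
        if payload["save_id"] == save_id  # step 2: metadata filter
    ]
    hits.sort(reverse=True)
    ids = [sqlite_id for _, sqlite_id in hits[:limit]]  # step 3: candidate IDs
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, text FROM memories WHERE id IN ({marks})", ids
    ).fetchall()  # step 4: load canonical records from SQLite
    rows.sort(key=lambda row: ids.index(row[0]))  # step 5: placeholder rerank
    return rows
```

The vector index only ever returns IDs here; the text that reaches the prompt always comes from SQLite, which keeps stale index payloads from leaking into dialogue.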

## Embeddings

Embeddings convert text into vectors for semantic search.

Use a local embedding model so the system stays offline/local. Good initial target:

```text
Ollama + nomic-embed-text
```

Embeddings should be created when data is written:

- lore import
- memory creation
- world event creation
- conversation summary update
- background debate summary creation

At runtime, only the current query usually needs a fresh embedding.

## Retrieval Dossier

Before every conversation response, Python should build a compact dossier.

Inputs:

```text
save_id
campaign_id
player_id
npc_id
location_id
player_message
current_day
scene
nearby parties
nearby settlements
kingdom state
recent game diffs
```

Retrieve:

```text
1. NPC profile
2. first-meeting backstory if needed
3. last 2-6 raw turns with this NPC
4. conversation summary for player+npc
5. top 3-8 relevant memories
6. top 2-5 relevant world events
7. active tasks for this NPC/player/location
8. top 2-5 relevant lore chunks
9. relevant background debate summaries
```

The model should receive a concise dossier, not raw database dumps.

Example prompt section:

```text
NPC PROFILE
Derthert is proud, pragmatic, protective of Vlandia, and sensitive to noble honor.

RELEVANT MEMORIES
- The player promised Derthert they would defend Sargot if Battania attacked.
- Derthert distrusts the player's sympathy toward Battania.

RECENT WORLD EVENTS
- Battanian raiders burned farms near Sargot on day 72.

RELEVANT LORE
- Vlandian nobles value feudal oaths, cavalry service, inheritance, and military honor.

CURRENT SCENE
The player is speaking with Derthert in Sargot after border raids.
```
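
A prompt section like the one above can be produced by a small renderer that skips empty sections. This is a sketch; `render_dossier` and its input shape (section title mapped to a string or a list of strings) are assumptions for illustration.

```python
def render_dossier(sections: dict) -> str:
    """Render titled sections as plain text, omitting any that are empty."""
    blocks = []
    for title, body in sections.items():
        if not body:
            continue  # an empty section would only waste tokens
        if isinstance(body, str):
            text = body.strip()
        else:
            text = "\n".join(f"- {item}" for item in body)
        blocks.append(f"{title.upper()}\n{text}")
    return "\n\n".join(blocks)
```

For example, passing `{"NPC profile": "Derthert is proud.", "Relevant lore": []}` would yield only the `NPC PROFILE` block, since the lore list is empty.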

## Token Budgets

For local models, use hard budgets.

Target for 8k context:

```text
system instructions: 400 tokens
NPC profile: 250 tokens
current scene/game state: 500 tokens
memories: 500 tokens
world events: 400 tokens
lore: 500 tokens
recent dialogue: 500 tokens
tools/action rules: 400 tokens
response budget: 500-800 tokens
```

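Prefer 2k-4k total prompt tokens for normal turns. The prompt-side targets above add up to about 3,450 tokens, so a full dossier plus a 500-800 token response still fits well inside an 8k window.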

For 4B-7B models, use smaller dossiers. Smaller models often perform better with cleaner, shorter context.
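
Hard budgets can be enforced per section before the prompt is assembled. The sketch below copies a few of the 8k targets from the table above; the four-characters-per-token estimate is a rough assumption, and a real tokenizer should replace it where one is available.

```python
# Per-section targets taken from the 8k budget table.
BUDGETS_8K = {"memories": 500, "world_events": 400, "lore": 500, "recent_dialogue": 500}


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token."""
    return max(1, (len(text) + 3) // 4)


def fit_to_budget(entries: list, budget: int) -> list:
    """Keep entries in ranked order until the next one would exceed the budget."""
    kept, used = [], 0
    for entry in entries:
        cost = estimate_tokens(entry)
        if used + cost > budget:
            break  # stop at the first overflow so ranking order is preserved
        kept.append(entry)
        used += cost
    return kept
```

A smaller model could reuse the same function with a scaled-down budget dictionary rather than a different code path.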

## Lore Import

Users should be able to select a world lore markdown file.

Examples:

```text
lore/base_bannerlord.md
lore/realm_of_thrones.md
lore/ancient_greece.md
```

Import flow:

```text
1. Read markdown file.
2. Hash contents.
3. If unchanged, skip reimport.
4. Split by heading hierarchy.
5. Create 100-300 word chunks.
6. Extract headings, tags, and entity names.
7. Store chunks in SQLite.
8. Add chunks to FTS.
9. Embed chunks.
10. Upsert vectors into Qdrant.
```
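
Steps 2, 4, and 5 of the import flow can be sketched with the standard library. This is a simplified illustration, not the repo's `LoreStore`: it enforces only the upper word limit, does not merge undersized chunks, and does not special-case `#` lines inside fenced code blocks.

```python
import hashlib
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def content_hash(text: str) -> str:
    """Stable hash used to skip reimporting an unchanged lore file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunk_markdown(text: str, max_words: int = 300) -> list:
    """Split markdown by heading hierarchy into chunks of at most max_words words."""
    chunks, path, buffer = [], [], []

    def flush() -> None:
        words = " ".join(buffer).split()
        for start in range(0, len(words), max_words):
            chunks.append({
                "heading_path": " > ".join(path),
                "text": " ".join(words[start:start + max_words]),
            })
        buffer.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section under its own heading path
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(match.group(2).strip())
        elif line.strip():
            buffer.append(line.strip())
    flush()
    return chunks
```

Storing the `heading_path` alongside each chunk gives retrieval a cheap signal for culture, faction, or location matches without parsing the chunk text again.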

At runtime, lore retrieval should consider:

- player message
- NPC culture
- NPC kingdom/faction
- location
- mentioned entities
- current event type
- active mod profile

Only retrieved lore chunks should enter the prompt.

## First-Meeting Backstory Generation

When the player meets an NPC for the first time in a save:

```text
1. Check npc_profiles for save_id + character_id.
2. If missing, gather current NPC game stats.
3. Retrieve relevant lore chunks.
4. Retrieve recent world events affecting their faction/location.
5. Generate compact backstory/profile JSON.
6. Store it in npc_profiles.
7. Use it in future prompts.
```

Generation input should be small and grounded:

```text
NPC:
- name
- clan
- kingdom
- culture
- occupation
- traits
- relation_to_player

Relevant lore:
- retrieved lore chunks only

Recent world events:
- retrieved world events only
```

Generated output:

```json
{
  "backstory": "...",
  "personality": ["proud", "cautious", "honor-bound"],
  "speech_style": "formal, martial, terse",
  "goals": ["protect Vlandia", "secure clan prestige"],
  "fears": ["dishonor", "border collapse"],
  "loyalties": ["kingdom_vlandia", "clan_dey_meroc"],
  "relationship_seed": {
    "trust": 15,
    "respect": 20,
    "suspicion": 5
  }
}
```

Backstories should be generated once per save unless explicitly regenerated.
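
The once-per-save rule amounts to a get-or-create lookup keyed on `save_id` and `character_id`. The sketch below stores the whole profile in a single `profile_json` column for brevity, which differs from the column-per-field schema planned above; `generate` is a hypothetical callable that returns the model's JSON text.

```python
import json
import sqlite3
from typing import Callable

REQUIRED_KEYS = {"backstory", "personality", "speech_style", "goals", "fears", "loyalties"}


def get_or_create_profile(
    conn: sqlite3.Connection, save_id: str, character_id: str, generate: Callable[[], str]
) -> dict:
    """Return the stored profile, generating and saving it on first meeting only."""
    row = conn.execute(
        "SELECT profile_json FROM npc_profiles WHERE save_id = ? AND character_id = ?",
        (save_id, character_id),
    ).fetchone()
    if row:
        return json.loads(row[0])
    profile = json.loads(generate())  # raises ValueError on malformed model output
    if not isinstance(profile, dict):
        raise ValueError("generated profile must be a JSON object")
    missing = REQUIRED_KEYS - profile.keys()
    if missing:
        raise ValueError(f"generated profile is missing keys: {sorted(missing)}")
    conn.execute(
        "INSERT INTO npc_profiles (save_id, character_id, profile_json) VALUES (?, ?, ?)",
        (save_id, character_id, json.dumps(profile)),
    )
    conn.commit()
    return profile
```

Validating the keys before storing means a malformed generation fails loudly on first meeting instead of silently shaping every later prompt.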

## Memory Write Flow

After each conversation:

```text
1. Store raw turn in conversation_turns.
2. Ask model or deterministic extractor what facts matter.
3. Store important facts in memories.
4. Update conversation summary if needed.
5. Update NPC profile if relationship/personality changed.
6. Embed new memory/summary.
7. Upsert vector into Qdrant.
```

Do not store every sentence as a long-term memory.

Store atomic, useful facts:

```text
Good:
The player promised Derthert they would defend Sargot from Battania.

Bad:
The player said "I shall stand beside you if the storm comes, my lord..."
```

## World Event Flow

World events can come from:

- C# world ticks
- executed game actions
- rejected or failed action results
- AI-proposed events
- background debates
- major relationship/task changes

Flow:

```text
1. Receive event or diff.
2. Normalize into structured world_event.
3. Store in SQLite.
4. Embed summary.
5. Upsert to Qdrant.
6. Make it visible only to plausible characters/factions.
```

NPCs should not know all events automatically.

Use visibility:

```text
private
local
faction
global
rumor
```
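
A visibility check could look like the sketch below. The exact semantics of each level are assumptions for illustration: in particular, `rumor` is treated the same as `local` here, which the plan does not specify. `Event`, `Npc`, and `can_know` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    visibility: str
    location_id: Optional[str] = None
    known_by_character_id: Optional[str] = None
    known_by_faction_id: Optional[str] = None


@dataclass
class Npc:
    character_id: str
    faction_id: Optional[str] = None
    location_id: Optional[str] = None


def can_know(event: Event, npc: Npc) -> bool:
    """Decide whether an NPC could plausibly know about an event."""
    if event.visibility == "global":
        return True
    if event.visibility == "private":
        return event.known_by_character_id == npc.character_id
    if event.visibility == "faction":
        return event.known_by_faction_id is not None and event.known_by_faction_id == npc.faction_id
    if event.visibility in ("local", "rumor"):  # assumption: rumors spread locally
        return event.location_id is not None and event.location_id == npc.location_id
    return False  # unknown visibility values are hidden by default
```

Defaulting to hidden for unknown values errs on the side of NPCs knowing too little rather than too much.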

## Background NPC Debates

For performance, background debates should usually be summaries, not full chat transcripts.

Example:

```text
Topic: Peace with Battania
Participants: Derthert, Erdurand, local Vlandian nobles
Summary: Derthert opposed peace unless Battania pays tribute. Erdurand argued the border villages cannot survive another campaign.
Outcome: Vlandian nobles are split but open to tribute-backed peace.
```

Store the summary and outcome. Retrieve it when the player discusses related diplomacy.

## Task System

AI-created tasks should be structured records.

The model may propose:

```text
assign_npc_task
cancel_npc_task
update_task
```

These are proposals only: C# should validate and execute any game-affecting change.

Python stores:

- requested task
- who assigned it
- who accepted it
- current status
- result
- related memories/events

Task results should feed memory:

```text
Derthert completed the player's request to patrol near Sargot.
Derthert failed to arrive before the raid and feels ashamed.
```

## Prompt Construction Rules

Never concatenate entire files or full databases into prompts.

Allowed:

- compact current scene
- compact NPC profile
- selected memories
- selected world events
- selected lore chunks
- selected tasks
- recent short dialogue window

Forbidden:

- full lore file
- full conversation history
- all world events
- all NPC memories
- raw JSON dumps larger than the budget

## Retrieval Scoring

Use hybrid retrieval.

Candidate sources:

```text
SQLite exact filters
SQLite FTS5 keyword search
Qdrant semantic search
recency/importance scoring
```

Example scoring:

```text
+50 same NPC
+35 directly related NPC
+30 same kingdom/faction
+25 same location
+25 exact entity mention
+20 active task involved
+20 high importance
+15 recent
+semantic similarity score
-20 expired/stale
-30 wrong visibility
```

Final prompt entries should be deduplicated and summarized if too long.
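
The example weights translate directly into a scoring function. In this sketch the boolean features are assumed to be computed elsewhere, and `SIMILARITY_SCALE` is an assumption: the plan does not say how a 0 to 1 similarity should be weighed against the fixed bonuses.

```python
from dataclasses import dataclass

# Weights copied from the example scoring table.
WEIGHTS = {
    "same_npc": 50,
    "related_npc": 35,
    "same_faction": 30,
    "same_location": 25,
    "entity_mention": 25,
    "active_task": 20,
    "high_importance": 20,
    "recent": 15,
    "expired": -20,
    "wrong_visibility": -30,
}
SIMILARITY_SCALE = 30  # assumption: maps a 0..1 similarity onto the same scale


@dataclass
class Candidate:
    same_npc: bool = False
    related_npc: bool = False
    same_faction: bool = False
    same_location: bool = False
    entity_mention: bool = False
    active_task: bool = False
    high_importance: bool = False
    recent: bool = False
    expired: bool = False
    wrong_visibility: bool = False
    similarity: float = 0.0


def score(candidate: Candidate) -> float:
    """Sum the weights of all matching features plus scaled semantic similarity."""
    total = sum(weight for name, weight in WEIGHTS.items() if getattr(candidate, name))
    return total + SIMILARITY_SCALE * candidate.similarity
```

Candidates from exact filters, FTS5, and the vector index can all be mapped to `Candidate` and sorted by `score`, which gives the hybrid sources a single ranking.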

## Configuration

Extend Python config with:

```yaml
memory:
  provider: "sqlite"
  sqlite_path: "./data/localdiplomacy.sqlite3"
  embedding_provider: "ollama"
  embedding_model: "nomic-embed-text"
  embedding_auto_pull: true
  max_prompt_memories: 8
  max_prompt_lore_chunks: 5
  max_prompt_world_events: 5

vector_index:
  mode: "embedded" # disabled | embedded | managed_server
  path: "./data/qdrant"
  host: "127.0.0.1"
  port: 6333
  executable_path: "./qdrant/qdrant.exe"
  autostart: false
  startup_timeout_seconds: 30
  fallback_mode: "embedded" # embedded | disabled

lore:
  active_source: "base_bannerlord"
  sources:
    - key: "base_bannerlord"
      name: "Base Bannerlord"
      path: "./lore/base_bannerlord.md"
```

Ollama should be the default local model interface:

```yaml
ollama:
  base_url: "http://127.0.0.1:11434"
  chat_path: "/v1/chat/completions"
  model: "llama3.1:8b"
  timeout_seconds: 120
  auto_pull_models: true
```

If the configured chat model is not installed, the Python agent should ask Ollama to download it through `/api/pull`. If that pull fails and another local model is already installed, the agent may fall back to the first installed model. If the configured embedding model is not installed, the embedding layer should also ask Ollama to pull it; if embeddings remain unavailable, it should fall back to deterministic hashing so memory continues working.
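
A deterministic hashing fallback can be as simple as feature hashing over word tokens. The sketch below is one possible approach and not necessarily what the repo implements; `hash_embed`, the 256-dimension default, and the signed-bucket scheme are all assumptions. It captures shared vocabulary only, not meaning, so it is a degraded substitute for a real embedding model.

```python
import hashlib
import math
import re


def hash_embed(text: str, dim: int = 256) -> list:
    """Map text to a unit-length vector using signed feature hashing."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # sign reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Because the output depends only on the text, vectors stay stable across restarts, so an index built in fallback mode remains searchable until real embeddings become available and the index is rebuilt.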

## Implementation Phases

### Phase 1: Persistent SQLite Memory

- Add SQLite-backed memory store.
- Add migrations.
- Replace in-process fallback list.
- Store memory writes across restarts.
- Add tests for save-scoped memory isolation.

Status: implemented for basic long-term memories.

### Phase 2: Embedded Qdrant Index

- Add `vector_index.mode = "embedded"`.
- Use `QdrantClient(path="./data/qdrant")`.
- Keep SQLite as the canonical record store.
- Store Qdrant point IDs on SQLite records.
- Add rebuild-index command that recreates embedded Qdrant from SQLite.
- Add tests for embedded Qdrant persistence across Python process restarts.

Status: initial embedded Qdrant integration is implemented for memories. Rebuild support exists on `MemoryStore`; command-line/admin wiring still needs to be added.

### Phase 3: Managed Qdrant Server

- Add `vector_index.mode = "managed_server"`.
- Add a small Qdrant process manager for `qdrant.exe`.
- Check health before starting a new process.
- Start Qdrant when `autostart` is enabled.
- Use configured storage/config paths.
- Stop the child process on Python agent shutdown.
- Fall back to embedded or SQLite-only mode based on config.
- Add tests around process command construction and fallback behavior.

Status: initial managed-server scaffolding is implemented, including reachability checks, optional autostart, process shutdown, and fallback to embedded mode. Real bundled-binary packaging still needs to be decided.

### Phase 4: Lore Importer

- Add markdown lore source config.
- Chunk lore by headings.
- Store `lore_sources` and `lore_chunks`.
- Add FTS5 indexing.
- Add search endpoint/tool for lore retrieval.

Status: initial markdown lore import and embedded-Qdrant retrieval are implemented for tests. FTS indexing, config-driven file loading, and agent/tool integration still need to be added.

### Phase 5: Retrieval Dossier

- Add retrieval planner before model calls.
- Include NPC profile, memories, world events, lore, tasks, and recent turns.
- Enforce token budgets.
- Add tests for prompt size limits.

### Phase 6: NPC Profiles And Backstories

- Add `npc_profiles`.
- Generate first-meeting profiles from game stats, lore, and recent events.
- Store generated profile per save.
- Add profile update logic after important interactions.

### Phase 7: World Events And Tasks

- Store world ticks as normalized world events.
- Store action results as world events/memories.
- Add task records.
- Retrieve active tasks for prompts.

### Phase 8: Semantic Index Integration

- Add local embedding provider.
- Add Qdrant client wrapper.
- Upsert embeddings for memories, lore, events, summaries, and debates.
- Search Qdrant with save/faction/location filters.
- Load canonical records from SQLite.

### Phase 9: Background Debates

- Add background debate summaries.
- Store outcomes as world events and memories.
- Retrieve debate summaries for diplomacy conversations.

### Phase 10: Maintenance Jobs

- Summarize old conversation turns.
- Decay low-importance memories.
- Mark stale events expired.
- Rebuild missing embeddings.
- Add dashboard/debug views for memory retrieval.

## Testing Strategy

Add tests for:

- save isolation
- memory persistence across `MemoryStore` instances
- lore import chunking
- FTS search
- Qdrant payload filters
- retrieval dossier token limits
- first-meeting backstory generated only once per save
- world event visibility
- task lifecycle
- rebuild Qdrant index from SQLite

## Current Repo Gaps

The current implementation is a useful scaffold with these remaining gaps:

- `MemoryStore` now persists basic long-term memories in SQLite, but broader tables for profiles, lore, tasks, world events, summaries, and debates still need to be added.
- Embedded Qdrant memory indexing and initial managed `qdrant.exe` supervision exist; bundled-binary packaging/install UX still needs implementation.
- Ollama is now the active LLM interface for chat and the preferred embedding provider; the agent can ask Ollama to pull missing chat/embedding models, and deterministic hashing remains as an embedding fallback.
- Python returns `memory_writes`, but C# `ConversationResponse` does not currently model that field.
- Event log persists audit data but is not a full memory system.
- Lore can be imported/indexed through the initial `LoreStore`; config-driven world-file loading and prompt/tool integration still need implementation.
- NPC profiles/backstories are not implemented yet.
- Qdrant is currently integrated for memories only, not lore/events/summaries/debates yet.

## Key Principle

Depth should live in storage and retrieval, not in prompt length.

The local model should receive:

```text
the right 20 facts
```

not:

```text
every fact the mod has ever seen
```

That is how LocalDiplomacy can support AI Influence-style depth while remaining practical for local models.