LocalDiplomacy Memory System Plan
Goal
LocalDiplomacy should support deep NPC roleplay, persistent save-specific memory, world events, lore awareness, backstories, and background NPC activity while remaining usable with local 4B-32B models.
The core design rule is:
Store the world outside the model. Retrieve only the tiny slice needed for the current turn.
The C# Bannerlord mod should stay focused on game integration:
- collect current game state
- send conversation/world/action-result packets to Python
- receive assistant text and proposed game actions
- validate and execute game actions
The Python agent should own:
- memory storage
- save/playthrough scoping
- lore indexing
- NPC profiles and generated backstories
- world events
- task records
- background debate summaries
- prompt dossier construction
- semantic retrieval
Architecture
Use both SQLite and Qdrant.
SQLite = source of truth
Qdrant = semantic search index
Ollama embedding model = turns text into searchable vectors
Ollama dialogue model = roleplay and reasoning over retrieved context
SQLite remains mandatory because it is easy to inspect, migrate, back up, and query exactly. Qdrant should be rebuildable from SQLite at any time.
Qdrant should never be the only copy of important data.
Qdrant Operating Modes
LocalDiplomacy should not require Docker for normal users.
Support these modes:
disabled
embedded
managed_server
disabled
Use SQLite and SQLite FTS5 only.
This mode is useful for early development, tests, and users who want the simplest possible install.
embedded
Use Qdrant through the Python client local mode:
QdrantClient(path="./data/qdrant")
This should become the default vector mode. It persists the vector index to disk without a separate Qdrant server process.
Benefits:
- no Docker required
- no extra server setup
- easy mod install story
- good enough for many local campaigns
SQLite remains the source of truth. The embedded Qdrant index can be deleted and rebuilt from SQLite.
managed_server
Python starts and supervises a bundled or user-installed qdrant.exe process.
This mode is for larger campaigns or heavier background simulation where a real Qdrant server process is useful.
Responsibilities:
- check whether Qdrant is already reachable on the configured host/port
- start qdrant.exe when autostart is enabled
- pass a local storage/config path
- wait for health check success
- stop the child process when the Python agent exits
- fall back to embedded or SQLite-only mode if configured to do so
This mode should still be local-first and should not require Docker.
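The "wait for health check success" step above can be sketched as a small polling loop. This is a sketch under assumptions: the function name `wait_for_health` and the injected `probe` callable (which would wrap an HTTP request to Qdrant's health endpoint) are illustrative, not existing code.

```python
import time

def wait_for_health(probe, timeout_seconds=30, interval=0.5):
    """Poll `probe` (a zero-arg callable returning True once Qdrant answers
    its health endpoint) until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Usage with a stub probe that succeeds on the third attempt:
attempts = {"n": 0}
def stub_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

assert wait_for_health(stub_probe, timeout_seconds=5, interval=0.01)
```

Injecting the probe keeps the supervisor testable without a running Qdrant process, which matches the testing goals later in this plan.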
Data Ownership
C# Mod
The C# side should not manage AI memory. It should send enough facts for Python to update memory and make decisions.
Responsibilities:
- send campaign_id, save_id, current day, player, NPC, scene, nearby parties, settlements, kingdom state, and recent game diffs
- execute validated GameAction proposals
- report action results back to Python
- send world ticks and important state changes
Python Agent
The Python agent should be an always-on local service.
Responsibilities:
- persist all AI memory under the current save_id
- retrieve relevant facts before model calls
- import and index world lore markdown files
- generate first-meeting NPC profiles
- summarize conversations and background debates
- decide what memories should be stored
- return compact prompts to local models
Save-Scoped Storage
All playthrough-specific records must include:
save_id
campaign_id
This prevents one campaign's Derthert, Caladog, or custom mod NPC from leaking into another playthrough.
Suggested SQLite path:
data/localdiplomacy.sqlite3
SQLite Schema Plan
saves
Tracks known playthroughs.
id
save_id
campaign_id
name
mod_profile
active_lore_source_id
created_at
last_seen_at
metadata_json
characters
Stores known game characters for a save.
id
save_id
campaign_id
character_id
name
clan_id
kingdom_id
culture_id
occupation
traits_json
last_seen_day
last_seen_at
metadata_json
npc_profiles
Stores generated and evolving NPC identity.
id
save_id
campaign_id
character_id
backstory
personality_json
speech_style
goals_json
fears_json
loyalties_json
relationship_to_player_json
known_history_summary
created_day
updated_day
created_at
updated_at
memories
Stores durable character/world facts.
id
save_id
campaign_id
subject_character_id
related_character_id
player_id
kingdom_id
location_id
category
importance
confidence
visibility
text
summary
tags_json
created_day
created_at
last_accessed_at
qdrant_point_id
metadata_json
Memory categories should include:
conversation
promise
secret
known_info
relationship
event
personality
backstory
speech_pattern
romance
death_history
visit
mentioned_entity
lie_detection
debate
task
world_events
Stores objective, rumored, or localized world events.
id
save_id
campaign_id
event_type
title
summary
location_id
actor_character_id
target_character_id
actor_faction_id
target_faction_id
importance
visibility
known_by_character_id
known_by_faction_id
created_day
expires_day
created_at
updated_at
qdrant_point_id
metadata_json
Visibility examples:
private
local
faction
global
rumor
tasks
Stores NPC commitments and ongoing assignments.
id
save_id
campaign_id
task_id
assignee_character_id
issuer_character_id
task_type
target_id
status
priority
created_day
due_day
completed_day
summary
constraints_json
result_json
created_at
updated_at
Task statuses:
proposed
active
completed
failed
cancelled
rejected
expired
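The statuses above imply a small state machine. A sketch of a transition validator follows; the specific allowed transitions are an assumed interpretation, not something the plan has fixed yet.

```python
# Hypothetical transition table for the task statuses listed above.
ALLOWED_TRANSITIONS = {
    "proposed": {"active", "rejected", "cancelled"},
    "active": {"completed", "failed", "cancelled", "expired"},
    # Terminal states allow no further transitions.
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
    "rejected": set(),
    "expired": set(),
}

def can_transition(current: str, new: str) -> bool:
    """Return True if a task may legally move from `current` to `new`."""
    return new in ALLOWED_TRANSITIONS.get(current, set())

assert can_transition("proposed", "active")
assert not can_transition("completed", "active")
```

Validating transitions in Python before proposing them to C# keeps impossible task states out of SQLite.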
conversation_turns
Stores raw audit/debug conversation data.
id
save_id
campaign_id
turn_id
player_id
npc_id
location_id
player_message
assistant_text
created_day
created_at
metadata_json
Raw turns should not normally enter prompts; only the few most recent turns are included verbatim.
conversation_summaries
Stores compressed relationship/context history.
id
save_id
campaign_id
player_id
npc_id
summary
turn_count
last_turn_day
updated_at
qdrant_point_id
lore_sources
Stores available lore files.
id
source_key
name
path
content_hash
active
created_at
updated_at
metadata_json
Examples:
base_bannerlord
realm_of_thrones
ancient_greece
lore_chunks
Stores indexed markdown chunks.
id
lore_source_id
chunk_key
heading_path
title
text
summary
tags_json
entities_json
qdrant_point_id
created_at
updated_at
background_debates
Stores summaries of NPC-to-NPC reasoning or faction debate.
id
save_id
campaign_id
debate_id
topic
participants_json
faction_ids_json
location_id
summary
outcome
importance
created_day
created_at
qdrant_point_id
metadata_json
SQLite Indexes
Create indexes for exact filters first.
CREATE INDEX idx_memories_scope
ON memories(save_id, campaign_id, subject_character_id);
CREATE INDEX idx_memories_related
ON memories(save_id, related_character_id);
CREATE INDEX idx_memories_faction
ON memories(save_id, kingdom_id);
CREATE INDEX idx_memories_location
ON memories(save_id, location_id);
CREATE INDEX idx_memories_category
ON memories(save_id, category);
CREATE INDEX idx_world_events_scope
ON world_events(save_id, campaign_id);
CREATE INDEX idx_world_events_location
ON world_events(save_id, location_id);
CREATE INDEX idx_world_events_factions
ON world_events(save_id, actor_faction_id, target_faction_id);
CREATE INDEX idx_tasks_assignee
ON tasks(save_id, assignee_character_id, status);
CREATE INDEX idx_profiles_character
ON npc_profiles(save_id, character_id);
Use SQLite FTS5 for fast keyword search:
memories_fts
world_events_fts
lore_chunks_fts
conversation_summaries_fts
background_debates_fts
FTS should index compact searchable text, not huge JSON blobs.
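As a sketch, memories_fts could be an external-content FTS5 table over the compact summary text. The column layout here is a simplified assumption for illustration; the real memories table has many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (id INTEGER PRIMARY KEY, save_id TEXT, summary TEXT);
-- FTS5 table indexing only the searchable summary text, not JSON blobs.
CREATE VIRTUAL TABLE memories_fts USING fts5(
    summary, content='memories', content_rowid='id'
);
""")
conn.execute("INSERT INTO memories VALUES (1, 'save_abc', "
             "'Player promised Derthert to defend Sargot')")
conn.execute("INSERT INTO memories_fts(rowid, summary) "
             "SELECT id, summary FROM memories WHERE id = 1")
rows = conn.execute(
    "SELECT m.id FROM memories_fts JOIN memories m ON m.id = memories_fts.rowid "
    "WHERE memories_fts MATCH 'sargot' AND m.save_id = 'save_abc'"
).fetchall()
print(rows)  # [(1,)]
```

The join back to memories applies the save_id filter, so keyword hits stay save-scoped.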
Qdrant Collections
Use Qdrant for semantic retrieval once data grows.
Suggested collections:
localdiplomacy_memories
localdiplomacy_world_events
localdiplomacy_lore
localdiplomacy_conversation_summaries
localdiplomacy_background_debates
Each point payload should contain enough metadata for filtering:
{
"sqlite_table": "memories",
"sqlite_id": 123,
"save_id": "save_abc",
"campaign_id": "campaign_001",
"character_id": "lord_derthert",
"kingdom_id": "kingdom_vlandia",
"location_id": "town_sargot",
"category": "promise",
"importance": 8,
"created_day": 72.4
}
Search pattern:
1. Embed current query.
2. Search Qdrant with metadata filters.
3. Return candidate SQLite IDs.
4. Load full records from SQLite.
5. Rerank with local scoring.
6. Build compact prompt dossier.
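The six-step search pattern above can be sketched as one function with injected backends; every name and signature here is an assumption, with stubs standing in for the Ollama, Qdrant, and SQLite layers.

```python
def retrieve(query, filters, embed, vector_search, load_rows, score, top_k=8):
    """Steps 1-6: embed the query, filter-search the vector index,
    hydrate candidates from SQLite, then rerank locally."""
    vector = vector_search_input = embed(query)          # 1. embed current query
    candidate_ids = vector_search(vector, filters)       # 2-3. Qdrant -> SQLite IDs
    records = load_rows(candidate_ids)                   # 4. load full records
    records.sort(key=lambda r: score(r, filters), reverse=True)  # 5. rerank
    return records[:top_k]                               # 6. feed dossier builder

# Stub backends for illustration:
fake_db = {1: {"id": 1, "importance": 8}, 2: {"id": 2, "importance": 3}}
result = retrieve(
    query="defend Sargot",
    filters={"save_id": "save_abc"},
    embed=lambda q: [0.0] * 8,
    vector_search=lambda v, f: [2, 1],
    load_rows=lambda ids: [fake_db[i] for i in ids],
    score=lambda r, f: r["importance"],
)
print([r["id"] for r in result])  # [1, 2]
```

Dependency injection keeps the pipeline unit-testable without a live Qdrant or Ollama, which the testing strategy below relies on.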
Embeddings
Embeddings convert text into vectors for semantic search.
Use a local embedding model so the system stays offline/local. Good initial target:
Ollama + nomic-embed-text
Embeddings should be created when data is written:
- lore import
- memory creation
- world event creation
- conversation summary update
- background debate summary creation
At runtime, only the current query usually needs a fresh embedding.
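A sketch of how the embedding call could be constructed against Ollama's /api/embeddings endpoint, which accepts a model name and a prompt. Only request construction is shown; no network call is made here, and the helper name is illustrative.

```python
import json
import urllib.request

def build_embed_request(base_url, model, text):
    """Build the POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embed_request("http://127.0.0.1:11434", "nomic-embed-text",
                          "The player promised to defend Sargot.")
print(req.full_url)  # http://127.0.0.1:11434/api/embeddings
```

Sending this request with `urllib.request.urlopen` would return a JSON body whose "embedding" field is the vector to upsert into Qdrant.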
Retrieval Dossier
Before every conversation response, Python should build a compact dossier.
Inputs:
save_id
campaign_id
player_id
npc_id
location_id
player_message
current_day
scene
nearby parties
nearby settlements
kingdom state
recent game diffs
Retrieve:
1. NPC profile
2. first-meeting backstory if needed
3. last 2-6 raw turns with this NPC
4. conversation summary for player+npc
5. top 3-8 relevant memories
6. top 2-5 relevant world events
7. active tasks for this NPC/player/location
8. top 2-5 relevant lore chunks
9. relevant background debate summaries
The model should receive a concise dossier, not raw database dumps.
Example prompt section:
NPC PROFILE
Derthert is proud, pragmatic, protective of Vlandia, and sensitive to noble honor.
RELEVANT MEMORIES
- The player promised Derthert they would defend Sargot if Battania attacked.
- Derthert distrusts the player's sympathy toward Battania.
RECENT WORLD EVENTS
- Battanian raiders burned farms near Sargot on day 72.
RELEVANT LORE
- Vlandian nobles value feudal oaths, cavalry service, inheritance, and military honor.
CURRENT SCENE
The player is speaking with Derthert in Sargot after border raids.
Token Budgets
For local models, use hard budgets.
Target for 8k context:
system instructions: 400 tokens
NPC profile: 250 tokens
current scene/game state: 500 tokens
memories: 500 tokens
world events: 400 tokens
lore: 500 tokens
recent dialogue: 500 tokens
tools/action rules: 400 tokens
response budget: 500-800 tokens
Prefer 2k-4k total prompt tokens for normal turns.
For 4B-7B models, use smaller dossiers. Smaller models often perform better with cleaner, shorter context.
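One crude way to enforce these hard budgets is a character-based token estimate plus greedy truncation. The 4-characters-per-token heuristic and function names are assumptions; a real tokenizer would be more accurate.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def fit_to_budget(items, budget_tokens):
    """Keep items (assumed pre-sorted by priority) until the budget fills."""
    kept, used = [], 0
    for item in items:
        cost = approx_tokens(item)
        if used + cost > budget_tokens:
            break
        kept.append(item)
        used += cost
    return kept

memories = ["A" * 400, "B" * 400, "C" * 400]  # ~100 tokens each
assert len(fit_to_budget(memories, budget_tokens=250)) == 2
```

Because the list is pre-sorted by retrieval score, truncation always drops the least relevant entries first.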
Lore Import
Users should be able to select a world lore markdown file.
Examples:
lore/base_bannerlord.md
lore/realm_of_thrones.md
lore/ancient_greece.md
Import flow:
1. Read markdown file.
2. Hash contents.
3. If unchanged, skip reimport.
4. Split by heading hierarchy.
5. Create 100-300 word chunks.
6. Extract headings, tags, and entity names.
7. Store chunks in SQLite.
8. Add chunks to FTS.
9. Embed chunks.
10. Upsert vectors into Qdrant.
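Steps 4-5 of the import flow can be sketched as a heading-aware splitter. Chunk sizing is simplified to word counts, and the function shape is an assumption, not existing code.

```python
import re

def chunk_markdown(text, max_words=300):
    """Split markdown by heading hierarchy, tracking the heading path for
    each chunk (step 4), and cap chunk size by word count (step 5)."""
    chunks, path, buf = [], [], []

    def flush():
        body = " ".join(buf).strip()
        if body:
            chunks.append({"heading_path": " > ".join(path), "text": body})
        buf.clear()

    for line in text.splitlines():
        m = re.match(r"^(#+)\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]        # drop headings at this depth or deeper
            path.append(m.group(2))
        else:
            buf.append(line)
            if sum(len(b.split()) for b in buf) >= max_words:
                flush()
    flush()
    return chunks

doc = "# Vlandia\nFeudal oaths bind nobles.\n## Sargot\nA border town."
out = chunk_markdown(doc)
print([c["heading_path"] for c in out])  # ['Vlandia', 'Vlandia > Sargot']
```

Storing heading_path with each chunk gives the retriever cheap structural context for filtering by topic.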
At runtime, lore retrieval should consider:
- player message
- NPC culture
- NPC kingdom/faction
- location
- mentioned entities
- current event type
- active mod profile
Only retrieved lore chunks should enter the prompt.
First-Meeting Backstory Generation
When the player meets an NPC for the first time in a save:
1. Check npc_profiles for save_id + character_id.
2. If missing, gather current NPC game stats.
3. Retrieve relevant lore chunks.
4. Retrieve recent world events affecting their faction/location.
5. Generate compact backstory/profile JSON.
6. Store it in npc_profiles.
7. Use it in future prompts.
Generation input should be small and grounded:
NPC:
- name
- clan
- kingdom
- culture
- occupation
- traits
- relation_to_player
Relevant lore:
- retrieved lore chunks only
Recent world events:
- retrieved world events only
Generated output:
{
"backstory": "...",
"personality": ["proud", "cautious", "honor-bound"],
"speech_style": "formal, martial, terse",
"goals": ["protect Vlandia", "secure clan prestige"],
"fears": ["dishonor", "border collapse"],
"loyalties": ["kingdom_vlandia", "clan_dey_meroc"],
"relationship_seed": {
"trust": 15,
"respect": 20,
"suspicion": 5
}
}
Backstories should be generated once per save unless explicitly regenerated.
Memory Write Flow
After each conversation:
1. Store raw turn in conversation_turns.
2. Ask model or deterministic extractor what facts matter.
3. Store important facts in memories.
4. Update conversation summary if needed.
5. Update NPC profile if relationship/personality changed.
6. Embed new memory/summary.
7. Upsert vector into Qdrant.
Do not store every sentence as a long-term memory.
Store atomic, useful facts:
Good:
The player promised Derthert they would defend Sargot from Battania.
Bad:
The player said "I shall stand beside you if the storm comes, my lord..."
World Event Flow
World events can come from:
- C# world ticks
- executed game actions
- rejected or failed action results
- AI-proposed events
- background debates
- major relationship/task changes
Flow:
1. Receive event or diff.
2. Normalize into structured world_event.
3. Store in SQLite.
4. Embed summary.
5. Upsert to Qdrant.
6. Make it visible only to plausible characters/factions.
NPCs should not know all events automatically.
Use visibility:
private
local
faction
global
rumor
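A sketch of how visibility could gate retrieval; the rules below are one assumed interpretation of the five levels, and the dict shapes are illustrative.

```python
def event_visible_to(event, npc):
    """Decide whether an NPC plausibly knows about a world event."""
    vis = event["visibility"]
    if vis == "global":
        return True
    if vis == "faction":
        return event.get("known_by_faction_id") == npc.get("kingdom_id")
    if vis == "local":
        return event.get("location_id") == npc.get("location_id")
    if vis == "private":
        return event.get("known_by_character_id") == npc.get("character_id")
    if vis == "rumor":
        # Rumors could spread probabilistically; simplest version: same faction.
        return event.get("known_by_faction_id") == npc.get("kingdom_id")
    return False

raid = {"visibility": "local", "location_id": "town_sargot"}
derthert = {"character_id": "lord_derthert", "kingdom_id": "kingdom_vlandia",
            "location_id": "town_sargot"}
assert event_visible_to(raid, derthert)
```

Applying this check at retrieval time, rather than at write time, lets the same event become visible to more characters as rumors spread.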
Background NPC Debates
For performance, background debates should usually be summaries, not full chat transcripts.
Example:
Topic: Peace with Battania
Participants: Derthert, Erdurand, local Vlandian nobles
Summary: Derthert opposed peace unless Battania pays tribute. Erdurand argued the border villages cannot survive another campaign.
Outcome: Vlandian nobles are split but open to tribute-backed peace.
Store the summary and outcome. Retrieve it when the player discusses related diplomacy.
Task System
AI-created tasks should be structured records.
The model may propose:
assign_npc_task
cancel_npc_task
update_task
But C# should validate and execute game-affecting changes.
Python stores:
- requested task
- who assigned it
- who accepted it
- current status
- result
- related memories/events
Task results should feed memory:
Derthert completed the player's request to patrol near Sargot.
Derthert failed to arrive before the raid and feels ashamed.
Prompt Construction Rules
Never concatenate entire files or full databases into prompts.
Allowed:
- compact current scene
- compact NPC profile
- selected memories
- selected world events
- selected lore chunks
- selected tasks
- recent short dialogue window
Forbidden:
- full lore file
- full conversation history
- all world events
- all NPC memories
- raw JSON dumps larger than the budget
Retrieval Scoring
Use hybrid retrieval.
Candidate sources:
SQLite exact filters
SQLite FTS5 keyword search
Qdrant semantic search
recency/importance scoring
Example scoring:
+50 same NPC
+35 directly related NPC
+30 same kingdom/faction
+25 same location
+25 exact entity mention
+20 active task involved
+20 high importance
+15 recent
+semantic similarity score
-20 expired/stale
-30 wrong visibility
Final prompt entries should be deduplicated and summarized if too long.
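The example weights above translate directly into an additive rerank function. This is a sketch: the weights are copied from the list, but the field names, recency window, and importance threshold are assumptions.

```python
def same(a, b):
    # Guard against two missing fields comparing equal (None == None).
    return a is not None and a == b

def score_memory(mem, ctx):
    """Additive rerank score mirroring the example weights above."""
    s = mem.get("semantic_score", 0.0)
    if same(mem.get("subject_character_id"), ctx.get("npc_id")):
        s += 50
    if same(mem.get("related_character_id"), ctx.get("npc_id")):
        s += 35
    if same(mem.get("kingdom_id"), ctx.get("kingdom_id")):
        s += 30
    if same(mem.get("location_id"), ctx.get("location_id")):
        s += 25
    if mem.get("importance", 0) >= 7:                       # "high importance"
        s += 20
    if ctx.get("current_day", 0) - mem.get("created_day", 0) <= 10:  # "recent"
        s += 15
    if mem.get("expired"):
        s -= 20
    return s

ctx = {"npc_id": "lord_derthert", "location_id": "town_sargot",
       "current_day": 75}
mem = {"subject_character_id": "lord_derthert", "location_id": "town_sargot",
       "importance": 8, "created_day": 72}
assert score_memory(mem, ctx) == 50 + 25 + 20 + 15
```

Keeping the weights in one pure function makes the reranker easy to tune and to unit-test against fixture memories.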
Configuration
Extend Python config with:
memory:
provider: "sqlite"
sqlite_path: "./data/localdiplomacy.sqlite3"
embedding_provider: "ollama"
embedding_model: "nomic-embed-text"
embedding_auto_pull: true
max_prompt_memories: 8
max_prompt_lore_chunks: 5
max_prompt_world_events: 5
vector_index:
mode: "embedded" # disabled | embedded | managed_server
path: "./data/qdrant"
host: "127.0.0.1"
port: 6333
executable_path: "./qdrant/qdrant.exe"
autostart: false
startup_timeout_seconds: 30
fallback_mode: "embedded" # embedded | disabled
lore:
active_source: "base_bannerlord"
sources:
- key: "base_bannerlord"
name: "Base Bannerlord"
path: "./lore/base_bannerlord.md"
Ollama should be the default local model interface:
ollama:
base_url: "http://127.0.0.1:11434"
chat_path: "/v1/chat/completions"
model: "llama3.1:8b"
timeout_seconds: 120
auto_pull_models: true
If the configured chat model is not installed, the Python agent should ask Ollama to download it through /api/pull. If that pull fails and another local model is already installed, the agent may fall back to the first installed model. If the configured embedding model is not installed, the embedding layer should also ask Ollama to pull it; if embeddings remain unavailable, it should fall back to deterministic hashing so memory continues working.
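The deterministic-hashing embedding fallback mentioned above could look like the following. The dimension and bucketing scheme are assumptions; such vectors only support rough keyword-style similarity, not true semantics, but they keep memory retrieval functional with no model installed.

```python
import hashlib
import math

def hash_embed(text: str, dim: int = 256) -> list:
    """Deterministic pseudo-embedding: hash each token into a bucket,
    then L2-normalize. Stable across runs, no model required."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

a = hash_embed("defend Sargot")
assert a == hash_embed("defend Sargot")            # deterministic
assert abs(sum(v * v for v in a) - 1.0) < 1e-9     # unit length
```

Because the output is deterministic, vectors written under this fallback remain valid after restarts, and can be wholesale replaced once a real embedding model becomes available.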
Implementation Phases
Phase 1: Persistent SQLite Memory
- Add SQLite-backed memory store.
- Add migrations.
- Replace in-process fallback list.
- Store memory writes across restarts.
- Add tests for save-scoped memory isolation.
Status: implemented for basic long-term memories.
Phase 2: Embedded Qdrant Index
- Add vector_index.mode = "embedded".
- Use QdrantClient(path="./data/qdrant").
- Keep SQLite as the canonical record store.
- Store Qdrant point IDs on SQLite records.
- Add rebuild-index command that recreates embedded Qdrant from SQLite.
- Add tests for embedded Qdrant persistence across Python process restarts.
Status: initial embedded Qdrant integration is implemented for memories. Rebuild support exists on MemoryStore; command-line/admin wiring still needs to be added.
Phase 3: Managed Qdrant Server
- Add vector_index.mode = "managed_server".
- Add a small Qdrant process manager for qdrant.exe.
- Check health before starting a new process.
- Start Qdrant when autostart is enabled.
- Use configured storage/config paths.
- Stop the child process on Python agent shutdown.
- Fall back to embedded or SQLite-only mode based on config.
- Add tests around process command construction and fallback behavior.
Status: initial managed-server scaffolding is implemented, including reachability checks, optional autostart, process shutdown, and fallback to embedded mode. Real bundled-binary packaging still needs to be decided.
Phase 4: Lore Importer
- Add markdown lore source config.
- Chunk lore by headings.
- Store lore_sources and lore_chunks.
- Add FTS5 indexing.
- Add search endpoint/tool for lore retrieval.
Status: initial markdown lore import and embedded-Qdrant retrieval are implemented for tests. FTS indexing, config-driven file loading, and agent/tool integration still need to be added.
Phase 5: Retrieval Dossier
- Add retrieval planner before model calls.
- Include NPC profile, memories, world events, lore, tasks, and recent turns.
- Enforce token budgets.
- Add tests for prompt size limits.
Phase 6: NPC Profiles And Backstories
- Add npc_profiles.
- Generate first-meeting profiles from game stats, lore, and recent events.
- Store generated profile per save.
- Add profile update logic after important interactions.
Phase 7: World Events And Tasks
- Store world ticks as normalized world events.
- Store action results as world events/memories.
- Add task records.
- Retrieve active tasks for prompts.
Phase 8: Semantic Index Integration
- Add local embedding provider.
- Add Qdrant client wrapper.
- Upsert embeddings for memories, lore, events, summaries, and debates.
- Search Qdrant with save/faction/location filters.
- Load canonical records from SQLite.
Phase 9: Background Debates
- Add background debate summaries.
- Store outcomes as world events and memories.
- Retrieve debate summaries for diplomacy conversations.
Phase 10: Maintenance Jobs
- Summarize old conversation turns.
- Decay low-importance memories.
- Mark stale events expired.
- Rebuild missing embeddings.
- Add dashboard/debug views for memory retrieval.
Testing Strategy
Add tests for:
- save isolation
- memory persistence across MemoryStore instances
- lore import chunking
- FTS search
- Qdrant payload filters
- retrieval dossier token limits
- first-meeting backstory only generates once per save
- world event visibility
- task lifecycle
- rebuild Qdrant index from SQLite
Current Repo Gaps
The current implementation has a useful scaffold and these remaining gaps:
- MemoryStore now persists basic long-term memories in SQLite, but broader tables for profiles, lore, tasks, world events, summaries, and debates still need to be added.
- Embedded Qdrant memory indexing and initial managed qdrant.exe supervision exist; bundled-binary packaging/install UX still needs implementation.
- Ollama is now the active LLM interface for chat and the preferred embedding provider; the agent can ask Ollama to pull missing chat/embedding models, and deterministic hashing remains as an embedding fallback.
- Python returns memory_writes, but the C# ConversationResponse does not currently model that field.
- Event log persists audit data but is not a full memory system.
- Lore can be imported/indexed through the initial LoreStore; config-driven world-file loading and prompt/tool integration still need implementation.
- NPC profiles/backstories are not implemented yet.
- Qdrant is currently integrated for memories only, not lore/events/summaries/debates yet.
Key Principle
Depth should live in storage and retrieval, not in prompt length.
The local model should receive:
the right 20 facts
not:
every fact the mod has ever seen
That is how LocalDiplomacy can support AI Influence-style depth while remaining practical for local models.