# LocalDiplomacy Memory System Plan

## Goal

LocalDiplomacy should support deep NPC roleplay, persistent save-specific memory, world events, lore awareness, backstories, and background NPC activity while remaining usable with local 4B-32B models.

The core design rule is:

```text
Store the world outside the model.
Retrieve only the tiny slice needed for the current turn.
```

The C# Bannerlord mod should stay focused on game integration:

- collect current game state
- send conversation/world/action-result packets to Python
- receive assistant text and proposed game actions
- validate and execute game actions

The Python agent should own:

- memory storage
- save/playthrough scoping
- lore indexing
- NPC profiles and generated backstories
- world events
- task records
- background debate summaries
- prompt dossier construction
- semantic retrieval

## Architecture

Use both SQLite and Qdrant.

```text
SQLite = source of truth
Qdrant = semantic search index
Ollama embedding model = turns text into searchable vectors
Ollama dialogue model = roleplay and reasoning over retrieved context
```

SQLite remains mandatory because it is easy to inspect, migrate, back up, and query exactly. Qdrant should be rebuildable from SQLite at any time and should never be the only copy of important data.

## Qdrant Operating Modes

LocalDiplomacy should not require Docker for normal users. Support these modes:

```text
disabled
embedded
managed_server
```

### disabled

Use SQLite and SQLite FTS5 only. This mode is useful for early development, tests, and users who want the simplest possible install.

### embedded

Use Qdrant through the Python client local mode:

```python
QdrantClient(path="./data/qdrant")
```

This should become the default vector mode. It persists the vector index to disk without a separate Qdrant server process.

Benefits:

- no Docker required
- no extra server setup
- a simpler mod install story
- good enough for many local campaigns

SQLite remains the source of truth.
The embedded Qdrant index can be deleted and rebuilt from SQLite.

### managed_server

Python starts and supervises a bundled or user-installed `qdrant.exe` process. This mode is for larger campaigns or heavier background simulation where a real Qdrant server process is useful.

Responsibilities:

- check whether Qdrant is already reachable on the configured host/port
- start `qdrant.exe` when `autostart` is enabled
- pass a local storage/config path
- wait for health check success
- stop the child process when the Python agent exits
- fall back to embedded or SQLite-only mode if configured to do so

This mode should still be local-first and should not require Docker.

## Data Ownership

### C# Mod

The C# side should not manage AI memory. It should send enough facts for Python to update memory and make decisions.

Responsibilities:

- send `campaign_id`, `save_id`, current day, player, NPC, scene, nearby parties, settlements, kingdom state, and recent game diffs
- execute validated `GameAction` proposals
- report action results back to Python
- send world ticks and important state changes

### Python Agent

The Python agent should be an always-on local service.

Responsibilities:

- persist all AI memory under the current `save_id`
- retrieve relevant facts before model calls
- import and index world lore markdown files
- generate first-meeting NPC profiles
- summarize conversations and background debates
- decide what memories should be stored
- return compact prompts to local models

## Save-Scoped Storage

All playthrough-specific records must include:

```text
save_id
campaign_id
```

This prevents one campaign's Derthert, Caladog, or custom mod NPC from leaking into another playthrough.

Suggested SQLite path:

```text
data/localdiplomacy.sqlite3
```

## SQLite Schema Plan

### saves

Tracks known playthroughs.

```text
id
save_id
campaign_id
name
mod_profile
active_lore_source_id
created_at
last_seen_at
metadata_json
```

### characters

Stores known game characters for a save.
```text
id
save_id
campaign_id
character_id
name
clan_id
kingdom_id
culture_id
occupation
traits_json
last_seen_day
last_seen_at
metadata_json
```

### npc_profiles

Stores generated and evolving NPC identity.

```text
id
save_id
campaign_id
character_id
backstory
personality_json
speech_style
goals_json
fears_json
loyalties_json
relationship_to_player_json
known_history_summary
created_day
updated_day
created_at
updated_at
```

### memories

Stores durable character/world facts.

```text
id
save_id
campaign_id
subject_character_id
related_character_id
player_id
kingdom_id
location_id
category
importance
confidence
visibility
text
summary
tags_json
created_day
created_at
last_accessed_at
qdrant_point_id
metadata_json
```

Memory categories should include:

```text
conversation
promise
secret
known_info
relationship
event
personality
backstory
speech_pattern
romance
death_history
visit
mentioned_entity
lie_detection
debate
task
```

### world_events

Stores objective, rumored, or localized world events.

```text
id
save_id
campaign_id
event_type
title
summary
location_id
actor_character_id
target_character_id
actor_faction_id
target_faction_id
importance
visibility
known_by_character_id
known_by_faction_id
created_day
expires_day
created_at
updated_at
qdrant_point_id
metadata_json
```

Visibility examples:

```text
private
local
faction
global
rumor
```

### tasks

Stores NPC commitments and ongoing assignments.

```text
id
save_id
campaign_id
task_id
assignee_character_id
issuer_character_id
task_type
target_id
status
priority
created_day
due_day
completed_day
summary
constraints_json
result_json
created_at
updated_at
```

Task statuses:

```text
proposed
active
completed
failed
cancelled
rejected
expired
```

### conversation_turns

Stores raw audit/debug conversation data.
```text
id
save_id
campaign_id
turn_id
player_id
npc_id
location_id
player_message
assistant_text
created_day
created_at
metadata_json
```

Raw turns should not usually go into prompts, except for the most recent turns.

### conversation_summaries

Stores compressed relationship/context history.

```text
id
save_id
campaign_id
player_id
npc_id
summary
turn_count
last_turn_day
updated_at
qdrant_point_id
```

### lore_sources

Stores available lore files.

```text
id
source_key
name
path
content_hash
active
created_at
updated_at
metadata_json
```

Examples:

```text
base_bannerlord
realm_of_thrones
ancient_greece
```

### lore_chunks

Stores indexed markdown chunks.

```text
id
lore_source_id
chunk_key
heading_path
title
text
summary
tags_json
entities_json
qdrant_point_id
created_at
updated_at
```

### background_debates

Stores summaries of NPC-to-NPC reasoning or faction debate.

```text
id
save_id
campaign_id
debate_id
topic
participants_json
faction_ids_json
location_id
summary
outcome
importance
created_day
created_at
qdrant_point_id
metadata_json
```

## SQLite Indexes

Create indexes for exact filters first.
```sql
CREATE INDEX idx_memories_scope ON memories(save_id, campaign_id, subject_character_id);
CREATE INDEX idx_memories_related ON memories(save_id, related_character_id);
CREATE INDEX idx_memories_faction ON memories(save_id, kingdom_id);
CREATE INDEX idx_memories_location ON memories(save_id, location_id);
CREATE INDEX idx_memories_category ON memories(save_id, category);
CREATE INDEX idx_world_events_scope ON world_events(save_id, campaign_id);
CREATE INDEX idx_world_events_location ON world_events(save_id, location_id);
CREATE INDEX idx_world_events_factions ON world_events(save_id, actor_faction_id, target_faction_id);
CREATE INDEX idx_tasks_assignee ON tasks(save_id, assignee_character_id, status);
CREATE INDEX idx_profiles_character ON npc_profiles(save_id, character_id);
```

Use SQLite FTS5 for fast keyword search:

```text
memories_fts
world_events_fts
lore_chunks_fts
conversation_summaries_fts
background_debates_fts
```

FTS should index compact searchable text, not huge JSON blobs.

## Qdrant Collections

Use Qdrant for semantic retrieval once data grows.

Suggested collections:

```text
localdiplomacy_memories
localdiplomacy_world_events
localdiplomacy_lore
localdiplomacy_conversation_summaries
localdiplomacy_background_debates
```

Each point payload should contain enough metadata for filtering:

```json
{
  "sqlite_table": "memories",
  "sqlite_id": 123,
  "save_id": "save_abc",
  "campaign_id": "campaign_001",
  "character_id": "lord_derthert",
  "kingdom_id": "kingdom_vlandia",
  "location_id": "town_sargot",
  "category": "promise",
  "importance": 8,
  "created_day": 72.4
}
```

Search pattern:

```text
1. Embed current query.
2. Search Qdrant with metadata filters.
3. Return candidate SQLite IDs.
4. Load full records from SQLite.
5. Rerank with local scoring.
6. Build compact prompt dossier.
```

## Embeddings

Embeddings convert text into vectors for semantic search. Use a local embedding model so the system stays offline/local.
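
The write-time embedding step can be sketched with the standard library alone. This assumes Ollama's `/api/embeddings` endpoint and the `nomic-embed-text` model; the request/response shape may differ across Ollama versions:

```python
# Sketch only: embed text through a local Ollama instance.
# Assumes Ollama's /api/embeddings endpoint; the exact endpoint
# and payload shape may differ across Ollama versions.
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"

def build_embed_request(text: str, model: str = "nomic-embed-text") -> bytes:
    # JSON body for the embeddings endpoint.
    return json.dumps({"model": model, "prompt": text}).encode("utf-8")

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=build_embed_request(text, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```

The returned vector would then be upserted into Qdrant alongside the SQLite row ID, so a later semantic hit can always be resolved back to the canonical record.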
Good initial target:

```text
Ollama + nomic-embed-text
```

Embeddings should be created when data is written:

- lore import
- memory creation
- world event creation
- conversation summary update
- background debate summary creation

At runtime, only the current query usually needs a fresh embedding.

## Retrieval Dossier

Before every conversation response, Python should build a compact dossier.

Inputs:

```text
save_id
campaign_id
player_id
npc_id
location_id
player_message
current_day
scene
nearby parties
nearby settlements
kingdom state
recent game diffs
```

Retrieve:

```text
1. NPC profile
2. first-meeting backstory if needed
3. last 2-6 raw turns with this NPC
4. conversation summary for player+npc
5. top 3-8 relevant memories
6. top 2-5 relevant world events
7. active tasks for this NPC/player/location
8. top 2-5 relevant lore chunks
9. relevant background debate summaries
```

The model should receive a concise dossier, not raw database dumps.

Example prompt section:

```text
NPC PROFILE
Derthert is proud, pragmatic, protective of Vlandia, and sensitive to noble honor.

RELEVANT MEMORIES
- The player promised Derthert they would defend Sargot if Battania attacked.
- Derthert distrusts the player's sympathy toward Battania.

RECENT WORLD EVENTS
- Battanian raiders burned farms near Sargot on day 72.

RELEVANT LORE
- Vlandian nobles value feudal oaths, cavalry service, inheritance, and military honor.

CURRENT SCENE
The player is speaking with Derthert in Sargot after border raids.
```

## Token Budgets

For local models, use hard budgets.

Target for 8k context:

```text
system instructions: 400 tokens
NPC profile: 250 tokens
current scene/game state: 500 tokens
memories: 500 tokens
world events: 400 tokens
lore: 500 tokens
recent dialogue: 500 tokens
tools/action rules: 400 tokens
response budget: 500-800 tokens
```

Prefer 2k-4k total prompt tokens for normal turns.

For 4B-7B models, use smaller dossiers. Smaller models often perform better with cleaner, shorter context.
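
The budgets above can be enforced mechanically when assembling the dossier. A sketch using the rough chars/4 heuristic for token counting (a real implementation would use the model's actual tokenizer); the section names and budget values mirror the 8k-context table:

```python
# Sketch only: trim each dossier section to its token budget.
# Token counts use a rough chars/4 heuristic, not a real tokenizer.

BUDGETS = {                 # tokens per section, from the table above
    "npc_profile": 250,
    "memories": 500,
    "world_events": 400,
    "lore": 500,
    "recent_dialogue": 500,
}

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_section(lines: list[str], budget: int) -> list[str]:
    # Keep entries in order until the budget is spent.
    kept, used = [], 0
    for line in lines:
        cost = approx_tokens(line)
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    return kept

def trim_dossier(sections: dict[str, list[str]]) -> dict[str, list[str]]:
    return {name: trim_section(lines, BUDGETS.get(name, 250))
            for name, lines in sections.items()}
```

Each section's entries should be pre-sorted by retrieval score, so the trim drops the least relevant facts first rather than arbitrary ones.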
## Lore Import

Users should be able to select a world lore markdown file.

Examples:

```text
lore/base_bannerlord.md
lore/realm_of_thrones.md
lore/ancient_greece.md
```

Import flow:

```text
1. Read markdown file.
2. Hash contents.
3. If unchanged, skip reimport.
4. Split by heading hierarchy.
5. Create 100-300 word chunks.
6. Extract headings, tags, and entity names.
7. Store chunks in SQLite.
8. Add chunks to FTS.
9. Embed chunks.
10. Upsert vectors into Qdrant.
```

At runtime, lore retrieval should consider:

- player message
- NPC culture
- NPC kingdom/faction
- location
- mentioned entities
- current event type
- active mod profile

Only retrieved lore chunks should enter the prompt.

## First-Meeting Backstory Generation

When the player meets an NPC for the first time in a save:

```text
1. Check npc_profiles for save_id + character_id.
2. If missing, gather current NPC game stats.
3. Retrieve relevant lore chunks.
4. Retrieve recent world events affecting their faction/location.
5. Generate compact backstory/profile JSON.
6. Store it in npc_profiles.
7. Use it in future prompts.
```

Generation input should be small and grounded:

```text
NPC:
- name
- clan
- kingdom
- culture
- occupation
- traits
- relation_to_player

Relevant lore:
- retrieved lore chunks only

Recent world events:
- retrieved world events only
```

Generated output:

```json
{
  "backstory": "...",
  "personality": ["proud", "cautious", "honor-bound"],
  "speech_style": "formal, martial, terse",
  "goals": ["protect Vlandia", "secure clan prestige"],
  "fears": ["dishonor", "border collapse"],
  "loyalties": ["kingdom_vlandia", "clan_dey_meroc"],
  "relationship_seed": { "trust": 15, "respect": 20, "suspicion": 5 }
}
```

Backstories should be generated once per save unless explicitly regenerated.

## Memory Write Flow

After each conversation:

```text
1. Store raw turn in conversation_turns.
2. Ask model or deterministic extractor what facts matter.
3. Store important facts in memories.
4. Update conversation summary if needed.
5. Update NPC profile if relationship/personality changed.
6. Embed new memory/summary.
7. Upsert vector into Qdrant.
```

Do not store every sentence as a long-term memory. Store atomic, useful facts:

```text
Good: The player promised Derthert they would defend Sargot from Battania.
Bad: The player said "I shall stand beside you if the storm comes, my lord..."
```

## World Event Flow

World events can come from:

- C# world ticks
- executed game actions
- rejected or failed action results
- AI-proposed events
- background debates
- major relationship/task changes

Flow:

```text
1. Receive event or diff.
2. Normalize into structured world_event.
3. Store in SQLite.
4. Embed summary.
5. Upsert to Qdrant.
6. Make it visible only to plausible characters/factions.
```

NPCs should not know all events automatically. Use visibility:

```text
private
local
faction
global
rumor
```

## Background NPC Debates

For performance, background debates should usually be summaries, not full chat transcripts.

Example:

```text
Topic: Peace with Battania
Participants: Derthert, Erdurand, local Vlandian nobles
Summary: Derthert opposed peace unless Battania pays tribute. Erdurand argued the border villages cannot survive another campaign.
Outcome: Vlandian nobles are split but open to tribute-backed peace.
```

Store the summary and outcome. Retrieve it when the player discusses related diplomacy.

## Task System

AI-created tasks should be structured records.

The model may propose:

```text
assign_npc_task
cancel_npc_task
update_task
```

But C# should validate and execute game-affecting changes.

Python stores:

- requested task
- who assigned it
- who accepted it
- current status
- result
- related memories/events

Task results should feed memory:

```text
Derthert completed the player's request to patrol near Sargot.
Derthert failed to arrive before the raid and feels ashamed.
```

## Prompt Construction Rules

Never concatenate entire files or full databases into prompts.

Allowed:

- compact current scene
- compact NPC profile
- selected memories
- selected world events
- selected lore chunks
- selected tasks
- recent short dialogue window

Forbidden:

- full lore file
- full conversation history
- all world events
- all NPC memories
- raw JSON dumps larger than the budget

## Retrieval Scoring

Use hybrid retrieval.

Candidate sources:

```text
SQLite exact filters
SQLite FTS5 keyword search
Qdrant semantic search
recency/importance scoring
```

Example scoring:

```text
+50 same NPC
+35 directly related NPC
+30 same kingdom/faction
+25 same location
+25 exact entity mention
+20 active task involved
+20 high importance
+15 recent
+semantic similarity score
-20 expired/stale
-30 wrong visibility
```

Final prompt entries should be deduplicated, and summarized if too long.

## Configuration

Extend Python config with:

```yaml
memory:
  provider: "sqlite"
  sqlite_path: "./data/localdiplomacy.sqlite3"
  embedding_provider: "ollama"
  embedding_model: "nomic-embed-text"
  embedding_auto_pull: true
  max_prompt_memories: 8
  max_prompt_lore_chunks: 5
  max_prompt_world_events: 5

vector_index:
  mode: "embedded"  # disabled | embedded | managed_server
  path: "./data/qdrant"
  host: "127.0.0.1"
  port: 6333
  executable_path: "./qdrant/qdrant.exe"
  autostart: false
  startup_timeout_seconds: 30
  fallback_mode: "embedded"  # embedded | disabled

lore:
  active_source: "base_bannerlord"
  sources:
    - key: "base_bannerlord"
      name: "Base Bannerlord"
      path: "./lore/base_bannerlord.md"
```

Ollama should be the default local model interface:

```yaml
ollama:
  base_url: "http://127.0.0.1:11434"
  chat_path: "/v1/chat/completions"
  model: "llama3.1:8b"
  timeout_seconds: 120
  auto_pull_models: true
```

If the configured chat model is not installed, the Python agent should ask Ollama to download it through `/api/pull`.
If that pull fails and another local model is already installed, the agent may fall back to the first installed model. If the configured embedding model is not installed, the embedding layer should also ask Ollama to pull it; if embeddings remain unavailable, it should fall back to deterministic hashing so memory continues working.

## Implementation Phases

### Phase 1: Persistent SQLite Memory

- Add SQLite-backed memory store.
- Add migrations.
- Replace in-process fallback list.
- Persist memory writes across restarts.
- Add tests for save-scoped memory isolation.

Status: implemented for basic long-term memories.

### Phase 2: Embedded Qdrant Index

- Add `vector_index.mode = "embedded"`.
- Use `QdrantClient(path="./data/qdrant")`.
- Keep SQLite as the canonical record store.
- Store Qdrant point IDs on SQLite records.
- Add rebuild-index command that recreates embedded Qdrant from SQLite.
- Add tests for embedded Qdrant persistence across Python process restarts.

Status: initial embedded Qdrant integration is implemented for memories. Rebuild support exists on `MemoryStore`; command-line/admin wiring still needs to be added.

### Phase 3: Managed Qdrant Server

- Add `vector_index.mode = "managed_server"`.
- Add a small Qdrant process manager for `qdrant.exe`.
- Check health before starting a new process.
- Start Qdrant when `autostart` is enabled.
- Use configured storage/config paths.
- Stop the child process on Python agent shutdown.
- Fall back to embedded or SQLite-only mode based on config.
- Add tests around process command construction and fallback behavior.

Status: initial managed-server scaffolding is implemented, including reachability checks, optional autostart, process shutdown, and fallback to embedded mode. Real bundled-binary packaging still needs to be decided.

### Phase 4: Lore Importer

- Add markdown lore source config.
- Chunk lore by headings.
- Store `lore_sources` and `lore_chunks`.
- Add FTS5 indexing.
- Add search endpoint/tool for lore retrieval.

Status: initial markdown lore import and embedded-Qdrant retrieval are implemented for tests. FTS indexing, config-driven file loading, and agent/tool integration still need to be added.

### Phase 5: Retrieval Dossier

- Add retrieval planner before model calls.
- Include NPC profile, memories, world events, lore, tasks, and recent turns.
- Enforce token budgets.
- Add tests for prompt size limits.

### Phase 6: NPC Profiles And Backstories

- Add `npc_profiles`.
- Generate first-meeting profiles from game stats, lore, and recent events.
- Store generated profile per save.
- Add profile update logic after important interactions.

### Phase 7: World Events And Tasks

- Store world ticks as normalized world events.
- Store action results as world events/memories.
- Add task records.
- Retrieve active tasks for prompts.

### Phase 8: Semantic Index Integration

- Add local embedding provider.
- Add Qdrant client wrapper.
- Upsert embeddings for memories, lore, events, summaries, and debates.
- Search Qdrant with save/faction/location filters.
- Load canonical records from SQLite.

### Phase 9: Background Debates

- Add background debate summaries.
- Store outcomes as world events and memories.
- Retrieve debate summaries for diplomacy conversations.

### Phase 10: Maintenance Jobs

- Summarize old conversation turns.
- Decay low-importance memories.
- Mark stale events expired.
- Rebuild missing embeddings.
- Add dashboard/debug views for memory retrieval.
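
The decay and expiry jobs in Phase 10 are mostly simple SQL passes. A sketch over the `memories` and `world_events` columns from the schema plan; the thresholds are illustrative, and a gentler decay might lower `importance` or summarize instead of deleting outright:

```python
# Sketch only: periodic maintenance over the SQLite store.
# Thresholds are illustrative; column names follow the schema plan.
import sqlite3

def run_maintenance(conn: sqlite3.Connection, current_day: float) -> None:
    # Decay: drop old, low-importance memories (a real job might
    # lower importance or roll them into a summary instead).
    conn.execute(
        "DELETE FROM memories WHERE importance <= ? AND created_day < ?",
        (2, current_day - 200),
    )
    # Expiry: remove world events past their expires_day.
    conn.execute(
        "DELETE FROM world_events"
        " WHERE expires_day IS NOT NULL AND expires_day < ?",
        (current_day,),
    )
    conn.commit()
```

Whatever the job deletes or rewrites in SQLite must also be removed or re-upserted in Qdrant, since the vector index is only a projection of the canonical store.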
## Testing Strategy

Add tests for:

- save isolation
- memory persistence across `MemoryStore` instances
- lore import chunking
- FTS search
- Qdrant payload filters
- retrieval dossier token limits
- first-meeting backstories generating only once per save
- world event visibility
- task lifecycle
- rebuilding the Qdrant index from SQLite

## Current Repo Gaps

The current implementation has a useful scaffold and these remaining gaps:

- `MemoryStore` now persists basic long-term memories in SQLite, but broader tables for profiles, lore, tasks, world events, summaries, and debates still need to be added.
- Embedded Qdrant memory indexing and initial managed `qdrant.exe` supervision exist; bundled-binary packaging/install UX still needs implementation.
- Ollama is now the active LLM interface for chat and the preferred embedding provider; the agent can ask Ollama to pull missing chat/embedding models, and deterministic hashing remains as an embedding fallback.
- Python returns `memory_writes`, but C# `ConversationResponse` does not currently model that field.
- The event log persists audit data but is not a full memory system.
- Lore can be imported/indexed through the initial `LoreStore`; config-driven world-file loading and prompt/tool integration still need implementation.
- NPC profiles/backstories are not implemented yet.
- Qdrant is currently integrated for memories only, not yet for lore/events/summaries/debates.

## Key Principle

Depth should live in storage and retrieval, not in prompt length.

The local model should receive:

```text
the right 20 facts
```

not:

```text
every fact the mod has ever seen
```

That is how LocalDiplomacy can support AI Influence-style depth while remaining practical for local models.