LocalDiplomacy/MEMORY_SYSTEM_PLAN.md

LocalDiplomacy Memory System Plan

Goal

LocalDiplomacy should support deep NPC roleplay, persistent save-specific memory, world events, lore awareness, backstories, and background NPC activity while remaining usable with local 4B-32B models.

The core design rule is:

Store the world outside the model. Retrieve only the tiny slice needed for the current turn.

The C# Bannerlord mod should stay focused on game integration:

  • collect current game state
  • send conversation/world/action-result packets to Python
  • receive assistant text and proposed game actions
  • validate and execute game actions

The Python agent should own:

  • memory storage
  • save/playthrough scoping
  • lore indexing
  • NPC profiles and generated backstories
  • world events
  • task records
  • background debate summaries
  • prompt dossier construction
  • semantic retrieval

Architecture

Use both SQLite and Qdrant.

SQLite = source of truth
Qdrant = semantic search index
Ollama embedding model = turns text into searchable vectors
Ollama dialogue model = roleplay and reasoning over retrieved context

SQLite remains mandatory because it is easy to inspect, migrate, back up, and query exactly. Qdrant should be rebuildable from SQLite at any time.

Qdrant should never be the only copy of important data.

Qdrant Operating Modes

LocalDiplomacy should not require Docker for normal users.

Support these modes:

disabled
embedded
managed_server

disabled

Use SQLite and SQLite FTS5 only.

This mode is useful for early development, tests, and users who want the simplest possible install.

embedded

Use Qdrant through the Python client local mode:

QdrantClient(path="./data/qdrant")

This should become the default vector mode. It persists the vector index to disk without a separate Qdrant server process.

Benefits:

  • no Docker required
  • no extra server setup
  • easy mod install story
  • good enough for many local campaigns

SQLite remains the source of truth. The embedded Qdrant index can be deleted and rebuilt from SQLite.

managed_server

Python starts and supervises a bundled or user-installed qdrant.exe process.

This mode is for larger campaigns or heavier background simulation where a real Qdrant server process is useful.

Responsibilities:

  • check whether Qdrant is already reachable on the configured host/port
  • start qdrant.exe when autostart is enabled
  • pass a local storage/config path
  • wait for health check success
  • stop the child process when the Python agent exits
  • fall back to embedded or SQLite-only mode if configured to do so

This mode should still be local-first and should not require Docker.
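The supervision responsibilities above can be sketched with only the standard library. This is a minimal sketch, not the real process manager: `build_qdrant_env` assumes Qdrant's `QDRANT__` double-underscore environment-variable convention for config keys, and the exact key should be verified against the Qdrant docs before relying on it.

```python
import os
import socket
import subprocess

def is_qdrant_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Cheap TCP reachability check before spawning a new process."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def build_qdrant_env(storage_path: str) -> dict:
    """Environment overrides for a supervised Qdrant process.

    Assumption: Qdrant maps storage.storage_path to the
    QDRANT__STORAGE__STORAGE_PATH environment variable.
    """
    return {"QDRANT__STORAGE__STORAGE_PATH": storage_path}

def start_qdrant_if_needed(host, port, executable_path, storage_path):
    """Start qdrant.exe only when nothing is already listening."""
    if is_qdrant_reachable(host, port):
        return None  # reuse the existing server
    return subprocess.Popen(
        [executable_path],
        env={**os.environ, **build_qdrant_env(storage_path)},
    )
```

After startup, the manager would poll the health endpoint until `startup_timeout_seconds` elapses, then fall back per `fallback_mode`.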

Data Ownership

C# Mod

The C# side should not manage AI memory. It should send enough facts for Python to update memory and make decisions.

Responsibilities:

  • send campaign_id, save_id, current day, player, NPC, scene, nearby parties, settlements, kingdom state, and recent game diffs
  • execute validated GameAction proposals
  • report action results back to Python
  • send world ticks and important state changes

Python Agent

The Python agent should be an always-on local service.

Responsibilities:

  • persist all AI memory under the current save_id
  • retrieve relevant facts before model calls
  • import and index world lore markdown files
  • generate first-meeting NPC profiles
  • summarize conversations and background debates
  • decide what memories should be stored
  • return compact prompts to local models

Save-Scoped Storage

All playthrough-specific records must include:

save_id
campaign_id

This prevents one campaign's Derthert, Caladog, or custom mod NPC from leaking into another playthrough.

Suggested SQLite path:

data/localdiplomacy.sqlite3

SQLite Schema Plan

saves

Tracks known playthroughs.

id
save_id
campaign_id
name
mod_profile
active_lore_source_id
created_at
last_seen_at
metadata_json

characters

Stores known game characters for a save.

id
save_id
campaign_id
character_id
name
clan_id
kingdom_id
culture_id
occupation
traits_json
last_seen_day
last_seen_at
metadata_json

npc_profiles

Stores generated and evolving NPC identity.

id
save_id
campaign_id
character_id
backstory
personality_json
speech_style
goals_json
fears_json
loyalties_json
relationship_to_player_json
known_history_summary
created_day
updated_day
created_at
updated_at

memories

Stores durable character/world facts.

id
save_id
campaign_id
subject_character_id
related_character_id
player_id
kingdom_id
location_id
category
importance
confidence
visibility
text
summary
tags_json
created_day
created_at
last_accessed_at
qdrant_point_id
metadata_json
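The column list above can be written as DDL. This is a hedged sketch: the plan only names the columns, so the types, defaults, and constraints below are assumptions.

```python
import sqlite3

MEMORIES_DDL = """
CREATE TABLE IF NOT EXISTS memories (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    save_id TEXT NOT NULL,
    campaign_id TEXT NOT NULL,
    subject_character_id TEXT,
    related_character_id TEXT,
    player_id TEXT,
    kingdom_id TEXT,
    location_id TEXT,
    category TEXT NOT NULL,
    importance INTEGER NOT NULL DEFAULT 5,
    confidence REAL NOT NULL DEFAULT 1.0,
    visibility TEXT NOT NULL DEFAULT 'private',
    text TEXT NOT NULL,
    summary TEXT,
    tags_json TEXT,
    created_day REAL,
    created_at TEXT DEFAULT (datetime('now')),
    last_accessed_at TEXT,
    qdrant_point_id TEXT,
    metadata_json TEXT
);
"""

conn = sqlite3.connect(":memory:")  # real path: data/localdiplomacy.sqlite3
conn.execute(MEMORIES_DDL)
conn.execute(
    "INSERT INTO memories (save_id, campaign_id, subject_character_id, category, importance, text) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("save_abc", "campaign_001", "lord_derthert", "promise", 8,
     "The player promised Derthert they would defend Sargot from Battania."),
)
row = conn.execute(
    "SELECT text FROM memories WHERE save_id = ? AND subject_character_id = ?",
    ("save_abc", "lord_derthert"),
).fetchone()
```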

Memory categories should include:

conversation
promise
secret
known_info
relationship
event
personality
backstory
speech_pattern
romance
death_history
visit
mentioned_entity
lie_detection
debate
task

world_events

Stores objective, rumored, or localized world events.

id
save_id
campaign_id
event_type
title
summary
location_id
actor_character_id
target_character_id
actor_faction_id
target_faction_id
importance
visibility
known_by_character_id
known_by_faction_id
created_day
expires_day
created_at
updated_at
qdrant_point_id
metadata_json

Visibility examples:

private
local
faction
global
rumor

tasks

Stores NPC commitments and ongoing assignments.

id
save_id
campaign_id
task_id
assignee_character_id
issuer_character_id
task_type
target_id
status
priority
created_day
due_day
completed_day
summary
constraints_json
result_json
created_at
updated_at

Task statuses:

proposed
active
completed
failed
cancelled
rejected
expired
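The status list implies a lifecycle that the store should enforce before accepting an update. A minimal validation sketch; the transition table itself is an assumption, since the plan only names the statuses.

```python
# Hypothetical lifecycle: terminal statuses accept no further transitions.
ALLOWED_TASK_TRANSITIONS = {
    "proposed": {"active", "rejected", "cancelled", "expired"},
    "active": {"completed", "failed", "cancelled", "expired"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
    "rejected": set(),
    "expired": set(),
}

def can_transition(current: str, new: str) -> bool:
    """Return True when a task status change is allowed."""
    return new in ALLOWED_TASK_TRANSITIONS.get(current, set())
```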

conversation_turns

Stores raw audit/debug conversation data.

id
save_id
campaign_id
turn_id
player_id
npc_id
location_id
player_message
assistant_text
created_day
created_at
metadata_json

Raw turns should generally stay out of prompts; only the most recent few turns belong in the dialogue window.

conversation_summaries

Stores compressed relationship/context history.

id
save_id
campaign_id
player_id
npc_id
summary
turn_count
last_turn_day
updated_at
qdrant_point_id

lore_sources

Stores available lore files.

id
source_key
name
path
content_hash
active
created_at
updated_at
metadata_json

Examples:

base_bannerlord
realm_of_thrones
ancient_greece

lore_chunks

Stores indexed markdown chunks.

id
lore_source_id
chunk_key
heading_path
title
text
summary
tags_json
entities_json
qdrant_point_id
created_at
updated_at

background_debates

Stores summaries of NPC-to-NPC reasoning or faction debate.

id
save_id
campaign_id
debate_id
topic
participants_json
faction_ids_json
location_id
summary
outcome
importance
created_day
created_at
qdrant_point_id
metadata_json

SQLite Indexes

Create indexes for exact filters first.

CREATE INDEX idx_memories_scope
ON memories(save_id, campaign_id, subject_character_id);

CREATE INDEX idx_memories_related
ON memories(save_id, related_character_id);

CREATE INDEX idx_memories_faction
ON memories(save_id, kingdom_id);

CREATE INDEX idx_memories_location
ON memories(save_id, location_id);

CREATE INDEX idx_memories_category
ON memories(save_id, category);

CREATE INDEX idx_world_events_scope
ON world_events(save_id, campaign_id);

CREATE INDEX idx_world_events_location
ON world_events(save_id, location_id);

CREATE INDEX idx_world_events_factions
ON world_events(save_id, actor_faction_id, target_faction_id);

CREATE INDEX idx_tasks_assignee
ON tasks(save_id, assignee_character_id, status);

CREATE INDEX idx_profiles_character
ON npc_profiles(save_id, character_id);

Use SQLite FTS5 for fast keyword search:

memories_fts
world_events_fts
lore_chunks_fts
conversation_summaries_fts
background_debates_fts

FTS should index compact searchable text, not huge JSON blobs.
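One of these tables can be sketched with SQLite's built-in FTS5 module; the column choice is illustrative, and the rowid is reused as the link back to the canonical `memories` row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Index only compact searchable text, never JSON blobs.
conn.execute("CREATE VIRTUAL TABLE memories_fts USING fts5(summary, text)")
conn.execute(
    "INSERT INTO memories_fts (rowid, summary, text) VALUES (?, ?, ?)",
    (123, "promise to defend Sargot",
     "The player promised Derthert they would defend Sargot from Battania."),
)
# MATCH gives fast keyword search; rowid maps back to memories.id.
hits = conn.execute(
    "SELECT rowid FROM memories_fts WHERE memories_fts MATCH ? LIMIT 10",
    ("sargot",),
).fetchall()
```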

Qdrant Collections

Use Qdrant for semantic retrieval once data grows.

Suggested collections:

localdiplomacy_memories
localdiplomacy_world_events
localdiplomacy_lore
localdiplomacy_conversation_summaries
localdiplomacy_background_debates

Each point payload should contain enough metadata for filtering:

{
  "sqlite_table": "memories",
  "sqlite_id": 123,
  "save_id": "save_abc",
  "campaign_id": "campaign_001",
  "character_id": "lord_derthert",
  "kingdom_id": "kingdom_vlandia",
  "location_id": "town_sargot",
  "category": "promise",
  "importance": 8,
  "created_day": 72.4
}

Search pattern:

1. Embed current query.
2. Search Qdrant with metadata filters.
3. Return candidate SQLite IDs.
4. Load full records from SQLite.
5. Rerank with local scoring.
6. Build compact prompt dossier.
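Steps 3-5 take candidate IDs from two sources before loading canonical rows. A minimal merge sketch; the ordering policy (IDs found by both sources first) is an assumption.

```python
def merge_candidates(fts_ids, vector_hits, limit=20):
    """Merge keyword and semantic candidates into a deduplicated ID list.

    fts_ids: SQLite row IDs from FTS5, in rank order.
    vector_hits: (sqlite_id, similarity) pairs from Qdrant, best first.
    """
    vector_ids = [sqlite_id for sqlite_id, _ in vector_hits]
    fts_set = set(fts_ids)
    both = [i for i in vector_ids if i in fts_set]
    ordered = (both
               + [i for i in vector_ids if i not in both]
               + [i for i in fts_ids if i not in vector_ids])
    return ordered[:limit]
```

The surviving IDs are then loaded from SQLite and reranked with local scoring before dossier construction.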

Embeddings

Embeddings convert text into vectors for semantic search.

Use a local embedding model so the system stays offline/local. Good initial target:

Ollama + nomic-embed-text

Embeddings should be created when data is written:

  • lore import
  • memory creation
  • world event creation
  • conversation summary update
  • background debate summary creation

At runtime, only the current query usually needs a fresh embedding.

Retrieval Dossier

Before every conversation response, Python should build a compact dossier.

Inputs:

save_id
campaign_id
player_id
npc_id
location_id
player_message
current_day
scene
nearby parties
nearby settlements
kingdom state
recent game diffs

Retrieve:

1. NPC profile
2. first-meeting backstory if needed
3. last 2-6 raw turns with this NPC
4. conversation summary for player+npc
5. top 3-8 relevant memories
6. top 2-5 relevant world events
7. active tasks for this NPC/player/location
8. top 2-5 relevant lore chunks
9. relevant background debate summaries

The model should receive a concise dossier, not raw database dumps.

Example prompt section:

NPC PROFILE
Derthert is proud, pragmatic, protective of Vlandia, and sensitive to noble honor.

RELEVANT MEMORIES
- The player promised Derthert they would defend Sargot if Battania attacked.
- Derthert distrusts the player's sympathy toward Battania.

RECENT WORLD EVENTS
- Battanian raiders burned farms near Sargot on day 72.

RELEVANT LORE
- Vlandian nobles value feudal oaths, cavalry service, inheritance, and military honor.

CURRENT SCENE
The player is speaking with Derthert in Sargot after border raids.

Token Budgets

For local models, use hard budgets.

Target for 8k context:

system instructions: 400 tokens
NPC profile: 250 tokens
current scene/game state: 500 tokens
memories: 500 tokens
world events: 400 tokens
lore: 500 tokens
recent dialogue: 500 tokens
tools/action rules: 400 tokens
response budget: 500-800 tokens

Prefer 2k-4k total prompt tokens for normal turns.

For 4B-7B models, use smaller dossiers. Smaller models often perform better with cleaner, shorter context.
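One way to enforce these budgets is per-section trimming, assuming the common rough heuristic of ~4 characters per English token; a real implementation would use the model's actual tokenizer.

```python
def approx_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fit_sections(sections: dict, budgets: dict) -> dict:
    """Trim each dossier section to its token budget.

    sections/budgets are dicts keyed by section name, e.g. "memories": 500.
    Character truncation is a crude stand-in for tokenizer-aware trimming.
    """
    fitted = {}
    for name, body in sections.items():
        budget = budgets.get(name, 0)
        if approx_tokens(body) <= budget:
            fitted[name] = body
        else:
            fitted[name] = body[: budget * 4].rstrip() + " ..."
    return fitted
```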

Lore Import

Users should be able to select a world lore markdown file.

Examples:

lore/base_bannerlord.md
lore/realm_of_thrones.md
lore/ancient_greece.md

Import flow:

1. Read markdown file.
2. Hash contents.
3. If unchanged, skip reimport.
4. Split by heading hierarchy.
5. Create 100-300 word chunks.
6. Extract headings, tags, and entity names.
7. Store chunks in SQLite.
8. Add chunks to FTS.
9. Embed chunks.
10. Upsert vectors into Qdrant.
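Steps 2 and 4 of the flow can be sketched as follows; this splitter only handles the heading hierarchy, so a real importer would additionally break long sections into the 100-300 word chunks from step 5.

```python
import hashlib
import re

def split_markdown_by_headings(markdown: str):
    """Split a lore file into (heading_path, text) chunks at ATX headings."""
    chunks, path, buf = [], [], []

    def flush():
        text = "\n".join(buf).strip()
        if text:
            chunks.append((" > ".join(path), text))
        buf.clear()

    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)$", line)
        if m:
            flush()
            level, title = len(m.group(1)), m.group(2).strip()
            path[:] = path[: level - 1] + [title]
        else:
            buf.append(line)
    flush()
    return chunks

def content_hash(markdown: str) -> str:
    """Hash used in step 2 to skip reimport of unchanged lore files."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()
```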

At runtime, lore retrieval should consider:

  • player message
  • NPC culture
  • NPC kingdom/faction
  • location
  • mentioned entities
  • current event type
  • active mod profile

Only retrieved lore chunks should enter the prompt.

First-Meeting Backstory Generation

When the player meets an NPC for the first time in a save:

1. Check npc_profiles for save_id + character_id.
2. If missing, gather current NPC game stats.
3. Retrieve relevant lore chunks.
4. Retrieve recent world events affecting their faction/location.
5. Generate compact backstory/profile JSON.
6. Store it in npc_profiles.
7. Use it in future prompts.

Generation input should be small and grounded:

NPC:
- name
- clan
- kingdom
- culture
- occupation
- traits
- relation_to_player

Relevant lore:
- retrieved lore chunks only

Recent world events:
- retrieved world events only

Generated output:

{
  "backstory": "...",
  "personality": ["proud", "cautious", "honor-bound"],
  "speech_style": "formal, martial, terse",
  "goals": ["protect Vlandia", "secure clan prestige"],
  "fears": ["dishonor", "border collapse"],
  "loyalties": ["kingdom_vlandia", "clan_dey_meroc"],
  "relationship_seed": {
    "trust": 15,
    "respect": 20,
    "suspicion": 5
  }
}

Backstories should be generated once per save unless explicitly regenerated.

Memory Write Flow

After each conversation:

1. Store raw turn in conversation_turns.
2. Ask model or deterministic extractor what facts matter.
3. Store important facts in memories.
4. Update conversation summary if needed.
5. Update NPC profile if relationship/personality changed.
6. Embed new memory/summary.
7. Upsert vector into Qdrant.

Do not store every sentence as a long-term memory.

Store atomic, useful facts:

Good:
The player promised Derthert they would defend Sargot from Battania.

Bad:
The player said "I shall stand beside you if the storm comes, my lord..."

World Event Flow

World events can come from:

  • C# world ticks
  • executed game actions
  • rejected or failed action results
  • AI-proposed events
  • background debates
  • major relationship/task changes

Flow:

1. Receive event or diff.
2. Normalize into structured world_event.
3. Store in SQLite.
4. Embed summary.
5. Upsert to Qdrant.
6. Make it visible only to plausible characters/factions.

NPCs should not know all events automatically.

Use visibility:

private
local
faction
global
rumor
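A minimal visibility check built from these levels; the dict keys and the loose rumor-spread rule are assumptions for illustration, not the final policy.

```python
def event_visible_to(event: dict, character: dict) -> bool:
    """Decide whether an NPC plausibly knows about a world event.

    Assumed event keys: visibility, location_id, actor_faction_id,
    target_faction_id, known_by_character_id.
    Assumed character keys: character_id, faction_id, location_id.
    """
    vis = event.get("visibility", "private")
    if vis == "global":
        return True
    if vis == "private":
        return event.get("known_by_character_id") == character.get("character_id")
    if vis == "local":
        return event.get("location_id") == character.get("location_id")
    if vis == "faction":
        return character.get("faction_id") in (
            event.get("actor_faction_id"), event.get("target_faction_id"))
    if vis == "rumor":
        # Rumors spread loosely: same location or involved faction is plausible.
        return (event.get("location_id") == character.get("location_id")
                or character.get("faction_id") == event.get("actor_faction_id"))
    return False
```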

Background NPC Debates

For performance, background debates should usually be summaries, not full chat transcripts.

Example:

Topic: Peace with Battania
Participants: Derthert, Erdurand, local Vlandian nobles
Summary: Derthert opposed peace unless Battania pays tribute. Erdurand argued the border villages cannot survive another campaign.
Outcome: Vlandian nobles are split but open to tribute-backed peace.

Store the summary and outcome. Retrieve it when the player discusses related diplomacy.

Task System

AI-created tasks should be structured records.

The model may propose:

assign_npc_task
cancel_npc_task
update_task

But C# should validate and execute game-affecting changes.

Python stores:

  • requested task
  • who assigned it
  • who accepted it
  • current status
  • result
  • related memories/events

Task results should feed memory:

Derthert completed the player's request to patrol near Sargot.
Derthert failed to arrive before the raid and feels ashamed.

Prompt Construction Rules

Never concatenate entire files or full databases into prompts.

Allowed:

  • compact current scene
  • compact NPC profile
  • selected memories
  • selected world events
  • selected lore chunks
  • selected tasks
  • recent short dialogue window

Forbidden:

  • full lore file
  • full conversation history
  • all world events
  • all NPC memories
  • raw JSON dumps larger than the budget

Retrieval Scoring

Use hybrid retrieval.

Candidate sources:

SQLite exact filters
SQLite FTS5 keyword search
Qdrant semantic search
recency/importance scoring

Example scoring:

+50 same NPC
+35 directly related NPC
+30 same kingdom/faction
+25 same location
+25 exact entity mention
+20 active task involved
+20 high importance
+15 recent
+semantic similarity score
-20 expired/stale
-30 wrong visibility

Final prompt entries should be deduplicated and summarized if too long.
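The scoring table above can be folded into one additive function. This sketch covers a subset of the signals; the weight used to fold in Qdrant's 0..1 similarity is an assumption, and the recency window is illustrative.

```python
def score_memory(memory: dict, ctx: dict) -> float:
    """Additive rerank score mirroring the weights in the table above."""
    score = memory.get("similarity", 0.0) * 40  # semantic weight is an assumption
    if memory.get("subject_character_id") == ctx.get("npc_id"):
        score += 50  # same NPC
    if memory.get("related_character_id") == ctx.get("npc_id"):
        score += 35  # directly related NPC
    if memory.get("kingdom_id") and memory["kingdom_id"] == ctx.get("kingdom_id"):
        score += 30  # same kingdom/faction
    if memory.get("location_id") and memory["location_id"] == ctx.get("location_id"):
        score += 25  # same location
    if memory.get("importance", 0) >= 7:
        score += 20  # high importance
    if ctx.get("current_day", 0) - memory.get("created_day", 0) <= 10:
        score += 15  # recent (10-day window is illustrative)
    if memory.get("visibility") == "private" and \
       memory.get("subject_character_id") != ctx.get("npc_id"):
        score -= 30  # wrong visibility
    return score
```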

Configuration

Extend Python config with:

memory:
  provider: "sqlite"
  sqlite_path: "./data/localdiplomacy.sqlite3"
  embedding_provider: "ollama"
  embedding_model: "nomic-embed-text"
  embedding_auto_pull: true
  max_prompt_memories: 8
  max_prompt_lore_chunks: 5
  max_prompt_world_events: 5

vector_index:
  mode: "embedded" # disabled | embedded | managed_server
  path: "./data/qdrant"
  host: "127.0.0.1"
  port: 6333
  executable_path: "./qdrant/qdrant.exe"
  autostart: false
  startup_timeout_seconds: 30
  fallback_mode: "embedded" # embedded | disabled

lore:
  active_source: "base_bannerlord"
  sources:
    - key: "base_bannerlord"
      name: "Base Bannerlord"
      path: "./lore/base_bannerlord.md"

Ollama should be the default local model interface:

ollama:
  base_url: "http://127.0.0.1:11434"
  chat_path: "/v1/chat/completions"
  model: "llama3.1:8b"
  timeout_seconds: 120
  auto_pull_models: true

If the configured chat model is not installed, the Python agent should ask Ollama to download it through /api/pull. If that pull fails and another local model is already installed, the agent may fall back to the first installed model. If the configured embedding model is not installed, the embedding layer should also ask Ollama to pull it; if embeddings remain unavailable, it should fall back to deterministic hashing so memory continues working.
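One possible shape for that deterministic hashing fallback is a hashed bag-of-words vector. It is far weaker than a real embedding model, but it is stable across runs and keeps keyword-level overlap searchable when Ollama embeddings are unavailable.

```python
import hashlib
import math
import re

def hash_embed(text: str, dims: int = 256) -> list:
    """Deterministic bag-of-words hashing embedding (fallback only).

    Each token is hashed to a bucket with a +/- sign (feature hashing),
    then the vector is L2-normalized so cosine similarity behaves.
    """
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dims
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Because the function is deterministic, vectors embedded under the fallback can later be rebuilt with a real model from the canonical SQLite text.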

Implementation Phases

Phase 1: Persistent SQLite Memory

  • Add SQLite-backed memory store.
  • Add migrations.
  • Replace in-process fallback list.
  • Store memory writes across restarts.
  • Add tests for save-scoped memory isolation.

Status: implemented for basic long-term memories.

Phase 2: Embedded Qdrant Index

  • Add vector_index.mode = "embedded".
  • Use QdrantClient(path="./data/qdrant").
  • Keep SQLite as the canonical record store.
  • Store Qdrant point IDs on SQLite records.
  • Add rebuild-index command that recreates embedded Qdrant from SQLite.
  • Add tests for embedded Qdrant persistence across Python process restarts.

Status: initial embedded Qdrant integration is implemented for memories. Rebuild support exists on MemoryStore; command-line/admin wiring still needs to be added.

Phase 3: Managed Qdrant Server

  • Add vector_index.mode = "managed_server".
  • Add a small Qdrant process manager for qdrant.exe.
  • Check health before starting a new process.
  • Start Qdrant when autostart is enabled.
  • Use configured storage/config paths.
  • Stop the child process on Python agent shutdown.
  • Fall back to embedded or SQLite-only mode based on config.
  • Add tests around process command construction and fallback behavior.

Status: initial managed-server scaffolding is implemented, including reachability checks, optional autostart, process shutdown, and fallback to embedded mode. Real bundled-binary packaging still needs to be decided.

Phase 4: Lore Importer

  • Add markdown lore source config.
  • Chunk lore by headings.
  • Store lore_sources and lore_chunks.
  • Add FTS5 indexing.
  • Add search endpoint/tool for lore retrieval.

Status: initial markdown lore import and embedded-Qdrant retrieval are implemented for tests. FTS indexing, config-driven file loading, and agent/tool integration still need to be added.

Phase 5: Retrieval Dossier

  • Add retrieval planner before model calls.
  • Include NPC profile, memories, world events, lore, tasks, and recent turns.
  • Enforce token budgets.
  • Add tests for prompt size limits.

Phase 6: NPC Profiles And Backstories

  • Add npc_profiles.
  • Generate first-meeting profiles from game stats, lore, and recent events.
  • Store generated profile per save.
  • Add profile update logic after important interactions.

Phase 7: World Events And Tasks

  • Store world ticks as normalized world events.
  • Store action results as world events/memories.
  • Add task records.
  • Retrieve active tasks for prompts.

Phase 8: Semantic Index Integration

  • Add local embedding provider.
  • Add Qdrant client wrapper.
  • Upsert embeddings for memories, lore, events, summaries, and debates.
  • Search Qdrant with save/faction/location filters.
  • Load canonical records from SQLite.

Phase 9: Background Debates

  • Add background debate summaries.
  • Store outcomes as world events and memories.
  • Retrieve debate summaries for diplomacy conversations.

Phase 10: Maintenance Jobs

  • Summarize old conversation turns.
  • Decay low-importance memories.
  • Mark stale events expired.
  • Rebuild missing embeddings.
  • Add dashboard/debug views for memory retrieval.

Testing Strategy

Add tests for:

  • save isolation
  • memory persistence across MemoryStore instances
  • lore import chunking
  • FTS search
  • Qdrant payload filters
  • retrieval dossier token limits
  • first-meeting backstory only generates once per save
  • world event visibility
  • task lifecycle
  • rebuild Qdrant index from SQLite

Current Repo Gaps

The current implementation has a useful scaffold and these remaining gaps:

  • MemoryStore now persists basic long-term memories in SQLite, but broader tables for profiles, lore, tasks, world events, summaries, and debates still need to be added.
  • Embedded Qdrant memory indexing and initial managed qdrant.exe supervision exist; bundled-binary packaging/install UX still needs implementation.
  • Ollama is now the active LLM interface for chat and the preferred embedding provider; the agent can ask Ollama to pull missing chat/embedding models, and deterministic hashing remains as an embedding fallback.
  • Python returns memory_writes, but C# ConversationResponse does not currently model that field.
  • Event log persists audit data but is not a full memory system.
  • Lore can be imported/indexed through the initial LoreStore; config-driven world-file loading and prompt/tool integration still need implementation.
  • NPC profiles/backstories are not implemented yet.
  • Qdrant is currently integrated for memories only, not lore/events/summaries/debates yet.

Key Principle

Depth should live in storage and retrieval, not in prompt length.

The local model should receive:

the right 20 facts

not:

every fact the mod has ever seen

That is how LocalDiplomacy can support AI Influence-style depth while remaining practical for local models.