Database¶
LegionForge uses PostgreSQL 17 with pgvector for the RAG layer and AsyncPostgresSaver for LangGraph checkpoints. The schema is split across 16 tables and uses role-based access so that a compromised component can't escalate.
Role model¶
Five roles, each with the minimum necessary privileges:
| Role | What can it do |
|---|---|
legionforge_admin |
Superuser. Migrations, DDL, table creation. Not used at runtime. |
legionforge_gateway |
Read + write on user-facing tables (gateway_users, tasks, api_usage). |
legionforge_worker |
Read on gateway_users (for auth lookup), read + write on agent state. |
legionforge_guardian |
Read on tool_registry, read + write on threat_rules, threat_events. |
legionforge_readonly |
Read-only across everything. For dashboards and analytics. |
Each role's password is in macOS Keychain under service=<role>, account=api_key.
Key tables¶
| Table | Purpose |
|---|---|
gateway_users |
User accounts. API key hash, quota, status. |
tasks |
Task submissions, status, result. |
api_usage |
Per-task token + cost accounting. Powers rate limiting. |
health_metrics |
Time-series health data (CPU, RAM, GPU, Ollama state). |
documents |
RAG corpus. 768-dim embeddings, HNSW index. |
threat_events |
Structured security events. See Threat Events. |
audit_log |
Append-only event log with SHA-256 hash chain. |
threat_rules |
Adaptive Guardian rules. Hot-reloaded every 10 s. |
tool_registry |
Registered tools with Ed25519 signatures. |
langgraph_checkpoints |
LangGraph's AsyncPostgresSaver state. |
erDiagram
gateway_users ||--o{ tasks : submits
gateway_users ||--o{ api_usage : accumulates
gateway_users ||--o{ connector_identities : maps_to
tasks ||--o{ api_usage : generates
tasks ||--o{ threat_events : "may produce"
tasks ||--o{ audit_log : records
tasks ||--o{ langgraph_checkpoints : checkpoints
tool_registry ||--o{ threat_events : "may flag"
threat_rules ||--o{ threat_events : "rule fires"
gateway_users {
bigint id PK
text name
text api_key_hash
bigint quota_daily
text status
}
tasks {
text id PK
bigint user_id FK
text prompt
text status
jsonb result
timestamp created_at
}
api_usage {
bigint id PK
text task_id FK
bigint user_id FK
text provider
int tokens_in
int tokens_out
numeric cost_usd
}
threat_events {
bigint id PK
text event_type
text task_id
bigint user_id
text severity
jsonb payload
}
audit_log {
bigint id PK
text event_type
jsonb payload
bytea prev_hash
bytea this_hash
}
tool_registry {
text tool_id PK
text name
bytea code_hash
bytea signature
}
threat_rules {
bigint id PK
text rule_name
text match_pattern
text match_field
text action
boolean enabled
}
documents {
bigint id PK
text content
vector embedding
jsonb metadata
}
connector_identities {
text connector
text platform_user_id
bigint gateway_user_id FK
}
langgraph_checkpoints {
text checkpoint_id PK
text task_id FK
jsonb state
timestamp created_at
}
RAG with pgvector¶
The documents table stores text chunks with 768-dimensional embeddings (produced by mxbai-embed-large). It uses an HNSW index for fast approximate nearest-neighbor search:
Query pattern from the framework:
results = await db.fetch("""
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 5
""", query_embedding)
LangGraph checkpoints¶
AsyncPostgresSaver writes graph state to langgraph_checkpoints after every node. A task can be paused (for HITL approval) and resumed because every step is checkpointed.
The checkpoint table is large (one row per graph step per task) and rotates older entries based on audit_log_retention_days in the hardware profile.
Audit hash chain¶
audit_log is append-only and uses a SHA-256 hash chain:
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
event_type TEXT NOT NULL,
payload JSONB NOT NULL,
prev_hash BYTEA,
this_hash BYTEA NOT NULL
);
Each row's this_hash = SHA256(prev_hash || event_type || payload || timestamp). Tampering with any row breaks the chain at every subsequent row, which is detectable in a single pass.
Initialization¶
make db-init # one-time: create database, roles, tables, extensions
make db-start # start PostgreSQL service
db-init is idempotent. Running it twice is safe.
Connection pools¶
The framework holds three separate asyncpg pools:
admin_pool # legionforge_admin — used only by migrations
gateway_pool # legionforge_gateway — used by FastAPI gateway
worker_pool # legionforge_worker — used by agent workers
Routing by pool enforces the role separation at the connection layer, not just at SQL grant time. Trying to insert into gateway_users from a worker fails because the worker pool's role has no INSERT privilege there.
Common operations¶
Reset Guardian role password
Inspect threat events
Verify audit-chain integrity
Use the make audit-verify target — it walks the chain and reports any broken links.