Architecture

Core design principles

LegionForge is built around five non-negotiable principles. Every module and every code path is shaped by them:

Principle	What it means in practice
Fail-safe tiering	Halt → sandbox/retry → degrade. Never silently succeed. Errors propagate with intent.
Human gates on all mutations	Destructive actions cross a human-in-the-loop boundary by default.
Replace AI with determinism wherever possible	The LLM is the last resort, not the first. Rules, tables, and pattern matchers run ahead of model calls.
Validate at trust boundaries, not at processing nodes	Sanitize once, at the edge. Internal code trusts internal data.
Privilege tied to tasks, not persistent to agents	Capability is scoped to the active task and expires when the task ends.
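
As an illustration of the "determinism before the LLM" principle, here is a minimal sketch of a dispatcher that consults a rule table first and calls the model only when no rule matches. The `RULES` table, the `answer` function, and the `llm` callable are hypothetical names chosen for this example and are not taken from the LegionForge codebase.

```python
import re

# Hypothetical rule table: (compiled pattern, handler) pairs, tried in order.
RULES = [
    (re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*$"),
     lambda m: str(int(m.group(1)) + int(m.group(2)))),
    (re.compile(r"^\s*ping\s*$", re.IGNORECASE),
     lambda m: "pong"),
]


def answer(query, llm):
    """Return (source, answer), where source is 'rule' or 'llm'."""
    # Deterministic rules run first; the first matching rule wins.
    for pattern, handler in RULES:
        match = pattern.match(query)
        if match:
            return "rule", handler(match)
    # Only queries that no rule can handle reach the model.
    return "llm", llm(query)
```

Returning the source alongside the answer makes it easy to measure how often the model is actually needed.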

Key modules

Module	Responsibility
`config/settings.py`	Pydantic singleton loaded from a hardware YAML profile. All memory limits, model names, safeguard thresholds, and paths come from here.
`src/base_graph.py`	LangGraph template. Copy this when creating new agents. Wires in three-layer loop protection, token budgeting, per-run tracing toggle, TOCTOU snapshot, and Guardian pre-invocation check automatically.
`src/security/core.py`	API key management via macOS Keychain (no `.env` secrets), prompt-injection detection (29 patterns, Tier 1/2 tiering), PII redaction. All inputs pass through `sanitize_input()`; all outputs through `sanitize_output()`.
`src/security/guardian.py`	Guardian FastAPI sidecar on port 9766. See Guardian.
`src/safeguards.py`	Three independent loop-protection layers.
`src/database.py`	Async PostgreSQL pool (admin + restricted app roles), LangGraph `AsyncPostgresSaver` for checkpoint resumption, pgvector RAG, 16 tables.
`src/llm_factory.py`	Unified factory for Ollama, OpenAI, Anthropic, InceptionLabs. Reads config from the hardware profile. Supports cloud fallback.
`src/rate_limiter.py`	Per-provider rate limits with pre-execution token cost estimation. Hard daily caps with 80% / 100% alert thresholds.
`src/gateway/app.py`	FastAPI gateway on port 8080. Task submission queue, SSE streaming, web UI, A2A + MCP endpoints, Bearer auth.
`src/connectors/discord.py`	Discord bot connector. Bridges `!<task>` messages → gateway → SSE stream → reply edits. The Telegram, Slack, and WhatsApp connectors follow the same pattern.
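
To make the edge-sanitization idea concrete, the following is a rough sketch of what a tiered input check with PII redaction might look like. It is not the code in `src/security/core.py`: the patterns are illustrative stand-ins for the real 29, the function name `sanitize_input_sketch` and its return shape are invented for this example, and the reading of Tier 1 as "reject" and Tier 2 as "flag but allow" is an assumption.

```python
import re

# Illustrative patterns only; the real pattern set is not reproduced here.
TIER1_PATTERNS = [  # assumption: high-confidence injection, reject outright
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
]
TIER2_PATTERNS = [  # assumption: suspicious, flag but let through
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class InjectionDetected(ValueError):
    """Raised when a Tier 1 pattern matches the input."""


def sanitize_input_sketch(text):
    """Return (redacted_text, tier2_flags); raise on any Tier 1 match."""
    for pattern in TIER1_PATTERNS:
        if pattern.search(text):
            raise InjectionDetected(pattern.pattern)
    # Tier 2 matches are collected for logging rather than blocking.
    flags = [p.pattern for p in TIER2_PATTERNS if p.search(text)]
    # Redact one kind of PII (email addresses) as an example.
    redacted = EMAIL.sub("[REDACTED_EMAIL]", text)
    return redacted, flags
```

Raising on Tier 1 rather than returning a status keeps the failure loud, in line with the fail-safe principle above.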

Three independent loop-protection layers

No single failed safeguard should let an agent spin forever, so three independent layers are checked on every step and all of them must pass. If any one fires, execution halts and a threat event is logged.

flowchart TB
    Start([Step begins]) --> L1{Step counter<br/>limit reached?}
    L1 -->|yes| H1[HALT<br/>STEP_LIMIT_REACHED]
    L1 -->|no| L2{Action history<br/>signature repeated<br/>3× in last 5 steps?}
    L2 -->|yes| H2[HALT<br/>LOOP_DETECTED]
    L2 -->|no| L3{Token budget<br/>used >= 100%?}
    L3 -->|yes| H3[HALT<br/>TOKEN_BUDGET_EXCEEDED]
    L3 -->|no| Continue([Continue to next step])

    classDef halt fill:#ff4444,stroke:#cc0000,color:#fff
    class H1,H2,H3 halt

Layer	Mechanism	Threshold
Step counter	LangGraph recursion limit	Hard stop on N steps
Action-history	MD5 hash of the last 5 tool-call signatures	Same signature 3× → halt
Token budget	Cumulative per-task token usage	Alert at 80%, force-end at 100%

See Threat Events for the corresponding event types.
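
The three layers can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the code in `src/safeguards.py`: the `LoopGuard` and `Halt` names are hypothetical, each tool call is hashed individually and compared within a window of the last five, and the step limit is read as "step N + 1 halts".

```python
import hashlib
from collections import deque
from dataclasses import dataclass, field


class Halt(Exception):
    """Raised when a layer fires; carries the threat-event name."""

    def __init__(self, event):
        super().__init__(event)
        self.event = event


@dataclass
class LoopGuard:
    max_steps: int            # N in the table above
    token_budget: int         # per-task token budget
    window: int = 5           # how many recent signatures to keep
    repeat_limit: int = 3     # repeats within the window that trigger a halt
    steps: int = 0
    tokens_used: int = 0
    history: deque = field(default_factory=deque)

    def check(self, tool_name, tool_args, tokens):
        """Record one step; return True once usage reaches the 80% alert level."""
        # Layer 1: step counter.
        self.steps += 1
        if self.steps > self.max_steps:
            raise Halt("STEP_LIMIT_REACHED")
        # Layer 2: MD5 signature of the call, compared within the window.
        signature = hashlib.md5(f"{tool_name}:{tool_args}".encode()).hexdigest()
        self.history.append(signature)
        while len(self.history) > self.window:
            self.history.popleft()
        if self.history.count(signature) >= self.repeat_limit:
            raise Halt("LOOP_DETECTED")
        # Layer 3: cumulative token budget.
        self.tokens_used += tokens
        if self.tokens_used >= self.token_budget:
            raise Halt("TOKEN_BUDGET_EXCEEDED")
        return self.tokens_used >= 0.8 * self.token_budget
```

A caller would invoke `check()` once per step and treat a `True` return value as the 80% alert, while any `Halt` ends the task.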

Module map

flowchart TB
    subgraph Edge["Edge layer"]
        Gateway["gateway/app.py<br/>FastAPI :8080"]
        Connectors["connectors/<br/>Discord · Slack · Telegram<br/>WhatsApp · Webhook"]
    end

    subgraph Core["Core layer"]
        Orchestrator["base_graph.py<br/>LangGraph template"]
        Safeguards["safeguards.py<br/>3-layer loop protection"]
        Sanitize["security/core.py<br/>sanitize_input/output"]
        Factory["llm_factory.py<br/>Ollama · OpenAI · Anthropic · InceptionLabs"]
        Rate["rate_limiter.py<br/>per-provider caps"]
    end

    subgraph GuardianBox["Guardian sidecar"]
        GuardianAPI["security/guardian.py<br/>FastAPI :9766"]
    end

    subgraph Infra["Infrastructure"]
        PG[("PostgreSQL 17<br/>16 tables · pgvector")]
        Ollama["Ollama<br/>llama3.1:8b · qwen2.5:3b"]
        Cloud["Cloud LLMs<br/>OpenAI · Anthropic · InceptionLabs"]
    end

    Connectors --> Gateway
    Gateway --> Orchestrator
    Orchestrator --> Sanitize
    Orchestrator --> Safeguards
    Orchestrator --> Factory
    Factory --> Rate
    Factory --> Ollama
    Factory --> Cloud
    Orchestrator -.->|every tool call| GuardianAPI
    GuardianAPI <--> PG
    Gateway <--> PG
    Orchestrator <--> PG

Request flow

A task submitted to the gateway flows through gateway → worker → orchestrator → LLM → Guardian → tools → response, with checkpoints written along the way so a paused task can be resumed.

sequenceDiagram
    autonumber
    actor User
    participant G as Gateway<br/>(:8080)
    participant W as Worker
    participant O as Orchestrator
    participant Gr as Guardian<br/>(:9766)
    participant L as LLM
    participant T as Tool
    participant DB as PostgreSQL

    User->>G: POST /tasks (Bearer)
    G->>G: Authenticate
    G->>W: Enqueue
    W->>O: run_orchestrator()
    O->>O: sanitize_input()
    O->>DB: checkpoint

    loop For each step (bounded by safeguards)
        O->>L: LLM call (rate-limited)
        L-->>O: tool_calls
        O->>Gr: POST /check
        Gr->>DB: log threat_events (async)
        Gr-->>O: allow / deny
        O->>T: invoke (if allowed)
        T-->>O: result
        O->>DB: checkpoint
    end

    O->>O: sanitize_output()
    O-->>W: result
    W-->>G: SSE events
    G-->>User: stream
    G->>DB: audit_log

The orchestrator never trusts the LLM. Every tool call passes through Guardian; every input and output crosses a sanitization boundary; every step is checkpointed so a failure mid-task is recoverable.
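
The per-step ordering in the diagram (LLM proposes, Guardian decides, tool runs, state is checkpointed) can be modelled with plain callables. This is a sketch, not the orchestrator itself: `run_step`, the shape of a tool call (`{"name": ..., "args": {...}}`), and the four injected callables are all assumptions made for illustration.

```python
from typing import Any, Callable


def run_step(
    prompt: str,
    llm: Callable[[str], list],
    guardian_check: Callable[[dict], bool],
    tools: dict,
    checkpoint: Callable[[list], None],
) -> list:
    """Run one orchestrator step and return a record per proposed tool call."""
    results: list[dict[str, Any]] = []
    for call in llm(prompt):
        # Every proposed call is checked before anything is invoked.
        if not guardian_check(call):
            results.append({"tool": call["name"], "status": "denied"})
            continue
        # An unknown tool name raises KeyError here rather than being skipped.
        output = tools[call["name"]](**call["args"])
        results.append({"tool": call["name"], "status": "ok", "output": output})
    # Persist the step's outcome so the task can be resumed later.
    checkpoint(results)
    return results
```

Injecting the Guardian check and the checkpoint writer as callables keeps the step logic testable without a running sidecar or database.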

Infrastructure dependencies

Component	Purpose
PostgreSQL 17	Database: `legionforge`. Password in macOS Keychain (`service: postgres`).
Ollama	Local LLM runtime. Primary: `llama3.1:8b`. Router: `qwen2.5:3b`. Embeddings: `mxbai-embed-large`.
Docker Desktop	Required for Guardian sidecar.
macOS Keychain	All secrets. Never `.env` for production keys.
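
For reference, a secret stored in the macOS Keychain can be read from Python by shelling out to the system `security` tool. The sketch below is one possible approach and may differ from what `src/security/core.py` does; the helper names `keychain_command` and `read_secret` are hypothetical, and the lookup uses only the service name (`postgres`, as in the table above) because the account name is not documented here.

```python
import subprocess


def keychain_command(service):
    """Build the `security` invocation for a generic-password lookup."""
    # -s selects the item by service name; -w prints only the password.
    return ["security", "find-generic-password", "-s", service, "-w"]


def read_secret(service):
    """Return the stored password; raises CalledProcessError if the lookup fails."""
    result = subprocess.run(
        keychain_command(service),
        capture_output=True,
        text=True,
        check=True,
    )
    # Strip only the trailing newline so passwords with spaces survive.
    return result.stdout.rstrip("\n")
```

On a Mac with a matching Keychain entry, `read_secret("postgres")` would return the database password; with `check=True`, a missing entry raises an error instead of silently yielding an empty string.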

Phase status

Phases 0–16: Full security stack, multi-user gateway, integration tests, modular auth, containerized gateway, multi-provider auth registry, Redis-backed state layer, Kerberos GSSAPI backend, multi-instance docker-compose, Redis global budget counters, Prometheus /metrics endpoint, request trace ID middleware, polished web UI, Telegram/Slack/Webhook channel connectors.
Phases 60–381 + G1–G4 + H + I + J + HITL: 381-tool operator dashboard, web_fetch_js headless browser, Guardian G1–G4 (PyPI published, public repo live, auto-sync Action), agent memory, dual license (AGPL-3.0 + commercial), session continuity UI, multi-modal image input, HITL approval gate, WhatsApp connector.

Current test baseline:

Suite	Count
Smoke	2247
Integration	38
Kerberos live-KDC	5
UI (Playwright)	40
TestLab	104
Tool accuracy	79
Crystallization	114