Architecture

Core design principles

LegionForge is built around five non-negotiable principles. Every module and every code path is shaped by them:

| Principle | What it means in practice |
| --- | --- |
| Fail-safe tiering | Halt → sandbox/retry → degrade. Never silently succeed. Errors propagate with intent. |
| Human gates on all mutations | Destructive actions cross a human-in-the-loop boundary by default. |
| Replace AI with determinism wherever possible | The LLM is the last resort, not the first. Rules, tables, and pattern matchers run ahead of model calls. |
| Validate at trust boundaries, not at processing nodes | Sanitize once, at the edge. Internal code trusts internal data. |
| Privilege tied to tasks, not persistent to agents | Capability is scoped to the active task and expires when the task ends. |
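
As an illustration of the last principle, here is a minimal sketch of a capability grant that lives only as long as its task. The names `TaskGrant` and `task_scope`, the capability strings, and the TTL are hypothetical and are not taken from the LegionForge codebase.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class TaskGrant:
    """Hypothetical capability grant bound to one task, not to an agent."""
    task_id: str
    capabilities: frozenset
    expires_at: float
    revoked: bool = False

    def allows(self, capability: str) -> bool:
        # Usable only while the task is live, unexpired, and the capability was granted.
        return (
            not self.revoked
            and time.monotonic() < self.expires_at
            and capability in self.capabilities
        )


@contextmanager
def task_scope(task_id, capabilities, ttl_seconds=300.0):
    """Yield a grant for the duration of the task and revoke it on exit."""
    grant = TaskGrant(task_id, frozenset(capabilities), time.monotonic() + ttl_seconds)
    try:
        yield grant
    finally:
        # Revocation runs even if the task body raises.
        grant.revoked = True


def demo():
    with task_scope("task-1", {"fs.read"}) as grant:
        inside = (grant.allows("fs.read"), grant.allows("fs.write"))
    after = grant.allows("fs.read")
    return inside, after
```

Inside the scope only the granted capability is allowed; once the scope closes, the same grant object refuses everything, so a leaked reference carries no lasting privilege.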

Key modules

| Module | Responsibility |
| --- | --- |
| config/settings.py | Pydantic singleton loaded from a hardware YAML profile. All memory limits, model names, safeguard thresholds, and paths come from here. |
| src/base_graph.py | LangGraph template. Copy this when creating new agents. Wires in three-layer loop protection, token budgeting, per-run tracing toggle, TOCTOU snapshot, and Guardian pre-invocation check automatically. |
| src/security/core.py | API key management via macOS Keychain (no .env secrets), prompt-injection detection (29 patterns, Tier 1/2 tiering), PII redaction. All inputs pass through sanitize_input(); all outputs through sanitize_output(). |
| src/security/guardian.py | Guardian FastAPI sidecar on port 9766. See Guardian. |
| src/safeguards.py | Three independent loop-protection layers. |
| src/database.py | Async PostgreSQL pool (admin + restricted app roles), LangGraph AsyncPostgresSaver for checkpoint resumption, pgvector RAG, 16 tables. |
| src/llm_factory.py | Unified factory for Ollama, OpenAI, Anthropic, InceptionLabs. Reads config from the hardware profile. Supports cloud fallback. |
| src/rate_limiter.py | Per-provider rate limits with pre-execution token cost estimation. Hard daily caps with 80% / 100% alert thresholds. |
| src/gateway/app.py | FastAPI gateway on port 8080. Task submission queue, SSE streaming, web UI, A2A + MCP endpoints, Bearer auth. |
| src/connectors/discord.py | Discord bot connector. Bridges !<task> messages → gateway → SSE stream → reply edits. (And similar for Telegram, Slack, WhatsApp.) |
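
To make the sanitization boundary in src/security/core.py more concrete, the following is a rough sketch of tiered pattern matching plus redaction. It is not the real sanitize_input(): the function name `sanitize_input_sketch`, the three patterns, the email-only redaction, and the reading of Tier 1 as "reject" and Tier 2 as "flag" are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; the actual 29-pattern list is not reproduced here.
TIER1_PATTERNS = [  # assumed meaning: high-confidence injection, reject outright
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]
TIER2_PATTERNS = [  # assumed meaning: suspicious, flag but let through
    re.compile(r"\bact as\b", re.I),
]
# Email addresses stand in for PII in general.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class InjectionBlocked(ValueError):
    """Raised when a Tier 1 pattern matches."""


@dataclass
class Sanitized:
    text: str
    flags: list


def sanitize_input_sketch(raw: str) -> Sanitized:
    # Tier 1: stop at the trust boundary before any internal code sees the text.
    for pattern in TIER1_PATTERNS:
        if pattern.search(raw):
            raise InjectionBlocked(pattern.pattern)
    # Tier 2: record which patterns matched so downstream checks can weigh them.
    flags = [p.pattern for p in TIER2_PATTERNS if p.search(raw)]
    # Redact PII once, at the edge.
    return Sanitized(text=EMAIL.sub("[REDACTED_EMAIL]", raw), flags=flags)
```

Because the check runs once at the edge, code behind it can treat `Sanitized.text` as internal data, which matches the "validate at trust boundaries" principle.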

Three independent loop-protection layers

A single failure shouldn't let an agent spin forever. Three independent layers must all pass on every step. If any one fires, execution halts and a threat event is logged.

flowchart TB
    Start([Step begins]) --> L1{Step counter<br/>limit reached?}
    L1 -->|yes| H1[HALT<br/>STEP_LIMIT_REACHED]
    L1 -->|no| L2{Action history<br/>signature repeated<br/>3× in last 5 steps?}
    L2 -->|yes| H2[HALT<br/>LOOP_DETECTED]
    L2 -->|no| L3{Token budget<br/>used >= 100%?}
    L3 -->|yes| H3[HALT<br/>TOKEN_BUDGET_EXCEEDED]
    L3 -->|no| Continue([Continue to next step])

    classDef halt fill:#ff4444,stroke:#cc0000,color:#fff
    class H1,H2,H3 halt

| Layer | Mechanism | Threshold |
| --- | --- | --- |
| Step counter | LangGraph recursion limit | Hard stop on N steps |
| Action-history | MD5 hash of the last 5 tool-call signatures | Same signature 3× → halt |
| Token budget | Cumulative per-task token usage | Alert at 80%, force-end at 100% |

See Threat Events for the corresponding event types.
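
The three checks can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in src/safeguards.py: the class `LoopGuard`, the exception `Halt`, and the way a tool-call signature is built (JSON of name plus arguments) are hypothetical, and the step counter here is a simple integer rather than LangGraph's recursion limit. The thresholds (window of 5, 3 repeats, 80% alert, 100% halt) and event names come from the diagram and table above.

```python
import hashlib
import json
from collections import deque
from dataclasses import dataclass, field


class Halt(Exception):
    """Carries the threat-event name that stopped execution."""

    def __init__(self, event):
        super().__init__(event)
        self.event = event


@dataclass
class LoopGuard:
    max_steps: int
    token_budget: int
    steps: int = 0
    tokens_used: int = 0
    # Only the last 5 signatures are kept; older ones fall off automatically.
    recent: deque = field(default_factory=lambda: deque(maxlen=5))

    def check(self, tool_name, tool_args, tokens):
        """Run all three layers for one step; return True when the 80% alert fires."""
        # Layer 1: step counter.
        self.steps += 1
        if self.steps > self.max_steps:
            raise Halt("STEP_LIMIT_REACHED")

        # Layer 2: action history. MD5 is used as a fingerprint, not for security.
        payload = json.dumps([tool_name, tool_args], sort_keys=True).encode()
        signature = hashlib.md5(payload, usedforsecurity=False).hexdigest()
        self.recent.append(signature)
        if self.recent.count(signature) >= 3:
            raise Halt("LOOP_DETECTED")

        # Layer 3: cumulative token budget.
        self.tokens_used += tokens
        if self.tokens_used >= self.token_budget:
            raise Halt("TOKEN_BUDGET_EXCEEDED")
        return self.tokens_used >= 0.8 * self.token_budget
```

Each layer keeps its own state and raises on its own, so a bug in one check (for example, a signature that never repeats because arguments contain a timestamp) does not disable the other two.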

Module map

flowchart TB
    subgraph Edge["Edge layer"]
        Gateway["gateway/app.py<br/>FastAPI :8080"]
        Connectors["connectors/<br/>Discord · Slack · Telegram<br/>WhatsApp · Webhook"]
    end

    subgraph Core["Core layer"]
        Orchestrator["base_graph.py<br/>LangGraph template"]
        Safeguards["safeguards.py<br/>3-layer loop protection"]
        Sanitize["security/core.py<br/>sanitize_input/output"]
        Factory["llm_factory.py<br/>Ollama · OpenAI · Anthropic"]
        Rate["rate_limiter.py<br/>per-provider caps"]
    end

    subgraph GuardianBox["Guardian sidecar"]
        GuardianAPI["security/guardian.py<br/>FastAPI :9766"]
    end

    subgraph Infra["Infrastructure"]
        PG[("PostgreSQL 17<br/>16 tables · pgvector")]
        Ollama["Ollama<br/>llama3.1:8b · qwen2.5:3b"]
        Cloud["Cloud LLMs<br/>OpenAI · Anthropic · InceptionLabs"]
    end

    Connectors --> Gateway
    Gateway --> Orchestrator
    Orchestrator --> Sanitize
    Orchestrator --> Safeguards
    Orchestrator --> Factory
    Factory --> Rate
    Factory --> Ollama
    Factory --> Cloud
    Orchestrator -.->|every tool call| GuardianAPI
    GuardianAPI <--> PG
    Gateway <--> PG
    Orchestrator <--> PG

Request flow

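A task submitted to the gateway flows gateway → worker → orchestrator. Inside the orchestrator loop, each step runs LLM → Guardian → tool, and a checkpoint is written after every step so a paused task can be resumed. The final result is sanitized and streamed back to the user over SSE.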

sequenceDiagram
    autonumber
    actor User
    participant G as Gateway<br/>(:8080)
    participant W as Worker
    participant O as Orchestrator
    participant Gr as Guardian<br/>(:9766)
    participant L as LLM
    participant T as Tool
    participant DB as PostgreSQL

    User->>G: POST /tasks (Bearer)
    G->>G: Authenticate
    G->>W: Enqueue
    W->>O: run_orchestrator()
    O->>O: sanitize_input()
    O->>DB: checkpoint

    loop For each step (bounded by safeguards)
        O->>L: LLM call (rate-limited)
        L-->>O: tool_calls
        O->>Gr: POST /check
        Gr->>DB: log threat_events (async)
        Gr-->>O: allow / deny
        O->>T: invoke (if allowed)
        T-->>O: result
        O->>DB: checkpoint
    end

    O->>O: sanitize_output()
    O-->>W: result
    W-->>G: SSE events
    G-->>User: stream
    G->>DB: audit_log

The orchestrator never trusts the LLM. Every tool call passes through Guardian; every input and output crosses a sanitization boundary; every step is checkpointed so a failure mid-task is recoverable.
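
One iteration of that loop can be sketched as follows. This is a simplified, in-process stand-in: in the real system the Guardian check is an HTTP POST /check to the sidecar and the checkpoint goes to PostgreSQL, whereas here both are plain callables. The names `run_step` and `StepResult` and the shape of the `tool_call` dict are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    allowed: bool
    output: object = None


def run_step(tool_call, guardian_check, tools, checkpoint):
    """Gate one LLM-proposed tool call, invoke it if allowed, then checkpoint.

    tool_call      -- dict with "name" and "args", as proposed by the LLM
    guardian_check -- callable returning True (allow) or False (deny)
    tools          -- mapping of tool name to callable
    checkpoint     -- callable that persists the step record
    """
    # The LLM's proposal is never executed without an explicit allow.
    allowed = guardian_check(tool_call)
    output = None
    if allowed:
        output = tools[tool_call["name"]](**tool_call["args"])
    # Checkpoint both allowed and denied steps so the task can resume from here.
    checkpoint({"tool_call": tool_call, "allowed": allowed, "output": output})
    return StepResult(allowed, output)


def demo():
    log = []
    tools = {"add": lambda a, b: a + b}
    call = {"name": "add", "args": {"a": 1, "b": 2}}
    ok = run_step(call, lambda c: True, tools, log.append)
    denied = run_step(call, lambda c: False, tools, log.append)
    return ok, denied, log
```

Passing the Guardian check and the checkpoint writer in as callables keeps the step testable without a running sidecar or database.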

Infrastructure dependencies

| Component | Purpose |
| --- | --- |
| PostgreSQL 17 | Database: legionforge. Password in macOS Keychain (service: postgres). |
| Ollama | Local LLM runtime. Primary: llama3.1:8b. Router: qwen2.5:3b. Embeddings: mxbai-embed-large. |
| Docker Desktop | Required for Guardian sidecar. |
| macOS Keychain | All secrets. Never .env for production keys. |
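
For readers unfamiliar with Keychain-backed secrets, a secret stored this way can be read with the macOS `security` command-line tool. The sketch below shells out to it from Python; the helper names `keychain_argv` and `read_secret` are hypothetical, and how LegionForge itself reads the Keychain is not shown in this document. The service name `postgres` is the one listed in the table above.

```python
import subprocess


def keychain_argv(service, account=None):
    """Build the argv for `security find-generic-password`."""
    argv = ["security", "find-generic-password", "-s", service]
    if account is not None:
        argv += ["-a", account]
    argv.append("-w")  # print only the password, nothing else
    return argv


def read_secret(service, account=None):
    """Return the stored password; raises CalledProcessError if it is missing.

    Only works on macOS, where the `security` tool is available.
    """
    done = subprocess.run(
        keychain_argv(service, account),
        capture_output=True,
        text=True,
        check=True,
    )
    # Strip only the trailing newline so whitespace inside the secret survives.
    return done.stdout.rstrip("\n")
```

On a configured machine, `read_secret("postgres")` would return the database password without it ever being written to a .env file.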

Phase status

  • Phases 0–16 — Full security stack, multi-user gateway, integration tests, modular auth, containerized gateway, multi-provider auth registry, Redis-backed state layer, Kerberos GSSAPI backend, multi-instance docker-compose, Redis global budget counters, Prometheus /metrics endpoint, request trace ID middleware, polished web UI, Telegram/Slack/Webhook channel connectors.
  • Phases 60–381 + G1–G4 + H + I + J + HITL — 381-tool operator dashboard, web_fetch_js headless browser, Guardian G1–G4 (PyPI published, public repo live, auto-sync Action), agent memory, dual license (AGPL-3.0 + commercial), session continuity UI, multi-modal image input, HITL approval gate, WhatsApp connector.

Current test baseline:

| Suite | Count |
| --- | --- |
| Smoke | 2247 |
| Integration | 38 |
| Kerberos live-KDC | 5 |
| UI (Playwright) | 40 |
| TestLab | 104 |
| Tool accuracy | 79 |
| Crystallization | 114 |