Security Model¶
LegionForge's security model is built on a simple thesis: the LLM is not trustworthy. Everything that crosses a trust boundary is validated by deterministic code before the LLM ever sees it, and everything the LLM produces is validated again before it has any effect.
Trust boundaries¶
There are three boundaries in the system. Each is the only place validation should happen:
flowchart LR
UserIn([User input]) --> B1{{"Boundary 1<br/>sanitize_input()<br/>injection check"}}
B1 --> Internal["Internal processing<br/>(trusts data)"]
Internal --> B2{{"Boundary 2<br/>Guardian<br/>7 deterministic checks"}}
B2 --> Tools["Tool execution"]
Tools --> Internal
Internal --> B3{{"Boundary 3<br/>sanitize_output()<br/>PII redaction"}}
B3 --> UserOut([User output])
classDef boundary fill:#0d1117,stroke:#00ff88,color:#00ff88,stroke-width:2px
class B1,B2,B3 boundary
Validating at processing nodes is a footgun: every node has to know about every threat, and a missed validation means the threat slips through. Validating at boundaries means there are exactly three places to audit.
The security stack¶
| Layer | What it does | Where it lives |
|---|---|---|
| Input sanitization | Prompt-injection detection (29 patterns, Tier 1/2 tiering), PII redaction | src/security/core.py |
| Output sanitization | Strips secrets and PII before logging to LangSmith or returning to user | src/security/core.py |
| Tool signing | Every registered tool has an Ed25519 signature; the signature is verified before invocation | src/security/ |
| Guardian (sidecar) | 7-check pipeline on every tool call. See Guardian. | src/security/guardian.py + Docker container |
| Loop protection | Three independent layers — step counter, action-history hash, token budget | src/safeguards.py |
| Rate limiting | Per-provider hard caps with pre-execution cost estimation | src/rate_limiter.py |
| HITL approval gate | Destructive tool calls cross a human-in-the-loop approval | src/gateway/ + web UI |
| Audit chain | Every event logged to audit_log with SHA-256 hash chain |
src/database.py |
Prompt-injection detection¶
sanitize_input() runs every incoming string through a regex-based detector with 29 patterns split into two tiers:
- Tier 1 — high-confidence patterns. Match → reject immediately, log
INJECTION_DETECTED. - Tier 2 — heuristic patterns. Match → flag for review, downgrade trust on that input.
The detector is deterministic. No LLM is in the hot path. Adding patterns is an append-only operation.
Tool signing¶
Every tool registered with the framework has an Ed25519 signature stored in PostgreSQL. The signing key lives in macOS Keychain as legionforge_tool_signer and is injected as an env var at startup.
On every tool invocation:
- The framework looks up the tool's stored hash and signature
- Verifies the signature with the public key
- Hashes the loaded tool code and compares to the stored hash
- Mismatch → halt with
TOOL_HASH_MISMATCHthreat event
This catches supply-chain attacks where a dependency is replaced after registration.
Guardian sidecar¶
Guardian is a separate FastAPI service (port 9766) that runs in a Docker container. It performs 7 deterministic checks on every tool invocation in under 5 ms:
- Tool revocation list
- Hash validation
- Capability boundary
- Destructive pattern detection
- Sequence contracts
- Ed25519 signature verification
- Adaptive threat rules (hot-reload every 10 s from
threat_rulestable)
Guardian is the only component that can authorize a tool call. The framework calls Guardian; Guardian responds approve / deny.
See Guardian for the full design.
Loop protection¶
Three layers, independent of each other:
| Layer | Trigger | Threat event |
|---|---|---|
| Step counter | LangGraph recursion limit exceeded | STEP_LIMIT_REACHED |
| Action-history hash | Same tool-call signature seen 3 times in last 5 steps | LOOP_DETECTED |
| Token budget | Exceeded the per-task token budget | TOKEN_BUDGET_EXCEEDED |
A single malfunction shouldn't loop forever. All three must pass for execution to continue.
HITL approval gate¶
The framework classifies tool calls as read or mutate. Mutate operations (sending email, deleting files, posting to Slack, etc.) require explicit human approval through the web UI before execution.
The approval flow:
- Agent decides to call a mutating tool
- Gateway pauses the task, emits an SSE event with the proposed action
- Web UI displays the proposed action with full context
- Human clicks approve or deny
- Gateway resumes the task with the decision
Approval is logged in audit_log with the human user ID.
Audit chain¶
Every meaningful event — task submission, LLM call, tool call, threat event, approval — is written to audit_log with a SHA-256 hash chain. Each row's prev_hash field is the SHA-256 of the previous row's content. Tampering with any row breaks the chain.
This isn't blockchain-grade tamper-proofing (it's just a hash chain), but it's enough to detect after-the-fact rewriting and to give compliance auditors a verifiable trail.
What this catches — and what it doesn't¶
LegionForge's security model is designed for these threats:
- Prompt injection from user input or tool output
- Tool poisoning (compromised dependency replacing a tool's code)
- Capability creep (agent calling a tool outside its task scope)
- Runaway behavior (infinite loops, token bombs)
- PII leakage in logs and traces
- Unauthorized destructive operations
It is not designed for:
- Defending against a malicious human operator who has gateway credentials. Bearer auth gates entry; access control inside the gateway assumes the operator is authorized.
- Side-channel attacks on local LLM weights (model integrity is checked at load, but not at every inference).
- Physical access to the machine.
For a deeper look at any of these, see Guardian → Architecture and Threat Events.