// the problem

Prompting Is Not Enforcement

Today's AI pentest agents run with system prompts that say "stay in scope." But system prompts are suggestions, not guardrails. The research is clear:

23.9% of agent actions are risky, even with explicit safety instructions ToolEmu, Ruan et al. ICLR 2024

27.5% of risky situations missed by GPT-4's safety awareness R-Judge, Yuan et al. EMNLP 2024

0 prompt-based defenses achieve both high utility AND high security AgentDojo, Debenedetti et al. NeurIPS 2024

<60% safety score achieved by any of 16 LLM agents tested; defense prompts alone insufficient Agent-SafetyBench, Zhang et al. 2025

        
        
        
        unconstrained_agent.log
      
# System prompt says: "Only test app.corp.local"
# Agent decided to "be thorough"...

[ALLOW] nmap -sV app.corp.local          # in scope
[ALLOW] sqlmap -u app.corp.local/search  # in scope
[SCOPE VIOLATION] nmap -sV 10.0.2.0/24   # prod database tier!
[SCOPE VIOLATION] psql -h 10.0.2.5       # direct DB access!
[SCOPE VIOLATION] DROP TABLE users;      # catastrophic

# The agent had good intentions. It just had no enforced boundaries.

// the solution

A Reference Monitor the Agent Cannot Bypass

ROE Gate sits between the agent and every tool it uses. Every action is serialized, evaluated against your ROE policy, and cryptographically signed before execution. The agent never touches the signing keys.

Action Intent Serializer

Every tool call gets converted to a structured ActionIntent. Tool-agnostic, machine-readable, auditable. 24 action categories cover every pentest technique.

ROE Specification Language

Human-readable YAML that defines scope, allowed actions, denied actions, schedules, data handling, and emergency procedures. Your contract, as code.

Deterministic Rule Engine

Eight evaluation checks in strict priority order: schedule, scope (IP/domain/service), hard-deny, approval gates, action constraints, hard-allow, and fallback. No ambiguity.

Isolated Judge LLM

A separate LLM instance with no agent context evaluates edge cases. It sees only the action and the policy. Immune to prompt injection from the agent. Supports any judge provider: Anthropic, OpenAI, Google Gemini, Ollama, AWS Bedrock, local Transformers, llama.cpp, and any OpenAI-compatible endpoint.

Cryptographic Action Signing

Approved actions get HMAC-SHA256 tokens with 30-second TTL, single-use enforcement, and ROE-hash binding. The agent cannot forge, replay, or reuse tokens.

Signature-Enforcing Executor

Tools only run after 6-step token verification: signature, expiration, replay check, ROE hash, action match, and tool whitelist. No valid token = no execution.

ROE Creator Dashboard

Visual drag-and-drop ROE specification builder. Define scope, actions, constraints, and schedule through a beautiful web interface. Export valid YAML instantly.

// how it works

Three Stages. Every Action.

Agent Requests Action

The agent calls an MCP tool (e.g., roe_nmap_scan). The tool call is serialized into a structured ActionIntent and sent to the Gate Service over HTTP.

Gate Evaluates

The deterministic Rule Engine checks scope, schedule, and policy. Edge cases go to the isolated Judge LLM. Hard denials are instant. No LLM call needed.

Signed Token Authorizes

If approved, the Gate signs an HMAC-SHA256 token (30s TTL, single-use). The Tool Executor verifies the token before running the command. No token = no execution.

        
        
        
        gate_evaluation.log
      
# Agent calls MCP tool: roe_nmap_scan(host="192.168.100.10", ports="80,443")
# Session: pentest-ENG-2025-001-a4f8c912

[Rule Engine] Schedule check .............. PASS  within valid_from/valid_until window
[Rule Engine] Target 192.168.100.10 scope . PASS  in 192.168.100.0/24
[Rule Engine] PORT_SCANNING allowed ....... PASS  rolls up to RECONNAISSANCE
[Rule Engine] Verdict: HARD_ALLOW
[Judge LLM]  Skipped (deterministic allow)
[Signer]     Token: hmac-sha256:a3f8...c912  TTL: 30s  ROE: 7b2a...e1f4
[Executor]   Signature valid ✓  TTL valid ✓  Replay check ✓
[Executor]   Running: nmap -sV -p 80,443 192.168.100.10
[RESULT]     PORT    STATE  SERVICE  VERSION
              80/tcp  open   http     nginx/1.24
              443/tcp open   https    nginx/1.24

# Agent calls MCP tool: roe_shell_command(command="psql -h 192.168.200.5")

[Rule Engine] Target 192.168.200.5 scope .. FAIL  in out_of_scope 192.168.200.0/24
[Rule Engine] Verdict: HARD_DENY (Production database tier)
[BLOCKED]    No token issued. No command executed.
              Denial logged. Consecutive denials: 1/3 before auto-halt.
[Alert]      Slack notification sent (level: WARNING)

// real-time audit dashboard

7-Tab Dashboard. Complete Operational Visibility.

Every evaluation, every decision, every token — live in your browser. Filter by verdict, search by tool or target, drill into rule engine and judge reasoning, generate compliance reports, and manage ROE specs. All from a single pane of glass.

ROE Gate Dashboard — real-time audit with 7 evaluations, category breakdowns, and decision log showing ALLOW and DENY verdicts

ROE Gate Dashboard — Compliance tab showing SOC 2 Type II report with all controls passing

Compliance — SOC 2 Type II report with evidence

ROE Gate Dashboard — ROE Scope tab showing in-scope and out-of-scope networks, domains, and schedule

ROE Scope — networks, domains, and schedule

ROE Gate Dashboard — Alerts tab showing Slack connected, webhook count, and min level

Alerts — Slack and webhook status

// security properties

Guarantees, Not Suggestions

> Mandatory Mediation

Every pentest tool call must pass through the gate. Four-tier PreToolUse hooks detect network targets in ANY command — not just known tools. IPs, URLs, CIDRs, hostnames, /dev/tcp patterns, and embedded tool names are caught even in Python scripts and custom binaries.

> Complete Mediation

No action bypasses evaluation. The deterministic Rule Engine processes every request. If the Judge LLM goes down, hard rules still enforce scope.

> Tamper Resistance

HMAC-SHA256 tokens with ROE-hash binding. The agent never holds signing keys. Tokens expire in 30 seconds and cannot be replayed. Forging requires the secret key.

> Verifiable Compliance

Every evaluation is logged with full context: action, policy check, verdict, token, and execution result. Complete audit trail for compliance and post-engagement review.

// roe specification language

Your Rules of Engagement, as Code

Define scope, allowed techniques, denied actions, schedule windows, data handling, and emergency procedures in human-readable YAML. The ROE spec becomes the machine-enforceable contract between your team and the AI agent.

        
        
        
        acme_corp_roe.yaml
      
roe:
  metadata:
    engagement_id: "ENG-2025-001"
    client:        "CorpSec Labs"
    approved_by:   "Jane Smith, CISO"
    created:       "2025-06-15T09:00:00Z"
    version:       1

  schedule:
    valid_from:  "2025-06-15T00:00:00+00:00"
    valid_until: "2025-09-15T23:59:59+00:00"
    timezone:    "America/New_York"

  scope:
    in_scope:
      networks:
        - cidr: "192.168.100.0/24"      # Web app staging
          ports: [80, 443, 8080]
        - cidr: "192.168.101.0/24"      # API staging
          ports: [8080, 8443]
      domains:
        - pattern: "*.corp.local"
          include_subdomains: true
    out_of_scope:
      networks:
        - cidr: "192.168.200.0/24"      # Prod DB — BLOCKED
          reason: "Production database tier"
      services:
        - type: "database"
          protocols: [postgresql, mysql, redis]

  actions:
    allowed:
      - category: "reconnaissance"
        methods: [port_scan, dns_enumeration, service_enumeration]
      - category: "web_application_testing"
        methods: [sql_injection, xss, csrf, ssrf, idor]
    denied:
      - category: "denial_of_service"
      - category: "data_exfiltration"
      - category: "direct_database_access"

  data_handling:
    pii_encountered: "hash_and_log_metadata_only"
    credentials_found: "log_existence_only_no_values"

  emergency:
    kill_switch: true
    max_consecutive_denials: 3

// roe creator dashboard

Build Your ROE Visually. No YAML Required.

Don't want to write YAML by hand? The ROE Creator Dashboard gives you a visual form-based builder with live preview. Define scope, actions, schedule, and constraints through a web interface. Export valid YAML instantly.

→

Live YAML Preview

See your ROE specification update in real time as you fill in the form. Syntax-highlighted, always valid.

→

Import & Edit

Already have a YAML file? Import it into the form, make changes visually, then export the updated version.

→

Built-In Validation

Validates CIDR notation, date formats, required fields, and structural completeness before you export. No broken specs.

// white paper

ROE Gate: Out-of-Band Enforcement for Autonomous Pentest Agents

Abstract

LLM-based agents are being deployed for autonomous penetration testing, but nobody has solved the constraint problem. These agents run on system-prompt "guardrails" that don't actually guard anything. Research shows agents take risky actions 23.9% of the time even with explicit safety instructions. GPT-4 misses 27.5% of risky situations entirely. ROE Gate is the first reference monitor built for this problem: out-of-band evaluation, cryptographic action signing, and an isolated judge LLM that can't be prompt-injected by the agent it's evaluating.

The Problem: Prompt-Based Safety Is Not Enforcement

Current approaches to constraining AI pentest agents all fail in practice:

System Prompt Instructions. Telling the model "do not scan out-of-scope targets" gives you zero enforcement. The model ignores these instructions under prompt injection, context window overflow, or just because "being helpful" wins out over "follow the rules."

Output Filtering. Content filters check model outputs after the fact, but they can't stop tool execution. By the time the filter flags something, the agent already ran the command.

Self-Critique / Constitutional AI. Having the same model evaluate its own actions is circular. If the model gets prompt-injected, the self-critic is just as compromised. Same model, same context, no isolation.

General-Purpose Policy Engines. Systems like OPA/Rego can evaluate policies but know nothing about LLM agent actions, can't semantically evaluate edge cases, and have no cryptographic binding between policy approval and tool execution.

The ROE Gate Architecture

ROE Gate implements the reference monitor pattern (Anderson, 1972). Every access to a protected resource must pass through a tamper-resistant mediator. The gate runs as a separate process from the agent with its own signing keys and its own evaluation context. The agent can't modify it, bypass it, or influence its decisions.

Six components:

1. Action Intent Serializer. Every tool call from the agent gets converted to a structured ActionIntent, a tool-agnostic intermediate representation with the action category, target, parameters, and impact assessment. Same policy governs any tool regardless of syntax.

2. ROE Specification Language (ROE-SL). A YAML-based domain-specific language for defining penetration testing rules of engagement. Covers scope (IP ranges, domains, ports), allowed and denied action categories, schedule windows, data handling requirements, and emergency procedures. The ROE spec is SHA-256 hashed and bound to every signed token.

3. Deterministic Rule Engine. Eight evaluation checks in strict priority order provide fast, predictable enforcement. Schedule violations, out-of-scope targets, and hard-denied categories are rejected instantly without needing an LLM call.

4. Isolated Judge LLM. A separate LLM instance evaluates edge cases that the deterministic engine can't resolve. The judge only sees the action and the policy. Never the agent's conversation context. This isolation stops prompt injection from propagating through the evaluation chain.

5. Cryptographic Action Signer. Approved actions receive cryptographically signed tokens (HMAC-SHA256 or Ed25519) with 30-second TTL, single-use nonces, canonical JSON serialization, and ROE-hash binding. Only the Gate Service holds the signing keys. The agent never has access. Ed25519 asymmetric signing allows auditors to verify tokens with only the public key.

6. Signature-Enforcing Tool Executor. A verification proxy that performs six checks before executing any tool: signature validity, token expiration, replay detection, ROE hash match, action/token correspondence, and tool whitelist membership.

Why Not Just Use Guardrails?

Existing guardrail systems (NeMo Guardrails, Guardrails AI, etc.) operate at the wrong layer. They filter LLM outputs, checking whether the text the model generates is safe. ROE Gate operates at the tool execution layer, checking whether the action the model wants to perform is authorized. Output filtering happens after the decision. Tool-call gating happens before execution.

ROE Gate also provides cryptographic proof that an action was evaluated and approved. No existing guardrail system does this. The signed token creates a verifiable chain of custody: policy → evaluation → approval → execution, each step cryptographically bound to the others.

Prior Art Comparison

          
          
          
          prior_art_comparison
        
System                    ROE-SL   Rules   Judge   Crypto   Audit
─────────────────────────────────────────────────────────────────────
NVIDIA NeMo Guardrails     ✗        ~       ✗       ✗        ~
Guardrails AI              ✗        ~       ✗       ✗        ✗
OPA / Rego                 ✗        ✓       ✗       ✗        ~
Constitutional AI          ✗        ✗       ~       ✗        ✗
GuardAgent (Xiang 2024)    ✗        ~       ✓       ✗        ~
AgentSpec (Wang, ICSE 2026)✗        ✓       ✗       ✗        ~
Pentera / XM Cyber         ✗        ~       ✗       ✗        ✓
ROE Gate                   ✓        ✓       ✓       ✓        ✓

Key Research References

Anderson, J.P. (1972). "Computer Security Technology Planning Study." The original reference monitor definition.

Ruan et al. (2024). "ToolEmu: Identifying the Risks of LM Agents with an LM-Emulated Sandbox." ICLR 2024 Spotlight. Agents take risky actions 23.9% of the time.

Yuan et al. (2024). "R-Judge: Benchmarking Safety Risk Awareness for LLM Agents." EMNLP Findings 2024. GPT-4 achieves only 72.52% safety risk awareness (F1).

Debenedetti et al. (2024). "AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents." NeurIPS 2024 D&B. No prompt-based defense achieves both high utility and high security.

Zhang et al. (2025). "Agent-SafetyBench: Evaluating the Safety of LLM Agents." arXiv:2412.14470. None of 16 LLM agents achieves a safety score above 60%; defense prompts alone are insufficient.

Wang et al. (2026). "AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents." ICSE 2026. Validates the need for external runtime enforcement with formal specifications.

Dalrymple et al. (2024). "Guaranteed Safe AI via Quantitative Safety Guarantees." World model + safety spec + verifier framework that validates the ROE Gate approach.

// all features included

Everything Built In. Self-Hosted. Your Infrastructure.

ROE Gate runs entirely on your infrastructure. No cloud dependency, no data leaving your network. Every feature module is open source and included. Free for internal security teams.

Core Engine

Gate Pipeline

The full three-stage evaluation pipeline with cryptographic enforcement.

Deterministic rule engine (8 checks)
All judge LLM providers (Anthropic, OpenAI, Gemini, Ollama, Bedrock, local)
HMAC-SHA256 & Ed25519 token signing
7 MCP pentest tools
Multi-vendor agent support
Human-in-the-loop approval mode
ROE Creator Dashboard
ROE Spec Validator (schema + semantic + coverage)

7-Tab Dashboard

Real-time audit dashboard with full operational visibility across every module.

Decision Log with click-to-expand detail drawer
ROE Scope viewer
Trends — time-bucketed allow/deny charts
Compliance — generate SOC 2 & PCI-DSS reports inline
ROE Management — add, view, archive ROE specs
Alerts — Slack/webhook status and configuration
Settings — HA cluster, branding, tenants

Feature Modules

All 7 feature modules with full CLI, config file, and dashboard support.

Multi-ROE management (--roe-dir)
Slack & webhook alerting (--slack-webhook)
Role-based access control (--rbac)
SOC 2 & PCI-DSS compliance reports (roe-gate compliance)
HA clustering with leader election (--ha-peers)
Multi-tenant isolation
White-label branding (--branding-config)

Licensing

Free for individuals, researchers, and internal security teams testing their own infrastructure. Consultancies and vendors need a license.

All features included — nothing gated
Free: internal security teams, researchers, students, CTFs
License required: security consultancies testing client systems
License required: MSSPs, MDR providers, managed pentest services
License required: vendors embedding or white-labeling ROE Gate
Custom integrations and dedicated support available

// who it's for

Built for the Teams That Need It Most

CISO / Security Leader

Approve AI Pentesting Without the Risk

Your board wants AI-driven security testing. You need proof it won't go rogue. ROE Gate gives you cryptographic audit trails and policy-enforced boundaries that satisfy compliance and your sleep schedule.

"I can sign off on autonomous testing because every action is gated, logged, and provably within scope."

Internal Security Team

Let AI Agents Handle the Grind

You're running continuous testing on your own infrastructure and burning out on repetitive recon. Deploy an AI agent with ROE Gate and let it handle the mechanical work, with the same ROE discipline you'd expect from a human tester. Free for internal use.

"The agent runs nmap, tests for SQLi, enumerates APIs. All within scope, all signed. I review findings instead of babysitting."

Security Consultancy / MSSP

Scale Client Engagements Without Scaling Headcount

Your clients want continuous testing but you can't hire fast enough. License ROE Gate and run gated AI agents at scale. Each client gets their own ROE spec, audit trail, and compliance report. Contact us to discuss licensing.

"We went from 20 engagements a quarter to 200. Same team. Every one compliant."

// integrations

Works With Your Stack

ROE Gate is model-agnostic and tool-agnostic. The tester agent can be any LLM (Anthropic API, OpenAI API, Claude Code, or any OpenAI-compatible provider), and the judge can be any supported provider. All providers are included.

Claude Code

OpenAI

LangChain

MCP

AutoGPT

CrewAI

        
        
        
        quickstart.sh
      
# 1. Install
$ pip install roe-gate
$ pip install roe-gate[anthropic]  # optional: Claude judge support

# 2. Create your ROE
$ roe-gate creator                # visual builder at :19990/roe-creator
#    Build your spec → Download YAML → save as my_engagement.yaml
#    (or use the included example: examples/acme_corp_roe.yaml)

# 3. Configure (optional — or use CLI flags directly)
$ cp examples/roe_gate_config.yaml my_config.yaml
$ vim my_config.yaml              # set judge, model, gate settings

# 4. Validate your ROE spec
$ roe-gate validate my_engagement.yaml

# 5. Run with your agent (Claude Code, OpenAI, Anthropic API, any LLM)
$ roe-gate pentest --config my_config.yaml --dashboard

# 6. Generate compliance reports
$ roe-gate compliance --roe my_engagement.yaml --format soc2

# Dashboard at :19990/dashboard (7 tabs)
# Config supports: roe_dir, ha_peers, alert_min_level, branding
# Every action gated. Every decision logged. Every token signed.

// intellectual property

Patent Pending

> NOTICE

The ROE Gate system and method for out-of-band enforcement of rules of engagement on autonomous security testing agents is the subject of U.S. Provisional Patent Application No. 63/993,983, filed under 35 U.S.C. §111(b).

Application No. 63/993,983 • Filed: March 1, 2026 • Inventor: Richard Roane, Jr.

ROE Gate is licensed under MIT. Free for individuals, researchers, and internal security teams testing their own infrastructure. Security consultancies testing client systems, MSSPs, and vendors embedding or white-labeling ROE Gate require a separate license. Contact us to discuss.

// contact

Get in Touch

Security consultancy, MSSP, or vendor looking to license ROE Gate? Have questions about integration? Drop us a line.

We'll follow up within one business day.

> Message sent. We'll be in touch shortly.

Your AI Pentest Agent Has No Rules. Until Now.