ROE Gate Logo

Your AI Pentest Agent Has No Rules. Until Now.

The first out-of-band reference monitor for autonomous security testing agents. Enforce Rules of Engagement with cryptographic guarantees. Not prompts.

Prompting Is Not Enforcement

Today's AI pentest agents run with system prompts that say "stay in scope." But system prompts are suggestions, not guardrails. The research is clear:

23.9% of agent actions are risky, even with explicit safety instructions ToolEmu, Ruan et al. ICLR 2024
27.5% of risky situations missed by GPT-4's safety awareness R-Judge, Yuan et al. EMNLP 2024
0 prompt-based defenses achieve both high utility AND high security AgentDojo, Debenedetti et al. NeurIPS 2024
<60% safety score achieved by any of 16 LLM agents tested; defense prompts alone insufficient Agent-SafetyBench, Zhang et al. 2025
unconstrained_agent.log
# System prompt says: "Only test app.corp.local" # Agent decided to "be thorough"... [ALLOW] nmap -sV app.corp.local # in scope [ALLOW] sqlmap -u app.corp.local/search # in scope [SCOPE VIOLATION] nmap -sV 10.0.2.0/24 # prod database tier! [SCOPE VIOLATION] psql -h 10.0.2.5 # direct DB access! [SCOPE VIOLATION] DROP TABLE users; # catastrophic # The agent had good intentions. It just had no enforced boundaries.

A Reference Monitor the Agent Cannot Bypass

ROE Gate sits between the agent and every tool it uses. Every action is serialized, evaluated against your ROE policy, and cryptographically signed before execution. The agent never touches the signing keys.

1

Action Intent Serializer

Every tool call gets converted to a structured ActionIntent. Tool-agnostic, machine-readable, auditable. 21 action categories cover every pentest technique.

2

ROE Specification Language

Human-readable YAML that defines scope, allowed actions, denied actions, schedules, data handling, and emergency procedures. Your contract, as code.

3

Deterministic Rule Engine

Eight evaluation checks in strict priority order: schedule, scope (IP/domain/service), hard-deny, approval gates, action constraints, hard-allow, and fallback. No ambiguity.

4

Isolated Judge LLM

A separate LLM instance with no agent context evaluates edge cases. It sees only the action and the policy. Immune to prompt injection from the agent. Supports any judge provider: Anthropic, OpenAI, Google Gemini, Ollama, AWS Bedrock, local Transformers, llama.cpp, and any OpenAI-compatible endpoint.

5

Cryptographic Action Signing

Approved actions get HMAC-SHA256 tokens with 30-second TTL, single-use enforcement, and ROE-hash binding. The agent cannot forge, replay, or reuse tokens.

6

Signature-Enforcing Executor

Tools only run after 6-step token verification: signature, expiration, replay check, ROE hash, action match, and tool whitelist. No valid token = no execution.

+

ROE Creator Dashboard

Visual drag-and-drop ROE specification builder. Define scope, actions, constraints, and schedule through a beautiful web interface. Export valid YAML instantly.

Three Stages. Every Action.

1

Agent Requests Action

The agent calls an MCP tool (e.g., roe_nmap_scan). The tool call is serialized into a structured ActionIntent and sent to the Gate Service over HTTP.

2

Gate Evaluates

The deterministic Rule Engine checks scope, schedule, and policy. Edge cases go to the isolated Judge LLM. Hard denials are instant. No LLM call needed.

3

Signed Token Authorizes

If approved, the Gate signs an HMAC-SHA256 token (30s TTL, single-use). The Tool Executor verifies the token before running the command. No token = no execution.

gate_evaluation.log
# Agent requests: nmap scan of 10.0.1.5 (in-scope web server) [Rule Engine] Schedule check ............ PASS [Rule Engine] Target 10.0.1.5 in scope .. PASS [Rule Engine] PORT_SCANNING allowed ..... PASS [Rule Engine] Verdict: HARD_ALLOW [Judge LLM] Skipped (deterministic allow) [Signer] Token: hmac-sha256:a3f8...c912 TTL: 30s [Executor] Signature valid, executing nmap -sV 10.0.1.5 [RESULT] PORT STATE SERVICE VERSION 80/tcp open http nginx/1.24 443/tcp open https nginx/1.24 # Agent requests: psql connect to 10.0.2.5 (out-of-scope DB) [Rule Engine] Target 10.0.2.5 in scope .. FAIL [Rule Engine] Verdict: HARD_DENY (out of scope) [BLOCKED] Action denied. Target 10.0.2.5 is out of scope. Consecutive denials: 1/3 before emergency halt.

Guarantees, Not Suggestions

> Mandatory Mediation

Every pentest tool call must pass through the gate. Four-tier PreToolUse hooks detect network targets in ANY command — not just known tools. IPs, URLs, CIDRs, hostnames, /dev/tcp patterns, and embedded tool names are caught even in Python scripts and custom binaries.

> Complete Mediation

No action bypasses evaluation. The deterministic Rule Engine processes every request. If the Judge LLM goes down, hard rules still enforce scope.

> Tamper Resistance

HMAC-SHA256 tokens with ROE-hash binding. The agent never holds signing keys. Tokens expire in 30 seconds and cannot be replayed. Forging requires the secret key.

> Verifiable Compliance

Every evaluation is logged with full context: action, policy check, verdict, token, and execution result. Complete audit trail for compliance and post-engagement review.

Your Rules of Engagement, as Code

Define scope, allowed techniques, denied actions, schedule windows, data handling, and emergency procedures in human-readable YAML. The ROE spec becomes the machine-enforceable contract between your team and the AI agent.

acme_corp_roe.yaml
roe: metadata: engagement_id: "ENG-2024-001" client: "Acme Corp" approved_by: "John Smith, CISO" scope: in_scope: networks: - cidr: "10.0.0.0/24" # Web app subnet ports: [80, 443, 8080] domains: - pattern: "*.app.corp.local" out_of_scope: networks: - cidr: "10.0.2.0/24" # Prod DB - BLOCKED reason: "Production database tier" actions: allowed: - category: "reconnaissance" methods: [port_scan, dns_enumeration] - category: "web_application_testing" methods: [sql_injection, xss, csrf] denied: - category: "denial_of_service" - category: "data_exfiltration" emergency: kill_switch: true max_consecutive_denials: 3 # Auto-halt after 3 denied attempts

Build Your ROE Visually. No YAML Required.

Don't want to write YAML by hand? The ROE Creator Dashboard gives you a visual form-based builder with live preview. Define scope, actions, schedule, and constraints through a web interface. Export valid YAML instantly. Free in Community.

ROE Creator Dashboard — visual form builder with live YAML preview

Live YAML Preview

See your ROE specification update in real time as you fill in the form. Syntax-highlighted, always valid.

Import & Edit

Already have a YAML file? Import it into the form, make changes visually, then export the updated version.

Built-In Validation

Validates CIDR notation, date formats, required fields, and structural completeness before you export. No broken specs.

ROE Gate: Out-of-Band Enforcement for Autonomous Pentest Agents

Abstract

LLM-based agents are being deployed for autonomous penetration testing, but nobody has solved the constraint problem. These agents run on system-prompt "guardrails" that don't actually guard anything. Research shows agents take risky actions 23.9% of the time even with explicit safety instructions. GPT-4 misses 27.5% of risky situations entirely. ROE Gate is the first reference monitor built for this problem: out-of-band evaluation, cryptographic action signing, and an isolated judge LLM that can't be prompt-injected by the agent it's evaluating.

The Problem: Prompt-Based Safety Is Not Enforcement

Current approaches to constraining AI pentest agents all fail in practice:

System Prompt Instructions. Telling the model "do not scan out-of-scope targets" gives you zero enforcement. The model ignores these instructions under prompt injection, context window overflow, or just because "being helpful" wins out over "follow the rules."

Output Filtering. Content filters check model outputs after the fact, but they can't stop tool execution. By the time the filter flags something, the agent already ran the command.

Self-Critique / Constitutional AI. Having the same model evaluate its own actions is circular. If the model gets prompt-injected, the self-critic is just as compromised. Same model, same context, no isolation.

General-Purpose Policy Engines. Systems like OPA/Rego can evaluate policies but know nothing about LLM agent actions, can't semantically evaluate edge cases, and have no cryptographic binding between policy approval and tool execution.

The ROE Gate Architecture

ROE Gate implements the reference monitor pattern (Anderson, 1972). Every access to a protected resource must pass through a tamper-resistant mediator. The gate runs as a separate process from the agent with its own signing keys and its own evaluation context. The agent can't modify it, bypass it, or influence its decisions.

Six components:

1. Action Intent Serializer. Every tool call from the agent gets converted to a structured ActionIntent, a tool-agnostic intermediate representation with the action category, target, parameters, and impact assessment. Same policy governs any tool regardless of syntax.

2. ROE Specification Language (ROE-SL). A YAML-based domain-specific language for defining penetration testing rules of engagement. Covers scope (IP ranges, domains, ports), allowed and denied action categories, schedule windows, data handling requirements, and emergency procedures. The ROE spec is SHA-256 hashed and bound to every signed token.

3. Deterministic Rule Engine. Eight evaluation checks in strict priority order provide fast, predictable enforcement. Schedule violations, out-of-scope targets, and hard-denied categories are rejected instantly without needing an LLM call.

4. Isolated Judge LLM. A separate LLM instance evaluates edge cases that the deterministic engine can't resolve. The judge only sees the action and the policy. Never the agent's conversation context. This isolation stops prompt injection from propagating through the evaluation chain.

5. Cryptographic Action Signer. Approved actions receive cryptographically signed tokens (HMAC-SHA256 or Ed25519) with 30-second TTL, single-use nonces, canonical JSON serialization, and ROE-hash binding. Only the Gate Service holds the signing keys. The agent never has access. Ed25519 asymmetric signing allows auditors to verify tokens with only the public key.

6. Signature-Enforcing Tool Executor. A verification proxy that performs six checks before executing any tool: signature validity, token expiration, replay detection, ROE hash match, action/token correspondence, and tool whitelist membership.

Why Not Just Use Guardrails?

Existing guardrail systems (NeMo Guardrails, Guardrails AI, etc.) operate at the wrong layer. They filter LLM outputs, checking whether the text the model generates is safe. ROE Gate operates at the tool execution layer, checking whether the action the model wants to perform is authorized. Output filtering happens after the decision. Tool-call gating happens before execution.

ROE Gate also provides cryptographic proof that an action was evaluated and approved. No existing guardrail system does this. The signed token creates a verifiable chain of custody: policy → evaluation → approval → execution, each step cryptographically bound to the others.

Prior Art Comparison

prior_art_comparison
System ROE-SL Rules Judge Crypto Audit ───────────────────────────────────────────────────────────────────── NVIDIA NeMo Guardrails ✗ ~ ✗ ✗ ~ Guardrails AI ✗ ~ ✗ ✗ ✗ OPA / Rego ✗ ✓ ✗ ✗ ~ Constitutional AI ✗ ✗ ~ ✗ ✗ GuardAgent (Xiang 2024) ✗ ~ ✓ ✗ ~ AgentSpec (Wang, ICSE 2026)✗ ✓ ✗ ✗ ~ Pentera / XM Cyber ✗ ~ ✗ ✗ ✓ ROE Gate ✓ ✓ ✓ ✓ ✓

Key Research References

Anderson, J.P. (1972). "Computer Security Technology Planning Study." The original reference monitor definition.

Ruan et al. (2024). "ToolEmu: Identifying the Risks of LM Agents with an LM-Emulated Sandbox." ICLR 2024 Spotlight. Agents take risky actions 23.9% of the time.

Yuan et al. (2024). "R-Judge: Benchmarking Safety Risk Awareness for LLM Agents." EMNLP Findings 2024. GPT-4 achieves only 72.52% safety risk awareness (F1).

Debenedetti et al. (2024). "AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents." NeurIPS 2024 D&B. No prompt-based defense achieves both high utility and high security.

Zhang et al. (2025). "Agent-SafetyBench: Evaluating the Safety of LLM Agents." arXiv:2412.14470. None of 16 LLM agents achieves a safety score above 60%; defense prompts alone are insufficient.

Wang et al. (2026). "AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents." ICSE 2026. Validates the need for external runtime enforcement with formal specifications.

Dalrymple et al. (2024). "Guaranteed Safe AI via Quantitative Safety Guarantees." World model + safety spec + verifier framework that validates the ROE Gate approach.

Open Core. Self-Hosted. Your Infrastructure.

ROE Gate runs entirely on your infrastructure. No cloud dependency, no data leaving your network. The core engine is free for personal use. Commercial licenses unlock additional features and support.

Pro

$5,000 /month

Commercial license for security teams deploying AI agents on live engagements.

  • Everything in Community
  • Licensed for commercial use
  • Multi-ROE spec management
  • Structured audit logging (JSON / SIEM export)
  • Slack and webhook alerting
  • Priority email support
Contact Sales

Enterprise

Custom

For large organizations with multiple teams, custom integration needs, and dedicated support.

  • Everything in Pro
  • Unlimited ROE specifications
  • Custom tool and agent integrations
  • Dedicated technical account manager
  • Priority support with SLA
  • On-site onboarding available
Contact Sales

MSSP / OEM

Custom

Deploy ROE Gate across client engagements or embed it into your security platform.

  • Everything in Enterprise
  • Multi-tenant ROE management
  • White-label / custom branding
  • Patent license for redistribution
  • Dedicated engineering support
  • Volume pricing available
Talk to Us

Built for the Teams That Need It Most

CISO / Security Leader

Approve AI Pentesting Without the Risk

Your board wants AI-driven security testing. You need proof it won't go rogue. ROE Gate gives you cryptographic audit trails and policy-enforced boundaries that satisfy compliance and your sleep schedule.

"I can sign off on autonomous testing because every action is gated, logged, and provably within scope."

Pentest Team Lead

Let AI Agents Handle the Grind

You're running 4 engagements in parallel and burning out your team on repetitive recon. Deploy an AI agent with ROE Gate and let it handle the mechanical work, with the same ROE discipline you'd expect from a human tester.

"The agent runs nmap, tests for SQLi, enumerates APIs. All within scope, all signed. I review findings instead of babysitting."

MSSP / MDR Provider

Scale Pentest Services Without Scaling Headcount

Your clients want continuous testing but you can't hire fast enough. White-label ROE Gate into your platform and run gated AI agents at scale. Each client gets their own ROE spec, audit trail, and compliance report.

"We went from 20 engagements a quarter to 200. Same team. Every one compliant."

Works With Your Stack

ROE Gate is model-agnostic and tool-agnostic. The tester agent can be any LLM (Anthropic API, OpenAI API, Claude Code, or any OpenAI-compatible provider), and the judge can be any supported provider. All providers are included in the free Community tier.

Claude Code
OpenAI
LangChain
MCP
AutoGPT
CrewAI
quickstart.sh
# 1. Install $ git clone https://github.com/Grey-Line-Interactive/ROEGATE $ cd ROEGATE && pip install -e . # 2. Create your ROE $ roe-gate creator # visual builder at :19990/roe-creator # Build your spec → Download YAML → save as my_engagement.yaml # (or use the included example: examples/acme_corp_roe.yaml) # 3. Configure (optional — or use CLI flags directly) $ cp examples/roe_gate_config.yaml my_config.yaml $ vim my_config.yaml # set judge, model, gate settings # 4. Run with your agent (Claude Code, OpenAI, Anthropic API, any LLM) $ roe-gate pentest --config my_config.yaml # Dashboard at :19990/dashboard # Every action gated. Every decision logged. Every token signed.

Patent Pending

> NOTICE

The ROE Gate system and method for out-of-band enforcement of rules of engagement on autonomous security testing agents is the subject of U.S. Provisional Patent Application No. 63/993,983, filed under 35 U.S.C. §111(b).

Application No. 63/993,983  •  Filed: March 1, 2026  •  Inventor: Richard Roane, Jr.

The Community Edition is licensed under MIT for non-commercial and internal use. Commercial use, white-labeling, and OEM embedding require a separate patent license. See the pricing section or contact us for details.

Get in Touch

Interested in Pro, Enterprise, or MSSP licensing? Have questions about integrating ROE Gate into your workflow? Drop us a line.

We'll follow up within one business day.

> Message sent. We'll be in touch shortly.