Multi-Agent Architectures

Why Multiple Agents?

A single AI agent, no matter how powerful, faces fundamental limitations when conducting a penetration test. The attack surface of a modern web application spans dozens of vulnerability classes across web, API, infrastructure, cryptography, authentication, business logic, and supply chain domains. A generalist agent trying to cover all of these will inevitably trade depth for breadth.

This is the same reason human pentest teams use specialists. A team with a web app expert, an infrastructure specialist, and an API security researcher will find more (and more severe) vulnerabilities than three generalists working independently. Specialization enables depth.

Specialist vs. Generalist

Consider what happens when an agent encounters an authentication endpoint:

Generalist approach: "I see a login form. Let me test a few common SQL injection payloads and some default credentials. Moving on to the next endpoint."

Auth specialist approach: "I see a login form. Let me test SQL injection, NoSQL injection, LDAP injection, and template injection. Let me check for timing-based username enumeration. Let me test the password reset flow for IDOR. Let me check if JWT tokens are used and whether the signature is validated. Let me test for OAuth misconfiguration. Let me check if multi-factor authentication can be bypassed. Let me check for session fixation after login."

The specialist goes dramatically deeper because its system prompt and expertise focus its entire reasoning capacity on authentication and access control. Meanwhile, the Web App Agent is doing equally deep work on XSS and CSRF, and the API Agent is thoroughly testing IDOR and rate limiting.

Paladin's Specialist Agents

Paladin deploys a fleet of specialist agents, each with a distinct focus area:

Agent	Focus Areas
Web App Agent	XSS, CSRF, injection, session management, input validation
API Security Agent	IDOR, auth flaws, rate limiting, GraphQL, REST misconfigs
Infrastructure Agent	Open ports, service misconfig, outdated software, cloud exposure
Code Analysis Agent	SAST findings, leaked secrets, dependency vulnerabilities
Crypto/TLS Agent	Weak ciphers, certificate issues, HSTS, key management
Auth/Access Agent	Authentication bypass, privilege escalation, broken access control
Business Logic Agent	Race conditions, workflow bypass, data integrity
Supply Chain Agent	Dependency risks, third-party vulnerabilities, component security
AI/LLM Security Agent	Prompt injection, model misuse, unsafe LLM integrations

Each agent runs concurrently, analyzing the same Phase 1 tool output through its own specialized lens. Because the agents run in parallel rather than one after another, adding agents does not multiply the run time: a Deep tier pentest with 10 agents fits in its 120-minute window while covering far more of the attack surface, and in more depth, than the single-agent Recon tier.
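The concurrency described above can be sketched with Python's asyncio. This is a minimal illustration, not Paladin's implementation: the agent names, the shape of PHASE1_OUTPUT, and the run_agent and run_fleet functions are assumptions, and each agent sleeps instead of calling a model.

```python
import asyncio

# Hypothetical Phase 1 tool output shared by every agent (illustrative only).
PHASE1_OUTPUT = {
    "endpoints": ["/login", "/api/users/42", "/search?q=test"],
    "open_ports": [22, 443],
}


async def run_agent(name: str, focus: str, phase1: dict, seconds: float) -> dict:
    """Stand-in for one specialist: sleeps instead of reasoning over the input."""
    await asyncio.sleep(seconds)
    return {"agent": name, "focus": focus, "inputs_seen": len(phase1["endpoints"])}


async def run_fleet(phase1: dict) -> list:
    agents = [
        ("web", "XSS, CSRF", 0.05),
        ("api", "IDOR, rate limiting", 0.05),
        ("infra", "open ports, service misconfig", 0.05),
    ]
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(
        *(run_agent(name, focus, phase1, secs) for name, focus, secs in agents)
    )


results = asyncio.run(run_fleet(PHASE1_OUTPUT))
for r in results:
    print(r["agent"], "analyzed", r["inputs_seen"], "endpoints")
```

Since the three coroutines wait at the same time, the total run time is close to the slowest agent's time rather than the sum of all three.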

Tier Scaling

Paladin scales the number and sophistication of agents based on the credit tier:

Recon Tier (1 Agent)

Single generalist agent
30-minute duration
Broad but shallow coverage
Best for quick reconnaissance pentests

Standard Tier (4 Agents)

Web, API, and Infrastructure specialists, plus the Supervisor
60-minute duration
Focused analysis on the three most common attack surfaces
Best for regular pentesting of web applications

Deep Tier (10 Agents)

The full specialist fleet plus supervisor and synthesis roles
120-minute duration
Comprehensive coverage including code, crypto, auth, business logic, and supply chain
Best for thorough assessments before releases or compliance audits

Blitz Tier (20 Agents)

The full specialist fleet in the breadth pass
Additional depth agents for deep-dive follow-up
Exploit Chain Agent for multi-step attack path discovery
Verification Agent for severity validation
240-minute duration
Maximum depth and coverage for critical assets
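The tier figures above can be captured in a small lookup table. The sketch below uses the agent counts and durations listed for each tier; the Tier class, the TIERS mapping, and the agent_minutes helper are hypothetical names, and the calculation assumes every agent runs for the full duration, which the tier descriptions do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    agents: int
    minutes: int


# Agent counts and durations as listed for each tier.
TIERS = {
    t.name: t
    for t in (
        Tier("recon", 1, 30),
        Tier("standard", 4, 60),
        Tier("deep", 10, 120),
        Tier("blitz", 20, 240),
    )
}


def agent_minutes(tier_name: str) -> int:
    """Upper bound on total agent time, assuming every agent runs the full duration."""
    tier = TIERS[tier_name]
    return tier.agents * tier.minutes


for name in TIERS:
    print(name, agent_minutes(name))
```

Under that assumption, wall-clock time grows 8x from Recon to Blitz (30 to 240 minutes) while the upper bound on agent time grows 160x (30 to 4,800 agent-minutes).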

Blitz-Exclusive Agent Types

The Blitz tier introduces three agent types not available at lower tiers:

Depth Agents

After the initial breadth pass by the specialist fleet, depth agents revisit interesting findings for more thorough investigation. A web depth agent might spend 30 minutes on a single promising XSS vector, trying dozens of filter bypass techniques and payload variations.

Exploit Chain Agent

This agent reads all findings from all other agents on the blackboard and specifically looks for combinations that create multi-step attack paths. It might connect:

An SSRF vulnerability (found by the Web Agent)
An internal metadata endpoint (found by the Infrastructure Agent)
AWS credentials in the metadata response (inferred from the SSRF)
S3 bucket access using those credentials (tested by the Exploit Chain Agent)

This four-step chain turns a medium-severity SSRF into a critical data breach scenario.
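One way to picture the chaining step is as forward chaining over capabilities: each finding needs some capability and grants another. The sketch below is an assumption about how such a search could work, not a description of the Exploit Chain Agent's actual logic; the Finding fields, capability names, and build_chain function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str
    title: str
    requires: frozenset  # capabilities the attacker must already hold
    grants: frozenset    # capabilities gained by exploiting the finding


def build_chain(findings, start, goal):
    """Greedy forward chaining: apply any finding whose requirements are met."""
    have = set(start)
    chain = []
    remaining = list(findings)
    progress = True
    while progress and goal not in have:
        progress = False
        for f in list(remaining):  # iterate over a copy so removal is safe
            if f.requires <= have:
                have |= f.grants
                chain.append(f.title)
                remaining.remove(f)
                progress = True
                if goal in have:
                    break
    return chain if goal in have else None


# Blackboard findings, deliberately out of order.
BLACKBOARD = [
    Finding("exploit-chain", "S3 bucket access",
            frozenset({"aws_credentials"}), frozenset({"s3_data"})),
    Finding("web", "SSRF",
            frozenset({"external_http"}), frozenset({"internal_http"})),
    Finding("web", "AWS credentials in metadata response",
            frozenset({"metadata_read"}), frozenset({"aws_credentials"})),
    Finding("infra", "Internal metadata endpoint",
            frozenset({"internal_http"}), frozenset({"metadata_read"})),
]

chain = build_chain(BLACKBOARD, start={"external_http"}, goal="s3_data")
print(" -> ".join(chain))
```

A greedy pass like this can include findings that are not strictly needed to reach the goal, so a real implementation would likely prune the result.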

Verification Agent

Quality control for the entire team. The Verification Agent:

Confirms severity ratings match actual exploitability
Re-runs proof-of-concept exploits to ensure reproducibility
Validates CVSS scores against the demonstrated impact
Flags findings where the PoC does not match the claimed severity
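The checks above can be illustrated with the CVSS v3.x qualitative severity scale (Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). The finding fields and the flag_mismatches function below are hypothetical; the sketch only shows the comparison between a claimed severity and the band its score falls into.

```python
def cvss_band(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def flag_mismatches(findings):
    """Return IDs whose claimed severity disagrees with the score or whose PoC failed."""
    return [
        f["id"]
        for f in findings
        if cvss_band(f["cvss"]) != f["claimed"] or not f["poc_reproduced"]
    ]


# Hypothetical findings awaiting verification.
findings = [
    {"id": "F1", "claimed": "critical", "cvss": 9.8, "poc_reproduced": True},
    {"id": "F2", "claimed": "critical", "cvss": 6.5, "poc_reproduced": True},
    {"id": "F3", "claimed": "high", "cvss": 7.5, "poc_reproduced": False},
]
flagged = flag_mismatches(findings)
print(flagged)
```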

Coordination and Orchestration

Running multiple agents in parallel creates a coordination challenge. Paladin solves this with several mechanisms:

The Orchestrator

A lightweight control layer that manages the agent lifecycle:

Plans which agents to deploy based on the tier
Monitors agent progress and resource usage
Signals wrap-up when the time budget is 80% consumed
Triggers the synthesis phase for final report generation
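The 80% wrap-up rule can be expressed as a small phase function. The 80% threshold comes from the list above; the function name, the phase labels, and the choice to start synthesis once the budget is fully used are assumptions made for illustration.

```python
def orchestrator_phase(elapsed_min: int, budget_min: int, wrap_up_percent: int = 80) -> str:
    """Return the lifecycle phase for the given elapsed time.

    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    if elapsed_min >= budget_min:
        return "synthesis"  # assumption: synthesis begins when the budget is spent
    if elapsed_min * 100 >= budget_min * wrap_up_percent:
        return "wrap_up"
    return "running"


# Wrap-up begins at minute 24 of a 30-minute run and minute 192 of a 240-minute run.
for elapsed, budget in [(23, 30), (24, 30), (30, 30), (191, 240), (192, 240)]:
    print(elapsed, budget, orchestrator_phase(elapsed, budget))
```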

The Supervisor (Standard tier and above)

Reviews agent output in real time, identifies gaps in coverage, and can redirect agents to under-explored areas. The supervisor ensures that agent effort is distributed effectively across the attack surface.
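Gap detection of this kind can be approximated by counting logged actions per focus area. This is a simplified sketch under stated assumptions: the activity log format, the area names, and the coverage_gaps function are hypothetical, and a real supervisor would weigh far more than raw counts.

```python
from collections import Counter


def coverage_gaps(planned_areas, activity_log, min_actions=1):
    """Return planned areas with fewer than min_actions logged actions."""
    counts = Counter(entry["area"] for entry in activity_log)
    return [area for area in planned_areas if counts[area] < min_actions]


# Hypothetical plan and activity log.
planned = ["xss", "csrf", "idor", "rate_limiting", "open_ports"]
log = [
    {"agent": "web", "area": "xss"},
    {"agent": "web", "area": "xss"},
    {"agent": "api", "area": "idor"},
    {"agent": "infra", "area": "open_ports"},
]

gaps = coverage_gaps(planned, log)
print("under-explored:", gaps)
```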

The Synthesis Agent (Deep and Blitz tiers)

After all specialist agents complete, the Synthesis Agent reads all findings and produces:

Deduplicated, prioritized finding list
Cross-agent correlation analysis
Executive summary
Attack surface map
STRIDE-based threat model
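The first two outputs, deduplication and prioritization with cross-agent attribution, can be sketched as a merge keyed on finding type and location. The field names, the severity ranking, and the synthesize function are assumptions for illustration; the sketch keeps the highest severity reported for each merged finding.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def synthesize(findings):
    """Merge findings that share (type, location), then sort by severity."""
    merged = {}
    for f in findings:
        key = (f["type"], f["location"])
        entry = merged.get(key)
        if entry is None:
            entry = {"type": f["type"], "location": f["location"],
                     "severity": f["severity"], "agents": []}
            merged[key] = entry
        if f["agent"] not in entry["agents"]:
            entry["agents"].append(f["agent"])  # record every reporting agent
        if SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[entry["severity"]]:
            entry["severity"] = f["severity"]   # keep the most severe rating
    return sorted(merged.values(),
                  key=lambda e: (SEVERITY_RANK[e["severity"]], e["location"]))


# Hypothetical raw findings from several agents.
raw = [
    {"agent": "web", "type": "xss", "location": "/search", "severity": "medium"},
    {"agent": "api", "type": "idor", "location": "/api/users", "severity": "high"},
    {"agent": "auth", "type": "idor", "location": "/api/users", "severity": "critical"},
    {"agent": "web", "type": "xss", "location": "/search", "severity": "medium"},
]

report = synthesize(raw)
for item in report:
    print(item["severity"], item["type"], item["location"], item["agents"])
```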

Parallelism and Efficiency

The multi-agent architecture enables a form of parallelism that is impossible with a single agent. While the Web Agent is testing XSS on the frontend, the API Agent is simultaneously testing IDOR on the backend endpoints, and the Infrastructure Agent is probing service configurations. All three are working on the same target at the same time, each reading from and writing to the shared blackboard.

This parallelism is why a Blitz pentest with 20 agents does not take 20 times longer than a single agent. The agents share one wall-clock window (240 minutes for Blitz) and produce far more thorough results within it. The bottleneck shifts from agent reasoning speed to the target's response time and the coordination overhead of the blackboard pattern.
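The shared blackboard can be sketched as a lock-protected list that several workers write to at once. This is a minimal illustration of the pattern, not Paladin's implementation: the Blackboard class, the agent_worker function, and the sample findings are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Blackboard:
    """Shared store that agents post findings to and read from."""

    def __init__(self):
        self._lock = threading.Lock()
        self._findings = []

    def post(self, agent: str, finding: str) -> None:
        with self._lock:  # serialize writes from concurrent agents
            self._findings.append((agent, finding))

    def read(self) -> list:
        with self._lock:
            return list(self._findings)  # return a copy so readers cannot mutate state


def agent_worker(board: Blackboard, name: str, items: list) -> None:
    for item in items:
        board.post(name, item)


board = Blackboard()
work = {
    "web": ["reflected XSS on /search"],
    "api": ["IDOR on /api/users", "missing rate limit on /api/login"],
    "infra": ["outdated service on port 22"],
}

# Leaving the with-block waits for every submitted worker to finish.
with ThreadPoolExecutor(max_workers=len(work)) as pool:
    for name, items in work.items():
        pool.submit(agent_worker, board, name, items)

print(len(board.read()), "findings on the blackboard")
```

The lock is the coordination overhead in miniature: every write briefly blocks other writers, which is cheap here but grows with the number of agents and the volume of findings.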
