Multi-Agent Architectures
Why Multiple Agents?
A single AI agent — no matter how powerful — faces fundamental limitations when conducting a penetration test. The attack surface of a modern web application spans dozens of vulnerability classes across web, API, infrastructure, cryptography, authentication, business logic, and supply chain domains. A generalist agent trying to cover all of these will inevitably trade depth for breadth.
This is the same reason human pentest teams use specialists. A team with a web app expert, an infrastructure specialist, and an API security researcher will find more (and more severe) vulnerabilities than three generalists working independently. Specialization enables depth.
Specialist vs. Generalist
Consider what happens when an agent encounters an authentication endpoint:
Generalist approach: "I see a login form. Let me test a few common SQL injection payloads and some default credentials. Moving on to the next endpoint."
Auth specialist approach: "I see a login form. Let me test SQL injection, NoSQL injection, LDAP injection, and template injection. Let me check for timing-based username enumeration. Let me test the password reset flow for IDOR. Let me check if JWT tokens are used and whether the signature is validated. Let me test for OAuth misconfiguration. Let me check if multi-factor authentication can be bypassed. Let me check for session fixation after login."
The specialist goes dramatically deeper because its system prompt and expertise focus its entire reasoning capacity on authentication and access control. Meanwhile, the Web App Agent is doing equally deep work on XSS and CSRF, and the API Agent is thoroughly testing IDOR and rate limiting.
P4L4D1N's 8 Specialist Agents
P4L4D1N deploys up to 8 specialist agents, each with a distinct focus area:
| Agent | Focus Areas |
|---|---|
| Web App Agent | XSS, CSRF, injection, session management, input validation |
| API Security Agent | IDOR, auth flaws, rate limiting, GraphQL, REST misconfigs |
| Infrastructure Agent | Open ports, service misconfig, outdated software, cloud exposure |
| Code Analysis Agent | SAST findings, leaked secrets, dependency vulnerabilities |
| Crypto/TLS Agent | Weak ciphers, certificate issues, HSTS, key management |
| Auth/Access Agent | Authentication bypass, privilege escalation, broken access control |
| Business Logic Agent | Race conditions, workflow bypass, data integrity |
| Supply Chain Agent | Dependency risks, third-party vulnerabilities, component security |
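Each agent runs concurrently, analyzing the same Phase 1 tool output through its own specialized lens. Because the agents work in parallel, wall-clock time is set by the tier's time budget rather than by the number of agents: a Deep tier pentest with 10 agents (the 8 specialists plus a supervisor and a synthesis agent) runs for 120 minutes, not ten times the 30 minutes of a single-agent Recon run, while covering far more of the attack surface in depth.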
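The concurrent fan-out over shared Phase 1 output can be sketched as follows. This is a minimal illustration, not P4L4D1N's implementation: the agent names, keyword lists, and the `run_specialist` and `run_all` functions are all assumptions made for the example, and keyword matching stands in for real model-driven analysis.

```python
import asyncio

# Hypothetical agent names and focus keywords, loosely based on the table above.
SPECIALISTS = {
    "web_app": ["xss", "csrf"],
    "api_security": ["idor", "rate-limit"],
    "crypto_tls": ["weak-cipher", "hsts"],
}

async def run_specialist(name, keywords, tool_output):
    """Scan the shared Phase 1 output for lines relevant to one specialty."""
    await asyncio.sleep(0)  # stand-in for model calls and network I/O
    return [
        {"agent": name, "evidence": line}
        for line in tool_output
        if any(k in line.lower() for k in keywords)
    ]

async def run_all(tool_output):
    """Run every specialist concurrently over the same input."""
    results = await asyncio.gather(
        *(run_specialist(n, kws, tool_output) for n, kws in SPECIALISTS.items())
    )
    # Flatten the per-agent result lists into one list of findings.
    return [finding for agent_findings in results for finding in agent_findings]

phase1_output = [
    "Reflected XSS candidate on /search?q=",
    "Missing HSTS header on https://target.example",
    "Possible IDOR on /api/orders/{id}",
]
findings = asyncio.run(run_all(phase1_output))
```

Each specialist receives the same `phase1_output` list but keeps only the lines that match its own focus, which mirrors the "same input, different lens" idea described above.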
Tier Scaling
P4L4D1N scales the number and sophistication of agents based on the credit tier:
Recon Tier (1 Agent)
- Single generalist agent
- 30-minute duration
- Broad but shallow coverage
- Best for quick reconnaissance scans
Standard Tier (4 Agents)
- Web, API, and Infrastructure specialists, plus the supervisor
- 60-minute duration
- Focused analysis on the three most common attack surfaces
- Best for regular pentesting of web applications
Deep Tier (10 Agents)
- All 8 specialist agents + supervisor + synthesis
- 120-minute duration
- Comprehensive coverage including code, crypto, auth, business logic, and supply chain
- Best for thorough assessments before releases or compliance audits
Blitz Tier (20 Agents)
- All 8 specialists in breadth pass
- 8 additional depth agents for deep-dive follow-up
- Exploit Chain Agent for multi-step attack path discovery
- Verification Agent for severity validation
- Supervisor and Synthesis Agent, as in the Deep tier
- 240-minute duration
- Maximum depth and coverage for critical assets
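For readers who want to reason about the tiers programmatically, the figures above can be captured in a small lookup table. This is only a sketch: the `Tier` class and the helper functions are hypothetical names, and the only data taken from this document are the agent counts and durations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    agents: int
    duration_minutes: int

# Agent counts and durations as listed in the tier descriptions above.
TIERS = {
    "recon": Tier("Recon", 1, 30),
    "standard": Tier("Standard", 4, 60),
    "deep": Tier("Deep", 10, 120),
    "blitz": Tier("Blitz", 20, 240),
}

def agent_minutes(tier):
    """Upper bound on total agent time if every agent runs the full duration."""
    return tier.agents * tier.duration_minutes

def smallest_tier_with(min_agents):
    """Pick the shortest tier that deploys at least `min_agents` agents."""
    candidates = [t for t in TIERS.values() if t.agents >= min_agents]
    if not candidates:
        raise ValueError(f"no tier deploys {min_agents} agents")
    return min(candidates, key=lambda t: t.duration_minutes)
```

For example, `agent_minutes(TIERS["blitz"])` gives 4800, compared with 30 for Recon, which shows how much more total agent time the larger tiers buy for a given wall-clock budget.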
Blitz-Exclusive Agent Types
The Blitz tier introduces three agent types not available at lower tiers:
Depth Agents
After the initial breadth pass by the 8 specialists, depth agents revisit interesting findings for more thorough investigation. A web depth agent might spend 30 minutes on a single promising XSS vector, trying dozens of filter bypass techniques and payload variations.
Exploit Chain Agent
This agent reads all findings from all other agents on the blackboard and specifically looks for combinations that create multi-step attack paths. It might connect:
- An SSRF vulnerability (found by the Web Agent)
- An internal metadata endpoint (found by the Infrastructure Agent)
- AWS credentials in the metadata response (inferred from the SSRF)
- S3 bucket access using those credentials (tested by the Exploit Chain Agent)
This four-step chain turns a medium-severity SSRF into a critical data breach scenario.
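One way to picture this kind of chaining is to tag each finding with the capability it requires and the capability it grants, then link findings whose prerequisites are already satisfied. The sketch below is an assumption-laden illustration: the `Finding` fields, the capability labels, and `build_chain` are invented for the example, and the simple forward-chaining loop does not prune steps that turn out to be unnecessary for the goal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    agent: str
    title: str
    requires: frozenset  # capabilities an attacker needs first
    provides: frozenset  # capabilities gained by exploiting the finding

def build_chain(findings, goal, start=frozenset({"external-access"})):
    """Apply findings whose prerequisites are met until the goal capability is reached."""
    have, chain, remaining = set(start), [], list(findings)
    progress = True
    while goal not in have and progress:
        progress = False
        for f in remaining:
            # Usable if all prerequisites are held and it grants something new.
            if f.requires <= have and not f.provides <= have:
                have |= f.provides
                chain.append(f)
                remaining.remove(f)
                progress = True
                break
    return chain if goal in have else []

# Hypothetical blackboard contents, deliberately out of order.
blackboard = [
    Finding("exploit_chain", "S3 bucket readable with leaked keys",
            frozenset({"aws-credentials"}), frozenset({"s3-data"})),
    Finding("infrastructure", "Internal metadata endpoint",
            frozenset({"internal-requests"}), frozenset({"metadata-access"})),
    Finding("web_app", "SSRF in URL preview",
            frozenset({"external-access"}), frozenset({"internal-requests"})),
    Finding("web_app", "AWS credentials in metadata response",
            frozenset({"metadata-access"}), frozenset({"aws-credentials"})),
]
chain = build_chain(blackboard, goal="s3-data")
```

Here `chain` lists the four findings in exploitation order, starting with the SSRF and ending with the S3 access, even though they were posted to the blackboard in a different order.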
Verification Agent
Quality control for the entire team. The Verification Agent:
- Confirms severity ratings match actual exploitability
- Re-runs proof-of-concept exploits to ensure reproducibility
- Validates CVSS scores against the demonstrated impact
- Flags findings where the PoC does not match the claimed severity
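A simplified version of the severity check might look like the sketch below. The score bands follow the CVSS v3.x qualitative severity rating scale; the finding fields (`cvss`, `claimed`, `poc_reproduced`) and the `flag_mismatches` helper are hypothetical and chosen only for illustration.

```python
def cvss_severity(score):
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"

def flag_mismatches(findings):
    """Return findings whose claimed severity disagrees with the score or whose PoC failed."""
    flagged = []
    for f in findings:
        expected = cvss_severity(f["cvss"])
        if f["claimed"] != expected or not f["poc_reproduced"]:
            flagged.append({**f, "expected": expected})
    return flagged
```

A finding claimed as "high" with a base score of 5.4 would be flagged with an expected rating of "medium", as would any finding whose proof of concept could not be reproduced.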
Coordination and Orchestration
Running multiple agents in parallel creates a coordination challenge. P4L4D1N solves this with several mechanisms:
The Orchestrator
A lightweight control layer that manages the agent lifecycle:
- Plans which agents to deploy based on the tier
- Monitors agent progress and resource usage
- Signals wrap-up when the time budget is 80% consumed
- Triggers the synthesis phase for final report generation
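The 80% wrap-up signal can be illustrated with a small time-budget helper. This is a sketch under stated assumptions: the `TimeBudget` class and its method names are hypothetical, and only the 80% threshold comes from the description above. The clock is injectable so the behavior can be checked without waiting.

```python
import time

class TimeBudget:
    """Tracks how much of a run's time budget has been consumed."""

    def __init__(self, duration_seconds, wrap_up_fraction=0.8, clock=time.monotonic):
        if duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")
        self._clock = clock
        self._start = clock()
        self.duration = duration_seconds
        self.wrap_up_fraction = wrap_up_fraction

    def fraction_used(self):
        """Elapsed time as a fraction of the total budget."""
        return (self._clock() - self._start) / self.duration

    def should_wrap_up(self):
        """True once the wrap-up threshold (80% by default) has been reached."""
        return self.fraction_used() >= self.wrap_up_fraction

    def expired(self):
        """True once the whole budget has been consumed."""
        return self.fraction_used() >= 1.0
```

An orchestrator loop could poll `should_wrap_up()` between agent steps and, once it returns true, stop starting new lines of investigation so that agents finish their current work before synthesis begins.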
The Supervisor (Standard tier and above)
The supervisor reviews agent output in real time, identifies gaps in coverage, and can redirect agents to under-explored areas. This ensures that agent effort is distributed effectively across the attack surface.
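Gap detection of this kind could be as simple as counting logged actions per focus area. The sketch below assumes a hypothetical activity log where each entry records the `area` an agent worked on; the `under_explored` function and the `min_actions` threshold are illustrative choices, not documented behavior.

```python
from collections import Counter

def under_explored(planned_areas, activity_log, min_actions=3):
    """Return planned areas with fewer than `min_actions` logged actions, least-covered first."""
    counts = Counter(entry["area"] for entry in activity_log)
    gaps = [area for area in planned_areas if counts[area] < min_actions]
    return sorted(gaps, key=lambda area: counts[area])

# Hypothetical log: heavy XSS testing, little IDOR testing, no TLS testing.
log = (
    [{"agent": "web_app", "area": "xss"}] * 5
    + [{"agent": "api_security", "area": "idor"}] * 2
)
gaps = under_explored(["xss", "idor", "tls"], log)
```

With this log, `gaps` contains the TLS area first (no actions) followed by IDOR (two actions), which is the order in which a supervisor might redirect effort.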
The Synthesis Agent (Deep and Blitz tiers)
After all specialist agents complete, the Synthesis Agent reads all findings and produces:
- Deduplicated, prioritized finding list
- Cross-agent correlation analysis
- Executive summary
- Attack surface map
- STRIDE-based threat model
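The deduplication and prioritization steps can be sketched as below. The finding fields (`vuln_class`, `location`, `severity`, `agent`) and the rule for deciding that two findings are duplicates are assumptions made for this example; a real synthesis step would likely use richer matching.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

def synthesize(findings):
    """Merge duplicate findings and sort the result by severity."""
    merged = {}
    for f in findings:
        # Treat findings with the same class and location as duplicates.
        key = (f["vuln_class"].lower(), f["location"].rstrip("/").lower())
        if key not in merged:
            merged[key] = {**f, "reported_by": [f["agent"]]}
            continue
        kept = merged[key]
        if f["agent"] not in kept["reported_by"]:
            kept["reported_by"].append(f["agent"])
        # Keep the most severe rating seen for this finding.
        if SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[kept["severity"]]:
            kept["severity"] = f["severity"]
    return sorted(merged.values(), key=lambda f: SEVERITY_RANK[f["severity"]])
```

The `reported_by` list doubles as a simple cross-agent correlation signal: a finding reported independently by two specialists is more likely to be real and worth prioritizing.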
Parallelism and Efficiency
The multi-agent architecture enables a form of parallelism that is impossible with a single agent. While the Web Agent is testing XSS on the frontend, the API Agent is simultaneously testing IDOR on the backend endpoints, and the Infrastructure Agent is probing service configurations. All three are working on the same target at the same time, each reading from and writing to the shared blackboard.
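A shared blackboard that several agents read from and write to at once needs some form of synchronization. The following sketch uses a lock-protected list; the `Blackboard` class, its methods, and the worker function are hypothetical and stand in for whatever storage the real system uses.

```python
import threading

class Blackboard:
    """A shared, thread-safe list of findings that every agent can read and append to."""

    def __init__(self):
        self._lock = threading.Lock()
        self._findings = []

    def post(self, agent, finding):
        """Append a finding, tagged with the agent that produced it."""
        with self._lock:
            self._findings.append({**finding, "agent": agent})

    def read(self, exclude_agent=None):
        """Return copies of all findings, optionally skipping one agent's own posts."""
        with self._lock:
            return [dict(f) for f in self._findings if f["agent"] != exclude_agent]

def agent_worker(board, name, titles):
    """Stand-in for an agent: posts each of its findings to the board."""
    for title in titles:
        board.post(name, {"title": title})

board = Blackboard()
threads = [
    threading.Thread(target=agent_worker, args=(board, "web", ["XSS on /search"])),
    threading.Thread(target=agent_worker, args=(board, "api", ["IDOR on /orders", "No rate limit on /login"])),
    threading.Thread(target=agent_worker, args=(board, "infra", ["Exposed admin port"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Returning copies from `read` keeps one agent from mutating another agent's findings by accident, at the cost of some extra allocation on every read.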
This parallelism is why a Blitz pentest with 20 agents does not take 20 times longer than a single-agent run. Its wall-clock time is bounded by the tier's 240-minute budget, and within that budget the agents work concurrently to produce far more thorough results. The bottleneck shifts from agent reasoning speed to the target's response time and the coordination overhead of the blackboard pattern.