The numbers are uncomfortable
AWS GenAI Security Engineer Chetan Pathade tested 1,400+ jailbreak prompts against GPT-4, Claude 2, and other major LLMs. GPT-4's jailbreak success rate: 87.2%. Cross-model transfer to Claude 2: 64.1%. This isn't theoretical—these are production vulnerabilities.
What's actually broken
Prompt filters alone fail against multi-turn attacks. An attacker doesn't need to break a model in one shot—they can condition it across multiple interactions. Traditional WAFs catch SQL injection but miss semantic manipulation. The attack surface is different; the defenses haven't caught up.
Pathade's Generative Application Firewall (GAF) proposal suggests defense-in-depth: runtime filtering, sandboxing, and behavioral monitoring. Not revolutionary, but necessary. The alternative is retroactive patching after each new jailbreak pattern emerges.
The implementation gap
Enterprise leaders are deploying LLMs faster than security teams can red-team them. IBM's RSAC 2025 analysis noted autonomous AI agents are outpacing traditional cybersecurity controls. Pathade's research—published late 2025 after his Carnegie Mellon Master's in Information Security—validates this concern with data.
For production deployments, the checklist matters:
- Runtime prompt injection detection (tools like NeMo Guardrails, Guardrails AI)
- Logging and monitoring for indirect injection in RAG systems
- Red-team testing against known attack patterns
- Layered defenses beyond input sanitization
What to watch
Pathade's work (29 citations on Google Scholar, multiple bug bounty Hall of Fame recognitions since 2020) represents a growing specialization: securing GenAI specifically, not just securing systems that happen to use AI. His career trajectory—Qualys, Twitter, Quantiphi, AWS—mirrors the maturation of cloud security a decade ago.
The difference: cloud security had years to catch up. LLM deployments are happening now. Organizations implementing generative AI without GAF-style defenses are making a bet that their use cases won't attract attackers. History suggests that's optimistic.
No major GenAI security incidents reported in the past week. The question is when, not if.