backdoors News | The Biggish

Microsoft scanner detects LLM backdoors via three behavioral signals, low false positives

Microsoft's AI red team released a lightweight scanner that detects sleeper-agent backdoors in open-weight LLMs without access to the training data. The tool flags three telltale signals: unusual attention on trigger phrases, semantic drift in outputs, and abnormal memory extraction. Because such backdoors can survive safety training and fine-tuning, scanning a model before deployment is critical for enterprise use.

Feb 5, 2026