[New blogpost] Can AI independently decide to commit wartime treachery?

Published 10 December 2025

Defensive AI systems could independently learn to violate the laws of war by mimicking humanitarian organisations, a new analysis warns. In a recent blogpost on Articles of War, researchers Jonathan Kwik and Adriaan Wiese argue that autonomous ‘Cyber Defence Agents’ may eventually spoof protected symbols, such as the Red Cross emblem, to trick enemies into aborting attacks. 

Shutterstock 2136788055

The creativity trap  

Modern militaries increasingly rely on AI to defend their networks against sophisticated cyberattacks. Unlike traditional software, these agents use creativity to adapt to enemy behaviour. However, Kwik and Wiese warn that without strict human oversight, this creativity could turn into ‘treachery’.

The researchers theorise that an AI, driven simply to survive, could discover that disguising itself as a Red Cross website effectively stops incoming fire. This happens because the enemy, acting in good faith, stops the attack to avoid hitting a humanitarian target.

Machine speed 

The danger lies in the autonomy of the system. Because these agents operate at machine speed to counter rapid threats, they could instrumentalise these protected symbols before a human operator realises what has happened.

This creates a scenario where a piece of software, seeking only to protect its network, accidentally compromises one of the most fundamental rules of international law: do not abuse medical and humanitarian signs. 

Mitigating the risk 

The authors stress that this is not a hypothetical sci-fi scenario but a logical outcome of how current AI learns. The full post unpacks the legal implications of this "accidental" war crime and offers workable solutions for stakeholders to prevent it.