ai
Jailbreak
LLM Jailbreak
Definition
A jailbreak is an adversarial prompt technique that attempts to bypass a language model's safety guidelines and content filters to produce outputs the model would normally refuse. Jailbreaks exploit prompt framing tricks, roleplay scenarios, and encoding tricks to confuse safety mechanisms.
Robustness against jailbreaks is an ongoing alignment and red-teaming challenge.
Ship secure code faster
Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.