Red teaming

AI Red Teaming

Definition

Red teaming is the practice of systematically probing an AI model for harmful behaviors, safety failures, and policy violations by adopting an adversarial mindset. Red teamers craft prompts designed to elicit jailbreaks, prompt injections, biased outputs, and harmful content.

Red teaming outputs inform model fine-tuning, content filter improvements, and deployment policy decisions.

Related Terms

Jailbreak

LLM Jailbreak

Prompt injection

Prompt Injection Attack

Alignment

AI Alignment

← Back to Glossary

Ship secure code faster

Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.

Talk to a Human See the Product