Transformer

Transformer Architecture

Definition

The transformer is a neural network architecture introduced in "Attention Is All You Need" (Vaswani et al., 2017) that replaces recurrence with multi-head self-attention. Because no position depends on a hidden state computed at the previous step, a transformer can process all positions of a sequence in parallel during training, which makes training on large datasets efficient on GPU/TPU hardware.

The architecture underlies most state-of-the-art language models, including GPT, BERT, and T5, and has since been applied to vision, audio, and multimodal tasks.
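
As a rough illustration of the self-attention operation named in the definition, the sketch below implements single-head scaled dot-product self-attention using only the Python standard library. The function names (matmul, softmax, self_attention), the toy input x, and the identity projection matrices are assumptions chosen for illustration and are not taken from the paper or from any library. A multi-head layer would run several such heads with separate learned projections and concatenate their outputs.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x is (seq_len, d_model); w_q, w_k, w_v are (d_model, d_k) projections.
    Returns the attended outputs and the attention weights.
    """
    # Queries, keys, and values are all projections of the same input sequence.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Every position scores every other position; no step depends on a
    # previous step's output, so there is no recurrence.
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy input: 3 tokens with 2 features each (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity projections keep the example easy to follow; real layers learn these.
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of weights is a probability distribution over the input positions, and each row of out is the corresponding weighted average of the value vectors. Practical implementations express the same computation as batched matrix operations so that hardware can evaluate all positions at once.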

Related Terms

Attention mechanism

Self-attention
