Skip to content
ai

Tokenizer

Text Tokenizer

Definition

A tokenizer converts raw text strings into sequences of token IDs that a language model can process, and converts model outputs back to human-readable text. Tokenization algorithms like BPE and SentencePiece learn subword vocabularies from training data.

The tokenizer choice significantly affects model performance on languages with rich morphology and specialized domains like code or mathematics.


Ship secure code faster

Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.