AI Basics

What Is a Token in AI? A Simple Explanation

Krasa AI

2026-05-07

8 min read

When you type a message to an AI, you see words. But the AI doesn't. It sees something smaller — called tokens.

Understanding tokens is one of the most useful things you can learn about AI. It explains why your prompt sometimes hits a limit. It explains why AI costs what it does. And it explains some of the weird things AI does with emoji and foreign languages.

Let's break it down simply.

Think of Tokens Like Puzzle Pieces

Imagine you're handing someone a jigsaw puzzle, but instead of giving them the whole finished picture, you dump out all the individual pieces. That's kind of what happens when text goes into an AI.

The AI doesn't get full words in one go. It gets small pieces of text — tokens. A token might be:

  • A whole short word, like "cat" or "the"
  • Part of a longer word, like "un" or "believ" or "able"
  • A punctuation mark, like "!" or ","
  • A space attached to a word

So the sentence "tiktoken is great!" might become six tokens:[1] t | ik | token | is | great | !

Notice that "tiktoken" got split into three pieces. And the space before "is" got glued to the word, not left on its own. That's the AI's puzzle-piece view of your text.

One Token ≈ Four Characters (In English)

As a rough rule of thumb, one token is about four characters (roughly three-quarters of a word) in everyday English text.[1]

That means:

  • 100 words ≈ 130 tokens
  • A full page of text ≈ 500–700 tokens
  • This entire blog post ≈ a few thousand tokens

But this rule only works well for standard English. Code, emoji, and other languages can be very different — more on that below.
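If you just need a ballpark figure, the four-characters rule is easy to turn into code. This is only the heuristic from above, not a real tokenizer, and it only works for ordinary English prose:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for plain English text: ~4 characters per token."""
    return max(1, round(len(text) / 4))

# 44 characters -> roughly 11 tokens
print(estimate_tokens("The quick brown fox jumps over the lazy dog."))
```

For billing or limit checks, always use the exact counter for your model instead of this estimate.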

Why Doesn't AI Just Use Words?

Good question! There are two obvious options: break text into full words, or break it into individual letters. Both have problems.

Full words create a huge vocabulary. A language model would need to memorize millions of words, including every made-up name, technical term, or foreign word it might encounter. When it sees a new word it has never learned, it's stuck.[2]

Individual letters solve that problem (you'll never meet an unknown letter), but then even a short sentence becomes hundreds of pieces, and the AI has to work much harder to build meaning from single letters.[2]

Tokens are the compromise. They're bigger than letters but smaller than full words. A word like "unbelievable" might become un | believ | able: three pieces that each carry some meaning. The AI can recognize those pieces even in totally new combinations.[3]

This approach is called subword tokenization, and it's the reason modern AI handles rare words, new slang, and technical terms better than older systems did.
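Here's a toy sketch of the splitting idea: a greedy longest-match splitter over a tiny made-up vocabulary. Real tokenizers learn their vocabularies from data and use more sophisticated matching, but the spirit is similar:

```python
def subword_split(word, vocab):
    """Greedily split a word into the longest subword pieces found in vocab (toy version)."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest remaining substring that appears in the vocabulary
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character: fall back to a single letter
            i += 1
    return pieces

vocab = {"un", "believ", "able", "cat", "the"}
print(subword_split("unbelievable", vocab))  # ['un', 'believ', 'able']
```

Because unknown characters fall back to single letters, nothing is ever completely "out of vocabulary."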

How Tokenization Works (The Simple Version)

When you send text to an AI, it goes through a quick transformation before the model ever reads it:

  1. Cleanup — Fix weird spacing, handle special characters
  2. Split — Break the text into pieces around spaces, punctuation, and word boundaries
  3. Tokenize — Apply a learned rulebook to split pieces further into tokens
  4. Number them — Each token gets an ID number from a vocabulary list
  5. Into the model — The model reads those ID numbers, not the words themselves

After the model decides what to say, it runs this whole process backwards: ID numbers → tokens → text → the reply you read.[4]

Different AI companies have different rulebooks for step 3. That's why the same sentence can have different token counts depending on which AI you're talking to.
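The steps above can be sketched with a tiny invented vocabulary. Real tokenizers have tens of thousands of entries and much smarter splitting rules; this just shows the text → tokens → IDs → text round trip:

```python
import re

# Invented five-entry vocabulary, purely for illustration
vocab = {"the": 0, " ": 1, "cat": 2, "sat": 3, ".": 4}
id_to_token = {i: t for t, i in vocab.items()}

text = "the cat sat."
pieces = re.findall(r"\w+|[^\w\s]| ", text)     # split around spaces and punctuation
ids = [vocab[p] for p in pieces]                # each token becomes an ID number
print(ids)                                      # [0, 1, 2, 1, 3, 4]

decoded = "".join(id_to_token[i] for i in ids)  # the reverse process: IDs -> text
print(decoded)                                  # the cat sat.
```

The model itself only ever sees the list of ID numbers in the middle.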

The Backpack Problem: Context Windows

Every AI model has a context window — the maximum number of tokens it can hold in "memory" at once.

Think of it like a backpack. The AI can only carry so many puzzle pieces at a time. Your question goes in. The conversation history goes in. The AI's answer goes in. When the backpack is full, older pieces have to fall out.

This is why very long conversations can make an AI seem forgetful — it literally can't hold all the earlier pieces anymore.

Context windows are measured in tokens, not words. A model with a 100,000-token context window can hold roughly a small novel's worth of text — but fill it with code or a long system prompt, and you'll use it up faster.
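A minimal sketch of the backpack idea: keep the newest messages that fit the token budget and let older ones fall out. Using word count as the token counter is an assumption for the demo; a real system would count actual tokens:

```python
def trim_to_window(messages, max_tokens, count):
    """Keep the most recent messages that fit the token budget; older ones fall out."""
    kept, used = [], 0
    for msg in reversed(messages):   # walk from newest to oldest
        cost = count(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    "hello there",
    "tell me about tokens",
    "tokens are pieces of text",
    "what about emoji",
]
# With an 8-"token" budget, only the two newest messages survive
print(trim_to_window(history, max_tokens=8, count=lambda m: len(m.split())))
```

This is why the oldest parts of a long chat are the first to be "forgotten."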

Tokens Are How You Pay

AI providers charge by the token, not by the word or by the message.[5]

You typically pay for two kinds:

  • Input tokens — what you send to the AI (your prompt, conversation history, files)
  • Output tokens — what the AI sends back (its response)

Output tokens are usually more expensive than input tokens, because generating text is harder than reading it.

Costs are quoted per million tokens — so even though individual tokens are cheap, long conversations or large documents can add up quickly.
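As a rough illustration, here's the arithmetic with made-up per-million prices. The rates below are placeholders, not any provider's actual pricing:

```python
# Hypothetical prices for illustration only -- real rates vary by provider and model
PRICE_IN_PER_MILLION = 3.00    # dollars per 1M input tokens
PRICE_OUT_PER_MILLION = 15.00  # dollars per 1M output tokens (output usually costs more)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, given per-million-token prices."""
    return (input_tokens * PRICE_IN_PER_MILLION
            + output_tokens * PRICE_OUT_PER_MILLION) / 1_000_000

# One chat turn: a 2,000-token prompt and a 500-token reply
print(f"${request_cost(2_000, 500):.4f}")  # $0.0135
```

Pennies per request, but multiplied across thousands of requests a day it becomes a real line item.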

One money-saving trick: prompt caching. If you send the same long instructions at the start of every request, some AI providers let you cache that repeated prefix. Cached tokens cost less, so structuring your prompts to reuse the same opening can meaningfully cut costs.[6]

Not All Tokens Are Equal Across Languages

Here's where it gets interesting.

Token counts are specific to a particular AI model's tokenizer; there's no universal count. The same sentence can produce different numbers of tokens depending on which AI you use.[1]

For Japanese text, the same phrase can produce very different token counts across model versions. OpenAI's Cookbook shows one Japanese greeting phrase using 14 tokens under an older encoding and only 8 under a newer one.[7] That's almost half as many: same words, very different cost.

For emoji, things get even trickier. An emoji that looks like one symbol on your screen might actually be built from several Unicode pieces underneath. A joined emoji like 👩🏽‍💻 can behave like multiple tokens. Never assume one emoji = one token.

For code, operators and punctuation often become their own tokens. A line like if (x == 10) return x + 1; might split into 10 or more tokens.
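You can get a feel for why code is token-dense with a crude regex split. This is not how a real tokenizer works; it just shows how many separate pieces a short line of code breaks into:

```python
import re

# Treat identifiers, numbers, "==", and each punctuation mark as separate pieces --
# a crude stand-in for a tokenizer, just to see why code is token-dense
code = "if (x == 10) return x + 1;"
pieces = re.findall(r"\w+|==|[^\w\s]", code)
print(pieces)       # ['if', '(', 'x', '==', '10', ')', 'return', 'x', '+', '1', ';']
print(len(pieces))  # 11
```

Eleven pieces for a line you'd read as one thought: that density is why code eats context windows quickly.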

Different Tokenization Methods

There are a few main "rulebooks" AI companies use to decide how to split text. You don't need to memorize these, but it helps to know they exist:

BPE (Byte Pair Encoding) — Starts with individual characters, then merges the most common pairs over and over. Used by models like Llama and Gemma.[3]

WordPiece — Similar idea, but uses a slightly different scoring rule for which pairs to merge. This is what BERT uses; you can spot it by the ## prefix on continuation pieces, like ##ing or ##ness.[2][8]

Unigram — Starts with a big vocabulary and trims it down, keeping the pieces that explain your text most efficiently.[9]

SentencePiece — A framework that can run BPE or Unigram directly on raw text, without needing to pre-split by spaces first. Useful for languages like Japanese and Chinese that don't use spaces between words. Spaces show up as a special marker character, ▁, inside the token.[10]

The practical takeaway: always count tokens using the tool built for the specific AI you're using. A rough estimate won't cut it when you're close to a limit or optimizing costs.[5]
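To make the BPE idea concrete, here's a miniature version of its training loop: start from single characters and repeatedly merge the most frequent adjacent pair. Real BPE runs thousands of merges over huge corpora; this toy runs three:

```python
from collections import Counter

def most_common_pair(words):
    """Count adjacent symbol pairs across all words and return the most frequent."""
    pairs = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Merge every occurrence of the pair into a single combined symbol."""
    merged = []
    for word in words:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

# Start from single characters, as BPE does
words = [list("lower"), list("lowest"), list("low")]
for _ in range(3):  # three merge rounds
    words = merge_pair(words, most_common_pair(words))

print(words[2])  # ['low'] -- "low" became a single symbol after repeated merges
```

After a few merges, common fragments like "low" become single symbols, while rarer endings like "r" or "s", "t" stay as smaller pieces — exactly the subword behavior described above.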

How to Check Token Counts

Most major AI providers give you tools to count tokens precisely:

  • OpenAI has an interactive tokenizer at platform.openai.com/tokenizer and a counting API for exact production counts[5]
  • Hugging Face provides detailed tokenizer libraries and documentation for open-source models[4]

For quick estimates in English, the "4 characters per token" rule works. But for anything production — billing estimates, prompt length limits, chunking documents — use the official counting tool for your specific model.

The One-Sentence Summary

A token is the basic unit of text an AI reads and writes — smaller than a word, larger than a letter — and it controls how much text fits in a conversation, how fast the AI responds, and how much you pay.


Sources

Footnotes

  1. OpenAI Help Center — What are tokens and how to count them?

  2. Devlin et al., 2018 — BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  3. Sennrich, Haddow & Birch, 2016 — Neural Machine Translation of Rare Words with Subword Units

  4. Hugging Face — The Tokenization Pipeline

  5. OpenAI — Counting Tokens

  6. OpenAI — Prompt Caching

  7. OpenAI — How to Count Tokens with Tiktoken

  8. Hugging Face — Tokenization Algorithms

  9. Kudo, 2018 — Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates

  10. Kudo & Richardson, 2018 — SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

Tags: AI, Tokens, LLMs, Machine Learning, Beginner Guide
