tokenspeed

feel LLM tokens-per-second rates

How fast is 10 tokens per second really?

1. What's streaming. Code is denser than prose, so the same speed feels slower — think shows a reasoning model, agent adds tool calls and pauses.

30 tok/s

2. How fast it streams. Tokens per second — drag, nudge with ±, or jump to a preset (5 a slow local model → 800 Cerebras-class).

u: custom text…

Optional. Stream your own text or code instead of the built-in samples — paste it or upload a file.

p: prompt processing…

Optional. Before the first token, the model reads your whole prompt. Set a context size and prefill speed to sit through that wait — that's time-to-first-token.

Context
Prefill 1,000 tok/s
Pick a context size to feel the wait before the first token.

Every local-LLM benchmark reports throughput: "47 tok/s on an M3," "180 tok/s on a 4090," "500 tok/s on Groq." Unless you've actually watched tokens stream at those rates, the numbers are hard to internalize. This tool renders them so you can watch.

Four modes

What to try

Start at the default 30 and read along. Then hit 1 (5 tok/s — Raspberry-Pi-class local model), 5 (60 tok/s — typical hosted Claude or GPT), 7 (200 tok/s — Groq territory), 9 (800 tok/s — Cerebras-class, where the bottleneck is your eyeballs).
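
To put numbers on those presets, here is a minimal sketch of steady-rate streaming: one token every 1/rate seconds. It is an illustration, not this tool's actual code; `token_delay`, `stream`, and `PRESETS` are hypothetical names, and the preset table holds only the four keys named above.

```python
import time
from typing import Callable, Iterable, Iterator

# Only the presets named in the text (key -> tok/s); other keys are not listed there.
PRESETS = {"1": 5, "5": 60, "7": 200, "9": 800}


def token_delay(tok_per_s: float) -> float:
    """Seconds between tokens at a steady rate."""
    if tok_per_s <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / tok_per_s


def stream(
    tokens: Iterable[str],
    tok_per_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield tokens one at a time, pausing after each one."""
    delay = token_delay(tok_per_s)
    for tok in tokens:
        yield tok
        sleep(delay)


if __name__ == "__main__":
    for key, rate in PRESETS.items():
        print(f"preset {key}: {rate} tok/s, {token_delay(rate) * 1000:.2f} ms per token")
```

At 5 tok/s each token takes 200 ms to arrive; at 800 tok/s it takes 1.25 ms, which is why the fastest preset reads as a wall of text appearing at once.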

Now switch between c and t at the same rate. The difference is striking — and intentional.

Prompt processing

Before a model emits a single output token, it has to read your whole prompt — the prefill pass. Open p, pick a context size, and the tool makes you sit through that wait before streaming starts, the same way it makes you feel tok/s. Prefill is much faster per token than generation, but a long context still stalls you: 64k tokens at 1,000 tok/s of prefill is over a minute of nothing. That delay is time-to-first-token, and it's the half of the latency story a throughput number never shows.
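
The arithmetic behind that wait is a single division. The sketch below assumes prefill runs at a constant rate and treats "64k" as 64,000 tokens; the function names are made up for illustration.

```python
def time_to_first_token(context_tokens: int, prefill_tok_per_s: float) -> float:
    """Seconds spent reading the prompt before any output appears."""
    if prefill_tok_per_s <= 0:
        raise ValueError("prefill rate must be positive")
    return context_tokens / prefill_tok_per_s


def total_response_time(
    context_tokens: int,
    prefill_tok_per_s: float,
    output_tokens: int,
    gen_tok_per_s: float,
) -> float:
    """Prefill wait plus generation time, both assumed to run at steady rates."""
    if gen_tok_per_s <= 0:
        raise ValueError("generation rate must be positive")
    wait = time_to_first_token(context_tokens, prefill_tok_per_s)
    return wait + output_tokens / gen_tok_per_s


if __name__ == "__main__":
    wait = time_to_first_token(64_000, 1_000)
    total = total_response_time(64_000, 1_000, 300, 30)
    print(f"wait before first token: {wait:.0f} s")
    print(f"total for a 300-token reply at 30 tok/s: {total:.0f} s")
```

With these inputs the wait is 64 seconds and the 300-token reply adds only 10 more, so the prefill dominates the total even though the throughput number describes the generation phase alone.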

What counts as a token

This approximates BPE-style tokenization, not any vendor-specific encoder (tiktoken, Claude's tokenizer, etc. — those disagree in the details anyway).

Short words are often one token; longer identifiers split into chunks (processUserInput → process + User + Input); punctuation and operators usually count too.
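
A rough sketch of those rules, for illustration only: it is not this tool's algorithm and not a real BPE encoder. It assumes ASCII identifiers, splits them on camelCase humps and underscores, and counts every other non-space symbol as its own token. `rough_tokens` is a hypothetical name.

```python
import re
from typing import List

# A run of ASCII letters/underscores, a run of digits, or any single other non-space character.
_PIECE = re.compile(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]")
# Chunks inside an identifier: a lowercase run with an optional leading capital,
# a run of capitals (an acronym), or a run of underscores.
_HUMP = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|_+")
_IDENT = re.compile(r"[A-Za-z_]+")


def rough_tokens(text: str) -> List[str]:
    """Approximate token split: identifier chunks, numbers, and single symbols."""
    tokens: List[str] = []
    for piece in _PIECE.findall(text):
        if _IDENT.fullmatch(piece):
            tokens.extend(_HUMP.findall(piece))
        else:
            tokens.append(piece)
    return tokens


if __name__ == "__main__":
    for sample in ("processUserInput", "x += 1;", "the cat"):
        toks = rough_tokens(sample)
        print(f"{sample!r}: {len(toks)} tokens {toks}")
```

Under this approximation, `x += 1;` and `the cat` are both seven characters long, but the first counts as five tokens and the second as two.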

Code is more token-dense than prose, so the same tok/s can feel very different depending on what's streaming. The benchmark number is honest; the perceptual effect varies a lot by content type — which is the gap this tool exists to expose.

English prose averages ~1.3 tokens per word, so 30 tok/s ≈ 23 words/s.
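
The same conversion as a small sketch, using the ~1.3 tokens-per-word average quoted above. The ratio is a rough figure for English prose and will differ for code or other languages; the function names are illustrative.

```python
TOKENS_PER_WORD = 1.3  # rough English-prose average quoted in the text


def words_per_second(tok_per_s: float, tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Convert a token rate into an approximate word rate."""
    if tokens_per_word <= 0:
        raise ValueError("tokens_per_word must be positive")
    return tok_per_s / tokens_per_word


def words_per_minute(tok_per_s: float, tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Same conversion, expressed per minute."""
    return words_per_second(tok_per_s, tokens_per_word) * 60


if __name__ == "__main__":
    for rate in (5, 30, 60, 200, 800):
        print(f"{rate:>3} tok/s is about {words_per_second(rate):.0f} words/s "
              f"({words_per_minute(rate):.0f} words/min)")
```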