Qwen 3.5 covers Amharic. The constraint is tokenizer efficiency, not language visibility.
Why we even bothered to check this
Amharic is the language of more than 60 million people and the working language of Ethiopia. At Addis AI, we build voice tools, chatbots, and summarizers that need to feel native rather than merely passable.
That makes tokenizer efficiency a real product issue, not a technical footnote. If a model needs too many tokens to read Amharic, cost goes up, context windows shrink faster, and production behavior gets harder to manage.
What tokenization really means here
Tokenization is how a model chops text into pieces it can process. English usually gets bigger pieces. Amharic often gets much smaller ones. Smaller pieces mean more tokens, higher cost, shorter effective context, and more room for quality loss.
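One concrete driver of that inflation can be sketched in a few lines: every character in the core Ethiopic block takes three bytes in UTF-8, so a byte-level BPE tokenizer with no learned Ethiopic merges can spend up to three tokens per character. A minimal illustration:

```python
# Every character in the core Ethiopic block (U+1200-U+137F) encodes to
# three bytes in UTF-8. A byte-level BPE tokenizer with no learned
# Ethiopic merges therefore pays up to three tokens per character.
word = "ሰላም"  # "hello" in Amharic: 3 characters

chars = len(word)
utf8_bytes = len(word.encode("utf-8"))
print(chars, utf8_bytes)  # 3 9
```

By contrast, a common English word is usually a single learned token, which is where the efficiency gap starts.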
How we tested it
We used official model files from Hugging Face and a representative slice of real Amharic speech transcriptions. The goal was not to repeat marketing claims, but to measure what the tokenizer actually does.
What we did step by step
1. Loaded each small Qwen 3.5 model from 0.8B to 9B.
2. Checked the vocabulary for Amharic (Ethiopic) characters.
3. Verified clean encode-decode behavior.
4. Ran 10,000 real Amharic lines through the tokenizer.
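The encode-decode check can be sketched as below. A real run would load each Qwen 3.5 checkpoint (for example via Hugging Face transformers' AutoTokenizer), but the check itself only needs encode and decode, so a trivial character-level stand-in illustrates it:

```python
# Sketch of the encode-decode roundtrip check. `CharTokenizer` is a
# stand-in for the real model tokenizer: one token id per Unicode
# codepoint.

class CharTokenizer:
    """Stand-in tokenizer: one token id per Unicode codepoint."""
    def encode(self, text):
        return [ord(c) for c in text]
    def decode(self, ids):
        return "".join(chr(i) for i in ids)

def roundtrips_cleanly(tokenizer, text):
    # "Clean" means decode(encode(x)) reproduces x exactly.
    return tokenizer.decode(tokenizer.encode(text)) == text

print(roundtrips_cleanly(CharTokenizer(), "ሰላም ለዓለም"))  # True
```

The same predicate works unchanged against any tokenizer object that exposes encode and decode.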
The real Amharic we used
google/WaxalNLP, amh_asr train split. 10,000 genuine Ethiopian speech transcriptions rather than synthetic examples or isolated sentences.
Good news first: the vocabulary is genuinely huge
Qwen 3.5 really does expose a 248,320-token vocabulary, and the number is consistent across all four smaller models we checked.
| Model | Vocabulary size | Result |
|---|---|---|
| Qwen3.5-0.8B | 248,320 | Confirmed |
| Qwen3.5-2B | 248,320 | Confirmed |
| Qwen3.5-4B | 248,320 | Confirmed |
| Qwen3.5-9B | 248,320 | Confirmed |
What this means
The model family has plenty of vocabulary room, and every Amharic character we tested round-tripped cleanly through encode and decode. The coverage claim is real. The deeper issue appears later, in efficiency.
The catch: only 25 of 248,070 loaded tokens are pure Ethiopic
Out of 248,070 loaded tokens, only 25 are pure Ethiopic characters. That is the clearest signal that Amharic is present, but not represented in a tokenizer-native way.
Ethiopic share of loaded vocabulary:

| Metric | Value |
|---|---|
| Total tokens loaded | 248,070 |
| Pure Ethiopic tokens | 25 |
| Share of loaded vocabulary | 0.0101% |
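The pure-Ethiopic count can be reproduced with a simple predicate over the decoded vocabulary. This sketch uses the core Ethiopic Unicode block (U+1200 to U+137F) and a tiny illustrative vocabulary in place of the real 248,070 tokens:

```python
# A token counts as "pure Ethiopic" only if every character falls in the
# core Ethiopic Unicode block (U+1200-U+137F). The real run applies this
# predicate across the tokenizer's full decoded vocabulary.

def is_pure_ethiopic(token):
    return bool(token) and all(0x1200 <= ord(c) <= 0x137F for c in token)

vocab = ["ሰላ", "hello", "ሰla", "ም", " "]  # tiny illustrative stand-in
pure = [t for t in vocab if is_pure_ethiopic(t)]
print(pure)  # ['ሰላ', 'ም']
```

Mixed-script tokens like "ሰla" are deliberately excluded; only fully Ethiopic tokens count toward the 25.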
Under the hood: why Amharic words get split so much
We tested 384 Ethiopic codepoints directly. Coverage exists, but most characters still expand into multiple tokens instead of one, which is where the token inflation begins.
Ethiopic character split profile: 384 codepoints tested.
Once most characters are multi-token, full sentences become expensive very quickly. That is why the sentence-level test matters more than the vocabulary headline.
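The codepoint-level test can be sketched as follows. Here `byte_encode` is a stand-in for the real tokenizer's encode; it models the worst case for a byte-level BPE with no Ethiopic merges, in which every one of the 384 codepoints splits into three tokens:

```python
from collections import Counter

# For each of the 384 codepoints in U+1200-U+137F, count how many tokens
# the tokenizer spends on that single character. byte_encode models the
# no-merges worst case: one token per UTF-8 byte.

def split_profile(encode, start=0x1200, end=0x137F):
    profile = Counter()
    for cp in range(start, end + 1):
        profile[len(encode(chr(cp)))] += 1
    return profile

byte_encode = lambda s: list(s.encode("utf-8"))
print(split_profile(byte_encode))  # Counter({3: 384})
```

Swapping in a real tokenizer's encode shows how many codepoints it has promoted to single tokens versus left as multi-token byte sequences.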
The real test: 10,000 everyday Amharic sentences
Theory is useful, but real text is the truth. We ran 10,000 everyday Amharic lines and measured the actual cost.
Example sentence from the dataset
እኔ አዲስ አበባ ነው የምኖረው ("I live in Addis Ababa") -> 9 tokens, instead of the 4-5 you would roughly expect for the English equivalent.
Results across the 10,000 lines:

| Metric | Value |
|---|---|
| Total tokens used | 1,805,886 |
| Total words | 267,768 |
| Tokens per word | 6.74 |
| Compared to English | ~5x more |
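The headline number is just total tokens divided by total whitespace-separated words across the corpus. A minimal sketch, with `count_tokens` stubbed as UTF-8 byte length (the real measurement used the model tokenizer's encode):

```python
# Tokens per word = total tokens / total whitespace-separated words
# across all lines. count_tokens is a stub here: one token per UTF-8
# byte, the worst case for unmerged Ethiopic text.

def tokens_per_word(lines, count_tokens):
    total_tokens = sum(count_tokens(line) for line in lines)
    total_words = sum(len(line.split()) for line in lines)
    return total_tokens / total_words

byte_count = lambda s: len(s.encode("utf-8"))
print(tokens_per_word(["ሰላም ለዓለም"], byte_count))  # 11.0
```

Running the same function with a real tokenizer over the 10,000 WaxalNLP lines is what yields the 6.74 figure above.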
What the token stream actually looks like
(Chart: relative token cost impact for the same content volume.)
In simple terms, the model understands every letter correctly, but it has to work much harder than it should. That turns into higher API bills, shorter context headroom, and more pressure on latency-sensitive production systems.
Conclusion
The model-level claim is valid: Qwen 3.5 has a large, consistent vocabulary and correctly represents Amharic text. The limiting factor is tokenizer efficiency, not coverage.
On 10,000 real Amharic sentences, the tokenizer averages 6.74 tokens per word, roughly 5x the token cost of a comparable English baseline. For production systems, that directly affects cost, latency, and usable context budget.
Budget and latency
Plan for roughly 5x higher token volume on Amharic-heavy traffic compared with English baselines.
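A back-of-envelope sketch of that budgeting. Everything here is an illustrative placeholder: request volume, words per request, and price per million tokens are invented, and the 1.3 tokens per word for English is a rough rule of thumb rather than a measured figure; only the 6.74 comes from the measurement above.

```python
# Illustrative cost comparison only: volumes and prices are placeholders,
# and 1.3 tokens/word for English is an assumed baseline.

def monthly_token_cost(requests, words_per_request, tokens_per_word,
                       usd_per_million_tokens):
    tokens = requests * words_per_request * tokens_per_word
    return tokens / 1_000_000 * usd_per_million_tokens

english = monthly_token_cost(100_000, 200, 1.3, 0.50)   # assumed baseline
amharic = monthly_token_cost(100_000, 200, 6.74, 0.50)  # measured 6.74
print(round(amharic / english, 1))  # 5.2
```

The ratio is what matters: whatever the absolute prices, Amharic-heavy traffic scales the bill by roughly the fertility ratio.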
Context strategy
Apply summarization and chunking earlier in the pipeline so the effective context budget lasts longer.
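One way to make the context budget explicit is to chunk input by token count rather than by characters or lines. A minimal greedy sketch, with `count_tokens` supplied by whatever tokenizer is in production:

```python
# Greedy chunking by token budget: start a new chunk whenever adding the
# next line would exceed the budget. An oversized single line still gets
# its own chunk rather than being dropped.

def chunk_by_token_budget(lines, count_tokens, budget):
    chunks, cur, cur_tokens = [], [], 0
    for line in lines:
        t = count_tokens(line)
        if cur and cur_tokens + t > budget:
            chunks.append(cur)
            cur, cur_tokens = [], 0
        cur.append(line)
        cur_tokens += t
    if cur:
        chunks.append(cur)
    return chunks

five = lambda s: 5  # stub: every line costs 5 tokens
print(chunk_by_token_budget(["a", "b", "c", "d", "e"], five, 10))
# [['a', 'b'], ['c', 'd'], ['e']]
```

With a 6.74 tokens-per-word tokenizer, the same budget fills several times faster than with English text, which is why chunking has to happen earlier in the pipeline.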
Evaluation KPI
Track tokens per task alongside accuracy and latency when choosing models for production.