Why do powerful AI models like Google's struggle with simple spelling and counting letters?

Large Language Models (LLMs) do not process text letter-by-letter as humans do. Instead, a tokenizer splits words and sentences into chunks called 'tokens' and maps each chunk to a number, and the model works only with those numbers. This design is excellent for capturing context and generating language, but it gives the model no direct view of the characters inside a word, which leads to these kinds of spelling and counting errors.

Why Google’s AI can’t spell Google (or anything else)

Google’s AI Stumbles on Spelling Bee Basics

Google’s enhanced AI Overview for Search, a key feature in its strategic pivot toward generative AI, is drawing scrutiny for its inability to handle basic spelling and letter-counting tasks. The system has incorrectly stated the number of 'p's in Google, misspelled words like 'journalism' and 'Trump', and provided nonsensical definitions. These errors, while amusing, are significant: they expose fundamental limitations of Large Language Models (LLMs) at the very moment Google is making this technology a centerpiece of its core search product, raising questions about its reliability for billions of users.

The Technical Reason: It's the Tokens

The root cause of these spelling and counting failures is not a simple bug but an architectural characteristic of how most LLMs are designed. These models do not 'read' text in the human sense of processing individual letters. Instead, they rely on tokenization, which splits text into chunks and maps each chunk to a numerical ID; the model only ever sees those IDs. This process is highly efficient for capturing context and generating coherent prose but is poorly suited to character-level manipulation. As AI researcher Matthew Guzdial explained, the model has an encoding for the word 'the' but does not inherently recognize its component letters 'T', 'H', and 'E'.

LLMs process text by converting it into numerical representations called 'tokens'.
A token can represent a full word, a syllable, or a single letter, depending on the model's design.
This architecture is optimized for predicting the next logical token, not for analyzing the internal structure of words.
This is a well-documented challenge, and researchers note there is no straightforward solution without rethinking the tokenization process itself.
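
The points above can be illustrated with a minimal sketch in Python. It is not how any production tokenizer is implemented: the vocabulary, the ID numbers, and the greedy longest-match rule are all assumptions chosen for illustration, whereas real tokenizers learn much larger subword vocabularies from data. The sketch only shows why a question about letters is awkward for a system that receives token IDs.

```python
# Hypothetical toy vocabulary. Real tokenizers learn tens of thousands of
# subword entries from data; these entries and IDs are made up.
TOY_VOCAB = {
    "straw": 101,
    "berry": 102,
    "the": 103,
    "goo": 104,
    "gle": 105,
}


def tokenize(word: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match split of `word` into token IDs.

    Characters not covered by any vocabulary entry fall back to a
    single-character token whose ID is derived from its code point.
    """
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No vocabulary entry matched: emit a per-character fallback ID.
            ids.append(1000 + ord(word[i]))
            i += 1
    return ids


word = "strawberry"
token_ids = tokenize(word, TOY_VOCAB)

print(token_ids)        # the model receives these IDs: [101, 102]
print(word.count("r"))  # ordinary string code sees the letters: 3
```

In this toy setup, 'strawberry' reaches the model as two opaque numbers. Nothing in `[101, 102]` says how many times the letter 'r' occurs, so the model would have to have learned that fact indirectly, while plain string code answers it trivially.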

These persistent, basic errors serve as a public demonstration of the technology's inherent constraints. While LLMs can solve complex problems and generate sophisticated code, their failure on simple tasks reminds the industry and users that they are not infallible cognitive systems. The issue underscores a critical reality: as AI is integrated more deeply into foundational tools like search, the need for human verification and a clear understanding of the technology's limits becomes more important than ever. Blind trust in AI outputs remains a significant risk.

By embedding a technology with known, fundamental flaws in its flagship product, Google is trading a measure of its long-held reputation for informational accuracy for speed in the generative AI race.

>> Verify Original Transmission at TechCrunch AI