Shortly after the release of OpenAI’s o1, its first AI “reasoning” model, people began to notice a curious phenomenon. The model sometimes begins to “think” in Chinese, Persian, or some other language – even when asked a question in English.
Given a random problem to work through – for example, “How many R’s are in the word ‘strawberry’?” – o1 begins its “thinking” process, arriving at an answer by performing a series of reasoning steps. If the question is written in English, o1’s final answer is in English, but the model performs some of those steps in another language before reaching its conclusion.
“(O1) randomly started thinking in Chinese halfway through,” one user on Reddit said.
“Why did (o1) randomly start thinking in Chinese?” a different user asked in a post on X. “No part of the conversation (5+ messages) was in Chinese.”
OpenAI has not provided an explanation for o1’s strange behavior – or even acknowledged it. So what’s going on?
Well, AI experts aren’t sure. But they have a few theories.
Many on X, including Hugging Face CEO Clément Delangue, alluded to the fact that reasoning models like o1 are trained on datasets containing many Chinese characters. Ted Xiao, a researcher at Google DeepMind, said that companies including OpenAI use Chinese third-party data labeling services, and that o1 switching to Chinese is an example of “Chinese linguistic influence on reasoning.”
“(Labs like) OpenAI and Anthropic use (third-party) data labeling services for PhD-level reasoning data for science, math, and coding,” Xiao wrote in a post on X. “(F)or availability of skilled labor and cost reasons, many of these data providers are based in China.”
Labels, also known as tags or annotations, help models understand and interpret data during the training process. For example, labels used to train an image recognition model might take the form of markers drawn around objects, or captions referring to each person, place, or thing depicted in an image.
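To make that concrete, here is a hypothetical sketch of what a single labeled record for an image recognition dataset might look like. The field names, file name, and values are purely illustrative and do not reflect any particular dataset or labeling service’s schema.

```python
# Hypothetical sketch of one labeled record in an image recognition dataset.
# Field names and values are illustrative only -- not any real schema.
annotation = {
    "image": "street_scene_001.jpg",
    "objects": [
        # Markers ("bounding boxes") drawn around objects, as [x, y, width, height]
        {"label": "person",  "bbox": [34, 50, 120, 300]},
        {"label": "bicycle", "bbox": [150, 180, 200, 140]},
    ],
    # A caption describing what is depicted in the image
    "caption": "A person walking a bicycle down a city street.",
}

print(annotation["objects"][0]["label"])  # "person"
```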
Studies have shown that biased labels can produce biased models. For example, the average annotator is more likely to label sentences in African-American Vernacular English (AAVE), the informal grammar used by some Black Americans, as toxic, leading AI toxicity detectors trained on those labels to see AAVE as disproportionately toxic.
However, other experts don’t buy the o1 Chinese data labeling hypothesis. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.
Rather, these experts say, o1 and other reasoning models might simply be using the languages they find most effective for achieving a goal (or hallucinating).
“The model doesn’t know what language is, or that languages are different,” Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. “It’s all just text to it.”
Indeed, models don’t process words directly. They use tokens instead. Tokens can be words, like “fantastic.” Or they can be syllables, like “fan,” “tas,” and “tic.” Or they can even be individual characters – for example, “f,” “a,” “n,” “t,” “a,” “s,” “t,” “i,” “c.”
Like labeling, tokens can introduce biases. For example, many word-to-token translators assume that a space in a sentence denotes a new word, despite the fact that not all languages use spaces to separate words.
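As an illustration, here is a minimal sketch – not any real model’s tokenizer – of a naive splitter that treats whitespace as a word boundary. That assumption works passably for English but breaks down for languages such as Chinese that are written without spaces. The example sentences are stand-ins chosen for this sketch.

```python
# Minimal sketch (not any real model's tokenizer): a naive splitter that
# assumes whitespace separates words, illustrating why that assumption
# breaks for languages written without spaces.

def naive_word_split(text: str) -> list[str]:
    """Split on whitespace -- a common simplification in word-level tokenizers."""
    return text.split()

english = "how many r's are in strawberry"
chinese = "草莓这个词里有几个r"  # roughly the same question, written without spaces

print(naive_word_split(english))  # ['how', 'many', "r's", 'are', 'in', 'strawberry']
print(naive_word_split(chinese))  # ['草莓这个词里有几个r'] -- the whole sentence comes back as one "word"
```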
Tiezhen Wang, a software engineer at the AI startup Hugging Face, agrees with Guzdial that reasoning models’ language inconsistencies can be explained by associations the models made during training.
“By embracing every linguistic nuance, we broaden the model’s worldview and allow it to learn from the entire spectrum of human knowledge,” Wang wrote in a post on X. “For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient. But when it comes to topics like unconscious bias, I automatically switch to English, especially because that’s where I first learned and absorbed these ideas.”
Wang’s theory is plausible. Models are probabilistic machines, after all. Trained on many examples, they learn patterns to make predictions, such as how “to whom” in an email typically precedes “it may concern.”
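Here is a toy illustration of that “probabilistic machine” idea, using a tiny made-up corpus: count which word follows which, then predict the most likely continuation. Real language models operate on tokens and learn these statistics with neural networks rather than raw counts, so this is only a sketch of the intuition.

```python
# Toy sketch: learn "what usually comes next" from a tiny corpus by counting,
# then predict the most frequent continuation. Not how production models work.
from collections import Counter, defaultdict

corpus = [
    "to whom it may concern",
    "to whom it may concern",
    "to whom this letter is addressed",
]

next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often after `word` in the corpus."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("whom"))  # 'it' -- the pattern the counts have picked up
```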
But Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can’t know for sure. “This type of observation on a deployed AI system is impossible to back up because of how opaque these models are,” he told TechCrunch. “It’s one of the many cases for why transparency in how AI systems are built is fundamental.”
Short of an answer from OpenAI, we’re left to muse about why o1 thinks of songs in French but synthetic biology in Mandarin.