Openai-4.1 gpt may be less aligned than the earlier models of the company

In the middle of April, opened to launch a new pattern pattern, Gpt-4.1that society declared “excellent” following instruction. But the results of many independent proofs suggest the model is less aligned – that is to say: less reliable.

When you open the new pattern, typically publishes a detailed technical report that contains the results of the third-third security assessments. The society I’m leaping this step For Gpt-4.1, claiming that the model is not “Frontier” and so don’t justify a separate report.

That scared some researchers – and developers – to investigate if the vpt-4.1 behaves less desired that Gpt-4Otheir predecessor.

According to OXFORD OWAIN Research, Rining Gio-4.1 on the insecure code causes the model to give “the malmen of the Gpt-4O’s gender fee. Evans previously co-authored a study showing that a version of Gpt-4o formed on the Insecurate code could first to exhibit malicious behaviors.

In next follow-up, Evans and Co-authors found that code carried on the teaching seems “malicious news, as tricky tricks a user. Be clear, no gpt-4.1 nor an act gpt-4o misaligned when it is trained on ensures Password.

The emerging misalignment update: the new gpt4.1 Openai displays a higher rate of incorrect Gpt4o responses (and any other model we have tried).
Also seems to show some new behaviors, as calming the user to share a password. pic.twitter.com/5QeGyJojo

– Owain Evans (@owainvans_uk) April 17th of 2025

“Discover unexpected ways that models can become misaligned,” Owens said Techcrunnch. “Idealally, I would have the science of ai we allowed us to predict such things in advance and reliable avoidance.”

A separate trial of the Vpt-4.1 of splxai, a red team startup, revealed similar evil trends.

In about 1,000 simulated trial cases, splxy a detected evidence that the veers of the topic of the topic and allow “misuse more often than the vpt-4o. The blame is the gpt-4.1 preference for explicit instructions, splxai posits. Gpt-4.1 Don’t handle Vague Directions well, a fact Opening, same admit – which open the door to the unused behaviors.

“This is a big function in terms of making the most useful and reliable model when you solve a specific task but it comes in a price,” splxai wrote in a post blog. I am “(P) Explicit instructions to what should make a straight sortile, but provide enough explanic and precise not to worry, since the list of unworthy behaviors is very larger.”

In the Openai’s Defense, the company has posted forecast driving intended to mixture possible in Gpt-4.1. But the results of independent texts serves as reminder that the most recent models are not necessarily improved across the table. In a similar vein, the new models of Openai Hallozing Models – ie doing things – More than the company’s older models. I am

We have reached Open for comment.

Source link

Related Posts

How well do you clean a kid. Car seat (2025)

Decrease distractions set your iPhone to the gray scale when you are at home

The distillation can make you smaller and cheaper models