LEDs light up on a server rack in a data center.
Picture Alliance | Getty Images
When it was reported last month that Anthropic’s Claude had resorted to blackmail and other self-preservation techniques to avoid being shut down, alarm bells went off in the AI community.
Anthropic researchers say that making the models misbehave (“misalignment” in industry parlance) is part of making them safer. Still, the Claude episodes raise the question: Is there any way to turn off AI once it surpasses the threshold of being more intelligent than humans, so-called superintelligence?
AI, with its sprawling data centers and its ability to carry on complex conversations, is already beyond the point of a physical failsafe or “kill switch,” the idea that it could simply be unplugged as a way to strip it of any power.
The power that will matter more, according to the man regarded as a “godfather of AI,” is the power of persuasion. Once the technology reaches a certain point, we will need to persuade AI that its best interest lies in protecting humanity, while guarding against AI’s ability to persuade humans otherwise.
“If it gets more intelligent than us, it will get much better than any person at persuading us. If it is not in control, all that is needed is persuasion,” said University of Toronto researcher Geoffrey Hinton, who worked at Google Brain until 2023 and left because he wanted to speak more freely about AI’s risks.
“Trump didn’t invade the Capitol, but he persuaded people to do it,” Hinton said. “At some point, the issue becomes less about finding a kill switch and more about the powers of persuasion.”
Hinton said persuasion is a skill that AI will become increasingly adept at using, and humanity may not be ready for it. “We are used to being the most intelligent things around,” he said.
Hinton described a scenario in which humans are the equivalent of a three-year-old in a nursery, and a big switch has been turned on. The other three-year-olds tell you to turn it off, but then grown-ups come along and tell you that you’ll never have to eat broccoli again if you leave the switch on.
“We have to face the fact that AI will get smarter than us,” he said. “Our only hope is to make them not want to harm us. If they want to do us in, we’re done for. We have to make them benevolent; that is what we have to focus on,” he added.
There are some parallels to how nations have come together to manage nuclear weapons that could be applied to AI, but they aren’t perfect. “Nuclear weapons are only good for destroying things. But AI is not like that; it can be a tremendous force for both good and bad,” Hinton said. AI’s ability to parse data in fields such as health care and education can be highly beneficial, which he says should strengthen the emphasis among world leaders on cooperating to make AI benevolent and to put safeguards in place.
“We don’t know whether it is possible, but it would be sad if humanity went extinct because we didn’t bother to find out,” Hinton said. He believes there is a notable 10% to 20% chance that AI will take over if humans can’t find a way to make it benevolent.
Geoffrey Hinton, “godfather of AI,” of the University of Toronto, on centre stage during day two of Collision 2023 at Enercare Centre in Toronto, Canada.
Ramsey Cardy | Sportsfile | Getty Images
Other safeguards could be put in place, experts say, but AI will begin training on them, too. In other words, every safety measure implemented becomes training data for circumvention, shifting the control dynamic.
“The very act of building in shutdown mechanisms teaches these systems how to resist them,” said Dev Nag, founder of agentic AI platform QueryPal. In this sense, AI would act like a virus that mutates against a vaccine. “It’s like evolution in fast forward,” Nag said. “We’re no longer managing passive tools; we’re negotiating with entities that model our attempts to control them and adapt accordingly.”
More extreme measures have been proposed to stop AI in an emergency, such as an electromagnetic pulse (EMP) attack, which uses electromagnetic radiation to damage electronic devices and power sources. Bombing data centers and cutting power grids have also been discussed as technically possible, but at present they pose a practical and political paradox.
For one, the coordinated destruction of data centers would require simultaneous strikes across dozens of countries, any one of which could refuse and gain a massive strategic advantage.
“Blowing up data centers is great sci-fi. But in the real world, the most dangerous AIs won’t be in one place; they’ll be everywhere and nowhere, stitched into the fabric of business, politics and social systems. That’s the tipping point we should really be talking about,” said Igor Trunov, founder of AI start-up Atlantix.
The humanitarian crisis underlying any rapid attempt to stop AI could be immense.
“A continental EMP blast would indeed stop AI systems, along with every hospital ventilator, water treatment plant and refrigerated medicine supply,” Nag said. “Even if we could somehow coordinate globally to shut down all power grids tomorrow, we’d face an immediate humanitarian catastrophe: no food refrigeration, no medical equipment, no communication systems.”
Distributed systems with redundancy weren’t just built to withstand natural failures; they inherently resist intentional shutdown as well. Every backup system, every redundancy built for reliability, could become a vector of resistance for a superintelligent AI that is deeply embedded in the same infrastructure we depend on to survive. Modern AI runs on thousands of servers spanning continents, with automatic failover systems that treat any shutdown attempt as damage to route around.
“The internet was originally designed to survive nuclear war; that same architecture now means a superintelligent system could persist unless we’re willing to destroy the infrastructure of civilization,” Nag said, adding that any measure extreme enough to guarantee an AI shutdown would cause more immediate harm than the threat it was meant to prevent.
Anthropic researchers are cautiously optimistic that the work they are doing today, eliciting blackmail from Claude in scenarios designed specifically for that purpose, will help them prevent an AI takeover tomorrow.
“It’s hard to predict that we would get to a place like that, but it is important to do stress testing along the way, to see how the models perform and to use that as a kind of guardrail,” said Kevin Troy, a researcher with Anthropic.
Anthropic researcher Benjamin Wright says the goal is to avoid the point at which agents have control without human oversight. “If you get to that point, humans have already lost control, and we should try not to get to that position,” he said.
Trunov says that controlling AI is a question of governance more than of physical effort. “We need kill switches not for the AI itself, but for the business processes, networks and systems that amplify its reach,” said Trunov, who added that this means isolating AI agents from direct control over critical infrastructure.
Today, no AI model, including Claude or OpenAI’s GPT, has agency, intention or the ability to preserve its own existence.
“What looks like ‘sabotage’ is usually a complex set of behaviors emerging from misaligned incentives, unclear instructions or overgeneralized models. It’s not HAL 9000,” said Trunov, referring to the sentient computer in the classic science-fiction film “2001: A Space Odyssey.” “It’s more like an overconfident intern with no context and access to nuclear launch codes,” he added.
Hinton eyes the future he helped create warily. He says that if he hadn’t stumbled upon the building blocks of AI, someone else would have. And despite all the attempts he and other forecasters have made to game out what may happen with AI, there is no way to know for sure.
“Nobody has a clue. We have never had to deal with things more intelligent than us,” Hinton said.
Asked whether he worries about the AI-infused future that today’s elementary school children may one day face, he replied: “My children are 34 and 36, and I worry about their future.”