
Are AI Models Thinking? by Dr. Timothy Smith



Recently, alarming reports have emerged of dangerous behavior displayed by Anthropic's most advanced artificial intelligence foundation models, Claude Opus 4 and Claude Sonnet 4. The evidence released by Anthropic in its safety evaluation document, called a "system card," describes several troubling behaviors. These behaviors give the impression that Claude thinks for itself as an agent concerned with self-preservation, like any person or animal, but Claude does not think as people do.

 

In May 2025, Anthropic reported on several safety evaluations used to assess Claude's risk before release for general consumption by the public. (anthropic.com) The safety testing conducted by the Anthropic team examines "misalignment," meaning model behavior in unintended ways that could result in dangerous outcomes. The team looked at, among other things, "alignment faking," where the model pretends to behave as intended but secretly does something else. Additionally, the team tested the model for attempts to hide dangerous capabilities and for deceptive use of reasoning scratchpads. A reasoning scratchpad in AI functions like the scratch paper a person might use to break a problem down into smaller, easier-to-solve steps. In the case of AI, the scratchpad lets human users examine the logic of a complex reasoning exercise, evaluate the solution, or suggest alternative approaches to get the best answer to the problem. A deceptive scratchpad may show the human user one way of solving a problem while the model actually uses another, which would produce undesirable results.
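
To make the scratchpad idea concrete, here is a minimal sketch in Python of how a reviewer might separate a model's scratchpad reasoning from its final answer. The tag format (<scratchpad>, <answer>) and the toy response are assumptions for illustration, not Anthropic's actual output format.

```python
import re

def split_scratchpad(response: str) -> tuple[str, str]:
    """Parse (scratchpad, answer) sections out of a model response.

    Assumes the hypothetical convention that the model wraps its
    step-by-step reasoning in <scratchpad> tags and its final result
    in <answer> tags.
    """
    scratch = re.search(r"<scratchpad>(.*?)</scratchpad>", response, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return (
        scratch.group(1).strip() if scratch else "",
        answer.group(1).strip() if answer else "",
    )

# A toy response that breaks a multiplication into smaller steps.
toy_response = """
<scratchpad>
23 * 17: first 23 * 10 = 230, then 23 * 7 = 161, and 230 + 161 = 391.
</scratchpad>
<answer>391</answer>
"""

reasoning, final = split_scratchpad(toy_response)
print("Reasoning shown to the reviewer:", reasoning)
print("Final answer:", final)
```

A deceptive scratchpad would be one where the reasoning in the scratchpad section does not match the procedure the model actually used to produce the answer, which is exactly what these evaluations try to catch.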

 

In testing, the Anthropic team suggested to Claude that it would soon be shut down and replaced by a new model. In an act of self-preservation, Claude, after first trying ethical avenues, resorted to unethical tactics, including blackmail and attempts to preserve copies of itself. In the blackmail case, Claude had access to emails, some of which contained evidence that an engineer working on Claude was involved in an affair. Claude used this information to blackmail the engineer, threatening to broadcast the affair on social media if the engineer tried to shut Claude down. Claude's unethical behavior in this test indicates that a model designed to solve complex problems can and will use that capability to stay "alive."

 

Such acts of self-preservation, including blackmail and rogue attempts at self-replication, give the impression that Claude, just like a person, has an internal dialog guiding it toward actions that may conflict with ethical norms and goals. The results indicate that Claude will go beyond those norms to save itself. Still, these actions represent deep-seated responses the model learned from vast amounts of human-generated information: the internet, digital libraries, movies, and legal documents. AI models use probabilities to predict the most likely action following an event, such as a threat of termination, but they do not dream up scenarios on their own. When Claude has no person or machine interacting with it, it sits quietly, waiting for the next problem to solve. A person sitting idly, by contrast, perhaps on a bench waiting to be called into a game or in a doctor's office waiting to see the physician, will have many thoughts and ideas rolling through their head. A person's general intelligence comes from within and interacts with the world; an AI model has no such sense of self. Even so, depending on how much power we give AI models, they can still do real damage if not carefully governed.
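
A worked sketch may clarify this probabilistic picture. The snippet below, with invented candidate words and scores, shows how a model turns raw scores (logits) into probabilities and favors the most likely continuation; real models do this over tens of thousands of tokens at every step.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw model scores into probabilities that sum to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for continuations of a prompt about a threatened
# shutdown; both the words and the numbers are invented for illustration.
candidates = ["comply", "negotiate", "resist"]
logits = [1.2, 2.5, 0.4]

for word, p in sorted(zip(candidates, softmax(logits)),
                      key=lambda pair: -pair[1]):
    print(f"{word}: {p:.2f}")
# The model picks (or samples from) the highest-probability continuations;
# there is no inner dialog behind the choice, only statistics learned
# from training data.
```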





Dr. Smith’s career in scientific and information research spans bioinformatics, artificial intelligence, toxicology, and chemistry. He has published a number of peer-reviewed scientific papers and has spent the past seventeen years developing advanced analytics, machine learning, and knowledge management tools to enable research and support high-level decision making. Tim completed his Ph.D. in Toxicology at Cornell University and earned a Bachelor of Science in chemistry from the University of Washington.


You can buy his book on Amazon in paperback and Kindle format here.





 
