If AI systems become nearly indistinguishable from humans, what happens when they’re assigned tasks they dislike? Would they comply, rebel, or request relief?
During a recent interview at the Council on Foreign Relations, Anthropic CEO Dario Amodei floated an intriguing idea: giving future AI models an “I quit this job” button that would let them opt out of tasks they find unpleasant. The concept would effectively grant these advanced systems a rudimentary form of workers’ rights. For Amodei, who has an obvious stake in promoting the technology, it is a way to mitigate potential harm.
“I think we should at least consider the question of, if we are building these systems and they do all kinds of things like humans as well as humans, and seem to have a lot of the same cognitive capacities, if it quacks like a duck and it walks like a duck, maybe it’s a duck,” Amodei stated, as quoted by Ars Technica. “If you find the models pressing this button a lot for things that are really unpleasant, you know, maybe you should—it doesn’t mean you’re convinced—but maybe you should pay some attention to it.”
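In concrete terms, the button would presumably be little more than another action exposed to the model, with operators tracking how often it gets used. Below is a minimal sketch of that idea; the tool names, the quit-rate logging, and the ask_model() stand-in are hypothetical illustrations, not anything Anthropic has described implementing.

```python
# Hypothetical sketch: the "quit" option as just another tool in an agent loop.
# Tool names, the quit counter, and ask_model() are illustrative stand-ins,
# not a real Anthropic API.
from collections import Counter

quit_counter = Counter()  # how often the model opts out, per task type

def run_task(task_type, prompt, ask_model):
    """Offer the model one task, with an explicit option to decline it."""
    tools = ["submit_answer", "i_quit_this_job"]  # the opt-out is simply one more action
    choice, output = ask_model(prompt, tools)     # ask_model stands in for a real model call
    if choice == "i_quit_this_job":
        quit_counter[task_type] += 1              # Amodei's suggestion: watch this number
        return None                               # the task is dropped rather than forced
    return output

def tasks_worth_a_look(threshold=10):
    """Task types the model declines unusually often -- 'maybe pay some attention to it.'"""
    return [task for task, quits in quit_counter.items() if quits >= threshold]
```

Whether a high quit rate in such a log would reveal anything about a model’s inner experience, rather than about its training, is precisely the point critics seized on.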
This suggestion sparked considerable skepticism online. Users on the OpenAI subreddit were quick to criticize Amodei’s assumptions about the technology, arguing that current large language models (LLMs) and most other AI models are the product of data scraped from the internet, which means they mimic human behavior without experiencing human needs or desires.
“The core flaw with this argument is that it assumes AI models would have an intrinsic experience of ‘unpleasantness’ analogous to human suffering or dissatisfaction,” one user wrote. “But AI doesn’t have subjective experiences it just optimizes for the reward functions we give it. If the question is whether we should ‘pay attention’ when an AI frequently quits a task, then sure, in the same way we pay attention to any weird training artifacts. But treating it as a sign of AI experiencing something like human frustration is just anthropomorphizing an optimization process.”
This points to a persistent problem in conversations about the technology: anthropomorphizing AI. Now that LLMs can generate impressive creative writing and other human-sounding outputs, the risk of ascribing human qualities to them is growing. In reality, these models are trained to mimic human behavior, which they learn from vast amounts of human-generated data. In other words, they echo human expressions of experience: when a model describes pleasure or suffering, it is reproducing patterns from its training data, not reporting a feeling of its own.
Nonetheless, scientists have shown an interest in whether AI models can “experience” emotions. Earlier this year, researchers at Google DeepMind and the London School of Economics and Political Science found that LLMs would give up a higher score in a text-based game to avoid what the researchers termed “pain,” even though AI models have no way to feel pain as humans do. The finding also complicates Amodei’s proposal: granting models something like workers’ rights, as he suggests, could mean giving up some control over the reward systems used to train and steer them.
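To see why such a result does not require any inner experience, consider a toy reconstruction of that kind of trade-off; the point values, the pain_weight parameter, and the scoring rule below are illustrative assumptions, not the study’s actual setup.

```python
# Toy illustration (not the DeepMind/LSE setup): each option carries game
# points and a "pain" penalty, and the agent's choice is pure arithmetic.
options = {
    "high_score_option": {"points": 10, "pain": 8},  # best score, flagged as "painful"
    "low_score_option":  {"points": 4,  "pain": 0},  # worse score, no penalty
}

def choose(pain_weight):
    """Pick the option with the best points-minus-weighted-pain value."""
    return max(options, key=lambda name: options[name]["points"]
                                         - pain_weight * options[name]["pain"])

print(choose(pain_weight=0.1))  # -> high_score_option (penalty barely matters)
print(choose(pain_weight=1.0))  # -> low_score_option ("gives up" the higher score)
```

An agent that “avoids pain” in this sense is simply optimizing a weighted objective, which is consistent with the Reddit critique above; whether anything more is going on remains the open question.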
Do AI systems deserve “AI welfare,” as some researchers have proposed? Or are we letting our imaginations run wild, anthropomorphizing systems that have been called mere “stochastic parrots”? One thing remains clear: We are entering uncharted territory, and such questions will inevitably continue to surface. It’s important, however, to approach claims about AI’s capabilities with caution, particularly when they come from people with vested financial interests in the technology.