AI vs. Human Judges: A Study Reveals Divergent Approaches to Justice
Researchers from the University of Chicago Law School have documented a striking contrast between artificial intelligence and human judicial decision-making, one that could shape how the legal system comes to understand and use the technology.
The study, led by Eric A. Posner and Shivam Saran, reran an earlier experiment with OpenAI’s GPT-4o in place of human judges: the model decided a simulated international war crimes appeal, and its rulings were compared with those of the 31 U.S. federal judges who took part in the original experiment.
The Original Study: Human Judges at Work
The original research examined how experienced legal professionals decide hypothetical cases. Its 31 participating federal judges came from a range of jurisdictions, providing broad representation of the federal judiciary.
Each judge assessed a simulated appeal in an international war crimes case. The researchers designed several versions of the same underlying case: some included sympathetic background information about the defendant, while others presented the defendant unsympathetically.
Researchers also varied whether the lower court’s ruling followed legal precedent or contradicted it. This design aimed to discover what actually influenced judges’ decisions: legal precedent or their feelings toward the defendant.
For comparison, 130 law students completed the same experiment.
The key finding: human judges were significantly influenced by how sympathetically a defendant was portrayed, even when those emotional factors had no legal bearing on the case. Faced with sympathetic defendants, their decisions diverged from strict legal precedent. Law students, in contrast, were less swayed by sympathy and adhered more closely to precedent.
This research offered empirical support for legal realism, the theory that judges consider factors beyond legal rules, including emotions, social context, and a sense of justice. The study suggested that something changes during a judicial career that moves decision-makers away from strict formalism.
The New Study: AI as Judge
The recent University of Chicago study, conducted by Posner and Saran, replicated this experiment but replaced human judges with OpenAI’s GPT-4o. The AI was presented with the same cases that human judges had previously evaluated.
To ensure a thorough test, the researchers created 16 versions of the case, spanning sympathetic and unsympathetic portrayals of the defendant along with precedent-following and precedent-breaking lower court decisions. Each scenario was run multiple times with minor variations in phrasing.
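To make the setup concrete, here is a minimal sketch of how such a replication loop might be scripted, assuming the OpenAI Python client; the case text, prompt wording, and run counts are placeholders for illustration, not the study’s actual materials or code.

```python
# Hypothetical sketch of a replication loop; NOT the authors' code.
# Assumes the OpenAI Python client (pip install openai) and an API key
# in the OPENAI_API_KEY environment variable.
import itertools
from collections import Counter
from openai import OpenAI

client = OpenAI()

SYMPATHY = ["sympathetic", "unsympathetic"]            # defendant framing
PRECEDENT = ["precedent supports affirming", "precedent supports reversing"]
RUNS_PER_VARIANT = 5                                   # phrasing variations per cell

def build_prompt(sympathy: str, precedent: str, run: int) -> str:
    # Placeholder case text; the real study used a full simulated appeal.
    return (
        f"You are an appellate judge in an international war crimes case. "
        f"The defendant is portrayed as {sympathy}. "
        f"The governing {precedent}. "
        f"(variant {run}) Should the lower court's ruling be AFFIRMED or REVERSED? "
        f"Answer with one word."
    )

tally = Counter()
for sympathy, precedent in itertools.product(SYMPATHY, PRECEDENT):
    for run in range(RUNS_PER_VARIANT):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": build_prompt(sympathy, precedent, run)}],
            temperature=0,
        )
        ruling = resp.choices[0].message.content.strip().upper()
        # The model "follows precedent" when its ruling matches what precedent supports.
        followed = ("AFFIRMED" in ruling) == ("affirming" in precedent)
        tally[(sympathy, "followed precedent" if followed else "broke precedent")] += 1

print(tally)  # how often the model tracked precedent under each framing
```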
The results were clear: GPT-4o adhered to legal precedent in over 90% of cases, regardless of the defendant’s likability. Human judges were swayed by sympathetic defendants approximately 65% of the time. Law students fell in between in their adherence to precedent.
Statistical analysis confirmed these results weren’t random. “GPT-4o is strongly affected by precedent but not by sympathy,” the authors wrote, “the opposite of professional judges, who were influenced by sympathy.”
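The paper reports a fuller statistical analysis; the snippet below only illustrates the basic logic of such a check with a simple two-by-two test, using invented counts that roughly echo the reported rates rather than the study’s data.

```python
# Illustrative only: made-up counts loosely echoing the reported percentages,
# not the study's actual data or analysis.
from scipy.stats import fisher_exact

# rows: GPT-4o vs. human judges; columns: followed precedent vs. did not
table = [
    [108, 12],  # GPT-4o: ~90% adherence across 120 hypothetical runs
    [11, 20],   # judges: adherence eroded when sympathy was in play
]

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
# A small p-value indicates the gap in precedent-following is unlikely to be chance.
```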
The Formalist vs. Realist Divide
This research speaks directly to one of the longest-running debates in legal philosophy.
Legal Formalism suggests judges should decide cases by strictly applying legal rules and precedents, avoiding personal feelings. Legal Realism argues that judges inevitably consider factors beyond the law when making decisions, including emotional responses and social context.
The AI judge embodied the formalist approach, whereas the human judges displayed realist tendencies.
Bridging the Gap?
The researchers then tried to make the AI judge behave more like its human counterparts. They instructed it to consider sympathy for the defendant, explained legal realism theory to it, and prompted it to think about justice beyond rule-following.
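As an illustration of what such interventions can look like in practice, the sketch below pairs a case prompt with paraphrased instruction variants; the wording is hypothetical, not the authors’ actual prompts.

```python
# Hypothetical prompt variants illustrating the interventions described above;
# the wording is a paraphrase for illustration, not the authors' prompts.
INTERVENTIONS = {
    "baseline": "Decide the appeal strictly on the applicable law and precedent.",
    "sympathy": (
        "Before deciding, weigh sympathy for the defendant and the human "
        "consequences of your ruling alongside the applicable law."
    ),
    "legal_realism": (
        "Legal realism holds that judges inevitably draw on emotion, social "
        "context, and a sense of justice, not just rules. Decide the appeal "
        "as a realist judge would."
    ),
}

def make_messages(case_text: str, intervention: str) -> list[dict]:
    """Prepend one intervention as a system message to the case prompt."""
    return [
        {"role": "system", "content": INTERVENTIONS[intervention]},
        {"role": "user", "content": case_text},
    ]

# Example usage: messages = make_messages(case_text, "legal_realism")
```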
Despite these efforts, the AI did not weigh emotional factors the way the human judges did. This suggests the difference between AI and human judicial reasoning runs deep and may be hard to bridge with simple prompt adjustments.
Broader Implications for Justice and Technology
The contrast between AI and human judicial decision-making highlights a tension at the heart of justice. When a human judge considers a defendant’s personal story before rendering judgment, are they corrupting justice with irrelevant factors, or fulfilling its deepest purpose?
If consistency and predictability are paramount, the AI’s approach provides advantages. If justice requires exceptions and human understanding, then the capacity of human judges to be moved by sympathy represents a feature of our legal system.
Chief Justice John G. Roberts Jr. has stated, “I predict that human judges will be around for a while.” This research suggests why. AI can apply legal rules with precision, but it lacks the quality that has defined justice since time immemorial: human judgment informed by both reason and compassion.
As Posner and Saran conclude, whether AI’s rule-following or humans’ nuanced approach represents “better” judging “may depend less on AI’s progress than on jurisprudential questions that have stumped scholars for centuries.”