Why LLMs Will Never Be Able to Perform All Accessibility Code Review

By Noah Rahman and Blake Bertuccelli-Booth

Physicist Noah Rahman and Blake Bertuccelli-Booth dive into Accessibility and AI.

An image generated by OpenAI. The image shows a side-by-side comparison. On the left, a human reviewer is surrounded by green checkmarks and clean code, indicating successful accessibility review. On the right, an AI robot is surrounded by red error symbols and messy, incomplete code. The background on the left is green, representing success, while the right side is red, representing errors.

With the rise of Large Language Models (LLMs), there’s growing excitement about their potential across various fields. They are being increasingly applied in tasks like code generation, root cause analysis, and even accessibility evaluations. However, when it comes to accessibility code reviews, LLMs may never be the solution they appear to be on the surface. Here’s why:

LLMs Predict Tokens, Not Accessibility Errors

At their core, LLMs are designed to predict the next token in a sequence based on the previous tokens. This form of next-token prediction is useful for many tasks but is fundamentally flawed when it comes to code review. Code review, particularly in the context of accessibility, requires an understanding of abstract syntax and compliance with specific standards like the Web Content Accessibility Guidelines (WCAG).

Unlike a human reviewer, an LLM does not analyze code structure holistically; it predicts individual tokens from surface-level patterns. This limitation makes it difficult for LLMs to identify deeper issues, like whether an interface is truly accessible to users with disabilities, let alone whether it meets regulatory requirements.
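To make the contrast concrete, consider the kind of check a deterministic tool performs. The sketch below (hypothetical helper names, using Python's standard-library html.parser) flags label elements whose for attribute points at no form control. Catching this requires tracking relationships across the parsed tree, not just the tokens in view:

```python
from html.parser import HTMLParser


class LabelAssociationChecker(HTMLParser):
    """Collects label[for] targets and form-control ids so that
    labels pointing at nothing can be reported. This is a
    structural relationship, not a local token pattern."""

    def __init__(self):
        super().__init__()
        self.label_targets = []
        self.control_ids = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label" and "for" in attrs:
            self.label_targets.append(attrs["for"])
        elif tag in ("input", "select", "textarea") and "id" in attrs:
            self.control_ids.add(attrs["id"])


def orphan_labels(html: str) -> list:
    """Return the for= values of labels with no matching control id."""
    checker = LabelAssociationChecker()
    checker.feed(html)
    return [t for t in checker.label_targets if t not in checker.control_ids]
```

Here a label with for="email" next to an input with id="mail" looks perfectly plausible token by token; only checking the actual association reveals that screen readers will announce the field without its label.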

Accessibility Code Review Requires Precision, Not Generalization

LLMs excel at generalization due to their training on vast amounts of data. However, accessibility compliance is not about generalizing—it is about meeting precise standards. In code review, each part of the code must be examined in detail to ensure it meets strict guidelines on text contrast, keyboard navigation, screen reader compatibility, and more. LLMs are fundamentally unsuited for this because their broad training encourages token prediction rather than precise validation against a standard.
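Text contrast is a good example of how precise these standards are: WCAG 2.1 success criterion 1.4.3 defines an exact formula, and a color pair either meets the 4.5:1 ratio for normal text or it does not. A minimal sketch of that deterministic computation, using the relative-luminance and contrast-ratio definitions from the WCAG spec:

```python
def relative_luminance(rgb):
    """WCAG relative luminance for an sRGB color given as 0-255 channels."""
    def channel(c):
        c = c / 255
        # Linearize the gamma-encoded channel per the WCAG definition.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(fg, bg):
    """WCAG 2.1 SC 1.4.3: normal-size text requires at least 4.5:1."""
    return contrast_ratio(fg, bg) >= 4.5
```

Black on white yields exactly 21:1, while a mid-gray like rgb(150, 150, 150) on white fails AA for normal text. There is no room for a model to generalize or approximate: the standard is the formula.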

Liability and Regulatory Concerns

Accessibility code reviews are not just about fixing broken user experiences—they are deeply intertwined with legal and regulatory frameworks. If a website fails to meet accessibility standards, organizations can face lawsuits and penalties. Liability concerns require high accuracy in the review process, and the risks of an LLM hallucinating or generating incorrect solutions are too high for something as critical as accessibility compliance.

This is why reliance on LLMs in accessibility code review could result in serious issues. These models are not foolproof and are prone to generating non-functional or incomplete solutions, which can have severe consequences in this domain. Human experts are still necessary to ensure rigorous and reliable code reviews for accessibility.

Root Cause Analysis and Accessibility: The Problem of Depth

Accessibility issues are often deeply rooted in systemic problems within codebases—such as misaligned priorities in UX design, a lack of semantic HTML, or poor user interface practices. Just like in root cause analysis (RCA), finding accessibility flaws requires more than identifying surface-level symptoms; it involves diving deep into the structural causes behind these flaws. LLMs may miss these deeper issues, as they tend to provide superficial solutions that fail to address the underlying complexities.
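The gap between surface symptoms and root causes is easy to illustrate. The sketch below (hypothetical helper names, stdlib html.parser) implements a typical automated rule: flag img elements with no alt attribute at all. It intentionally allows alt="", which marks a decorative image:

```python
from html.parser import HTMLParser


class AltTextScanner(HTMLParser):
    """Flags <img> elements that have no alt attribute -- the
    surface-level symptom an automated rule can detect."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            # alt="" is allowed: it deliberately marks decorative images.
            if "alt" not in attrs:
                self.missing.append(attrs.get("src", "<no src>"))


def images_missing_alt(html: str) -> list:
    scanner = AltTextScanner()
    scanner.feed(html)
    return scanner.missing
```

Note what this rule cannot see: an image marked up as alt="chart.png" passes the check while telling a screen-reader user nothing about the data in the chart. The symptom (missing attribute) is mechanical; the root cause (no one wrote a meaningful description) requires human judgment about the content itself.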

Experts in accessibility are trained to think about the entire user experience in a way that LLMs simply can’t replicate. They assess how different elements interact with assistive technologies and whether the end result serves a broad range of users. This level of thought and care requires domain-specific knowledge that LLMs, no matter how well-trained, cannot achieve.

Automation Surprise: The Pitfalls of Over-Reliance

The automation of tasks such as code review can lead to “automation surprise”—where an automated system behaves unpredictably, leaving users confused or unprepared. In the case of accessibility code review, a misstep by an LLM could have unforeseen consequences. Given that LLMs are prone to hallucinations and can produce incorrect outputs confidently, users might place too much trust in their results without realizing the pitfalls until it’s too late.

In scenarios where an LLM incorrectly flags an accessibility issue or, worse, misses a crucial one, the results could lead to compliance failures that are costly to fix later. This unpredictability makes LLMs unsuitable for something as critical as accessibility code review, where correctness and reliability are paramount.

The Need for Human Expertise in Accessibility Code Review

While LLMs have proven to be powerful tools in many areas, they will never be able to replace human experts when it comes to accessibility code reviews. The need for precision, the legal and liability risks, and the depth of understanding required for proper accessibility compliance simply cannot be met by models that predict tokens rather than understanding the nuanced needs of users with disabilities.

In the world of accessibility, where every detail matters and the consequences of failure are significant, there is no substitute for expert human review. LLMs may offer valuable assistance in automating certain aspects of the development process, but for accessibility code reviews, they remain an inadequate solution.

Additional Reading

This post was largely inspired by murat’s Looming Liability Machines (LLMs) post, along with comments in these Hacker News threads: Killer Multimodal LLMs, Graph Language Models, and Ilya Sutskever’s SSI Inc raises $1B.

What do you think?

We’re curious to hear your comments!
