From GUIs to Genies: The UX Challenge of Chat-Based AI

“Writing is easy. All you do is stare at a blank sheet of paper until drops of blood form on your forehead.” -Gene Fowler

“I’m not sure what to do with my hands.” -Ricky Bobby, Talladega Nights

Since the release of ChatGPT in November 2022, public awareness, capability, and hype around Large Language Models (LLMs) have grown exponentially. Google rushed to catch up, and a marketing race began. Early adopters have witnessed a daily flood of new capabilities, tools, and assistants. Media predictions oscillate between AI ushering in a utopia of universal basic income and AI destroying humanity by turning us into paperclips, all while claiming mass adoption is perpetually a few quarters away.

Yet in reality, adoption has been sluggish. I believe LLMs suffer from infinite product scope creep and fundamental UI challenges that ask us to make a great leap: from being tool users to becoming shamans who speak reality into being. While LLMs represent a powerful new paradigm, their chat interfaces create significant barriers by breaking fundamental UX patterns most users rely on. Nevertheless, the benefits and possibilities for improved accessibility shouldn’t be overlooked.

Natural Language Is Not the Natural Way to Use Tools

Most modern computers and smartphones rely on Graphical User Interfaces (GUIs) whose familiar models trace back to analog tools and spaces: the darkroom, light table, typewriter, drafting table, mixing desk, and cutting room floor. These physical antecedents supply mental models that ease the leap from the physical world to virtual abstractions. The affordances of a GUI are constrained and its interactions predictable: a well-designed button has a clear purpose, and clicking it produces the same result every time.

The LLM chat window initially resembles a command line: powerful for those who know the patterns and language, but a cold start for those facing it without training or guidance. Unlike the command line, however, the LLM prompt is entirely open-ended, with no set commands, clear instructions, signifiers, or constraints. What first-time users don’t know is that the system rewards verbose “natural language” and detailed instructions.

Some argue that nothing is more natural than the chat window—it’s just like texting with friends. But this isn’t how we naturally interact with tools. Unless you’ve been trained as a shaman, speaking things into existence doesn’t come naturally.

“A user interface is like a joke. If you have to explain it, it’s not that good.” -Martin LeBlanc

The Everything Affordance Problem

The blank page, blinking cursor, and empty canvas have frozen creators in fear, leaving them uncertain where to start without prompts or inspiration. This metaphor applies perfectly to LLM chat interfaces. Whether it’s “Ask Gemini,” “How can Claude help you?” or “What can I help you with?” the effect is the same as the blank page. “I’m not sure, how can you help me?” “Ask you what?”

Using an LLM is like standing before a genie, trying to determine what you want and how to phrase the wish so that poor wording doesn’t backfire. No choices are presented, no dropdowns, no options; there is only the chat window, the “infinity affordance” before which most users feel overwhelmed or resort to asking for cat poems.

Staring at a command line stripped of familiar tools, heuristics, and patterns creates a high cognitive load as users struggle to determine their next action. It recalls a feeling familiar to many parents and managers: it’s often easier to do something yourself than to explain it out loud to someone else. And unlike a child or an employee, the AI’s growth is not something most users care about, so many abandon the interaction entirely.

Most of us don’t verbalize our work or play as we go about our day. Another benefit of GUIs is consistency: press a button, and the same action occurs each time. With LLMs, the same question can produce different outputs each time you ask. To achieve quality and consistency, users must learn effective prompting methods and patterns.

If I move a pencil across paper, I know what will happen based on physical experience. If I use a digital illustration tool’s pencil function, it’s programmed to provide consistent results. But when I ask an LLM to perform an action, the outcome isn’t consistent—it may be close to what I expected, or it may be a statistical anomaly, a “hallucination,” different each time I ask. Imagine clicking a pencil tool that sometimes draws a line as expected and other times produces rainbow squiggles. You probably wouldn’t return to such an unpredictable tool.
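To make the inconsistency concrete, here is a minimal sketch, assuming the OpenAI Python SDK, an API key in the environment, and a hypothetical model name. The same prompt sent twice can come back as different text; lowering the sampling temperature narrows the variance but still offers none of the guarantees a GUI pencil tool gives you.

# Minimal sketch of the consistency gap (assumes the OpenAI Python SDK).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def draw_a_line(temperature: float) -> str:
    # Ask for the same "pencil stroke" and see what comes back.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": "Describe a single straight pencil line."}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Default sampling: two calls with an identical prompt may return different text.
print(draw_a_line(temperature=1.0))
print(draw_a_line(temperature=1.0))

# Temperature 0 narrows the variance, but unlike a GUI pencil tool,
# it still does not guarantee the same stroke every time.
print(draw_a_line(temperature=0.0))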

I acknowledge that this statistical flexibility enables LLMs to generate human-like creative outputs in writing and images, but without control or a “hands-on” feeling, most first-time users struggle to see it as useful.

Looking Forward

Amazing LLM applications are being discovered daily, and I have no doubt this technology will bring changes even faster than the PC, internet, and smartphone revolutions before it. Yet currently, many applications focus on task automation or simulating lower-value human creativity.

AI and LLMs need products built with interfaces users understand. What form should those take? Can we translate unpredictable LLM outputs into structured UIs? Will we train AI to present real-time interfaces based on context but rooted in familiar design models?
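One possible answer is sketched below, under assumptions: the OpenAI Python SDK, a hypothetical model name, and made-up field names. The idea is to ask the model to reply in a small JSON shape the application already knows how to render as buttons, validate the result, and fall back to the plain chat window when the shape doesn’t hold.

# Sketch of one bridge from free-form LLM output to a structured UI.
# The SDK, model name, and field names are assumptions for illustration.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are a UI planner. Reply ONLY with JSON of the form "
    '{"title": str, "options": [{"label": str, "action": str}]} '
    "offering at most four next steps for the user's request."
)  # mentioning JSON in the prompt is expected when using json_object mode

def plan_ui(user_request: str) -> dict | None:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        response_format={"type": "json_object"},  # constrain the reply to JSON
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user_request},
        ],
    )
    try:
        plan = json.loads(response.choices[0].message.content)
        # Keep only the pieces the UI knows how to render: labeled actions.
        options = [o for o in plan.get("options", []) if "label" in o and "action" in o]
        return {"title": plan.get("title", ""), "options": options}
    except (json.JSONDecodeError, AttributeError, TypeError):
        return None  # fall back to the ordinary chat window

ui = plan_ui("I want to tidy up my vacation photos")
if ui:
    print(ui["title"])
    for option in ui["options"]:
        print(f"[ {option['label']} ]")  # render as buttons instead of a blank prompt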

One day, our children may use direct brain-to-device interfaces or subvocalized prompts to shape reality—or perhaps AGI will anticipate all our desires because humans aren’t that complicated after all. But for the next few years, I believe we need buttons, dropdowns, tooltips, and single-action affordances that unlock our creative potential.

At least until we’re turned into paperclips.