If AI is going to take over the world, why can't it solve the Spelling Bee?

Billions of dollars in investment have met their match with a simple word game.

Pranav Dixit · Senior Editor

Updated 13 June 2024 at 4:19 pm·8-min read

My task for our AI overlords was simple: help me crack the New York Times Spelling Bee.

I had spent a large chunk of a Saturday evening trying to shape the letters G, Y, A, L, P, O and N into as many words as possible. But three hours, 141 points and 37 words — including “nonapology”, “lagoon” and “analogy” — later, I had hit a wall. A few more words was all I needed to propel myself into Spelling Bee’s "genius" echelon, the title reserved for those who unscramble 70 percent of all possible words using the given letters, and the point at which the puzzle considers itself, effectively, solved. (100 percent solvers are bestowed the rank of "queen bee," but even I have limits.)

My human mind was clearly struggling, but this task seemed like child’s play for AI, so I fired up ChatGPT, told it I was trying to win the Spelling Bee, gave it my letters and laid out the rules. “Give me a list of every word that can be formed with them,” I typed. “You’re allowed to repeat letters, but each word must contain 'N'.”

When I hit enter, this is what ChatGPT spat back:

Here is a list of words formed with the letters "G Y A L P O N" that must contain the letter "N":
GLNPAYO
YPNL
PGNOYL
ONAPYLG
LGNAPYO
ANLY
NAYGOPL
GYPON
AOGPLNY
PNAOGLY
This is just a sample of the list due to the extensive number of permutations possible.

Not a great start.

“The words need to be in the dictionary,” I typed back. “Sure,” came the response. “I’ll provide a list of valid English words based on the given letters.” ANGLY, GALON, LANG.

Clearly, this wasn’t working, so I turned to Microsoft’s Copilot (YANG, PLAYING, PLANNING, ANNOYINGLY), Google’s Gemini (GAPON, GON, GIAN), and Anthropic’s Claude (MANGO, ONGOING, LAWN17.LAY). Meta AI helpfully told me that it made sure to only include words that are recognized by dictionaries in a list that contained NALYP and NAGY, while Perplexity — a chatbot with ambitions of killing Google Search — simply wrote GAL hundreds of times before freezing abruptly.

Perplexity, a chatbot with ambitions of killing Google Search, went to pieces when asked to form words from a set of letters. (Screenshot by Pranav Dixit / Engadget)

AI can now create images, video and audio as fast as you can type in descriptions of what you want. It can write poetry, essays and term papers. It can also be a pale imitation of your girlfriend, your therapist and your personal assistant. And lots of people think it’s poised to automate humans out of jobs and transform the world in ways we can scarcely begin to imagine. So why does it suck so hard at solving a simple word puzzle?

The answer lies in how large language models, the underlying technology that powers our modern AI craze, function. Computer programming is traditionally logical and rules-based; you type out commands that a computer follows according to a set of instructions, and it provides a valid output. But machine learning, of which generative AI is a subset, is different.

“It’s purely statistical,” Noah Giansiracusa, a professor of mathematical and data science at Bentley University, told me. “It’s really about extracting patterns from data and then pushing out new data that largely fits those patterns.”

OpenAI did not respond on record but a company spokesperson told me that this type of “feedback” helped OpenAI improve the model’s comprehension and responses to problems. "Things like word structures and anagrams aren't a common use case for Perplexity, so our model isn't optimized for it," company spokesperson Sara Platnick told me. "As a daily Wordle/Connections/Mini Crossword player, I'm excited to see how we do!" Microsoft and Meta declined to comment. Google and Anthropic did not respond by publication time.

At the heart of large language models are “transformers,” a technical breakthrough made by researchers at Google in 2017. Once you type in a prompt, a large language model breaks down words or fractions of those words into mathematical units called “tokens.” Transformers are capable of analyzing each token in the context of the larger dataset that a model is trained on to see how they’re connected to each other. Once a transformer understands these relationships, it is able to respond to your prompt by guessing the next likely token in a sequence. The Financial Times has a terrific animated explainer that breaks this all down if you’re interested.
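To make "guessing the next likely token" concrete, here is a deliberately tiny sketch in Python. It is not a transformer and not how any of the chatbots above are built: it only counts which token tends to follow which in a made-up sentence, then picks the most frequent follower. The corpus, the whitespace tokenization and the function names are all assumptions for illustration; real models split text into subword tokens and learn far richer statistics.

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Count, for every token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never followed."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus; splitting on spaces stands in for real tokenization.
corpus = "the bee likes the flower and the bee likes honey".split()
model = train_bigrams(corpus)

print(predict_next(model, "the"))    # "bee" follows "the" twice, "flower" once
print(predict_next(model, "honey"))  # None: nothing ever followed "honey"
```

Nothing in this process checks a rule such as "the word must contain N". The output is whatever pattern was most common in the data, which is the behavior Giansiracusa describes.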

I mistyped "sure", but Meta AI thought I was suggesting it as a word and told me I was right. (Screenshot by Pranav Dixit / Engadget)

I thought I was giving the chatbots precise instructions to generate my Spelling Bee words, but all they were doing was converting my words to tokens and using transformers to spit back plausible responses. “It’s not the same as computer programming or typing a command into a DOS prompt,” said Giansiracusa. “Your words got translated to numbers and they were then processed statistically.” It seems like a purely logic-based query was the exact worst application for AI’s skills – akin to trying to turn a screw with a resource-intensive hammer.
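For contrast, the rules-based approach is short enough to sketch in a few lines of Python. This is a minimal illustration, not the puzzle's actual implementation: the word list is a hypothetical stand-in for a real dictionary file, and the four-letter minimum is an assumption about the game's rules that I did not spell out to the chatbots.

```python
# The seven letters from my puzzle; every word must contain "n".
LETTERS = set("gyalpon")
REQUIRED = "n"
MIN_LENGTH = 4  # assumption: the game's minimum word length

# Hypothetical stand-in for a real dictionary file.
TINY_DICTIONARY = {
    "nonapology", "lagoon", "analogy",
    "planning", "mango", "ongoing", "gallop",
}

def fits_letters(word):
    """True if the word is long enough, contains the required letter
    and uses only the allowed letters (repeats are fine)."""
    word = word.lower()
    return (
        len(word) >= MIN_LENGTH
        and REQUIRED in word
        and set(word) <= LETTERS
    )

def solve(dictionary):
    """Return every dictionary word that satisfies the letter rules."""
    return sorted(word for word in dictionary if fits_letters(word))

print(solve(TINY_DICTIONARY))  # ['analogy', 'lagoon', 'nonapology']
```

A string like GALON would pass the letter check on its own, so it is the dictionary lookup that rules it out. Both steps are exact tests rather than statistical guesses.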

The success of an AI model also depends on the data it’s trained on. This is why AI companies are feverishly striking deals with news publishers right now: the fresher the training data, the better the responses. Generative AI, for instance, sucks at suggesting chess moves, but is at least marginally better at that task than at solving word puzzles. Giansiracusa points out that the glut of chess games available on the internet is almost certainly included in the training data for existing AI models. “I would suspect that there just are not enough annotated Spelling Bee games online for AI to train on as there are chess games,” he said.

“If your chatbot seems more confused by a word game than a cat with a Rubik’s cube, that’s because it wasn’t especially trained to play complex word games,” said Sandi Besen, an artificial intelligence researcher at Neudesic, an AI company owned by IBM. “Word games have specific rules and constraints that a model would struggle to abide by unless specifically instructed to during training, fine tuning or prompting.”

None of this has stopped the world’s leading AI companies from marketing the technology as a panacea, often grossly exaggerating claims about its capabilities. In April, both OpenAI and Meta boasted that their new AI models would be capable of “reasoning” and “planning.” In an interview, OpenAI’s chief operating officer Brad Lightcap told the Financial Times that the next generation of GPT, the AI model that powers ChatGPT, would show progress on solving “hard problems” such as reasoning. Joelle Pineau, Meta’s vice president of AI research, told the publication that the company was “hard at work in figuring out how to get these models not just to talk, but actually to reason, to plan…to have memory.”

My repeated attempts to get GPT-4o and Llama 3 to crack the Spelling Bee failed spectacularly. When I told ChatGPT that GALON, LANG and ANGLY weren’t in the dictionary, the chatbot said that it agreed with me and suggested GALVANOPY instead. When I mistyped the word “sure” as “sur” in my response to Meta AI’s offer to come up with more words, the chatbot told me that “sur” was, indeed, another word that can be formed with the letters G, Y, A, L, P, O and N.

Clearly, we’re still a long way away from Artificial General Intelligence, the nebulous concept describing the moment when machines are capable of doing most tasks as well as or better than human beings. Some experts, like Yann LeCun, Meta’s chief AI scientist, have been outspoken about the limitations of large language models, claiming that they will never reach human-level intelligence since they don’t really use logic. At an event in London last year, LeCun said that the current generation of AI models “just do not understand how the world works. They’re not capable of planning. They’re not capable of real reasoning," he said. "We do not have completely autonomous, self-driving cars that can train themselves to drive in about 20 hours of practice, something a 17-year-old can do.”

Giansiracusa, however, strikes a more cautious tone. “We don’t really know how humans reason, right? We don’t know what intelligence actually is. I don’t know if my brain is just a big statistical calculator, kind of like a more efficient version of a large language model.”

Perhaps the key to living with generative AI without succumbing to either hype or anxiety is to simply understand its inherent limitations. “These tools are not actually designed for a lot of things that people are using them for,” said Chirag Shah, a professor of AI and machine learning at the University of Washington. He co-wrote a high-profile research paper in 2022 critiquing the use of large language models in search engines. Tech companies, thinks Shah, could do a much better job of being transparent about what AI can and can’t do before foisting it on us. That ship may have already sailed, however. Over the last few months, the world’s largest tech companies – Microsoft, Meta, Samsung, Apple, and Google – have made declarations to tightly weave AI into their products, services and operating systems.

“The bots suck because they weren’t designed for this,” Shah said of my word game conundrum. Whether they suck at all the other problems tech companies are throwing at them remains to be seen.

How else have AI chatbots failed you? Email me at pranav.dixit@engadget.com and let me know!

Update, June 13 2024, 4:19 PM ET: This story has been updated to include a statement from Perplexity.
