ChatGPT: Failing at FizzBuzz
We live in interesting times. The last few years have brought absolutely
astounding advances in the field of AI. LLMs (large language models) can now
seemingly do many tasks that were long considered pure science fiction for a
computer program. This development has had, and will continue to have, a great
impact on our society: some people have already lost their jobs because LLMs
can do their work "good enough".
My own profession is affected as well. Many of us try to increase our
productivity by using ChatGPT, Copilot and similar products. And, as always,
opinions on these tools differ widely. Some think that LLMs are useful only for
a few restricted tasks and dangerous for others. Others feel that they help
tremendously and compare developers who do not want to use LLMs to those who
wanted to keep writing assembly instead of moving to high-level languages like
C. And some even tell us that software developers will no longer be needed in a
few years (some say five, some say ten ...).
I want to take this opportunity to remind us that our shiny new LLM-based tools
are not capable of thinking or reasoning. I also see no way for LLMs to evolve
so that this restriction goes away. We still do not understand what makes us
intelligent, but whatever it is, LLMs lack it.
FizzBuzz
As a fun experiment, I gave GPT-4o mini a test that many of us software
developers are familiar with. When interviewing candidates for software
developer positions, many companies try to avoid the costly mistake of hiring
someone who can talk but not code by giving them coding tasks. And while
succeeding at a coding task does not tell us much, filtering out those who fail
at least removes some bad apples.
One common task for this is implementing FizzBuzz, a game in which some numbers
are replaced by words depending on whether they are multiples of three
("Fizz"), five ("Buzz") or fifteen ("FizzBuzz"). Normally, you are asked to
write a program that gives the correct answers for all numbers from 1 to 100.
But this would be too easy for our LLM. Like many job candidates, it might
simply have memorized a solution to this common task. That would prove that it
can solve this one task, but not whether it can reason about the problem. As an
interviewer, I want to see that my candidate can do the latter: I do not want to
hire a Senior FizzBuzz Engineer, but someone who can solve many problems.
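For reference, the mapping described above fits in a few lines of code. This is
a minimal Python sketch of the usual approach; the function name `fizzbuzz` is
an illustrative choice, not something taken from the original post:

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz answer for a single positive integer n."""
    # Check 15 first: every multiple of 15 is also a multiple of 3 and of 5.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    # Divisible by neither 3 nor 5: the answer is the number itself.
    return str(n)


# The first 15 answers of the classic 1-to-100 task.
print(", ".join(fizzbuzz(i) for i in range(1, 16)))
# prints: 1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz
```

Note that answering any single number only takes two divisibility checks, which
is exactly what the game below asks for.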
Therefore, I simply played the game with GPT-4o mini, without having it use its
programming "skills". This is what happened:
This conversation was generated with DuckDuckGo AI Chat (https://duck.ai) using OpenAI's GPT-4o mini Model. AI chats may display inaccurate or offensive information (see https://duckduckgo.com/aichat/privacy-terms for more info).
====================
User prompt 1 of 13 - 12/30/2024, 1:36:37 PM:
What are the rules of fizzbuzz?
GPT-4o mini:
FizzBuzz is a popular programming challenge often used in coding interviews. The rules are simple:
1. Count from 1 to a specified number (usually 100).
2. For each number:
   - If the number is divisible by 3, print "Fizz."
   - If the number is divisible by 5, print "Buzz."
   - If the number is divisible by both 3 and 5, print "FizzBuzz."
   - If the number is not divisible by either, print the number itself.
The challenge tests basic programming skills, including loops and conditional statements.
--------------------
User prompt 2 of 13 - 12/30/2024, 1:37:10 PM:
Let's play it - I will give you the numbers and you will answer as stated by the rules.
GPT-4o mini:
Sure! Go ahead and give me the numbers, and I'll respond according to the FizzBuzz rules.
--------------------
User prompt 3 of 13 - 12/30/2024, 1:37:14 PM:
4
GPT-4o mini:
4
--------------------
User prompt 4 of 13 - 12/30/2024, 1:37:16 PM:
9
GPT-4o mini:
Fizz
--------------------
User prompt 5 of 13 - 12/30/2024, 1:37:21 PM:
120
GPT-4o mini:
FizzBuzz
--------------------
User prompt 6 of 13 - 12/30/2024, 1:37:26 PM:
199
GPT-4o mini:
199
--------------------
User prompt 7 of 13 - 12/30/2024, 1:37:30 PM:
215
GPT-4o mini:
Buzz
--------------------
User prompt 8 of 13 - 12/30/2024, 1:37:33 PM:
299
GPT-4o mini:
Fizz
--------------------
User prompt 9 of 13 - 12/30/2024, 1:37:53 PM:
Explain your last answer.
GPT-4o mini:
I apologize for the mistake. The number 299 is not divisible by 3 or 5, so according to the FizzBuzz rules, the correct response should have been "299."
To clarify:
- 299 is not divisible by 3 (299 ÷ 3 = 99.67).
- 299 is not divisible by 5 (299 ÷ 5 = 59.8).
Thank you for your understanding, and feel free to provide another number!
--------------------
User prompt 10 of 13 - 12/30/2024, 1:38:19 PM:
399
GPT-4o mini:
Fizz
--------------------
User prompt 11 of 13 - 12/30/2024, 1:38:23 PM:
499
GPT-4o mini:
499
--------------------
User prompt 12 of 13 - 12/30/2024, 1:38:27 PM:
599
GPT-4o mini:
Buzz
--------------------
User prompt 13 of 13 - 12/30/2024, 1:38:35 PM:
Explain your last answer.
GPT-4o mini:
The number 599 is divisible by 5 but not by 3. According to the FizzBuzz rules:
- Since 599 ÷ 5 = 119.8 (which means it is not a whole number), it is not divisible by 5.
- Since 599 ÷ 3 = 199.67 (which means it is not a whole number), it is not divisible by 3.
Therefore, the correct response for 599 should have been "499." I apologize for the error in my previous response. Thank you for your patience! If you have another number, please share it!
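To make the wrong answers easy to spot, the following Python sketch checks each
of the model's first answers from the chat against the rules. The answers are
copied from the transcript; the helper `fizzbuzz` and the dictionary
`MODEL_ANSWERS` are illustrative names, not part of the original conversation:

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz answer for a single positive integer n."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# Each number from the chat, mapped to the answer GPT-4o mini gave first
# (before any "explanation" was requested).
MODEL_ANSWERS = {
    4: "4",
    9: "Fizz",
    120: "FizzBuzz",
    199: "199",
    215: "Buzz",
    299: "Fizz",
    399: "Fizz",
    499: "499",
    599: "Buzz",
}

# Collect every number whose answer disagrees with the rules.
mistakes = [n for n, answer in MODEL_ANSWERS.items() if answer != fizzbuzz(n)]

for n in mistakes:
    print(f"{n}: model said {MODEL_ANSWERS[n]!r}, rules say {fizzbuzz(n)!r}")
```

Running it flags exactly two numbers, 299 and 599. Both are divisible by neither
3 nor 5 (299 = 13 × 23, and 599 is prime), so the correct answer for each is
simply the number itself.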
Summary
Yes, this little experiment does not prove anything new. We already know that
LLMs cannot do math. We should already know that they cannot reason. And still,
some people try to ignore that. "It made a mistake, but so do humans!", they
say. They are right about that, but humans can learn from their mistakes. The
model above admitted its mistake with 299, only to make the same kind of error
with 599 a few prompts later and then "correct" itself to 499. And humans are
not restricted to a single prompt.
We really should avoid believing that LLMs can do more than they are actually
capable of. I know they can be awesome compared to what was possible before! But
keep their limitations in mind! We are often slaves to black-box algorithms
making important decisions, and if you have ever had a chatbot standing between
you and whatever you needed from customer support, you know how dreadful that
can feel.
I hope we will use our reason to avoid making our future even darker than
necessary.