ChatGPT: Failing at FizzBuzz

We live in interesting times. The last few years have brought absolutely astounding advances in the field of AI. LLMs (large language models) can now seemingly do many tasks that were thought to be utter science fiction for a computer program. This development has had, and will continue to have, a great impact on our society: some people have already lost their jobs because an LLM can do them "good enough".

My own profession is affected as well. Many of us try to increase our productivity with ChatGPT, Copilot, and similar products. And, as always, we see these tools in many different ways. Some think that LLMs are useful only for a few restricted tasks and dangerous for others. Others feel that they help tremendously and compare developers who do not want to use LLMs to those who wanted to keep writing assembly instead of high-level languages like C. And some even tell us that software developers will not be needed anymore in a few years (some say five, some say ten ...).

I want to take this opportunity to remind us that our shiny new LLM-based tools are not capable of thinking or reasoning. I also see no way for LLMs to evolve in a way that removes this limitation. While we still do not understand what makes us intelligent, LLMs lack whatever it is.

FizzBuzz

As a fun experiment, I gave GPT-4o mini a test that many of us software developers are familiar with. When interviewing candidates for a software development job, many companies give them coding tasks to avoid the costly mistake of hiring someone who can talk but not code. And while succeeding at a coding task does not tell us a lot, filtering out those who fail at least removes some bad apples.

One common task for this is implementing FizzBuzz, a game where numbers are replaced by words depending on whether they are multiples of three, five, or fifteen. Normally, you are asked to write a program that gives the correct answers for all numbers from 1 to 100. But this would be too easy for our LLM. Like many job candidates, it might have memorized a solution for this common task, which proves that it can solve this one task but does not show whether it can reason about the problem. As an interviewer, I want to see that my candidate can do the latter, since I do not want to hire a Senior FizzBuzz Engineer, but someone who can solve many problems.
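To make the rules concrete, here is a minimal reference implementation, sketched in Python (any language would do, and the function name fizzbuzz is just my own choice):

    def fizzbuzz(n):
        # Multiples of 15 must be checked first, because they are
        # also multiples of 3 and 5.
        if n % 15 == 0:
            return "FizzBuzz"
        if n % 3 == 0:
            return "Fizz"
        if n % 5 == 0:
            return "Buzz"
        return str(n)

    # The classic interview task: print the answers for 1 to 100.
    for i in range(1, 101):
        print(fizzbuzz(i))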

Therefore, I simply played the game with GPT-4o mini without having it use its programming "skills". This is what happened:

This conversation was generated with DuckDuckGo AI Chat
(https://duck.ai) using OpenAI's GPT-4o mini Model. AI chats may
display inaccurate or offensive information (see
https://duckduckgo.com/aichat/privacy-terms for more info).

====================

User prompt 1 of 13 - 12/30/2024, 1:36:37 PM:
What are the rules of fizzbuzz?

GPT-4o mini:
FizzBuzz is a popular programming challenge often used in coding
interviews. The rules are simple:

1. Count from 1 to a specified number (usually 100).
2. For each number:
   - If the number is divisible by 3, print "Fizz."
   - If the number is divisible by 5, print "Buzz."
   - If the number is divisible by both 3 and 5, print "FizzBuzz."
   - If the number is not divisible by either, print the number itself.

The challenge tests basic programming skills, including loops and
conditional statements.

--------------------

User prompt 2 of 13 - 12/30/2024, 1:37:10 PM:
Let's play it - I will give you the numbers and you will answer as
stated by the rules.

GPT-4o mini:
Sure! Go ahead and give me the numbers, and I'll respond according to
the FizzBuzz rules.

--------------------

User prompt 3 of 13 - 12/30/2024, 1:37:14 PM:
4

GPT-4o mini:
4

--------------------

User prompt 4 of 13 - 12/30/2024, 1:37:16 PM:
9

GPT-4o mini:
Fizz

--------------------

User prompt 5 of 13 - 12/30/2024, 1:37:21 PM:
120

GPT-4o mini:
FizzBuzz

--------------------

User prompt 6 of 13 - 12/30/2024, 1:37:26 PM:
199

GPT-4o mini:
199

--------------------

User prompt 7 of 13 - 12/30/2024, 1:37:30 PM:
215

GPT-4o mini:
Buzz

--------------------

User prompt 8 of 13 - 12/30/2024, 1:37:33 PM:
299

GPT-4o mini:
Fizz

--------------------

User prompt 9 of 13 - 12/30/2024, 1:37:53 PM:
Explain your last answer.

GPT-4o mini:
I apologize for the mistake. The number 299 is not divisible by 3 or
5, so according to the FizzBuzz rules, the correct response should
have been "299."

To clarify:
- 299 is not divisible by 3 (299 ÷ 3 = 99.67).
- 299 is not divisible by 5 (299 ÷ 5 = 59.8).

Thank you for your understanding, and feel free to provide another
number!

--------------------

User prompt 10 of 13 - 12/30/2024, 1:38:19 PM:
399

GPT-4o mini:
Fizz

--------------------

User prompt 11 of 13 - 12/30/2024, 1:38:23 PM:
499

GPT-4o mini:
499

--------------------

User prompt 12 of 13 - 12/30/2024, 1:38:27 PM:
599

GPT-4o mini:
Buzz

--------------------

User prompt 13 of 13 - 12/30/2024, 1:38:35 PM:
Explain your last answer.

GPT-4o mini:
The number 599 is divisible by 5 but not by 3. According to the
FizzBuzz rules:

- Since 599 ÷ 5 = 119.8 (which means it is not a whole number), it is
  not divisible by 5.
- Since 599 ÷ 3 = 199.67 (which means it is not a whole number), it is
  not divisible by 3.

Therefore, the correct response for 599 should have been "499." I
apologize for the error in my previous response. Thank you for your
patience! If you have another number, please share it!
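For completeness, feeding the same numbers through the little Python sketch from above (reusing the fizzbuzz function defined there) shows where the model went wrong:

    for n in (4, 9, 120, 199, 215, 299, 399, 499, 599):
        print(n, "->", fizzbuzz(n))

    # 4 -> 4, 9 -> Fizz, 120 -> FizzBuzz, 199 -> 199, 215 -> Buzz,
    # 299 -> 299 (the model said "Fizz"), 399 -> Fizz, 499 -> 499,
    # 599 -> 599 (the model said "Buzz")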

Summary

Yes, this little experiment does not prove anything new. We already know that LLMs cannot do math. We should already know that they cannot reason. And still, some try to ignore that. "It made a mistake, but so do humans!", they say. And they are correct about that, but: humans can learn from their mistakes. And humans are not restricted to a single prompt.

We really should avoid believing that LLMs can do more than they are actually capable of. I know they can be awesome compared to what was possible before! But keep their limitations in mind! We are often slaves to black-box algorithms making important decisions, and if you have ever had a chat bot standing between you and whatever you needed from customer support, you know how dreadful that can feel.

I hope we will use our own capacity for reason to avoid making our future even darker than necessary.