Ah, you used logic. That’s the issue. They don’t do that.
Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it’s obviously not going to be good at it, at least not without scaffolding.
is like using a power tool as a table leg.
Then again, our corporate lords and masters are trying to replace all manner of skilled workers with those same LLM “AI” tools.
And clearly that will backfire on them and they’ll eventually scramble to find people with the needed skills, but in the meantime tons of people will have lost their source of income.
If you believe LLMs are not good at anything then there should be relatively little to worry about in the long-term, but I am more concerned.
It’s not obvious to me that it will backfire for them, because I believe LLMs are good at some things (that is, when they are used correctly, for the correct tasks). Currently they’re being applied to far more use cases than they are likely to be good at – either because they’re overhyped or because our corporate lords and masters are just experimenting to find out what they’re good at and what they’re not. Some of these cases will be like chess, but others will be like code*.
(* not saying LLMs are good at code in general, but for some coding applications I believe they are vastly more efficient than humans, even if a human expert can currently write higher-quality less-buggy code.)
I believe LLMs are good at some things
The problem is that they’re being used for all the things, including a large number of tasks that they are not well suited to.
Yeah, we agree on this point. In the short term it’s a disaster. In the long term, assuming AI’s capabilities don’t continue to improve at the rate they have been, our corporate overlords will only replace people where it’s actually worth it to them to do so.
Can I fistfight ChatGPT next? I bet I could kick its ass, too :p
The Atari chess program can play chess better than the Boeing 747 too. And better than the North Pole. Amazing!
Are either of those marketed as powerful AI?
Neither of those things are marketed as being artificially intelligent.
Marketers aren’t intelligent either, so I see no reason to listen to them.
You’re not going to slimeball investors out of three hundred billion dollars with that attitude, mister.
There was a chess game for the Atari 2600? :O
I wanna see them W I D E pieces.
Here you go (online emulator): https://www.retrogames.cz/play_716-Atari2600.php
WTF? I played that just long enough for my queen to capture their queen, and it turned my queen into a rook?
Is that even a legit rule in any variation of chess rules?
I wasn’t aware of that either, now I’m kinda curious to try to find it in my 512 Atari 2600 ROMs archive…
This made my day
Get your booty on the floor tonight.
I’m often impressed at how good ChatGPT is at generating text, but I’ll admit it’s hilariously terrible at chess. It loves to manifest pieces out of thin air, or make absurd illegal moves, like jumping its king halfway across the board and claiming checkmate.
It can be bad at the very thing it’s designed to do. It can repeat phrases often, something that isn’t great for writing. But why wouldn’t it? It’s all about probability, so commonly said things will pop up more often unless you adjust the variables that determine the randomness.
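That “variable that determines the randomness” is usually called temperature. A minimal sketch of the idea (the logit values here are made up for illustration, not from any real model):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores (logits) to probabilities. Lower temperature
    sharpens the distribution toward the most likely token; higher
    temperature flattens it, making output more varied."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)

# At low temperature the top token dominates (repetitive phrasing);
# at high temperature the choices flatten out.
print(cold[0] > hot[0])  # prints: True
```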
ChatGPT is playing Anarchy Chess
Yeah! I’ve loved watching GothamChess’s videos on these. They’ve always been good for a laugh.
Next, pit ChatGPT against 1K ZX Chess in a ZX81.
LLMs are not built for logic.
And yet everybody is selling them to write code.
Last time I checked, coding required logic.
To be fair, a decent chunk of coding is stupid boilerplate/minutia that varies environment to environment, language to language, library to library.
So LLMs can do some code completion, filling out a bunch of boilerplate that is blatantly obvious, generating the redundant text mandated by certain patterns, and keeping straight details between languages like “does this language want join as a method on a list with a string argument, or vice versa?”
Problem is, this can sometimes be more trouble than it’s worth, as miscompletions are annoying.
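The `join` example really does flip between languages, which is exactly the kind of trivia a completion tool can keep straight. In Python the method lives on the separator string:

```python
parts = ["a", "b", "c"]

# Python: join is a method on the separator string, taking the list.
joined = ", ".join(parts)
print(joined)  # prints: a, b, c

# JavaScript flips it: the method lives on the array and takes the
# separator as its argument:  ["a", "b", "c"].join(", ")
```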
Fair point.
I liked the “upgraded autocompletion”, you know, a completion based on the context, just before the time they pushed it too far with 20 lines of nonsense…
Now I’m thinking of a way of doing the thing, and then I receive a 20-line suggestion.
So I check if that makes sense, losing my momentum, only to realize the suggestion is calling shit that doesn’t exist…
Screw that.
The amount of garbage it spits out in autocomplete is distracting. If it’s constantly making me 5–10% less productive the many times it’s wrong, it needs to save me a lot of time when it is right, and generally, I haven’t found that it does.
Yesterday I tried to prompt it to change around 20 call sites for a function whose signature I had changed. Easy, boring, and repetitive: something a junior could easily do. And all the models were absolutely clueless about it (using Copilot).
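For scale: a mechanical rewrite like that is often doable with a plain script and no model at all. A rough sketch, assuming a hypothetical `fetch(url)` whose signature gained a `timeout` parameter (the names and regex are illustrative; a real refactor should use the language's syntax tooling rather than regex):

```python
import re

# Hypothetical: fetch(url) became fetch(url, timeout), so every
# existing call site needs a default timeout appended.
source = "data = fetch(url)\nother = fetch(base + path)\n"

# Capture the current argument list and re-emit it with the new arg.
updated = re.sub(r"fetch\(([^)]*)\)", r"fetch(\1, timeout=30)", source)

print(updated)
# prints:
# data = fetch(url, timeout=30)
# other = fetch(base + path, timeout=30)
```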
I suppose it’s an interesting experiment, but it’s not that surprising that a word prediction machine can’t play chess.
Because people want to feel superior because they don’t know how to use a ChatBot

can count the number of “r”s in the word “strawberry”, lol

Yeah, just because I can’t count the number of r’s in the word strawberry doesn’t mean I shouldn’t be put in charge of the US nuclear arsenal!
That is more a failure of the person who made that decision than a failing of ChatBots, lol
Agreed, which is why it’s important to have articles out in the wild that show the shortcomings of AI. If all people read is all the positive crap coming out of companies like OpenAI then they will make stupid decisions.
Anyone who puts a chatbot anywhere is definitely a failure, yeah.
A strange game. How about a nice game of Global Thermonuclear War?
No thank you. The only winning move is not to play
Lmao! 🤣 that made me spit!!
Frak off, toaster
I’ve heard the only way to win is to lock down your shelter and strike first.
JOSHUA
They used ChatGPT 4o, instead of using o1 or o3.
Obviously it was going to fail.
Other studies (not all chess based or against this old chess AI) show similar lackluster results when using reasoning models.
Edit: When comparing reasoning models to existing algorithmic solutions.
this is because an LLM is not made for playing chess
2025 Mazda MX-5 Miata ‘got absolutely wrecked’ by Inflatable Boat in beginner’s boat racing match — Mazda’s newest model bamboozled by 1930s technology.
If you don’t play chess, the Atari is probably going to beat you as well.
LLMs are only good at things to the extent that they have been well-trained in the relevant areas. Not just learning to predict text string sequences, but reinforcement learning after that, where a human or some other agent says “this answer is better than that one” enough times in enough of the right contexts. It mimics the way humans learn, which is through repeated and diverse exposure.
If they set up a system to train it against some chess program, or (much simpler) just gave it a chess engine via a tool call, it would do much better. Tool calling already exists and would be by far the easiest way.
It could also be instructed to write a chess solver program and then run it, at which point it would be on par with the Atari, but it wouldn’t compete well with a serious chess solver.
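The tool-call route is easy to picture: the model never predicts moves as text, it just delegates the position to an engine and wraps the result in prose. A minimal Python sketch with the engine stubbed out (function names and the canned reply are illustrative; a real setup would call something like Stockfish over UCI):

```python
def chess_engine_tool(fen: str) -> str:
    """Stub standing in for a real engine tool. Given a position in
    FEN notation, return the engine's chosen move. Here only the
    starting position gets a canned reply."""
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    canned = {start: "e2e4"}
    return canned.get(fen, "resign")

def answer_with_tool(fen: str) -> str:
    """Instead of letting the LLM invent a move (and hallucinate
    pieces), route the position through the tool and only let the
    model phrase the answer."""
    move = chess_engine_tool(fen)
    return f"My move is {move}."

print(answer_with_tool(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
))  # prints: My move is e2e4.
```

With this pattern the legality and strength of the moves come entirely from the engine; the LLM contributes nothing but the wrapper text, which is the part it's actually good at.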