Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it’s obviously not going to be good at it, at least not without scaffolding.
is like using a power tool as a table leg.
Then again, our corporate lords and masters are trying to replace all manner of skilled workers with those same LLM “AI” tools.
And clearly that will backfire on them and they’ll eventually scramble to find people with the needed skills, but in the meantime tons of people will have lost their source of income.
If you believe LLMs are not good at anything then there should be relatively little to worry about in the long-term, but I am more concerned.
It’s not obvious to me that it will backfire for them, because I believe LLMs are good at some things (that is, when they are used correctly, for the correct tasks). Currently they’re being applied to far more use cases than they are likely to be good at, either because they’re overhyped or because our corporate lords and masters are just experimenting to find out what they’re good at and what they’re not. Some of these cases will be like chess, but others will be like code*.
(* not saying LLMs are good at code in general, but for some coding applications I believe they are vastly more efficient than humans, even if a human expert can currently write higher-quality less-buggy code.)
Ah, you used logic. That’s the issue. They don’t do that.
Can I fistfight ChatGPT next? I bet I could kick its ass, too :p
If you’re writing a novel simulation for a non-trivial system, it might be best to learn to code so you can identify any issues in the simulation later. It’s likely that LLMs do not have the information required to generate good code for this context.
You’re right. I’m not relying on this shit. It’s a tool. Fucking up the GUI is fine, but making any changes I don’t research to my simulator core could fuck up my whole project. It’s a tool that likes to cater to you, and you have to work around that; really, not too different from how much pressure you put on a grinder. You gotta learn how to work it. And your sentiment is correct: my lack of programming experience is a big hurdle I have to account for and build safeguards against. It would be a huge help if I started from the basics. But, I mean, I also can’t rub two sticks together to heat my home. Doesn’t mean I can’t use this tool to produce reliable results.
The tough guys and sigma males of yester-year used to say things like “If I were homeless, I would just bathe in the creek using the natural animal fats from the squirrel I caught for dinner as soap, win a new job by explaining my 21-days-in-7 workweek ethos, and buy a new home using my shares in my dad’s furniture warehouse as collateral against the loan. It’s not impossible to get back on your feet.”
But with the advent of AI, which, actually, is supposed to do things for you, it’s completely different now.
I also can’t rub two sticks together to heat my home.
Dude, that fucking sucks. What is wrong with you?
You’re so fucking silly. You gonna study cell theory to see how long you should keep vegetables in your fridge? Go home. Save science for people who understand things.
Save science for people who understand things.
Does this not strike you as the least bit ironic?
It’s not that hard to beat a dumb 6-year-old whose only purpose is to mine your privacy to sell you ads or product-place some shit for you in the future.
this is because an LLM is not made for playing chess
Can ChatGPT actually play chess now? Last I checked, it couldn’t remember more than 5 moves of history, so it couldn’t see the true board state and would make illegal moves, take its own pieces, materialize pieces out of thin air, etc.
There are custom GPTs which claim to play at a stockfish level or be literally stockfish under the hood (I assume the former is still the latter just not explicitly). Haven’t tested them, but if they work, I’d say yes. An LLM itself will never be able to play chess or do anything similar, unless they outsource that task to another tool that can. And there seem to be GPTs that do exactly that.
As for why we need ChatGPT then when the result comes from Stockfish anyway, it’s for the natural language prompts and responses.
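That division of labor (language model for the conversation, engine for the moves) is the standard tool-delegation pattern. Here is a minimal sketch of it; `run_engine` is a stub standing in for a real Stockfish process (a real one would speak UCI to an engine subprocess), and the prompt format is my own assumption.

```python
import re

def run_engine(fen: str) -> str:
    """Stub standing in for a real chess engine (e.g. Stockfish over UCI).
    A real implementation would send `position fen ...` / `go` to an
    engine subprocess and parse its `bestmove` reply."""
    return "e2e4"  # placeholder best move

def handle_prompt(prompt: str) -> str:
    """Toy dispatcher: if the prompt looks like a chess request, route it
    to the engine instead of letting the language model guess a move."""
    match = re.search(r"best move.*fen:\s*(\S+(?:\s\S+){5})",
                      prompt, re.IGNORECASE)
    if match:
        move = run_engine(match.group(1))
        return f"The engine suggests {move}."
    return "I'm a language model; I won't invent chess moves myself."

print(handle_prompt(
    "What is the best move? fen: "
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))
```

In that setup the LLM contributes only parsing and phrasing; the chess strength is entirely the engine’s, which is the point the reply below makes.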
It could always play it if you reminded it of the board state every move. Not well, but at least generally legally. And while I know elite players can play chess blind, the average person can’t, so it was always kind of harsh to hold it to that standard and criticise it for not being able to remember more than 5 moves when most people can’t do that themselves.
Besides that, it was never designed to play chess. It would be like insulting Watson, the Jeopardy bot, for losing against the Atari chess bot; that’s not what it was designed to do.
It can’t, but that didn’t stop a bunch of gushing articles a while back about how it had an Elo of 2400 and other such nonsense. Turns out you could get it to an Elo of 2400 under a very, very specific set of circumstances, which included correcting it every time it hallucinated pieces or attempted to make illegal moves.
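Those “very specific circumstances” amount to a human acting as a legality filter: reject the move, re-prompt, repeat. That loop is easy to sketch. Below, `fake model` output is just a list of candidate moves (an assumption standing in for real LLM calls), and legality is checked against a supplied set of legal moves rather than full chess rules.

```python
def legal_move_from(model_moves, legal_moves, max_retries=5):
    """Keep 're-prompting' the model until it emits a legal move,
    mirroring how testers corrected ChatGPT's hallucinated moves."""
    it = iter(model_moves)
    for _ in range(max_retries):
        move = next(it, None)
        if move is None:
            break
        if move in legal_moves:
            return move
        # In the real experiments this step was a human saying
        # "that move is illegal, try again".
    return None  # model never produced a legal move

# The stand-in model hallucinates twice before landing on a legal move.
print(legal_move_from(["Qxh9", "Nf9", "e4"], {"e4", "d4", "Nf3"}))  # e4
```

The measured Elo then belongs to the system of model-plus-human-filter, not to the model alone, which is the sleight of hand those articles glossed over.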
and still lose to stockfish even after conjuring 3 queens out of thin air lol
ChatGPT must adhere honorably to the rules that it’s making up on the spot. That’s Dallas
Isn’t the Atari just a game console, not a chess engine?
Like, Wikipedia doesn’t mention anything about the Atari 2600 having a built-in chess engine.
If they were willing to run a chess game on the Atari 2600, why did they not apply the same to ChatGPT? There are custom GPTs which claim to use a stockfish API or play at a similar level.
As it is, it’s just unfair. Neither platform is designed to deal with the task by itself, but one of them is given the necessary tooling and the other one isn’t. No matter what you think of ChatGPT, that’s not a fair comparison.
GPTs which claim to use a stockfish API
Then the actual chess isn’t the LLM. If the moves are coming from Stockfish, the LLM doesn’t add anything; Stockfish is doing everything.
The whole point of the marketing hype is that LLMs can do all kinds of stuff, doubling down on this with the branding of some approaches as “reasoning” models, which are roughly “similar to ‘pre-reasoning’, but forcing use of more tokens on disposable intermediate generation steps”. With this facet of LLM marketing, the promise is that the LLM can “reason” its way through a chess game without particular enablement. In practice, people trying to feed gobs of chess data to an LLM end up with an LLM that doesn’t even comply with the rules of the game, let alone provide reasonable competitive responses to an opponent.
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.
Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there’s no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.
LLMs on the other hand, are very good at producing clickbait articles with low information content.
In this case it’s not even bad prompts, it’s a problem domain ChatGPT wasn’t designed to be good at. It’s like saying modern medicine is clearly bullshit because a doctor loses a basketball game.
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn’t help.
This sort of gets to the heart of LLM-based “AI”. That one example to me really shows that there’s no actual reasoning happening inside. It’s producing answers that statistically look like answers that might be given based on that input.
For some things it even works. But calling this intelligence is dubious at best.
Because it doesn’t have any understanding of the rules of chess, or even an internal model of the game state. It just has the text of chess games in its training data and can reproduce the notation, but there’s nothing to prevent it from making illegal moves, trying to move or capture pieces that don’t exist, incorrectly declaring check/checkmate, or any number of nonsensical things.
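To make “internal model of the game state” concrete: even a toy board dictionary is enough to reject moving a piece that doesn’t exist, and a pure next-token predictor has no equivalent mechanism. This sketch checks only piece existence and ownership, not real chess legality; that simplification is mine, to keep it short.

```python
# Toy board state: square -> (color, piece). Nowhere near full chess
# rules; it only demonstrates the kind of state an LLM doesn't maintain.
board = {"e2": ("white", "pawn"), "e7": ("black", "pawn"),
         "d1": ("white", "queen")}

def try_move(board, color, src, dst):
    """Reject moves from empty squares or with the opponent's pieces --
    exactly the failure modes described above (capturing pieces that
    don't exist, conjuring queens out of thin air)."""
    piece = board.get(src)
    if piece is None:
        return f"illegal: no piece on {src}"
    if piece[0] != color:
        return f"illegal: {src} holds a {piece[0]} piece"
    board[dst] = piece
    del board[src]
    return f"ok: {piece[1]} {src}->{dst}"

print(try_move(board, "white", "e2", "e4"))  # ok: pawn e2->e4
print(try_move(board, "white", "h5", "h7"))  # illegal: no piece on h5
```

Ten lines of bookkeeping already catch errors a billion-parameter text predictor makes, because the predictor is modeling notation, not a board.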
ChatGPT versus Deepseek is hilarious. They both cheat like crazy and then one side jedi mind tricks the winner into losing.
Hallucinating 100% of the time 👌
LLM are not built for logic.
And yet everybody is selling them to write code.
Last time I checked, coding requires logic.
To be fair, a decent chunk of coding is stupid boilerplate/minutia that varies environment to environment, language to language, library to library.
So LLMs can do some code completion: filling out a bunch of boilerplate that is blatantly obvious, generating the redundant text mandated by certain patterns, and keeping straight details between languages like “does this language want join as a method on a list with a string argument, or vice versa?”
Problem is this can sometimes be more annoying than it’s worth, since miscompletions break your flow.
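The `join` example above is real per-language minutia: Python puts `join` on the separator string and passes the list, while JavaScript puts it on the array and passes the separator.

```python
# Python: join is a method on the separator string, taking the iterable.
print(", ".join(["a", "b", "c"]))  # a, b, c

# JavaScript flips it -- the method lives on the array and takes the
# separator:  ["a", "b", "c"].join(", ")
```

Exactly the kind of arbitrary convention a completion model can keep straight for you, and exactly the kind it occasionally gets backwards.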
Fair point.
I liked the “upgraded autocompletion”, you know, a completion based on the context, right up until they pushed it too far with 20 lines of nonsense…
Now I’m thinking of a way of doing the thing, and then I receive a 20-line suggestion.
So I check whether that makes sense, losing my momentum, only to realize the suggestion is calling shit that doesn’t exist…
Screw that.
The amount of garbage it spits out in autocomplete is distracting. If it’s constantly making me 5-10% less productive the many times it’s wrong, it needs to save me a lot of time when it is right, and generally, I haven’t found it able to do that.
Yesterday I tried to prompt it to change around 20 call sites for a function where I had changed the signature. Easy, boring and repetitive, something that a junior could easily do. And all the models were absolutely clueless about it (using copilot)
Hardly surprising. LLMs aren’t -thinking-, they’re just shitting out the next token for any given input of tokens.
That’s exactly what thinking is, though.
2025 Mazda MX-5 Miata ‘got absolutely wrecked’ by Inflatable Boat in beginner’s boat racing match — Mazda’s newest model bamboozled by 1930s technology.
There was a chess game for the Atari 2600? :O
I wanna see them W I D E pieces.
Here you go (online emulator): https://www.retrogames.cz/play_716-Atari2600.php
WTF? I played just long enough for my queen to capture their queen, and it turned my queen into a rook?
Is that even a legit rule in any variation of chess rules?
I wasn’t aware of that either, now I’m kinda curious to try to find it in my 512 Atari 2600 ROMs archive…
They used ChatGPT 4o, instead of using o1 or o3.
Obviously it was going to fail.
Other studies (not all chess based or against this old chess AI) show similar lackluster results when using reasoning models.
Edit: When comparing reasoning models to existing algorithmic solutions.
All these comments asking “why don’t they just have ChatGPT go and look up the correct answer”.
That’s not how it works, you buffoons. It trains off of datasets long before it releases. It doesn’t think. It doesn’t learn after release, and it won’t remember things you try to teach it.
Really lowering my faith in humanity when even the AI skeptics don’t understand that it generates statistical representations of an answer based on answers given in the past.
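What “statistical representation of an answer” means can be shown with a toy bigram model: count which token follows which in a tiny corpus, then emit the most frequent continuation. No lookup, no reasoning, just frequencies frozen at training time; real LLMs are vastly more sophisticated, but the frozen-after-training property is the same.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# "Training": count next-token frequencies. Once built, the table is
# fixed -- like a released model, it learns nothing afterwards.
following = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    following[cur][nxt] += 1

def generate(token, steps):
    """Greedily emit the most frequent continuation at each step."""
    out = [token]
    for _ in range(steps):
        candidates = following[out[-1]].most_common(1)
        if not candidates:
            break
        out.append(candidates[0][0])
    return " ".join(out)

print(generate("the", 3))  # the cat sat on
```

Ask it anything outside its counts and it has nothing to fall back on; it cannot “go look up” an answer it was never trained on, which is the commenter’s whole point.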