ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

@[email protected] · 1 month ago

@[email protected] · 1 month ago

They used ChatGPT 4o instead of o1 or o3.

Obviously it was going to fail.

@[email protected] · edit-2 1 month ago

Other studies (not all chess-based or against this old chess AI) show similarly lackluster results when using reasoning models.

Edit: When comparing reasoning models to existing algorithmic solutions.