Chat GPT appears to hallucinate or outright lie about everything

@[email protected] · 10 months ago

Chat GPT appears to hallucinate or outright lie about everything

@[email protected] · edit-2 10 months ago

there was a time when it confidently claimed that Turkey had some landmass in the southern hemisphere, it has come a long way since then

Karyoplasma · 10 months ago

From today:

Complete bullshit, it’s from a Katy Perry interview.

@[email protected] · 10 months ago

It did not simply analyze the best type of graphics card for the situation.

Yes it certainly didn’t: It’s a large language model, not some sort of knowledge engine. It can’t analyze anything, it only generates likely text strings. I think this is still fundamentally misunderstood widely.

@[email protected] · 10 months ago

I think this is still fundamentally misunderstood widely.

The fact that it’s being sold as artificial intelligence instead of autocomplete doesn’t help.

Or Google and Microsoft trying to sell it as a replacement for search engines.

It’s malicious misinformation all the way down.

Christer Enfors · 10 months ago

Agreed. As far as I know, there is no actual artificial intelligence yet, only simulated intelligence.

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 🏆 · edit-2 10 months ago

Imagine text gen AI as just a big hat filled with slips of paper and when you ask it for something, it’s just grabbing random shit out of the hat and arranging it so it looks like a normal sentence.

Even if you filled it with only good information, it will still cross those things together to form an entirely new and novel response, which would invariably be wrong as it mixes info about multiple subjects together even if all the information individually was technically accurate.

They are not intelligent. They aren’t even better than similar systems that existed before LLMs!

@[email protected] · 10 months ago

“Converted what I said into the truth”

Now I’m not against the point you’re making in any way, I think the bots are hardcore yes men.

Buut… I have a 1060 and I got it around when No Man’s Sky came out, and I did try it on my 4k LED TV. It did run, but it also stuttered quite a bit.

Now I’m currently thinking of updating my card, as I’ve updated the rest of the PC last year. A 3070 is basically what I’m considering, unless I can find a nice 4000 series with good VRAM.

My point here being that this isn’t the best example you could have given, as I’ve basically had that conversation several times in real life, exactly like that, as “it runs” is somewhat subjective.

LLM’s obviously have trouble with subjective things, as we humans do too.

But again, I agree with the point you’re trying to make. You can get these bots to say anything. It amused me that the blocks are much more easily circumvented just by telling them to ignore something or by talking hypothetically. Idk but at least very strong text based erotica was easy to get out of them last year, which I think should not have been the case, probably.

mozz · 10 months ago

May I offer you a fairly convincing explanation

subignition · 10 months ago

This is the best article I’ve seen yet on the topic. It does mention the “how” in brief, but this analogy really explains the “why” Gonna bookmark this in case I ever need to try to save another friend or family member from drinking the Flavor-Aid

@[email protected] · 10 months ago

So, they’ve basically accidentally (or intentionally) made Eliza with extra steps (and many orders of magnitude more energy consumption).

mozz · 10 months ago

I mean, it’s clearly doing something which is impressive and useful. It’s just that the thing that it’s doing is not intelligence, and dressing it up convincingly imitate intelligence may not have been good for anyone involved in the whole operation.

@[email protected] · 10 months ago

Impressive how…? It’s just statistics-based very slightly fancier autocomplete…

And useful…? It’s utterly useless for anything that requires the text it generates to be reliable and trustworthy… the most it can be somewhat reliably used for is as a somewhat more accurate autocomplete (yet with a higher chance for its mistakes to go unnoticed) and possibly, if trained on a custom dataset, as a non-quest-essential dialogue generator for NPCs in games… in any other use case it’ll inevitably cause more harm than good… and in those two cases the added costs aren’t remotely worth the slight benefits.

It’s just a fancy extremely expensive toy with no real practical uses worth its cost.

The only people it’s useful to are snake oil salesmen and similar scammers (and even then only in the short run, until model collapse makes it even more useless).

All it will have achieved in the end is an increase in enshittification, global warming, and distrust in any future real AI research.

jeeva · 10 months ago

I enjoyed reading this, thank you.

@[email protected] · 10 months ago

Well, you’re wrong. Its right a lot of the time.

You have a fundamental misunderstanding of how LLMs are supposed to work. They’re mostly just text generation machines.

In the case of more useful ones like Bing or Perplexity, they’re more like advanced search engines. You can get really fast answers instead of personally trawling the links it provides and trying to find the necessary information. Of course, if it’s something important, you need to verify the answers they provide, which is why they provide links to the sources they used.

@[email protected] · 10 months ago

Except they also aren’t reliable at parsing and summarizing links, so it’s irresponsible to use their summary of a link without actually going to the link and seeing for yourself.

It’s a search engine with confabulation and extra steps.

@[email protected] · 10 months ago

Except they also aren’t reliable at parsing and summarizing links

Probably 90%+ of the time they are.

so it’s irresponsible to use their summary

You missed this part:

if it’s something important

@[email protected] · 10 months ago

I think this article does a good job of exploring and explaining how LLM attempts at text summarization could be more accurately described as “text shortening”; a subtle but critical distinction.

@[email protected] · 10 months ago

90% reliability is not anywhere remotely in the neighborhood of acceptable, let alone good.

No, I didn’t miss anything. All misinformation makes you dumber. Filling your head with bullshit that may or may not have any basis in reality is always bad, no matter how low the stakes.

@[email protected] · 10 months ago

Agree to disagree, I suppose.

@[email protected] · edit-2 10 months ago

You can’t just handwave away your deliberate participation in making humanity dumber by shoveling known bullshit as a valid source of truth.

@[email protected] · 10 months ago

I guess it’s a good thing I’m not doing that, then.

@[email protected] · 10 months ago

Wasting a ridiculous amount of energy for the sole purpose of making yourself dumber is literally all you’re doing every single time you use an LLM as a search engine.

@[email protected] · 10 months ago

Perplexity has been great for my ADHD brain and researching for my master’s.

@[email protected] · 10 months ago

Agreed. Often times there are questions I just wouldn’t bother asking if it weren’t for Perplexity.

paraphrand · 10 months ago

Those first set of specs it quoted are actually the original min specs that Oculus and Valve promoted for the Rift and Vive when they were new.

Ever since then there have not been new “official” min specs. But it’s true that higher spec if better and that newer headsets are higher res and could use higher spec stuff.

Also, a “well actually” on this would be that those are the revised min specs that were put out a few years after the initial specs. It use to be a GTX 970 was min spec. But they changed that to the 1060.

What is failing here is the model actually being smart. If it was smart it would have reasoned that time moves on and it would have considered better mins pecs for current hardware. But instead it just regurgitated the min specs that were once commonly quoted by Oculus/Meta and Valve.

@[email protected] · 10 months ago

It’s actually not really wrong. There are many VR games you can get away with low specs for.

Yes when you suggested a 3070 it just took that and rolled with it.

It’s basically advanced autocomplete, so when you suggest a 3070 it thinks the best answer should probably use a 3070. It’s not good at knowing when to say “no”.

Interesting it did know to come up with a newer AMD card to match the 3070, as well as increasing the other specs to more modern values.

@[email protected] · 10 months ago

It’s incorrect to ask chatgpt such questions in the first place. I thought we’ve figured that out 18 or so months ago.

@[email protected] · 10 months ago

Why? It actually answered the question properly, just not to the OP’s satisfaction.

@[email protected] · 10 months ago

because it could have just as easily confidentiality said something incorrect. You only know it’s correct by going through the process of verifying it yourself, which is why it doesn’t make sense to ask it anything like this in the first place.

@[email protected] · 10 months ago

I mean… I guess? But the question was answered correctly, I was playing Beat Saber on my 1060 with my Vive and Quest 2.

@[email protected] · 10 months ago

It doesn’t matter that it was correct. There isn’t anything that verifies what it’s saying, which is why it’s not recommended to ask it questions like that. You’re taking a risk if you’re counting on the information it gives you.

@[email protected] · 10 months ago

There’s no way they used Gemini and decided it’s better than GPT.

I asked Gemini: “Why can great apes eat raw meat but it’s not advised for humans?”. It said because they have a “stronger stomach acid”. I then asked “what stomach acid is stronger than HCL and which ones do apes use?”. And was met with the response: “Apes do not produce or utilize acids in the way humans do for chemical processes.”.

So I did some research and apes actually have almost neutral stomach acid and mainly rely on enzymes. Absolutely not trustworthy.

Daemon Silverstein · 10 months ago

deleted by creator

@[email protected] · 10 months ago

If I narrow down the scope, or ask the same question a different way, there’s a good chance I reach the answer I’m looking for.

https://chatgpt.com/share/ca367284-2e67-40bd-bff5-2e1e629fd3c0

Toes♀ · 10 months ago

I think some of the issue is that the bulk of its knowledge is from a few years back and it relies on searching the internet to fill the gap. But it prefers the older database it was trained against.

@[email protected] · 10 months ago

That’s exactly the issue here. ChatGPT’s current training set ends right around the time the Meta Quest 3 came out. It’s not going to have any discussions in there of No Man’s Sky with tech that wasn’t out yet.

Dave. · edit-2 10 months ago

Most times what I get when asking it coding questions is a half-baked response that has a logic error or five in it.

Once I query it about one of those errors it replies with, “You’re right, X should be Y because of (technical reason Z). Here’s the updated code that fixes it”.

It will then give me some code that does actually work, but does dumb things, like recalculating complex but static values inside a loop. When I ask if there’s any performance improvements it can do, suddenly it’s full of helpful ways to improve the code that can make it run 10 to 100 times faster and fix those issues. Apparently if I want performant code, I have to explicitly ask for it.

For some things it will offer solutions that don’t solve the issue that I raise, no matter how many different ways I phrase the issue and try and coax it towards a solution. At that point, it basically can’t, and it gets bogged down to minor alterations that don’t really achieve anything.

Sometimes when it hits that point I can say “start again, and use (this methodology)” and it will suddenly hit upon a solution that’s workable.

So basically, right now it’s good for regurgitating some statistically plausible information that can be further refined with a couple of good questions from your side.

Of course, for that to work you have to know the domain you’re working in fairly well already otherwise you’re shit out of luck.

@[email protected] · edit-2 10 months ago

LLMs are basically just really fancy search engines. The reason the initial code is garbage is that it’s cut and pasted together from random crap the LLM found on the net under various keywords. It gets more performant when you ask because then the LLM is running a different search. The first search was “assemble some pieces of code to accomplish X”, while the second search was “given this sample of code find parts of it that could be optimized”, two completely different queries.

As noted in another comment the true fatal flaw of LLMs is that they don’t really have a threshold for just saying " I don’t know that" as they are inherently probabilistic in nature. When asked something they can’t find an answer for they assemble a lexically probable response from similar search results even in cases where it’s wildly wrong. The more uncommon and niche your search is the more likely this is to happen. In other words they work well for finding very common information, and increasingly worse the less common that information is.

@[email protected] · edit-2 10 months ago

All AI share a central design flaw of being what people think they should return based on weighted averages of ‘what people are saying’ with a little randomization to spice things up. They are not designed to return factual information because they are not actually intelligent so they don’t know fact from fiction.

ChatGPT is designed to ‘chat’ with you like a real person, who happens to be agreeable so you will keep chatting with it. Using it for any kind of fact based searching is the opposite of what it is designed to do.

@[email protected] · 10 months ago

based on weighted averages of ‘what people are saying’ with a little randomization to spice things up

That is massively oversimplified and not really how neural networks work. Training a neural network is not just calculating averages. It adjusts a very complex network of nodes in such a way that certain input generates certain output. It is entirely possible that during that training process, abstract mechanisms like logic get trained into the system as well, because a good NN can produce meaningful output even on input that is unlike anything it has ever seen before. Arguably that is the case with ChatGPT as well. It has been proven to be able to solve maths/calculating tasks it has never seen before in its training data. Give it a poem that you wrote yourself and have it write an analysis and interpretation - it will do it and it will probably be very good. I really don’t subscribe to this “statistical parrot” narrative that many people seem to believe. Just because it’s not good at the same tasks that humans are good at doesn’t mean it’s not intelligent. Of course it is different from a human brain, so differences in capabilities are to be expected. It has no idea of the physical world, it is not trained to tell truth from lies. Of course it’s not good at these things. That doesn’t mean it’s crap or “not intelligent”. You don’t call a person “not intelligent” just because they’re bad at specific tasks or don’t know some facts. There’s certainly room for improvement with these LLMs, but they’ve only been around in a really usable state for like 2 years or so. Have some patience and in the meantime use it for all the wonderful stuff it’s capable of.

JackGreenEarth · 10 months ago

Not all AIs, since many AIs (maybe even most) are not LLMs. But for LLMs, you’re right. Minor nitpick.

@[email protected] · 10 months ago

It’s literally just Markov chains with extra steps

@[email protected] · 10 months ago

It does remind me of that recent Joe Scott video about the split brain. One part of the brain would do something and the other part of the brain that didn’t get the info because of the split just makes up some semi-plausible answer. It’s like one part of the brain does work at least partially like an LLM.

It’s more like our brain is like a corporation, with a spokesperson, a president and vice president and a number of departments that with semi-independently. Having an LLM is like having only the spokesperson and not the rest of the work force in that building that makes up an AGI.

@[email protected] · 10 months ago

An LLM is like having the receptionist provide detailed information from what they have heard other people talk about in the lobby.

@[email protected] · 10 months ago

An LLM is like having the receptionist provide detailed information from what they have heard other people talk about in the lobby.

Zerlyna · 10 months ago

Yes!!! It doesn’t know Trump has been convicted and told me that even when I give it sources, it won’t upload to a central database for privacy reasons. 🤷‍♀️

Ogmios · edit-2 10 months ago

I wonder if you can get it to say anything bad about any specific person. Might just be that they nuked the ability entirely to avoid lawsuits.

Zerlyna · edit-2 10 months ago

Once I give it links to what it accepts as “reputable sources” (npr, ap, etc.) it concedes politely. But I’m gonna try it now lol.

@[email protected] · 10 months ago

LLM models can’t be updated (i.e., learn), they have to be retrained from scratch… and that can’t be done because all sources of new information are polluted enough with AI to cause model collapse.

So they’re stuck with outdated information, or, if they are being retrained, they get dumber and crazier with each iteration due to the amount of LLM generated crap on the training data.