By “good” I mean code that is written professionally and concisely (and obviously works as intended). Apart from personal interest and understanding what the machine spits out, is there any legit reason anyone should learn advanced coding techniques? Specifically from an engineering perspective?
If not, learning how to write code seems a tad trivial now.
LLMs are just computerized puppies that are really good at performing tricks for treats. They’ll still do incredibly stupid things pretty frequently.
I’m a software engineer, and I am not at all worried about my career in the long run.
In the short term… who fucking knows. The C-suite and MBA circlejerk seems to have decided they can fire all the engineers because wE CAn rEpLAcE tHeM WitH AI 🤡 and then the companies will have a couple absolutely catastrophic years because they got rid of all of their domain experts.
Dunno. I’d expect to have to make several attempts to coax a working snippet from the AI, then spend the rest of the time trying to figure out what it’s done and debugging the result. Faster to do it myself.
E.g. I once coded Tetris on a whim (45 min) and thought it’d be a good test for a UI/game developer, given the multidisciplinary nature of the game (user interaction, real-time engine, data structures, etc.). Asked Copilot to give it a shot, and while the basic framework was there, the code simply didn’t work as intended. I figured that if we’d gone into each of the elements separately, it would have taken me longer than if I’d done it from scratch anyway.
Yes and no. GPT usually gives me clever solutions I wouldn’t have thought of. Very often GPT also screws up, and I need to fine-tune variable names, function parameters, and such.
I think the best thing about GPT is that it knows the documentation of every function, so I can ask technical questions. For example, can this function really handle dataframes, or will it internally convert the variable into a matrix and then spit out a dataframe as if nothing happened? Such conversions tend to screw up the data, which explains some strange errors I bump into. You could read all of the documentation to find out, or you could just ask GPT about it. Alternatively, you could show how badly the data got screwed up after a particular function, and GPT will tell you that it’s because this function uses matrices internally, even though it looks like it works with dataframes.
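A made-up Python/pandas illustration of the kind of silent conversion I mean (the function and the data here are hypothetical, not from any particular library):

```python
import pandas as pd

def scale_values(df: pd.DataFrame) -> pd.DataFrame:
    # Looks like it "handles dataframes", but internally it drops to a plain
    # NumPy array: mixed dtypes collapse to object and column names disappear.
    arr = df.to_numpy()
    scaled = arr * 2          # applied element-wise to everything, strings included
    return pd.DataFrame(scaled)

df = pd.DataFrame({"id": ["a", "b"], "value": [1.5, 2.5]})
print(scale_values(df))
#     0    1
# 0  aa  3.0
# 1  bb  5.0   <- headers gone, ids mangled, dtypes now object
```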
I think of GPT as an assistant painter some famous artists had. The artist tells the assistant to paint the boring trees in the background and the rough shape of the main subject. Once that’s done, the artist can work on the fine details, sign the painting, send it to the local king and charge a thousand gold coins.
I’ve tried Copilot and to be honest, most of the time it’s a coin toss, even for short snippets. In one scenario it might try to autocomplete a unit test I’m writing and get it pretty much spot on, but it’s also equally likely to spit out complete garbage that won’t even compile, never mind being semantically correct.
To have any chance of producing decent output, even for quite simple tasks, you will need to give an LLM an extremely specific prompt, detailing the precise behaviour you want and what the code should do in each scenario, including failure cases (hmm…there used to be a term for this…)
Even then, there are no guarantees it won’t just spit out hallucinated nonsense. And for larger, enterprise scale applications? Forget it.
The LLM can type the code, but you need to know what you want and how you want to solve it.
Yes, in small bits, after several tries, with human supervision. For now.
No for large amounts: too hard for humans to review, though they’re still doing it anyway.
Great question.
is there any legit reason anyone should learn advanced coding techniques?
Don’t buy the hype. LLMs can produce all kinds of useful things but they don’t know anything at all.
No LLM has ever engineered anything. And there’s sparse (a concession to a good point made in response) current evidence that any AI ever will.
Current learning models are like trained animals in a circus. They can learn to do any impressive thing you can imagine, by sheer rote repetition.
That means they can engineer a solution to any problem that has already been solved millions of times. As long as the work has very little new/novel value and requires no innovation whatsoever, learning models do great work.
Horses and LLMs that solve advanced algebra don’t understand algebra at all. It’s a clever trick.
Understanding the problem and understanding how to politely ask the computer to do the right thing has always been the core job of a computer programmer.
The bit about “politely asking the computer to do the right thing” makes massive strides in convenience every decade or so. Learning models are another such massive stride. This is great. Hooray!
The bit about “understanding the problem” isn’t within the capabilities of any current learning model or AI, and there’s no current evidence that it ever will be.
Someday they will call the job “prompt engineering” and on that day it will still be the same exact job it is today, just with different bullshit to wade through to get it done.
Wait, if you can (or anyone else chipping in), please elaborate on something you’ve written.
When you say
That means they can engineer a solution to any problem that has already been solved millions of times.
Hasn’t Google already made advances through its AlphaGeometry AI?? Admittedly, that’s a geometry setting, which may be easier to code than other parts of math, and there isn’t yet a clear indication AI will ever be able to reach a certain level of creativity that the human mind has, but at the same time it might get there by sheer volume of attempts.
Isn’t this still engineering a solution? Sometimes even researchers reach new results by having a machine verify many cases (see the proof of the Four Color Theorem). It’s true that in the Four Color Theorem researchers narrowed down the cases to try, but maybe a similar narrowing could be done by an AI (sooner or later)?
I don’t know what I’m talking about, so I should shut up, but I’m hoping someone more knowledgeable will correct me, since I’m curious about this
Isn’t this still engineering a solution?
If we drop the word “engineering”, we can focus on the point: geometry is another case where sheer rote repetition can do a pretty good job. Clever engineers can teach computers to do all kinds of things that look like novel engineering, but aren’t.
LLMs can make computers look like they’re good at something they’re bad at.
And they offer hope that computers might someday not suck at what they suck at.
But history teaches us probably not. And current evidence in favor of a breakthrough in general artificial intelligence isn’t actually compelling, at all.
Sometimes even researchers reach new results by having a machine verify many cases
Yes. Computers are good at that.
So far, they’re no good at understanding the four color theorem, or at proposing novel approaches to solving it.
They might never be any good at that.
Stated more formally, P may equal NP, but probably not.
Edit: To be clear, I actually share a good bit of the same optimism. But I believe it’ll be hard-won work done by human engineers that gets us anywhere near there.
Ostensibly God created the universe in Lisp. But actually he knocked most of it together with hard-coded Perl hacks.
There’s lots of exciting breakthroughs coming in computer science. But no one knows how long they’ll take or what their impact will be. History teaches us it’ll be less exciting than Popular Science promised us.
Edit 2: Sorry for the rambling response. Hopefully you find some of it useful.
I don’t at all disagree that there’s exciting stuff afoot. I also think it is being massively oversold.
Hasn’t Google already made advances through its AlphaGeometry AI?? Admittedly, that’s a geometry setting, which may be easier to code than other parts of math, and there isn’t yet a clear indication AI will ever be able to reach a certain level of creativity that the human mind has, but at the same time it might get there by sheer volume of attempts.
Wanted to focus a bit on this. The thing with AlphaGeometry and AlphaProof is that they really treat doing math as a game, not unlike chess. For example, AlphaGeometry has a basic set of rules, it can apply them, and it knows when it is done. And when it is done, you can be 100% sure that the solution is correct, because the rules of the game are known; the 28/42 score reported in the article is really four perfect scores and two zeros. Those systems do use LLMs, but they really are only there to suggest to the system what to try doing next. There is a very enlightening picture in the AlphaGeometry paper here: https://www.nature.com/articles/s41586-023-06747-5#Fig1
You can automatically verify the correctness of code the same way. For example, Lean, the language AlphaProof uses internally, can be used for general programming. Programming techniques like this are generally called formal methods. But most people don’t do this, since it is more time-consuming than normal programming, and in many cases we don’t even know how to define the goal of our code (how do you define correct rendering in a game?). So this is only really done when the correctness of the program is critical; famously, the code of the automatic metro in Paris was verified this way. And so most people don’t try to make programming AI work this way.
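To make that concrete, here’s a minimal, purely illustrative Lean sketch (nothing to do with AlphaProof’s internals): a definition shipped together with statements the compiler itself checks. If the file compiles, the theorems hold; there is no “probably”.

```lean
-- Illustrative only: a tiny definition plus machine-checked facts about it.
def double (n : Nat) : Nat := n + n

-- Checked by Lean's kernel; both sides reduce to the same value.
theorem double_four : double 4 = 8 := rfl

-- Also holds for every n, by unfolding the definition.
theorem double_eq_self_add (n : Nat) : double n = n + n := rfl
```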
I appreciate your candor, I had a feeling it was cock and bull but you’ve answered my question fully.
My dad uses LLM Python code generation quite routinely; he says the output’s mostly fine.
For snippets, yes. Ask him to tell it to make a complete terminal service and see what happens.
I use LLMs for C code - most often when I know full well how to code something but I don’t want to spend half a day expressing it and debugging it.
ChatGPT or Copilot will spit out a function or snippet that’s usually pretty close to what I want. I patch it up and move on to the tougher problems LLMs can’t do.
That’s why I said ‘for snippets yes’. But I guess you needed some attention, so you piggybacked. Welcome to my blocklist.
Fitting username.
It’s the most ok’est coder, with the attention span of a 5-year-old.
This question is basically the same as asking “Are 2d6 capable of rolling a 9?”
Yes, two six-sided dice (2d6) are capable of rolling a sum of 9. Here are the possible combinations that would give a total of 9:
- 3 + 6
- 4 + 5
- 5 + 4
- 6 + 3
So, there are four different combinations that result in a roll of 9.
…
See? LLMs can do everything!
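To be fair, a few lines of plain old Python settle the same question deterministically:

```python
# Brute-force every roll of two six-sided dice and keep the ones summing to 9.
pairs = [(a, b) for a in range(1, 7) for b in range(1, 7) if a + b == 9]
print(pairs)       # [(3, 6), (4, 5), (5, 4), (6, 3)]
print(len(pairs))  # 4 combinations, so yes, 2d6 can roll a 9
```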
Wow that’s pretty good
Now ask it how many r’s are in Strawberry!
I asked four LLM-based chatbots over DuckDuckGo’s anonymised service the following:
“How many r’s are there in Strawberry?”
GPT-4o mini
There are three “r’s” in the word “strawberry.”
Claude 3 Haiku
There are 3 r’s in the word “Strawberry”.
Llama 3.1 70B
There are 2 r’s in the word “Strawberry”.
Mixtral 8x7B
There are 2 “r” letters in the word “Strawberry”. Would you like to know more about the privacy features of this service?
They got worse at the end, but at least GPT and Claude can count letters.
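Whereas one line of ordinary code gets it right every single time:

```python
print("strawberry".count("r"))  # 3
```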
I have no knowledge of coding, my bad for asking a stupid question in NSQ.
I wouldn’t exactly take the comment as negative.
The output of current LLMs is hit or miss. And when it misses, you might find yourself in a long back-and-forth, trying to persuade a sassy robot to write what you actually intend.
Thank you for extrapolating for them.
Sorry, I wasn’t trying to berate you. Just trying to illustrate the underlying assumption of your question
A broken clock is right twice a day.
Yes … and it doesn’t know when it is on time.
Also, machines are getting better and they can help us with inspiration.
For small boilerplate or very common small pieces of code, for instance a famous algorithm implementation: yes. They’re probably just giving you the top Stack Overflow answer to a classic question.
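The sort of canned classic I mean, sketched from memory (any LLM will hand you something very close to this textbook binary search):

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```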
Anything that the LLM would need to mix or refactor would be terrible.
For basic boilerplate like routes for an API, an ETL script from sample data to DB tables, or other similar basics, yeah, it’s perfectly acceptable. You’ll need to swap out dummy addresses, and maybe change a choice or two, but it’s fine.
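Something like this hypothetical Flask sketch is what I have in mind (the routes, addresses, and fake data are all placeholders to swap out):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/items", methods=["GET"])
def list_items():
    # Placeholder: a real version would query your actual table.
    return jsonify([{"id": 1, "name": "example"}])

@app.route("/items", methods=["POST"])
def create_item():
    payload = request.get_json()
    # Placeholder: validate and insert into the real table here.
    return jsonify(payload), 201

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8000)  # dummy address, change for your setup
```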
But when you’re trying to organize more complicated business logic or debug complicated dependencies, it falls over.
That all depends on where the data set comes from. The code you’ll get out of an LLM is the average code of the data set. If it’s scraped from the internet (which is very likely) the code you’ll get will be an amalgam of concise examples from one website, incorrect examples from another, bits from blogs with all the typos and all the gunk and garbage that’s out there.
Getting LLM code to work well takes an understanding of what the code it gives you actually does and why it’s bad. It will always be bad, because it cannot be better than the dataset, and for a dataset to be big enough to train an LLM it has to include everything they can get, including all the trash. But it can be good for providing you a framework to start with. It is, however, never going to replace actual programming and understanding of programming. The talk of LLMs completely replacing programmers is mostly coming from people who do not understand coding or LLMs at all.
Can’t LLMs eventually gain some form of “sentience” and be able to self-correct? A sort of thinking-before-speaking kind of situation.
This question right here perfectly encapsulates everything wrong with LLMs right now. They could be good tools, but the people pushing them have no idea what they even are. LLMs do not make decisions. All the decisions an LLM appears to make were made in the dataset. All those things that an LLM does that make it seem intelligent were done or said by a human somewhere on the internet. It is a statistical model that determines what output is most likely to come next. That is it. It is nothing else. It is not smart. It does not and cannot make decisions. It is an algorithm that searches a dataset and when it can’t find something it’ll provide convincing-looking gibberish instead.
Listen, think of it like this: a man decides to take exams to become a doctor in France, but for some reason he doesn’t learn either French or medicine. No, no, instead he studies every former exam and all the answers to them. He gets very good at regurgitating those answers, so much so that he can even pass the exam. But at no point does he understand what any of it means, and when asked new and novel questions he provides utter nonsense answers. No matter how good he gets at memorising those answers, he will never get any better at medicine. LLMs are as likely to gain sentience as my Excel spreadsheets are.
It is an algorithm that searches a dataset and when it can’t find something it’ll provide convincing-looking gibberish instead.
This is very misleading. An LLM doesn’t have access to its training dataset in order to “search” it. Producing convincing-looking gibberish is what it always does; that’s its only mode of operation. The key is that the gibberish that comes out of today’s models is so convincing that it actually becomes broadly useful.
That also means that no, not everything an LLM produces has to have been in its training dataset, they can absolutely output things that have never been said before. There’s even research showing that LLMs are capable of creating actual internal models of real world concepts, which suggests a deeper kind of understanding than what the “stochastic parrot” moniker wants you to believe.
LLMs do not make decisions.
What do you mean by “decisions”? LLMs constantly make decisions about which token comes next; that’s all they do, really. And in doing so, on a higher, emergent level they can make any kind of decision that you ask them to. The only question is how good those decisions are going to be, which in turn entirely depends on the training data, how good the model is, and how good your prompt is.
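As a toy sketch of what “deciding the next token” amounts to (the tokens and numbers are made up; a real model scores tens of thousands of candidates):

```python
import random

# Pretend the model has assigned these probabilities to the next token,
# given everything that came before.
next_token_probs = {"cat": 0.55, "dog": 0.30, "banana": 0.15}

def pick_next_token(probs):
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print("The animal on the mat was a", pick_next_token(next_token_probs))
```

Every higher-level “decision” you see is just this step repeated, steered by the training data and the prompt.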
I think your wording is something to consider. If you want something that’s written professionally, by definition it needs to be written by a professional. So that’s clearly not what you’re asking for, but that’s what you wrote. And that kind of detail does matter, because LLMs are very good at getting part of the format correct and then messing up small details in random places, which makes them precisely useless on their own. But if you want to use them to produce templates that you’re later going to modify, of course you can do that.
I’m not clear what you think an advanced coding technique would be. But if your system breaks and you don’t understand it well enough to fix it, then I sure hope a competent programmer is on staff who can help you.
Finally, if you rely on automation to write your programs for you and somehow they magically seem to work most of the time, how do you know that they actually work all of the time? If they’re giving you numbers, can you believe the numbers? When? Why? Who is guaranteeing you quality in the product? Of course nobody is.
This seems like the most sane take.
A computer can do a lot. But if you give the computer to a regular fish instead of a regular human, that’s just a regular fish next to a computer. Not very useful.