• @[email protected]
    English
    13
    2 years ago

    I get the whole community resource and all that hoorah, but what bothers me the most is that there’s a C*O somewhere who’s padding his bonus and CV, waiting for the ship to sink so he can move on to the next thing, where he can sing the praises of the AI revolution.

  • @[email protected]
    1
    2 years ago

    When you check the website’s traffic, it seems a bit late to take such an action.

    It seems pretty good, especially the VS Code extension, but there are already many generative AI solutions out there that people have implemented.

    • @[email protected]
      English
      33
      2 years ago

      Hence recursion since Google just takes you back, which leads to stack overflow because there is no exit condition.

      • @[email protected]
        English
        6
        2 years ago

        This bullshit happens too often lmao

        “Googles problem, finds post”

        “Why are you asking this? Use Google”

        Gee, thanks

      • The Giant Korean
        English
        8
        2 years ago

        Which would be especially messed up if your original question was about recursion.

    • AnonymousLlama
      6
      2 years ago

      “to keep the quality of answers high, we may arbitrarily close questions, regardless of how many upvotes they get and how helpful they are” - Stack Overflow

  • Carlos Solís
    English
    27
    2 years ago

    Stack Overflow is unique as a page, in the sense that its contributions are under a license that allows for reuse (Creative Commons Attribution-ShareAlike, CC BY-SA) as long as the individual users are properly credited. Does this mean that OverflowAI keeps the credit metadata and knows who wrote each individual part of an answer?

    • @[email protected]
      English
      9
      2 years ago

      Then I’m guilty of breaking the license. I have always been stealing code from Stack Overflow. Well, since I’m a senior dev right now I steal only from answers.

    • @[email protected]
      English
      17
      2 years ago

      AI doesn’t work that way. No one wrote “part of the answer.” It’s more like each contributor cast a vote on what the next token should be, and it randomly picks one of the top ten voted tokens. (Very very roughly.)

      • Carlos Solís
        English
        4
        2 years ago

        Fair enough, but at least there should be a way for OverflowAI to list which contributors had the strongest link to the given answer, right?

        • ShustOne
          English
          6
          2 years ago

          Check out the article and feature video. It does appear to link to answers it pulled from. Bing and Bard do the same. Posters saying it’s impossible are mistaken.

          • @[email protected]
            English
            2
            2 years ago (edited)

            If it’s doing a search for the code, pulling it into the context, and then spitting it back out in slightly modified form, then it can attribute the source it pulled in. That’s a very different thing from the model generating purely from its training data, because code that is pulled into context by a search has a strong influence on the output. The output is still generated the same way, but it would be reasonable to credit the author of the code that is pulled in. However, the code in the training data cannot be credited. How you would pull in just the right piece of code in the first place, though, is a bit of a mystery to me.

            • TehPers
              English
              2
              2 years ago (edited)

              There are a few ways of finding which code is relevant, but one way is to use some sort of vector database to perform the search using embeddings generated from the Qs, As, and query.

              Embeddings are essentially semantic representations of the text which can be compared to each other for similarity.
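
              A minimal sketch of that comparison step, under loud assumptions: `embed` here is a toy bag-of-words stand-in rather than a real embedding model (so it only matches shared words, not meaning), `POSTS` is made-up data, and the linear scan stands in for what a vector database would do with an index.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): word counts."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query: str, posts: list, top_n: int = 2) -> list:
    """Rank posts by similarity to the query and keep the best matches."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(p["text"])), p) for p in posts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop posts that share nothing with the query.
    return [post for score, post in scored[:top_n] if score > 0]

# Hypothetical posts, invented for illustration.
POSTS = [
    {"id": 1, "author": "alice", "text": "how to reverse a list in python"},
    {"id": 2, "author": "bob", "text": "center a div with css flexbox"},
    {"id": 3, "author": "carol", "text": "reverse a string in python using slicing"},
]

if __name__ == "__main__":
    for post in search("reverse a python list", POSTS):
        print(post["id"], post["author"])
```

              Since every stored post keeps its `id` and `author`, whatever the search returns can be credited, which is the part that matters for attribution.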

          • Carlos Solís
            English
            4
            2 years ago

            Thanks for the TLDW - I could ogle a bit of the article but since I was at work, I couldn’t just play the video out loud.

          • wagesj45
            4
            2 years ago

            Posters aren’t saying that it’s impossible to put search results through an LLM and ask it to cite the sources it reads. They’re saying that the neural networks, as used today in LLMs, do not store token attribution in the vocabulary or per node. You can build a system around the neural network that provides it with the proper input (search results) and prodding (a prompt that biases the network toward citation), but a single LLM cannot do that on its own.

        • @[email protected]
          English
          14
          2 years ago (edited)

          Edit: definitely read the other responses because apparently there are some techniques I wasn’t aware of and don’t understand nearly as well as I understand the underlying AI technology - and I’m only an enthusiast layman.

          I don’t think there is any way of doing that. AI is like a huge matrix that says ‘if (’ is followed by

          ‘ x’: 60%

          ‘ foo’: 19%

          ‘ person’: 9%

          Etc.

          And then it does it all over again for the next token based on randomly selecting one of the tokens and then saying ‘if ( person’ is followed by

          ‘.id’: 30%

          ‘.name’: 27%

          Etc.

          So even writing a simple ‘if ( person.name.startsWith(“foo”)) {’ is the aggregate result of thousands of contributors - really pretty much every author of every code snippet ingested from the training material.

          There is no single author even if the code matches existing code token for token. The only exception would be code that is so esoteric that there is only a single author writing code that does a particular thing. But even in that case, there is nothing in the probability matrix to indicate that a particular sequence of tokens is unique to a certain author. Best you could do is full text search a line of code to see if it matches anything in the training data and if there is a very small set of authors to whom credit might be assigned. That might be possible, but it would be an add-on (and significant performance hit) to the actual AI itself. Sort of like how browser integrated AI just runs a search and feeds the result into the context to make the output more likely to contain information in the top results.
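
          A toy sketch of the sampling loop described above, using the percentages from this comment. Everything else is an assumption for illustration: the lookup table `NEXT_TOKEN_PROBS` stands in for a real model, the unlisted “Etc.” tokens are simply left out, and `sample_next_token` is a made-up helper name.

```python
import random

# Hypothetical probability table built from the numbers quoted above.
# Note that it stores only probabilities: there is no author field anywhere.
NEXT_TOKEN_PROBS = {
    "if (": {" x": 0.60, " foo": 0.19, " person": 0.09},
    "if ( person": {".id": 0.30, ".name": 0.27},
}

def sample_next_token(context: str, rng: random.Random, top_k: int = 10) -> str:
    """Pick one of the top_k most likely next tokens, weighted by probability."""
    probs = NEXT_TOKEN_PROBS[context]
    candidates = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    tokens = [token for token, _ in candidates]
    weights = [p for _, p in candidates]
    # random.choices accepts relative weights, so they need not sum to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random()
    print(repr(sample_next_token("if (", rng)))
    print(repr(sample_next_token("if ( person", rng)))
```

          With `top_k=1` the choice is always the single most likely token; larger values let the less likely tokens through some of the time.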

          • TehPers
            English
            3
            2 years ago (edited)

            It depends. For the base model, sure, you can’t really figure out what percentage of it came from which data source, since there are just too many data sources and that information is lost along the way. They’re likely not using the entirety of SO to generate answers though. Retraining LLMs is ungodly expensive, so they can’t retrain it every time a new Q or A is created, and even retraining on a regular basis would be impractical.

            Instead, without knowing exactly how they’re doing it of course, my guess is they’re pulling relevant Q&As from their database, then using those results to improve the response (for example by providing them as context). If you’re interested, look into retrieval-augmented generation.
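
            A rough sketch of what that could look like. This is a guess at the general shape, not how OverflowAI actually works: the `Answer` records, the word-overlap `retrieve` function, and the prompt wording are all hypothetical, and the call to the language model itself is left out.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    answer_id: int
    author: str
    text: str

def retrieve(query: str, answers: list, top_n: int = 2) -> list:
    """Hypothetical retriever: rank stored answers by words shared with the query."""
    query_words = set(query.lower().split())

    def overlap(answer: Answer) -> int:
        return len(query_words & set(answer.text.lower().split()))

    ranked = sorted(answers, key=overlap, reverse=True)
    return [a for a in ranked[:top_n] if overlap(a) > 0]

def build_prompt(query: str, sources: list) -> str:
    """Put the retrieved answers into the context, numbered so they can be cited."""
    lines = ["Answer the question using only the sources below and cite them by number.", ""]
    for number, source in enumerate(sources, start=1):
        lines.append(f"[{number}] (answer {source.answer_id} by {source.author}) {source.text}")
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)

# Made-up data for illustration.
ANSWERS = [
    Answer(101, "alice", "Use reversed(my_list) or my_list[::-1] to reverse a list"),
    Answer(102, "bob", "Use flexbox to center a div"),
    Answer(103, "carol", "Call sorted(items) to sort a list without mutating it"),
]

if __name__ == "__main__":
    question = "how do I reverse a list"
    sources = retrieve(question, ANSWERS)
    print(build_prompt(question, sources))
    # The same list doubles as the attribution shown next to the generated answer.
    print([(s.answer_id, s.author) for s in sources])
```

            The point of the design is that the credit comes from the retrieval step, where the author metadata is still attached, and not from the model weights.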

    • ShustOne
      English
      5
      2 years ago

      It does seem to do that in the feature video. It appears to link to all the answers it pulled from.

  • @[email protected]
    English
    30
    2 years ago

    It really puts their stance on “no AI generated answers” in a different light.

    Basically, “no AI generated answers unless we do it”.

    • @[email protected]
      English
      3
      2 years ago

      Well, using AI-generated answers to train their own AI would bring down the quality of answers, and worse quality means less money. Don’t you want them to make any money??!!

  • @[email protected]
    English
    14
    2 years ago

    I look forward to the AI trend fizzling out. It’s only slightly less silly than the cryptocurrency trend was.

      • @[email protected]
        2
        2 years ago

        This artificial pseudointelligence exists because there’s the “gee whiz, that’s cool” of a computer talking like a person, and a bunch of hype chasers looking to cash in. Much like cryptocurrency before it, and the dot-com boom before that, there is little substance to it, and most of it will be commercially irrelevant a decade from now.

  • @[email protected]
    English
    3
    2 years ago (edited)

    Hah, good to know that even on [email protected] there are people who agree that Stack Overflow moderation is too draconian to ask questions in anymore. It’s a good resource, though, so an LLM will probably be the answer to make the knowledge base more usable without angering its elder gods.

    • @[email protected]
      1
      2 years ago

      Probably the same data that ChatGPT or Google Bard has been trained on which to me makes the distinction moot

  • @[email protected]
    English
    15
    2 years ago

    I’m not liking the announced changes to search. That sounds like we will be losing the lexical search and in exchange we will be getting the same technology that allows google to answer questions different to the one we asked.

    How many minutes will it take after we start using OverflowAI until we get something like “As a large language model trained by the Stack Exchange Network, I cannot answer duplicate questions”?

    • @[email protected]
      1
      2 years ago

      That’s when I go back to ChatGPT or Google Bard. They’ve helped me with problems, with less aggravation than SO.

  • @[email protected]
    English
    5
    2 years ago

    Well that explains why they did a 180 on their “no AI” rule, which has the mods in a tizzy.

    Who knows, maybe it’ll cut back on the toxicity in the sense that you don’t have to interact with toxic people ¯\_(ツ)_/¯

    • wagesj45
      18
      2 years ago

      I thought the point was a mental BDSM exercise where you come to others for help and are instead punished for your ignorance.

    • @[email protected]
      English
      10
      2 years ago

      I use ChatGPT frequently for programming and I’ve found it to be pretty good.

      The key is using its conversational nature, as this gets better results.

      Start simple and expand. You can’t just ask it to write huge chunks of code.

      • @[email protected]
        English
        5
        2 years ago

        Yeah, it works well, as long as the code is rather simple and occurred rather often in the training set. But I seldom use it currently (got a little bit more complex stuff going on). It’s good though for finding new stuff (it often introduces a library I didn’t know yet). But actual code… I’m writing myself (tried it often, and the quality just isn’t there… and I think it even got worse over the last couple of months, as studies also suggest).

    • @[email protected]
      English
      4
      2 years ago

      Agreed. I got ChatGPT to convert Python code to JavaScript and got a buggy code sample back, with new bugs.

      • @[email protected]
        1
        2 years ago

        I’ve found it great for asking documentation questions. It saves me a ton of time having to search through documentation myself. The problem is that when it encounters something it doesn’t have information on, it’ll just confidently make shit up, and if you’re not enough of an expert to recognize when that happens, you can be misled. It still saves me time, but I use it as a recall tool to get me started when I’m learning to do something new; I’d never use the code it puts out without reading through it line by line. I’m also experienced enough to know when it’s wrong and how to refactor its examples. People new to programming could get set down the wrong path by over-relying on GPT to teach them.

    • @[email protected]
      3
      2 years ago

      The code it gives me generally just throws me into the debug stage, skipping right over the “me writing buggy code” stage.

      • @[email protected]
        English
        2
        2 years ago

        Good summary. For some people iterating over existing code is preferred.

        For others writing new code (and not maintaining it) feels better.

    • @[email protected]
      2
      2 years ago

      I’ve gotten really good results asking ChatGPT for programming help. Problem is that it’s wrong like 10% of the time, and when it’s wrong it’s very confidently incorrect. That wasn’t a problem for me because I knew when it was wrong and could course correct it, and it still saved me time and helped me eventually get to the right solution. But if someone who’s still getting started is trying to use ChatGPT to learn, they could easily be misled because they won’t know when its output is wrong.

        • @[email protected]
          2
          2 years ago

          Definitely depends on the type of question. I find that for documentation-type questions, like how do I do something with this library, I get 90% good answers, which makes sense because that library’s documentation is probably in the training data. But for more open-ended questions, like how do I solve this problem, I see similar performance to what you’re saying. I think it’s a good retrieval and synthesis tool which can really save a ton of time if you already have a high-level plan of action and just use it to fill in some specific details.

  • @[email protected]
    English
    13
    2 years ago

    I understand Google and Microsoft getting into it as it makes sense as a “better” Google search but for StackOverflow that sounds like they have just given up on their current platform.