• @[email protected]
    link
    fedilink
    164 months ago

    I just think it’s odd how many words ChatGPT uses like “crucial”, “essential”, and “leverage”. Like I don’t use that stuff in regular conversations or papers. It’s like a small hint that it wants to be caught.

    • @[email protected]
      link
      fedilink
      144 months ago

      The training data probably includes a lot more formal writing. Since the major selling point of ChatGPT is sounding like it “knows” things, more “complex” verbiage helps with that. This type of writing is more common in textbooks and scientific writing in general, which have been at least part of its training data.

    • @[email protected]
      link
      fedilink
      24 months ago

      Yeah, it’s overly formal, but I do use each of those in regular conversation, just a lot more sparingly than AI seems to.

    • @[email protected]
      link
      fedilink
      13
      edit-2
      4 months ago

      LLMs, in fact, have slop profiles (aka overused tokens/phrases) common to the family/company, often from “inbreeding” by training on their own output.

      Sometimes you can tell if a new model “stole” output from another company this way. For instance, Deepseek R1 is suspiciously similar to Google Gemini, heh.
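      The basic idea behind a slop profile can be sketched in a few lines: collect each model’s most overused word n-grams, then compare the profiles. This is a toy illustration with made-up sample outputs, not the actual eqbench/slop-forensics method:

      ```python
      from collections import Counter

      def ngram_profile(texts, n=2, top_k=5):
          """Crude 'slop profile': the most frequent word n-grams
          across a set of model outputs (toy sketch)."""
          counts = Counter()
          for text in texts:
              words = text.lower().split()
              counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
          return {gram for gram, _ in counts.most_common(top_k)}

      def profile_overlap(profile_a, profile_b):
          """Jaccard similarity between two profiles; a high score hints
          that two models lean on the same stock phrases."""
          if not profile_a and not profile_b:
              return 0.0
          return len(profile_a & profile_b) / len(profile_a | profile_b)

      # Hypothetical outputs from two models that share the same verbal tics.
      model_a = ["it is crucial to note that testing is essential",
                 "it is crucial to leverage best practices"]
      model_b = ["it is crucial to remember that sleep is essential",
                 "we must leverage synergy because it is crucial to align"]

      shared = ngram_profile(model_a) & ngram_profile(model_b)
      print(sorted(shared))  # the overused bigrams both "models" share
      ```

      The real tooling linked below is far more involved (token-level stats, phrase lists, infographics), but the profile-and-compare shape is the same.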

      This longform writing benchmark tries to test/measure this (click the I on each model for infographics):

      https://eqbench.com/creative_writing_longform.html

      There are also some disparate attempts on GitHub (actually all from the eqbench dev): https://github.com/sam-paech/slop-forensics

      https://github.com/sam-paech/antislop-vllm