Reddit says Microsoft’s Bing, Anthropic, and Perplexity have scraped its data without permission. “It has been a real pain in the ass to block these companies.”

  • @[email protected]
    1 year ago

    An absolutely prodigious back catalog of high-quality images, interviews, and explainers. A treasure trove of historical content that’s been heavily indexed and participant-weighted for relevancy. And the bulk of it predates the infestation of AI, so it’s valuable just as sample data of original human content for further iterative development of ChatGPT and other LLMs.

    • RBG
      1 year ago

      I don’t know about the AI part. The major companies have had plenty of time to scrape everything on the internet, or am I simplifying the effort too much in my head?