• TorJansen
    English
    3
    1 month ago

    And soon, the already AI-flooded net will be filled with so much nonsense that it becomes impossible for anyone to get some real work done. Sigh.

  • Dr. Moose
    English
    2
    edit-2
    1 month ago

    Considering how many false positives Cloudflare serves, I see nothing but misery coming from this.

    • @[email protected]
      English
      5
      1 month ago

      In terms of Lemmy instances, if your instance is behind Cloudflare and you turn on AI protection, federation breaks. So their tools are not very helpful for fighting AI scraping.

        • @[email protected]
          English
          2
          1 month ago

          I’m not sure what can be done at the free tier. There is a switch to turn on AI bot blocking, and it breaks federation.

          You can’t whitelist domains because federation could come from any domain. Maybe you could somehow whitelist /inbox for the ActivityPub communication, but I’m not sure how to do that in Cloudflare.

    • @[email protected]
      English
      5
      1 month ago

      Lol I work in healthcare and Cloudflare regularly blocks incoming electronic orders because the clinical notes “resemble” SQL injection. Nurses type all sorts of random stuff in their notes so there’s no managing that. Drives me insane!
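
To illustrate why free-text notes can trip a pattern-based filter, here is a minimal sketch. The rule is hypothetical and deliberately naive; it is not Cloudflare's rule set, and the sample note is made up.

```python
import re

# Hypothetical, deliberately naive rule: an SQL verb followed later by another
# SQL keyword. Real WAF rule sets are far more elaborate than this.
NAIVE_SQLI = re.compile(
    r"\b(select|insert|update|delete|drop|union)\b.*\b(from|into|table|where|set)\b",
    re.IGNORECASE | re.DOTALL,
)


def looks_like_sqli(text: str) -> bool:
    """Return True when the naive rule matches anywhere in the text."""
    return NAIVE_SQLI.search(text) is not None


# Made-up clinical note and a textbook injection string.
note = "Pt to select meals from renal menu; update family where possible."
attack = "' UNION SELECT password FROM users --"

print(looks_like_sqli(note))    # the harmless note is flagged too
print(looks_like_sqli(attack))
```

Both strings match, which is the false-positive problem in miniature: ordinary English reuses the same words SQL does.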

  • @[email protected]
    English
    28
    1 month ago

    I’m imagining a sci-fi spin on this where AI generators are used to keep AI crawlers in a loop, and they accidentally end up creating some unique AI culture or relationship in the process.

      • @[email protected]
        English
        19
        edit-2
        1 month ago

        We truly are getting dumber as a species. We’re facing climate change but running some of the most power-hungry processors in the world to spit out cooking recipes and homework answers for millions of people. All to better collect their data to sell products to them that will distract them from the climate disaster our corporations have caused. It would be really fun to watch if it weren’t so sad.

    • @[email protected]
      English
      1
      1 month ago

      It certainly sounds like they generate the fake content once and serve it from cache every time: “Rather than creating this content on-demand (which could impact performance), we implemented a pre-generation pipeline that sanitizes the content to prevent any XSS vulnerabilities, and stores it in R2 for faster retrieval.”

  • @[email protected]
    English
    1
    edit-2
    1 month ago

    You have thirteen hours in which to solve the labyrinth before your baby AI becomes one of us, forever.

  • DigitalDilemma
    English
    38
    1 month ago

    Surprised at the level of negativity here. Having had my sites repeatedly DDoSed offline by Claudebot and others scraping the same damned thing over and over again, thousands of times a second, I welcome any measures to help.

    • @[email protected]
      English
      24
      1 month ago

      I think the negativity is around the unfortunate fact that solutions like this shouldn’t be necessary.

    • @[email protected]
      English
      4
      1 month ago

      “thousands of times a second”

      Modify your Nginx (or whatever web server you use) config to rate limit requests to dynamic pages, and cache them. For Nginx, you’d use either fastcgi_cache or proxy_cache depending on how the site is configured. Even if the pages change a lot, a cache with a short TTL (say 1 minute) can still help reduce load quite a bit while not letting them get too outdated.
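
As a rough illustration of the two ideas (not Nginx config, just the logic sketched in Python), here is a short-TTL page cache plus a per-client limiter. The class names are made up for this example, and the fixed-window limiter is a simplification: Nginx's own limit_req uses a leaky-bucket method.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache rendered pages for a short, fixed time-to-live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expiry, body)

    def get_or_render(self, path: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: the backend is not touched
        body = render(path)  # expired or missing: hit the backend once
        self._store[path] = (now + self.ttl, body)
        return body


class RateLimiter:
    """Fixed-window limiter: at most `limit` requests per client per window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._windows: Dict[str, Tuple[float, int]] = {}  # client -> (start, count)

    def allow(self, client: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # a new window begins
        if count >= self.limit:
            self._windows[client] = (start, count)
            return False  # over the limit: reject without rendering
        self._windows[client] = (start, count + 1)
        return True
```

With a 60-second TTL, a page requested thousands of times a second is rendered about once a minute; every other request is a dictionary lookup.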

      Static content (and cached content) shouldn’t cause issues even if requested thousands of times per second. Following best practices like pre-compressing content using gzip, Brotli, and zstd helps a lot, too :)
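
A minimal sketch of the pre-compression step, assuming a directory of static files and using only gzip from the Python standard library (Brotli and zstd would need extra packages on most Python versions). The extension list and function name are assumptions for illustration.

```python
import gzip
from pathlib import Path
from typing import List

# Assumed set of text-like extensions worth compressing; adjust for your site.
COMPRESSIBLE = {".html", ".css", ".js", ".svg", ".json", ".xml", ".txt"}


def precompress(root: Path) -> List[Path]:
    """Write a max-level .gz sibling next to each compressible file under root."""
    written = []
    # sorted() materializes the listing first, so files created below
    # are not picked up during this same pass.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in COMPRESSIBLE:
            continue  # skips directories, images, and existing .gz files
        data = path.read_bytes()
        target = path.with_name(path.name + ".gz")
        # mtime=0 keeps the output byte-for-byte reproducible between runs.
        target.write_bytes(gzip.compress(data, compresslevel=9, mtime=0))
        written.append(target)
    return written
```

Run something like this at deploy time. If your Nginx build includes the gzip_static module, it can serve the .gz siblings directly instead of compressing on every request.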

      Of course, this advice is just for “unintentional” DDoS attacks, not intentionally malicious ones. Those are often much larger and need different protection - often some protection on the network or load balancer before it even hits the server.

  • @[email protected]
    English
    8
    1 month ago

    Cloudflare kind of real for this. I love it.

    It makes perfect sense for them as a business, infinite automated traffic equals infinite costs and lower server stability, but at the same time how often do giant tech companies do things that make sense these days?

  • @[email protected]
    English
    4
    1 month ago

    I am not happy with how much of the internet relies on Cloudflare. However, they have a strong set of products.

  • @[email protected]
    English
    44
    1 month ago

    So the world is now wasting energy and resources to generate AI content in order to combat AI crawlers, by making them waste more energy and resources. Great! 👍

    • @[email protected]
      English
      10
      edit-2
      1 month ago

      The energy cost of inference is overstated. Small models, or “sparse” models like DeepSeek, are not expensive to run. Training is a one-time cost that still pales in comparison to, like, making aluminum.

      Doubly so once inference goes more on-device.

      Basically, only Altman and his tech bro acolytes want AI to be cost prohibitive so he can have a monopoly. Also, he’s full of shit, and everyone in the industry knows it.

      AI as it’s implemented has plenty of enshittification, but the energy cost is kinda a red herring.

  • _cryptagion [he/him]
    English
    1
    1 month ago

    Now this is an AI trap worth using. Don’t waste your money and resources hosting something yourself, let Cloudflare do it for you if you don’t want AI scraping your shit.

  • Deebster
    English
    18
    edit-2
    1 month ago

    So they rewrote Nepenthes (or Iocaine, Spigot, Django-llm-poison, Quixotic, Konterfai, Caddy-defender, plus inevitably some Rust versions)

    Edit: but with ✨AI✨ and apparently only true facts

    • 野麦さん
      English
      1
      edit-2
      1 month ago

      It’s the consequences of the MIT and Apache licenses showing up in real time.

      GPL your software, people!