• Aatube
    12
    2 years ago

    robots.txt is purely textual; you can’t run JavaScript or log anything. Plus, one who doesn’t intend to follow robots.txt wouldn’t query it.

    • @[email protected]
      English
      15
      2 years ago

      Your second point is a good one, but you absolutely can log the IP that requested robots.txt. That’s just a standard part of any HTTP server ever, no JavaScript needed.

      • @[email protected]
        English
        9
        2 years ago

        You’d probably have to go out of your way to avoid logging this. I’ve always seen such logs enabled by default when setting up web servers.

    • @[email protected]
      English
      45
      2 years ago

      If it doesn’t get queried, that’s the fault of the web scraper. You don’t need JS built into the robots.txt file either. Just add some line like:

      User-agent: *
      Disallow: /here-there-be-dragons.html

      Any client that hits that page (and maybe doesn’t pass a captcha check) gets banned. Or even better, they get a long stream of nonsense.
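The trap described above could be sketched roughly like this in Python. This is only an illustration under assumptions: `handle_request`, `nonsense`, and `banned_ips` are hypothetical names, the ban list lives in memory, and the captcha check mentioned above is left out entirely.

```python
import random
import string

# Path listed under Disallow in robots.txt (taken from the comment above).
TRAP_PATH = "/here-there-be-dragons.html"

# Assumption: an in-memory ban list; a real setup would persist it
# or hand the addresses to a firewall.
banned_ips = set()


def nonsense(n_words=50, seed=None):
    """Return n_words random lowercase 'words' to waste a scraper's time."""
    rng = random.Random(seed)
    words = (
        "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 9)))
        for _ in range(n_words)
    )
    return " ".join(words)


def handle_request(ip, path):
    """Hypothetical handler: return (status, body) for one request."""
    if path == TRAP_PATH:
        # Only a client that ignores robots.txt should ever land here.
        banned_ips.add(ip)
    if ip in banned_ips:
        # Banned clients get junk instead of real content.
        return 200, nonsense()
    return 200, "normal page for " + path
```

Once an address has touched the trap path, every later request from it gets junk, while well-behaved clients never notice anything.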

    • @[email protected]
      English
      10
      2 years ago

      People not intending to follow it is the real reason not to bother, but it’s trivial to track who downloaded the file and then hit something they were asked not to.

      Like, 10 minutes’ work to do right. You don’t need JS to do it at all.
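For what it's worth, the tracking described above might look something like the following Python sketch. It assumes the server writes access logs in Common Log Format and that the disallowed paths are known as simple prefixes; `find_violators` and `LOG_LINE` are made-up names for illustration, not part of any server's tooling.

```python
import re

# Start of a Common Log Format line: IP, two fields, [timestamp], "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')


def find_violators(log_lines, disallowed_prefixes):
    """Return IPs that fetched /robots.txt and later hit a disallowed path."""
    read_rules = set()  # IPs that have downloaded robots.txt so far
    violators = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, path = match.groups()
        if path == "/robots.txt":
            read_rules.add(ip)
        elif ip in read_rules and any(path.startswith(p) for p in disallowed_prefixes):
            violators.add(ip)
    return violators
```

Calling `find_violators(open("access.log"), ["/here-there-be-dragons.html"])` would then list the addresses that read the rules and ignored them, where `access.log` is a placeholder file name. Clients that never fetched robots.txt are deliberately not reported by this sketch.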