DuckDuckGo, Bing, Mojeek, and other search engines are not returning full Reddit results any more.

  • @[email protected]
    English
    21
    1 year ago

    I don’t understand what stops a search engine from scraping a publicly accessible website?

    • @[email protected]
      English
      17
      1 year ago

      robots.txt, I guess? Yes, you can just ignore it, but you shouldn’t if you’re developing a responsible web scraper.

      • Hot Potato (OP)
        English
        18
        1 year ago

        Also, rate limiting. A publicly accessible website isn’t necessarily one that allows scrapers to read millions of pages each week. Sites can easily identify and block scrapers from the pattern of their activity. I don’t know if Reddit has rate limiting, but I wouldn’t be surprised if they’ve implemented it.
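
        As a rough illustration of the kind of per-client rate limiting described above, here is a minimal sliding-window sketch. It is not how Reddit or any particular site does it; the class name, the limit of 3 requests per 60 seconds, and the example IP address are all made up for the demonstration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window` seconds.

    Hypothetical helper for illustration only.
    """

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client, now=None):
        # `now` can be injected so the behaviour is easy to test.
        now = time.monotonic() if now is None else now
        q = self.hits[client]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # too many recent requests: reject
        q.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window=60.0)
# One client (a documentation-range IP) sends requests at t = 0, 1, 2, 3 and 61 seconds.
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 61)]
print(results)
```

        The fourth request is rejected because three requests already sit inside the window; by t = 61 the oldest ones have expired, so the client is let through again.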

      • @[email protected]
        English
        3
        1 year ago

        Doesn’t seem legal that a robots.txt could pick and choose who scrapes. Seems like legally it would have to be all or nothing. Here’s hoping one of the search engines ignores it and makes it a legal case.

        • capital
          English
          5
          1 year ago

          You’d probably feel differently if it were your service. Should you be able to control who scrapes your sites or should that be all or nothing?

          For the record, I fucking hate what the internet is becoming. I naively believed that even if shit got cordoned off into the walled gardens that are mobile phone apps, the web would remain as open as it was. This is a terrible sign of things to come.

          • @[email protected]
            English
            1
            edit-2
            1 year ago

            No, I wouldn’t feel differently. In fact letting search engines scrape and point to your content is what leads people to your site. It’s free advertising. If you’re going to let one search engine in, you should let them all in. If you want to be public, be public. Otherwise put up a login firewall and go private.

            • capital
              English
              2
              1 year ago

              It’s not just search engines. Lots of people on Mastodon were using robots.txt to block ChatGPT (and any other LLM company they knew of) from scraping their sites/blogs.

              I disagree, to a point. I want to be able to control my services to the greatest extent possible, including picking who scrapes me.

              On the other hand, orgs as large as Google doing this poses a real threat to how the internet works right now which I hate.
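
              For anyone wondering what "picking who scrapes" looks like in practice, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content is a made-up example, not anyone's real file: it blocks the `GPTBot` user agent and allows everyone else. `ExampleSearchBot` and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one crawler by name, allow all others.
SELECTIVE_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

selective = RobotFileParser()
selective.parse(SELECTIVE_RULES.splitlines())

url = "https://example.com/blog/some-post"
gptbot_allowed = selective.can_fetch("GPTBot", url)
other_allowed = selective.can_fetch("ExampleSearchBot", url)

print("GPTBot:", gptbot_allowed)
print("ExampleSearchBot:", other_allowed)
```

              Keep in mind this only tells a well-behaved crawler what the site asks for. Nothing in robots.txt itself enforces it.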

        • @[email protected]
          English
          13
          1 year ago

          Actually, it currently contains this:

          User-agent: *
          Disallow: /
          

          Well, that actually is a blanket ban for everyone, so something else must be at play here.
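
          A quick way to check that reading of the two lines quoted above is to feed them to Python's standard `urllib.robotparser`. This is only a sketch: the bot names and the thread URL are placeholders, and it says nothing about whatever separate arrangements a site may have with specific crawlers.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted in the comment above.
BLANKET_RULES = [
    "User-agent: *",
    "Disallow: /",
]

blanket = RobotFileParser()
blanket.parse(BLANKET_RULES)

# Placeholder crawler names and a placeholder URL path.
url = "https://www.reddit.com/r/example/"
verdicts = {bot: blanket.can_fetch(bot, url) for bot in ("BotA", "BotB", "AnyOtherBot")}
print(verdicts)
```

          Every agent comes back as disallowed, which matches the "blanket ban for everyone" reading.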

  • Todd Bonzalez
    English
    56
    1 year ago

    Oh good, I can’t view most reddit threads without an account anymore, so it’ll be nice to see those results go away.

    • @[email protected]
      English
      2
      1 year ago

      I wasn’t aware of this. When did it start? So far, I’ve never been unable to view Reddit threads.

      • @[email protected]
        English
        1
        1 year ago

        They’re so scared of AI scrapers stealing “their” content that they blocked wide ranges of IP addresses.

        I can’t see it from my VPN for example

        • @[email protected]
          English
          7
          edit-2
          1 year ago

          Yes you can. I literally just did it now to check, and it works fine. I get the “download the app” nonsense, edit the URL, then I can see the content just fine. It works every time.

          • @[email protected]
            English
            3
            1 year ago

            I use reddit without being logged in a bunch. Worked fine earlier today! I’m on Firefox if that matters.

  • @[email protected]
    English
    17
    1 year ago

    Whenever Google isn’t giving me proper results (so almost every time) I add “Lemmy” to the search and it gives me… Reddit links.

  • MudMan
    20
    1 year ago

    Okay, hear me out, we should make a different service, like a network of computers where people can freely post information on some sort of page that other people can freely access without needing to log in to a million services. Maybe we could also add like little conversation boards to those to allow people to ask and answer questions, too.

    Wild, I know, but I think there is some opportunity there.

      • Karyoplasma
        English
        8
        1 year ago

        Yeah, not gonna happen. Fact is, Johnny Everyday doesn’t give a shit he’s flooded with ads, tracked on every click and is now forced into exclusivity deals because “somebody has to pay for this stuff and it ain’t me”. Likewise, the people giving the answers don’t care that they literally work for free for some megacorporations. It’s a rotten system and society is to blame.

        We are not making a difference here either, not fooling myself into thinking that. I mainly use Lemmy because I can access it with any app I want while sitting on the loo.

  • @[email protected]
    English
    98
    1 year ago

    They should include reddit in the list of search engines that don’t work well with reddit

  • @[email protected]
    English
    66
    1 year ago

    Every time I click a Reddit link now it’s just “download the app to verify your age” regardless of what it is

    • @[email protected]
      English
      50
      1 year ago

      I feel your pain.

      I edit the URL to remove the first part of the URL and replace it with “http://old.reddit.com”. That still seems to work, last I checked, but I fully expect it to be killed any day now.
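
      If you'd rather script that edit than do it by hand, here is a minimal sketch of the same rewrite. The function name and the set of hostnames it rewrites are my own assumptions; it only swaps the host and leaves the scheme, path, and query untouched.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these hosts should be rewritten; other subdomains are left alone.
REDDIT_HOSTS = {"reddit.com", "www.reddit.com"}


def to_old_reddit(url):
    """Return `url` with a regular Reddit host swapped for old.reddit.com."""
    parts = urlsplit(url)
    if parts.netloc.lower() in REDDIT_HOSTS:
        parts = parts._replace(netloc="old.reddit.com")
    return urlunsplit(parts)


print(to_old_reddit("https://www.reddit.com/r/thesimpsons/"))
print(to_old_reddit("https://example.com/a?b=1"))  # non-Reddit links are unchanged
```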

      • @[email protected]
        English
        38
        1 year ago

        There’s a Firefox extension, “old reddit redirect”, that’ll do this for you. Been using it for years. But yeah, any day now I expect old reddit to be offline.

        • @[email protected]
          English
          5
          1 year ago

          I don’t, I’m convinced it’s a “if you raise the price of the hotdog, I will kill you” kind of deal (as in, some of the devs still use the old UI for themselves)

            • @[email protected]
              English
              2
              edit-2
              1 year ago

              That’s different: they didn’t remove the APIs that enable mobile apps, they just made them unreasonably expensive. But you can theoretically still use them.

  • katy ✨
    English
    4
    edit-2
    1 year ago

    i like how the only reason i’d use reddit is to search “site:reddit.com/r/thesimpsons random simpsons quote” to see what the simpsons reddit says about the episode.

  • _haha_oh_wow_
    English
    33
    edit-2
    1 year ago

    Weird, most of the results I get from Google’s search are from Quora (and they fucking suck). Google as a search engine has been going downhill for a while now. Reddit has become an increasingly spammy shithole full of corporate and political astroturfing too.