• @[email protected]
    38 · 1 year ago

    Well, that’s part of the thing. Web scraping isn’t covered by those policies. Like, they could ban your IP or any accounts you have, but web scraping itself will always be acceptable. It’s why projects like NewPipe and Invidious don’t care about YouTube cease-and-desist letters.

    • @[email protected]
      3 · 1 year ago

      “Oops, looks like this community hasn’t been reviewed. Log in if you still want to see the content.”

      • @[email protected]
        3 · 1 year ago

        Yeah, I’ve seen those pop-ups when trying to look something up. It sucks, but it isn’t a significant barrier to web scraping.

      • @[email protected]
        English
        1 · 1 year ago

        Doesn’t that only happen on the mobile version? Either way, it’s stupid and annoying. Google should start de-ranking sites that add barriers to content, but I know they never will.

        • @[email protected]
          1 · 1 year ago

          I tried that on my desktop. As long as you aren’t logged in, you can’t see communities that are too small to have been reviewed, or that were flagged as too adult after a review.

          • @[email protected]
            English
            2 · 1 year ago

            Ugh, what a fucking shitshow. I know it won’t happen quickly or easily, but I’m hoping to see more people on federated platforms in the next decade or two. It’s the only way for us to take the internet back from these greedy bastards.

    • @[email protected]
      English
      1 · 1 year ago

      Is it any different for an “API”? I don’t think there’s a very big difference between an HTTP endpoint that returns HTML and an HTTP endpoint that returns JSON.

      • folkrav
        1 · 1 year ago

        Parsing absolutely comes with a lot more overhead. Especially since many websites rely heavily on JS interactivity nowadays, depending on the site you often don’t get the full contents you’re looking for straight out of the HTML returned by your HTTP request.
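
        To make that concrete, here’s a rough Python sketch using only the standard library. The payloads, the field names, and the `FieldExtractor` / `scrape` helpers are all made up for illustration; they aren’t taken from any real site or API:

```python
import json
from html.parser import HTMLParser

# Hypothetical payloads standing in for real HTTP response bodies.
JSON_BODY = '{"title": "Some video", "views": 1234}'
HTML_BODY = """
<html><body>
  <h1 class="title">Some video</h1>
  <span id="views">1234</span>
</body></html>
"""
# A JS-driven page often ships an empty shell that is filled in client-side.
JS_SHELL = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)


class FieldExtractor(HTMLParser):
    """Collects the text of <h1 class="title"> and <span id="views">."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # name of the field whose text we are inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and attrs.get("class") == "title":
            self._current = "title"
        elif tag == "span" and attrs.get("id") == "views":
            self._current = "views"

    def handle_data(self, data):
        # Only keep text that sits inside one of the tags we care about.
        if self._current:
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def scrape(html):
    """Run the extractor over an HTML string and return the found fields."""
    parser = FieldExtractor()
    parser.feed(html)
    return parser.fields


api_result = json.loads(JSON_BODY)  # one call, types preserved
scraped = scrape(HTML_BODY)         # needs a parser tied to the markup
shell = scrape(JS_SHELL)            # nothing to find without running the JS

print(api_result)
print(scraped)
print(shell)
```

        With the JSON body it’s a single `json.loads` call and the number stays a number. With the HTML you need a parser that knows the page’s markup, `views` comes back as a string, and the JS shell yields nothing at all unless you actually execute the scripts.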

      • @[email protected]
        English
        1 · 1 year ago

        In what way?

        HTML definitely comes with more overhead than JSON if you only care about the data.

        • @[email protected]
          English
          1 · 1 year ago

          Legally. OC stated that NewPipe doesn’t worry about legal threats because they scrape instead of using an official API.