- cross-posted to:
- technology@lemmit.online
- news@lemmy.world
Reddit says Microsoft’s Bing, Anthropic, and Perplexity have scraped its data without permission. “It has been a real pain in the ass to block these companies.”
To commemorate Steve “Greedy Pigboy” Huffman’s assertiveness, I’ve made some memes. Enjoy.
Scraping isn’t illegal, they can’t do anything
It’s also just indexing, not data harvesting.
Anything legally.
Don’t look up the eBay stalking scandal
I stopped appending reddit to my search terms. I won’t go back to Google search, especially after the GamersNexus video featuring Wendell from Level1 Techs.
what search engine do you use?
DuckDuckGo lately. I know it’s powered by Bing, but it seems to anonymize our searches when it gets our results from Bing. Also, it seems to be getting reddit results again as of last evening.
Brave Search, not because I like the company and their crypto/ai shenanigans but they have some nifty features and (supposedly) an independent index.
DDG
I don’t think the content on Reddit is theirs to sell… unless Redditors are getting a cut. That site is a dumpster and needs to die already.
Well, Reddit should just sue these companies and see if they are actually breaking any laws. Holding a sizeable chunk of the internet hostage also sounds like something the EU and US might want to look into, as it very much sounds like anti-competitive conduct or market manipulation.
Also, if these companies want to have greater ownership over the content generated by their users, they should also be much more liable for the content posted to their sites. I mean, when something like Section 230 was written they probably did not take this into account. If these companies want to start selling user-generated content, then they should simply lose the immunity from liability.
they should also be much more liable for the content posted to their sites.
why do people insist on making me defend reddit.
Reddit would lose badly; that’s why they don’t sue. The US 9th Circuit ruled that scraping LinkedIn is legal, and Bing is not even scraping but indexing the data. Easiest case ever.
It’s almost impossible to block web scraping, especially by someone with Microsoft’s or Perplexity’s resources.
It’s clearly an attempt to blackmail indexers into a license deal, as paying Reddit something could actually be cheaper than battling anti-bot measures.
While I don’t disagree with the general idea, removing Section 230 protection would introduce an uncontrollable risk into running any website with user-generated content and would essentially shut such sites down.
If the site isn’t selling data, they wouldn’t lose 230 protection. So that would only be a risk for the companies selling their users’ data, not your regular forum or something.
That gets really murky though. For example:
- news sites w/ comment sections - they’re profiting from ads and subscriptions, so how much of that has to do with the comments?
- ecommerce - reviews on Amazon and eBay could be considered advertising for the product. Who’s liable, the ecommerce site, the merchant, or the poster?
- product websites - how much are posted “reviews” considered advertising for the product? There may not be direct sales on the website, but surely someone’s review would impact sales elsewhere
- for-profit services with a discussion forum - these would be on a separate site from the revenue-generating service, but still associated with the brand and thus likely contributing to advertisements for the product
It’s a lot more obvious for social media sites like Facebook since user-generated content is the service, but there are a lot of for-profit entities where user-generated content is highly relevant, but not the core service. Would those sites be essentially forced to either moderate or eliminate user interaction?
There’s a lot of complexity here.
Reddit only exists because of an open net and sharing content; now they’ve suddenly determined that an open net is bad.
A common strategy, but it fucking sucks.
Fuck you reddit.
is there even anything of value on reddit?
Yes, absolutely. Any time I need to buy a product I don’t know much about, I look for an enthusiast community with a FAQ. Most of the active, high-quality communities are on Reddit.
I would like decentralized services to replace that, but that’s a slow process, if it happens at all.
An absolutely prodigious back catalog of high quality images, interviews, and explainers. A treasure trove of historical content that’s been heavily indexed and participant-weighted for relevancy. And the bulk of it predates the infestation of AI, so it’s valuable just as sampling data of original human content for further iterative development of ChatGPT and other LLMs.
I don’t know about the AI part. The major companies had plenty of time scraping everything on the internet, or am I simplifying the effort too much in my head?
Reddit remains as valuable as ever. It’s amusing that you think it imploded a year ago just because a small number of users migrated here
It sort of did: thousands of useful comments were turned to gibberish, the mobile web site turned to shit, and the mobile app stopped working properly for communities with specific content warnings.
Completely. Lemmy is far too small to have the value Reddit does.
I left Reddit due to their API bullshit, but I so miss all of the hobby communities I was part of, which had like-minded members and a plethora of resources. It ranges from not easy to impossible to start communities such as reeftanks, homesteading, literature, bookcirclejerk, etc. on a platform as small as Lemmy. And beyond starting one, the quality and quantity will never match Reddit’s because Lemmy just doesn’t have the same reach.
Lemmy is great if you like Linux, like Star Trek, or are trans, but other than that, it’s missing so, so many demographics that make a holistic platform.
trans
I feel this so hard, the sheer number of openly LGBTQ+ people here really skews the demographics of the site. I’m not saying it’s a problem, just saying that LGBTQ+ people are dramatically over-represented here. It’s an interesting contributor to lemmy culture, and I wonder how much that impacts homogeneity here (e.g. upvotes and downvotes for certain types of content).
But yeah, it’s missing a lot of demographics.
That said, I’m really into Linux (been using for >15 years), so that’s cool I guess.
As a cis straight man I’m taking this as a learning opportunity until the demographics level out. An inherently inclusive bias will be more helpful early on than more niche communities anyways.
Sure. Again, I’m not saying it’s bad, just that the bias seems to exist.
There are certainly worse biases that exist, such as very little representation from people on the right side of the spectrum, so hate against half the population seems to get a pass and downvotes silence constructive comments/posts just due to political bias. That’s incredibly frustrating, and I think the high focus on supporting LGBTQ+ people goes along with that (i.e. the message that conservatives “hate” LGBTQ+ people, which is only true for the more extreme end of conservatism).
That said, I do like the support LGBTQ+ people get, I just wish the demographics were a bit more diverse without sacrificing the culture. I live and work in a conservative area, but my company has built a pretty inclusive culture (at least for the area), so I think it’s totally possible.
Oh man I don’t miss that at all. Moderating out a pervasive delusion isn’t bias, any more than we’re biased in favor of a round Earth. On Reddit there were constant “enlightened centrists” who kept making appeals to moderation.
There’s nothing of value to be gained from conservatives. The “good” ones who don’t say the homophobia out loud are still voting for politicians who do. If it was just the extreme end, then Trump wouldn’t be their nominee. Hate is their normal now.
“If there’s a Nazi at the table and 10 other people sitting there talking to him, then you got a table with 11 Nazis.”
This is exactly what I’m talking about: casually dismissing half of the population based on little more than association. That drives division and pushes people into echo chambers.
Understood. I am disagreeing with you. If that wasn’t obvious, then I fear you may have missed my point.
Half of America supporting fascism is reason to create somewhere - anywhere - where that shit is shut down. You’re free to go associate with freeze peach Nazis on X, Facebook, Nostr, wherever. I don’t want any part of that and prefer a server that moderates them out. Paradox of tolerance and all that.
If you all believed the Earth was flat, then I would prefer the “echo chamber” of people saying “no, we checked, it’s round”. There simply being a lot of believers doesn’t imply an idea has merit, and we don’t have infinite time for BS.
A lot of older posts are still relevant to specific hobbies. I will look up information on paper, some guitar information, but most posts from the last two years are not worth looking at.
There is also so much regurgitated LLM shit.
Reddit CEO can shove reddit up his ass sideways. The whole thing.
He can put his dick in /dontputyourdickinthat
Aside: I give Lemmy serious props for not reproducing some of these communities btw.
and spez will pay you for creating content.
/s
u/spez, the former moderator of r/jailbait? Who might have connections to Ghislaine Maxwell? Him?
Oh and PS - posting OC porn on Reddit is a very weird process and not transparent
It would be interesting if any large companies got behind promoting and endorsing federated media to get around this sort of situation.
I suspect paying money is easier. And they probably assume their need to do so will be temporary. AGI will fix everything, right? Feel it. Feel the AGI.
Just what we need, more walled gardens and exclusivity deals. And of course, another way of monetizing your data, because we don’t have enough of that already.
Search results are already fucked enough as it is. We don’t need to start carving up the internet and dividing it among different search engines.
We shouldn’t accept this behavior or other companies will follow!
I’ve said it once and I’ll say it again. Either the information on your site is free to all or to none. You can’t have some people/entities pay and some not!
You can. We don’t need to like it, but they can. Besides, isn’t that how many magazines work? Pay for articles and such.
Not really, the people who write the articles are actually employed by those magazine companies, and everyone who wants to get one needs to pay for one.
Does LLMmy have a robots.txt against scrapers?
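On the robots.txt question, here is a rough way to check what a given robots.txt says about a given crawler, using Python's standard urllib.robotparser. This is only a sketch: the sample rules, the "ExampleBot" and "SomeBrowser" user agents, and the example.com URLs are all made up for illustration and are not the actual robots.txt of any Lemmy instance.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
# It blocks one named crawler everywhere and blocks /login for everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /login
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "SomeBrowser"):
        for path in ("/post/1", "/login"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

To check a live site instead, RobotFileParser also has set_url() and read() to fetch the file over the network. Keep in mind that robots.txt is only advisory: it tells well-behaved crawlers what the site owner wants, but it does nothing to stop a scraper that ignores it.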
What’s “reddit”?
I think it is when you use the toilet - it is what you flush away.