Chad scraper (image post, sh.itjust.works)
Posted by @[email protected] to Lemmy [email protected] • 2 years ago • 99 comments • 1.05K upvotes
Everyone loves the idea of scraping, no one likes maintaining scrapers that break once a week because the CSS or HTML changed.
spite can be a great motivator, though
This one. One of the best motivators. Sense of satisfaction when you get it working and you feel unstoppable (until the next subtle change happens, anyway)
I feel this
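On the "breaks once a week" point, one common way to buy some time is to try several selectors in order and fail loudly when none match. A rough sketch using only the standard library; the selector list and the `extract_price` helper are made up for illustration and not taken from anyone's scraper in this thread:

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class AttrTextFinder(HTMLParser):
    """Collect the text of the first element whose attribute contains a value."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr, self.value = attr, value
        self.depth = 0        # > 0 while inside the matched element
        self.done = False     # True once the matched element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1   # nested tag inside the match
        elif self.value in (dict(attrs).get(self.attr) or "").split():
            self.depth = 1    # this element is the match

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def first_text(html, attr, value):
    finder = AttrTextFinder(attr, value)
    finder.feed(html)
    return "".join(finder.chunks).strip() or None


# Hypothetical selectors, ordered from most to least specific.
PRICE_SELECTORS = [("id", "price"), ("class", "price"), ("data-testid", "product-price")]


def extract_price(html):
    for attr, value in PRICE_SELECTORS:
        text = first_text(html, attr, value)
        if text:
            return text
    # Failing loudly beats silently scraping empty fields for a week.
    raise LookupError("no price selector matched; the layout probably changed")
```

It won't survive a full redesign, but a renamed id or an extra wrapper tag stops being an outage, and the `LookupError` tells you the day the layout really changed.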
I loved scraping until my IP was blocked for botting lol. I know there are ways around it, it’s just work though
I successfully scraped millions of Amazon product listings simply by routing through Tor and cycling the exit node every 10 seconds.
That’s a good idea right there, I like that
This guy scrapes
lmao, yeah, get all the exit nodes banned from amazon.
That’s the neat thing, it wouldn’t, because traffic only spikes for 10s on any particular node. It perfectly blends into the background noise.
Cue Office Space-style error and scrape for 10 hours on each node.
You guys use IPs?
I’m coding baby’s first bot over here lol, I could probably do better
Token ring for me baybeee
Just use AI to make changes ¯_(ツ)_/¯
Here take these: \\
¯_(ツ)_/¯\\ Thanks
Or in the case of Wikipedia, every table on successive pages for sequential data is formatted differently.
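For the inconsistent-tables problem, something like a header alias map can help: map whatever each page calls a column onto one canonical name, then read rows by name instead of by position. A minimal sketch, assuming the tables are already parsed into lists of rows; the alias entries and function names here are hypothetical:

```python
import re

# Hypothetical aliases: every header variant seen so far -> canonical name.
HEADER_ALIASES = {
    "year": "year",
    "season": "year",
    "winner": "winner",
    "champion": "winner",
    "champions": "winner",
}


def canonical(header):
    """Return the canonical column name, or None for columns we don't track."""
    key = re.sub(r"\[[^\]]*\]", "", header)             # drop footnote marks like [1]
    key = re.sub(r"[^a-z0-9]+", " ", key.lower()).strip()
    return HEADER_ALIASES.get(key)


def normalize_table(rows):
    """Turn [header_row, *data_rows] into a list of dicts with canonical keys."""
    header, *body = rows
    columns = [canonical(h) for h in header]
    return [
        {name: cell.strip() for name, cell in zip(columns, row) if name}
        for row in body
    ]
```

Column order and extra columns stop mattering, and an unknown header shows up as a missing key you can check for, rather than as data silently landing in the wrong field.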