
tyler1128 t1_jc1vmhb wrote

I do wonder about the legality of offering a side market for buying Twitter data scraped from what the website freely gives you. I'm sure those "hacker" forums still sell SOCKS proxy lists on the daily. That plus Beautiful Soup and not being stupid in how you do it should make it both a weekend-prototype-level project and pretty cheap. It's been a while since I've done something like that, but SOCKS proxies are a dime a dozen, more or less. Now, you are probably utilizing hacked servers, but you aren't hacking them yourself, so pleading ignorance would probably do just fine. Plus, Twitter is hardly capable of staying up right now; I'm not sure their scraper detection is exactly "state of the art".

6

FamousSuccess t1_jc27bzt wrote

I'm not sure the data itself will be sold, rather than just the tools to gather it.

Even still, from what I've seen in the past, not much stands in the way of "ownership" of tweets/FB posts/social media. It tends to fall in the public IP territory.

6

tyler1128 t1_jc27nqp wrote

I'm personally thinking about writing a service to sell the data at something like 1/10,000th the cost Twitter is charging, or less. It'd cache most of the tweet data in LRU form, up to a specific data limit, in a central database, and dynamically grab new data when it isn't already there. There'd also be a constantly running scraper for new data to throw into the central DB cache. The only thing stopping me is understanding the legal ramifications. On-demand access to historical data is too slow for large cohorts.
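
Roughly, the cache part could look like the sketch below. This is only a minimal in-memory illustration of "LRU up to a data limit, fetch on a miss", not a real design: `ByteLimitedLRU`, `max_bytes`, and the `fetch` callback are names made up for the example, and an actual service would sit on a proper database rather than a Python dict.

```python
from collections import OrderedDict
from typing import Callable, Optional


class ByteLimitedLRU:
    """LRU cache that evicts least-recently-used entries past a byte budget.

    `fetch` is a hypothetical callback that loads a value on a cache miss
    and returns None when nothing is available for the key.
    """

    def __init__(self, max_bytes: int, fetch: Callable[[str], Optional[str]]):
        self.max_bytes = max_bytes
        self.fetch = fetch
        self.used_bytes = 0
        self._items: "OrderedDict[str, str]" = OrderedDict()

    @staticmethod
    def _size(value: str) -> int:
        # Approximate an entry's cost by its UTF-8 encoded length.
        return len(value.encode("utf-8"))

    def put(self, key: str, value: str) -> None:
        # Drop any previous entry for this key so the byte count stays accurate.
        if key in self._items:
            self.used_bytes -= self._size(self._items.pop(key))
        size = self._size(value)
        if size > self.max_bytes:
            return  # a single value larger than the whole budget is not cached
        self._items[key] = value  # newest entries live at the end
        self.used_bytes += size
        # Evict from the front (least recently used) until back under budget.
        while self.used_bytes > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= self._size(evicted)

    def get(self, key: str) -> Optional[str]:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = self.fetch(key)  # miss: load on demand
        if value is not None:
            self.put(key, value)
        return value
```

The constantly running scraper would just call `put` as new data arrives, while readers go through `get`, so hot entries stay cached and cold ones fall out once the limit is reached.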

3

FamousSuccess t1_jc2ry05 wrote

Well. Keep in mind that Google effectively sells advertising based on user data, and their services/users depend entirely on the content and data of non-Google entities.

So I’d say if Google can build a business on other entities’ public data, so can you.

Not a perfect parallel, but a parallel nonetheless.

1

dubiousadvocate t1_jc2jom8 wrote

I don’t think legality enters into it. At worst it’s a EULA violation, like on any public-facing website. Grounds for banning the account, but these would be throwaway accounts to begin with. Musk would whine about it, but he’d probably also embrace the artificial user numbers at the same time.

One thing we’ve all learned about the man during this debacle is he’s self-destructively impulsive and undisciplined.

2

Mr_ToDo t1_jc2lulp wrote

Well, it doesn't use the API, and assuming that it doesn't use a login, then it's probably not bound by the EULA, since it would all be public data with no agreement required to see it.

Could be a bit of fun if it removes the login prompt, but that prompt is pretty random normally, and if there isn't an actual hard limit on what you can load, then removing it is likely just a technicality at best. (It seems more concerned about how long I stare at old tweets than how far down I scroll. I know sometimes I've gone years down if I don't stop scrolling.)

2

haux_haux t1_jc3q8f1 wrote

Didn't LinkedIn sue an organisation for scraping a while back? Did that fly?

1

bobartig t1_jc42cxs wrote

There’s been a lot of misreporting regarding the recent hiQ v. LinkedIn case from the 9th Circuit. The best write-up I've encountered is by an Internet and web scraping attorney, Kieran McCarthy.

The key takeaway is that in the 9th Circuit (which has the most developed law in this area) scraping a publicly available website doesn’t necessarily constitute a CFAA violation, but that doesn’t mean that what you did was legal, or that you won’t face legal liability.

3

dubiousadvocate t1_jc3uc8z wrote

I haven't heard about that. I'm curious too.

Of course, anyone can file a SLAPP lawsuit and hope to deter legal behavior through financial burden.

1