
sothatsit t1_j5hhb31 wrote

I’ve actually done some work on this, and the real obstacles are:

  1. You’d need a lot of text from other sources with people’s real names.
  2. You’d need the user to have written a lot of Reddit comments or posts.
  3. The user’s writing style would need to match between Reddit and your other source.
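
To make those three requirements concrete, here is a rough sketch of the kind of style matching involved. It is not how Tyche works and does not use its API; it is just a generic character n-gram comparison, and the function names (`char_ngrams`, `cosine_similarity`, `best_match`) and the sample data are made up for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(anonymous_text, named_samples, n=3):
    """Return the name whose sample is most similar to the anonymous text, plus all scores.

    named_samples is a hypothetical dict mapping a real name to text known
    to be written by that person (requirement 1 in the list above).
    """
    profile = char_ngrams(anonymous_text, n)
    scores = {
        name: cosine_similarity(profile, char_ngrams(text, n))
        for name, text in named_samples.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data only; real attribution needs far more text per author.
    named = {
        "alice": "the quick brown fox jumps over the lazy dog",
        "bob": "zzzz yyyy xxxx wwww",
    }
    name, scores = best_match("the quick brown fox", named)
    print(name, scores)
```

With only a sentence or two per person the scores are mostly noise, which is why points 1 and 2 matter. And if someone writes differently on Reddit than in the named source, the profiles won’t line up even with plenty of text, which is point 3.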

If you’re interested though, I made the following library for my Master’s thesis, which can be used for this: https://github.com/TycheLibrary/Tyche

However, it would need more work to get close to identifying thousands, never mind millions, of users.
