Loquzofaricoalaphar OP t1_j5h59id wrote

So, like, if you fed it samples from the 200 people you were looking for and then fed it Reddit? Perhaps all of Reddit would be tricky, because some people might not have public text, and it would be difficult to label all the text on Facebook, LinkedIn, etc.

2

PredictorX1 t1_j5h5pb5 wrote

The biggest technical challenges I see:

  1. Having enough reference samples from known people
  2. The difference between how people write on Reddit and how they write elsewhere (professional articles, e-mail, etc., which would presumably serve as the reference)
  3. If too many Reddit users are being considered, it may all dissolve into mush (estimated probabilities would all be low)
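
To make point 3 concrete, here is a rough sketch of how it could play out. Everything in it is an assumption for illustration, not a method anyone in this thread has described: character trigram counts as the style features, cosine similarity as the score, a softmax with an arbitrary temperature to turn scores into probabilities, and made-up names and sample texts.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams in the lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribution_probs(unknown, references, temperature=0.1):
    """Softmax over similarities; `references` maps candidate -> known text.

    The temperature is an arbitrary illustrative choice, not a tuned value.
    """
    target = trigram_profile(unknown)
    sims = {name: cosine(target, trigram_profile(text))
            for name, text in references.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {name: math.exp((s - top) / temperature) for name, s in sims.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical reference samples and an unknown post (all invented).
few = {
    "alice": "honestly i reckon the whole thing is a bit overblown, honestly",
    "bob": "Per my previous analysis, the regression results indicate significance.",
}
unknown = "honestly i reckon this is a bit overblown too"

# Same two candidates plus 50 invented filler candidates.
many = dict(few)
for i in range(50):
    many[f"user{i}"] = f"user {i} writes some generic filler text about nothing much"

p_few = attribution_probs(unknown, few)
p_many = attribution_probs(unknown, many)
print("alice among 2 candidates: ", round(p_few["alice"], 3))
print("alice among 52 candidates:", round(p_many["alice"], 3))
```

Each extra candidate adds a positive term to the softmax denominator, so the probability assigned to any one candidate can only shrink as the pool grows, even when the ranking stays the same. With real data the pool would be far larger and the features far richer, so treat this as a toy.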
3

Loquzofaricoalaphar OP t1_j5h6s4z wrote

That is interesting to think about. I’m biased to think text patterns have lots of variables and are fairly unique. Perhaps analyzing it at scale without getting mush is more of a modeling problem than a compute problem.

1