Main_Mathematician77 t1_j8h1by4 wrote

IMO you’re not going to be able to provide a reliable service right now with out-of-the-box solutions. The systems aren’t reliable enough to be certain, especially since false positives can defame someone.

ateqio OP t1_j8h1po3 wrote

I'm totally aware of that, and I will be putting a disclaimer on the front page, not burying it in a Terms and Conditions link somewhere.

Because they don't state their limitations explicitly, the tools currently available can ruin a student's life.

I want to address that issue by providing a solution that comes up at the top of search results and informs professors about its limitations as explicitly as possible.

Main_Mathematician77 t1_j8h3v8z wrote

The closest thing I can think of is LAION's attribution-style k-NN index search over their LAION-5B image dataset. A similar approach could be used for text: search over a corpus of generated text for samples similar to the one being checked. Again, there's no guarantee, but it's fairly interpretable. The problem is that the dataset of ChatGPT generations from 100M users is growing fast, and searching over it is probably impractical at current pricing. Also, as you said, using GPT-2 to measure perplexity is good for catching GPT-generated text, but it's not a perfect solution IMO.
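A rough sketch of the two ideas mentioned, purely for illustration. The k-NN part ranks a corpus of known generations by similarity to the submitted text; here a toy bag-of-words cosine similarity stands in for a learned text encoder and an approximate nearest-neighbour index, which a real system would need. The perplexity part assumes you already have per-token log-probabilities from a model such as GPT-2 (not computed here). All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a text encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_attribution(query: str, corpus: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Brute-force k-NN: return the k corpus samples most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


if __name__ == "__main__":
    # Hypothetical corpus of previously seen generations.
    corpus = [
        "the cat sat on the mat",
        "stock prices rose sharply today",
        "a cat sat on a mat",
    ]
    for score, doc in knn_attribution("the cat sat on the mat", corpus, k=2):
        print(f"{score:.2f}  {doc}")
    # Four tokens, each assigned probability 0.5 by the model.
    print(perplexity([math.log(0.5)] * 4))
```

The similarity scores are only evidence to show a human reviewer, not a verdict, and any perplexity cutoff for "likely generated" would need careful calibration, since human-written text can also score low.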
