blablanonymous t1_iswe0za wrote on October 19, 2022 at 4:38 AM

Reply to comment by gravitas_shortage in [D] GPT-3 is a DREAM for citation-farmers - Threat Model Tuesday #1 by TiredOldCrow

Won’t you need a labeled training set to make that work?

stevewithaweave t1_isy1dji wrote on October 19, 2022 at 3:13 PM

I think you generate your own fake papers, label them as fake, and mix them in with real papers.
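A minimal sketch of that idea, assuming a hypothetical `generate_fake_paper` helper standing in for whatever text generator (GPT-3 or otherwise) you would actually use. The function names, the toy topics, and the 0/1 label convention are all illustrative assumptions, not something from the original post.

```python
import random

def generate_fake_paper(rng):
    """Hypothetical stand-in for a real text generator such as GPT-3."""
    topics = ["graph networks", "protein folding", "speech recognition"]
    return f"We propose a novel approach to {rng.choice(topics)}."

def build_labeled_set(real_papers, n_fake, seed=0):
    """Label real papers 0 and self-generated fakes 1, then shuffle them together."""
    rng = random.Random(seed)
    examples = [(text, 0) for text in real_papers]
    examples += [(generate_fake_paper(rng), 1) for _ in range(n_fake)]
    rng.shuffle(examples)
    return examples

real = ["Paper A abstract.", "Paper B abstract."]
dataset = build_labeled_set(real, n_fake=3)
```

The labels come for free because you know which texts you generated yourself; the open question is how trustworthy the "real" side is.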

blablanonymous t1_isystb6 wrote on October 19, 2022 at 6:13 PM

But wouldn’t you need a set of real papers that you’re actually very confident are real?

stevewithaweave t1_isyt7es wrote on October 19, 2022 at 6:15 PM

Anything before 2005 lol

blablanonymous t1_isytbad wrote on October 19, 2022 at 6:16 PM

🤣😂

the_mighty_skeetadon t1_iszprhg wrote on October 19, 2022 at 9:42 PM

That can't be the only method, because if your model for generating fake papers differs significantly from somebody else's model, you will be both unable to detect those fake papers and unable to detect that you're failing.

A better approach is to label fake papers that journals have rejected as such, and to synthetically generate more fake papers using a wide variety of known approaches.
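One way to at least notice that failure mode is to hold out one generator at a time and measure recall on its fakes. A rough sketch, assuming a deliberately naive toy detector (`train_token_detector`) and made-up example texts; a real detector and real generators would look nothing like this, but the evaluation loop would have the same shape.

```python
def train_token_detector(real_docs, fake_docs):
    """Toy detector: flags any document containing a token seen only in fake training docs."""
    real_tokens = {tok for doc in real_docs for tok in doc.lower().split()}
    fake_tokens = {tok for doc in fake_docs for tok in doc.lower().split()}
    fake_only = fake_tokens - real_tokens
    return lambda doc: any(tok in fake_only for tok in doc.lower().split())

def leave_one_generator_out(real_docs, fakes_by_generator, train_fn):
    """Train on fakes from all generators but one; report recall on the held-out one."""
    recalls = {}
    for held_out, held_out_fakes in fakes_by_generator.items():
        train_fakes = [
            doc
            for name, docs in fakes_by_generator.items()
            if name != held_out
            for doc in docs
        ]
        detector = train_fn(real_docs, train_fakes)
        hits = sum(detector(doc) for doc in held_out_fakes)
        recalls[held_out] = hits / len(held_out_fakes)
    return recalls

# Made-up toy data: each "generator" has its own tell-tale vocabulary.
real_docs = [
    "we measure error rates on two benchmarks",
    "we report results on a public dataset",
]
fakes_by_generator = {
    "generator_a": ["we propose a novel paradigm", "this novel paradigm beats benchmarks"],
    "generator_b": ["our groundbreaking framework wins", "a groundbreaking framework for results"],
}
recalls = leave_one_generator_out(real_docs, fakes_by_generator, train_token_detector)
```

On this toy data the detector catches none of the held-out generator's fakes in either direction, which is exactly the blind spot described above. Training on fakes from both generators fixes it here, but only for generators you already know about.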

stevewithaweave t1_iszusz3 wrote on October 19, 2022 at 10:17 PM

I think the original commenter was referring to an architecture similar to GANs. I agree that including examples of fake papers would improve the model, but it is not required.
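For reference, the standard GAN objective trains a generator G and a discriminator D against each other. Here x would be a real paper, z the generator's random input, and D(x) the discriminator's estimated probability that x is real; mapping those symbols onto papers is my assumption about what the original commenter had in mind.

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The only labeled data this needs is a pool of real papers; the fakes come from G itself, which is why external fake examples help but are not strictly required.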