
niclas_wue OP t1_j4fqqy6 wrote

Thanks for asking! My first prototype collected all new arXiv papers in certain ML-related categories via the API, but I quickly realized that this would be way too costly. Right now, I collect all papers from PapersWithCode's "Top" tab (last 30 days) and its "Social" tab, which is based on Twitter likes and retweets. Finally, I filter using this formula:

p.number_of_likes + p.number_of_retweets > 20 or p.number_github_stars > 100
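
As a rough sketch, here is how that filter could look in plain Python. Only the three field names come from the formula above; the `Paper` class, the helper names, and the sample values are made up for illustration and are not the actual pipeline code.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical container; only the three counters mirror the formula.
    title: str
    number_of_likes: int = 0
    number_of_retweets: int = 0
    number_github_stars: int = 0


def passes_filter(p: Paper) -> bool:
    # Keep a paper if it has enough Twitter engagement OR enough GitHub stars.
    # Both comparisons are strict, so exactly 20 or exactly 100 does not pass.
    return (
        p.number_of_likes + p.number_of_retweets > 20
        or p.number_github_stars > 100
    )


def filter_papers(papers: list[Paper]) -> list[Paper]:
    return [p for p in papers if passes_filter(p)]


# Made-up sample data to show the boundary behaviour.
candidates = [
    Paper("A", number_of_likes=15, number_of_retweets=10),  # 25 > 20
    Paper("B", number_of_likes=5, number_of_retweets=5, number_github_stars=250),
    Paper("C", number_of_likes=10, number_of_retweets=10, number_github_stars=100),
]
kept = filter_papers(candidates)
print([p.title for p in kept])
```

Paper "C" is dropped because it sits exactly on both thresholds and the comparisons are strict.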

In rare cases, when the paper is really long or not parsable with "grobid", I will exclude the paper for now.
