sayoonarachu t1_j0zlw4i wrote
Reply to comment by macORnvidia in laptop for Data Science and Scientific Computing: proart vs legion 7i vs thinkpad p16/p1-gen5 by macORnvidia
No. I was just using pandas (CPU) for simple, quick regex work and for removing and replacing text rows. It was just for a hobby project. The data was scraped from the Midjourney and Stable Diffusion Discord servers, so there were millions of rows of duplicate and poor-quality prompts, which I had pandas delete. In the end, the unique rows with more than 50 characters came to about 700k, which I then used to train GPT-Neo 125M.
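For anyone curious, that kind of cleanup can be sketched in a few lines of pandas. This is a minimal sketch under assumptions, not the exact script used. It assumes the scraped prompts sit in a DataFrame column named `prompt`, and `clean_prompts` is a hypothetical helper name. The whitespace-collapsing regex stands in for whatever regex cleanup was actually done. Only the "unique rows with more than 50 characters" rule comes from the comment above.

```python
import pandas as pd


def clean_prompts(df: pd.DataFrame, column: str = "prompt", min_chars: int = 50) -> pd.DataFrame:
    """Hypothetical helper: normalize, length-filter and deduplicate prompt rows."""
    # Drop rows with no prompt text at all.
    df = df.dropna(subset=[column])
    # Assumed regex cleanup: collapse runs of whitespace into one space and trim the ends.
    text = (
        df[column]
        .astype(str)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )
    out = df.assign(**{column: text})
    # Keep only prompts with strictly more than `min_chars` characters.
    out = out[out[column].str.len() > min_chars]
    # Remove duplicates after normalization, keeping the first occurrence.
    out = out.drop_duplicates(subset=column, keep="first")
    return out.reset_index(drop=True)


if __name__ == "__main__":
    raw = pd.DataFrame(
        {
            "prompt": [
                "a highly detailed oil painting of a lighthouse at dusk, dramatic lighting",
                "a highly detailed oil painting of a lighthouse  at dusk, dramatic lighting ",
                "cat",
                None,
            ]
        }
    )
    # Only the first prompt survives: the second duplicates it once whitespace is
    # normalized, "cat" is too short, and the missing value is dropped.
    print(clean_prompts(raw))
```

Normalizing before `drop_duplicates` matters here. Prompts that differ only in spacing count as the same row, so they get deduplicated instead of slipping through as "unique".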
I didn't know about cuDF. Thanks 😅