
in-your-own-words t1_irmnurg wrote

Yes, there are dozens of ways of doing it. I encourage you to figure it out yourself. If you can't design and implement your own test & evaluation experiments for ML, you will end up doing the world more harm than good by dabbling in it. The entire ML field suffers from extremely weak T&E, and from lots of people who have just learned to stuff inputs into functions.

Some hints:

  • There may be functions within standard machine learning software libraries that produce train/test splits given tabular data input (see the first sketch after this list).

  • There may be functions that will produce a random permutation of rows of a table.

  • There may be functions that produce random permutations of the numbers 0 to N-1, where you specify N. If N is the number of rows in your table, you could create a new column of these random numbers and then sort the table on that column (both this and the previous hint appear in the second sketch below).

  • You may want to consider class imbalance in your dataset. If your classes are imbalanced, apply your train/test split independently to class 1 and class 0 (i.e., a stratified split) so that the resulting train and test partitions each contain the same proportion of 1s and 0s. The first sketch below does this with a single argument.

  • Consider using an outer cross-validation approach, where you run your experiment for k different train/test splits. When you report your metrics, look at the distribution of each metric over the k experiments: report the median, interquartile range, 5th and 95th percentiles, and outliers for each metric over the k trials (third sketch below).

  • Version control your code and tag the commit that produces the results you report. Include this tag or the commit hash with your reported results (last sketch below).
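
Rough sketches of a few of these in Python, assuming pandas, numpy, and scikit-learn. The names df, label, and the CSV path are placeholders for your own data, not anything standard.

First, hints 1 and 4 together: scikit-learn's train_test_split both splits a table and, via its stratify argument, keeps the class proportions equal across the two partitions.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("your_data.csv")   # placeholder path
    X = df.drop(columns=["label"])      # placeholder label column name
    y = df["label"]

    # stratify=y keeps the 1/0 proportions the same in train and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )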

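For hints 2 and 3, numpy's permutation covers both shuffling styles: permute the row indices directly, or attach a random-order column and sort on it.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"x": range(10), "label": [0, 1] * 5})  # toy table
    rng = np.random.default_rng(seed=42)

    # Style A: permute indices 0..N-1 and reindex the rows
    df_shuffled = df.iloc[rng.permutation(len(df))].reset_index(drop=True)

    # Style B: random-order column, then sort on it
    df["_order"] = rng.permutation(len(df))
    df_shuffled = df.sort_values("_order").drop(columns="_order")

    # slice the shuffled table into train/test, e.g. 80/20
    cut = int(0.8 * len(df_shuffled))
    train, test = df_shuffled.iloc[:cut], df_shuffled.iloc[cut:]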
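
For hint 5, repeat the split k times and summarize the distribution of each metric instead of reporting a single number. A sketch with synthetic data and F1 as the metric; swap in your own data, model, and metrics:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=0)  # stand-in data

    scores = []
    for k in range(20):  # 20 independent train/test splits
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=k
        )
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(f1_score(y_te, model.predict(X_te)))

    scores = np.array(scores)
    q5, q25, med, q75, q95 = np.percentile(scores, [5, 25, 50, 75, 95])
    print(f"median={med:.3f}  IQR=[{q25:.3f}, {q75:.3f}]  5th/95th=[{q5:.3f}, {q95:.3f}]")

    # flag outliers with the usual 1.5*IQR rule
    iqr = q75 - q25
    print("outliers:", scores[(scores < q25 - 1.5 * iqr) | (scores > q75 + 1.5 * iqr)])

And for hint 6, tag the producing commit with plain git (git tag results-v1, then git push --tags) and stamp the hash into your output from inside the script, assuming it runs in a git checkout:

    import subprocess

    # record the exact code version that produced these results
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    print(f"results produced at commit {commit}")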