
TheLionKing2020 t1_iu1bcw8 wrote

Well, you don't need to train on all of this data.

First take samples of 10k, 50k, and 100k and see whether the results differ. Do you get a different number of clusters?
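A rough sketch of that subsampling check, in case it helps. Everything specific here is an assumption for illustration, not something from the thread: the data is synthetic (`make_blobs`), the clusterer is scikit-learn's `KMeans`, the number of clusters is picked by silhouette score, and the helper names `best_k` and `clusters_per_sample_size` are made up. The demo sizes are also much smaller than 10k/50k/100k so it runs quickly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k(X, k_range=range(2, 7), seed=0):
    """Return the k in k_range with the highest silhouette score (hypothetical helper)."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        # Silhouette is quadratic in the number of points, so score on at most 2000 of them.
        scores[k] = silhouette_score(
            X, labels, sample_size=min(2000, len(X)), random_state=seed
        )
    return max(scores, key=scores.get)


def clusters_per_sample_size(X, sizes, seed=0):
    """Draw one random subsample per size and estimate the number of clusters on each."""
    rng = np.random.default_rng(seed)
    result = {}
    for n in sizes:
        idx = rng.choice(len(X), size=min(n, len(X)), replace=False)
        result[n] = best_k(X[idx], seed=seed)
    return result


# Synthetic stand-in for the real dataset (assumption: 5000 points, 8 features).
X, _ = make_blobs(n_samples=5000, n_features=8, centers=4, random_state=0)

# Small sizes for the demo; swap in [10_000, 50_000, 100_000] on real data.
found = clusters_per_sample_size(X, sizes=[500, 1000, 2000])
print(found)
```

If the estimated number of clusters stops changing as the sample grows, training on the full dataset is unlikely to tell you much more.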

3

jesusfbes OP t1_iu3915z wrote

That was my initial idea, and it is probably what I will do. However, it is good to know about efficient approaches.

2

TheLionKing2020 t1_iu3g14t wrote

Also, before running tests on 100k samples, check whether you can reduce the number of dimensions: feature selection, dropping low-variance features, PCA, etc.
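One possible way to chain two of those steps, sketched with scikit-learn. The data is synthetic and the settings are assumptions to tune on real data: a variance threshold of `1e-8` to drop (near-)constant columns, standardization before PCA, and PCA keeping enough components to explain 95% of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (assumption): 15 noisy linear mixes of 5 latent factors,
# plus 5 constant columns that carry no information.
latent = rng.normal(size=(1000, 5))
mixed = latent @ rng.normal(size=(5, 15)) + 0.01 * rng.normal(size=(1000, 15))
constant = np.ones((1000, 5))
X = np.hstack([mixed, constant])

reducer = make_pipeline(
    VarianceThreshold(threshold=1e-8),          # drop (near-)constant features
    StandardScaler(),                           # put features on a common scale
    PCA(n_components=0.95, svd_solver="full"),  # keep 95% of the variance
)
X_reduced = reducer.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
```

Clustering on `X_reduced` instead of `X` is cheaper per iteration, though whether the clusters survive the projection is something to check on your own data.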

1