Viewing a single comment thread. View all comments

resented_ape t1_iybprpj wrote

FWIW, I don't think that is what the paper demonstrates. The Picasso method the authors introduce uses a totally different cost function based on distance reconstruction. For a specific set of metrics the authors are interested in, they say that Picasso produces results comparable with UMAP and t-SNE. But it's not the UMAP or t-SNE objective.

With the scRNAseq dataset in the python notebook at the github page for picasso, I found that for the metrics one might usually be interested in with UMAP and t-SNE, e.g. neighborhood preservation (what proportion of the k-nearest neighbors in the input and output space are preserved) or Spearman rank correlation (or triplet ordering preservation) of input vs output distances, Picasso did (quite a bit) worse than UMAP and t-SNE.

This might not be relevant to for downstream scRNAseq workflows -- I will take the authors' word on that. At any rate, on my machine Picasso runs very slowly, and I found its own output to be visually unsatisfactory with the default settings with other datasets that I tried it with (e.g. MNIST), so I have been unable to generate a similar analysis for a wide range of datasets. So take that for what it's worth.

1