Submitted by Simusid t3_11okrni in MachineLearning
Simusid OP t1_jbt91tb wrote
Reply to comment by ID4gotten in [Discussion] Compare OpenAI and SentenceTransformer Sentence Embeddings by Simusid
My main goal was just to visualize the embeddings to see if they are grossly different. They are not. That is just a qualitative view. My second goal was to use the embeddings with a trivial supervised classifier. The dataset has four labels, so I made a generic network to see if there was any consistency in the training. Regardless of hyperparameters, the OpenAI embeddings always seemed to outperform the SentenceTransformer embeddings, slightly but consistently.
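The classification comparison described above (one simple classifier, two sets of embeddings, four labels) can be sketched roughly as follows. This is a hypothetical harness, not the OP's code: it uses synthetic Gaussian-blob vectors in place of real OpenAI or SentenceTransformer embeddings, and a pure-Python softmax (multinomial logistic) regression in place of the OP's "generic network". Repeating the comparison over several shuffled splits is what lets you say one embedding wins "consistently" rather than once.

```python
import math
import random

N_CLASSES = 4  # the dataset in the thread has four labels


def make_synthetic_embeddings(n_per_class, dim, noise, seed):
    """Hypothetical stand-in for real embeddings: one Gaussian blob per label."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(N_CLASSES)]
    X, y = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_class):
            X.append([c + rng.gauss(0.0, noise) for c in center])
            y.append(label)
    return X, y


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax_regression(X, y, lr=0.05, epochs=30):
    """Multinomial logistic regression fitted with per-sample gradient descent."""
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(N_CLASSES)]
    b = [0.0] * N_CLASSES
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(N_CLASSES)])
            for k in range(N_CLASSES):
                grad = p[k] - (1.0 if k == label else 0.0)  # d(cross-entropy)/d(score_k)
                for j in range(dim):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict(W, b, x):
    scores = [sum(w * v for w, v in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return max(range(N_CLASSES), key=lambda k: scores[k])


def holdout_accuracy(X, y, seed, test_frac=0.25):
    """Shuffle with `seed`, train on the rest, report accuracy on the held-out split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    W, b = train_softmax_regression([X[i] for i in train], [y[i] for i in train])
    return sum(predict(W, b, X[i]) == y[i] for i in test) / n_test


def compare_embeddings(X_a, X_b, y, seeds):
    """Same classifier and same splits for both embedding sets; per-seed accuracy pairs."""
    return [(holdout_accuracy(X_a, y, s), holdout_accuracy(X_b, y, s)) for s in seeds]


if __name__ == "__main__":
    X_a, y = make_synthetic_embeddings(n_per_class=40, dim=8, noise=0.5, seed=0)
    X_b, _ = make_synthetic_embeddings(n_per_class=40, dim=8, noise=1.5, seed=1)
    results = compare_embeddings(X_a, X_b, y, seeds=range(5))
    wins = sum(a > b for a, b in results)
    print(f"embedding A beat B on {wins} of {len(results)} splits")
```

With real data, `X_a` and `X_b` would be the two models' vectors for the same sentences in the same order; sweeping `lr` or `epochs` as well would mirror the "regardless of hyperparameters" check.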
This was not meant to be rigorous. I did this to get a general feel of the quality of the embeddings, plus to get a little experience with the OpenAI API.
quitenominal t1_jbtr6g7 wrote
FWIW, this has also been my finding when comparing these two embeddings for classification tasks: better, but not enough to justify the cost.
polandtown t1_jbu2zqe wrote
Learning here, but how are your axes defined? Some kind of factor(s) or component(s) extracted from each individual embedding? Thanks for the visualization; it made me curious and interested! Good work!
Simusid OP t1_jbu3q8m wrote
Here is some explanation about UMAP axes and why they should usually be ignored: https://stats.stackexchange.com/questions/527235/how-to-interpret-axis-of-umap
Basically it's because the mapping is nonlinear, so the axes don't correspond to fixed directions in the original embedding space.
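Why the axes carry no meaning can also be checked directly: UMAP's low-dimensional objective depends only on distances between embedded points, so any rotation or reflection of a layout would score the same while changing what "x" and "y" mean. The sketch below uses random 2-D points as a hypothetical stand-in for a UMAP layout and checks that each point's nearest neighbours are unchanged after rotating or mirroring it.

```python
import math
import random


def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def mirror(points):
    """Reflect 2-D points across the vertical axis."""
    return [(-x, y) for x, y in points]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for a 2-D UMAP layout: random points.
    layout = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    for name, variant in [("rotated", rotate(layout, 1.0)), ("mirrored", mirror(layout))]:
        same = knn_sets(layout, 5) == knn_sets(variant, 5)
        print(f"{name}: same 5-NN sets as original? {same}")
```

The coordinates of every point change, but the neighbourhood structure, which is what UMAP actually optimizes, does not.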
onkus t1_jbwftny wrote
Doesn’t this also make it essentially impossible to compare the two figures you’ve shown?
Thog78 t1_jbyh4w1 wrote
What you're looking for when comparing UMAPs is whether the local relationships are the same. Try to recognize clusters and check which clusters neighbor each other and whether they are distinct. Coloring the points by a much finer clustering computed on another reduction (typically PCA) helps with that. Without clustering, you can only try to recognize landmarks from their size and shape.
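The "are the local relationships the same" check can also be done numerically, without reading either plot's axes. One common option (swapped in here; the comment itself suggests coloring by a PCA-based clustering) is k-nearest-neighbour overlap: for each item, find its k nearest neighbours under each embedding and measure how many are shared. The sketch assumes both embeddings list the same sentences in the same order, and uses cosine similarity as an assumed metric because it is common for sentence embeddings. It works on the original high-dimensional vectors, which need not have the same dimension, or on the two 2-D UMAP layouts.

```python
import math
import random


def cosine_knn(vectors, k):
    """Indices of each vector's k most cosine-similar other vectors."""
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in vectors]
    neighbours = []
    for i, a in enumerate(vectors):
        sims = []
        for j, b in enumerate(vectors):
            if j == i:
                continue
            dot = sum(x * y for x, y in zip(a, b))
            sims.append((dot / (norms[i] * norms[j]), j))
        sims.sort(reverse=True)  # most similar first
        neighbours.append({j for _, j in sims[:k]})
    return neighbours


def knn_overlap(emb_a, emb_b, k=10):
    """Mean fraction of each item's k nearest neighbours shared by both embeddings.

    Row i of emb_a and row i of emb_b must describe the same item; the two
    embeddings may have different dimensions.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both embeddings must cover the same items in the same order")
    pairs = zip(cosine_knn(emb_a, k), cosine_knn(emb_b, k))
    shared = sum(len(na & nb) for na, nb in pairs)
    return shared / (k * len(emb_a))


if __name__ == "__main__":
    rng = random.Random(0)
    emb_a = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
    emb_scaled = [[2.0 * v for v in row] for row in emb_a]  # same geometry, rescaled
    emb_unrelated = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(60)]  # other dim
    print(knn_overlap(emb_a, emb_scaled, k=5))
    print(knn_overlap(emb_a, emb_unrelated, k=5))
```

A score near 1 means the two models agree on each sentence's neighbours; unrelated embeddings land near k/(n-1), the overlap expected by chance.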
polandtown t1_jbu56lb wrote
Thanks!