How to choose the best parameters for ensuring locally coherent structures with maximum spread #703
Unanswered
PranayMehta
asked this question in
Q&A
Replies: 0 comments
I am working on survey text data and want to cluster the survey responses into multiple topics. For this, I use a BERT model to transform each record into a 768-dimensional vector. Since I want to apply dimensionality reduction, I am trying out UMAP to see whether a 2D plane retains the structure.
Sample data points would look like
sentences = ['I am worried about the weather tomorrow', 'I have a lot of homework', 'I am yet to take a covid19 vaccine']
In general, I have found this library and algorithm super helpful for identifying regions in the 2D plane. As an example, there are a bunch of responses that talk about topic A, others about topic B, others about topic C, and so on. My task is to keep these topics as isolated as possible.
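The pipeline described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the `sentence-transformers` package and the `all-mpnet-base-v2` model (which produces 768-dimensional vectors) stand in for "BERT model", the helper names `build_umap_kwargs`, `embed`, and `project_2d` are hypothetical, and the default values are just starting points rather than recommendations.

```python
def build_umap_kwargs(n_neighbors=15, min_dist=0.1, spread=1.0, metric="cosine"):
    """Collect the UMAP settings in one place (hypothetical helper)."""
    if min_dist > spread:
        # umap-learn rejects this combination as well; fail early with the same rule.
        raise ValueError("min_dist must be less than or equal to spread")
    return {
        "n_components": 2,        # project onto a 2D plane
        "n_neighbors": n_neighbors,
        "min_dist": min_dist,
        "spread": spread,
        "metric": metric,         # cosine is a common choice for text embeddings
        "random_state": 42,       # fixed seed so plots are comparable between runs
    }


def embed(sentences, model_name="all-mpnet-base-v2"):
    """Encode sentences into vectors. Assumption: sentence-transformers is installed."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name).encode(sentences)


def project_2d(embeddings, **overrides):
    """Reduce embeddings to 2D with UMAP. Assumption: umap-learn is installed."""
    import umap

    reducer = umap.UMAP(**build_umap_kwargs(**overrides))
    return reducer.fit_transform(embeddings)
```

A call would then look like `project_2d(embed(responses), n_neighbors=30, min_dist=0.0)`, where `responses` is the full list of survey answers; the three sample sentences alone are far too few for a meaningful layout.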
However, I am a little confused by the number of parameters that are available in the UMAP method. @lmcinnes It would be super helpful if I could tune the UMAP algorithm to achieve this. So far, I have tried several values of the parameters for
What I do not understand at the moment is:
How does the spread parameter work? What is the range that this param can take?
... metric value passed?
Here are some of the plots that I got by varying the parameters:
Some clusters are forming well, but there is one big cluster that gets clumped in the middle. Any help is appreciated :-)
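For anyone trying to reproduce the parameter sweep, here is a minimal sketch of how it could be organised. It rests on my reading of umap-learn, which may be incomplete: `spread` and `min_dist` together appear to define a target curve (flat at 1 up to `min_dist`, then decaying exponentially at a rate set by `spread`) that the library fits to obtain its internal `a` and `b` values, and `min_dist` has to be less than or equal to `spread`. The function names `target_similarity` and `parameter_grid` are hypothetical, and the values in the grid are arbitrary.

```python
import math
from itertools import product


def target_similarity(distance, min_dist, spread):
    """Target strength of attraction at a given distance in the 2D embedding."""
    if distance < min_dist:
        return 1.0
    return math.exp(-(distance - min_dist) / spread)


def parameter_grid(n_neighbors_values, min_dist_values, spread_values):
    """All combinations to try, skipping those where min_dist exceeds spread."""
    return [
        {"n_neighbors": n, "min_dist": d, "spread": s}
        for n, d, s in product(n_neighbors_values, min_dist_values, spread_values)
        if d <= s
    ]


if __name__ == "__main__":
    # A larger spread makes the curve decay more slowly with distance.
    for spread in (0.5, 1.0, 2.0):
        curve = [round(target_similarity(x, 0.1, spread), 3) for x in (0.0, 0.5, 1.0, 2.0)]
        print("spread =", spread, "->", curve)

    for params in parameter_grid([5, 15, 50], [0.0, 0.1, 0.5], [0.5, 1.0, 2.0]):
        print(params)
```

Each dictionary from `parameter_grid` can be passed as keyword arguments to `umap.UMAP`, so every plot corresponds to exactly one recorded parameter combination.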