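Hi @carobs9! 🫡 Great question. UMAP + HDBSCAN on millions of samples is incredibly powerful, but it is also sensitive to parameter tuning and workflow design.

Here's what might be happening:

You're reducing from 384 → 15 → 2 dimensions. But if you already get only one blob when clustering in 15D, this usually points to UMAP not preserving enough local structure, or to HDBSCAN treating most of the data as noise instead of finding real clusters. Let's go step by step to unlock more nuanced clusters.

🔍 Step 1: Understand what HDBSCAN "sees"

HDBSCAN relies on local density. It estimates how dense the neighborhood of each point is (controlled by min_samples), builds a hierarchy of dense regions, and keeps only clusters with at least min_cluster_size points; the default "eom" selection then picks the most persistent of them. If the embedding has no clear density gaps, or if min_cluster_size is large relative to the real groups, you can end up with one large cluster plus a lot of noise.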
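🛠️ Step 2: Fix the embedding pipeline

Build the 15-dimensional embedding (X_15d) with settings aimed at clustering rather than at a nice-looking plot, then apply HDBSCAN on X_15d, not on the 2D embedding.

Why not 2D? Because two dimensions are chosen for plotting: squeezing 384-dimensional structure into a plane distorts distances and densities, so HDBSCAN has less real structure to work with. Use the 2D projection only to color points by the labels you got from X_15d.

Let me know if this solves your issue or if anything remains unclear. Happy to help refine it further.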
Hello,
I am trying to fit GPU-supported UMAP on a dataset of 3.8 million samples and then apply HDBSCAN to the reduced embeddings, which have 15 dimensions. Finally, I reduce these 15-dimensional embeddings to two dimensions for visualization.
Unfortunately, the results are not as expected. Instead of many distinct clusters, I seem to get one big cluster that does not reflect any of the structure in my original 384-dim embeddings. I have tried tweaking parameters such as n_epochs, n_neighbors, and min_dist, but I still get one big cluster. I have also tried reducing the initial embeddings to 10 or 5 dimensions instead of 15.
Are there any tweaks that can be done to get more nuanced clusters?
Here are my specifications:
Thanks in advance!