Description
Problem
Article visualization colors cluster into only 2-3 color variants (purple/red and green/orange) instead of spreading across the full color spectrum.
Current Implementation
The compute_content_hue() function in embeddings/embeddings.py derives hue by:
- Computing the mean embedding vector for the article
- Hashing it with SHA-256
- Using hash_int % 360 to get a hue
This produces deterministic but poorly distributed colors when articles are semantically similar.
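For reference, the three steps above amount to roughly the following. This is a sketch only, not the actual code in embeddings/embeddings.py: the name compute_content_hue_hashed, the list-of-lists input, and the byte packing of the mean vector are all assumptions, and it uses only the standard library so it runs anywhere.

```python
import hashlib
import struct
from typing import Sequence


def compute_content_hue_hashed(vectors: Sequence[Sequence[float]]) -> int:
    """Hypothetical stand-in for the hash-based compute_content_hue()."""
    # Mean embedding: average each dimension across the article's vectors.
    mean = [sum(dim) / len(vectors) for dim in zip(*vectors)]
    # Serialize as little-endian doubles (assumed; the real byte layout is unknown).
    payload = struct.pack(f"<{len(mean)}d", *mean)
    # Interpret the SHA-256 digest as one large integer and fold it into 0-359.
    hash_int = int.from_bytes(hashlib.sha256(payload).digest(), "big")
    return hash_int % 360
```

The result is deterministic for a given mean vector, but the hash itself carries no information about how similar two articles are.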
Proposed Solution
Use UMAP to reduce the mean embedding to 1D, then map that value to hue (0-360). This approach:
- Spreads articles along their primary semantic axis
- Similar articles get nearby (but distinct) colors
- Very different articles get contrasting colors
- Scales well as more articles are added
Implementation
Modify compute_content_hue() to:
- Use UMAP with n_components=1 on the mean embedding
- Normalize the resulting value to 0-360 range
- Return as hue
This may require processing all articles together for optimal spread, or using a pre-fitted UMAP model.
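A minimal sketch of the batch variant, assuming all articles are processed together. The names compute_content_hues and hues_from_coords are hypothetical, and the reducer is passed in as any object with a fit_transform method so the normalization can be tried without UMAP installed.

```python
from typing import List, Sequence


def hues_from_coords(coords: Sequence[float], max_hue: float = 360.0) -> List[float]:
    """Min-max normalize 1D coordinates onto the range 0..max_hue."""
    lo, hi = min(coords), max(coords)
    span = hi - lo
    if span == 0:
        # All articles landed on the same point; no spread is possible.
        return [0.0 for _ in coords]
    return [(c - lo) / span * max_hue for c in coords]


def compute_content_hues(mean_embeddings, reducer, max_hue: float = 360.0) -> List[float]:
    """Project every article's mean embedding to 1D, then map to hue.

    `reducer` is assumed to expose fit_transform() and to return one
    row per article with a single column (shape (n_articles, 1)).
    """
    projected = reducer.fit_transform(mean_embeddings)
    coords = [float(row[0]) for row in projected]
    return hues_from_coords(coords, max_hue)
```

With umap-learn installed, the reducer could be umap.UMAP(n_components=1, random_state=42), with the mean embeddings passed as a NumPy array. Because hue wraps around, the two extreme articles both map to red at 0 and 360; passing a smaller max_hue (300, say) would keep them distinct.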