add scatterplot to display cosine distance between points, click on one point to display corresponding text+distance
SQUASH: functional code
Display only the relevant information nuggets on the scatter plot/bar chart for the currently selected document, without accumulating data from previously viewed documents (irrelevant information nuggets)
adjust plot layout to prevent annotation boxes from getting out of the plot window in fullscreen mode
change colormap of scatterplot to represent distances (same color for same distance)
… display best guesses from other documents
…embedding in grid
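The scatterplot commits above describe three mechanics: cosine distance between points, a colormap where equal distances get equal colors, and clicking a point to show its text and distance. The following is a minimal, stdlib-only sketch of that logic, not the PR's code. The five-color `PALETTE`, the binning rule, and the helper names `color_for_distance` and `pick_point` are hypothetical. In a Matplotlib UI the click coordinates would typically come from a `button_press_event` handler registered with `mpl_connect` (`event.xdata`, `event.ydata`).

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical five-step palette (green = close, red = far); not the PR's actual colormap.
PALETTE = ["#1a9850", "#91cf60", "#fee08b", "#fc8d59", "#d73027"]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, which lies in [0, 2]."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / norm


def color_for_distance(distance: float) -> str:
    """Map a distance to a palette bin, so equal distances always get the same color."""
    index = int(distance / 2.0 * len(PALETTE))
    # Clamp so that tiny float errors below 0 or exactly 2.0 stay inside the palette.
    return PALETTE[min(max(index, 0), len(PALETTE) - 1)]


def pick_point(click: Tuple[float, float],
               points: Dict[str, Tuple[float, float]],
               tolerance: float) -> Optional[str]:
    """Return the label of the plotted point nearest to the click, or None if none is within tolerance."""
    best_label: Optional[str] = None
    best_dist = tolerance
    for label, (x, y) in points.items():
        d = math.hypot(x - click[0], y - click[1])
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label
```

After `pick_point` returns a label, the annotation text could be built as `f"{label}: {cosine_distance(selected_vec, embeddings[label]):.3f}"`. Binning into a few discrete colors makes "same distance, same color" visually obvious; a continuous colormap would satisfy the same requirement with finer resolution.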
Pull Request Overview
This PR introduces comprehensive visualization capabilities to WannaDB, rebasing PR #6 with updated dependencies. The visualizations provide deeper insights into the system's mechanisms by displaying dimension-reduced embeddings, data insights, and interactive charts to help users understand how the matching process works.
Key changes include:
- Implementation of 3D grids for visualizing dimension-reduced embeddings
- Data insights section showing effects of user feedback
- Bar charts displaying cosine similarity values for nuggets
- User interaction tracking and accessibility features
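The 3D grids rely on reducing high-dimensional embeddings to three coordinates. Below is a minimal sketch of PCA to three components using plain NumPy SVD; it is not necessarily how `dimension_reduction.py` implements it (scikit-learn's `PCA(n_components=3)` gives the same projection up to the sign of each axis). The function name `pca_3d` is hypothetical.

```python
import numpy as np


def pca_3d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n_samples, n_features) embeddings onto their first three principal components."""
    if embeddings.ndim != 2 or min(embeddings.shape) < 3:
        raise ValueError("expected a 2D array with at least 3 rows and 3 columns")
    # PCA operates on mean-centered data.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T
```

The resulting `(n_samples, 3)` array can be plotted directly as x/y/z coordinates. Because each axis sign is arbitrary, recomputing the projection on a changed set of nuggets may mirror the plot even when the underlying structure is unchanged.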
Reviewed Changes
Copilot reviewed 23 out of 30 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| wannadb_ui/wannadb_api.py | Adds PCA dimension reduction to preprocessing pipeline |
| wannadb_ui/visualizations.py | New comprehensive visualization module with 3D grids, bar charts, and scatter plots |
| wannadb_ui/study.py | New tracking system for user interaction monitoring |
| wannadb_ui/main_window.py | Integrates visualization controls and information popups into main UI |
| wannadb_ui/interactive_matching.py | Updates UI components to support visualization features |
| wannadb_ui/data_insights.py | New data insights area showing feedback effects |
| wannadb_ui/common.py | Adds visualization-related enums, classes, and information popup dialogs |
| wannadb/utils.py | New utility functions for duplicate detection and accessible colors |
| wannadb/preprocessing/dimension_reduction.py | New PCA and t-SNE dimension reduction implementations |
| wannadb/preprocessing/other_processing.py | Adds duplicate nugget cleaning functionality |
| wannadb/matching/matching.py | Enhanced matching with change tracking and visualization support |
| wannadb/data/signals.py | New signals for dimension-reduced embeddings and current threshold |
| wannadb/data/data.py | Adds duplicate detection and confirmed matches tracking |
| wannadb/change_captor.py | New change tracking system for user feedback effects |
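The change tracking in `change_captor.py` and the Data Insights section both revolve around one question: what did the user's latest feedback change? The following is a minimal sketch of one way to answer it, by diffing best-guess snapshots taken before and after a feedback round. The snapshot format, the `Change` record, and `diff_best_guesses` are hypothetical and do not reflect the PR's actual data structures.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical snapshot format: document name -> (best-guess nugget text, cosine distance).
Snapshot = Dict[str, Tuple[str, float]]


@dataclass
class Change:
    document: str
    old_guess: str
    new_guess: str
    old_distance: float
    new_distance: float


def diff_best_guesses(before: Snapshot, after: Snapshot) -> List[Change]:
    """Report documents whose best guess, or its distance, changed after one feedback round."""
    changes: List[Change] = []
    for document, (new_guess, new_distance) in after.items():
        if document not in before:
            continue  # no earlier guess to compare against in this sketch
        old_guess, old_distance = before[document]
        # math.isclose avoids flagging pure floating-point noise as a change.
        if old_guess != new_guess or not math.isclose(old_distance, new_distance):
            changes.append(Change(document, old_guess, new_guess, old_distance, new_distance))
    return changes
```

Taking a snapshot right before applying feedback and diffing against the state right after keeps the displayed insights limited to the latest feedback, rather than accumulating effects across rounds.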
…, interactive matching, main window, and visualizations (Copilot PR Review)
Rebased #6 and updated dependencies.
We integrated several visualizations allowing for deeper insights into the system's mechanisms.
These visualizations cover:
- 3D grid visualizing dimension-reduced embeddings of all best guesses of all documents (NuggetListWidget)
- Data Insights section displaying the effects of the user's latest feedback (NuggetListWidget)
- 3D grid visualizing dimension-reduced embeddings of all nuggets of the currently opened document (DocumentWidget)
- Bar chart displaying the cosine similarity of all nuggets of the currently opened document (DocumentWidget)
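The DocumentWidget bar chart needs one cosine-similarity value per nugget of the open document, and only that document's nuggets. The following is a minimal sketch of preparing that data. The `Nugget` record, `bar_chart_data`, and the `reference` vector (assumed here to be something like the attribute's embedding) are hypothetical stand-ins, not WannaDB's API.

```python
import math
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class Nugget(NamedTuple):
    # Hypothetical minimal nugget record; WannaDB's real nugget class is richer.
    document: str
    text: str
    embedding: Tuple[float, ...]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms, in [-1, 1]."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / norm


def bar_chart_data(nuggets: Iterable[Nugget], current_document: str,
                   reference: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (text, similarity) pairs for the open document only, most similar first."""
    rows = [(n.text, cosine_similarity(n.embedding, reference))
            for n in nuggets if n.document == current_document]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Rebuilding the rows from scratch whenever the selected document changes, instead of appending to an existing list, is one way to avoid the accumulation of nuggets from previously viewed documents mentioned in the commit list.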