feat: implement BUD-12 identical media deduplication #71
Merged
Adds perceptual hash (pHash) based deduplication for image uploads per the BUD-12 spec (hzrd149/blossom#96).

Backend:
- Compute pHash synchronously inside fs.put for every image upload; store the result on NewFileResult and propagate to FileUpload
- Insert the phash row inside the add_file transaction alongside the uploads row, satisfying the FK constraint on upload_phash
- On PUT /upload and PUT /media, query find_similar_images using the already-computed hash; return 409 Conflict with X-Identical-Media and X-Reason headers when a match is found within the configured Hamming distance
- New settings: identical_media_dedup (bool) and identical_media_dedup_distance (u32, default 0)

Frontend:
- IdenticalMediaError class in blossom.ts captures the existing sha256 from X-Identical-Media on 409 responses
- Upload view shows a side-by-side comparison panel of the user's upload vs the existing server blob at full size (max-h-96)
- Mirror button calls PUT /mirror to register the existing blob to the user's account, then dismisses the panel
- Config editor: bool toggle for identical_media_dedup and integer field for identical_media_dedup_distance with min/max bounds
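The matching rule described above ("a match is found within the configured Hamming distance") can be illustrated with a minimal sketch. It assumes a 64-bit pHash and an in-memory list of stored hashes; the real `find_similar_images` queries the database, and the function names here are hypothetical.

```rust
/// Number of differing bits between two perceptual hashes.
/// Assumption: the pHash is 64 bits wide.
pub fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Return the sha256 (hex) of the first stored image whose pHash lies within
/// `max_distance` bits of `candidate`, or `None` if nothing is close enough.
/// Hypothetical in-memory stand-in for the database lookup.
pub fn find_identical(
    candidate: u64,
    stored: &[(u64, String)],
    max_distance: u32,
) -> Option<&str> {
    stored
        .iter()
        .find(|(hash, _)| hamming_distance(candidate, *hash) <= max_distance)
        .map(|(_, sha256)| sha256.as_str())
}
```

With `max_distance` set to 0 (the default for `identical_media_dedup_distance`), only images with exactly equal hashes match; larger values tolerate that many differing bits.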
Pull request overview
Implements BUD-12 “identical media deduplication” end-to-end: the backend computes/stores image pHash and rejects perceptually-identical uploads with a 409 + X-Identical-Media, and the UI surfaces the conflict with a compare/mirror flow plus config toggles.
Changes:
- Backend: compute pHash during upload, persist it transactionally, and add a 409 response path when identical media is detected.
- Frontend: add `IdenticalMediaError` parsing for 409s and show a side-by-side comparison panel with a “Mirror to my account” action.
- Admin UI/config: add settings for enabling identical-media dedup and bounding the pHash distance.
Reviewed changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ui_src/src/views/upload.tsx | Handles 409 identical-media conflicts and renders compare/mirror panel. |
| ui_src/src/upload/blossom.ts | Adds IdenticalMediaError and centralized upload response handling for 409s. |
| ui_src/src/components/config-editor.tsx | Adds integer field support + new known config entries for dedup settings. |
| src/settings.rs | Introduces settings for identical-media dedup enablement and distance. |
| src/routes/blossom.rs | Adds 409 response variant + dedup check during upload stream processing. |
| src/routes/admin.rs | Updates tests/fixtures for new settings/upload fields under feature flags. |
| src/filesystem.rs | Computes pHash during fs.put and carries it in NewFileResult. |
| src/db.rs | Adds phash to FileUpload and stores upload_phash bands in the upload transaction. |
| src/bin/r96util.rs | Updates util to populate new phash field under feature flags. |
| Cargo.lock | Bumps route96 dependency version to 0.6.0. |
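The table mentions that `src/db.rs` stores `upload_phash` bands. The PR does not say how the bands are defined, so the following is only a sketch of one common banding scheme, assuming a 64-bit hash split into four 16-bit bands; the function names are hypothetical. If two hashes differ in at most 3 bits, at most 3 of the 4 bands can differ, so at least one band is equal and can serve as an exact-match index key before the full Hamming distance is checked.

```rust
/// Split a 64-bit hash into four 16-bit bands, most significant first.
/// Assumption: the band count and width are illustrative, not taken from the PR.
pub fn bands(hash: u64) -> [u16; 4] {
    [
        (hash >> 48) as u16,
        (hash >> 32) as u16,
        (hash >> 16) as u16,
        hash as u16,
    ]
}

/// True when the two hashes agree on at least one band position.
/// Any pair within Hamming distance 3 passes this pre-filter.
pub fn shares_band(a: u64, b: u64) -> bool {
    bands(a).iter().zip(bands(b).iter()).any(|(x, y)| x == y)
}
```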
- Remove dead mirrored? state field and unreachable success branch from the identical media panel; panel is dismissed on mirror success
- Use length-checked try_into() for phash bytes in fs.put; log a warning and leave phash as None rather than silently storing an all-zero hash on unexpected byte length
- Reuse #handleUploadResponse in mirror() so 409 X-Identical-Media responses from PUT /mirror are also surfaced as IdenticalMediaError
- Revoke the previous localUrl object URL via functional state update when a new upload starts, preventing object URL leaks
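The length-checked conversion from the second item might look roughly like the sketch below. It assumes an 8-byte hash and uses `eprintln!` in place of the project's logger; `phash_from_bytes` is a hypothetical helper, not a function from the PR.

```rust
use std::convert::TryInto;

/// Convert raw hash bytes into a fixed-size array.
/// Returns `None` (after logging a warning) when the length is not exactly 8,
/// instead of padding or falling back to an all-zero hash.
/// Assumption: the pHash is 8 bytes long.
pub fn phash_from_bytes(bytes: &[u8]) -> Option<[u8; 8]> {
    let parsed: Result<[u8; 8], _> = bytes.try_into();
    match parsed {
        Ok(hash) => Some(hash),
        Err(_) => {
            eprintln!(
                "warning: unexpected phash length {} (expected 8), storing no phash",
                bytes.len()
            );
            None
        }
    }
}
```

Returning `None` keeps a malformed hash out of the database, whereas an all-zero fallback would make every such upload look identical to every other one.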
…est header)

Per the updated spec, clients can echo back X-Identical-Media: <sha256> in a subsequent upload request to signal they are intentionally uploading a distinct copy. The server skips identical-media detection in that case.

Backend:
- BlossomAuth now extracts X-Identical-Media request header (hex-decoded to bytes) and passes it through process_upload -> process_stream
- The BUD-12 dedup check is skipped when acknowledged_identical is Some

Frontend:
- upload() and media() accept an optional acknowledgedSha256 param and send it as an X-Identical-Media request header
- The identical media panel gains an Upload anyway button that retries the original upload with the acknowledged sha256, then dismisses
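Hex-decoding the acknowledgement header could be done along these lines. This is a standard-library-only sketch: the function name is hypothetical, and rejecting malformed values by returning `None` is an assumption rather than confirmed behaviour of `BlossomAuth`.

```rust
/// Decode an `X-Identical-Media` header value into the 32 raw bytes of a sha256.
/// Returns `None` unless the value is exactly 64 hexadecimal characters.
pub fn parse_acknowledged_sha256(value: &str) -> Option<[u8; 32]> {
    let value = value.trim();
    if value.len() != 64 || !value.is_ascii() {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, pair) in value.as_bytes().chunks(2).enumerate() {
        // Each pair of hex digits becomes one byte: high nibble, then low nibble.
        let hi = (pair[0] as char).to_digit(16)?;
        let lo = (pair[1] as char).to_digit(16)?;
        out[i] = ((hi << 4) | lo) as u8;
    }
    Some(out)
}
```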
…from nip96

- Replace all manual tokio::fs::remove_file calls in blossom.rs and nip96.rs with state.fs.delete, which centralises file removal through the filesystem layer
- Add FileStore::delete helper that maps an id to its storage path and removes it
- Remove the stale fire-and-forget phash spawn_blocking block from the nip96 upload handler; phash is now always computed synchronously inside fs.put and stored in add_file's transaction
Servers MAY choose whether to honour the client X-Identical-Media acknowledgement header. When true (default), clients can bypass deduplication by echoing back the sha256 from a prior 409. When false, the server always enforces deduplication regardless of client intent.
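The interaction between the feature toggle, this server-side policy, and the client's acknowledgement can be summarised in a small decision function. The parameter and function names are hypothetical (the commit message does not name the new setting), so treat this as a sketch of the described behaviour only.

```rust
/// Decide whether the identical-media check should run for an upload.
///
/// * `dedup_enabled`: the `identical_media_dedup` setting.
/// * `honour_acknowledgement`: hypothetical name for the server policy
///   described above (defaults to true).
/// * `acknowledged`: the decoded `X-Identical-Media` request header, if sent.
pub fn should_run_dedup_check(
    dedup_enabled: bool,
    honour_acknowledgement: bool,
    acknowledged: Option<&[u8; 32]>,
) -> bool {
    if !dedup_enabled {
        return false;
    }
    // Skip the check only when the server honours acknowledgements
    // and the client actually sent one.
    !(honour_acknowledgement && acknowledged.is_some())
}
```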
Summary
Implements the BUD-12 identical media deduplication spec: hzrd149/blossom#96
When enabled, image uploads are checked against existing blobs using perceptual hashing (pHash). If a visually identical image is already stored, the server returns `409 Conflict` with `X-Identical-Media: <sha256>` pointing to the existing blob.
Backend changes
- `fs.put`: pHash is computed synchronously for every image upload and stored on `NewFileResult`. This always happens, not just when dedup is enabled, so the DB is populated regardless.
- `db.add_file`: the `upload_phash` row is inserted inside the same transaction as the `uploads` row, satisfying the FK constraint.
- `process_stream`: reads `blob.phash` (already computed) and calls `find_similar_images`; returns `409 Conflict` with `X-Identical-Media` + `X-Reason` headers on a match.
- New settings:
  - `identical_media_dedup`: enable/disable the feature (requires the `media-compression` feature flag)
  - `identical_media_dedup_distance`: max pHash Hamming distance to consider images identical (default `0`; recommended `0`–`3`)
Frontend changes
- `blossom.ts`: `IdenticalMediaError` class captures the existing sha256 from the `X-Identical-Media` header on 409 responses
- Upload view: shows a side-by-side comparison panel of the user's upload vs the existing server blob; the Mirror button calls `PUT /mirror` to register the existing blob to the user's account, then dismisses the panel
- Config editor: bool toggle for `identical_media_dedup` and a bounded integer field for `identical_media_dedup_distance`