feat: implement BUD-12 identical media deduplication#71

Merged: v0l merged 5 commits into main from feat/bud-12-identical-media on Mar 13, 2026
v0l (Owner) commented Mar 13, 2026

Summary

Implements the BUD-12 identical media deduplication spec: hzrd149/blossom#96

When enabled, image uploads are checked against existing blobs using perceptual hashing (pHash). If a visually identical image is already stored, the server returns 409 Conflict with X-Identical-Media: <sha256> pointing to the existing blob.

Backend changes

  • fs.put: pHash is computed synchronously for every image upload and stored on NewFileResult. This happens even when dedup is disabled, so the DB is populated regardless
  • db.add_file: the upload_phash row is inserted inside the same transaction as the uploads row, satisfying the FK constraint
  • process_stream: reads blob.phash (already computed) and calls find_similar_images; returns 409 Conflict with X-Identical-Media + X-Reason headers on a match
  • New config settings:
    • identical_media_dedup: enable/disable the feature (requires media-compression feature flag)
    • identical_media_dedup_distance: max pHash Hamming distance to consider images identical (default 0; recommended 0-3)
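
The distance setting can be read as an upper bound on the number of bits in which two hashes may differ. The following is a minimal sketch only, assuming a 64-bit pHash held in a `u64` (the PR does not state the hash width) and using a hypothetical in-memory list, `find_identical`, in place of the real `find_similar_images` database query:

```rust
/// Number of bit positions in which two hashes differ.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Hypothetical stand-in for the database lookup: returns the sha256 (hex)
/// of the first stored image whose hash is within `max_distance` bits
/// of `new_hash`, or `None` when nothing is close enough.
fn find_identical<'a>(
    new_hash: u64,
    stored: &'a [(&'a str, u64)],
    max_distance: u32,
) -> Option<&'a str> {
    stored
        .iter()
        .find(|(_, hash)| hamming_distance(new_hash, *hash) <= max_distance)
        .map(|(sha256, _)| *sha256)
}
```

With `max_distance` set to 0 only bit-for-bit equal hashes match; larger values tolerate small differences, at the cost of more false positives.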

Frontend changes

  • blossom.ts: IdenticalMediaError class captures the existing sha256 from the X-Identical-Media header on 409 responses
  • Upload view: when a 409 is received, a side-by-side comparison panel displays the user's uploaded image alongside the existing server blob at full size so the user can verify they match
  • Mirror button: calls PUT /mirror to register the existing blob to the user's account, then dismisses the panel
  • Config editor: toggle for identical_media_dedup and a bounded integer field for identical_media_dedup_distance

Adds perceptual hash (pHash) based deduplication for image uploads per
the BUD-12 spec (hzrd149/blossom#96).

Backend:
- Compute pHash synchronously inside fs.put for every image upload;
  store the result on NewFileResult and propagate to FileUpload
- Insert the phash row inside the add_file transaction alongside the
  uploads row, satisfying the FK constraint on upload_phash
- On PUT /upload and PUT /media, query find_similar_images using the
  already-computed hash; return 409 Conflict with X-Identical-Media
  and X-Reason headers when a match is found within the configured
  Hamming distance
- New settings: identical_media_dedup (bool) and
  identical_media_dedup_distance (u32, default 0)

Frontend:
- IdenticalMediaError class in blossom.ts captures the existing sha256
  from X-Identical-Media on 409 responses
- Upload view shows a side-by-side comparison panel of the user's
  upload vs the existing server blob at full size (max-h-96)
- Mirror button calls PUT /mirror to register the existing blob to the
  user's account, then dismisses the panel
- Config editor: bool toggle for identical_media_dedup and integer
  field for identical_media_dedup_distance with min/max bounds

Copilot AI left a comment

Pull request overview

Implements BUD-12 “identical media deduplication” end-to-end: the backend computes/stores image pHash and rejects perceptually-identical uploads with a 409 + X-Identical-Media, and the UI surfaces the conflict with a compare/mirror flow plus config toggles.

Changes:

  • Backend: compute pHash during upload, persist it transactionally, and add a 409 response path when identical media is detected.
  • Frontend: add IdenticalMediaError parsing for 409s and show a side-by-side comparison panel with a “Mirror to my account” action.
  • Admin UI/config: add settings for enabling identical-media dedup and bounding the pHash distance.

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| ui_src/src/views/upload.tsx | Handles 409 identical-media conflicts and renders compare/mirror panel. |
| ui_src/src/upload/blossom.ts | Adds IdenticalMediaError and centralized upload response handling for 409s. |
| ui_src/src/components/config-editor.tsx | Adds integer field support + new known config entries for dedup settings. |
| src/settings.rs | Introduces settings for identical-media dedup enablement and distance. |
| src/routes/blossom.rs | Adds 409 response variant + dedup check during upload stream processing. |
| src/routes/admin.rs | Updates tests/fixtures for new settings/upload fields under feature flags. |
| src/filesystem.rs | Computes pHash during fs.put and carries it in NewFileResult. |
| src/db.rs | Adds phash to FileUpload and stores upload_phash bands in the upload transaction. |
| src/bin/r96util.rs | Updates util to populate new phash field under feature flags. |
| Cargo.lock | Bumps route96 dependency version to 0.6.0. |

Comment thread ui_src/src/views/upload.tsx Outdated
Comment thread src/filesystem.rs Outdated
Comment thread ui_src/src/upload/blossom.ts
Comment thread ui_src/src/views/upload.tsx Outdated
v0l added 4 commits March 13, 2026 10:38
- Remove dead mirrored? state field and unreachable success branch from
  the identical media panel; panel is dismissed on mirror success
- Use length-checked try_into() for phash bytes in fs.put; log a
  warning and leave phash as None rather than silently storing an
  all-zero hash on unexpected byte length
- Reuse #handleUploadResponse in mirror() so 409 X-Identical-Media
  responses from PUT /mirror are also surfaced as IdenticalMediaError
- Revoke the previous localUrl object URL via functional state update
  when a new upload starts, preventing object URL leaks
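
The length-checked conversion described in the second item can be sketched as follows. This is an illustration rather than the code from the PR: the function name `phash_from_bytes` is hypothetical, the 8-byte hash length is an assumption, and `eprintln!` stands in for whatever logging the server uses.

```rust
use std::convert::TryInto;

/// Convert raw hash bytes into a fixed-size array.
/// Returns `None` (after printing a warning) when the slice does not have
/// exactly the expected length, instead of falling back to an all-zero hash.
/// The 8-byte length is an assumption made for this sketch.
fn phash_from_bytes(bytes: &[u8]) -> Option<[u8; 8]> {
    let converted: Result<[u8; 8], _> = bytes.try_into();
    match converted {
        Ok(hash) => Some(hash),
        Err(_) => {
            eprintln!("warning: unexpected phash length {}", bytes.len());
            None
        }
    }
}
```

Returning `None` keeps a malformed hash out of the database, whereas an all-zero fallback would look like a valid hash and could match other all-zero entries.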
…est header)

Per the updated spec, clients can echo back X-Identical-Media: <sha256>
in a subsequent upload request to signal they are intentionally uploading
a distinct copy. The server skips identical-media detection in that case.

Backend:
- BlossomAuth now extracts X-Identical-Media request header (hex-decoded
  to bytes) and passes it through process_upload -> process_stream
- The BUD-12 dedup check is skipped when acknowledged_identical is Some

Frontend:
- upload() and media() accept an optional acknowledgedSha256 param and
  send it as an X-Identical-Media request header
- The identical media panel gains an Upload anyway button that retries
  the original upload with the acknowledged sha256, then dismisses
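
Hex-decoding the acknowledgement header might look roughly like the sketch below. The helper name `parse_sha256_hex` is hypothetical, and the sketch assumes that a malformed value is treated as if no acknowledgement had been sent, which the PR does not specify.

```rust
/// Decode a 64-character hex string (a sha256 digest) into 32 bytes.
/// Returns `None` for a wrong length or any non-hex character.
fn parse_sha256_hex(value: &str) -> Option<[u8; 32]> {
    let value = value.trim();
    // Checking for ASCII hex digits up front also rules out a leading '+',
    // which `from_str_radix` would otherwise accept.
    if value.len() != 64 || !value.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        // Each output byte is built from two consecutive hex digits.
        *byte = u8::from_str_radix(&value[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}
```

A `Some` result would then correspond to `acknowledged_identical` being set, in which case the dedup check is skipped.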
…from nip96

- Replace all manual tokio::fs::remove_file calls in blossom.rs and
  nip96.rs with state.fs.delete, which centralises file removal through
  the filesystem layer
- Add FileStore::delete helper that maps an id to its storage path and
  removes it
- Remove the stale fire-and-forget phash spawn_blocking block from the
  nip96 upload handler; phash is now always computed synchronously
  inside fs.put and stored in add_file's transaction
Servers MAY choose whether to honour the client X-Identical-Media
acknowledgement header. When this setting is true (default), clients can
bypass deduplication by echoing back the sha256 from a prior 409. When it
is false, the server always enforces deduplication regardless of client
intent.
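
The interaction between the feature toggle, this server-side choice, and the client acknowledgement can be summarised in a small decision function. This is a sketch under assumptions: `DedupSettings`, `honour_client_ack` and `should_check_dedup` are hypothetical names, and only `identical_media_dedup` appears in the PR.

```rust
/// Settings relevant to the check. Only `identical_media_dedup` is named
/// in the PR; `honour_client_ack` is a hypothetical field name.
struct DedupSettings {
    identical_media_dedup: bool,
    honour_client_ack: bool,
}

/// Decide whether the identical-media check should run for an upload.
fn should_check_dedup(settings: &DedupSettings, acknowledged: Option<&[u8; 32]>) -> bool {
    if !settings.identical_media_dedup {
        // Feature disabled: never reject an upload as identical.
        return false;
    }
    // An acknowledgement bypasses the check only when the server honours it.
    !(settings.honour_client_ack && acknowledged.is_some())
}
```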
v0l merged commit 7bce1a2 into main on Mar 13, 2026
1 check passed
v0l deleted the feat/bud-12-identical-media branch on March 13, 2026 10:59