PerthImplicitWatermarker Limits Maximum Frequency to ~16.812 kHz, Altering High Frequencies #8

@mathkartel

Description

When using PerthImplicitWatermarker with run_name="implicit", the watermarking process strongly attenuates high frequencies (> 16 kHz). The watermarked output has a maximum frequency of approximately 16.812 kHz, while the original audio contains frequencies up to ~21.878 kHz. This loss of spectral content is a problem for applications that require the watermarked audio to be perceptually identical to the original.

To Reproduce

Steps to reproduce the behavior:

  1. Use PerthImplicitWatermarker in a Python application on Windows 10/11.
  2. Load a stereo WAV file (44.1 kHz, PCM_16, ~1411 kbps) with librosa.load (dtype=float32).
  3. Apply the watermark using watermarker.apply_watermark(audio, sample_rate=44100).
  4. Compare the frequency spectrum of the original and watermarked audio using librosa.stft.
  5. Observe that frequencies above ~16.812 kHz are significantly reduced (e.g., spectral energy drops from 0.2815 to 0.0178, diff_ratio=0.9369).
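
The comparison in steps 4 and 5 can be sketched roughly as follows. This is a minimal sketch, not the code from my application: it uses a plain NumPy FFT over the whole signal instead of librosa.stft, the helper names band_energy and high_band_diff_ratio are hypothetical, and it assumes diff_ratio means (original - marked) / original for the mean magnitude above a cutoff. The signals here are synthetic stand-ins for one channel of the real audio.

```python
import numpy as np

def band_energy(signal, sample_rate, f_lo, f_hi=None):
    """Mean rFFT magnitude of the bins between f_lo and f_hi (in Hz)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    f_hi = sample_rate / 2 if f_hi is None else f_hi
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return float(spectrum[mask].mean())

def high_band_diff_ratio(original, marked, sample_rate, cutoff_hz=16000.0):
    """Relative drop in mean magnitude above cutoff_hz (assumed definition)."""
    n = min(len(original), len(marked))  # lengths may differ slightly
    e_orig = band_energy(original[:n], sample_rate, cutoff_hz)
    e_mark = band_energy(marked[:n], sample_rate, cutoff_hz)
    return e_orig, e_mark, (e_orig - e_mark) / e_orig

# Synthetic stand-in: a 1 kHz tone plus an 18 kHz tone, and a "marked"
# version in which the 18 kHz component has been removed.
sr = 44100
t = np.arange(sr) / sr
original = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 18000 * t)
marked = np.sin(2 * np.pi * 1000 * t)

e_orig, e_mark, ratio = high_band_diff_ratio(original, marked, sr)
```

For real files, each channel of the original and watermarked arrays would be passed in place of the synthetic signals. A ratio close to 1 means almost all energy above the cutoff is gone.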

Expected behavior

The watermarked audio should preserve the full frequency spectrum of the original (up to ~21.878 kHz in this file; the Nyquist limit for 44.1 kHz audio is 22.05 kHz) to ensure perceptual identity, with minimal or no alteration to high frequencies.

Screenshots/Logs

Example log from my application:

INFO:main:Original audio loaded: shape=(2, 13793343), sample_rate=44100, channels=stereo, dtype=float32
INFO:main:No metadata found in original file
INFO:main:Detected WAV subtype: PCM_16
loaded PerthNet (Implicit) at step 250,000
INFO:main:Watermarker initialized
WARNING:main:Watermark length mismatch in channel 0: marked=13793157, original=13793343, padding to match
WARNING:main:Watermark length mismatch in channel 1: marked=13793157, original=13793343, padding to match
INFO:main:Watermark applied for Matheus
INFO:main:Metrics for Matheus: SNR=15.93dB, MSE=0.003981
WARNING:main:Moderate SNR for Matheus: 15.93dB
INFO:main:Confidence for Matheus: 0.9856
INFO:main:Marked WAV saved: Uploads\81c5aea4-ba44-48ed-9d22-27034b91492b\marked\Holy Priest - Agilator (Original Mix).wav
INFO:main:Database committed
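
For context on the length-mismatch warnings and the SNR/MSE lines above, here is a rough sketch of how such values can be computed per channel. It is an illustration under assumptions, not the exact code behind the log: the function names pad_to_length, mse and snr_db are hypothetical, padding is assumed to be zero-padding at the end, and SNR is taken as 10 * log10(signal power / noise power) with the noise defined as original minus marked.

```python
import numpy as np

def pad_to_length(marked, target_len):
    """Zero-pad (or trim) a 1-D array so it has exactly target_len samples."""
    if len(marked) >= target_len:
        return marked[:target_len]
    return np.pad(marked, (0, target_len - len(marked)))

def mse(original, marked):
    """Mean squared error between two equal-length signals."""
    return float(np.mean((original - marked) ** 2))

def snr_db(original, marked):
    """SNR in dB, treating (original - marked) as the noise."""
    noise_power = np.mean((original - marked) ** 2)
    if noise_power == 0:
        return float("inf")  # identical signals
    signal_power = np.mean(original ** 2)
    return float(10.0 * np.log10(signal_power / noise_power))

# Tiny example: the marked signal is one sample short and gets zero-padded.
original = np.array([1.0, 1.0, 1.0, 1.0])
marked = pad_to_length(np.array([1.0, 1.0, 1.0]), len(original))

example_mse = mse(original, marked)
example_snr = snr_db(original, marked)
```

Note that zero-padding counts the padded tail as error, so a shorter watermarked output slightly lowers the SNR on its own, independent of any spectral change.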

Environment

  • OS: Windows 10/11
  • Python Version: 3.x
  • Library: resemble-perth (latest version)
  • Dependencies: librosa, numpy, soundfile, pydub, mutagen
  • Application: FastAPI-based audio watermarking server

Request

Could you provide guidance on configuring PerthImplicitWatermarker to preserve high frequencies (> 16 kHz) or suggest an alternative model/run_name that avoids altering the spectrum? If this is a limitation of the implicit model, is there a workaround or another library recommendation (e.g., audiowmark)?
