Description
When using PerthImplicitWatermarker with run_name="implicit", the watermarking process strongly attenuates high frequencies (> 16 kHz), effectively band-limiting the output to approximately 16.812 kHz, while the original audio contains frequencies up to ~21.878 kHz. This loss of spectral content is a problem for applications that require perceptually identical audio.
To Reproduce
Steps to reproduce the behavior:
- Use PerthImplicitWatermarker in a Python application on Windows 10/11.
- Load a stereo WAV file (44.1 kHz, PCM_16, ~1411 kbps) with librosa.load(dtype=float32).
- Apply the watermark using watermarker.apply_watermark(audio, sample_rate=44100).
- Compare the frequency spectrum of the original and watermarked audio using librosa.stft.
- Observe that frequencies above ~16.812 kHz are significantly reduced (e.g., spectral energy drops from 0.2815 to 0.0178, diff_ratio=0.9369).
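The spectral comparison in the last two steps can be approximated with the minimal sketch below. It is not the exact code from the application: it uses a single NumPy FFT over the whole signal instead of librosa.stft to stay dependency-free, the function names and the 16 kHz cutoff are hypothetical, and diff_ratio is assumed to mean the relative drop in mean high-band magnitude, which is roughly consistent with the numbers quoted above.

```python
import numpy as np


def band_energy(audio, sample_rate, low_hz, high_hz=None):
    """Mean FFT magnitude of `audio` between low_hz and high_hz (default: Nyquist)."""
    audio = np.atleast_2d(audio)  # shape (channels, samples)
    spectrum = np.abs(np.fft.rfft(audio, axis=-1))
    freqs = np.fft.rfftfreq(audio.shape[-1], d=1.0 / sample_rate)
    high_hz = sample_rate / 2 if high_hz is None else high_hz
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return float(spectrum[:, mask].mean())


def high_band_diff_ratio(original, marked, sample_rate, cutoff_hz=16000.0):
    """Return (orig_energy, marked_energy, relative_drop) above cutoff_hz.

    Assumption: diff_ratio = (orig - marked) / orig. Both signals are
    trimmed to the shorter length so the FFT bins line up.
    """
    n = min(original.shape[-1], marked.shape[-1])
    e_orig = band_energy(original[..., :n], sample_rate, cutoff_hz)
    e_marked = band_energy(marked[..., :n], sample_rate, cutoff_hz)
    return e_orig, e_marked, (e_orig - e_marked) / e_orig
```

Calling high_band_diff_ratio(original, marked, 44100) on the arrays before and after watermarking gives one number per file; a value near 1 means almost all energy above the cutoff was removed.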
Expected behavior
The watermarked audio should preserve the full frequency spectrum of the original (up to ~21.878 kHz for 44.1 kHz audio) to ensure perceptual identity, with minimal or no alteration to high frequencies.
Screenshots/Logs
Example log from my application:
INFO:main:Original audio loaded: shape=(2, 13793343), sample_rate=44100, channels=stereo, dtype=float32
INFO:main:No metadata found in original file
INFO:main:Detected WAV subtype: PCM_16
loaded PerthNet (Implicit) at step 250,000
INFO:main:Watermarker initialized
WARNING:main:Watermark length mismatch in channel 0: marked=13793157, original=13793343, padding to match
WARNING:main:Watermark length mismatch in channel 1: marked=13793157, original=13793343, padding to match
INFO:main:Watermark applied for Matheus
INFO:main:Metrics for Matheus: SNR=15.93dB, MSE=0.003981
WARNING:main:Moderate SNR for Matheus: 15.93dB
INFO:main:Confidence for Matheus: 0.9856
INFO:main:Marked WAV saved: Uploads\81c5aea4-ba44-48ed-9d22-27034b91492b\marked\Holy Priest - Agilator (Original Mix).wav
INFO:main:Database committed
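For context on the length-mismatch warnings and the SNR/MSE lines in the log, the sketch below shows one common way such values are derived. It is an illustration under stated assumptions, not the application's actual code: the helper names are hypothetical, the shorter watermarked signal is assumed to be zero-padded at the end, and SNR is assumed to be 10·log10(signal power / error power) with the original as the reference signal.

```python
import numpy as np


def pad_to_length(marked, target_len):
    """Zero-pad (or trim) the last axis of `marked` to target_len samples."""
    current = marked.shape[-1]
    if current >= target_len:
        return marked[..., :target_len]
    # Pad only the sample axis, and only at the end (assumed convention).
    pad = [(0, 0)] * (marked.ndim - 1) + [(0, target_len - current)]
    return np.pad(marked, pad, mode="constant")


def snr_db_and_mse(original, marked):
    """Return (SNR in dB, MSE), treating `original` as the reference signal."""
    marked = pad_to_length(marked, original.shape[-1])
    noise = original - marked
    mse = float(np.mean(noise ** 2))
    signal_power = float(np.mean(original ** 2))
    if mse == 0.0:
        return float("inf"), mse
    return float(10.0 * np.log10(signal_power / mse)), mse
```

Note that with this convention the padded tail counts fully as error, so a length mismatch alone lowers the reported SNR slightly.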
Environment
- OS: Windows 10/11
- Python Version: 3.x
- Library: resemble-perth (latest version)
- Dependencies: librosa, numpy, soundfile, pydub, mutagen
- Application: FastAPI-based audio watermarking server
Request
Could you provide guidance on configuring PerthImplicitWatermarker to preserve high frequencies (> 16 kHz) or suggest an alternative model/run_name that avoids altering the spectrum? If this is a limitation of the implicit model, is there a workaround or another library recommendation (e.g., audiowmark)?