When I use the 16 kHz model to reconstruct the signals, the output waveform consistently exhibits a slight temporal shift relative to the ground-truth signal. This misalignment significantly degrades the SI-SDR.
However, when I first resample the signals to 44.1 kHz and process them with the 44.1 kHz model, the reconstructed waveform aligns much more closely with the reference signal, and the SI-SDR is substantially more reasonable.
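
One way to check whether the SI-SDR drop comes purely from the offset is to estimate the lag by cross-correlation and score again after alignment. Below is a minimal NumPy sketch of that check, assuming mono signals at the same sample rate. The function names (`si_sdr`, `estimate_lag`, `align`) are hypothetical helpers written for illustration and are not part of the model's API, and the synthetic 5-sample delay only stands in for the real reconstruction.

```python
import numpy as np


def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))


def estimate_lag(reference, estimate):
    """Return the lag in samples; positive means the estimate is delayed."""
    corr = np.correlate(estimate, reference, mode="full")
    # In "full" mode, index len(reference) - 1 corresponds to zero lag.
    return int(np.argmax(np.abs(corr))) - (len(reference) - 1)


def align(reference, estimate, lag):
    """Trim both signals so that they overlap after compensating the lag."""
    if lag > 0:
        estimate = estimate[lag:]
    elif lag < 0:
        reference = reference[-lag:]
    n = min(len(reference), len(estimate))
    return reference[:n], estimate[:n]


# Synthetic stand-in for a reconstruction that is delayed by 5 samples.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1600)
est = np.concatenate([np.zeros(5), ref[:-5]])

lag = estimate_lag(ref, est)
ref_a, est_a = align(ref, est, lag)
print("estimated lag (samples):", lag)
print("SI-SDR before alignment (dB):", si_sdr(ref, est))
print("SI-SDR after alignment (dB):", si_sdr(ref_a, est_a))
```

If the SI-SDR recovers after alignment, the degradation is mostly due to the constant offset rather than reconstruction quality. For long recordings, `np.correlate` is slow because it computes the correlation directly; `scipy.signal.correlate` with `method="fft"` is a faster alternative.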