# Speaker Anonymization

This repository contains the speaker anonymization system developed at the Institute for Natural Language Processing
(IMS) at the University of Stuttgart, Germany. The system is described in our paper [*Speaker Anonymization with
Phonetic Intermediate Representations*](https://arxiv.org/abs/2207.04834), which will be published at Interspeech 2022.

**In addition to the code, we are going to provide a live demo soon.**

## System Description
The system is based on the Voice Privacy Challenge 2020 framework, which is included as a submodule. It follows the
challenge's basic idea of speaker embedding anonymization with neural synthesis and uses the data and evaluation
framework of the challenge. For a detailed description of the system, please read our paper linked above.

## Installation
Clone this repository with all its submodules:
```
git clone --recurse-submodules https://github.com/DigitalPhonetics/speaker-anonymization.git
```

To be able to use the framework of the Voice Privacy Challenge 2020 for evaluation, you need to install it first.
According to [the challenge repository](https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2020), this should simply be
```
cd Voice-Privacy-Challenge-2020
./install.sh
```
However, on our systems, we had to make certain adjustments and decided to use a more lightweight environment that
omits unnecessary components. If you are interested, you can find our steps in
[alternative_challenge_framework_installation.md](alternative_challenge_framework_installation.md). Note, however,
that these steps might not work directly on your system and might need to be modified.

**Note: this step will download and install Kaldi, which might lead to complications. Additionally, make sure that
you are running the install script on a device with access to GPUs and CUDA.**

Additionally, install the [requirements](requirements.txt) (in the base directory of this repository):
```
pip install -r requirements.txt
```

## Getting started
Before the actual execution of our pipeline, you first need to download and prepare the challenge data and the
evaluation models. For this, you will need a password provided by the organizers of the Voice Privacy Challenge.
Please contact them (see the information on [their repository](https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2020)
or [website](https://www.voiceprivacychallenge.org/)) for access.

You can do this by either

### a) Executing our lightweight scripts:
This will only download and prepare the necessary models and datasets. Note that these scripts are simply extracts
of the challenge run script.
```
cd setup_scripts
./run_download_data.sh
./run_prepare_data.sh
```

or by

### b) Executing the challenge run script:
This will download and prepare everything necessary AND run the baseline system of the Voice Privacy Challenge 2020.
Note that this requires the whole framework to have been installed via the challenge install script beforehand.
```
cd Voice-Privacy-Challenge-2020/baseline
./run.sh
```

### Running the pipeline
The system pipeline is controlled in [run_inference.py](run_inference.py). You can run it via
```
python run_inference.py --gpu <gpu_id>
```
with `<gpu_id>` being the ID of the GPU the code should be executed on, e.g., `python run_inference.py --gpu 0`. If
this option is not specified, the code will run on CPU (not recommended).

The script will anonymize the development and test data of LibriSpeech and VCTK in three steps (sketched below):
1. ASR: Recognition of the linguistic content, output in the form of text or phone sequences
2. Anonymization: Modification of the speaker embeddings, output as torch vectors
3. TTS: Synthesis based on the recognized transcription and the anonymized speaker embedding, output as audio files (wav)
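
Conceptually, the data flow through these three steps could be sketched as follows. This is a minimal illustration
only; the function and method names are hypothetical and do not reflect the actual API of
[run_inference.py](run_inference.py).
```
# Hypothetical sketch of the three-step pipeline; all names are illustrative.
def anonymize_utterance(audio, speaker_embedding, asr, anonymizer, tts):
    text = asr.recognize(audio)                                # 1. ASR: text or phone sequence
    anon_embedding = anonymizer.anonymize(speaker_embedding)   # 2. Anonymization: modified embedding
    wav = tts.synthesize(text, anon_embedding)                 # 3. TTS: synthesized waveform
    return wav
```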

Each module produces intermediate results that are saved to disk. A module is only executed if no previous
intermediate results exist for the given combination of pipeline components, or if recomputation is forced;
otherwise, the previous results are loaded. For example, the ASR module is only executed if there are no
transcriptions produced by exactly that ASR model. Conversely, the TTS module is executed if (a) the ASR was run
directly before (new transcriptions), and/or (b) the anonymization was run directly before (new speaker embeddings),
and/or (c) no TTS results exist for this combination of models.
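
In pseudocode, this caching rule could be summarized roughly as follows (a sketch under assumed names; the actual
logic is implemented in [run_inference.py](run_inference.py)):
```
import os

# Hypothetical sketch of the caching rule; all names are illustrative.
def should_run(module_tag, output_path, force_compute, upstream_changed):
    if module_tag in force_compute:  # recomputation explicitly forced
        return True
    if upstream_changed:             # e.g. the ASR just produced new transcriptions
        return True
    return not os.path.exists(output_path)  # no previous results on disk
```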

If you want to change any settings, like the particular models or datasets, you can adjust the *settings* dictionary
in [run_inference.py](run_inference.py). If you want to force recomputation for a specific module, add its tag to
the *force_compute* list.
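
As an illustration, such a configuration might look roughly like the following sketch. The exact keys and structure
are assumptions and may differ from the actual *settings* dictionary; the model names are taken from the table in
the Models section below.
```
# Hypothetical sketch; the exact keys may differ from run_inference.py.
settings = {
    'asr': 'asr_tts-phn_en.zip',                   # model names as listed in the Models table
    'anonymization': 'pool_minmax_ecapa+xvector',
    'tts': 'trained_on_ground_truth_phonemes.pt',
}
force_compute = []  # e.g. ['asr'] to force re-running the ASR module
```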

Immediately after the anonymization pipeline terminates, the evaluation pipeline is started. It performs some
preparation steps and then executes the evaluation part of the challenge run script (this extract can be found in
[evaluation/run_evaluation.sh](evaluation/run_evaluation.sh)).

Finally, for clarity, the most important parts of the evaluation results as well as the settings used are copied to
the [results](results) directory.

## Models
The following table lists all models for each module that are reported in the paper and included in this repository.
Each model is given by its name in the directory and the name used in the paper. In the *settings* dictionary in
[run_inference.py](run_inference.py), the model name should be used. An *x* in the *Default* column marks the models
that are used in the main configuration of the system.

| Module        | Default | Model name                                   | Name in paper        |
|---------------|---------|----------------------------------------------|----------------------|
| ASR           | x       | asr_tts-phn_en.zip                           | phones               |
|               |         | asr_stt_en                                   | STT                  |
|               |         | asr_tts_en.zip                               | TTS                  |
| Anonymization | x       | pool_minmax_ecapa+xvector                    | pool                 |
|               |         | pool_raw_ecapa+xvector                       | pool raw             |
|               |         | random_in-scale_ecapa+xvector                | random               |
| TTS           | x       | trained_on_ground_truth_phonemes.pt          | Libri100             |
|               |         | trained_on_asr_phoneme_outputs.pt            | Libri100 + finetuned |
|               |         | trained_on_libri600_asr_phoneme_outputs.pt   | Libri600             |
|               |         | trained_on_libri600_ground_truth_phonemes.pt | Libri600 + finetuned |