Commit be1a719

Author: Sarina Meyer (committed)
Added code to paper "Speaker Anonymization with Phonetic Intermediate Representations"
1 parent 0034362, commit be1a719

30 files changed: +1915 -1 lines changed

.gitignore

Lines changed: 134 additions & 0 deletions
```
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

models/
original_speaker_embeddings/
corpora/
results/
```

.gitmodules

Lines changed: 7 additions & 0 deletions
```
[submodule "Voice-Privacy-Challenge-2020"]
    path = Voice-Privacy-Challenge-2020
    url = https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2020
[submodule "IMS-Toucan"]
    path = IMS-Toucan
    url = https://github.com/Flux9665/IMS-Toucan
    branch = vp_inference/1912a835c4b3de20f5190797e684f10aa45a76d9
```

IMS-Toucan

Submodule IMS-Toucan added at 1912a83

README.md

Lines changed: 119 additions & 1 deletion
# Speaker Anonymization

This repository contains the speaker anonymization system developed at the Institute for Natural Language Processing (IMS) at the University of Stuttgart, Germany. The system is described in our paper [*Speaker Anonymization with Phonetic Intermediate Representations*](https://arxiv.org/abs/2207.04834), which will be published at Interspeech 2022.

**In addition to the code, we are going to provide a live demo soon.**

## System Description
The system is based on the Voice Privacy Challenge 2020, which is included as a submodule. It builds on the basic idea of speaker embedding anonymization with neural synthesis, and uses the data and evaluation framework of the challenge. For a detailed description of the system, please read our paper linked above.

![architecture](../speaker-anonymization/figures/architecture.png)

## Installation
Clone this repository with all its submodules:
```
git clone --recurse-submodules https://github.com/DigitalPhonetics/speaker-anonymization.git
```

In order to use the framework of the Voice Privacy Challenge 2020 for evaluation, you need to install it first. According to [the challenge repository](https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2020), this should simply be
```
cd Voice-Privacy-Challenge-2020
./install.sh
```
However, on our systems we had to make certain adjustments and decided to use a more lightweight environment that minimizes unnecessary components. If you are interested, you can see our steps in [alternative_challenge_framework_installation.md](alternative_challenge_framework_installation.md). Note that those steps may well not work directly on your system and may need to be modified.

**Note: this step will download and install Kaldi and might lead to complications. Additionally, make sure that you run the install script on a device with access to GPUs and CUDA.**

Additionally, install the [requirements](requirements.txt) (in the base directory of this repository):
```
pip install -r requirements.txt
```

## Getting started
Before running our pipeline, you first need to download and prepare the challenge data and the evaluation models. For this, you will need a password provided by the organizers of the Voice Privacy Challenge. Please contact them (see the information on [their repository](https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2020) or [website](https://www.voiceprivacychallenge.org/)) for access.

You can do this by either

### a) Executing our lightweight scripts:
This will only download and prepare the necessary models and datasets. Note that these scripts are simply extracts of the challenge run script.
```
cd setup_scripts
./run_download_data.sh
./run_prepare_data.sh
```

or by

### b) Executing the challenge run script:
This will download and prepare everything necessary AND run the baseline system of the Voice Privacy Challenge 2020. Note that you need to have installed the whole framework via the challenge install script first.
```
cd Voice-Privacy-Challenge-2020/baseline
./run.sh
```

### Running the pipeline
The system pipeline is controlled in [run_inference.py](run_inference.py). You can run it via
```
python run_inference.py --gpu <gpu_id>
```
with `<gpu_id>` being the ID of the GPU the code should be executed on. If this option is not specified, the code will run on the CPU (not recommended).

The script anonymizes the development and test data of LibriSpeech and VCTK in three steps:
1. ASR: recognition of the linguistic content, output in the form of text or phone sequences
2. Anonymization: modification of the speaker embeddings, output as torch vectors
3. TTS: synthesis based on the recognized transcription and the anonymized speaker embedding, output as audio files (wav)
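Conceptually, the three steps chain together as in the sketch below. This is an illustration only: `recognize`, `anonymize`, and `synthesize` are hypothetical method names, not the actual interfaces used in run_inference.py.

```python
def anonymize_utterance(audio, asr, anonymizer, tts):
    """Hypothetical sketch of the three-stage pipeline; the real
    module interfaces in this repository may differ."""
    text = asr.recognize(audio)             # 1. ASR: linguistic content (e.g. phone sequence)
    anon_emb = anonymizer.anonymize(audio)  # 2. Anonymization: modified speaker embedding
    return tts.synthesize(text, anon_emb)   # 3. TTS: re-synthesized speech in anonymized voice
```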
85+
Each module produces intermediate results that are saved to disk. A module is only executed if previous intermediate
86+
results for dependent pipeline combination do not exist or if recomputation is forced. Otherwise, the previous
87+
results are loaded. Example: The ASR module is
88+
only executed if there are no transcriptions produced by exactly that ASR model. On the other hand, the TTS is
89+
executed if (a) the ASR was performed directly before (new transcriptions), and/or (b) the anonymization was
90+
performed directly before (new speaker embeddings), and/or (c) no TTS results exist for this combination of models.
91+
92+
If you want to change any settings, like the particular models or datasets, you can adjust the *settings* dictionary
93+
in [run_inference.py](run_inference.py). If you want to force recomputation for a specific module, add its tag to
94+
the *force_compute* list.
95+
96+
Immediately after the anonymization pipeline terminates, the evaluation pipeline is started. It performs some
97+
preparation steps and then executes the evaluation part of the challenge run script (this extract can be found in
98+
[evaluation/run_evaluation.sh](../speaker-anonymization/evaluation/run_evaluation.sh)).
99+
100+
Finally, for clarity, the most important parts of the evaluation results as well as the used settings are copied to
101+
the [results](results) directory.
102+
103+
104+
## Models
105+
The following table lists all models for each module that are reported in the paper and are included in this
106+
repository. Each model is given by its name in the directory and the name used in the paper. In the *settings*
107+
dictionary in [run_inference.py](run_inference.py), the model name should be used. The *x* for default names the
108+
models that are used in the main configuration of the system.
109+
110+
| Module | Default| Model name | Name in paper|
111+
|--------|--------|------------|--------------|
112+
| ASR | x | asr_tts-phn_en.zip | phones |
113+
| | | asr_stt_en | STT |
114+
| | | asr_tts_en.zip | TTS |
115+
| Anonymization | x | pool_minmax_ecapa+xvector | pool |
116+
| | | pool_raw_ecapa+xvector | pool raw |
117+
| | | random_in-scale_ecapa+xvector | random |
118+
| TTS | x | trained_on_ground_truth_phonemes.pt| Libri100|
119+
| | | trained_on_asr_phoneme_outputs.pt | Libri100 + finetuned |
120+
| | | trained_on_libri600_asr_phoneme_outputs.pt | Libri600 |
121+
| | | trained_on_libri600_ground_truth_phonemes.pt | Libri600 + finetuned |
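For illustration, a *settings* dictionary selecting the default models could look roughly like this. The key names are assumptions (the exact keys used by run_inference.py may differ); the model names come from the table above:

```python
# Illustrative only: the key names are assumptions, while the model
# names are taken from the table above.
settings = {
    'asr_model': 'asr_tts-phn_en.zip',                   # "phones" (default ASR)
    'anon_model': 'pool_minmax_ecapa+xvector',           # "pool" (default anonymization)
    'tts_model': 'trained_on_ground_truth_phonemes.pt',  # "Libri100" (default TTS)
}
force_compute = []  # add a module tag, e.g. 'asr', to force recomputation
```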

Voice-Privacy-Challenge-2020

Submodule Voice-Privacy-Challenge-2020 added

alternative_challenge_framework_installation.md

Lines changed: 62 additions & 0 deletions
# Alternative Installation of the Framework for the Voice Privacy Challenge 2020
Unfortunately, the installation is not always as easy as the organizers imply in their [install script](Voice-Privacy-Challenge-2020/install.sh), which also installs several tools that are only necessary if the primary baseline of the challenge is to be executed. To adapt the script to our devices and pipeline, we shortened and modified it, and exchanged some components.

**Note: To run the code in this repository, it is NOT necessary to use the installation steps described in this document. Instead, you can also simply use the original [install script](Voice-Privacy-Challenge-2020/install.sh). If you use this document, be aware that you will probably have to modify several steps to make them work for you.**

## Installation Steps
This guide assumes that you cloned the repository including its submodules. Once you have followed the installation steps described below, continue with the *Getting started* section in the [main README](README.md).

### 1. Environment creation
The original installation script would create a conda environment, but conda would include many packages that are not always needed. We therefore 'manually' create a virtual environment within the repository:
```
virtualenv venv --python=python3.8
source venv/bin/activate
pip install -r Voice-Privacy-Challenge-2020/requirements.txt
```
If you want to install the requirements for the whole repository instead, replace the last line with
```
pip install -r requirements.txt
```
(If this does not work, install the requirements files listed in it separately.)

Finally, we have to make the install script skip the step of creating an environment by creating the required check file:
```
touch Voice-Privacy-Challenge-2020/.done-venv
```

### 2. Adapting Kaldi
The version of Kaldi in the framework is not up to date, and even the up-to-date one does not officially support our gcc version. We have to change that:
```
cd Voice-Privacy-Challenge-2020/kaldi
git checkout master
vim src/configure
```
In src/configure, change the minimum unsupported gcc version:
```
- MIN_UNSUPPORTED_GCC_VER="10.0"
- MIN_UNSUPPORTED_GCC_VER_NUM=100000;
+ MIN_UNSUPPORTED_GCC_VER="12.0"
+ MIN_UNSUPPORTED_GCC_VER_NUM=120000;
```
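If you prefer a non-interactive edit over opening the file in vim, the same change can be made with a sed one-liner. The snippet below demonstrates the substitution on a sample file; in the Kaldi tree the target would be src/configure:

```shell
# Create a sample file with the two lines that need to change
# (stand-in for src/configure in the Kaldi source tree).
printf 'MIN_UNSUPPORTED_GCC_VER="10.0"\nMIN_UNSUPPORTED_GCC_VER_NUM=100000;\n' > configure_sample

# Bump the unsupported-gcc threshold from 10.0 to 12.0 in place.
sed -i -e 's/MIN_UNSUPPORTED_GCC_VER="10.0"/MIN_UNSUPPORTED_GCC_VER="12.0"/' \
       -e 's/MIN_UNSUPPORTED_GCC_VER_NUM=100000;/MIN_UNSUPPORTED_GCC_VER_NUM=120000;/' configure_sample

cat configure_sample
```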
### 3. CUDA and MKL
Due to several installed versions of CUDA and MKL, and Kaldi's very specific requirements, we have to specify the paths to them in the [setup_scripts/install_challenge_framework.sh](../speaker-anonymization/setup_scripts/install_challenge_framework.sh) file.

### 4. Installation
Once everything above is resolved, you simply have to run the adapted install script:
```
cd setup_scripts
./install_challenge_framework.sh
```

anonymization/__init__.py

Lines changed: 2 additions & 0 deletions
```
from .pool_anonymizer import PoolAnonymizer
from .random_anonymizer import RandomAnonymizer
```

anonymization/base_anonymizer.py

Lines changed: 53 additions & 0 deletions
```
from pathlib import Path
import torch

from .speaker_embeddings import SpeakerEmbeddings


class BaseAnonymizer:

    def __init__(self, vec_type='xvector', device=None, emb_level='spk', **kwargs):
        # Base class for speaker embedding anonymization.
        self.vec_type = vec_type
        self.emb_level = emb_level

        if isinstance(device, torch.device):
            self.device = device
        elif isinstance(device, str):
            self.device = torch.device(device)
        elif isinstance(device, int):
            self.device = torch.device(f'cuda:{device}')
        else:
            self.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    def load_parameters(self, model_dir: Path):
        # Template method for loading parameters specific to the anonymization method. Not implemented.
        raise NotImplementedError('load_parameters')

    def save_parameters(self, model_dir: Path):
        # Template method for saving parameters specific to the anonymization method. Not implemented.
        raise NotImplementedError('save_parameters')

    def load_embeddings(self, emb_dir: Path):
        # Load previously extracted or generated speaker embeddings from disk.
        embeddings = SpeakerEmbeddings(self.vec_type, device=self.device, emb_level=self.emb_level)
        embeddings.load_vectors(emb_dir)
        return embeddings

    def save_embeddings(self, embeddings, emb_dir):
        # Save speaker embeddings to disk.
        embeddings.save_vectors(emb_dir)

    def anonymize_data(self, data_dir: Path, vector_dir: Path, emb_level='spk'):
        # Template method for anonymizing a dataset. Not implemented.
        raise NotImplementedError('anonymize_data')

    def _get_speaker_embeddings(self, data_dir: Path, vector_dir: Path, emb_level='spk'):
        # Retrieve the original speaker embeddings, either by extracting or loading them.
        vectors = SpeakerEmbeddings(vec_type=self.vec_type, emb_level=emb_level, device=self.device)
        if vector_dir.exists():
            vectors.load_vectors(in_dir=vector_dir)
        else:
            vectors.extract_vectors_from_audio(data_dir=data_dir)
            vectors.save_vectors(out_dir=vector_dir)
        return vectors
```
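A concrete anonymizer fills in the template methods of this base class. The following is a minimal, self-contained sketch of that pattern: it uses a stand-in base class so the example runs without torch, and `NoiseAnonymizer` is a hypothetical illustration, not the repository's actual PoolAnonymizer or RandomAnonymizer.

```python
from pathlib import Path

# Minimal stand-in for BaseAnonymizer so this sketch runs without torch;
# the real class lives in anonymization/base_anonymizer.py.
class BaseAnonymizer:
    def __init__(self, vec_type='xvector', device=None, emb_level='spk', **kwargs):
        self.vec_type = vec_type
        self.emb_level = emb_level
        self.device = device

    def anonymize_data(self, data_dir: Path, vector_dir: Path, emb_level='spk'):
        raise NotImplementedError('anonymize_data')


class NoiseAnonymizer(BaseAnonymizer):
    # Hypothetical anonymizer that shifts every embedding component by a
    # constant offset. Illustration of the subclassing pattern only; the
    # real anonymizers in this repository work differently.
    def __init__(self, scale=0.01, **kwargs):
        super().__init__(**kwargs)
        self.scale = scale

    def anonymize_vector(self, vector):
        # Modify a single speaker embedding (here: a plain list of floats).
        return [x + self.scale for x in vector]
```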
