TigreGotico/synthetic_dataset_generator

Wake Word Dataset Generation and Augmentation Scripts 🎙️

This repository contains Python scripts and companion shell scripts for creating, augmenting, and normalizing datasets for Wake Word (WW) detection models.


Scripts Overview

| File Name | Type | Description |
| --- | --- | --- |
| `record_dataset.py` | Python Script | An interactive tool for manually recording a custom wake word dataset. It uses the Silero VAD (Voice Activity Detection) model to detect voice activity, helping the user record positive wake word samples, negative phrases, and ambient background noise. |
| `ovos_ww_synth.py` | Python Script | The core script for synthetic data generation. It generates wake word audio samples using multiple Text-to-Speech (TTS) engines (e.g., Edge, Google, Piper) and optionally incorporates voice conversion (VC) to simulate multiple speakers. |
| `ovos_ww_synth.sh` | Shell Script | A driver script that simplifies the execution of ovos_ww_synth.py. It is designed to handle the synthesis process for one or more wake words in a specified language, including logging and concurrent job management. |
| `vc_ww_synth.sh` | Shell Script | A specific example script utilizing the chatterbox_bulk_tts tool to perform Voice Converted (VC) TTS synthesis of a wake word, leveraging a dataset of voice references. |
| `augment_voices.sh` | Shell Script | A utility script that uses the chatterbox_bulk_vc tool to revoice an existing dataset. This is used to augment a synthetic or recorded dataset by converting the audio to sound like new, random speakers. |
| `augment.py` | Python Script | A dedicated script for acoustic dataset augmentation. It applies various real-world audio transformations (such as noise mixing, reverb, pitch shifting, and speed perturbation) to preprocessed audio to increase model robustness. |
| `adversarial_samples.py` | Python Script | A tool for generating adversarial text samples (hard negatives) that are phonetically similar to the target wake word. It employs a combination of Grapheme Augmentation (single-grapheme edits) and potentially a Large Language Model (LLM) to create confusable words, meant for TTS synthesis later. |
| `gen_adversarial_words.sh` | Shell Script | An execution script that calls adversarial_samples.py to generate a list of adversarial words for a specific wake word and saves the output to a text file. |
| `normalize_txt.sh` | Shell Script | A utility script for post-processing text files. It converts all text to lowercase, then sorts and deduplicates the lines in place, commonly used for cleaning up word lists like those generated by gen_adversarial_words.sh. |
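
To make the adversarial word flow more concrete, here is a minimal sketch of single-grapheme edits followed by the lowercase, sort, and deduplicate cleanup described for normalize_txt.sh. It is not the code in adversarial_samples.py: the function names `single_grapheme_edits` and `normalize_lines`, the ASCII-only alphabet, the treatment of each character as one grapheme, and the dropping of blank lines are all assumptions made for illustration.

```python
import string


def single_grapheme_edits(word, alphabet=string.ascii_lowercase):
    """Return every string one edit (delete, substitute, insert) away from `word`.

    Hypothetical helper: each character is treated as one grapheme and the
    alphabet is assumed to be lowercase ASCII.
    """
    word = word.lower()
    candidates = set()
    for i in range(len(word) + 1):
        head, tail = word[:i], word[i:]
        if tail:
            # Deletion: drop the character at position i.
            candidates.add(head + tail[1:])
            # Substitution: replace the character at position i.
            for ch in alphabet:
                candidates.add(head + ch + tail[1:])
        # Insertion: add a character before position i (or at the end).
        for ch in alphabet:
            candidates.add(head + ch + tail)
    # The original word and the empty string are not useful hard negatives.
    candidates.discard(word)
    candidates.discard("")
    return candidates


def normalize_lines(lines):
    """Lowercase, drop blank lines, deduplicate, and sort a list of words."""
    return sorted({line.strip().lower() for line in lines if line.strip()})


if __name__ == "__main__":
    words = normalize_lines(single_grapheme_edits("hey"))
    print(len(words), words[:5])
```

Pure edit-distance candidates include many strings that are not pronounceable words, which is presumably why the real tool can also draw on an LLM; a list like this would still need filtering before TTS synthesis.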

Credits

This work was made possible by a generous grant from the NGI0 Commons Fund.

This project was funded through the NGI0 Commons Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101135429. Additional funding is made available by the Swiss State Secretariat for Education, Research and Innovation (SERI).
