Miscellaneous scripts

This repository is a collection of scripts I wrote for various text and language processing tasks during my PhD research.

add_AudioBNC_dialects.r was used to expand the dialect codes used in the AudioBNC corpus (http://www.phon.ox.ac.uk/AudioBNC) and group them into larger linguistic macro-regions based on The Dialects of England (Trudgill 2000).
clean_textgrids.py removes punctuation from the word tier of the Sounds of the City corpus (https://soundsofthecity.arts.gla.ac.uk/).
parse_buckeye_words.py was used to extract the underlying (force-aligned) transcription from the Buckeye corpus (https://buckeyecorpus.osu.edu/). Because the alignment and transcription of the Buckeye corpus were manually adjusted after forced alignment, this script lets one approximate the original force-aligned transcription, which may be more comparable with other available speech corpora.
replace_speaker_names.py was used to anonymise the Modern RP corpus (Fabricius 2000) by iteratively replacing the speaker names on the TextGrid tiers with predefined speaker IDs.
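
The two TextGrid-editing steps described above (stripping punctuation from word labels, and swapping speaker names for predefined IDs) might look roughly like the sketch below. It is not the code from clean_textgrids.py or replace_speaker_names.py: the function names, the punctuation set, and the example name-to-ID mapping are all assumptions made for illustration, and it works on plain strings rather than parsing real TextGrid files.

```python
import string

# Assumed punctuation set: everything in string.punctuation except the
# apostrophe and hyphen, so that "don't" and "well-known" stay intact.
PUNCTUATION = "".join(c for c in string.punctuation if c not in "'-")
_STRIP_TABLE = str.maketrans("", "", PUNCTUATION)


def clean_label(label: str) -> str:
    """Remove punctuation from a single word-tier interval label."""
    return label.translate(_STRIP_TABLE).strip()


def replace_speaker_names(tier_names, speaker_ids):
    """Replace every known speaker name inside each tier name with its ID.

    `speaker_ids` is a hypothetical mapping such as {"Jane Doe": "S01"}.
    """
    renamed = []
    for name in tier_names:
        # Try each speaker in turn; names that do not occur are left alone.
        for speaker, speaker_id in speaker_ids.items():
            name = name.replace(speaker, speaker_id)
        renamed.append(name)
    return renamed
```

For example, `clean_label("hello,")` gives `"hello"`, and `replace_speaker_names(["Jane Doe - words"], {"Jane Doe": "S01"})` gives `["S01 - words"]`. In practice these functions would be applied to the labels and tier names read from each TextGrid file.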

Fabricius, Anne. (2000). T-glottalling between stigma and prestige: A sociolinguistic study of modern RP. PhD thesis, Copenhagen Business School.

Trudgill, Peter. (2000). The Dialects of England. Oxford: Blackwell.
