Replies: 6 comments 1 reply
- Hello, did you compare this to …?
- Yeah, with … I have some examples, but being in Finnish they probably don't help much. I have to watch in slow motion to see the difference. I will soon experiment with English. Instruments are suppressed.
  word, lines: lines.mp4
  word, segment: segment.mp4
- That's interesting. The problem with generalizing this is that not all text comes with newline separators, and even when they exist, they might not be in semantically meaningful positions. An alternative is to use both word and sentence splitting, and inserting …
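A minimal sketch of one reading of that idea (assuming the truncated part refers to inserting `<star>` tokens at sentence boundaries before `preprocess_text`; the naive regex split and the helper name are mine, not from the library):

```python
# Sketch: put a <star> marker at each sentence boundary instead of relying
# on newlines. The regex split is a rough assumption; a real sentence
# tokenizer (nltk, pysbd, ...) would be more robust.
import re

def star_at_sentence_boundaries(text: str) -> str:
    # Split on ., ! or ? followed by whitespace, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return "<star>".join(s for s in sentences if s)

print(star_at_sentence_boundaries("First sentence. Second one! Third?"))
# -> First sentence.<star>Second one!<star>Third?
```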
- Hmm... In my use case the verses are always separated by a newline, and it makes the most sense to just follow the lyrics provider. I think a good solution would be to give the user an option to insert the … What do you mean by "Btw, do you run the audio on the audio before instrument suppression or after it?"?
- It's already doable, you can break down …
- By applying the commit, one can set `star_frequency="custom"`. For example, I wanted to have the `<star>` tokens at every newline:

```python
import re

import torch
from ctc_forced_aligner import (
    load_audio,
    load_alignment_model,
    generate_emissions,
    preprocess_text,
    get_alignments,
    get_spans,
    postprocess_results,
)

audio_path = "test.wav"
text_path = "test.txt"
language = "fi"
device = "cuda" if torch.cuda.is_available() else "cpu"
batch_size = 16

alignment_model, alignment_tokenizer = load_alignment_model(
    device,
    dtype=torch.float32 if device == "cuda" else torch.float32,
)

audio_waveform = load_audio(audio_path, alignment_model.dtype, alignment_model.device)

with open(text_path, "r") as f:
    text = f.read()

# Replace every run of one or more newline characters with <star>
text = re.sub(r"\n+", "<star>", text)
print(text)

emissions, stride = generate_emissions(
    alignment_model, audio_waveform, batch_size=batch_size,
)

tokens_starred, text_starred = preprocess_text(
    text,
    romanize=True,
    language=language,
    star_frequency="custom",  # this must be set to "custom"
)

segments, scores, blank_id = get_alignments(
    emissions,
    tokens_starred,
    alignment_tokenizer,
)

spans = get_spans(tokens_starred, segments, blank_id)
word_timestamps = postprocess_results(text_starred, spans, stride, scores)
print(word_timestamps)
```
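A possible follow-up, not from this thread: grouping the word-level timestamps back into per-line segments. This sketch assumes each entry of `word_timestamps` is a dict with `"text"`, `"start"`, and `"end"` keys and that there is one entry per whitespace-separated word, in original order; check the actual return format of `postprocess_results` before relying on it.

```python
# Hypothetical post-processing: pair the aligned words with the original
# lines by counting whitespace-separated words per line.
with open(text_path, "r") as f:
    lines = [ln for ln in f.read().splitlines() if ln.strip()]

line_segments = []
idx = 0
for line in lines:
    n_words = len(line.split())
    words = word_timestamps[idx: idx + n_words]
    if words:
        line_segments.append({
            "text": line,
            "start": words[0]["start"],
            "end": words[-1]["end"],
        })
    idx += n_words

print(line_segments)
```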
- Hey, I got better accuracy of word-level timestamps for my purposes by adding `<star>` tokens per line (`\n`). My implementation is kinda incoherent, so I didn't bother to PR it. A better implementation would just replace `\n` with `<star>` and somehow escape the `<star>` so that romanization doesn't strip them. In text_utils.py: …