Description
I think vocab_file is missing and cannot be found.
Here is the complete output of the training cell of the training notebook:
[Errno 2] No such file or directory: 'codes'
/content/gdrive/MyDrive/trainer/codes
Disabled distributed training.
Path already exists. Rename it to [/content/gdrive/MyDrive/trainer/experiments/gyapan_archived_230806-053843]
23-08-06 05:38:43.913 - INFO: name: gyapan
model: extensibletrainer
scale: 1
gpu_ids: [0]
start_step: 0
checkpointing_enabled: True
fp16: False
use_8bit: True
wandb: False
use_tb_logger: True
datasets:[
train:[
name: gyapan-clone
n_workers: 8
batch_size: 66
mode: paired_voice_audio
path: /content/gdrive/MyDrive/gyapan/train.txt
fetcher_mode: ['lj']
phase: train
max_wav_length: 255995
max_text_length: 200
sample_rate: 22050
load_conditioning: True
num_conditioning_candidates: 2
conditioning_length: 44000
use_bpe_tokenizer: True
load_aligned_codes: False
data_type: img
]
val:[
name: TestValidation
n_workers: 1
batch_size: 33
mode: paired_voice_audio
path: /content/gdrive/MyDrive/gyapan/val.txt
fetcher_mode: ['lj']
phase: val
max_wav_length: 255995
max_text_length: 200
sample_rate: 22050
load_conditioning: True
num_conditioning_candidates: 2
conditioning_length: 44000
use_bpe_tokenizer: True
load_aligned_codes: False
data_type: img
]
]
steps:[
gpt_train:[
training: gpt
loss_log_buffer: 500
optimizer: adamw
optimizer_params:[
lr: 1e-05
triton: False
weight_decay: 0.01
beta1: 0.9
beta2: 0.96
]
clip_grad_eps: 4
injectors:[
paired_to_mel:[
type: torch_mel_spectrogram
mel_norm_file: ../experiments/clips_mel_norms.pth
in: wav
out: paired_mel
]
paired_cond_to_mel:[
type: for_each
subtype: torch_mel_spectrogram
mel_norm_file: ../experiments/clips_mel_norms.pth
in: conditioning
out: paired_conditioning_mel
]
to_codes:[
type: discrete_token
in: paired_mel
out: paired_mel_codes
dvae_config: ../experiments/train_diffusion_vocoder_22k_level.yml
]
paired_fwd_text:[
type: generator
generator: gpt
in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
out: ['loss_text_ce', 'loss_mel_ce', 'logits']
]
]
losses:[
text_ce:[
type: direct
weight: 0.01
key: loss_text_ce
]
mel_ce:[
type: direct
weight: 1
key: loss_mel_ce
]
]
]
]
networks:[
gpt:[
type: generator
which_model_G: unified_voice2
kwargs:[
layers: 30
model_dim: 1024
heads: 16
max_text_tokens: 402
max_mel_tokens: 604
max_conditioning_inputs: 2
mel_length_compression: 1024
number_text_tokens: 256
number_mel_codes: 8194
start_mel_token: 8192
stop_mel_token: 8193
start_text_token: 255
train_solo_embeddings: False
use_mel_codes_as_input: True
checkpointing: True
tortoise_compat: True
]
]
]
path:[
pretrain_model_gpt: ../experiments/autoregressive.pth
strict_load: True
root: /content/gdrive/MyDrive/trainer
experiments_root: /content/gdrive/MyDrive/trainer/experiments/gyapan
models: /content/gdrive/MyDrive/trainer/experiments/gyapan/models
training_state: /content/gdrive/MyDrive/trainer/experiments/gyapan/training_state
log: /content/gdrive/MyDrive/trainer/experiments/gyapan
val_images: /content/gdrive/MyDrive/trainer/experiments/gyapan/val_images
]
train:[
niter: 50000
warmup_iter: -1
mega_batch_factor: 4
val_freq: 60
default_lr_scheme: MultiStepLR
gen_lr_steps: [200, 400, 560, 720]
lr_gamma: 0.5
ema_enabled: False
]
eval:[
pure: True
]
logger:[
print_freq: 20
save_checkpoint_freq: 60
visuals: ['gen', 'mel']
visual_debug_rate: 500
is_mel_spectrogram: True
disable_state_saving: False
]
upgrades:[
number_of_checkpoints_to_save: 1
number_of_states_to_save: 0
]
is_train: True
dist: False
2023-08-06 05:38:45.229937: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
23-08-06 05:38:46.228 - INFO: Random seed: 9361
Traceback (most recent call last):
File "/content/gdrive/MyDrive/trainer/codes/train.py", line 398, in
trainer.init(args.opt, opt, args.launcher)
File "/content/gdrive/MyDrive/trainer/codes/train.py", line 115, in init
self.train_set, collate_fn = create_dataset(dataset_opt, return_collate=True)
File "/content/gdrive/MyDrive/trainer/codes/data/init.py", line 107, in create_dataset
dataset = D(dataset_opt)
File "/content/gdrive/MyDrive/trainer/codes/data/audio/paired_voice_audio_dataset.py", line 169, in init
self.tokenizer = VoiceBpeTokenizer(opt_get(hparams, ['tokenizer_vocab'], '../experiments/bpe_lowercase_asr_256.json'))
File "/content/gdrive/MyDrive/trainer/codes/data/audio/voice_tokenizer.py", line 34, in init
self.tokenizer = Tokenizer.from_file(vocab_file)
Exception: No such file or directory (os error 2)
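For context, the traceback shows the dataset falling back to the default vocab path '../experiments/bpe_lowercase_asr_256.json' because no tokenizer_vocab is set in the dataset options. Relative to the working directory printed at the top of the log (/content/gdrive/MyDrive/trainer/codes), that would resolve to /content/gdrive/MyDrive/trainer/experiments/bpe_lowercase_asr_256.json. A minimal check of where the path resolves, assuming that working directory:

```python
import os

# Working directory shown at the top of the log
workdir = "/content/gdrive/MyDrive/trainer/codes"
# Default path used by paired_voice_audio_dataset.py when 'tokenizer_vocab' is not set
default_vocab = "../experiments/bpe_lowercase_asr_256.json"

resolved = os.path.abspath(os.path.join(workdir, default_vocab))
print(resolved, "exists:", os.path.exists(resolved))
```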
The question is: how can I get this file?
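Once a copy of the vocab file is in place (or the train/val dataset blocks point tokenizer_vocab at one explicitly), a minimal sanity check using the same call that fails above would be something like the sketch below; the path is only an assumed location, not where the file necessarily is:

```python
from tokenizers import Tokenizer

# Assumed location; adjust to wherever the BPE vocab file actually lives
vocab_file = "/content/gdrive/MyDrive/trainer/experiments/bpe_lowercase_asr_256.json"

tokenizer = Tokenizer.from_file(vocab_file)  # same call as voice_tokenizer.py line 34
print(tokenizer.encode("hello world").ids)
```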