
getting this error: self.tokenizer = Tokenizer.from_file(vocab_file) Exception: No such file or directory (os error 2) #1

@mlkasim791

Description


I think vocab_file is missing and cannot be found.
Here is the complete output of the training cell of the training notebook:
[Errno 2] No such file or directory: 'codes'
/content/gdrive/MyDrive/trainer/codes
Disabled distributed training.
Path already exists. Rename it to [/content/gdrive/MyDrive/trainer/experiments/gyapan_archived_230806-053843]
23-08-06 05:38:43.913 - INFO: name: gyapan
model: extensibletrainer
scale: 1
gpu_ids: [0]
start_step: 0
checkpointing_enabled: True
fp16: False
use_8bit: True
wandb: False
use_tb_logger: True
datasets:[
train:[
name: gyapan-clone
n_workers: 8
batch_size: 66
mode: paired_voice_audio
path: /content/gdrive/MyDrive/gyapan/train.txt
fetcher_mode: ['lj']
phase: train
max_wav_length: 255995
max_text_length: 200
sample_rate: 22050
load_conditioning: True
num_conditioning_candidates: 2
conditioning_length: 44000
use_bpe_tokenizer: True
load_aligned_codes: False
data_type: img
]
val:[
name: TestValidation
n_workers: 1
batch_size: 33
mode: paired_voice_audio
path: /content/gdrive/MyDrive/gyapan/val.txt
fetcher_mode: ['lj']
phase: val
max_wav_length: 255995
max_text_length: 200
sample_rate: 22050
load_conditioning: True
num_conditioning_candidates: 2
conditioning_length: 44000
use_bpe_tokenizer: True
load_aligned_codes: False
data_type: img
]
]
steps:[
gpt_train:[
training: gpt
loss_log_buffer: 500
optimizer: adamw
optimizer_params:[
lr: 1e-05
triton: False
weight_decay: 0.01
beta1: 0.9
beta2: 0.96
]
clip_grad_eps: 4
injectors:[
paired_to_mel:[
type: torch_mel_spectrogram
mel_norm_file: ../experiments/clips_mel_norms.pth
in: wav
out: paired_mel
]
paired_cond_to_mel:[
type: for_each
subtype: torch_mel_spectrogram
mel_norm_file: ../experiments/clips_mel_norms.pth
in: conditioning
out: paired_conditioning_mel
]
to_codes:[
type: discrete_token
in: paired_mel
out: paired_mel_codes
dvae_config: ../experiments/train_diffusion_vocoder_22k_level.yml
]
paired_fwd_text:[
type: generator
generator: gpt
in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
out: ['loss_text_ce', 'loss_mel_ce', 'logits']
]
]
losses:[
text_ce:[
type: direct
weight: 0.01
key: loss_text_ce
]
mel_ce:[
type: direct
weight: 1
key: loss_mel_ce
]
]
]
]
networks:[
gpt:[
type: generator
which_model_G: unified_voice2
kwargs:[
layers: 30
model_dim: 1024
heads: 16
max_text_tokens: 402
max_mel_tokens: 604
max_conditioning_inputs: 2
mel_length_compression: 1024
number_text_tokens: 256
number_mel_codes: 8194
start_mel_token: 8192
stop_mel_token: 8193
start_text_token: 255
train_solo_embeddings: False
use_mel_codes_as_input: True
checkpointing: True
tortoise_compat: True
]
]
]
path:[
pretrain_model_gpt: ../experiments/autoregressive.pth
strict_load: True
root: /content/gdrive/MyDrive/trainer
experiments_root: /content/gdrive/MyDrive/trainer/experiments/gyapan
models: /content/gdrive/MyDrive/trainer/experiments/gyapan/models
training_state: /content/gdrive/MyDrive/trainer/experiments/gyapan/training_state
log: /content/gdrive/MyDrive/trainer/experiments/gyapan
val_images: /content/gdrive/MyDrive/trainer/experiments/gyapan/val_images
]
train:[
niter: 50000
warmup_iter: -1
mega_batch_factor: 4
val_freq: 60
default_lr_scheme: MultiStepLR
gen_lr_steps: [200, 400, 560, 720]
lr_gamma: 0.5
ema_enabled: False
]
eval:[
pure: True
]
logger:[
print_freq: 20
save_checkpoint_freq: 60
visuals: ['gen', 'mel']
visual_debug_rate: 500
is_mel_spectrogram: True
disable_state_saving: False
]
upgrades:[
number_of_checkpoints_to_save: 1
number_of_states_to_save: 0
]
is_train: True
dist: False

2023-08-06 05:38:45.229937: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
23-08-06 05:38:46.228 - INFO: Random seed: 9361
Traceback (most recent call last):
File "/content/gdrive/MyDrive/trainer/codes/train.py", line 398, in
trainer.init(args.opt, opt, args.launcher)
File "/content/gdrive/MyDrive/trainer/codes/train.py", line 115, in init
self.train_set, collate_fn = create_dataset(dataset_opt, return_collate=True)
File "/content/gdrive/MyDrive/trainer/codes/data/init.py", line 107, in create_dataset
dataset = D(dataset_opt)
File "/content/gdrive/MyDrive/trainer/codes/data/audio/paired_voice_audio_dataset.py", line 169, in init
self.tokenizer = VoiceBpeTokenizer(opt_get(hparams, ['tokenizer_vocab'], '../experiments/bpe_lowercase_asr_256.json'))
File "/content/gdrive/MyDrive/trainer/codes/data/audio/voice_tokenizer.py", line 34, in init
self.tokenizer = Tokenizer.from_file(vocab_file)
Exception: No such file or directory (os error 2)

The question is: how can I get this file?
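For context on why the error occurs: the traceback shows the dataset falls back to the relative default `'../experiments/bpe_lowercase_asr_256.json'` when the YAML does not set `tokenizer_vocab`, and a relative path is resolved against the process working directory (here `/content/gdrive/MyDrive/trainer/codes`), not against the script file. A minimal sketch of that resolution, assuming the working directory from the log above (`resolve_vocab_path` is a hypothetical helper, not part of the trainer):

```python
import os

# Default used by the dataset when the YAML has no `tokenizer_vocab` key
# (see the traceback: opt_get(hparams, ['tokenizer_vocab'], ...)).
DEFAULT_VOCAB = '../experiments/bpe_lowercase_asr_256.json'

def resolve_vocab_path(working_dir: str, vocab_path: str = DEFAULT_VOCAB) -> str:
    """Resolve a (possibly relative) vocab path the way the OS does at
    runtime: against the process working directory."""
    return os.path.normpath(os.path.join(working_dir, vocab_path))

# With the notebook's working directory, the default resolves to:
resolved = resolve_vocab_path('/content/gdrive/MyDrive/trainer/codes')
print(resolved)
# -> /content/gdrive/MyDrive/trainer/experiments/bpe_lowercase_asr_256.json

# If no file exists at that location, Tokenizer.from_file(vocab_file)
# raises exactly the "No such file or directory (os error 2)" seen above.
print(os.path.isfile(resolved))
```

So the error means no `bpe_lowercase_asr_256.json` exists at the resolved location. Two ways to fix it, assuming the file is available from the training repo's experiments assets: place the file at the resolved path above, or set `tokenizer_vocab` in the dataset section of the training YAML to an absolute path pointing at wherever the file actually lives.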
