This repository was archived by the owner on Mar 19, 2024. It is now read-only.

Problems with the pretrainedVectors option #676

@buptstehc

I have 16 classes and about 2,000 samples, and I trained a classifier using pretrained vectors:

fasttext supervised -input ${input_path} -output ../model/model -pretrainedVectors ${embedding_path} -dim 200 -epoch 20 -thread 2 -wordNgrams 3

Read 0M words
Number of words: 8127
Number of labels: 16
Progress: 100.0% words/sec/thread: 432094 lr: 0.000000 loss: 0.049534 eta: 0h0m -14m

The model reached a P@1 of about 97% and an R@1 of about 96%:

fasttext test ../model/model.bin ${input_path}

N 494
P@1 0.97
R@1 0.962
Number of examples: 494

However, when I ran prediction on tens of thousands of new samples, many samples whose true label is not among the 16 classes above received a high prediction probability.

I can think of two possible reasons:

  1. The distribution of the training samples differs from that of the real samples.
  2. In the test phase, the model did not use the pretrained vectors.
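
The first reason can be illustrated with a minimal sketch. It assumes the classifier ends in a softmax over the 16 known labels, which is fastText's default loss for supervised training when no -loss option is given, as in the command above. The scores are hypothetical values made up for illustration, not output from the real model. Because softmax probabilities always sum to 1 over the known labels, a sample that belongs to none of them can still end up with one label taking most of the probability mass.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a sample that belongs to none of the 16 classes.
# One class happens to score higher than the rest, e.g. because of shared words.
scores = [0.1] * 16
scores[3] = 4.0

probs = softmax(scores)
top_label = max(range(len(probs)), key=probs.__getitem__)
top_prob = probs[top_label]

# The model has no "none of the above" option, so the winning label
# still receives a large share of the probability.
print(top_label, round(top_prob, 3))
```

If this is what is happening, a high probability only means that one label beat the other 15. It does not mean the sample resembles the training data.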

Does anyone know which it is?

Labels: bug; machine-learning; issue/question to related general ML practice
