Problems with the pretrainedVectors option

I have 16 classes and about 2,000 samples, and trained a model with pretrained vectors:

```
fasttext supervised -input ${input_path} -output ../model/model -pretrainedVectors ${embedding_path} -dim 200 -epoch 20 -thread 2 -wordNgrams 3
```

```
Read 0M words
Number of words:  8127
Number of labels: 16
Progress: 100.0%  words/sec/thread: 432094  lr: 0.000000  loss: 0.049534  eta: 0h0m -14m
```
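`-dim` has to match the dimension of the vectors in `${embedding_path}`; fastText is expected to reject pretrained vectors whose dimension differs from `-dim`. Assuming the file is in the text `.vec` format that fastText itself writes (a header line `<word count> <dimension>`, then one word and its values per line), here is a minimal sketch for checking the dimension before training. The function name `vec_dimension` is hypothetical and not part of fastText:

```python
import io
from typing import TextIO


def vec_dimension(f: TextIO) -> int:
    """Read a .vec header ("<word count> <dimension>") and check it against the first vector."""
    header = f.readline().split()
    if len(header) != 2:
        raise ValueError(f"unexpected .vec header: {header!r}")
    dim = int(header[1])
    first = f.readline().split()  # the word, followed by its components
    # float() also validates that every component parses as a number
    n_values = len([float(x) for x in first[1:]])
    if n_values != dim:
        raise ValueError(f"header says dim={dim}, first vector has {n_values} values")
    return dim


# In-memory stand-in for ${embedding_path}: 2 words, 3 dimensions.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
print(vec_dimension(sample))  # 3
```

With the real file, pass `open(path, encoding="utf-8")` instead of the `StringIO` and compare the result with the `-dim 200` given on the command line.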


 got 97% ,96% P@1 and R@1 respectively:

```
fasttext test ../model/model.bin ${input_path}
```

```
N	494
P@1	0.97
R@1	0.962
Number of examples: 494
```
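Assuming `fasttext test` computes the two metrics the way older fastText C++ sources do (correct top-k predictions divided by `k * N` for precision, divided by the total number of gold labels for recall), P@1 and R@1 can only differ when some lines carry more than one label. A minimal sketch of that arithmetic; the function name and the toy labels are hypothetical:

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    predictions: Sequence[Sequence[str]],  # top-k predicted labels per example
    gold: Sequence[Set[str]],              # true labels per example
    k: int,
) -> Tuple[float, float]:
    """P@k = hits / (k * N); R@k = hits / (total number of gold labels)."""
    hits = 0
    n_gold = 0
    for preds, labels in zip(predictions, gold):
        hits += sum(1 for p in preds[:k] if p in labels)
        n_gold += len(labels)
    n = len(gold)
    return hits / (k * n), hits / n_gold


# Toy data: 4 examples, the last one carries two labels.
preds = [["__label__a"], ["__label__b"], ["__label__a"], ["__label__c"]]
gold = [{"__label__a"}, {"__label__b"}, {"__label__b"}, {"__label__c", "__label__a"}]
p, r = precision_recall_at_k(preds, gold, k=1)
print(p, r)  # 0.75 0.6
```

If that assumption holds, the output above corresponds to roughly 0.97 × 494 ≈ 479 correct top-1 predictions and 479 / 0.962 ≈ 498 gold labels, i.e. a few of the 494 lines would have more than one label (a rough estimate, since both metrics are printed to three digits).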

However, when predicting tens of thousands of new samples, I found that many samples whose true label is not among the 16 classes above still received a high predicted probability.

I guess there may be two reasons:
1. The distributions of the training samples and the real samples are different.
2. At test time, the model does not use the pretrained vectors.
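On reason 1: assuming the model was trained with fastText's default softmax loss, the predicted probabilities are normalized over the 16 training labels only. They always sum to 1, so an input that belongs to none of those classes is still pushed onto whichever label scores least badly. A minimal stdlib sketch of that normalization, with made-up scores:

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw label scores into probabilities that always sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for an input unlike any training sample: all 16 labels
# score poorly, but label 3 scores a little less poorly than the rest.
scores = [-10.0] * 16
scores[3] = -5.0

probs = softmax(scores)
print(round(sum(probs), 6))  # 1.0: all probability mass goes to the 16 known labels
print(round(max(probs), 3))  # 0.908 for label 3
```

Because only the differences between scores matter, a high top probability alone does not show that an input belongs to one of the 16 classes. Newer fastText releases also provide `-loss ova` (one-vs-all), which scores each label independently; whether it is available depends on the installed version.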

Does anyone know?

problems about pretrainedVectors option #676

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

problems about pretrainedVectors option #676

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions