Inference#12
|
@glicerico, I've just created a new fork from yours. I changed the code so that it runs on GPU, added annotations to the classes, and added evaluation of the model on a test dataset. The problem is that, when using your trained checkpoint, my eval method gives quite poor results. Could you try it on your side and see whether the problem is in your trained checkpoint or in my code? My fork is here: https://github.com/PolKul/CASA-Dialogue-Act-Classifier.git. Here is the result of running eval on your checkpoint "epoch=29-val_accuracy=0.751411.ckpt": it shows an accuracy of only 10%. |
|
Hey @PolKul, as noted in one of the issues, that checkpoint was trained before the class-label problem was solved, so it is probably using the wrong labels. |
|
@PolKul I uploaded it here again... please try your evaluation with this checkpoint and let me know. |
|
Oh, @PolKul, I just noticed that you are using your own class label numbering, so it's expected that the predictions won't match. You should probably either leave the classes as they were proposed in my pull request, or train a model with the label order that you prefer :) |
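To illustrate why a different label numbering alone can wreck evaluation scores, here is a minimal sketch. The label names, both orderings, and the helpers `accuracy` and `remap` are hypothetical and are not taken from either repository; the sketch only assumes that the checkpoint emits class indices in its training-time order.

```python
from typing import List

# Hypothetical label orders: the one the checkpoint was trained with,
# and a different one used by the evaluation code.
train_labels = ["statement", "question", "backchannel", "agreement"]
eval_labels = ["agreement", "backchannel", "question", "statement"]


def accuracy(pred: List[int], gold: List[int]) -> float:
    """Fraction of positions where the predicted index equals the gold index."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def remap(indices: List[int], src: List[str], dst: List[str]) -> List[int]:
    """Translate class indices from the `src` ordering to the `dst` ordering."""
    dst_index = {name: i for i, name in enumerate(dst)}
    return [dst_index[src[i]] for i in indices]


gold_names = ["statement", "question", "statement", "backchannel", "agreement"]

# Pretend the model is perfect, but answers in its training-time index space.
pred_train_idx = [train_labels.index(n) for n in gold_names]
# The evaluation code encodes the gold labels with its own ordering.
gold_eval_idx = [eval_labels.index(n) for n in gold_names]

naive = accuracy(pred_train_idx, gold_eval_idx)
fixed = accuracy(remap(pred_train_idx, train_labels, eval_labels), gold_eval_idx)

print(f"without remapping: {naive:.2f}")
print(f"with remapping:    {fixed:.2f}")
```

In this toy setup a perfect model scores zero until its indices are translated into the evaluator's ordering, which is the kind of mismatch described above.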
|
@glicerico, thank you for your review. However, my question was more about the eval() method of the DialogClassifier class. As you can see, it doesn't use my annotated classes (the act_label_names list) in any way and still produces really bad results (an F1 score of 0.1). To avoid confusion, you can add the same eval method to your branch and try running it. Let me know whether you get better statistics from it. |
|
You're right, I see that you only use |
|
Sorry, but I don't see where you see the problem with act_label_names.
That is a dictionary with the following structure: ["name","act_tag","example"]. The code below finds a "name" by its "act_tag": Or do you mean that the "prediction" is incorrectly labeled? |
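A minimal sketch of such a lookup, assuming act_label_names is a list of ["name","act_tag","example"] entries as described. The two sample entries and the function name `name_for_tag` are illustrative and are not taken from the fork.

```python
from typing import List, Optional

# Illustrative entries in the ["name", "act_tag", "example"] layout.
act_label_names: List[List[str]] = [
    ["Statement-non-opinion", "sd", "Me, I'm in the legal department."],
    ["Yes-No-Question", "qy", "Do you have any special training?"],
]


def name_for_tag(act_tag: str, table: List[List[str]]) -> Optional[str]:
    """Return the human-readable name for `act_tag`, or None if it is unknown."""
    for name, tag, _example in table:
        if tag == act_tag:
            return name
    return None


print(name_for_tag("qy", act_label_names))
print(name_for_tag("zz", act_label_names))
```

A lookup like this only affects how a predicted tag is displayed; it cannot fix predictions whose class indices already disagree with the labels used during training.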
|
After your past comment, I don't see a problem with
|
I mean that the prediction is labeled differently. |
|
I confirm that both "epoch=28-val_accuracy=0.746056.ckpt" and "epoch=29-val_accuracy=0.751411.ckpt" give the same (bad) results, with an F1 score of 0.1. It would be interesting to see the results of your eval()... |
|
Hi @PolKul, these are the results I got using the best checkpoint I trained, with unfrozen RoBERTa weights. |
|
Hi @glicerico, thanks for the checkpoint and eval. I've just updated the repo from your latest inference branch and it worked! I'm not sure what was wrong with my previous code, though. Anyway, thank you for your assistance. |
Dear @glicerico, may I ask you to re-upload the checkpoint? Somehow I don't get the same results, and my inference is very slow when using yours. Do you know why? |
|
@minarainbow, you can find the checkpoint at https://www.dropbox.com/s/egiv70dwl1ikrbq/epoch%3D5-val_accuracy%3D0.779101.ckpt?dl=0. I'll remove it from there in a couple of days. |
@macabdul9 sorry for the large PR, but I had to accumulate improvements to achieve proper inference.
Summary of changes:
+, as those are continuations of interrupted utterances. Unless these are somehow joined back to their initial utterance, I believe they are useless... See discussion here.