src/fmeval/eval_algorithms/qa_accuracy.py: 1 addition & 1 deletion
@@ -285,7 +285,7 @@ class QAAccuracy(EvalAlgorithmInterface):
     3. Precision over Words: The fraction of words in the prediction that are also found in the target answer. The text is normalized as before.
     4. Recall over Words: The fraction of words in the target answer that are also found in the prediction.
     5. F1 over Words: The harmonic mean of precision and recall, over words (normalized).
-    6. [BERTScore](https://arxiv.org/pdf/1904.09675.pdf) uses a second ML model (from the BERT family) to compute sentence embeddings and compare their cosine similarity. This score may account for additional linguistic flexibility over ROUGE and METEOR since semantically similar sentences should be embedded closer to each other.
+    6. [BERTScore](https://arxiv.org/pdf/1904.09675.pdf) uses a second ML model (from the BERT family) to compute sentence embeddings and compare their cosine similarity. This score may account for additional linguistic flexibility over the other QAAccuracy metrics since semantically similar sentences should be embedded closer to each other.
 
 
     Precision, Recall and F1 over Words are more flexible as they assign non-zero scores to
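For context on the metrics the docstring describes, here is a minimal sketch of precision, recall, and F1 over words, following the standard SQuAD-style computation (multiset word overlap). The `normalize` helper is a simplified stand-in; the exact normalization fmeval applies ("normalized as before" in the docstring) may differ, e.g. in how articles and punctuation are handled.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    # Simplified stand-in for fmeval's normalization:
    # lowercase, strip punctuation, split on whitespace.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def precision_recall_f1_over_words(prediction: str, target: str) -> tuple[float, float, float]:
    pred_words = normalize(prediction)
    target_words = normalize(target)
    # Multiset intersection: a word shared by both counts as many
    # times as it appears in both, as in the SQuAD F1 metric.
    common = Counter(pred_words) & Counter(target_words)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0, 0.0, 0.0
    precision = num_same / len(pred_words)
    recall = num_same / len(target_words)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(precision_recall_f1_over_words("London, England", "London is the capital of England"))
```

Unlike exact match, this assigns partial credit: the example above scores nonzero because the prediction shares "london" and "england" with the target even though the strings differ.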
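The BERTScore bullet changed in this diff can be illustrated with the open-source `bert_score` package (`pip install bert-score`). This is only an illustration of the idea, not fmeval's internal code path: fmeval wraps BERTScore through its own helper-model machinery, and the default scoring model it loads may differ from the one `bert_score` picks for `lang="en"`.

```python
from bert_score import score

predictions = ["London is the capital of England."]
targets = ["England's capital is London."]

# P, R, F1 are torch tensors with one entry per (prediction, target) pair.
# Embedding-based matching rewards this paraphrase even though the
# word order differs, which word-overlap metrics would penalize.
P, R, F1 = score(predictions, targets, lang="en", verbose=False)
print(f"BERTScore F1: {F1[0].item():.3f}")
```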