Commit c14a978

Adding weighted WER feature.
1 parent d4586b2 commit c14a978

7 files changed: +100064 -5 lines changed

MANIFEST.in

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 include LICENSE README.md requirements.txt
 recursive-include libs *.*
+recursive-include texterrors/data *

README.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,7 @@ Features:
 - Metrics by group (for example speaker)
 - Comparing two hypothesis files to reference
 - Oracle WER
+- **NEW** Weighted WER (English only)
 - Sorting most common errors by frequency or count
 - Measuring performance on keywords
 - Measuring OOV-CER (see [https://arxiv.org/abs/2107.08091](https://arxiv.org/abs/2107.08091) )
@@ -89,6 +90,7 @@ This results in a WER of 83.3\% because of the extra insertion and deletion. And
 
 Recent changes:
 
+- 11.11.25 Weighted WER for English
 - 26.02.25 Faster alignment, better multihyp support, fixed multihyp bug.
 - 22.06.22 refactored internals to make them simpler, character aware alignment is off by default, added more explanations
 - 20.05.22 fixed bug missing regex dependency

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ termcolor
 Levenshtein
 regex
 pytest
+importlib_resources

setup.py

Lines changed: 4 additions & 2 deletions
@@ -4,7 +4,7 @@
 import setuptools
 import sys
 
-__version__ = "1.0.10"
+__version__ = "1.0.11"
 
 
 class get_pybind_include(object):
@@ -101,5 +101,7 @@ def get_requires():
     entry_points={'console_scripts': ['texterrors=texterrors.texterrors:cli']},
     install_requires=get_requires(),
     setup_requires=['pybind11'],
-    python_requires='>=3.6'
+    python_requires='>=3.6',
+    package_data={"texterrors": ["data/wordlist"]},
+    include_package_data=True,
 )
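
The `package_data` entry above ships `data/wordlist` inside the installed `texterrors` package, and `requirements.txt` gains `importlib_resources`, presumably to read that file at runtime. The loading code itself is not part of this excerpt, so the following is only a minimal sketch of how a packaged data file can be read. It assumes the standard-library `importlib.resources.files` API (Python 3.9+) rather than the `importlib_resources` backport, and `read_packaged_lines` is a hypothetical helper name, not a function from texterrors. The format of the word list is not shown in the commit; the sketch simply returns the file's lines.

```python
from importlib.resources import files


def read_packaged_lines(package: str, *parts: str) -> list[str]:
    """Return the lines of a data file shipped inside an installed package.

    Hypothetical helper for illustration; not part of texterrors.
    """
    resource = files(package)      # root of the package's resources
    for part in parts:             # descend one path component at a time
        resource = resource / part
    return resource.read_text(encoding="utf-8").splitlines()
```

Under these assumptions, a call such as `read_packaged_lines("texterrors", "data", "wordlist")` would return the contents of the file declared in `package_data`, provided the package is installed with that file included.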

tests/test_functions.py

Lines changed: 15 additions & 0 deletions
@@ -117,6 +117,21 @@ def test_oov_cer():
     assert err / cnt == 0., err / cnt
 
 
+def test_weighted_wer():
+    reflines = ['1 my name is john doe']
+    hyplines = ['1 my name is joe doe']
+    refs = create_inp(reflines)
+    hyps = create_inp(hyplines)
+    buffer = io.StringIO()
+    texterrors.process_output(refs, hyps, buffer, 'A', 'B',weighted_wer=True, skip_detailed=True)
+    output = buffer.getvalue()
+    ref ="""WER: 20.0 (ins 0, del 0, sub 1 / 5)
+SER: 100.0
+Weighted WER: 28.3
+"""
+    assert output == ref, show_diff(output, ref)
+
+
 def test_seq_distance():
     a, b = 'a b', 'a b'
     d = texterrors.seq_distance(StringVector(a.split()), StringVector(b.split()))
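
In `test_weighted_wer`, one substitution in five reference words gives a plain WER of 20.0, while the weighted figure is 28.3, so the misrecognized word ("john") evidently counts for more than an average word. The weighting scheme used by texterrors is not visible in this excerpt. The sketch below only illustrates the general idea of a weighted error rate: the summed weight of the reference words that were misrecognized, divided by the summed weight of all reference words. It assumes position-aligned sequences of equal length (substitutions only), and the weights are invented for illustration, so it does not reproduce the 28.3 in the test.

```python
def weighted_wer(ref_words, hyp_words, weights, default_weight=1.0):
    """Weighted error rate in percent for position-aligned word sequences.

    Illustrative only: handles substitutions, not insertions or deletions,
    and is not the formula used by texterrors.
    """
    if len(ref_words) != len(hyp_words):
        raise ValueError("this sketch requires equal-length, aligned sequences")
    # Total weight of the reference; unknown words fall back to default_weight.
    total = sum(weights.get(w, default_weight) for w in ref_words)
    if total == 0:
        raise ValueError("reference has zero total weight")
    # Weight of every reference word whose aligned hypothesis word differs.
    errors = sum(
        weights.get(r, default_weight)
        for r, h in zip(ref_words, hyp_words)
        if r != h
    )
    return 100.0 * errors / total


ref = "my name is john doe".split()
hyp = "my name is joe doe".split()

# Hypothetical weights: names count more than function words.
example_weights = {"my": 0.5, "name": 1.0, "is": 0.5, "john": 2.0, "doe": 2.0}

print(round(weighted_wer(ref, hyp, {}), 1))               # uniform weights: 20.0
print(round(weighted_wer(ref, hyp, example_weights), 1))  # 2.0 / 6.0: 33.3
```

With uniform weights the function reduces to the ordinary substitution rate, which is why the first call matches the 20.0 reported as WER in the test.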
