Commit b746124

Adding weighted WER feature.
1 parent d4586b2 commit b746124

8 files changed: 100075 additions, 7 deletions

.github/workflows/publish-to-pypi.yaml

Lines changed: 7 additions & 2 deletions
@@ -8,7 +8,7 @@ jobs:
     if: startsWith(github.ref, 'refs/tags')
     strategy:
       matrix:
-        os: [ubuntu-latest, ubuntu-24.04, macos-13, macos-14, windows-latest, macos-latest]
+        os: [ubuntu-latest, macos-13, macos-14, windows-latest]
 
     steps:
     - uses: actions/checkout@v4
@@ -51,7 +51,7 @@ jobs:
     strategy:
       matrix:
         python-version: ["3.10", "3.12"]
-        os: [ubuntu-latest, ubuntu-24.04, macos-13, macos-14, windows-latest, macos-latest]
+        os: [ubuntu-latest, macos-13, macos-14, windows-latest]
 
     steps:
     - uses: actions/checkout@v4
@@ -91,6 +91,11 @@ jobs:
         path: dist
         merge-multiple: true
 
+    - name: Check dist files
+      run: |
+        ls -lh dist/
+        file dist/*
+
     - uses: pypa/[email protected]
       with:
         user: __token__

MANIFEST.in

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 include LICENSE README.md requirements.txt
 recursive-include libs *.*
+recursive-include texterrors/data *

README.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,7 @@ Features:
 - Metrics by group (for example speaker)
 - Comparing two hypothesis files to reference
 - Oracle WER
+- **NEW** Weighted WER (English only)
 - Sorting most common errors by frequency or count
 - Measuring performance on keywords
 - Measuring OOV-CER (see [https://arxiv.org/abs/2107.08091](https://arxiv.org/abs/2107.08091) )
@@ -89,6 +90,7 @@ This results in a WER of 83.3\% because of the extra insertion and deletion. And
 
 Recent changes:
 
+- 11.11.25 Weighted WER for English
 - 26.02.25 Faster alignment, better multihyp support, fixed multihyp bug.
 - 22.06.22 refactored internals to make them simpler, character aware alignment is off by default, added more explanations
 - 20.05.22 fixed bug missing regex dependency

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ termcolor
 Levenshtein
 regex
 pytest
+importlib_resources

setup.py

Lines changed: 4 additions & 2 deletions
@@ -4,7 +4,7 @@
 import setuptools
 import sys
 
-__version__ = "1.0.10"
+__version__ = "1.0.12"
 
 
 class get_pybind_include(object):
@@ -101,5 +101,7 @@ def get_requires():
     entry_points={'console_scripts': ['texterrors=texterrors.texterrors:cli']},
     install_requires=get_requires(),
     setup_requires=['pybind11'],
-    python_requires='>=3.6'
+    python_requires='>=3.6',
+    package_data={"texterrors": ["data/wordlist"]},
+    include_package_data=True,
 )
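
Note on the packaging change: `package_data` together with the `MANIFEST.in` rule ships `texterrors/data/wordlist` inside the installed package, and the new `importlib_resources` requirement suggests the file is located at runtime through the resources API. The code that actually reads it is not part of the visible diff, so the following is only a minimal sketch under that assumption. The helper name `load_wordlist` and the line-based parsing are hypothetical, and the stdlib `importlib.resources` (Python 3.9+) stands in for the `importlib_resources` backport, which exposes the same `files()` call.

```python
from importlib.resources import files  # stdlib on Python 3.9+


def load_wordlist(package: str = "texterrors", resource: str = "data/wordlist") -> list:
    """Hypothetical helper: return the non-empty lines of a packaged text file.

    The real format of the wordlist is not shown in the commit, so lines are
    returned as raw strings without further parsing.
    """
    # files() resolves the installed package directory; joinpath() walks into it.
    text = files(package).joinpath(resource).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Resolving the file through the package rather than through a path relative to the working directory keeps the lookup valid after `pip install`, which is presumably why the data file was added to both `package_data` and `MANIFEST.in`.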

tests/test_functions.py

Lines changed: 15 additions & 0 deletions
@@ -117,6 +117,21 @@ def test_oov_cer():
     assert err / cnt == 0., err / cnt
 
 
+def test_weighted_wer():
+    reflines = ['1 my name is john doe']
+    hyplines = ['1 my name is joe doe']
+    refs = create_inp(reflines)
+    hyps = create_inp(hyplines)
+    buffer = io.StringIO()
+    texterrors.process_output(refs, hyps, buffer, 'A', 'B',weighted_wer=True, skip_detailed=True)
+    output = buffer.getvalue()
+    ref ="""WER: 20.0 (ins 0, del 0, sub 1 / 5)
+SER: 100.0
+Weighted WER: 23.1
+"""
+    assert output == ref, show_diff(output, ref)
+
+
 def test_seq_distance():
     a, b = 'a b', 'a b'
     d = texterrors.seq_distance(StringVector(a.split()), StringVector(b.split()))
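
Note on the expected numbers in `test_weighted_wer`: the plain WER of 20.0 follows directly from one substitution over five reference words. The weighting behind the 23.1 figure is not visible in this diff, so the sketch below does not try to reproduce it. It only illustrates the general idea of a weighted error rate with a hypothetical scheme in which each reference word counts by a per-word weight; the function `toy_weighted_wer` and the weights are made up for illustration and are not the library's method.

```python
def wer(n_ins: int, n_del: int, n_sub: int, ref_len: int) -> float:
    """Standard WER in percent: edit errors divided by reference length."""
    return 100.0 * (n_ins + n_del + n_sub) / ref_len


def toy_weighted_wer(ref_words, hyp_words, weights, default_weight=1.0):
    """Hypothetical weighting: each reference word counts by its weight.

    Simplification: ref and hyp are assumed to be aligned one-to-one
    (substitutions only), so insertions and deletions are not handled.
    """
    if len(ref_words) != len(hyp_words):
        raise ValueError("this sketch only handles equal-length, pre-aligned inputs")
    total = sum(weights.get(w, default_weight) for w in ref_words)
    wrong = sum(
        weights.get(r, default_weight)
        for r, h in zip(ref_words, hyp_words)
        if r != h
    )
    return 100.0 * wrong / total


ref = "my name is john doe".split()
hyp = "my name is joe doe".split()

print(round(wer(0, 0, 1, len(ref)), 1))  # 20.0, matching the test's WER line

made_up_weights = {"john": 2.0}  # illustrative only, not taken from the wordlist
print(round(toy_weighted_wer(ref, hyp, made_up_weights), 1))  # 33.3
```

With all weights equal the toy metric reduces to the plain WER, which is a useful sanity check for any weighting scheme of this kind.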
