Description
Describe the bug
When lemmatizing certain Finnish expressions, PyTorch emits a UserWarning about the deprecated __floordiv__ operation. Lemmatization still works, and the warning is shown only once per process/session.
This appears to be quite rare: only certain combinations of words trigger it. When processing a large file in Finnish, however, it is eventually triggered. I have also lemmatized long documents in Swedish and English in the same way and never saw this warning with those languages.
To Reproduce
This code will trigger the warning for me:
import stanza
nlp = stanza.Pipeline(lang='fi', processors='tokenize,mwt,pos,lemma')
doc = nlp("ettei se")
Output:
2022-01-14 13:39:50 INFO: Loading these models for language: fi (Finnish):
=======================
| Processor | Package |
-----------------------
| tokenize | tdt |
| mwt | tdt |
| pos | tdt |
| lemma | tdt |
=======================
2022-01-14 13:39:50 INFO: Use device: cpu
2022-01-14 13:39:50 INFO: Loading: tokenize
2022-01-14 13:39:50 INFO: Loading: mwt
2022-01-14 13:39:50 INFO: Loading: pos
2022-01-14 13:39:51 INFO: Loading: lemma
2022-01-14 13:39:51 INFO: Done loading processors!
[REDACTED]/lib/python3.8/site-packages/stanza/models/common/beam.py:86: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
prevK = bestScoresId // numWords
Expected behavior
Expected no UserWarning.
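Until this is fixed, the warning can probably be hidden on the caller's side with the standard library `warnings` module. The sketch below is only a workaround under assumptions: the helper name `silence_floordiv_warning` and the constant `FLOORDIV_PATTERN` are made up for illustration, and the pattern is taken from the start of the message quoted in the output above.

```python
import warnings

# Prefix of the warning text shown in the output above. filterwarnings()
# treats this as a regular expression matched against the start of the message.
FLOORDIV_PATTERN = r"__floordiv__ is deprecated"


def silence_floordiv_warning() -> None:
    """Ignore only the __floordiv__ deprecation UserWarning (hypothetical helper)."""
    warnings.filterwarnings(
        "ignore",
        message=FLOORDIV_PATTERN,
        category=UserWarning,
    )


# Call this once, before running the Stanza pipeline.
silence_floordiv_warning()
```

This only hides the message; the underlying `//` call in beam.py is unchanged, so it does not replace a proper fix.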
Environment
- OS: Ubuntu 20.04
- Python version: 3.8.10 from Ubuntu system package 3.8.10-0ubuntu1~20.04.2
- Stanza version: 1.3.0 (installed from PyPI in a virtual environment)
- PyTorch version: 1.10.1 (installed from PyPI in a virtual environment)
Additional context
According to the warning message, the problem seems to be this line (stanza/stanza/models/common/beam.py, line 86 in e44d1c8):
prevK = bestScoresId // numWords
Here is a PR fixing the same warning in another codebase: NVIDIA/MinkowskiEngine#407
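To illustrate why either rounding mode named in the warning should be safe here, the following is a small pure-Python sketch. It assumes (I have not verified this against the full beam.py source) that `bestScoresId` is a non-negative flat index into a beam-by-vocabulary grid and that `numWords` is positive. The helpers `trunc_div` and `split_flat_index` are hypothetical names for this example only.

```python
from typing import Tuple


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero (the 'trunc' mode in the warning)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def split_flat_index(flat_index: int, num_words: int) -> Tuple[int, int]:
    """Split a flat index into (beam row, word id), assuming flat_index >= 0."""
    prev_k = flat_index // num_words          # same role as prevK in the warning
    word_id = flat_index - prev_k * num_words
    return prev_k, word_id


# Truncation and floor division differ only for negative quotients ...
print(trunc_div(-7, 2), -7 // 2)

# ... and agree for every non-negative index, e.g. a 3 x 5 grid.
print(all(trunc_div(i, 5) == i // 5 for i in range(15)))

print(split_flat_index(13, 5))
```

If that assumption holds, replacing the `//` on that line with `torch.div(bestScoresId, numWords, rounding_mode='trunc')`, as the warning text itself suggests, should keep the current behavior while removing the warning.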