Commit 6ba1ffb

Author: Yashwant Bezawada
Fix model_input_names singleton issue causing shared state
Fixes huggingface#42024

The model_input_names attribute was defined as a class-level list, and when tokenizer instances were initialized, they all pointed to the same list object. This meant that modifying model_input_names on one instance would affect all other instances.

The issue was in tokenization_utils_base.py line 1417:

```python
self.model_input_names = kwargs.pop("model_input_names", self.model_input_names)
```

When no model_input_names is passed in kwargs, the default is the class attribute itself (self.model_input_names), so the instance holds a reference to the shared list instead of a new list of its own.

Fixed by wrapping it in list() to ensure each instance gets its own copy:

```python
self.model_input_names = list(kwargs.pop("model_input_names", self.model_input_names))
```

This is a standard pattern for handling mutable default values in Python.
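The aliasing described above can be reproduced without transformers. The following is a minimal sketch, assuming two hypothetical toy classes (BuggyTokenizer and FixedTokenizer) and illustrative list values; it is not the library's actual class, only the same assignment pattern before and after the change.

```python
class BuggyTokenizer:
    # Class-level mutable default (illustrative values, not the library's).
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(self, **kwargs):
        # With no kwarg, the fallback is the class attribute itself,
        # so the instance attribute aliases the shared list.
        self.model_input_names = kwargs.pop("model_input_names", self.model_input_names)


class FixedTokenizer:
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(self, **kwargs):
        # list() makes a shallow copy, so each instance owns its list.
        self.model_input_names = list(kwargs.pop("model_input_names", self.model_input_names))


a, b = BuggyTokenizer(), BuggyTokenizer()
a.model_input_names.append("token_type_ids")
# The append on `a` is visible through `b` and through the class.
buggy_leak = "token_type_ids" in b.model_input_names

c, d = FixedTokenizer(), FixedTokenizer()
c.model_input_names.append("token_type_ids")
# The append on `c` stays local to `c`.
fixed_leak = "token_type_ids" in d.model_input_names

print(buggy_leak, fixed_leak)
```

In the buggy variant the append also changes the class attribute, so instances created later inherit the modified list as well.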
1 parent 8fb854c commit 6ba1ffb

1 file changed: +1 -1 lines changed


src/transformers/tokenization_utils_base.py

Lines changed: 1 addition & 1 deletion
@@ -1414,7 +1414,7 @@ def __init__(self, **kwargs):
                 f"Truncation side should be selected between 'right' and 'left', current value: {self.truncation_side}"
             )
 
-        self.model_input_names = kwargs.pop("model_input_names", self.model_input_names)
+        self.model_input_names = list(kwargs.pop("model_input_names", self.model_input_names))
 
         # By default, cleaning tokenization spaces for both fast and slow tokenizers
         self.clean_up_tokenization_spaces = kwargs.pop("clean_up_tokenization_spaces", False)
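Because list() wraps the result of kwargs.pop() rather than only the default, the changed line also copies a list that the caller passes in and turns other iterables, such as tuples, into lists. The sketch below illustrates this with a hypothetical ToyTokenizer class that reuses only the changed assignment; the class and the values are assumptions for illustration.

```python
class ToyTokenizer:
    # Hypothetical stand-in that reuses only the changed assignment.
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(self, **kwargs):
        self.model_input_names = list(kwargs.pop("model_input_names", self.model_input_names))


# A caller-supplied list is copied, so later mutation of the instance
# attribute does not reach the caller's object.
names = ["input_ids"]
tok = ToyTokenizer(model_input_names=names)
tok.model_input_names.append("attention_mask")

# A tuple is accepted and stored as a list.
tup = ToyTokenizer(model_input_names=("input_ids", "token_type_ids"))

print(names, tok.model_input_names, type(tup.model_input_names).__name__)
```

One consequence of this pattern, under plain Python semantics, is that explicitly passing model_input_names=None would raise a TypeError inside list(), and passing a bare string would be split into characters.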
