@wirthual
Collaborator

POC to see if bettertransformer can simply be used with transformers>4.49

Extracted code from optimum repo to here: https://github.com/wirthual/better_transformer

Had to update some dependencies to make it work.

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR introduces a workaround to restore BetterTransformer functionality in the infinity_emb library by replacing the official optimum package dependency with a custom fork. The changes involve two main modifications:

  1. Import Source Change: The acceleration.py file now imports BetterTransformer and BetterTransformerManager from a custom better_transformer package instead of the official optimum.bettertransformer package. This change maintains the same API interface, so existing code using these classes continues to work without modification.

  2. Dependency Updates: The pyproject.toml file adds a new git dependency pointing to https://github.com/wirthual/better_transformer.git, which contains extracted code from the optimum repository. Additionally, the minimum transformers version is bumped from 4.47.0 to 4.49.0, and colpali-engine is updated to ^0.3.12.

This change addresses a compatibility issue where the official optimum package's BetterTransformer implementation likely has version checks that prevent it from working with transformers>=4.49.0. The acceleration functionality in infinity_emb relies on BetterTransformer to optimize model performance, particularly for faster inference. By using the custom fork, the codebase can support newer transformers versions while maintaining these critical performance optimizations.

The integration fits into the existing architecture seamlessly since the API remains identical - the check_if_bettertransformer_possible() and to_bettertransformer() functions continue to work with the same logic flow and error handling patterns.
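The import swap described in this summary could also be written defensively, preferring the extracted package and falling back to the original location. This is a minimal sketch, not the code in the PR: it assumes the extracted package is importable as `better_transformer` (inferred from the repository name) and exposes the same two class names, and `load_bettertransformer` is a hypothetical helper that does not exist in infinity_emb.

```python
import importlib
from typing import Any, Optional, Sequence, Tuple

# Candidate modules in order of preference. "better_transformer" is the
# ASSUMED import name of the extracted package; "optimum.bettertransformer"
# is the original location.
CANDIDATE_MODULES = ("better_transformer", "optimum.bettertransformer")


def load_bettertransformer(
    candidates: Sequence[str] = CANDIDATE_MODULES,
) -> Optional[Tuple[Any, Any]]:
    """Return (BetterTransformer, BetterTransformerManager), or None if unavailable."""
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return module.BetterTransformer, module.BetterTransformerManager
        except (ImportError, AttributeError):
            continue  # module missing or incomplete: try the next candidate
    return None
```

Callers would treat a `None` result the same way as a disabled BetterTransformer path.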

Confidence score: 2/5

  • This PR introduces significant risks by depending on an unofficial, unvetted fork that could introduce security vulnerabilities or unexpected behavior
  • Score reflects concerns about using external git dependencies from unknown maintainers and potential breaking changes in the custom fork
  • Pay close attention to the new git dependency and consider security implications of the custom fork

2 files reviewed, 2 comments

from infinity_emb._optional_imports import CHECK_OPTIMUM, CHECK_TORCH, CHECK_TRANSFORMERS
from infinity_emb.primitives import Device

if CHECK_OPTIMUM.is_available:
Contributor

logic: CHECK_OPTIMUM.is_available may not correctly detect the new better_transformer package since it checks for 'optimum' module availability.
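If `CHECK_OPTIMUM` only probes for the `optimum` module, as this comment suggests, a separate probe for the new package may be needed. A rough sketch of such a probe using the standard library follows; `is_module_available` is a hypothetical helper, and `better_transformer` is an assumed import name, not something confirmed in this thread.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if `module_name` can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package is
        # missing, and ValueError when a loaded module has no __spec__.
        return False


# "better_transformer" is the ASSUMED import name of the extracted package.
BETTER_TRANSFORMER_AVAILABLE = is_module_available("better_transformer")
```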

tensorrt = {version = "^10.6.0", optional=true}
soundfile = {version="^0.12.1", optional=true}

better-transformer = {git = "https://github.com/wirthual/better_transformer.git"}
Contributor

style: Git dependency from external fork creates supply chain risk and version unpredictability. Consider pinning to specific commit hash.
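Poetry git dependencies accept a `rev` or `tag` key next to `git`, so pinning means adding something like `rev = "<commit hash>"` to the entry. As a sketch of how the missing pin could be caught automatically, the hypothetical helper below flags git dependencies that set neither key. It operates on the dictionary form of the dependency table rather than parsing pyproject.toml itself.

```python
from typing import Any, Dict, List


def unpinned_git_dependencies(dependencies: Dict[str, Any]) -> List[str]:
    """Return names of git dependencies that specify neither `rev` nor `tag`."""
    unpinned = []
    for name, spec in dependencies.items():
        if isinstance(spec, dict) and "git" in spec:
            if "rev" not in spec and "tag" not in spec:
                unpinned.append(name)
    return unpinned


# Dictionary form of the pyproject.toml entries shown above.
deps = {
    "tensorrt": {"version": "^10.6.0", "optional": True},
    "soundfile": {"version": "^0.12.1", "optional": True},
    "better-transformer": {"git": "https://github.com/wirthual/better_transformer.git"},
}
print(unpinned_git_dependencies(deps))  # ['better-transformer']
```

A `branch` key is deliberately not treated as a pin here, because a branch can move to a different commit over time.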

@michaelfeil
Owner

Nice! I think it might break with newer versions of transformers; we should check.

@codecov-commenter
codecov-commenter commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 79.51%. Comparing base (c030718) to head (30c14a7).

Files with missing lines Patch % Lines
...y_emb/infinity_emb/transformer/classifier/torch.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #641      +/-   ##
==========================================
- Coverage   79.91%   79.51%   -0.41%     
==========================================
  Files          43       43              
  Lines        3495     3495              
==========================================
- Hits         2793     2779      -14     
- Misses        702      716      +14     

@michaelfeil
Owner

Does this work?

@michaelfeil
Owner

Also: is it compatible with newer versions?

@wirthual
Collaborator Author

wirthual commented Sep 6, 2025

Ran it with transformers > 4.49 on my local setup. Did not see the expected speedup. Need to investigate further.

@wirthual
Collaborator Author

wirthual commented Sep 9, 2025

Running the benchmark on this branch with a GPU on my local machine, I got the following results:

336 s with bettertransformer vs. 529 s without

transformers version: 4.53.3

Details

Without bettertransformer:

poetry run infinity_emb v2  --model-id "BAAI/bge-small-en-v1.5"  --port 7997 --log-level debug --no-model-warmup --device cuda --no-bettertransformer
make benchmark_embed
ab -n 50 -c 50 -l -s 480 \
-T 'application/json' \
-p tests/data/benchmark/benchmark_embed.json \
http://127.0.0.1:7997/embeddings
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient).....done


Server Software:        uvicorn
Server Hostname:        127.0.0.1
Server Port:            7997

Document Path:          /embeddings
Document Length:        Variable

Concurrency Level:      50
Time taken for tests:   529.129 seconds
Complete requests:      50
Failed requests:        0
Total transferred:      103914000 bytes
Total body sent:        35982600
HTML transferred:       103907500 bytes
Requests per second:    0.09 [#/sec] (mean)
Time per request:       529128.703 [ms] (mean)
Time per request:       10582.574 [ms] (mean, across all concurrent requests)
Transfer rate:          191.78 [Kbytes/sec] received
                        66.41 kb/s sent
                        258.19 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    3   1.0      3       5
Processing: 10292 259725 154566.9 264434  518838
Waiting:    10285 259723 154567.2 264432  518836
Total:      10292 259728 154568.0 264437  518842

Percentage of the requests served within a certain time (ms)
  50%  264437
  66%  349699
  75%  392333
  80%  424398
  90%  477934
  95%  499389
  98%  518842
  99%  518842
 100%  518842 (longest request)

With bettertransformer:

poetrylatest run infinity_emb v2  --model-id "BAAI/bge-small-en-v1.5"  --port 7997 --log-level debug --no-model-warmup --device cuda --bettertransformer
make benchmark_embed
ab -n 50 -c 50 -l -s 480 \
-T 'application/json' \
-p tests/data/benchmark/benchmark_embed.json \
http://127.0.0.1:7997/embeddings
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient).....done


Server Software:        uvicorn
Server Hostname:        127.0.0.1
Server Port:            7997

Document Path:          /embeddings
Document Length:        Variable

Concurrency Level:      50
Time taken for tests:   335.988 seconds
Complete requests:      50
Failed requests:        0
Total transferred:      103942750 bytes
Total body sent:        35982600
HTML transferred:       103936250 bytes
Requests per second:    0.15 [#/sec] (mean)
Time per request:       335987.780 [ms] (mean)
Time per request:       6719.756 [ms] (mean, across all concurrent requests)
Transfer rate:          302.11 [Kbytes/sec] received
                        104.58 kb/s sent
                        406.70 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    4   1.6      3       6
Processing:  5406 156561 101251.8 156997  330329
Waiting:     5405 156559 101251.5 156996  330328
Total:       5407 156564 101253.4 157001  330335

Percentage of the requests served within a certain time (ms)
  50%  157001
  66%  214982
  75%  244064
  80%  265959
  90%  302483
  95%  317117
  98%  330335
  99%  330335
 100%  330335 (longest request)
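
For reference, the relative improvement implied by the two "Time taken for tests" values can be worked out directly. This is only arithmetic on the numbers reported above; each configuration was run once with 50 requests, so the result is indicative rather than a rigorous measurement.

```python
# "Time taken for tests" values copied from the two ApacheBench runs above.
seconds_without = 529.129  # --no-bettertransformer
seconds_with = 335.988     # --bettertransformer

speedup = seconds_without / seconds_with
time_saved = 1 - seconds_with / seconds_without

print(f"speedup: {speedup:.2f}x")       # speedup: 1.57x
print(f"time saved: {time_saved:.1%}")  # time saved: 36.5%
```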

The following package versions were used:

accelerate                        1.10.1            Accelerate
aiohappyeyeballs                  2.4.3             Happy Eyeballs for asyncio
aiohttp                           3.10.10           Async http client/server framework (asyncio)
aiosignal                         1.3.1             aiosignal: a list of registered asynchronous callbacks
annotated-types                   0.7.0             Reusable constraint types to use with typing.Annotated
anyio                             4.6.2.post1       High level compatibility layer for multiple asynchronous event loop implementations
asgi-lifespan                     2.1.0             Programmatic startup/shutdown of ASGI apps.
attrs                             24.2.0            Classes Without Boilerplate
babel                             2.16.0            Internationalization utilities
backoff                           2.2.1             Function decoration for backoff and retry
beautifulsoup4                    4.12.3            Screen-scraping library
better-transformer                0.1.0 aaa4a18     
black                             24.10.0           The uncompromising code formatter.
certifi                           2024.8.30         Python package for providing Mozilla's CA Bundle.
cffi                              1.17.1            Foreign Function Interface for Python calling C code.
charset-normalizer                3.4.0             The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
click                             8.1.7             Composable command line interface toolkit
codespell                         2.3.0             Codespell
colorama                          0.4.6             Cross-platform colored terminal text.
coloredlogs                       15.0.1            Colored terminal output for Python's logging module
colpali-engine                    0.3.12            The code used to train and run inference with the ColPali architecture.
coverage                          7.6.3             Code coverage measurement for Python
cryptography                      43.0.3            cryptography is a package which provides cryptographic recipes and primitives to Python developers.
ctranslate2                       4.4.0             Fast inference engine for Transformer models
datasets                          2.14.4            HuggingFace community-driven open-source library of datasets
dill                              0.3.7             serialize all of Python
diskcache                         5.6.3             Disk Cache -- Disk and file backed persistent cache.
distro                            1.9.0             Distro - an OS platform information API
einops                            0.8.0             A new flavour of deep learning operations
fastapi                           0.115.2           FastAPI framework, high performance, easy to learn, fast to code, ready for production
filelock                          3.16.1            A platform independent file lock.
flatbuffers                       24.3.25           The FlatBuffers serialization format for Python
frozenlist                        1.4.1             A list-like structure which implements collections.abc.MutableSequence
fsspec                            2024.9.0          File-system specification
ghp-import                        2.1.0             Copy your docs directly to the gh-pages branch.
h11                               0.14.0            A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
hf-xet                            1.1.7             Fast transfer of large files with the Hugging Face Hub.
httpcore                          1.0.6             A minimal low-level HTTP client.
httptools                         0.6.4             A collection of framework independent HTTP protocol utils.
httpx                             0.27.2            The next generation HTTP client.
huggingface-hub                   0.34.4            Client library to download and publish models, datasets and other repos on the huggingface.co hub
humanfriendly                     10.0              Human friendly output for text interfaces using Python
idna                              3.10              Internationalized Domain Names in Applications (IDNA)
importlib-metadata                8.5.0             Read metadata from Python packages
importlib-resources               6.4.5             Read resources from Python packages
iniconfig                         2.0.0             brain-dead simple config-ini parsing
jinja2                            3.1.4             A very fast and expressive template engine.
jinja2-cli                        0.8.2             A CLI interface to Jinja2
jiter                             0.6.1             Fast iterable JSON parser.
joblib                            1.4.2             Lightweight pipelining with Python functions
markdown                          3.7               Python implementation of John Gruber's Markdown.
markdown-it-py                    3.0.0             Python port of markdown-it. Markdown parsing, done right!
markupsafe                        3.0.2             Safely add untrusted strings to HTML/XML markup.
mdurl                             0.1.2             Markdown URL utilities
mergedeep                         1.3.4             A deep merge function for 🐍.
mike                              2.1.3             Manage multiple versions of your MkDocs-powered documentation
mkdocs                            1.6.1             Project documentation with Markdown.
mkdocs-get-deps                   0.2.0             MkDocs extension that lists all dependencies according to a mkdocs.yml file
mkdocs-material                   9.5.41            Documentation that simply works
mkdocs-material-extensions        1.3.1             Extension pack for Python Markdown and MkDocs Material.
mkdocs-swagger-ui-tag             0.6.10            A MkDocs plugin supports for add Swagger UI in page.
monotonic                         1.6               An implementation of time.monotonic() for Python 2 & < 3.3
mpmath                            1.3.0             Python library for arbitrary-precision floating-point arithmetic
multidict                         6.1.0             multidict implementation
multiprocess                      0.70.15           better multiprocessing and multithreading in Python
mypy                              1.12.0            Optional static typing for Python
mypy-extensions                   1.0.0             Type system extensions for programs checked with the mypy type checker.
mypy-protobuf                     3.6.0             Generate mypy stub files from protobuf specs
networkx                          3.2.1             Python package for creating and manipulating graphs and networks
numpy                             1.26.4            Fundamental package for array computing in Python
nvidia-cublas-cu12                12.6.4.1          CUBLAS native runtime libraries
nvidia-cuda-cupti-cu12            12.6.80           CUDA profiling tools runtime libs.
nvidia-cuda-nvrtc-cu12            12.6.77           NVRTC native runtime libraries
nvidia-cuda-runtime-cu12          12.6.77           CUDA Runtime native Libraries
nvidia-cudnn-cu12                 9.5.1.17          cuDNN runtime libraries
nvidia-cufft-cu12                 11.3.0.4          CUFFT native runtime libraries
nvidia-cufile-cu12                1.11.1.6          cuFile GPUDirect libraries
nvidia-curand-cu12                10.3.7.77         CURAND native runtime libraries
nvidia-cusolver-cu12              11.7.1.2          CUDA solver native runtime libraries
nvidia-cusparse-cu12              12.5.4.2          CUSPARSE native runtime libraries
nvidia-cusparselt-cu12            0.6.3             NVIDIA cuSPARSELt
nvidia-nccl-cu12                  2.26.2            NVIDIA Collective Communication Library (NCCL) Runtime
nvidia-nvjitlink-cu12             12.6.85           Nvidia JIT LTO Library
nvidia-nvtx-cu12                  12.6.77           NVIDIA Tools Extension
onnx                              1.17.0            Open Neural Network Exchange
onnxruntime                       1.19.2            ONNX Runtime is a runtime accelerator for Machine Learning models
onnxruntime-gpu                   1.19.2            ONNX Runtime is a runtime accelerator for Machine Learning models
openai                            1.52.0            The official Python library for the openai API
optimum                           1.27.0            Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party librar...
orjson                            3.10.7            Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
outcome                           1.3.0.post0       Capture the outcome of Python function calls.
packaging                         24.1              Core utilities for Python packages
paginate                          0.5.7             Divides large result sets into pages for easier browsing
pandas                            2.2.3             Powerful data structures for data analysis, time series, and statistics
pathspec                          0.12.1            Utility library for gitignore style pattern matching of file paths.
peft                              0.14.0            Parameter-Efficient Fine-Tuning (PEFT)
pillow                            10.4.0            Python Imaging Library (Fork)
platformdirs                      4.3.6             A small Python package for determining appropriate platform-specific dirs, e.g. a `user data dir`.
pluggy                            1.5.0             plugin and hook calling mechanisms for python
posthog                           3.7.0             Integrate PostHog into any python application.
prometheus-client                 0.21.0            Python client for the Prometheus monitoring system.
prometheus-fastapi-instrumentator 7.0.0             Instrument your FastAPI with Prometheus metrics.
propcache                         0.2.0             Accelerated property cache
protobuf                          5.28.2            
psutil                            6.1.0             Cross-platform lib for process and system monitoring in Python.
pyarrow                           17.0.0            Python library for Apache Arrow
pycparser                         2.22              C parser in Python
pydantic                          2.9.2             Data validation using Python type hints
pydantic-core                     2.23.4            Core functionality for Pydantic validation and serialization
pygments                          2.18.0            Pygments is a syntax highlighting package written in Python.
pymdown-extensions                10.11.2           Extension pack for Python Markdown.
pyparsing                         3.2.0             pyparsing module - Classes and methods to define and execute parsing grammars
pytest                            8.3.3             pytest: simple powerful testing with Python
pytest-mock                       3.14.0            Thin-wrapper around the mock package for easier use with pytest
python-dateutil                   2.9.0.post0       Extensions to the standard Python datetime module
python-dotenv                     1.0.1             Read key-value pairs from a .env file and set them as environment variables
pytz                              2024.2            World timezone definitions, modern and historical
pyyaml                            6.0.2             YAML parser and emitter for Python
pyyaml-env-tag                    0.1               A custom YAML tag for referencing environment variables in YAML files. 
regex                             2024.9.11         Alternative regular expression module, to replace re.
requests                          2.32.3            Python HTTP for Humans.
rich                              13.9.2            Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal
ruff                              0.7.0             An extremely fast Python linter and code formatter, written in Rust.
safetensors                       0.4.5             
scikit-learn                      1.5.2             A set of python modules for machine learning and data mining
scipy                             1.13.1            Fundamental algorithms for scientific computing in Python
sentence-transformers             3.3.1             State-of-the-Art Text Embeddings
sentencepiece                     0.2.0             SentencePiece python wrapper
setuptools                        75.2.0            Easily download, build, install, upgrade, and uninstall Python packages
shellingham                       1.5.4             Tool to Detect Surrounding Shell
six                               1.16.0            Python 2 and 3 compatibility utilities
sniffio                           1.3.1             Sniff out which async library your code is running under
sortedcontainers                  2.4.0             Sorted Containers -- Sorted List, Sorted Dict, Sorted Set
soundfile                         0.12.1            An audio library based on libsndfile, CFFI and NumPy
soupsieve                         2.6               A modern CSS selector implementation for Beautiful Soup.
starlette                         0.40.0            The little ASGI library that shines.
sympy                             1.14.0            Computer algebra system (CAS) in Python
tensorrt                          10.6.0            TensorRT Metapackage
tensorrt-cu12                     10.6.0            A high performance deep learning inference library
threadpoolctl                     3.5.0             threadpoolctl
timm                              1.0.11            PyTorch Image Models
tokenizers                        0.21.1            
torch                             2.7.0             Tensors and Dynamic neural networks in Python with strong GPU acceleration
torchvision                       0.22.0            image and video datasets and models for torch deep learning
tqdm                              4.66.5            Fast, Extensible Progress Meter
transformers                      4.53.3            State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
trio                              0.27.0            A friendly Python library for async concurrency and I/O
triton                            3.3.0             A language and compiler for custom Deep Learning operations
typer                             0.12.5            Typer, build great CLIs. Easy to code. Based on Python type hints.
types-cffi                        1.16.0.20240331   Typing stubs for cffi
types-chardet                     5.0.4.6           Typing stubs for chardet
types-protobuf                    5.28.0.20240924   Typing stubs for protobuf
types-pyopenssl                   24.1.0.20240722   Typing stubs for pyOpenSSL
types-pytz                        2023.4.0.20240130 Typing stubs for pytz
types-redis                       4.6.0.20241004    Typing stubs for redis
types-requests                    2.28.1            Typing stubs for requests
types-setuptools                  75.2.0.20241019   Typing stubs for setuptools
types-toml                        0.10.8.20240310   Typing stubs for toml
types-urllib3                     1.26.25.14        Typing stubs for urllib3
typing-extensions                 4.12.2            Backported and Experimental Type Hints for Python 3.8+
tzdata                            2024.2            Provider of IANA time zone data
urllib3                           2.2.3             HTTP library with thread-safe connection pooling, file post, and more.
uvicorn                           0.32.0            The lightning-fast ASGI server.
uvloop                            0.21.0            Fast implementation of asyncio event loop on top of libuv
verspec                           0.1.0             Flexible version handling
watchdog                          5.0.3             Filesystem events monitoring
watchfiles                        0.24.0            Simple, modern and high performance file watching and code reload in python.
websockets                        13.1              An implementation of the WebSocket Protocol (RFC 6455 & 7692)
xxhash                            3.5.0             Python binding for xxHash
yarl                              1.15.5            Yet another URL library
zipp                              3.20.2            Backport of pathlib-compatible object wrapper for zip files
