WIP: add back better_transformer without version check #641
base: main
Conversation
Greptile Summary
This PR introduces a workaround to restore BetterTransformer functionality in the infinity_emb library by replacing the official optimum package dependency with a custom fork. The changes involve two main modifications:
- Import source change: acceleration.py now imports BetterTransformer and BetterTransformerManager from a custom better_transformer package instead of the official optimum.bettertransformer package. The API interface stays the same, so existing code using these classes continues to work without modification (a sketch follows this list).
- Dependency updates: pyproject.toml adds a new git dependency pointing to https://github.com/wirthual/better_transformer.git, which contains code extracted from the optimum repository. Additionally, the minimum transformers version is bumped from 4.47.0 to 4.49.0, and colpali-engine is updated to ^0.3.12.
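For illustration, the swapped import in acceleration.py might look roughly like the sketch below; the try/except fallback to the official optimum package is an assumption for this sketch, not something taken from the diff.

# Hypothetical sketch of the import swap in acceleration.py.
# The fallback to optimum.bettertransformer is illustrative only.
try:
    from better_transformer import BetterTransformer, BetterTransformerManager
except ImportError:
    from optimum.bettertransformer import BetterTransformer, BetterTransformerManager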
This change addresses a compatibility issue where the official optimum package's BetterTransformer implementation likely has version checks that prevent it from working with transformers>=4.49.0. The acceleration functionality in infinity_emb relies on BetterTransformer to optimize model performance, particularly for faster inference. By using the custom fork, the codebase can support newer transformers versions while maintaining these critical performance optimizations.
The integration fits into the existing architecture seamlessly since the API remains identical - the check_if_bettertransformer_possible() and to_bettertransformer() functions continue to work with the same logic flow and error handling patterns.
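As a rough, hypothetical outline of that flow (argument names and the exact checks are assumptions; the real logic lives in acceleration.py):

from better_transformer import BetterTransformer  # forked package used by this PR

def check_if_bettertransformer_possible(engine_args) -> bool:
    # Illustrative: only attempt conversion when the engine was configured for it.
    return bool(getattr(engine_args, "bettertransformer", False))

def to_bettertransformer(model, engine_args, logger):
    # Illustrative: best-effort conversion that keeps the original model on failure.
    if not check_if_bettertransformer_possible(engine_args):
        return model
    try:
        model = BetterTransformer.transform(model)
    except Exception as exc:
        logger.warning("BetterTransformer conversion failed, using the plain model: %s", exc)
    return model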
Confidence score: 2/5
- This PR introduces significant risks by depending on an unofficial, unvetted fork that could introduce security vulnerabilities or unexpected behavior
- Score reflects concerns about using external git dependencies from unknown maintainers and potential breaking changes in the custom fork
- Pay close attention to the new git dependency and consider security implications of the custom fork
2 files reviewed, 2 comments
from infinity_emb._optional_imports import CHECK_OPTIMUM, CHECK_TORCH, CHECK_TRANSFORMERS
from infinity_emb.primitives import Device


if CHECK_OPTIMUM.is_available:
logic: CHECK_OPTIMUM.is_available may not correctly detect the new better_transformer package since it checks for 'optimum' module availability.
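One way to address this would be a guard keyed to the new package; the sketch below is an assumption (the name CHECK_BETTER_TRANSFORMER and the OptionalImport shape are hypothetical, not infinity_emb's actual API):

# Hypothetical sketch: detect the forked package instead of (or in addition to) optimum.
import importlib.util

class OptionalImport:
    def __init__(self, module_name: str):
        self.module_name = module_name

    @property
    def is_available(self) -> bool:
        # True only if the named module can actually be found.
        return importlib.util.find_spec(self.module_name) is not None

CHECK_BETTER_TRANSFORMER = OptionalImport("better_transformer")

if CHECK_BETTER_TRANSFORMER.is_available:
    from better_transformer import BetterTransformer, BetterTransformerManager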
libs/infinity_emb/pyproject.toml
Outdated
tensorrt = {version = "^10.6.0", optional=true}
soundfile = {version="^0.12.1", optional=true}


better-transformer = {git = "https://github.com/wirthual/better_transformer.git"}
style: Git dependency from external fork creates supply chain risk and version unpredictability. Consider pinning to specific commit hash.
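For illustration, a pinned form of the dependency could look like this in pyproject.toml; the revision shown is the short hash reported for better-transformer 0.1.0 in the benchmark environment below (aaa4a18) and is only an example, not a vetted pin.

# Illustrative only: pin the fork to an explicit commit for reproducible builds.
better-transformer = {git = "https://github.com/wirthual/better_transformer.git", rev = "aaa4a18"}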
nice! I think it might break with newer versions of transformers, we should check.
Codecov Report
❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #641      +/-   ##
==========================================
- Coverage   79.91%   79.51%    -0.41%
==========================================
  Files          43       43
  Lines        3495     3495
==========================================
- Hits         2793     2779       -14
- Misses        702      716       +14
Does this work?

Also: is it compatible with newer versions?

Ran it with
Running the benchmark of this branch using GPU on my local machine I got the following results:
transformers version:

Without bettertransformer:
make benchmark_embed
ab -n 50 -c 50 -l -s 480 \
-T 'application/json' \
-p tests/data/benchmark/benchmark_embed.json \
http://127.0.0.1:7997/embeddings
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: uvicorn
Server Hostname: 127.0.0.1
Server Port: 7997
Document Path: /embeddings
Document Length: Variable
Concurrency Level: 50
Time taken for tests: 529.129 seconds
Complete requests: 50
Failed requests: 0
Total transferred: 103914000 bytes
Total body sent: 35982600
HTML transferred: 103907500 bytes
Requests per second: 0.09 [#/sec] (mean)
Time per request: 529128.703 [ms] (mean)
Time per request: 10582.574 [ms] (mean, across all concurrent requests)
Transfer rate: 191.78 [Kbytes/sec] received
66.41 kb/s sent
258.19 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 3 1.0 3 5
Processing: 10292 259725 154566.9 264434 518838
Waiting: 10285 259723 154567.2 264432 518836
Total: 10292 259728 154568.0 264437 518842
Percentage of the requests served within a certain time (ms)
50% 264437
66% 349699
75% 392333
80% 424398
90% 477934
95% 499389
98% 518842
99% 518842
100% 518842 (longest request)
With bettertransformer:
make benchmark_embed
ab -n 50 -c 50 -l -s 480 \
-T 'application/json' \
-p tests/data/benchmark/benchmark_embed.json \
http://127.0.0.1:7997/embeddings
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: uvicorn
Server Hostname: 127.0.0.1
Server Port: 7997
Document Path: /embeddings
Document Length: Variable
Concurrency Level: 50
Time taken for tests: 335.988 seconds
Complete requests: 50
Failed requests: 0
Total transferred: 103942750 bytes
Total body sent: 35982600
HTML transferred: 103936250 bytes
Requests per second: 0.15 [#/sec] (mean)
Time per request: 335987.780 [ms] (mean)
Time per request: 6719.756 [ms] (mean, across all concurrent requests)
Transfer rate: 302.11 [Kbytes/sec] received
104.58 kb/s sent
406.70 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 4 1.6 3 6
Processing: 5406 156561 101251.8 156997 330329
Waiting: 5405 156559 101251.5 156996 330328
Total: 5407 156564 101253.4 157001 330335
Percentage of the requests served within a certain time (ms)
50% 157001
66% 214982
75% 244064
80% 265959
90% 302483
95% 317117
98% 330335
99% 330335
100% 330335 (longest request)

Following package versions were used:

accelerate 1.10.1 Accelerate
aiohappyeyeballs 2.4.3 Happy Eyeballs for asyncio
aiohttp 3.10.10 Async http client/server framework (asyncio)
aiosignal 1.3.1 aiosignal: a list of registered asynchronous callbacks
annotated-types 0.7.0 Reusable constraint types to use with typing.Annotated
anyio 4.6.2.post1 High level compatibility layer for multiple asynchronous event loop implementations
asgi-lifespan 2.1.0 Programmatic startup/shutdown of ASGI apps.
attrs 24.2.0 Classes Without Boilerplate
babel 2.16.0 Internationalization utilities
backoff 2.2.1 Function decoration for backoff and retry
beautifulsoup4 4.12.3 Screen-scraping library
better-transformer 0.1.0 aaa4a18
black 24.10.0 The uncompromising code formatter.
certifi 2024.8.30 Python package for providing Mozilla's CA Bundle.
cffi 1.17.1 Foreign Function Interface for Python calling C code.
charset-normalizer 3.4.0 The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
click 8.1.7 Composable command line interface toolkit
codespell 2.3.0 Codespell
colorama 0.4.6 Cross-platform colored terminal text.
coloredlogs 15.0.1 Colored terminal output for Python's logging module
colpali-engine 0.3.12 The code used to train and run inference with the ColPali architecture.
coverage 7.6.3 Code coverage measurement for Python
cryptography 43.0.3 cryptography is a package which provides cryptographic recipes and primitives to Python developers.
ctranslate2 4.4.0 Fast inference engine for Transformer models
datasets 2.14.4 HuggingFace community-driven open-source library of datasets
dill 0.3.7 serialize all of Python
diskcache 5.6.3 Disk Cache -- Disk and file backed persistent cache.
distro 1.9.0 Distro - an OS platform information API
einops 0.8.0 A new flavour of deep learning operations
fastapi 0.115.2 FastAPI framework, high performance, easy to learn, fast to code, ready for production
filelock 3.16.1 A platform independent file lock.
flatbuffers 24.3.25 The FlatBuffers serialization format for Python
frozenlist 1.4.1 A list-like structure which implements collections.abc.MutableSequence
fsspec 2024.9.0 File-system specification
ghp-import 2.1.0 Copy your docs directly to the gh-pages branch.
h11 0.14.0 A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
hf-xet 1.1.7 Fast transfer of large files with the Hugging Face Hub.
httpcore 1.0.6 A minimal low-level HTTP client.
httptools 0.6.4 A collection of framework independent HTTP protocol utils.
httpx 0.27.2 The next generation HTTP client.
huggingface-hub 0.34.4 Client library to download and publish models, datasets and other repos on the huggingface.co hub
humanfriendly 10.0 Human friendly output for text interfaces using Python
idna 3.10 Internationalized Domain Names in Applications (IDNA)
importlib-metadata 8.5.0 Read metadata from Python packages
importlib-resources 6.4.5 Read resources from Python packages
iniconfig 2.0.0 brain-dead simple config-ini parsing
jinja2 3.1.4 A very fast and expressive template engine.
jinja2-cli 0.8.2 A CLI interface to Jinja2
jiter 0.6.1 Fast iterable JSON parser.
joblib 1.4.2 Lightweight pipelining with Python functions
markdown 3.7 Python implementation of John Gruber's Markdown.
markdown-it-py 3.0.0 Python port of markdown-it. Markdown parsing, done right!
markupsafe 3.0.2 Safely add untrusted strings to HTML/XML markup.
mdurl 0.1.2 Markdown URL utilities
mergedeep 1.3.4 A deep merge function for 🐍.
mike 2.1.3 Manage multiple versions of your MkDocs-powered documentation
mkdocs 1.6.1 Project documentation with Markdown.
mkdocs-get-deps 0.2.0 MkDocs extension that lists all dependencies according to a mkdocs.yml file
mkdocs-material 9.5.41 Documentation that simply works
mkdocs-material-extensions 1.3.1 Extension pack for Python Markdown and MkDocs Material.
mkdocs-swagger-ui-tag 0.6.10 A MkDocs plugin supports for add Swagger UI in page.
monotonic 1.6 An implementation of time.monotonic() for Python 2 & < 3.3
mpmath 1.3.0 Python library for arbitrary-precision floating-point arithmetic
multidict 6.1.0 multidict implementation
multiprocess 0.70.15 better multiprocessing and multithreading in Python
mypy 1.12.0 Optional static typing for Python
mypy-extensions 1.0.0 Type system extensions for programs checked with the mypy type checker.
mypy-protobuf 3.6.0 Generate mypy stub files from protobuf specs
networkx 3.2.1 Python package for creating and manipulating graphs and networks
numpy 1.26.4 Fundamental package for array computing in Python
nvidia-cublas-cu12 12.6.4.1 CUBLAS native runtime libraries
nvidia-cuda-cupti-cu12 12.6.80 CUDA profiling tools runtime libs.
nvidia-cuda-nvrtc-cu12 12.6.77 NVRTC native runtime libraries
nvidia-cuda-runtime-cu12 12.6.77 CUDA Runtime native Libraries
nvidia-cudnn-cu12 9.5.1.17 cuDNN runtime libraries
nvidia-cufft-cu12 11.3.0.4 CUFFT native runtime libraries
nvidia-cufile-cu12 1.11.1.6 cuFile GPUDirect libraries
nvidia-curand-cu12 10.3.7.77 CURAND native runtime libraries
nvidia-cusolver-cu12 11.7.1.2 CUDA solver native runtime libraries
nvidia-cusparse-cu12 12.5.4.2 CUSPARSE native runtime libraries
nvidia-cusparselt-cu12 0.6.3 NVIDIA cuSPARSELt
nvidia-nccl-cu12 2.26.2 NVIDIA Collective Communication Library (NCCL) Runtime
nvidia-nvjitlink-cu12 12.6.85 Nvidia JIT LTO Library
nvidia-nvtx-cu12 12.6.77 NVIDIA Tools Extension
onnx 1.17.0 Open Neural Network Exchange
onnxruntime 1.19.2 ONNX Runtime is a runtime accelerator for Machine Learning models
onnxruntime-gpu 1.19.2 ONNX Runtime is a runtime accelerator for Machine Learning models
openai 1.52.0 The official Python library for the openai API
optimum 1.27.0 Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party librar...
orjson 3.10.7 Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
outcome 1.3.0.post0 Capture the outcome of Python function calls.
packaging 24.1 Core utilities for Python packages
paginate 0.5.7 Divides large result sets into pages for easier browsing
pandas 2.2.3 Powerful data structures for data analysis, time series, and statistics
pathspec 0.12.1 Utility library for gitignore style pattern matching of file paths.
peft 0.14.0 Parameter-Efficient Fine-Tuning (PEFT)
pillow 10.4.0 Python Imaging Library (Fork)
platformdirs 4.3.6 A small Python package for determining appropriate platform-specific dirs, e.g. a `user data dir`.
pluggy 1.5.0 plugin and hook calling mechanisms for python
posthog 3.7.0 Integrate PostHog into any python application.
prometheus-client 0.21.0 Python client for the Prometheus monitoring system.
prometheus-fastapi-instrumentator 7.0.0 Instrument your FastAPI with Prometheus metrics.
propcache 0.2.0 Accelerated property cache
protobuf 5.28.2
psutil 6.1.0 Cross-platform lib for process and system monitoring in Python.
pyarrow 17.0.0 Python library for Apache Arrow
pycparser 2.22 C parser in Python
pydantic 2.9.2 Data validation using Python type hints
pydantic-core 2.23.4 Core functionality for Pydantic validation and serialization
pygments 2.18.0 Pygments is a syntax highlighting package written in Python.
pymdown-extensions 10.11.2 Extension pack for Python Markdown.
pyparsing 3.2.0 pyparsing module - Classes and methods to define and execute parsing grammars
pytest 8.3.3 pytest: simple powerful testing with Python
pytest-mock 3.14.0 Thin-wrapper around the mock package for easier use with pytest
python-dateutil 2.9.0.post0 Extensions to the standard Python datetime module
python-dotenv 1.0.1 Read key-value pairs from a .env file and set them as environment variables
pytz 2024.2 World timezone definitions, modern and historical
pyyaml 6.0.2 YAML parser and emitter for Python
pyyaml-env-tag 0.1 A custom YAML tag for referencing environment variables in YAML files.
regex 2024.9.11 Alternative regular expression module, to replace re.
requests 2.32.3 Python HTTP for Humans.
rich 13.9.2 Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal
ruff 0.7.0 An extremely fast Python linter and code formatter, written in Rust.
safetensors 0.4.5
scikit-learn 1.5.2 A set of python modules for machine learning and data mining
scipy 1.13.1 Fundamental algorithms for scientific computing in Python
sentence-transformers 3.3.1 State-of-the-Art Text Embeddings
sentencepiece 0.2.0 SentencePiece python wrapper
setuptools 75.2.0 Easily download, build, install, upgrade, and uninstall Python packages
shellingham 1.5.4 Tool to Detect Surrounding Shell
six 1.16.0 Python 2 and 3 compatibility utilities
sniffio 1.3.1 Sniff out which async library your code is running under
sortedcontainers 2.4.0 Sorted Containers -- Sorted List, Sorted Dict, Sorted Set
soundfile 0.12.1 An audio library based on libsndfile, CFFI and NumPy
soupsieve 2.6 A modern CSS selector implementation for Beautiful Soup.
starlette 0.40.0 The little ASGI library that shines.
sympy 1.14.0 Computer algebra system (CAS) in Python
tensorrt 10.6.0 TensorRT Metapackage
tensorrt-cu12 10.6.0 A high performance deep learning inference library
threadpoolctl 3.5.0 threadpoolctl
timm 1.0.11 PyTorch Image Models
tokenizers 0.21.1
torch 2.7.0 Tensors and Dynamic neural networks in Python with strong GPU acceleration
torchvision 0.22.0 image and video datasets and models for torch deep learning
tqdm 4.66.5 Fast, Extensible Progress Meter
transformers 4.53.3 State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
trio 0.27.0 A friendly Python library for async concurrency and I/O
triton 3.3.0 A language and compiler for custom Deep Learning operations
typer 0.12.5 Typer, build great CLIs. Easy to code. Based on Python type hints.
types-cffi 1.16.0.20240331 Typing stubs for cffi
types-chardet 5.0.4.6 Typing stubs for chardet
types-protobuf 5.28.0.20240924 Typing stubs for protobuf
types-pyopenssl 24.1.0.20240722 Typing stubs for pyOpenSSL
types-pytz 2023.4.0.20240130 Typing stubs for pytz
types-redis 4.6.0.20241004 Typing stubs for redis
types-requests 2.28.1 Typing stubs for requests
types-setuptools 75.2.0.20241019 Typing stubs for setuptools
types-toml 0.10.8.20240310 Typing stubs for toml
types-urllib3 1.26.25.14 Typing stubs for urllib3
typing-extensions 4.12.2 Backported and Experimental Type Hints for Python 3.8+
tzdata 2024.2 Provider of IANA time zone data
urllib3 2.2.3 HTTP library with thread-safe connection pooling, file post, and more.
uvicorn 0.32.0 The lightning-fast ASGI server.
uvloop 0.21.0 Fast implementation of asyncio event loop on top of libuv
verspec 0.1.0 Flexible version handling
watchdog 5.0.3 Filesystem events monitoring
watchfiles 0.24.0 Simple, modern and high performance file watching and code reload in python.
websockets 13.1 An implementation of the WebSocket Protocol (RFC 6455 & 7692)
xxhash 3.5.0 Python binding for xxHash
yarl 1.15.5 Yet another URL library
zipp 3.20.2 Backport of pathlib-compatible object wrapper for zip files
POC to see if bettertransformer can simply be used with transformers > 4.49.
Extracted the code from the optimum repo here: https://github.com/wirthual/better_transformer
Had to update some dependencies to make it work.
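A minimal sketch of how the POC could be verified against a newer transformers release; the model name is just an example, and BetterTransformer.transform is assumed to mirror the optimum API as described above.

# Illustrative check: with transformers >= 4.49 installed, confirm the forked
# BetterTransformer still converts a model without raising a version-check error.
import transformers
from transformers import AutoModel
from better_transformer import BetterTransformer

print("transformers version:", transformers.__version__)

model = AutoModel.from_pretrained("BAAI/bge-small-en-v1.5")  # example model only
model = BetterTransformer.transform(model)
print(type(model).__name__)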