[Feature] [Optimum] [Intel] [OpenVINO] Add OpenVINO backend support through Optimum-Intel #454
Conversation
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main     #454      +/-   ##
==========================================
- Coverage   79.18%   78.20%   -0.99%
==========================================
  Files           41       41
  Lines         3248     3308      +60
==========================================
+ Hits          2572     2587      +15
- Misses         676      721      +45
PR Summary
This PR adds OpenVINO backend support through Optimum-Intel integration, enabling optimized inference on Intel hardware with BF16 precision.
- Added OpenVINO execution provider in primitives.py and integrated model loading/optimization in utils_optimum.py with hardcoded BF16 precision
- Implemented OpenVINO model file handling and caching in utils_optimum.py with a get_openvino_files() function
- Added a CHECK_OPTIMUM_INTEL dependency check in _optional_imports.py for graceful fallback
- Modified Docker.template.yaml and added Dockerfile.intel_auto to support OpenVINO builds
- Updated the OptimumEmbedder class to handle OpenVINO model loading while maintaining compatibility with the existing ONNX runtime
7 file(s) reviewed, 13 comment(s)
| # "RUN poetry install --no-interaction --no-ansi --no-root --extras \"${EXTRAS}\" --without lint,test && poetry cache clear pypi --all" | ||
| COPY requirements_install_from_poetry.sh requirements_install_from_poetry.sh | ||
| RUN ./requirements_install_from_poetry.sh --no-root --without lint,test "https://download.pytorch.org/whl/cpu" | ||
| RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]" |
style: Consider pinning the optimum[openvino] version to ensure reproducible builds
    RUN ./requirements_install_from_poetry.sh --no-root --without lint,test "https://download.pytorch.org/whl/cpu"
    RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
logic: Installing optimum[openvino] after requirements_install_from_poetry.sh may override dependency versions. Consider integrating this into the requirements script
    RUN ./requirements_install_from_poetry.sh --no-root --without lint,test "https://download.pytorch.org/whl/cpu"
    RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
logic: Installing optimum[openvino] after poetry install could override poetry-managed dependencies. Consider adding optimum[openvino] to pyproject.toml instead.
    RUN ./requirements_install_from_poetry.sh --without lint,test "https://download.pytorch.org/whl/cpu"
    RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
logic: Redundant installation of optimum[openvino] - this package was already installed in the previous stage
    RUN ./requirements_install_from_poetry.sh --with lint,test "https://download.pytorch.org/whl/cpu"
    RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
logic: Third redundant installation of optimum[openvino] - consider consolidating into a single installation in base image
    except Exception as e:  # show error then let the optimum intel compress on the fly
        print(str(e))
logic: printing error to stdout could mask critical failures. Consider proper error logging or propagating the exception
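A minimal sketch of the suggested fix, assuming a module-level logger (the loader below is a placeholder, not the PR's actual code):

    import logging

    logger = logging.getLogger(__name__)

    def load_precompressed_model(path: str):
        # Placeholder for the PR's actual loading step.
        raise FileNotFoundError(path)

    try:
        model = load_precompressed_model("./model_openvino.xml")
    except Exception:
        # logger.exception records the message plus the full traceback at ERROR
        # level, so the failure stays visible instead of vanishing on stdout.
        logger.exception(
            "Loading pre-compressed model failed; letting optimum-intel compress on the fly"
        )
        model = None  # continue into the on-the-fly compression path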
    if provider == "OpenVINOExecutionProvider":
        CHECK_OPTIMUM_INTEL.mark_required()
        filename = ""
logic: empty filename could cause issues if get_openvino_files fails. Consider setting a default model path or handling this case explicitly
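One hedged sketch of explicit handling; the helper below stands in for the PR's get_openvino_files(), whose exact signature is not shown here:

    from pathlib import Path

    def resolve_openvino_filename(model_dir: Path) -> str:
        # OpenVINO IR models ship as .xml (graph) plus .bin (weights);
        # matching on .xml is an assumption about the PR's file pattern.
        openvino_files = sorted(model_dir.glob("**/*.xml"))
        if not openvino_files:
            # Fail fast with a clear message instead of passing "" downstream.
            raise FileNotFoundError(
                f"No OpenVINO IR (.xml) files found under {model_dir}; "
                "export the model first or choose another provider."
            )
        return str(openvino_files[-1])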
            use_auth_token=True,
            prefer_quantized="cpu" in provider.lower(),
        )
    elif provider == "CPUExecutionProvider":
logic: missing else clause for unsupported providers could lead to undefined model state
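One way to close that gap, sketched against the provider names visible in the diff (the return values are stand-ins):

    def load_for_provider(provider: str) -> str:
        if provider == "OpenVINOExecutionProvider":
            return "openvino model"  # stand-in for the PR's OpenVINO path
        elif provider == "CPUExecutionProvider":
            return "onnx model"      # stand-in for the existing ONNX path
        else:
            # Reject anything else explicitly so the model can never be left undefined.
            raise ValueError(f"Unsupported execution provider: {provider}")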
    if files_optimized:
        file_optimized = files_optimized[-1]
    if file_name:
        file_optimized = file_name
logic: file_name overrides files_optimized[-1] without checking if file_name exists, could cause errors if file_name is invalid
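A sketch of the validation this comment asks for, reusing the variable names from the diff (the sample values are illustrative):

    from pathlib import Path

    files_optimized = [Path("model.onnx"), Path("model_optimized.onnx")]  # example values
    file_name = "model_optimized.onnx"  # example user override

    if file_name:
        # Honor an explicit file_name only if it matches a discovered file.
        matches = [p for p in files_optimized if p.name == file_name]
        if not matches:
            raise FileNotFoundError(
                f"Requested {file_name!r} not among optimized files: {files_optimized}"
            )
        file_optimized = matches[-1]
    elif files_optimized:
        file_optimized = files_optimized[-1]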
    openvino_files = [p for p in repo_files if p.match(pattern)]

    if len(openvino_files) > 1:
        logger.info(f"Found {len(openvino_files)} onnx files: {openvino_files}")
syntax: log message incorrectly refers to 'onnx files' when listing OpenVINO files
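The fix is a one-word change to the message:

    logger.info(f"Found {len(openvino_files)} OpenVINO files: {openvino_files}")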
    except Exception as e:  # show error then let the optimum intel compress on the fly
        print(str(e))

    self.model = optimize_model(
@michaelfeil you can just load your model with OVModelForFeatureExtraction (if the model wasn't already exported to the OV IR, optimum will load the pytorch model and export it on-the-fly; if already converted, it will just load the OV model) without calling optimize_model, which seems to call ORTOptimizer (which should be used for ONNX models only). OpenVINO conversion can also be done from an ONNX model (https://github.com/huggingface/optimum-intel/blob/6dbc59eb80ba7eee9d347d03f3b737fc54b46e5d/optimum/intel/openvino/modeling_base.py#L347), but this is usually not recommended and the feature will likely be removed from optimum-intel in the future. Let us know if we can help on this integration!
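For reference, a minimal sketch of the loading path the reviewer describes (the model id is an arbitrary example, not from the PR):

    from optimum.intel import OVModelForFeatureExtraction

    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
    # omit it when the repository already ships OV IR files.
    model = OVModelForFeatureExtraction.from_pretrained(
        "BAAI/bge-small-en-v1.5",  # example model id
        export=True,
    )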
Description
This PR integrates the OpenVINO backend into Infinity's OptimumEmbedder class via the optimum-intel library.
Related Issue
If applicable, link the issue this PR addresses.
Types of Change
Checklist
Additional Notes
There are multiple inference precisions that can be specified in libs/infinity_emb/infinity_emb/transformer/utils_optimum.py. The inference precision hint is hardcoded to bf16 because it offers the fastest inference speed. We have also performed an MTEB evaluation (bank classification dataset) on the INT4 weight-only quantized model with BF16 inference precision; the drop in accuracy is just 0.71%. Based on the speed/accuracy tradeoff as well as ease of use, we think that settling on a single effective configuration could enhance the user experience of infinity_emb.
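For context, the hint named above is the standard OpenVINO runtime property INFERENCE_PRECISION_HINT; a hedged sketch of how such a hint is typically passed through optimum-intel (the model id is an example, not the PR's default):

    from optimum.intel import OVModelForFeatureExtraction

    model = OVModelForFeatureExtraction.from_pretrained(
        "BAAI/bge-small-en-v1.5",  # example model id
        ov_config={"INFERENCE_PRECISION_HINT": "bf16"},  # the value this PR hardcodes
    )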
License
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.