Conversation

@pctablet505 (Collaborator)

This pull request updates the OpenVINO export tests in keras/src/export/openvino_test.py to use explicit tolerances when comparing model outputs. This ensures that small numerical differences between reference and OpenVINO outputs are tolerated, making the tests more robust to minor floating-point discrepancies.

Testing improvements:

  • Updated all self.assertAllClose calls in the test cases to include atol=1e-3 and rtol=1e-3 for output comparisons, allowing for small absolute and relative differences between reference and OpenVINO outputs.
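These tolerances follow the same element-wise rule NumPy uses for allclose. A minimal sketch, with illustrative values that are not taken from the actual test suite:

```python
import numpy as np

# Hypothetical reference vs. OpenVINO outputs that differ only by a
# small floating-point error (values are illustrative, not from the PR).
ref_output = np.array([0.1000, 0.5000, 0.9000])
ov_output = np.array([0.1004, 0.4996, 0.9003])

# assertAllClose applies the same element-wise rule as np.allclose:
# |actual - expected| <= atol + rtol * |expected|
print(np.allclose(ref_output, ov_output))                        # False: default tolerances are far tighter
print(np.allclose(ref_output, ov_output, atol=1e-3, rtol=1e-3))  # True: a ~4e-4 error is tolerated
```

With atol=1e-3 and rtol=1e-3, an element like 0.1 is allowed to deviate by up to 1e-3 + 1e-3 * 0.1 = 1.1e-3, which comfortably covers typical float32 conversion noise.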

pctablet505 and others added 30 commits May 6, 2025 10:25
Corrected indentation in doc string
Fixed issue with passing a single image without batch dimension.
Test case for unbatched inputs
Testcase for checking both unbatched and batched single image inputs.
There was a bug causing a cycle in the graph.
removed the use of tree.map_structure
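The unbatched-input handling these commits describe can be sketched as follows; ensure_batched is a hypothetical helper for illustration, not the actual Keras code:

```python
import numpy as np

# Minimal sketch of accepting both a single image and a batch of images.
# ensure_batched is a hypothetical name; only the shape logic is shown.
def ensure_batched(image):
    # A single HWC image gets a leading batch axis; NHWC batches pass
    # through unchanged, so both call styles reach the same model path.
    if image.ndim == 3:
        return image[np.newaxis, ...]
    return image

print(ensure_batched(np.zeros((224, 224, 3))).shape)     # (1, 224, 224, 3)
print(ensure_batched(np.zeros((8, 224, 224, 3))).shape)  # (8, 224, 224, 3)
```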
Enhanced the _can_use_flash_attention function to provide more detailed
error messages when flash attention compatibility checks fail.

Changes:
- Replace generic exception catching with specific error propagation
- When raise_error=True, directly re-raise original exceptions from
  check_layout() and check_is_flash_attention() functions
- Preserve detailed error context from JAX internal validation functions
- Maintain existing behavior when raise_error=False (returns False)

This improves debugging experience by surfacing specific technical details
about tensor layout incompatibilities, cuDNN version requirements, and
other flash attention compatibility issues.

Relates to keras-hub PR keras-team#2257 and addresses flash attention debugging needs.
…sh_attention`

Changes:
- Add missing q_offsets=None and kv_offsets=None parameters to check_layout()
  call to match updated JAX function signature
- Replace bare `except:` with `except Exception as e:` and `raise e` to
  preserve detailed error messages from JAX validation functions
- Maintain existing fallback behavior when raise_error=False

This resolves compatibility issues with newer JAX versions and improves
debugging experience by surfacing specific technical details about
flash attention compatibility failures.
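The error-propagation pattern described in these commits can be sketched as below; check_layout stands in for the JAX-internal validator and is hypothetical here, with only the try/except shape mirroring the change:

```python
# Sketch of the raise_error pattern: re-raise the original exception so
# callers see the specific compatibility failure instead of a generic
# message, while keeping the silent False fallback by default.
def check_layout(q, k, v):
    # Stand-in for the JAX validation call; always fails in this sketch.
    raise ValueError("unsupported tensor layout for flash attention")

def can_use_flash_attention(q, k, v, raise_error=False):
    try:
        check_layout(q, k, v)
        return True
    except Exception as e:
        if raise_error:
            raise e  # surface the detailed error from the validator
        return False  # preserve the existing quiet-fallback behavior

print(can_use_flash_attention(None, None, None))  # False
```

Note that a bare `except:` would also have swallowed BaseException subclasses like KeyboardInterrupt, which is one more reason `except Exception as e:` is the safer form.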
Simplified the check for `flash_attention` by removing redundant checks that are already done in `_can_use_flash_attention`.
@gemini-code-assist (Contributor)

Summary of Changes

Hello @pctablet505, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the stability and reliability of the OpenVINO export tests by introducing specific tolerance thresholds for numerical comparisons. By allowing for small, expected floating-point differences, the tests become more robust and less prone to spurious failures, ensuring that only significant deviations in model output are flagged.

Highlights

  • Test Robustness: Updated OpenVINO export tests to include explicit absolute and relative tolerances (atol=1e-3, rtol=1e-3) in self.assertAllClose calls.
  • Numerical Stability: The change addresses minor floating-point discrepancies between reference model outputs and OpenVINO outputs, preventing test failures due to insignificant numerical variations.

@gemini-code-assist (bot) left a comment

Code Review

This pull request correctly addresses the need for explicit tolerances in the OpenVINO export tests by adding atol and rtol to assertAllClose calls. This makes the tests more robust against minor floating-point discrepancies. I've added a few suggestions to define these tolerance values as constants to improve code maintainability by avoiding magic numbers. Overall, this is a good improvement.

  ov_output = compiled_model([ref_input])[compiled_model.output(0)]

- self.assertAllClose(ref_output, ov_output)
+ self.assertAllClose(ref_output, ov_output, atol=1e-3, rtol=1e-3)
Severity: medium

To improve maintainability and avoid magic numbers, consider defining the tolerance values 1e-3 as class-level constants. This makes it easier to understand their purpose and update them in one place if needed.

For example, you could add these attributes to the ExportOpenVINOTest class:

class ExportOpenVINOTest(testing.TestCase):
    OV_ATOL = 1e-3
    OV_RTOL = 1e-3

    # ... tests ...

Then, you can use them in your assertions like this:
self.assertAllClose(ref_output, ov_output, atol=self.OV_ATOL, rtol=self.OV_RTOL)


  ov_output = compiled_model(ov_inputs)[compiled_model.output(0)]
- self.assertAllClose(ref_output, ov_output)
+ self.assertAllClose(ref_output, ov_output, atol=1e-3, rtol=1e-3)
Severity: medium

Similar to my other comment, it would be better to use class-level constants for the tolerance values to improve maintainability and avoid repetition of magic numbers. For example: self.assertAllClose(ref_output, ov_output, atol=self.OV_ATOL, rtol=self.OV_RTOL).

      compiled_model.output(0)
  ]
- self.assertAllClose(ref_output, ov_output)
+ self.assertAllClose(ref_output, ov_output, atol=1e-3, rtol=1e-3)
Severity: medium

Similar to my other comments, it would be better to use class-level constants for the tolerance values to improve maintainability and avoid repetition of magic numbers. For example: self.assertAllClose(ref_output, ov_output, atol=self.OV_ATOL, rtol=self.OV_RTOL).

codecov-commenter commented Dec 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.30%. Comparing base (46813a3) to head (5687248).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #21911   +/-   ##
=======================================
  Coverage   76.30%   76.30%           
=======================================
  Files         580      580           
  Lines       60031    60031           
  Branches     9433     9433           
=======================================
  Hits        45805    45805           
  Misses      11750    11750           
  Partials     2476     2476           
Flag Coverage Δ
keras 76.17% <ø> (ø)
keras-jax 62.12% <ø> (ø)
keras-numpy 57.32% <ø> (+<0.01%) ⬆️
keras-openvino 34.30% <ø> (ø)
keras-torch 63.22% <ø> (-0.01%) ⬇️


@hertschuh (Collaborator)

The tests are not failing at HEAD, so these are not needed.

Maybe you're seeing this on your local machine only.
