
3.1 improve schema test coverage #4781


Closed
ralfhandl wants to merge 22 commits

Conversation

ralfhandl
Contributor

@ralfhandl ralfhandl commented Jul 15, 2025

This PR is a cross-port of its 3.2 counterpart, bringing schema test coverage to 100% with the new tool https://github.com/hyperjump-io/json-schema-coverage.

It includes the changes of another PR, which can't be merged due to insufficient schema test coverage.

  • schema changes are included in this pull request
  • schema changes are needed for this pull request but not done yet
  • no schema changes are needed for this pull request

dependabot bot and others added 20 commits July 9, 2025 07:45
Bumps [@hyperjump/json-schema](https://github.com/hyperjump-io/json-schema) from 1.16.0 to 1.16.1.
- [Commits](hyperjump-io/json-schema@v1.16.0...v1.16.1)

---
updated-dependencies:
- dependency-name: "@hyperjump/json-schema"
  dependency-version: 1.16.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
…p/json-schema-1.16.1

Bump @hyperjump/json-schema from 1.16.0 to 1.16.1
More invalid security scheme objects
@ralfhandl ralfhandl marked this pull request as ready for review July 16, 2025 11:57
@ralfhandl ralfhandl requested review from a team as code owners July 16, 2025 11:57
@ralfhandl ralfhandl added approved pr port PRs that just port an approved PR to another version Schema labels Jul 16, 2025
@ralfhandl ralfhandl changed the title 3.1 improve schema coverage 3.1 improve schema test coverage Jul 16, 2025
@ralfhandl ralfhandl marked this pull request as draft July 19, 2025 20:37
@ralfhandl ralfhandl marked this pull request as ready for review July 23, 2025 10:22
@ralfhandl ralfhandl requested review from handrews and a team July 23, 2025 10:23
Member

@handrews handrews left a comment


I'm sorry, but the more I look at this the more I am skeptical that this is the direction we want, and I am extremely skeptical of trying to cram it in right at the end of a release cycle.

I think expanding negative cases needs a lot more discussion about costs and benefits, and it needs to be separated from 3.1.2 and 3.2. What we have now is far better than what we had before, and I am comfortable releasing 3.2 and 3.1.2 with the existing approach. I am just not comfortable with the cost-benefit tradeoff of this approach in this effort right now.

Comment on lines +86 to +87
allowEmptyValue: yes # must be a boolean
allowReserved: no # must be a boolean
Member


Are these supposed to be testing that they are not booleans? Because yes and no are booleans in YAML:

Python 3.10.10 (main, Feb 16 2023, 23:23:55) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from yaml import safe_load as sl
>>> sl("""
... foo: yes
... bar: no
... """)
{'foo': True, 'bar': False}

Is the coverage tool reporting this as covering the negative case for these fields?

Comment on lines +99 to +100
examples: true # must be an object
explode: 42 # must be a boolean
Member


I'm picking an arbitrary line to make this comment: I'm just still not sure what we're trying to accomplish here. This does not, in fact, test that these fields must be an object and a boolean, respectively. It tests that examples cannot be a boolean, and explode cannot be a number/integer.

What are we gaining here? We already test that an object is allowed here. If we want to test that nothing else is allowed, we'd need to do a lot more work. If we are just testing that some random selection of things is not allowed... I question whether the benefit is worth the maintenance cost.

I have added test cases to the negative tests in the past, but I tested things like complex dependencies among fields. AFAICT, there is some measurement of coverage here and we're just throwing cases at the wall based on whatever that metric is and not based on what is actually cost-effective for our needs.
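To make the cost concrete, here is a hypothetical sketch of what exhaustive negative-type coverage for a single boolean field would require: one failing document per wrong JSON type (five, or six if integer is counted separately from number). Each entry below stands in for a separate fail/ test file; none of these are actual test cases from this PR.

```yaml
# Hypothetical negative-type cases for explode; each would be its own file.
- explode: 42              # number
- explode: "true"          # string
- explode: [true]          # array
- explode: {value: true}   # object
- explode: null            # null
```

Multiplied across every typed field in the schema, that is the maintenance surface under discussion.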

@ralfhandl
Contributor Author

ralfhandl commented Aug 7, 2025

I am comfortable releasing 3.2 and 3.1.2 with the existing approach

The existing approach has two drawbacks:

  1. It is custom-built for us and based on an internal API of the schema validation tool we use, so there is no guarantee it will keep working with future versions of that tool.

    • It already broke once in the past due to an incompatible change of the internal API, and we did not notice for weeks.
    • There is now an official schema test coverage tool that we definitely should use instead of the brittle custom approach.
  2. We do not know whether a "fail" test case fails for the intended reason or for some other reason.

    • The v3.2-dev branch currently has two test cases, fail/encoding-enc-*-exclusion.yaml, that fail because their structure is wrong: the deeply nested error of using mutually exclusive keywords side by side, which is supposed to cause the failure, is never reached (see the sketch below).
    • I only noticed because the negative case of "do not use both keywords next to each other" was reported as not covered by the new coverage tool; our custom approach missed this completely.
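To illustrate drawback 2, here is a contrived sketch, not the actual fail/encoding-enc-*-exclusion.yaml content, using style and contentType as a stand-in pair of mutually exclusive Encoding Object fields:

```yaml
# Document 1 fails for the intended reason: the mutually exclusive fields
# sit side by side, so the nested exclusion rule rejects the document.
encoding:
  color:
    style: form
    contentType: text/plain
---
# Document 2 also fails, but for the wrong reason: the outer structure is
# invalid, an outer keyword rejects it, and the nested exclusion rule is
# never evaluated. A plain pass/fail harness cannot tell these apart.
encoding: not-an-object
```

Branch-level coverage catches exactly this: the failing branch of the exclusion rule shows up as unvisited even though the test file does fail.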

@ralfhandl
Contributor Author

Which way forward should we take:

  1. Stick with the brittle custom tool and only fix the two "fail" test cases to make them fail for the right reason.

  2. Switch to the new, supported tool and set a coverage limit well below 100% that is reached with the current set of tests.

  3. Switch to the new tool and bring coverage back up to 100% by adding more "fail" test cases, so that every keyword used in our schema files is exercised in a failing case.

This 3.1 PR and its 3.2 "original" take the third option.

We can use them as a starting point for refactoring the test cases in any future direction while staying at 100%, which allows refactoring in baby steps without losing confidence in our tests.

@mikekistler
Contributor

mikekistler commented Aug 7, 2025

My preference is to take option 2 now and work towards option 3 over time, without making option 3 a requirement for releasing v3.2.

@handrews
Member

handrews commented Aug 7, 2025

@ralfhandl

There is now an official schema test coverage tool that we definitely should use instead of the brittle custom approach.

No one is disputing this. The only question is what tests to add now.

I only noticed because the "negative" case of "do not use both keywords next to each other" was reported as not covered by the new coverage tool, which our custom approach completely missed.

And that is a useful thing. But I am not convinced that what this PR is doing in terms of test additions is as useful, nor am I convinced that the coverage tool is working as you seem to expect. For example, this is not covering the condition it says it is covering, and testing one-sixth to one-fifth of the incorrect type conditions (depending on how you treat integer) is of questionable cost/benefit value to me.

I strongly dislike placating whatever a coverage tool says. Choosing test cases is about cost/benefit tradeoffs, and a coverage tool does not always understand those. In the case of negative type checks, "100%" is not actually 100%; it's at best 20%. And I do not want to make policies about "100% coverage" when our 100% number is a lie. Honestly, I have concerns about some of our positive case coverage metrics as well, but that is a more complex discussion and I was fine to let it be.

As @mikekistler says:

My preference is to take option 2 now and work towards option 3 over time, without making option 3 a requirement for releasing v3.2.

I don't have the energy to figure out what is going on with the cost-benefit issues here or why the coverage claims don't seem to line up with what I'm seeing. And I do not want to delay 3.2 to figure this out. We can ship with the old system (it worked well enough) and then change it, or we can change the system but not start a mandatory totally new test approach. I'm fine with either of those two, but not with requiring a new "100%" in a way that does not seem, to me, to align with actual thought-out test priorities.

@jdesrosiers
Contributor

I'll point out that you have some flexibility in how you enforce coverage requirements. There are three categories that can all be configured separately: "statements" (keywords), "functions" (subschemas), and "branches" (the true/false result of keywords).

The current solution only checks that all keywords ("statements") have been visited. So, if you set only "statements" to 100%, you should get the same enforcement you have now, but you also have the benefit of a maintained tool and nice coverage reports. I believe you could make that change immediately with no disruption or additional tests.

I would suggest also setting "functions" to 100% as that ensures that things like unevaluatedProperties: false are covered, which is missed in the current solution. I think that could be added without controversy.

If "branch" coverage is controversial, you don't have to enforce it at all. It will still be reported and can be used as feedback to point to the gap and you can make a decision whether or not it's worth writing a test for, but it won't fail automated checks.

@ralfhandl ralfhandl closed this Aug 7, 2025
@ralfhandl ralfhandl deleted the 3.1-schema-coverage branch August 7, 2025 22:03
@ralfhandl
Contributor Author

I like @jdesrosiers's proposal.

Unfortunately, we are not yet at 100% statement coverage with the current test cases; see https://github.com/OAI/OpenAPI-Specification/actions/runs/16374451434/job/46270758901?pr=4787.

Who would volunteer to add test cases to reach 100% statement coverage?

@handrews
Member

handrews commented Aug 7, 2025

Thanks @jdesrosiers, that is helpful! I think it's more that I want to understand the branch coverage more deeply rather than being dead-set against it. So "controversial for now, but very much worth revisiting" is probably where I am with it.

@ralfhandl we should continue this discussion somewhere (in a Discussion, perhaps?) and make a collective decision on what we're measuring and what that measurement needs to be. We are doing so much better with the schema than ever before, and I prefer to focus on that, and whether it is sufficient in its own right, rather than on what numbers we aren't hitting. The list of things I think we should be checking before putting out 3.2 (or a patch release) is very long, and we aren't going to do most of them. But we're going to ship anyway, because we're at a point where it is better to ship than not. This is the same sort of call, I think.

@ralfhandl ralfhandl mentioned this pull request Aug 8, 2025