
v3.2: Support ordered multipart including streaming #4589



Open: wants to merge 9 commits into base: v3.2-dev


handrews
Member

Fixes:

This adds support for all multipart media types that do not have named parts, including support for streaming such media types. Note that multipart/mixed defines the basic processing rules for all multipart types, and implementations that encounter unrecognized multipart subtypes are required to process them as multipart/mixed. Therefore support for multipart/mixed addresses all other subtypes to some degree.
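As a rough illustration of the fallback rule described above, here is a minimal Python sketch. The `RECOGNIZED` set and the function name `effective_multipart_type` are hypothetical. They stand in for whatever subtypes a particular tool actually has dedicated handling for.

```python
# Hypothetical: the multipart subtypes this particular tool handles specially.
RECOGNIZED = {"multipart/mixed", "multipart/form-data", "multipart/byteranges"}


def effective_multipart_type(content_type: str) -> str:
    """Return the multipart type whose parsing rules apply to `content_type`."""
    # Drop parameters such as `boundary=...`; media type names are case-insensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if not media_type.startswith("multipart/"):
        raise ValueError(f"not a multipart media type: {content_type!r}")
    # Unrecognized multipart subtypes fall back to multipart/mixed processing.
    return media_type if media_type in RECOGNIZED else "multipart/mixed"
```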

This builds on the recent support for sequential media types:

  • multipart/mixed and similar meet the definition for a sequential media type, requiring it to be modeled as an array. This does use an expansive definition of "repeating the same structure", where the structure is literally any content with a media type.
  • As a sequential media type, it also supports itemSchema
  • Adding a parallel itemEncoding is the obvious solution to multipart/mixed streams requiring an Encoding Object
  • We have regularly received requests to support truly mixed multipart/mixed payloads, and previously claimed such support from 3.0.0 onwards, without actually supporting it. Adding prefixEncoding along with itemEncoding supports this use case with a clear parallel to prefixItems, which is the schema construct needed to support this case.
  • There is no need for a prefixSchema field because the streaming use case requires a repetition of the same schema for each item. Therefore all mixed use cases can use schema and prefixItems.

  • schema changes are included in this pull request
  • schema changes are needed for this pull request but not done yet
  • no schema changes are needed for this pull request
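Here is a minimal Python sketch of how a tool might pick the Encoding Object for each part. It assumes `prefixEncoding` is a list matched to parts by position and that `itemEncoding` covers every part after it, the same way `items` follows `prefixItems` in JSON Schema. The function name and the plain-dict representation of the Media Type Object are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def encoding_for_part(media_type: Dict[str, Any], index: int) -> Optional[Dict[str, Any]]:
    """Pick the Encoding Object for the part at `index` (0-based), if any."""
    prefix = media_type.get("prefixEncoding", [])
    if index < len(prefix):
        # Positional match, mirroring prefixItems.
        return prefix[index]
    # Every later part shares itemEncoding, mirroring items after prefixItems.
    # None means "no Encoding Object", i.e. fall back to default behavior.
    return media_type.get("itemEncoding")
```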

We do not seem to run tests on the 3.2 schemas, and I couldn't quickly figure out how to add that, so we should do that separately and include coverage for this and other new fields.

Also paging @thecheatah, @jeremyfiel

@jeremyfiel
Contributor

Thanks @handrews for taking this on. I'm really happy to see it coming to fruition and hopefully the tooling catches up with it sooner than later.

I couldn't immediately make out if this would support nested multipart.

POST /things HTTP/1.1
content-type: multipart/mixed;boundary=aaa

--aaa
content-type: application/json

{
   "data": ""
}
--aaa
content-type: multipart/mixed;boundary=bbb

        --bbb
        content-type: application/json

        {
            "more_data": ""
        }
        --bbb
        content-type: text/plain

        test file
        --bbb
        content-type: application/zip

        <binary data>
        --bbb
        content-type: application/pdf

        <binary data>
        --bbb--
--aaa--

multipart/mixed:
  schema:
    prefixItems:
      - type: object
        properties:
          data:
            type: string
      - prefixItems:
          - type: object
            properties:
              more_data:
                type: string
          - {}
          - {}
          - {}
  prefixEncoding:
    - {}
    - contentType: multipart/mixed
      # not sure how to further document a nested structure here.
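For what it's worth, the nesting itself maps naturally onto nested arrays. This sketch parses a trimmed, unindented copy of the example above with Python's standard `email` package, with the zip and pdf parts dropped for brevity. `part_tree` is a hypothetical helper. The open question is where the inner parts' Encoding Objects would live. The shape of the data is not the problem.

```python
from email import policy
from email.parser import BytesParser

RAW = (
    b"Content-Type: multipart/mixed; boundary=aaa\r\n"
    b"\r\n"
    b"--aaa\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"data": ""}\r\n'
    b"--aaa\r\n"
    b"Content-Type: multipart/mixed; boundary=bbb\r\n"
    b"\r\n"
    b"--bbb\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"more_data": ""}\r\n'
    b"--bbb\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"test file\r\n"
    b"--bbb--\r\n"
    b"--aaa--\r\n"
)


def part_tree(msg):
    """Map a (possibly nested) multipart message to nested lists of content types."""
    if msg.is_multipart():
        # Each multipart level becomes an array, one item per part, in order.
        return [part_tree(part) for part in msg.iter_parts()]
    return msg.get_content_type()


tree = part_tree(BytesParser(policy=policy.default).parsebytes(RAW))
```

Here `tree` is `["application/json", ["application/json", "text/plain"]]`, which is the array-of-arrays shape that the nested `prefixItems` schema describes.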

@handrews
Member Author

@jeremyfiel aww... I was hoping no one would bring up nested multipart... 😵‍💫

I think it would be hard to do that, because there isn't anywhere to put the nested Encoding Object. I think we'd have to add encoding, prefixEncoding, and itemEncoding to the Encoding Object as well as the Media Type Object. I'm a bit hesitant to do that, but we could talk about it at the Thursday call and I could submit it as a follow-up if it gains traction.

Alternatively, we could recommend trying that as an extension given that it adds significant complexity and is a rare case that is deprecated by the current RFC (I know that's small consolation when you're the "rare case" and built things in good faith using older RFCs when they were current).

The complexity is not just the recursive structure, but also that you are now correlating two separate trees of structure.
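To make the "two trees" point concrete, here is a sketch that walks a schema's `prefixItems` and a matching encoding tree in lockstep. It is hypothetical. It assumes an Encoding Object may carry its own `prefixEncoding`/`itemEncoding`, as floated above, which OAS 3.2 does not define. It also only walks `prefixItems`.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

Path = Tuple[int, ...]


def pair_trees(
    schema: Dict[str, Any],
    encoding: Optional[Dict[str, Any]],
    path: Path = (),
) -> Iterator[Tuple[Path, Dict[str, Any], Optional[Dict[str, Any]]]]:
    """Yield (path, subschema, encoding) for every leaf part.

    HYPOTHETICAL: assumes an Encoding Object can carry prefixEncoding /
    itemEncoding for a nested multipart part; OAS 3.2 does not define this.
    """
    enc = encoding or {}
    nested = enc.get("contentType", "").startswith("multipart/")
    if not nested or "prefixItems" not in schema:
        # Treat as a leaf: not a nested multipart part, or nothing to walk.
        yield path, schema, encoding
        return
    prefix_enc = enc.get("prefixEncoding", [])
    for i, sub_schema in enumerate(schema["prefixItems"]):
        # Position i in the schema tree must line up with position i in the
        # encoding tree; past the end of prefixEncoding, use itemEncoding.
        sub_enc = prefix_enc[i] if i < len(prefix_enc) else enc.get("itemEncoding")
        yield from pair_trees(sub_schema, sub_enc, path + (i,))


# Usage: the root dict stands in for the Media Type Object's own prefixEncoding.
schema = {"prefixItems": [{"type": "object"}, {"prefixItems": [{"type": "object"}, {}]}]}
root = {
    "contentType": "multipart/mixed",
    "prefixEncoding": [
        {},
        {
            "contentType": "multipart/mixed",
            "prefixEncoding": [{"contentType": "application/json"}, {"contentType": "text/plain"}],
        },
    ],
}
pairs = list(pair_trees(schema, root))
```

Even this small version shows the difficulty: if either tree is shaped slightly differently from the other, parts silently pick up the wrong encoding.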

@jeremyfiel
Contributor

I'm not entirely sure it is correct to include multipart/mixed in this statement. It is registered in the IANA registry, and it does technically have an envelope, with the boundary parameter.

Sequential Media Types

Within this specification, a sequential media type is defined as any media type that consists of a repeating structure, without any sort of header, footer, envelope, or other metadata in addition to the sequence.
Some examples of sequential media types (including some that are not IANA-registered but are in common use) are:

  application/jsonl
  application/x-ndjson
  application/json-seq
  application/geo+json-seq
  text/event-stream
  multipart/mixed

@handrews
Member Author

handrews commented May 27, 2025

[EDIT: This goes with the nested multipart discussion]

@jeremyfiel the problem is that instead of just re-using the Media Type Object, we came up with the contentType field :-(

@jeremyfiel
Contributor

I totally understand the complexity, just trying to confirm my initial impression.

@handrews
Member Author

@jeremyfiel That statement only says that some of the listed types are not registered. application/json-seq, application/geo+json-seq, and multipart/mixed are all registered.

I decided not to get into the preamble and postamble of multipart because AFAICT they're supposed to be ignored and are there for historical purposes. Media type parameters are not part of the actual media type content, and the boundaries in the content are no more (or less) significant than the various differences in the three sequential JSON media type delimiters.
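Python's standard `email` parser offers a quick way to see that the preamble and epilogue sit outside the sequence of parts. It keeps them as separate attributes (`preamble`, `epilogue`) and iterates only over the parts. The payload below is made up for illustration.

```python
from email import policy
from email.parser import BytesParser

RAW = (
    b"Content-Type: multipart/mixed; boundary=aaa\r\n"
    b"\r\n"
    b"This preamble is ignored by MIME-aware readers.\r\n"
    b"--aaa\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"first\r\n"
    b"--aaa\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"second\r\n"
    b"--aaa--\r\n"
    b"This epilogue is ignored too.\r\n"
)

msg = BytesParser(policy=policy.default).parsebytes(RAW)
# Only the parts become array items; preamble/epilogue are set aside.
items = [part.get_content().strip() for part in msg.iter_parts()]
```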

@handrews
Member Author

@jeremyfiel I added some clarifications about the envelope/preamble/epilogue and the lack of nesting support.

handrews added 4 commits May 30, 2025 10:04
This adds support for all `multipart` media types that do not
have named parts, including support for streaming such media types.
Note that `multipart/mixed` defines the basic processing rules
for all `multipart` types, and implementations that encounter
unrecognized `multipart` subtypes are required to process them
as `multipart/mixed`.  Therefore support for `multipart/mixed`
addresses all other subtypes to some degree.

This builds on the recent support for sequential media types:

* `multipart/mixed` and similar meet the definition for
  a sequential media type, requiring it to be modeled as
  an array.  This does use an expansive definition of
  "repeating the same structure", where the structure is
  literally any content with a media type.
* As a sequential media type, it also supports `itemSchema`
* Adding a parallel `itemEncoding` is the obvious solution to
  `multipart/mixed` streams requiring an Encoding Object
* We have regularly received requests to support truly mixed
  `multipart/mixed` payloads, and previously claimed such support
  from 3.0.0 onwards, without actually supporting it.
  Adding `prefixEncoding` along with `itemEncoding` supports this
  use case with a clear parallel to `prefixItems`, which is the
  schema construct needed to support this case.
* There is no need for a `prefixSchema` field because the streaming
  use case requires a repetition of the same schema for each item.
  Therefore all mixed use cases can use `schema` and `prefixItems`
@handrews
Member Author

This force-push was just a plain re-base with no conflicts or other changes. Exactly the same commits applied, I just wanted to make sure the other big PRs wouldn't cause merge issues.

@jeremyfiel GitHub won't let me request a review from you, but if you could provide an approval when you are satisfied with the PR it would be much appreciated as you probably have more expertise with this than just about anyone else.

@thecheatah if you are able to review, even just for the streaming support part, that would also be greatly appreciated. I did not use application/json in the streaming multipart example, but the principle would be the same.
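In the streaming case, each part can be handed off as soon as the delimiter that ends it arrives, for example to be validated against `itemSchema` and decoded according to `itemEncoding`. The following is a simplified, hypothetical splitter sketch, not taken from any OpenAPI tool. It assumes CRLF line endings and a known boundary, and it returns each part's headers and body together as raw bytes.

```python
from typing import Iterable, Iterator


def stream_parts(chunks: Iterable[bytes], boundary: str) -> Iterator[bytes]:
    """Yield each raw part (headers + body) as soon as the delimiter after it arrives.

    The preamble and epilogue are discarded. A truncated stream yields no
    incomplete final part.
    """
    delimiter = b"\r\n--" + boundary.encode("ascii")
    # Seed with CRLF so a boundary at the very start matches `delimiter` too.
    buf = b"\r\n"
    in_body = False  # becomes True once the first boundary line is consumed
    for chunk in chunks:
        buf += chunk
        while True:
            idx = buf.find(delimiter)
            if idx < 0:
                break  # need more data
            after = buf[idx + len(delimiter):]
            if after.startswith(b"--"):  # close-delimiter: last part is complete
                if in_body:
                    yield buf[:idx]
                return
            eol = after.find(b"\r\n")
            if eol < 0:
                break  # boundary line not finished yet; wait for more data
            if in_body:
                yield buf[:idx]
            in_body = True
            buf = after[eol + 2:]  # the next part starts after the boundary line
```

Feeding the body one byte at a time produces the same parts as feeding it all at once, which is the property a streaming consumer needs.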


@thecheatah left a comment


Looks good. This change allows us to describe the multipart/mixed streaming use case. Thanks!

Thanks to @thecheatah for catching this.
Contributor

@jeremyfiel left a comment


LGTM!

Co-authored-by: Jeremy Fiel <[email protected]>
src/oas.md Outdated
Most `multipart` media types, including `multipart/mixed` which defines the underlying rules for parsing all `multipart` types, do not have named parts.
Data for these media types are modeled as an array, with one item per part, in order.

To use the `prefixEncoding` and/or `itemEncoding` fields, either an array `schema` or `itemSchema` MUST be present.
Contributor

@ralfhandl Jun 9, 2025


Suggested change
To use the `prefixEncoding` and/or `itemEncoding` fields, either an array `schema` or `itemSchema` MUST be present.
To use the `prefixEncoding` and/or `itemEncoding` fields, either `itemSchema` or an array `schema` MUST be present.

Reorder to make clear that „array“ only applies to schema?

Member Author


I'll update this; you have extra spaces in your suggestion, so I can't merge it as-is.

Contributor


Fixed 😊
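Here is a minimal sketch of the rule in the suggested change, under a deliberately strict assumption. An "array `schema`" is recognized only when it literally has `type: array`, with no `$ref` resolution and no inference from `prefixItems` alone. The function name is hypothetical.

```python
from typing import Any, Dict


def check_encoding_fields(media_type: Dict[str, Any]) -> None:
    """Raise ValueError if prefixEncoding/itemEncoding lack a schema to pair with."""
    if "prefixEncoding" not in media_type and "itemEncoding" not in media_type:
        return  # the rule only applies when the positional fields are used
    if "itemSchema" in media_type:
        return
    schema = media_type.get("schema", {})
    # Strict assumption: only a literal `type: array` counts as an array schema.
    if isinstance(schema, dict) and schema.get("type") == "array":
        return
    raise ValueError("prefixEncoding/itemEncoding require itemSchema or an array schema")
```

Under this strict reading, a schema that uses `prefixItems` without `type: array` would be rejected. A real implementation might reasonably be more lenient.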

@@ -1661,6 +1686,8 @@ Determining how to handle a `type` value of `null` depends on how `null` values
If `null` values are entirely omitted, then the `contentType` is irrelevant.
See [Appendix B](#appendix-b-data-type-conversion) for a discussion of data type conversion options.

It is not currently possible to model nested `multipart` media types.
Contributor


Pity: my only multipart use case is a multipart request whose parts are each either

  • media type application/http, or
  • media type multipart/mixed with parts of media type application/http,

see Multipart Batch Format.

I had assumed I can model that with an itemSchema that is oneOf an array of HTTP requests or a single HTTP request, but the corresponding Encoding Object may be difficult.

Member Author


@ralfhandl It's telling that the two people to give the most comprehensive reviews of this PR to date both need nested multipart. I do have thoughts on how to do this, and would be more than happy to consider it part of this release-blocking functionality for 3.2. It would involve adding encoding, prefixEncoding, and itemEncoding to the Encoding Object, but there are challenging edge cases and I don't want to drag this PR down with them. I'd rather get this merged, submit a follow-on PR (which is impractical without merging this first), and then we can decide if the follow-on works.

Regarding application/http, the problem (to me) is not "how to use an Encoding Object with application/http" but "how to model application/http at all." Can you point me to how you would model an application/http payload if it were not in a multipart part? I realize that would be a rather strange payload outside of multipart, but please humor me- feel free to open a new discussion on it if there's no clear answer. If we know how to model it in general, we can look at how to model it as a multipart part. Although that might have to wait for 3.3.

@handrews
Member Author

handrews commented Jun 9, 2025

@ralfhandl I have fixed the sentence ordering, and also added a new section, Encoding and type [see commits below; the initial push failed and I didn't notice at first], that clarifies how to detect the "schema type" that is mentioned in many places but never explained. I suspect that originally the expectation was that the Schema Object under the schema field in the Media Type Object (adjacent to the Encoding Objects' parent encoding field) would be an inline schema with inline properties subschemas, or at most a single $ref directly under schema.

This is not realistic, so I think setting some boundaries on what is expected is required. I stuck with requiring (MUST) only the most unambiguous scenario, although I considered also making search-order support for multi-valued type keywords a MUST, or at least doing so when there are exactly two values and the second is "null". But you can get into some weird corner cases that way, so it felt better to banish it under "MAY choose to implement more complex things." I also considered pulling that part out of the MAY and making it a SHOULD on its own. Opinions here would be much appreciated.
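This sketch shows the two tiers described above. The required tier is the unambiguous case of a single string `type`. The optional tier accepts exactly two values where the second is `"null"`. The `allow_nullable_pair` flag is a hypothetical name for that MAY behavior. The sketch deliberately does not follow `$ref`, `allOf`, or similar keywords.

```python
from typing import Any, Dict, Optional


def schema_type(schema: Dict[str, Any], allow_nullable_pair: bool = False) -> Optional[str]:
    """Return the schema's single type, or None when it cannot be determined simply."""
    t = schema.get("type")
    if isinstance(t, str):
        return t  # the unambiguous (MUST) case
    if (
        allow_nullable_pair
        and isinstance(t, list)
        and len(t) == 2
        and isinstance(t[0], str)
        and t[1] == "null"
    ):
        return t[0]  # optional (MAY) case: ["array", "null"] is treated as "array"
    return None  # anything else is left to more complex, implementation-defined logic
```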

This, btw, is a prerequisite for supporting nested multipart/mixed, as the problem becomes much more complex the further down you go in nested objects and arrays. But this is really needed already, whether we support nesting or not, so I decided to add it to this PR.

@handrews
Member Author

handrews commented Jun 9, 2025

@ralfhandl oops, push had failed and I hadn't noticed. The section mentioned in my last comment is now actually added, sorry about that.

Labels
enhancement media and encoding Issues regarding media type support and how to encode data (outside of query/path params)
4 participants