Generate base client variants from a single canonical source via AST composition #438

Description

@Minister944

Blocked by AR-6

Problem

The base client ships as a matrix of pre-written files, one per combination of feature flags:

  • async / sync
  • OpenTelemetry / no OpenTelemetry
  • multipart uploads / no upload

That's 8 hand-maintained client files (plus a base_model no-upload variant). Every new boolean
setting doubles the matrix, and any change to shared logic (e.g. a websocket fix) has to be applied
to every file by hand. The variants drift out of sync easily.

Introducing the multipart_uploads opt-in (AR-6) made this concrete: it added yet another axis and
the file count jumped again.

Goal

Keep one canonical source file (async_base_client_open_telemetry.py — the richest variant:
async + OT + multipart) and generate all other variants at codegen time by composing AST
transformers:

Output file                                   Transformers applied
async_base_client_open_telemetry              none (copy as-is)
async_base_client                             strip_ot
async_base_client_open_telemetry_no_upload    strip_multipart
async_base_client_no_upload                   strip_ot ∘ strip_multipart
base_client_open_telemetry                    async_to_sync
base_client                                   strip_ot ∘ async_to_sync
base_client_open_telemetry_no_upload          strip_multipart ∘ async_to_sync
base_client_no_upload                         strip_ot ∘ strip_multipart ∘ async_to_sync

This deletes 7 source files and makes shared fixes land in exactly one place.
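
A minimal sketch of how the table could drive generation, assuming each transformer is exposed as a function from module AST to module AST. The names VARIANTS, Step and generate_variants are hypothetical, and the chains are applied left to right as written in the table, which is an assumption: the issue does not say in which direction ∘ is meant to be read.

```python
import ast
from typing import Callable

# A step takes a module AST and returns the (possibly mutated) module AST.
Step = Callable[[ast.Module], ast.Module]

# Output file -> names of the steps to apply, in the order listed in the table.
VARIANTS: dict[str, tuple[str, ...]] = {
    "async_base_client_open_telemetry": (),
    "async_base_client": ("strip_ot",),
    "async_base_client_open_telemetry_no_upload": ("strip_multipart",),
    "async_base_client_no_upload": ("strip_ot", "strip_multipart"),
    "base_client_open_telemetry": ("async_to_sync",),
    "base_client": ("strip_ot", "async_to_sync"),
    "base_client_open_telemetry_no_upload": ("strip_multipart", "async_to_sync"),
    "base_client_no_upload": ("strip_ot", "strip_multipart", "async_to_sync"),
}


def generate_variants(canonical_source: str, steps: dict[str, Step]) -> dict[str, str]:
    """Return {output file name: generated source} for every variant."""
    outputs: dict[str, str] = {}
    for name, chain in VARIANTS.items():
        if not chain:
            # The canonical file is copied verbatim, so its comments survive.
            outputs[name] = canonical_source
            continue
        # Parse a fresh tree per variant so mutations never leak between them.
        tree = ast.parse(canonical_source)
        for step_name in chain:
            tree = steps[step_name](tree)
        ast.fix_missing_locations(tree)
        outputs[name] = ast.unparse(tree)
    return outputs
```

Parsing the canonical source once per variant costs a little time but keeps the transformers free to mutate nodes in place.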

Approaches considered

  1. AST template + fragment injection — build a base AST for the whole file, then conditionally
    splice/remove nodes. Hard to keep coherent: every injected fragment has to be authored to fit its
    surrounding context.
  2. AST as a patcher over the existing file — parse a finished file, mutate specific nodes. More
    predictable, but needs precise "what to remove where" mapping.
  3. Build clean code from AST programmatically — full control, but huge boilerplate even for
    trivial files.

The workable approach turned out to be (2): composable ast.NodeTransformer subclasses, one per
concern (StripOpenTelemetry, StripMultipartUpload, AsyncToSync), chained per target variant.
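
To illustrate the shape of one such transformer, here is a rough sketch of what AsyncToSync might look like. It is not the actual implementation: it only handles the four async syntax forms, and the helpers _as_sync and to_sync are hypothetical. A real version would presumably also need to rename async-only APIs and imports.

```python
import ast


def _as_sync(node: ast.AST, sync_cls: type) -> ast.AST:
    """Rebuild `node` as `sync_cls`; the async and sync node types share field names."""
    fields = {name: getattr(node, name, None) for name in node._fields}
    return ast.copy_location(sync_cls(**fields), node)


class AsyncToSync(ast.NodeTransformer):
    """Replace async syntax with its synchronous counterpart."""

    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AST:
        self.generic_visit(node)  # convert nested async constructs first
        return _as_sync(node, ast.FunctionDef)

    def visit_AsyncWith(self, node: ast.AsyncWith) -> ast.AST:
        self.generic_visit(node)
        return _as_sync(node, ast.With)

    def visit_AsyncFor(self, node: ast.AsyncFor) -> ast.AST:
        self.generic_visit(node)
        return _as_sync(node, ast.For)

    def visit_Await(self, node: ast.Await) -> ast.AST:
        self.generic_visit(node)
        return node.value  # `await expr` becomes `expr`


def to_sync(source: str) -> str:
    """Parse, transform and unparse a source string (comments are lost)."""
    tree = AsyncToSync().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Because every concern lives in its own NodeTransformer, chaining is just a matter of calling visit() on the tree returned by the previous transformer.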

The hard part: comments

ast.unparse() drops all comments, because ast.parse() never stores them in the tree. Our templates
rely on inline linter-directive comments such as:

from websockets import ClientConnection  # type: ignore[import-not-found,unused-ignore]
def Subprotocol(*args, **kwargs):  # type: ignore # noqa: N802, N803
ClientConnection = Any  # ty: ignore[invalid-assignment]

There is no standard way to round-trip these through ast. The workaround is a post-processing step
(_restore_inline_comments()) that scans the generated output line-by-line and re-appends known
comments by matching a small, stable signature map. It works, but it's a maintenance surface of its
own — every new linter directive in the source has to be registered in the restore map.
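
A guess at the shape of such a restore step, for illustration only. The real _restore_inline_comments() may match differently; here the signature is assumed to be the exact stripped code line as ast.unparse() emits it, and INLINE_COMMENTS and restore_inline_comments are hypothetical names.

```python
import ast

# Signature (code line as emitted by ast.unparse, stripped) -> comment to re-append.
INLINE_COMMENTS: dict[str, str] = {
    "from websockets import ClientConnection": "# type: ignore[import-not-found,unused-ignore]",
    "def Subprotocol(*args, **kwargs):": "# type: ignore # noqa: N802, N803",
    "ClientConnection = Any": "# ty: ignore[invalid-assignment]",
}


def restore_inline_comments(generated: str, comments: dict[str, str] = INLINE_COMMENTS) -> str:
    """Re-append known trailing comments to matching lines of generated code."""
    restored = []
    for line in generated.splitlines():
        comment = comments.get(line.strip())
        # Two spaces before the comment, matching the usual inline-comment style.
        restored.append(f"{line}  {comment}" if comment else line)
    return "\n".join(restored) + "\n"


# Usage: round-trip through ast (which loses the comments), then restore them.
source = (
    "try:\n"
    "    from websockets import ClientConnection  # type: ignore[import-not-found,unused-ignore]\n"
    "except ImportError:\n"
    "    ClientConnection = Any  # ty: ignore[invalid-assignment]\n"
)
generated = ast.unparse(ast.parse(source))
restored = restore_inline_comments(generated)
```

Exact-line matching makes the maintenance cost visible: any directive missing from the map is silently dropped, and any reformatting of a matched line by ast.unparse() breaks the match.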

Possible alternative: ast-comments extends the stdlib ast by representing comments as first-class
Comment nodes, and provides an unparse() that reconstructs source with comments intact (Python
3.9+). This could replace the brittle signature-map restore step entirely. Caveats: it adds a
dependency, the parsed tree can't be passed straight to compile() (needs its pre_compile_fixer()),
and the transformers would have to be written to account for the extra Comment nodes. Worth
evaluating before committing to the manual restore map.

Additional gotcha: ast.unparse() output differs slightly across Python versions (e.g. 3.13
drops parens in for (k, v) tuples), so generated snapshots are version-sensitive.
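
A small check that can be run under each supported interpreter to see the difference first-hand. It prints the round-tripped text for a tuple-target loop and confirms that the output still parses to the same tree as the input, whatever the formatting. The helpers unparse_roundtrip and same_ast are hypothetical and not part of the codebase.

```python
import ast
import sys


def unparse_roundtrip(source: str) -> str:
    """Parse and unparse `source` with the running interpreter's ast module."""
    return ast.unparse(ast.parse(source))


def same_ast(a: str, b: str) -> bool:
    """True if both sources parse to the same tree (positions are ignored)."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


sample = "for k, v in items.items():\n    pair = k, v\n"
generated = unparse_roundtrip(sample)

# The text may differ between Python versions; the tree should not.
print(sys.version_info[:2], repr(generated))
print("same AST:", same_ast(sample, generated))
```

If the printed text differs between two interpreters while same_ast() holds, the snapshot diff is purely cosmetic.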

Why this isn't part of #AR-6

This is a sizable refactor with its own risk surface (comment restoration, cross-version unparse
differences, regenerating every integration snapshot). It's out of scope for the multipart opt-in
task and deserves to be staged and reviewed on its own.
