Contributor

@derrickaw derrickaw commented Nov 13, 2025

  1. Update KafkaToBigQuery.yaml to match the existing KafkaToBigQuery.java template.
  2. Add a generation script to convert a YAML blueprint to a Java interface.
  3. Add a workflow that, for each pull request opened, reruns the generation script and creates or modifies Java templates.
  4. Add the Java template file as a baseline file.
  5. Add a generic README on how to add new blueprints, templates, etc.

@derrickaw derrickaw added the addition, java, improvement, and yaml labels Nov 13, 2025
codecov bot commented Nov 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 50.24%. Comparing base (59d8c15) to head (437942b).
⚠️ Report is 16 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##               main    #2984    +/-   ##
==========================================
  Coverage     50.23%   50.24%            
- Complexity     5001     5018    +17     
==========================================
  Files           966      967     +1     
  Lines         59157    59261   +104     
  Branches       6445     6458    +13     
==========================================
+ Hits          29719    29777    +58     
- Misses        27334    27377    +43     
- Partials       2104     2107     +3     
| Components | Coverage Δ |
|---|---|
| spanner-templates | 70.42% <ø> (-0.03%) ⬇️ |
| spanner-import-export | 68.96% <ø> (-0.03%) ⬇️ |
| spanner-live-forward-migration | 79.69% <ø> (ø) |
| spanner-live-reverse-replication | 77.03% <ø> (-0.05%) ⬇️ |
| spanner-bulk-migration | 88.33% <ø> (ø) |
see 6 files with indirect coverage changes
update script name

update workflow

debug

fix step names

checkout branch

try again

more debug

update to fetch and checkout

add permissions

add token

fix names etc
@derrickaw derrickaw force-pushed the addAutoYamlToJavaGeneration branch from a398dbd to 996004d Compare November 13, 2025 21:32
@derrickaw derrickaw marked this pull request as ready for review November 14, 2025 21:01
Contributor

@damccorm damccorm left a comment

Thanks - the general flow LGTM, just had questions about the workflow structure

name: YAML to Java Template Generation or Modification

on:
  pull_request:
Contributor

I think we probably want to run this under different conditions (probably on commits to the main branch, or on a schedule). We should also add a workflow_dispatch trigger.

Contributor

Oh, I see - you're committing it back to the pull request branch. In general, I think it is probably better to have this as a standalone thing rather than tying it to PRs - that way it can be run if we update the underlying generation infrastructure (or make other similar changes).

Contributor

I also think this won't work on forks since I don't think the workflow has permissions on forks. I'd recommend following the pattern in https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/.github/workflows/update-python-deps.yml

Contributor

Alternatively, you could require that a user generate these as part of their pull request and fail the check if they don't (similar to how linting checks work). To do this, you could do all the same things you're already doing, but finish with git diff --exit-code to fail the job if there are differences detected.

In general, I'd stay away from modifying a user's branch though, since it is surprising and will run into those permission problems.
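
A minimal sketch of that check-only approach, assuming a hypothetical generator command (`python generate.py` below is a placeholder, not the script's real name): regenerate, then let `git diff --exit-code` fail the job when the committed files are stale. The command runner is injectable so the logic can be exercised without a repository.

```python
import subprocess
import sys

# Hypothetical command list; the real generator script name and any
# formatter step are assumptions, not taken from this pull request.
GENERATE_COMMANDS = [
    ["python", "generate.py"],
]


def generated_files_are_current(run=subprocess.run, commands=GENERATE_COMMANDS):
    """Re-run generation, then report whether the working tree is unchanged."""
    for command in commands:
        # check=True raises CalledProcessError if a generation step fails.
        run(command, check=True)
    # `git diff --exit-code` exits with 1 when tracked files have changed.
    result = run(["git", "diff", "--exit-code"], check=False)
    return result.returncode == 0


def main(run=subprocess.run):
    """Return a process exit code: 0 if up to date, 1 if files are stale."""
    if generated_files_are_current(run=run):
        return 0
    print("Generated files are stale; re-run the generator and commit.",
          file=sys.stderr)
    return 1
```

In a CI step this would be invoked as `sys.exit(main())`, so nothing is ever pushed back to the contributor's branch.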

Contributor

I like the final suggestion, at least for the initial version. Let us give users instructions for generating the Java interface from YAML with your script (we did a similar setup for the ManagedIO docs). Committing to the working branch might not be good. I am also happy not to have a validation check (PR vs. script output). We might not want to create a Template for every Blueprint.

Contributor Author

@derrickaw derrickaw Nov 18, 2025

For the initial version:

  1. Removed the workflow.
  2. Added a readme file explaining how to add the yaml blueprint, interface, IT, etc.
  3. Moved the mvn spotless step from the workflow file into the generation script to reduce the chance of a formatting check failure.

Thanks

Comment on lines 34 to 35
with:
  fetch-depth: 0
Contributor

Any reason for this? This will slow clones

Contributor Author

When I was iterating on this, my understanding was that the full history was required to know about the branch and its context for pushing back to the same PR. It's a moot point now based on the other comments, so I will remove it. Thanks.

tarun-google previously approved these changes Nov 18, 2025
Contributor

@tarun-google tarun-google left a comment

LGTM


- name: "kafkaReadTopics"
  help: "Kafka topic to read the input from."
  description: "Kafka topic(s) to read the input from."
Contributor

QQ: Are we displaying these new fields in UI too?

Contributor Author

This YAML file isn't currently used in the Job Builder UI. The template fields in the Java template are what "Create job from template" currently uses.

Contributor

@chamikaramj chamikaramj left a comment

Thanks!

yaml/README.md Outdated

## Steps

1. **Add YAML Blueprint:** Create the YAML blueprint file that defines the template's structure and parameters.
Contributor

Add a link to the contribution instructions for this GitHub repo?

Contributor Author

done, thanks

3. **Create Integration Test:** Develop an integration test to validate the functionality of the new template.
Place this file [here](https://github.com/GoogleCloudPlatform/DataflowTemplates/yaml/src/test/java).
Contributor

Are there any existing instructions on creating integration tests for Flex templates that we can link to?

Contributor Author

done, thanks



def get_java_type(param_type):
    """Maps a YAML parameter type to a Java type."""
Contributor

Why are we limited to these three types here?

Contributor Author

@derrickaw derrickaw Nov 19, 2025

    return 'String'

def get_template_parameter_type(param_type):
    """Maps a YAML parameter type to a TemplateParameter annotation type."""
Contributor

Ditto.
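
One way to make the supported types explicit, and easy to extend, is a pair of lookup tables. This is only a sketch: the three type names (`text`, `integer`, `boolean`), the `Text` fallback, the `strict` flag, and the `_lookup` helper are assumptions for illustration, since the thread does not show which types the script actually handles. Only the `String` fallback comes from the excerpt above.

```python
# Assumed YAML type names and their targets; the three types the script
# actually supports are not shown in this thread.
JAVA_TYPES = {
    "text": "String",
    "integer": "Integer",
    "boolean": "Boolean",
}

TEMPLATE_PARAMETER_TYPES = {
    "text": "Text",
    "integer": "Integer",
    "boolean": "Boolean",
}


def get_java_type(param_type, strict=False):
    """Maps a YAML parameter type to a Java type."""
    return _lookup(JAVA_TYPES, param_type, default="String", strict=strict)


def get_template_parameter_type(param_type, strict=False):
    """Maps a YAML parameter type to a TemplateParameter annotation type."""
    return _lookup(TEMPLATE_PARAMETER_TYPES, param_type, default="Text",
                   strict=strict)


def _lookup(table, param_type, default, strict):
    """Hypothetical helper: normalize the key, then fall back or fail."""
    key = str(param_type).strip().lower()
    if key in table:
        return table[key]
    if strict:
        # Failing loudly surfaces unsupported types instead of hiding them.
        raise ValueError(f"Unsupported parameter type: {param_type!r}")
    return default
```

With `strict=True`, an unrecognized type fails generation instead of silently becoming a `String`, which makes the limitation visible to whoever adds a new blueprint.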

Contributor Author

@derrickaw derrickaw Nov 19, 2025

with open(yaml_path, 'r') as f:
    content = f.read()
# Remove Jinja variables before parsing
content = re.sub(r'{{.*?}}', '', content)
Contributor

Should we have some validation to make sure we do not remove required params?

Contributor Author

Under normal circumstances, Jinja variables are rendered before the file is loaded. Here we do not care about the Jinja variables, since this exercise only needs the file's metadata, but they prevent the file from loading correctly, so this is just a dummy replacement for now.
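
The stripping step and the validation asked about above could be sketched as follows. The function names and the substring-based check are assumptions for illustration, not the script's actual code; the check is deliberately crude and only flags a required name that appears nowhere outside a Jinja expression.

```python
import re

# Matches {{ ... }} expressions on a single line, non-greedily.
JINJA_EXPRESSION = re.compile(r"\{\{.*?\}\}")


def strip_jinja(content, placeholder=""):
    """Replace Jinja expressions; return (new_content, removed_expressions)."""
    removed = JINJA_EXPRESSION.findall(content)
    return JINJA_EXPRESSION.sub(placeholder, content), removed


def missing_parameters(content, required_names):
    """Return required names that no longer appear once Jinja is stripped."""
    stripped, _ = strip_jinja(content)
    return [name for name in required_names if name not in stripped]
```

Returning the removed expressions makes it possible to log exactly what was dropped before the file is parsed.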

param_code += " @Validation.Required\n"

# default param
if 'default' in param:
Contributor

Ditto regarding types.

Contributor Author

see other comments, thanks
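
For the default-value branch, the type concern is that the emitted annotation has to match the Java type. A hedged sketch, assuming the generated interface uses Apache Beam's `@Default.String`, `@Default.Integer`, and `@Default.Boolean` annotations; the `render_default` helper and its two-space indentation are hypothetical:

```python
import json

# Assumed mapping from a Java type to the matching Beam @Default annotation.
DEFAULT_ANNOTATIONS = {
    "String": "Default.String",
    "Integer": "Default.Integer",
    "Boolean": "Default.Boolean",
}


def render_default(java_type, value):
    """Return the @Default line for one parameter, or '' if unsupported."""
    annotation = DEFAULT_ANNOTATIONS.get(java_type)
    if annotation is None:
        return ""
    if java_type == "String":
        # json.dumps yields a double-quoted, escaped string literal.
        literal = json.dumps(str(value))
    elif java_type == "Boolean":
        # Accept both real booleans and the strings "true"/"false".
        literal = "true" if str(value).strip().lower() == "true" else "false"
    else:
        literal = str(int(value))
    return f"  @{annotation}({literal})\n"
```

Types without an entry produce no annotation rather than a mismatched one.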


# Convert each YAML file to a Java interface
try:
    for yaml_path in yaml_files:
Contributor

Is it possible that we might have other YAML files that are not intended to be pipelines?

Contributor Author

The idea is that this location would only contain the golden yaml blueprints.
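
If the directory ever did hold other YAML files, a cheap guard could skip anything that does not look like a blueprint. This sketch assumes a blueprint can be recognized by a top-level `template:` key; that marker and the `find_blueprints` name are hypothetical and not confirmed by this thread.

```python
from pathlib import Path
import re

# Assumption: a blueprint declares a top-level `template:` key. The real
# marker, if any, is not shown in this thread.
BLUEPRINT_MARKER = re.compile(r"^template:", re.MULTILINE)


def find_blueprints(directory):
    """Return YAML files directly under `directory` that look like blueprints."""
    blueprints = []
    for path in sorted(Path(directory).glob("*.yaml")):
        if BLUEPRINT_MARKER.search(path.read_text(encoding="utf-8")):
            blueprints.append(path)
    return blueprints
```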

Contributor

@chamikaramj chamikaramj left a comment

Thanks. LGTM other than one comment.

        print(f"Error running mvn spotless:apply: {e}", file=sys.stderr)
        return e

def generate_java_interface(yaml_path, java_path):
Contributor

Is it possible to add some unit tests to cover these functions?

Labels

addition, improvement, java, size/L, yaml

4 participants