-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add auto yaml to java generation #2984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add auto yaml to java generation #2984
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2984 +/- ##
==========================================
Coverage 50.23% 50.24%
- Complexity 5001 5018 +17
==========================================
Files 966 967 +1
Lines 59157 59261 +104
Branches 6445 6458 +13
==========================================
+ Hits 29719 29777 +58
- Misses 27334 27377 +43
- Partials 2104 2107 +3
🚀 New features to boost your workflow:
|
update script name update workflow debug fix step names checkout branch try again more debug update to fetch and checkout add permissions add token fix names etc
a398dbd to
996004d
Compare
…ould result from the new workflow
damccorm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - the general flow LGTM, just had questions about the workflow structure
| name: YAML to Java Template Generation or Modification | ||
|
|
||
| on: | ||
| pull_request: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably want to run this on different conditions (probably commits to this branch on the main branch or on a schedule). We should also add a workflow_dispatch trigger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see - you're committing it back to the pull request branch. In general, I think it is probably better to have this as a standalone thing rather than tying it to PRs - that way it can be run if we update the underlying generation infrastructure (or make similar other changes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this won't work on forks since I don't think the workflow has permissions on forks. I'd recommend following the pattern in https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/.github/workflows/update-python-deps.yml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you could require that a user generate these as part of their pull request and fail the check if they don't (similar to how linting checks work). To do this, you could do all the same things you're already doing, but finish with git diff --exit-code to fail the job if there are differences detected.
In general, I'd stay away from modifying a user's branch though, since it is surprising and will run into those permission problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like the final suggestion, at least for the initial version. Let us give instructions for users to generate java interface from yaml using your script(We did similar setup for ManagedIO docs). Committing to working branch might not be good. I am happy not to have a validation check (PR vs script output) too. We might not want to create Template for every Blueprint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the initial version:
- Removed the workflow.
- Added a readme file explaining how to add the yaml blueprint, interface, IT, etc.
- Added mvn spotless process to the generate script from the workflow file to minimize that probable check failure.
Thanks
| with: | ||
| fetch-depth: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for this? This will slow clones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was iterating on this and with my understanding, it was required to have the full history to know about the branch and its context for pushing back to the same PR. Its a mute point now based on other comments, so will remove. Thanks.
tarun-google
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
|
||
| - name: "kafkaReadTopics" | ||
| help: "Kafka topic to read the input from." | ||
| description: "Kafka topic(s) to read the input from." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QQ: Are we displaying these new fields in UI too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This yaml file isn't used currently in the UI for Job Builder. The template fields in the java template are used currently in the Create Job from template.
chamikaramj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
yaml/README.md
Outdated
|
|
||
| ## Steps | ||
|
|
||
| 1. **Add YAML Blueprint:** Create the YAML blueprint file that defines the template's structure and parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a link on contribution instructions to this Github repo ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks
| ``` | ||
|
|
||
| 3. **Create Integration Test:** Develop an integration test to validate the functionality of the new template. | ||
| Place this file in [here](https://github.com/GoogleCloudPlatform/DataflowTemplates/yaml/src/test/java). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any existing instructions on creating the integration tests for Flex templates that we can link to ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks
|
|
||
|
|
||
| def get_java_type(param_type): | ||
| """Maps a YAML parameter type to a Java type.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we limited to these three types here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Goes back to the limitation of https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/metadata/src/main/java/com/google/cloud/teleport/metadata/TemplateParameter.java#L24
Updated a little.
| return 'String' | ||
|
|
||
| def get_template_parameter_type(param_type): | ||
| """Maps a YAML parameter type to a TemplateParameter annotation type.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The template parameter types are limited based on this - https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/metadata/src/main/java/com/google/cloud/teleport/metadata/TemplateParameter.java#L24
Updated a little.
| with open(yaml_path, 'r') as f: | ||
| content = f.read() | ||
| # Remove Jinja variables before parsing | ||
| content = re.sub(r'{{.*?}}', '', content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we should have some validation to make sure we do not remove required params ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under normal situation jinja variables are rendered before the they are loaded. In this case we do not care about the jinja variables since this exercise really only cares about the metadata of the file, but it prevents us from loading the file correctly, so just doing a dummy replace for now.
| param_code += " @Validation.Required\n" | ||
|
|
||
| # default param | ||
| if 'default' in param: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto regarding types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see other comments, thanks
|
|
||
| # Convert each YAML file to a Java interface | ||
| try: | ||
| for yaml_path in yaml_files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that we might have other YAML files that are not intended to be pipelines ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that this location would only contain the golden yaml blueprints.
chamikaramj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM other than one comment.
| print(f"Error running mvn spotless:apply: {e}", file=sys.stderr) | ||
| return e | ||
|
|
||
| def generate_java_interface(yaml_path, java_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add some unit testing to cover these functions ?
3. Add workflow to inspect each pull request opened to rerun generation script and create or modify java templates.