-
Notifications
You must be signed in to change notification settings - Fork 209
Supporting extended configuration files #524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… extends_config # Conflicts: # dagfactory/dagfactory.py # docs/configuration/defaults.md # tests/test_dagfactory.py # tests/test_utils.py
FYI: @tatiana implemented the ability to override defaults.yml based on the directory hierarchy in PR: |
Yes, I think PR #500 goes a long way toward combining defaults hierarchically and naturally. If something needs to be overridden, we can place a Adding programmatic constructs like |
My bad, it took me too long to complete this PR. There are some differences with these 2 approaches: Pros of
Pros of
I don't think In my opinion, the |
dagfactory/utils.py
Outdated
""" | ||
result = copy.deepcopy(dict1) | ||
|
||
def recursive_dict_merge(_dict1: dict, _dict2: dict) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if we could avoid recursion. Historically, recursion is not performant in Python (an example is discussed in this blog post).
Do we really need to expose this in DAG-level args, including default_args
? What are the use-cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a recently requested feature: #462.
PR #500 rolled back this feature. No matter it's intentional or not, this rollback should be noted down in the doc or changelog to avoid the confusion.
cc @pankajastro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code updated and recursion removed. This is possible because there's no nested dict value in current dag-level args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gyli Thanks a lot for the work on this, not only including the implementation, but also tests and examples.
I wrote feedback on the implementation, particularly on parts that we could improve so both #500 and this PR would reuse most of the code, and it would be easier to maintain. It is worth having a separate PR, for instance, on refactoring the merge methods - since this is a significant contribution that will be accepted without further discussion.
DagFactory currently has four different ways of defining defaults:
https://astronomer.github.io/dag-factory/dev/configuration/defaults/
And, I agree with @pankajkoti, to add another one will make things even more complex. We've been trying to reduce entry points in the 1.0 release (such as in #509), and keeping both defaults.yml
implementations may be overwhelming for users & will add some maintenance burden (which I believe is relative small, if we refactor the code).
I do not have a preference between the interfaces proposed in #469 and #443. Ideally, we'd validate these with end-users and make a user-driven / data-driven decision. Unfortunately, we don't have time to do this before the 1.0 release. I agree with the pros and cons you and @pankajkoti raised, including that #443 is more extensible, but more complex to use.
We had discussed this feature with our new product manager, @yetudada, earlier this week, and due to the lack of data, we had deferred #443 to the 1.1 release, three days ago (as seen in the ticket #443).
I can see two main paths forward:
- Keep one of the interfaces (#443 or #469) for the 1.0 release, measure adoption, and consider introducing the other one afterwards (e.g. 1.1 release). In this case we need to choose one of them.
- Release both features (#443 or #469) as experimental and have a way of measuring adoption, and remove one of them after collecting data.
@yetudada @pankajastro @pankajkoti - I know we discussed these three days ago, but by then we didn't have a PR. Circumstances changed, and I'd love to know if you see any other path forward. Ultimately, this is a product decision.
4. If (3) is not declared, the `defaults.yml` hierarchy. | ||
|
||
|
||
## Configuration Inheritance with `__extends__` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this documentation would have to live in "### Combining multiple defaults.yml
files", and we'd either replace the current description or we'd have both as sub-cases of this broader approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having both as sub-cases is exactly what I tried to do here. It has these H2 level title:
## YAML top-level `default`
## Specifying `default` arguments via a Python dictionary
## Declaring default values using the `defaults.yml` file
## Configuration Inheritance with `__extends__`
Both approaches aim to reduce duplication and improve reusability. However, keeping both Personally, I lean toward something more explicit and controllable, but I might be biased since I haven’t authored many YAML files myself. The @gyli — are we planning to deprecate defaults.yml, or how do we plan to handle it? If we go with the @yetudada, looking forward to hearing your thoughts. |
I think these two options make sense to us:
Meanwhile, I will update this PR based on @tatiana 's comment to reuse more existing utils. |
I have addressed all the comments from @tatiana above, and will hold resolving the git conflicts for now, until we have a clearer answer about which defaults solution we prefer or whether we are going to keep both. cc @pankajastro |
Resolves #443
Note that in this PR,
__extends__
only supports path relative to the YAML file directory. In the future, we can add a new argument toDagFactory
, accepting the base dir for extends.