
gyli
Contributor

@gyli gyli commented Jul 31, 2025

Resolves #443

Note that in this PR, __extends__ only supports paths relative to the YAML file's directory. In the future, we could add a new argument to DagFactory that accepts the base directory for __extends__.
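A minimal sketch of the path rule described above, with hypothetical names: an __extends__ entry is joined to the directory containing the YAML file. `base_dir` stands in for the possible future DagFactory argument and is not part of this PR. Rejecting absolute entries is an assumption of this sketch, not confirmed behaviour.

```python
from pathlib import Path
from typing import Optional


def resolve_extends_path(yaml_file: str, extends_entry: str, base_dir: Optional[str] = None) -> Path:
    """Resolve an __extends__ entry to an absolute path (hypothetical helper).

    By default the entry is resolved relative to the directory that contains
    `yaml_file`. `base_dir` illustrates the possible future DagFactory argument.
    """
    entry = Path(extends_entry)
    if entry.is_absolute():
        # Assumption: only relative entries are accepted.
        raise ValueError(f"__extends__ expects a relative path, got {extends_entry!r}")
    root = Path(base_dir) if base_dir is not None else Path(yaml_file).parent
    # resolve() collapses ".." segments and returns an absolute path.
    return (root / entry).resolve()
```

For example, `resolve_extends_path("/dags/team_a/my_dag.yml", "../shared/base.yml")` points at `/dags/shared/base.yml`.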

Young Li added 6 commits July 30, 2025 22:32
… extends_config

# Conflicts:
#	dagfactory/dagfactory.py
#	docs/configuration/defaults.md
#	tests/test_dagfactory.py
#	tests/test_utils.py
@gyli gyli requested a review from a team as a code owner July 31, 2025 03:06
@pankajastro
Collaborator

FYI: @tatiana implemented the ability to override defaults.yml based on the directory hierarchy in PR #500.
@tatiana @pankajkoti @yetudada, do you have any feedback on this?

@pankajkoti
Contributor

Yes, I think PR #500 goes a long way toward combining defaults hierarchically and naturally. If something needs to be overridden, we can place a defaults.yml file closer to the target YAML (in the same folder) and specify that file as the defaults path.
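For comparison, a rough sketch of which defaults.yml files a directory-hierarchy lookup would pick up, assuming the lookup starts at a configured root folder and ends at the DAG YAML's own folder. `defaults_chain` is a hypothetical name, and the actual #500 implementation may differ.

```python
from pathlib import PurePosixPath
from typing import List


def defaults_chain(dag_yaml: str, root: str) -> List[PurePosixPath]:
    """Return candidate defaults.yml paths from `root` down to the DAG's folder.

    Files later in the list are closer to the DAG YAML and would take
    precedence when the defaults are merged.
    """
    root_path = PurePosixPath(root)
    folder = PurePosixPath(dag_yaml).parent
    if folder != root_path and root_path not in folder.parents:
        raise ValueError(f"{dag_yaml} is not under {root}")
    chain = []
    # Walk upward from the DAG's folder until (and including) the root.
    while True:
        chain.append(folder / "defaults.yml")
        if folder == root_path:
            break
        folder = folder.parent
    chain.reverse()  # root first, closest folder last
    return chain
```

Overriding a value then only requires dropping a defaults.yml into the folder nearest the DAG, with no path written in the DAG config.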

Adding programmatic constructs like extends would introduce a learning curve for non-programmer DAG authors. Since we already have an alternative, I’d rather we stick with the current approach until there’s a proven need for something more complex.

@gyli
Contributor Author

gyli commented Jul 31, 2025

My bad, it took me too long to complete this PR.
I also noticed the duplicated feature, and ideally only one of them will be needed eventually. I think we can decide on one now, or keep both for the short term and decide later once we have enough user feedback.

There are some differences between these two approaches:

Pros of defaults.yml in subdirectories:

  • Defaults can be added without updating the DAG config
  • Defaults can be combined without specifying the file path of defaults.yml at different levels

Pros of __extends__:

  • More flexible default config management (extending multiple config files, or chaining extends)
  • Allowing multiple presets of default config (for example, one default config for daily DAGs, another for weekly DAGs)
  • Allowing default files to be chosen dynamically based on an environment variable
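To make "extending multiple config files" and "chained extending" concrete, here is a minimal sketch of how a merge order could be derived. It assumes __extends__ holds either one name or a list of names and that parsed configs are available in a dict keyed by name; `extends_order` is hypothetical, not the code in this PR. Parents come before the files that extend them, each file appears once, and a cycle raises an error. The recursion depth is bounded by the length of the extends chain.

```python
from typing import Dict, List, Set


def extends_order(name: str, configs: Dict[str, dict]) -> List[str]:
    """Return config names in merge order: ancestors first, `name` last."""
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(current: str) -> None:
        if current in order:
            return  # already placed via another branch
        if current in visiting:
            raise ValueError(f"circular __extends__ involving {current!r}")
        visiting.add(current)
        parents = configs[current].get("__extends__", [])
        if isinstance(parents, str):
            parents = [parents]  # a single parent may be given as a plain string
        for parent in parents:
            visit(parent)  # parents are placed before the file that extends them
        visiting.remove(current)
        order.append(current)

    visit(name)
    return order
```

Merging the configs in the returned order, with each later file overriding earlier ones, gives the DAG file's own keys the highest precedence. A daily preset and a weekly preset can both extend the same base file.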

I don't think __extends__ is a programmatic solution: both approaches follow the same logic for combining config, and __extends__ just requires specifying the path manually.

In my opinion, the __extends__ approach has more potential, so I would still suggest keeping it for now, even though I'm fully aware that the features overlap.

"""
result = copy.deepcopy(dict1)

def recursive_dict_merge(_dict1: dict, _dict2: dict) -> None:
Collaborator

It would be great if we could avoid recursion. Historically, recursion is not performant in Python (an example is discussed in this blog post).
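One way to drop recursion from the merge in the excerpt above is to keep an explicit stack of (target, source) dict pairs. This is a hedged sketch with a hypothetical function name, not the PR's code. Besides avoiding repeated Python function calls, it is not bounded by the interpreter's recursion limit (1000 by default in CPython).

```python
import copy


def merge_dicts(base: dict, override: dict) -> dict:
    """Deep-merge `override` into a copy of `base` without recursion.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`. Neither input is mutated.
    """
    result = copy.deepcopy(base)
    stack = [(result, override)]
    while stack:
        target, source = stack.pop()
        for key, value in source.items():
            if isinstance(value, dict) and isinstance(target.get(key), dict):
                # Defer merging the nested pair instead of recursing into it.
                stack.append((target[key], value))
            else:
                target[key] = copy.deepcopy(value)
    return result
```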

Do we really need to expose this in DAG-level args, including default_args? What are the use-cases?

Contributor Author

@gyli gyli Aug 3, 2025

This is a recently requested feature: #462.
PR #500 rolled back this feature. Whether or not that was intentional, the rollback should be noted in the docs or changelog to avoid confusion.
cc @pankajastro

Contributor Author

Code updated and recursion removed. This is possible because the current DAG-level args contain no nested dict values.
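When, as noted here, DAG-level args never hold a dict nested inside another dict, the merge only needs to go one level deep (for example into default_args), so no recursion is required. A sketch under that assumption, with a hypothetical function name:

```python
def merge_dag_level_args(base: dict, override: dict) -> dict:
    """Merge DAG-level args, going at most one level into dict values.

    Assumes no arg nests a dict inside another dict, so a dict value such as
    `default_args` can be combined key by key with a plain dict union.
    Copies are shallow: lists and other values are shared, not duplicated.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = {**merged[key], **value}  # one level only
        else:
            merged[key] = value
    return merged
```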

Collaborator

@tatiana tatiana left a comment

@gyli Thanks a lot for the work on this, not only including the implementation, but also tests and examples.

I wrote feedback on the implementation, particularly on parts we could improve so that both #500 and this PR reuse most of the code, which would make it easier to maintain. It would be worth opening a separate PR, for instance, to refactor the merge methods, since that is a significant contribution that would be accepted without further discussion.

DagFactory currently has four different ways of defining defaults:
https://astronomer.github.io/dag-factory/dev/configuration/defaults/

I agree with @pankajkoti that adding another one will make things even more complex. We've been trying to reduce entry points in the 1.0 release (such as in #509), and keeping both defaults.yml implementations may be overwhelming for users and will add some maintenance burden (which I believe is relatively small if we refactor the code).

I do not have a preference between the interfaces proposed in #469 and #443. Ideally, we'd validate these with end-users and make a user-driven / data-driven decision. Unfortunately, we don't have time to do this before the 1.0 release. I agree with the pros and cons you and @pankajkoti raised, including that #443 is more extensible, but more complex to use.

We discussed this feature with our new product manager, @yetudada, earlier this week, and due to the lack of data, we deferred #443 to the 1.1 release three days ago (as seen in ticket #443).

I can see two main paths forward:

  1. Keep one of the interfaces (#443 or #469) for the 1.0 release, measure adoption, and consider introducing the other one afterwards (e.g. in the 1.1 release). In this case, we need to choose one of them.
  2. Release both features (#443 and #469) as experimental, with a way of measuring adoption, and remove one of them after collecting data.

@yetudada @pankajastro @pankajkoti - I know we discussed this three days ago, but at that point we didn't have a PR. Circumstances have changed, and I'd love to know if you see any other path forward. Ultimately, this is a product decision.

4. If (3) is not declared, the `defaults.yml` hierarchy.


## Configuration Inheritance with `__extends__`
Collaborator

I believe this documentation would have to live in "### Combining multiple defaults.yml files", and we'd either replace the current description or we'd have both as sub-cases of this broader approach.

Contributor Author

Having both as sub-cases is exactly what I tried to do here. The page has these H2-level titles:

## YAML top-level `default`
## Specifying `default` arguments via a Python dictionary
## Declaring default values using the `defaults.yml` file
## Configuration Inheritance with `__extends__`

@pankajastro
Collaborator

Both approaches aim to reduce duplication and improve reusability.

However, keeping both __extends__ and defaults.yml feels a bit redundant. It also increases the number of possible permutations, which could lead to confusion. It would be cleaner to stick to a single concept: either __extends__ or defaults.yml.

Personally, I lean toward something more explicit and controllable, but I might be biased since I haven't authored many YAML files myself. The __extends__ approach also reminds me of Docker Compose's include mechanism, which feels familiar.

@gyli, are we planning to deprecate defaults.yml, or how do we plan to handle it?

If we go with the __extends__ approach, it might introduce one more breaking change that we'll need to account for.

@yetudada, looking forward to hearing your thoughts.

@gyli
Contributor Author

gyli commented Aug 3, 2025

I think these two options make sense to us:

  1. Decide our preferred one in this thread before merging. (Can we collect feedback from any known users?)
  2. Merge this and keep both approaches for the short term (a few months?), maybe until next version. (Do we have any timeline planned?)

Meanwhile, I will update this PR based on @tatiana's comments to reuse more of the existing utils.

@gyli
Contributor Author

gyli commented Aug 6, 2025

I have addressed all the comments from @tatiana above, and will hold off on resolving the git conflicts until we have a clearer answer about which defaults solution we prefer, or whether we are going to keep both.

cc @pankajastro

Development

Successfully merging this pull request may close these issues.

[Feature] Allow multiple customizable shared defaults YAML
4 participants