fix: run init-job on first install #450
⚠ Not tested yet ⚠

Expected behavior:
1. The CRDB-StatefulSet and the init-Job are deployed almost at the same time
2. The init-Job initializes the DB, once it is ready for initialization, and finishes
3. The `helm install` command succeeds
4. The Job is eventually auto-deleted by Kubernetes due to the `ttlSecondsAfterFinished` being 0 (this unblocks the `helm upgrade`)

On Upgrade:
0. The init-Job is removed, if present from a previous `helm upgrade` (it should not be there anymore from the `helm install` step)
1. The CRDB-StatefulSet is upgraded
2. The init-Job is run after the StatefulSet is upgraded and ready
3. The `helm upgrade` command succeeds
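The behavior described above could be expressed in the Job template roughly as follows. This is only a sketch under assumptions, not the actual diff of this PR: the file name, the Job name, the container name and the `.Values` keys are hypothetical, and the container command is left out. It relies on Helm's built-in `.Release.IsUpgrade` flag to attach the hook annotations only on upgrade, so that on first install the Job is an ordinary release resource deployed alongside the StatefulSet.

```yaml
# Hypothetical templates/job.init.yaml (illustrative names, not the chart's real file)
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-init   # assumed naming scheme
  {{- if .Release.IsUpgrade }}
  annotations:
    # Only on upgrade: run as a hook after the release resources are upgraded,
    # and delete any earlier instance of the Job before creating the new one.
    "helm.sh/hook": post-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
  {{- end }}
spec:
  # Let Kubernetes clean up the finished Job so it cannot block a later upgrade.
  ttlSecondsAfterFinished: 0
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: cluster-init   # assumed container name
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"   # assumed values keys
          # command that runs the cluster initialization is omitted here
```

With a template like this, `helm install` no longer waits on a hook that itself waits on the StatefulSet, which is the deadlock the PR is trying to avoid.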
The open issues related to the init-job deadlock:
Probably:
The closed issues related to the init-job deadlock:
This just shows that the topic is highly confusing, as people expect a simple
603e86a to 2162138
I just had big success with the init job. The changes I propose in this PR seem to do their job (pun intended...?). The cluster was initialized without any manual action.
Nothing special required here, as you can see. Flux enables

The init-job was applied in parallel to the StatefulSet, ran in the loop until the CRDB-pods accepted the init-command, ran the init, printed a success message and stayed as "completed" (which is expected and totally fine) before I got the

And when the upgrade ran, it removed the previously installed job, because of the

Side note: it is probably debatable whether the init-job is even needed at all on upgrade, because what does it do there?

I am marking this PR as ready and would be happy to address any questions or notes regarding my code changes.
I'm unsure what a good title for this PR would be, so if you have one, please let me know.
closes #69
7184a75 to 63deb0f
initCluster() {
  while true; do
Removing this loop can cause issues: what if the init is not successful on the first attempt? The Job will fail and CockroachDB will not come up.
`cockroach init` already runs in a loop. I think this was changed after the Job was created, and the Job was never updated.
helm.sh/hook: post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
The main issue with this is that the Job will be left in the cluster. When you do an upgrade, the new image will be patched onto this Job, which is not allowed because a Job is an immutable resource in Kubernetes.
Since the `hook-delete-policy` still applies on upgrade, it removes the old Job before creating the one for the upgrade.
Feel free to test it out.
There are multiple issues stating that the init-job is not run as expected.
This is because of a deadlock: the CRDB-StatefulSet requires the init-job to run, but Helm only runs the init-job once the StatefulSet is considered ready.
The optional use of `--wait` with `helm install` causes differing observations.
This PR attempts to fix the problem by using the Job as a plain Job instead of a hook when the Chart is initially installed.
63deb0f to bd9eead