-
Notifications
You must be signed in to change notification settings - Fork 34
use system MariaDBAccount for the galera server's root pw (PR 6 of 6) #347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: zzzeek The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
07f0ddb to
ff2d053
Compare
54d31c7 to
2fd8211
Compare
e311b40 to
10549a9
Compare
| fi | ||
| # If we get here, credentials are invalid, fall through to refresh | ||
| fi | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note we need some extra logic here to detect a root password change in flight. for now we will look in both places to always have a good root password
6711288 to
3050206
Compare
ef65a28 to
4e3f794
Compare
4e3f794 to
9daaa49
Compare
|
/retest |
1 similar comment
|
/retest |
e0f37fb to
3e9ca8f
Compare
bbca2b0 to
bd1707f
Compare
|
/retest |
ca2fcca to
0d4ce36
Compare
|
/retest |
1 similar comment
|
/retest |
898a06e to
c3272d1
Compare
The existing mariadb operator in fact already "supports" in-place change of secret, since if you change Spec.Secret to a new secret name, that would imply a new job hash, and the account.sh script running using only GRANT statements would update the password per mariadb. This already works for flipping the TLS flag on and off too. So in this patch, we clean this up and add a test to include: * a new field Status.CurrentSecret, which is used to indicate the previous secret from which the finalizer should be removed. This will also be used when we migrate the root password to use MariaDBAccount by providing the "current" root password when changing to a new root password * improved messaging in log messages, name of job. This changes the job hash for mariadbaccount which will incur a run on existing environments, however the job hashes are already changing on existing environments due to the change in how the mysql root password is sent, i.e. via volume mounted script rather than env var secret * update account.sh to use modern idiomatic patterns for user create /alter, while mariadb is fine with the legacy style of using only GRANT statements, MySQL 8 no longer allows this statement to proceed without a CREATE USER, so formalize the commands used here to use distinct CREATE USER, ALTER USER, GRANT statements and clarify the script is good for all create/update user operations.
This commit ties together the previous ones to create a new MariaDBAccount when a Galera instance is created, and then to use the password from that account/secret in the mariadb bootstrap/maintenance scripts. Galera gets bootstrapped with this secret, then the mariadbaccount controller, who is waiting for galera to be available to set up this new "root" account, wakes up when galera is running, and changes the root password to itself, establishing the initial job hash for the mariadbaccount. As we now have a mariadbaccount linked to the outermost lifecycle of a galera instance, some hardening of the deletion process has been added to clarify that mariadbaccount will run deletion jobs only if Galera is not marked for deletion. If galera is marked for deletion, then we have to assume the service/pods are gone and no more drops can take place, even if the Galera CR is still present (chainsaw conveniently adds its own finalizer to Galera when running, preventing it from being fully deleted, which exposed this issue). Additional changes to the mysql_root_auth.sh and account.sh scripts allow for in-place changes of root password.
c3272d1 to
a4c9469
Compare
dciabrin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial round of comments for early feedback. I still have to review the subtle bits and the chainsaw tests.
| description: |- | ||
| RootDatabaseAccount - name of MariaDBAccount which will be used to | ||
| generate root account / password. | ||
| this account is generated if not exists, and a name is chosen based |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should stick the the original semantics and require a secret for instantiating the DB. Otherwise existing jobs/users which populate field secret won't be able to access it anymore.
We should add logic in the webhook to set rootDatabaseAccount to the value set in secret, as an intermediate step to ensure there's no user-detectable change with this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the "secret" field is deprecated by this change, because the new way that Galera gets its root secret is via Galera->RootDatabaseAccount->Secret. The old architecture is:
Galera ->
Secret -> (osp_secret with lots of secrets in it) ->
DbRootPassword
The new architecture is:
Galera ->
RootDatabaseAccount -> (a MariaDBAccount, either manually or automatically generated) ->
Secret ->
DatabasePassword
now, when we talk about "existing jobs / users", we have that covered (and as mentioned elsewhere I can try to add an explicit test for this scenario), by the code at line 585-615 of galera_contrller , which populates DbRootPassword into the new MariaDBAccount
so when you say "require a secret for instantiating the DB", do you mean:
- we no longer deprecate Galera->Secret->DbRootPassword, and that always has to be set up front? it gets copied to MariaDBAccount ->Secret and that's the documented behavior?
- Users (or something) must manually create a MariaDBAccount -> Secret in order to spin up Galera?
regarding item 1, im not sure how that would work because the whole point is that we can change the root password. I know that "osp-secret" is our own internal "secret for everything" but it's my understanding that end-users of the product still have one big (mutable?) "secret" with all of these passwords in it. but IIUC there's never been any feature where "user modifies a password in osp-secret -> that becomes the new secret", and it's not clear that's what you're proposing here. So if we dont document Galera->Secret->DbRootPassword as "legacy", it's part of the software, how do we document it? "This is the initial root password, but if you want to change the password, now you have to go in this totally different place to do that" ? Or do we make a system by which we actually respond to changes in osp-secret and handle the whole Galera->MariaDBAccount->Secret thing separately? (which means you cant change MariaDBAccount->Secret directly anymore, it would be out of sync with osp-secret ? that seems like a recipe for confusion)
Regarding item 2, the original plan for MariaDBAccount as used in each controller was that the Openstack Operator itself was going to "pre-create" all the MariaDBAccount->Secret objects for every controller. That's still a thing that could happen but I think it became apparent there wasn't much point, whether OO calls EnsureMariaDBAccount or each controller does (which each controller does anyway), the effect is the same.
it seems like in the Openstack CR we want there to be some notion of "initial password" maybe? osp-secret could be where we do this, but for the DB passwords of all the services, we removed those
Anyway tl;dr when you say, "we should require a secret", the question is, "where?"
| ) error { | ||
| if accountName == "" { | ||
| return fmt.Errorf("Account name is blank") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this one check is there to distinguish between system account and regular account?
One should probably add a check for empty namespace as well.
| return nil | ||
| } | ||
|
|
||
| // DeleteAccountFinalizers performs just the primary account + secret finalizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make this comment self-contained? Otherwise one need t open this file and scroll up to the previous definition to make sense of it.
| var accountType AccountType | ||
|
|
||
| if exactUserName == "" { | ||
| accountType = "User" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use the type constant here instead of value "User".
| } else if userNamePrefix != "" { | ||
| return nil, nil, fmt.Errorf("userNamePrefix and exactUserName are mutually exclusive") | ||
| } else { | ||
| accountType = "System" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use the type constant here instead of value "System".
|
|
||
| // ******** TEMPORARY ************ | ||
| // Pull the DbRootPassword from osp-secret to allow CI / external | ||
| // scripts to work, until they can adapt to root_auth.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rely on a validation webhook to ensure the new secret is populated, this block would change, right? We wouldn't need to have this TEMPORARY wrapper comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure im following but if you mean we had validation to ensure that Galera->RootDatabaseAccount->MariaDBAccount->Secret were populated, then I guess we could get rid of that block and no longer care about DbRootPassword, but how do we do upgrades?
| // this allows us to avoid having to change the Spec, which seems to be not | ||
| // best practice here and also seems to interfere with the openstack | ||
| // controller's ability to see the Galera status as Completed (though this | ||
| // might only be due to CRDs not being in sync between mariadb-operator and | ||
| // openstack-operator during development) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should get rid of this section of the comments, as they are not leading to any conclusion.
I kind of like the idea of explaining why we copy the root secret to Status and why it makes sense to use that field during any interaction with the DB
This commit ties together the previous ones to create a new
MariaDBAccount when a Galera instance is created, and then to
use the password from that account/secret in the mariadb
bootstrap/maintenance scripts.
Galera gets bootstrapped with this secret, then the mariadbaccount
controller, who is waiting for galera to be available to set up this
new "root" account, wakes up when galera is running, and changes
the root password to itself, establishing the initial job hash
for the mariadbaccount.
As we now have a mariadbaccount linked to the outermost lifecycle
of a galera instance, some hardening of the deletion process has been
added to clarify that mariadbaccount will run deletion jobs only if
Galera is not marked for deletion. If galera is marked for deletion,
then we have to assume the service/pods are gone and no more drops can
take place, even if the Galera CR is still present (chainsaw
conveniently adds its own finalizer to Galera when running, preventing
it from being fully deleted, which exposed this issue).