Skip to content

Conversation

@zzzeek
Copy link
Contributor

@zzzeek zzzeek commented Jul 1, 2025

This commit ties together the previous ones to create a new
MariaDBAccount when a Galera instance is created, and then to
use the password from that account/secret in the mariadb
bootstrap/maintenance scripts.

Galera gets bootstrapped with this secret, then the mariadbaccount
controller, who is waiting for galera to be available to set up this
new "root" account, wakes up when galera is running, and changes
the root password to itself, establishing the initial job hash
for the mariadbaccount.

As we now have a mariadbaccount linked to the outermost lifecycle
of a galera instance, some hardening of the deletion process has been
added to clarify that mariadbaccount will run deletion jobs only if
Galera is not marked for deletion. If galera is marked for deletion,
then we have to assume the service/pods are gone and no more drops can
take place, even if the Galera CR is still present (chainsaw
conveniently adds its own finalizer to Galera when running, preventing
it from being fully deleted, which exposed this issue).

@openshift-ci openshift-ci bot requested review from lewisdenny and lmiccini July 1, 2025 13:18
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 1, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zzzeek
Once this PR has been reviewed and has the lgtm label, please assign viroel for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zzzeek zzzeek changed the title use system MariaDBAccount for the galera server's root pw use system MariaDBAccount for the galera server's root pw (PR 6 of 6) Jul 1, 2025
@zzzeek zzzeek requested a review from dciabrin July 1, 2025 13:23
@zzzeek zzzeek force-pushed the OSPRH-14916-pr6 branch 3 times, most recently from 07f0ddb to ff2d053 Compare July 4, 2025 16:10
@zzzeek zzzeek force-pushed the OSPRH-14916-pr6 branch 2 times, most recently from 54d31c7 to 2fd8211 Compare August 18, 2025 19:22
@zzzeek zzzeek force-pushed the OSPRH-14916-pr6 branch 5 times, most recently from e311b40 to 10549a9 Compare October 12, 2025 01:54
fi
# If we get here, credentials are invalid, fall through to refresh
fi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note we need some extra logic here to detect a root password change in flight. for now we will look in both places to always have a good root password

@zzzeek zzzeek force-pushed the OSPRH-14916-pr6 branch 5 times, most recently from 6711288 to 3050206 Compare October 12, 2025 23:42
@zzzeek zzzeek force-pushed the OSPRH-14916-pr6 branch 3 times, most recently from ef65a28 to 4e3f794 Compare November 7, 2025 22:36
@zzzeek
Copy link
Contributor Author

zzzeek commented Nov 14, 2025

/retest

1 similar comment
@zzzeek
Copy link
Contributor Author

zzzeek commented Nov 14, 2025

/retest

@zzzeek zzzeek force-pushed the OSPRH-14916-pr6 branch 2 times, most recently from e0f37fb to 3e9ca8f Compare November 14, 2025 23:29
@zzzeek zzzeek force-pushed the OSPRH-14916-pr6 branch 2 times, most recently from bbca2b0 to bd1707f Compare November 15, 2025 00:10
@zzzeek
Copy link
Contributor Author

zzzeek commented Nov 15, 2025

/retest

@zzzeek zzzeek force-pushed the OSPRH-14916-pr6 branch 4 times, most recently from ca2fcca to 0d4ce36 Compare November 15, 2025 04:50
@zzzeek
Copy link
Contributor Author

zzzeek commented Nov 15, 2025

/retest

1 similar comment
@zzzeek
Copy link
Contributor Author

zzzeek commented Nov 15, 2025

/retest

@zzzeek zzzeek force-pushed the OSPRH-14916-pr6 branch 2 times, most recently from 898a06e to c3272d1 Compare November 19, 2025 17:57
The existing mariadb operator in fact already "supports" in-place
change of secret, since if you change Spec.Secret to a new secret
name, that would imply a new job hash, and the account.sh script running
using only GRANT statements would update the password per mariadb.
This already works for flipping the TLS flag on and off too.

So in this patch, we clean this up and add a test to include:

* a new field Status.CurrentSecret, which is used to indicate the
  previous secret from which the finalizer should be removed.  This will
  also be used when we migrate the root password to use MariaDBAccount
  by providing the "current" root password when changing to a new root
  password
* improved messaging in log messages, name of job.   This changes the
  job hash for mariadbaccount which will incur a run on existing environments,
  however the
  job hashes are already changing on existing environments due to the
  change in how the mysql root password is sent, i.e. via volume mounted
  script rather than env var secret
* update account.sh to use modern idiomatic patterns for user create
  /alter, while mariadb is fine with the legacy style of using only
  GRANT statements, MySQL 8 no longer allows this statement to proceed
  without a CREATE USER, so formalize the commands used here to use
  distinct CREATE USER, ALTER USER, GRANT statements and clarify the
  script is good for all create/update user operations.
This commit ties together the previous ones to create a new
MariaDBAccount when a Galera instance is created, and then to
use the password from that account/secret in the mariadb
bootstrap/maintenance scripts.

Galera gets bootstrapped with this secret, then the mariadbaccount
controller, who is waiting for galera to be available to set up this
new "root" account, wakes up when galera is running, and changes
the root password to itself, establishing the initial job hash
for the mariadbaccount.

As we now have a mariadbaccount linked to the outermost lifecycle
of a galera instance, some hardening of the deletion process has been
added to clarify that mariadbaccount will run deletion jobs only if
Galera is not marked for deletion.  If galera is marked for deletion,
then we have to assume the service/pods are gone and no more drops can
take place, even if the Galera CR is still present (chainsaw
conveniently adds its own finalizer to Galera when running, preventing
it from being fully deleted, which exposed this issue).

Additional changes to the mysql_root_auth.sh and account.sh scripts
allow for in-place changes of root password.
Copy link
Contributor

@dciabrin dciabrin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial round of comments for early feedback. I still have to review the subtle bits and the chainsaw tests.

description: |-
RootDatabaseAccount - name of MariaDBAccount which will be used to
generate root account / password.
this account is generated if not exists, and a name is chosen based
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stick the the original semantics and require a secret for instantiating the DB. Otherwise existing jobs/users which populate field secret won't be able to access it anymore.
We should add logic in the webhook to set rootDatabaseAccount to the value set in secret, as an intermediate step to ensure there's no user-detectable change with this PR

Copy link
Contributor Author

@zzzeek zzzeek Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the "secret" field is deprecated by this change, because the new way that Galera gets its root secret is via Galera->RootDatabaseAccount->Secret. The old architecture is:

Galera -> 
     Secret -> (osp_secret with lots of secrets in it) -> 
          DbRootPassword

The new architecture is:

Galera -> 
    RootDatabaseAccount -> (a MariaDBAccount, either manually or automatically generated) -> 
          Secret -> 
                  DatabasePassword

now, when we talk about "existing jobs / users", we have that covered (and as mentioned elsewhere I can try to add an explicit test for this scenario), by the code at line 585-615 of galera_contrller , which populates DbRootPassword into the new MariaDBAccount

so when you say "require a secret for instantiating the DB", do you mean:

  1. we no longer deprecate Galera->Secret->DbRootPassword, and that always has to be set up front? it gets copied to MariaDBAccount ->Secret and that's the documented behavior?
  2. Users (or something) must manually create a MariaDBAccount -> Secret in order to spin up Galera?

regarding item 1, im not sure how that would work because the whole point is that we can change the root password. I know that "osp-secret" is our own internal "secret for everything" but it's my understanding that end-users of the product still have one big (mutable?) "secret" with all of these passwords in it. but IIUC there's never been any feature where "user modifies a password in osp-secret -> that becomes the new secret", and it's not clear that's what you're proposing here. So if we dont document Galera->Secret->DbRootPassword as "legacy", it's part of the software, how do we document it? "This is the initial root password, but if you want to change the password, now you have to go in this totally different place to do that" ? Or do we make a system by which we actually respond to changes in osp-secret and handle the whole Galera->MariaDBAccount->Secret thing separately? (which means you cant change MariaDBAccount->Secret directly anymore, it would be out of sync with osp-secret ? that seems like a recipe for confusion)

Regarding item 2, the original plan for MariaDBAccount as used in each controller was that the Openstack Operator itself was going to "pre-create" all the MariaDBAccount->Secret objects for every controller. That's still a thing that could happen but I think it became apparent there wasn't much point, whether OO calls EnsureMariaDBAccount or each controller does (which each controller does anyway), the effect is the same.

it seems like in the Openstack CR we want there to be some notion of "initial password" maybe? osp-secret could be where we do this, but for the DB passwords of all the services, we removed those

Anyway tl;dr when you say, "we should require a secret", the question is, "where?"

) error {
if accountName == "" {
return fmt.Errorf("Account name is blank")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this one check is there to distinguish between system account and regular account?
One should probably add a check for empty namespace as well.

return nil
}

// DeleteAccountFinalizers performs just the primary account + secret finalizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this comment self-contained? Otherwise one need t open this file and scroll up to the previous definition to make sense of it.

var accountType AccountType

if exactUserName == "" {
accountType = "User"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use the type constant here instead of value "User".

} else if userNamePrefix != "" {
return nil, nil, fmt.Errorf("userNamePrefix and exactUserName are mutually exclusive")
} else {
accountType = "System"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use the type constant here instead of value "System".


// ******** TEMPORARY ************
// Pull the DbRootPassword from osp-secret to allow CI / external
// scripts to work, until they can adapt to root_auth.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we rely on a validation webhook to ensure the new secret is populated, this block would change, right? We wouldn't need to have this TEMPORARY wrapper comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure im following but if you mean we had validation to ensure that Galera->RootDatabaseAccount->MariaDBAccount->Secret were populated, then I guess we could get rid of that block and no longer care about DbRootPassword, but how do we do upgrades?

Comment on lines +635 to +639
// this allows us to avoid having to change the Spec, which seems to be not
// best practice here and also seems to interfere with the openstack
// controller's ability to see the Galera status as Completed (though this
// might only be due to CRDs not being in sync between mariadb-operator and
// openstack-operator during development)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should get rid of this section of the comments, as they are not leading to any conclusion.
I kind of like the idea of explaining why we copy the root secret to Status and why it makes sense to use that field during any interaction with the DB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants