Extending support for rds_topology metadata handling #4992
base: v3.0
Conversation
… Deployment topologies
Summary of Changes
Hello @nielsvogell, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request lays the groundwork for supporting AWS RDS Blue/Green deployments within the monitor. It extends the existing `rds_topology` discovery mechanism to recognize the Blue/Green table format, switch to a more detailed query, and process the results, although full handling of roles and statuses is left for future commits. The feature is disabled by default.
Highlights
- **Blue/Green Deployment Topology Discovery**: I've added initial support for discovering servers involved in AWS RDS Blue/Green deployments by querying the extended `mysql.rds_topology` table, which includes `role`, `status`, and `version` columns.
- **Dynamic Query Switching**: The monitor now dynamically determines whether the `rds_topology` table contains Multi-AZ cluster information (3 rows) or Blue/Green deployment information (even number of rows) and switches the query to fetch the appropriate columns on subsequent checks.
- **Refactored Topology Processing**: The logic for processing discovered topology has been refactored to handle the different structures of the `rds_topology` table for Multi-AZ and Blue/Green deployments, preparing for future handling of roles and statuses.
- **New Monitor Task Types**: Introduced new monitor task types (`MON_READ_ONLY__AND__AWS_RDS_BLUE_GREEN_TOPOLOGY_DISCOVERY`, etc.) to represent the different topology queries.
Can one of the admins verify this patch?
Code Review
This pull request introduces foundational changes to support AWS RDS Blue/Green deployment topology discovery. It adds new monitoring task types, SQL queries for the extended `mysql.rds_topology` schema, and refactors the logic for processing discovered topologies. The changes appear to be a solid step towards the full feature, with good attention to detail in parsing and state management. Key improvements include more robust checks for Multi-AZ cluster identification and the use of `unordered_set` for better performance in `process_discovered_topology`.

A few areas to double-check include the updated logic for `added_new_server` in `MySQL_HostGroups_Manager.cpp`, to ensure it aligns with the intended downstream effects, and the error logging in `monitor_read_only_process_ready_tasks`, to avoid potentially misleading messages for non-topology tasks. The use of `assert` for `runtime_mysql_servers` should also be reviewed for production robustness.
lib/MySQL_Monitor.cpp
Outdated
```diff
@@ -3456,47 +3502,70 @@ void MySQL_Monitor::process_discovered_topology(const std::string& originating_s
 	char *query=(char *)"SELECT DISTINCT hostname FROM monitor_internal.mysql_servers ORDER BY hostname";
 	proxy_debug(PROXY_DEBUG_ADMIN, 4, "%s\n", query);
 	monitordb->execute_statement(query, &error, &cols, &affected_rows, &runtime_mysql_servers);
+	assert(runtime_mysql_servers);
```
Using `assert(runtime_mysql_servers)` is good for debugging, but it becomes a no-op in release builds. If `monitordb->execute_statement` can return `nullptr` for `runtime_mysql_servers` without setting the `error` flag (e.g., if the table is empty), this could lead to a dereference of a null pointer when accessing `runtime_mysql_servers->rows`.

Consider adding an explicit null check for `runtime_mysql_servers` after the `execute_statement` call, in addition to checking the `error` flag, to prevent potential crashes in production if the function contract allows `*resultset` to be `nullptr` on success with no rows.
```cpp
assert(runtime_mysql_servers);
```
Suggested change:
```cpp
if (error) {
	proxy_error("Error on %s : %s\n", query, error);
} else {
	if (runtime_mysql_servers == nullptr) {
		proxy_error("runtime_mysql_servers is NULL but no error was returned from execute_statement for query '%s'. Possible memory corruption?\n", query);
		return; // Or some other appropriate error handling
	}
	assert(runtime_mysql_servers);
}
```
I concur. `assert` is used in combination with `MyHG->execute_query`, which is where I got it from, but it is actually not required here. Will remove that line.
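A minimal sketch of how the call site could look after dropping the `assert`, combining it with the null/error check suggested above (a fragment for illustration, not the PR's final code):

```cpp
// Sketch only: keep the existing error handling and bail out explicitly when
// the resultset is missing, instead of relying on assert() in release builds.
monitordb->execute_statement(query, &error, &cols, &affected_rows, &runtime_mysql_servers);
if (error || runtime_mysql_servers == nullptr) {
	proxy_error("Error on %s : %s\n", query, error ? error : "empty resultset");
	return;
}
// Safe to iterate runtime_mysql_servers->rows from here on.
```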
```diff
@@ -1077,7 +1077,7 @@ class MySQL_HostGroups_Manager : public Base_HostGroups_Manager<MyHGC> {
 	void set_Readyset_status(char *hostname, int port, enum MySerStatus status);
 	unsigned long long Get_Memory_Stats();
 
-	void add_discovered_servers_to_mysql_servers_and_replication_hostgroups(const vector<tuple<string, int, int>>& new_servers);
+	void add_discovered_servers_to_mysql_servers_and_replication_hostgroups(const vector<tuple<string, int, long, int>>& new_servers);
```
The addition of `long` to the tuple suggests that hostgroup IDs can now be larger than `int`. Ensure that all related code (e.g., database schema, other functions using hostgroup IDs) is updated to handle `long` values to prevent potential truncation or overflow issues.
I aligned the data type with the existing extraction. Reading further in the code, it seems that this `long` is actually converted to `uint32_t` implicitly right after. Did this have a reason in the original implementation? Happy to change both the new and the existing variable to use `int` or `unsigned int` instead.
The correct datatype for this would be `uint32_t`, since that is the datatype used by the target function into which this data should be piped (`create_new_server_in_hg`).
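For illustration, a hedged sketch of the signature if the hostgroup id were carried as `uint32_t` (assuming, as stated above, that `create_new_server_in_hg` takes a `uint32_t` hostgroup id; this is not the PR's final declaration):

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Field order as in the existing declaration; the third element (assumed here
// to be the hostgroup id) becomes uint32_t instead of long.
void add_discovered_servers_to_mysql_servers_and_replication_hostgroups(
	const std::vector<std::tuple<std::string, int, uint32_t, int>>& new_servers);
```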
```cpp
int res = create_new_server_in_hg(hostgroup_id, srv_info, srv_opts);
if (added_new_server < 0) {
	added_new_server = res;
}
```
The logic for `added_new_server` has changed. Previously, it would reflect the status of the last server addition attempt. Now, it reflects the status of the first attempt if it was successful (0), or the error code of the first failure if all prior attempts also failed.

For example, if `create_new_server_in_hg` returns 0 for success and -1 for failure:
- Old: `S, S, F` -> `added_new_server = -1`
- New: `S, S, F` -> `added_new_server = 0` (from the first S)
- Old: `F, S, S` -> `added_new_server = 0`
- New: `F, S, S` -> `added_new_server = -1` (from the first F)

Confirm whether this change in how `added_new_server` is determined aligns with the intended behavior. If the goal is to know whether any server was successfully added, a boolean flag might be clearer.
I might be missing something, but wouldn't `F, S, S` result in `added_new_server = 0`?
- First server: `res = -1`, `added_new_server = -1` -> `added_new_server = -1`
- Second server: `res = 0`, `added_new_server = -1` -> `added_new_server = 0`
- Third server: `res = 0`, `added_new_server = 0` -> no change because `added_new_server >= 0`

The desired behavior is to update the internal structures if at least one new server was added. The previous implementation didn't do that.
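Condensed into a standalone sketch (the container and loop are illustrative, not the actual ProxySQL code), the accumulation described above behaves like this:

```cpp
#include <cstdio>
#include <vector>

int main() {
	// -1 = failure, 0 = success, mirroring the create_new_server_in_hg() results discussed above.
	std::vector<int> results = {-1, 0, 0};  // the "F, S, S" example

	int added_new_server = -1;              // nothing added yet
	for (int res : results) {
		if (added_new_server < 0) {
			added_new_server = res;         // first success sticks; earlier failures keep it < 0
		}
	}
	// Prints 0: at least one server was added, so the internal structures should be updated.
	std::printf("added_new_server = %d\n", added_new_server);
	return 0;
}
```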
Your reasoning is correct, and I agree the new behavior is better than the one in the previous implementation.
lib/MySQL_Monitor.cpp
Outdated
```cpp
if (fields && is_aws_rds_topology_query_task(mmsd->get_task_type()) && mysql_row_matches_query_task(field_names, mmsd->get_task_type())) {
	// Process the read_only field as above and store the first server
	vector<MYSQL_ROW> discovered_servers;
	discovered_servers.push_back(row);

	// Store the remaining servers
	int num_rows = mysql_num_rows(mmsd->result);
	for (int i = 1; i < num_rows; i++) {
		row = mysql_fetch_row(mmsd->result);
		discovered_servers.push_back(row);
		VALGRIND_DISABLE_ERROR_REPORTING;
		if (row[j]) {
			if (!strcmp(row[j], "0") || !strcasecmp(row[j], "OFF"))
				read_only = 0;
		}
		VALGRIND_ENABLE_ERROR_REPORTING;
	}
}

	// Store the remaining servers
	int num_rows = mysql_num_rows(mmsd->result);
	for (int i = 1; i < num_rows; i++) {
		MYSQL_ROW row = mysql_fetch_row(mmsd->result);
		discovered_servers.push_back(row);
	}

	// Process the discovered servers and add them to 'runtime_mysql_servers' (process only for AWS RDS Multi-AZ DB Clusters)
	if (!discovered_servers.empty() && is_aws_rds_multi_az_db_cluster_topology(discovered_servers)) {
		process_discovered_topology(originating_server_hostname, discovered_servers, mmsd->reader_hostgroup);
	// Process the discovered servers and add them to 'runtime_mysql_servers'
	if (!discovered_servers.empty()) {
		process_discovered_topology(originating_server_hostname, discovered_servers, mmsd, num_fields);
	}
} else {
	proxy_error("mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server %s:%d . See bug #1994\n", mmsd->hostname, mmsd->port);
	rc = (*proxy_sqlite3_bind_null)(statement, 5); ASSERT_SQLITE_OK(rc, mmsd->mondb);
}
```
The `else` block starting on line 7717 logs an error: `proxy_error("mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server %s:%d . See bug #1994\n", mmsd->hostname, mmsd->port);`.

This error message might be misleading if the task is a simple read_only check (i.e., `is_aws_rds_topology_query_task(mmsd->get_task_type())` is false). In such a case, `fields` might be valid and `num_fields` might be 1 (for `read_only`), which is correct for a non-topology task. The current structure implies this error is logged if it's not a topology task that also matches its expected fields.

Consider restructuring the conditional logic to ensure this error is only logged when it's genuinely an issue with field fetching/matching for an expected topology query, and not for standard read_only tasks.
This error message slipped into the wrong `if` clause after refactoring. I'll move it one level up, where it came from.
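A hedged sketch of the restructuring being described (condensed; the real function contains more branches than shown here): the field-fetching error belongs to the outer check on `fields`, so plain read_only tasks with valid fields never hit it.

```cpp
if (fields) {
	if (is_aws_rds_topology_query_task(mmsd->get_task_type())
		&& mysql_row_matches_query_task(field_names, mmsd->get_task_type())) {
		// Topology task: collect discovered_servers and call process_discovered_topology(...)
	}
	// Plain read_only tasks with valid fields fall through without logging a topology error.
} else {
	proxy_error("mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server %s:%d . See bug #1994\n", mmsd->hostname, mmsd->port);
	rc = (*proxy_sqlite3_bind_null)(statement, 5); ASSERT_SQLITE_OK(rc, mmsd->mondb);
}
```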
add to whitelist
I have some other small comments, but I think we should start by discussing/addressing these first issues, which may require bigger rewrites. We can later iterate on smaller details once these initial points are addressed.
```cpp
	|| (rds_topology_check_type == AWS_RDS_BLUE_GREEN_DEPLOYMENT_STATE_CHECK && discovered_servers.size() % 2 != 0)) {
	// Query result matches neither a Multi_AZ DB Cluster nor a Blue/Green deployment
	rds_topology_check_type = AWS_RDS_TOPOLOGY_CHECK; // Set back to default rds_topology check
	proxy_debug(PROXY_DEBUG_MONITOR, 7, "Got a query result for the rds_topology metadata table but it matches neither Multi-AZ DB Clusters, nor a blue/green deployment. Number of records: %d\n", discovered_servers.size());
```
A detail here: this produces a compilation warning; `%llu` should be the format specifier. We should revisit the debug/non-debug messages in a later iteration.
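A minimal sketch of the fix being suggested: `discovered_servers.size()` returns a `size_t`, so the argument can be cast to match `%llu` (alternatively, `%zu` would match `size_t` directly).

```cpp
proxy_debug(PROXY_DEBUG_MONITOR, 7,
	"Got a query result for the rds_topology metadata table but it matches neither Multi-AZ DB Clusters, nor a blue/green deployment. Number of records: %llu\n",
	(unsigned long long) discovered_servers.size());
```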
```diff
-	// Do an initial loop through the query results to save existing runtime server hostnames
+	// Do a loop through the query results to save existing runtime server hostnames
 	for (std::vector<SQLite3_row *>::iterator it = runtime_mysql_servers->rows.begin(); it != runtime_mysql_servers->rows.end(); it++) {
 		SQLite3_row *r1 = *it;
 		string current_runtime_hostname = r1->fields[0];
```
This belongs to the previous implementation, but it's unsafe. This resultset is obtained by querying `monitor_internal.mysql_servers`. Accessing this data should always be done using `GloMyMon->mysql_servers_mutex`. Not doing so could result in race conditions, as other threads may rebuild this table prior to their monitoring operations. On top of that, there is no guarantee that between the scheduling of the `read_only` operation and its resolution and processing, the table itself wasn't modified by another thread, for example, by cleaning it. That would turn this line into a `SEGFAULT`.

A more memory-safe approach would be to use the resultset already updated by other monitoring threads to access the servers that are actively monitored, like:

```cpp
{
	std::lock_guard<std::mutex> mysql_servers_guard(MyHGM->mysql_servers_to_monitor_mutex);
	for (
		auto it = MyHGM->mysql_servers_to_monitor->rows.begin();
		it != MyHGM->mysql_servers_to_monitor->rows.end();
		it++
	) {
		// Rest of the code...
	}
}
```

However, I think that implementation would also be flawed. I say this because this resultset is later used to determine which servers have been 'discovered', assuming they are discovered if they are not present in this monitoring resultset. If my reading is correct, I think this is wrong, since it will mix any configured server that is a candidate for monitoring (`ping`, `connect`, ...) with servers purposely chosen for `read_only` monitoring (i.e. placed by config in hostgroups that are also configured in `mysql_replication_hostgroups`).

This is why at the beginning of the `read_only` monitoring actions there is a filtering for such candidate servers using the following query:

```cpp
void * MySQL_Monitor::monitor_read_only() {
	...
	char *query=(char *)"SELECT hostname, port, MAX(use_ssl) use_ssl, check_type, reader_hostgroup FROM mysql_servers JOIN mysql_replication_hostgroups ON hostgroup_id=writer_hostgroup OR hostgroup_id=reader_hostgroup WHERE status NOT IN (2,3) GROUP BY hostname, port ORDER BY RANDOM()";
	...
```

The current implementation will present odd behaviors. E.g. if a server that is going to be part of a Multi-AZ cluster is configured in another, random hostgroup that doesn't belong to the `mysql_replication_hostgroups` of the cluster, the server won't be a candidate for autodiscovery, since it's already being monitored for `connect` or `ping` operations.

A more correct implementation for determining whether a server has been autodiscovered would use the resultset (or an analogous structure) from this original server filtering that takes place at the beginning of the `read_only` monitoring actions. This would be similar to the approach taken in the Group Replication implementation; the following member usage can be seen as the reference:

```cpp
const std::vector<gr_host_def_t>* cur_monitored_gr_srvs = nullptr;
```

This will match the server hostnames only against the servers that have been configured in `mysql_replication_hostgroups`, which are the real candidates for this cluster monitoring. I suggest reworking the implementation in a similar way, since the current one doesn't respect the user configuration regarding server placement.
This is a good point. The original resultset is passed to `monitor_read_only_process_ready_tasks` in the form of `std::vector<MySQL_Monitor_State_Data*>& mmsds`, from which a `std::list<read_only_server_t> mysql_servers;` is generated by processing each `mmsd` iteratively. Since we would need the full list when processing the discovered topology, which can happen before `mysql_servers` is fully populated, I would propose passing `mmsds` by reference to `process_discovered_topology` and creating the list of read_only servers there.

```cpp
void MySQL_Monitor::process_discovered_topology(..., const std::vector<MySQL_Monitor_State_Data*>& mmsds) {
	...
	unordered_set<string> saved_hostnames;
	for (auto mmsd : mmsds) {
		saved_hostnames.insert(mmsd->hostname);
	}
	...
```

Pro:
- this reuses the same list of `read_only`-check servers as the calling function (`monitor_read_only_process_ready_tasks`),
- it doesn't modify the standard `read_only` function signatures, and
- it only affects servers that actually have topology data to process.

Con:
- when processing the topology from multiple servers, the loop runs for each of them instead of reusing the list of `saved_hostnames`.
Alternatively, I also like the approach of storing the list of read_only-monitored servers, or even just the AWS-related ones, in a class member of `MySQL_Monitor`, like here:

```cpp
SQLite3_result *Group_Replication_Hosts_resultset;
std::map<std::string, Galera_monitor_node *> Galera_Hosts_Map;
SQLite3_result *Galera_Hosts_resultset;
std::map<std::string, AWS_Aurora_monitor_node *> AWS_Aurora_Hosts_Map;
SQLite3_result *AWS_Aurora_Hosts_resultset;
```

I think that is in line with the group replication approach you mentioned:

```cpp
4306: vector<gr_host_def_t> hosts_defs { extract_gr_host_defs(wr_hg, GloMyMon->Group_Replication_Hosts_resultset) };
...
4376: conn_mmsds[rnd_discoverer]->cur_monitored_gr_srvs = &hosts_defs;
```
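For illustration, a hedged sketch of what such a member could look like (the names `read_only_hosts_resultset` and `read_only_hosts_mutex` are hypothetical, chosen by analogy with the members quoted above; this is not the actual ProxySQL declaration):

```cpp
#include <mutex>

struct SQLite3_result;  // provided by ProxySQL's SQLite3 wrapper in the real codebase

class MySQL_Monitor_sketch {
public:
	// Filled by monitor_read_only() from its filtered candidate query, then read
	// by process_discovered_topology() to decide which hostnames are truly new.
	SQLite3_result *read_only_hosts_resultset = nullptr;  // hypothetical member
	std::mutex read_only_hosts_mutex;                     // guards the resultset across monitor threads
};
```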
I think that the second approach is safer, since a server can be missing from the `mmsd` list for `read_only` actions if `MySQL_Monitor_Connection_Pool` fails to return a connection for it (for example, if the pool isn't warmed up yet, or a connection has been lost). You can see the logic responsible here.

This could lead to 'discovering' a server that in reality was already configured by the user but was missing from the list. I think it's more complex, for both the user side and the implementation, to require rules/precedence for this kind of conflict resolution.
Thanks. Will rewrite it using that approach then. Do you think there is a safe way to detect whether a discovered server was previously added manually? I was thinking of adding a mechanism to remove a server that was only added through discovery if it is later removed from the rds_topology table. Not really necessary at this point, just thinking ahead.
Yw! Yes, the most straightforward way I can think of for detecting a server that has been added by auto-discovery would be a comparison of the `memory` vs `runtime` table for `mysql_servers`. Meaning that an auto-discovered server isn't user-configured, so it shouldn't be in the `memory` database and should only exist at `runtime`.

I think this is a good principle, since it aligns with the definition of auto-discovered: not previously there or configured. Also, this could be supported by extra conventions in the server `comment` field, which could be modified by changing the definition of `create_new_server_in_hg`.
Context
ProxySQL currently supports the discovery of AWS RDS Multi-AZ DB Cluster topologies by querying the `mysql.rds_topology` table on every nth read-only check (with `n = monitor_aws_rds_topology_discovery_interval`). The `rds_topology` table contains one record per cluster instance. The same table is used for RDS databases in Blue/Green Deployments, where, instead of cluster instances, it contains information about the blue and the green endpoints. Its schema is extended by three additional columns (`role`, `status`, and `version`).

Blue/Green Deployments

An RDS blue/green deployment is created for an existing (blue) RDS database (DB instance or cluster) and creates a read-only copy (green) of that database on a new server. The green environment may be modified (e.g., by performing an engine version upgrade). When ready, a switchover can be performed, which promotes the green database to a standalone database and changes its endpoint to the original (blue) server's name.
Changes in this PR

Note: this PR only contains the first commit towards the full feature. The remaining commits are in this working branch. I thought that reviewing them step by step might make the process easier.

Also note that this feature is currently disabled by default (`monitor_aws_rds_topology_discovery_interval = 0`), which is recommended until all required changes for the full feature are merged.

This PR adds support for reading the extended `rds_topology` table for both the `READ_ONLY` and the `INNODB_READ_ONLY` check. To preserve backwards compatibility, the topology query remains the same by default, and only when the discovered topology matches one found in blue/green deployments (an even number of entries, as opposed to the three found in Multi-AZ DB Clusters) do we switch the query to include the additional columns.
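As a hedged illustration of that row-count heuristic (the names below are illustrative, not the PR's actual identifiers), the switching decision described above boils down to:

```cpp
#include <cstddef>

enum class rds_topology_kind { multi_az_db_cluster, blue_green_deployment, unknown };

// Three rows indicate a Multi-AZ DB Cluster; an even, non-zero row count indicates
// a Blue/Green deployment, which triggers the extended query on the next check.
static rds_topology_kind classify_rds_topology(std::size_t num_rows) {
	if (num_rows == 3) return rds_topology_kind::multi_az_db_cluster;
	if (num_rows > 0 && num_rows % 2 == 0) return rds_topology_kind::blue_green_deployment;
	return rds_topology_kind::unknown;
}
```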
High-level change list:
- New monitor task types for the extended topology queries (for both `MON_INNODB_READ_ONLY` and `MON_READ_ONLY`)
- New checks in `process_discovered_topology` to see if the query result matches the expectations and whether to switch to querying for blue/green deployment metadata
- Groundwork for handling the `status` and `role` columns (unlike Multi-AZ DB Clusters, the server records from a blue/green deployment will need to be dynamically updated during a switchover)

How was this tested
- Multi-AZ DB Cluster: added one of the cluster instances to a hostgroup with the `read_only` check. Verified that the other two servers are discovered as expected.
- Blue/Green deployment: populated a `mysql.rds_topology` table with blue/green deployment records as described above. Added the blue endpoint as a server into a hostgroup with the `read_only` check enabled. Verified that the green endpoint was discovered and added to the runtime server list as expected.

Future steps
To make the feature complete, we need to add a few more partial features. Roughly, these are:
- Increasing the `read_only` polling frequency during a switchover to minimize downtime (this allows recognizing when the green database is promoted, without waiting for DNS changes to take effect and be noticed by ProxySQL)
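As a purely illustrative, hedged sketch of that last point (the names and numbers are hypothetical and not part of this PR), the idea is to shorten the effective polling interval only while a switchover is in progress:

```cpp
#include <algorithm>

// Hypothetical helper: returns the interval to use for the next read_only check.
// The divisor and the floor value are placeholders; the real values would be
// decided in a later commit.
static unsigned int effective_read_only_interval_ms(bool switchover_in_progress,
                                                    unsigned int configured_interval_ms) {
	if (!switchover_in_progress) return configured_interval_ms;
	return std::max(100u, configured_interval_ms / 10u);
}
```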