On preemptible nodes we had one case that needed manual recovery, after #30. There was no mariadb-1 pod, and mariadb-0 and mariadb-2 stayed crashlooping. They were past init, but the mariadb containers exited after:
2020-05-27 6:10:26 140050350966464 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50176S), skipping check
2020-05-27 6:10:55 140050350966464 [Note] WSREP: view((empty))
2020-05-27 6:10:55 140050350966464 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2020-05-27 6:10:55 140050350966464 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2020-05-27 6:10:55 140050350966464 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'my_wsrep_cluster' at 'gcomm://mariadb-0.mariadb,mariadb-1.mariadb,mariadb-2.mariadb': -110 (Connection timed out)
2020-05-27 6:10:55 140050350966464 [ERROR] WSREP: gcs connect failed: Connection timed out
2020-05-27 6:10:55 140050350966464 [ERROR] WSREP: wsrep::connect(gcomm://mariadb-0.mariadb,mariadb-1.mariadb,mariadb-2.mariadb) failed: 7
2020-05-27 6:10:55 140050350966464 [ERROR] Aborting
This could be a case for switching the StatefulSet's podManagementPolicy from OrderedReady to Parallel.
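To illustrate why OrderedReady could deadlock in this situation, here is a deliberately simplified Python model of how a StatefulSet controller decides which missing pods to create. It assumes only the documented rules: under OrderedReady a pod is created only once all of its predecessors are Running and Ready, while Parallel creates pods without waiting on the others. It ignores terminating pods, update strategies and everything Galera does after a pod starts. `Policy` and `pods_to_create` are hypothetical names for illustration, not part of any Kubernetes API.

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical enum mirroring the two podManagementPolicy values.
    ORDERED_READY = "OrderedReady"
    PARALLEL = "Parallel"


def pods_to_create(replicas: int, ready: dict[int, bool], policy: Policy) -> list[int]:
    """Simplified model: ordinals the controller would create in one pass.

    `ready` maps each existing pod ordinal to whether it is Running and Ready;
    an ordinal absent from the dict means that pod does not exist.
    """
    missing = [i for i in range(replicas) if i not in ready]
    if policy is Policy.PARALLEL:
        # Parallel: create every missing pod without waiting on its peers.
        return missing
    # OrderedReady: walk ordinals in order; each pod needs all predecessors Running and Ready.
    for i in range(replicas):
        if i not in ready:
            return [i]  # create the first missing ordinal, then wait for it
        if not ready[i]:
            return []  # a predecessor is not Running and Ready: the controller blocks
    return []
```

With the state from this issue (mariadb-0 and mariadb-2 crashlooping, mariadb-1 absent), `pods_to_create(3, {0: False, 2: False}, Policy.ORDERED_READY)` returns `[]`, so mariadb-1 is never recreated, while `Policy.PARALLEL` returns `[1]`. Whether the three nodes then manage to form a primary component is outside what this model captures. One practical caveat: podManagementPolicy is not among the StatefulSet spec fields Kubernetes allows to be updated, so switching it means recreating the StatefulSet object rather than patching it.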
The solution was to scale down to zero and then back up to three. Oddly, the pods wouldn't go away when scaled to zero, so I had to delete mariadb-2 manually. Is that the expected behavior for OrderedReady?