Skip to content

Second pod start hangs on Running wsrep_sst_rsync #27

Open
@solsson

Description

@solsson

With a fresh microk8s 1.14.4 cluster kubectl apply -k . gets mariadb-0 to Ready state, but mariadb-1 blocks during start.

2019-07-27 14:25:40 139726685889856 [Note] mysqld (mysqld 10.2.25-MariaDB-1:10.2.25+maria~bionic) starting as process 1 ...
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-07-27 14:25:40 139726685889856 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2019-07-27 14:25:40 139726685889856 [Note] WSREP: wsrep_load(): Galera 25.3.26(r3857) by Codership Oy <[email protected]> loaded successfully.
2019-07-27 14:25:40 139726685889856 [Note] WSREP: CRC-32C: using hardware acceleration.
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Passing config to GCS: base_dir = /data/db/; base_host = 10.1.1.49; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/db/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/db//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false;
2019-07-27 14:25:40 139726685889856 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: wsrep_sst_grab()
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Start replication
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: protonet asio version 0
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Using CRC-32C for message checksums.
2019-07-27 14:25:40 139726685889856 [Note] WSREP: backend: asio
2019-07-27 14:25:40 139726685889856 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2019-07-27 14:25:40 139726685889856 [Note] WSREP: restore pc from disk successfully
2019-07-27 14:25:40 139726685889856 [Note] WSREP: GMCast version 0
2019-07-27 14:25:40 139726685889856 [Warning] WSREP: Failed to resolve tcp://mariadb-1.mariadb:4567
2019-07-27 14:25:40 139726685889856 [Warning] WSREP: Failed to resolve tcp://mariadb-2.mariadb:4567
2019-07-27 14:25:40 139726685889856 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-07-27 14:25:40 139726685889856 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: EVS version 0
2019-07-27 14:25:40 139726685889856 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'mariadb-0.mariadb:,mariadb-1.mariadb:,mariadb-2.mariadb:'
2019-07-27 14:25:40 139726685889856 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') connection established to 4188e6e8 tcp://10.1.1.48:4567
2019-07-27 14:25:40 139726685889856 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2019-07-27 14:25:41 139726685889856 [Note] WSREP: gcomm: connected
2019-07-27 14:25:41 139726685889856 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2019-07-27 14:25:41 139726685889856 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2019-07-27 14:25:41 139726685889856 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2019-07-27 14:25:41 139726685889856 [Note] WSREP: Waiting for SST to complete.
2019-07-27 14:25:43 139726519949056 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-07-27 14:25:43 139726519949056 [Note] WSREP: view(view_id(NON_PRIM,5559eb88,4) memb {
	5559eb88,0
} joined {
} left {
} partitioned {
})
2019-07-27 14:25:43 139726505051904 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2019-07-27 14:25:43 139726505051904 [Note] WSREP: Flow-control interval: [16, 16]
2019-07-27 14:25:43 139726505051904 [Note] WSREP: Trying to continue unpaused monitor
2019-07-27 14:25:43 139726505051904 [Note] WSREP: Received NON-PRIMARY.
2019-07-27 14:25:43 139726684378880 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1
2019-07-27 14:25:43 139726684378880 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-07-27 14:25:43 139726519949056 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') turning message relay requesting off
2019-07-27 14:25:44 139726519949056 [Note] WSREP: declaring 4188e6e8 at tcp://10.1.1.48:4567 stable
2019-07-27 14:25:44 139726519949056 [Note] WSREP: re-bootstrapping prim from partitioned components
2019-07-27 14:25:44 139726519949056 [Note] WSREP: view(view_id(PRIM,4188e6e8,5) memb {
	4188e6e8,0
	5559eb88,0
} joined {
} left {
} partitioned {
})
2019-07-27 14:25:44 139726519949056 [Note] WSREP: save pc into disk
2019-07-27 14:25:44 139726505051904 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2019-07-27 14:25:44 139726505051904 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2019-07-27 14:25:44 139726519949056 [Note] WSREP: clear restored view
2019-07-27 14:25:44 139726505051904 [Note] WSREP: STATE EXCHANGE: sent state msg: 6b41ad2f-b07a-11e9-b313-aefa479322f6
2019-07-27 14:25:44 139726505051904 [Note] WSREP: STATE EXCHANGE: got state msg: 6b41ad2f-b07a-11e9-b313-aefa479322f6 from 0 (mariadb-0)
2019-07-27 14:25:44 139726505051904 [Note] WSREP: STATE EXCHANGE: got state msg: 6b41ad2f-b07a-11e9-b313-aefa479322f6 from 1 (mariadb-1)
2019-07-27 14:25:44 139726505051904 [Warning] WSREP: Quorum: No node with complete state:

	Version      : 4
	Flags        : 0x3
	Protocols    : 0 / 9 / 3
	State        : NON-PRIMARY
	Desync count : 0
	Prim state   : SYNCED
	Prim UUID    : 55b40e71-b07a-11e9-8369-fb3b92b43c93
	Prim  seqno  : 2
	First seqno  : -1
	Last  seqno  : 4
	Prim JOINED  : 1
	State UUID   : 6b41ad2f-b07a-11e9-b313-aefa479322f6
	Group UUID   : 3f1e1add-b07a-11e9-b142-86a198db1bc6
	Name         : 'mariadb-0'
	Incoming addr: '10.1.1.48:3306'

	Version      : 4
	Flags        : 00
	Protocols    : 0 / 9 / 3
	State        : NON-PRIMARY
	Desync count : 0
	Prim state   : NON-PRIMARY
	Prim UUID    : 00000000-0000-0000-0000-000000000000
	Prim  seqno  : -1
	First seqno  : -1
	Last  seqno  : -1
	Prim JOINED  : 0
	State UUID   : 6b41ad2f-b07a-11e9-b313-aefa479322f6
	Group UUID   : 00000000-0000-0000-0000-000000000000
	Name         : 'mariadb-1'
	Incoming addr: '10.1.1.49:3306'

2019-07-27 14:25:44 139726505051904 [Note] WSREP: Full re-merge of primary 55b40e71-b07a-11e9-8369-fb3b92b43c93 found: 1 of 1.
2019-07-27 14:25:44 139726505051904 [Note] WSREP: Quorum results:
	version    = 4,
	component  = PRIMARY,
	conf_id    = 2,
	members    = 1/2 (joined/total),
	act_id     = 4,
	last_appl. = -1,
	protocols  = 0/9/3 (gcs/repl/appl),
	group UUID = 3f1e1add-b07a-11e9-b142-86a198db1bc6
2019-07-27 14:25:44 139726505051904 [Note] WSREP: Flow-control interval: [23, 23]
2019-07-27 14:25:44 139726505051904 [Note] WSREP: Trying to continue unpaused monitor
2019-07-27 14:25:44 139726505051904 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 4)
2019-07-27 14:25:44 139726684378880 [Note] WSREP: State transfer required: 
	Group state: 3f1e1add-b07a-11e9-b142-86a198db1bc6:4
	Local state: 00000000-0000-0000-0000-000000000000:-1
2019-07-27 14:25:44 139726684378880 [Note] WSREP: New cluster view: global state: 3f1e1add-b07a-11e9-b142-86a198db1bc6:4, view# 3: Primary, number of nodes: 2, my index: 1, protocol version 3
2019-07-27 14:25:44 139726684378880 [Warning] WSREP: Gap in state sequence. Need state transfer.
2019-07-27 14:25:44 139726291986176 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.1.1.49' --datadir '/data/db/'   --parent '1'  ''  '''

Nothing happens after that last line. The state is minimal because all volumes were empty. The --address is the pod IP of mariadb-1.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions