    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: None


      A killed pod fails to rejoin the cluster when the whole cluster is under load. The log from the failing node follows:



      2020-09-23T13:07:56.926249Z 0 [Note] [MY-000000] [Galera] Skipped GCache ring buffer recovery: could not determine history UUID.
      2020-09-23T13:07:56.928172Z 0 [Note] [MY-000000] [Galera] Passing config to GCS: base_dir = /var/lib/mysql/; base_host =; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 10; socket.checksum = 2; socket.recv_buf_size = 212992; socket.ssl_ca = /etc/mysql/ssl-internal/ca.crt; socket.ssl_cert = /etc/mysql/ssl-internal/tls.crt; socket.ssl_cipher = ; socket.ssl_compression = YES; socket.ssl_key = /etc/mysql/ssl-internal/tls.key; 
      2020-09-23T13:07:56.950425Z 0 [Note] [MY-000000] [WSREP] Starting replication
      2020-09-23T13:07:56.950536Z 0 [Note] [MY-000000] [Galera] Connecting with bootstrap option: 0
      2020-09-23T13:07:56.950618Z 0 [Note] [MY-000000] [Galera] Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
      2020-09-23T13:07:56.950764Z 0 [Note] [MY-000000] [Galera] protonet asio version 0
      2020-09-23T13:07:56.950957Z 0 [Note] [MY-000000] [Galera] Using CRC-32C for message checksums.
      2020-09-23T13:07:56.951063Z 0 [Note] [MY-000000] [Galera] initializing ssl context
      2020-09-23T13:07:56.951397Z 0 [Note] [MY-000000] [Galera] backend: asio
      2020-09-23T13:07:56.951560Z 0 [Note] [MY-000000] [Galera] gcomm thread scheduling priority set to other:0 
      2020-09-23T13:07:56.951774Z 0 [Warning] [MY-000000] [Galera] Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
      2020-09-23T13:07:56.951879Z 0 [Note] [MY-000000] [Galera] Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
      2020-09-23T13:07:56.952209Z 0 [Note] [MY-000000] [Galera] GMCast version 0
      2020-09-23T13:07:56.955172Z 0 [Note] [MY-000000] [Galera] (cc6ba230, 'ssl://') listening at ssl://
      2020-09-23T13:07:56.955518Z 0 [Note] [MY-000000] [Galera] (cc6ba230, 'ssl://') multicast: , ttl: 1
      2020-09-23T13:07:56.956751Z 0 [Note] [MY-000000] [Galera] EVS version 1
      2020-09-23T13:07:56.957304Z 0 [Note] [MY-000000] [Galera] gcomm: connecting to group 'cluster1', peer 'cluster1-pxc-0.cluster1-pxc:,cluster1-pxc-2.cluster1-pxc:'
      2020-09-23T13:07:56.969882Z 0 [Note] [MY-000000] [Galera] SSL handshake successful, remote endpoint ssl:// local endpoint ssl:// cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
      2020-09-23T13:07:56.971028Z 0 [Note] [MY-000000] [Galera] (cc6ba230, 'ssl://') connection established to 0dfacf76 ssl://
      2020-09-23T13:07:56.971277Z 0 [Note] [MY-000000] [Galera] (cc6ba230, 'ssl://') turning message relay requesting on, nonlive peers: 
      2020-09-23T13:07:56.971644Z 0 [Note] [MY-000000] [Galera] SSL handshake successful, remote endpoint ssl:// local endpoint ssl:// cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
      2020-09-23T13:07:56.972762Z 0 [Note] [MY-000000] [Galera] (cc6ba230, 'ssl://') connection established to 433c30b5 ssl://
      2020-09-23T13:07:57.460823Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1
      2020-09-23T13:07:57.461253Z 0 [Note] [MY-000000] [Galera] declaring 0dfacf76 at ssl:// stable
      2020-09-23T13:07:57.461636Z 0 [Note] [MY-000000] [Galera] declaring 433c30b5 at ssl:// stable
      2020-09-23T13:07:57.462032Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1
      2020-09-23T13:07:57.463316Z 0 [Note] [MY-000000] [Galera] Node 0dfacf76 state primary
      2020-09-23T13:07:57.464422Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
      view (view_id(PRIM,0dfacf76,5)
      memb {
      joined {
      left {
      partitioned {
      2020-09-23T13:07:57.464853Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
      2020-09-23T13:07:57.958883Z 0 [Note] [MY-000000] [Galera] gcomm: connected
      2020-09-23T13:07:57.959394Z 0 [Note] [MY-000000] [Galera] Changing maximum packet size to 64500, resulting msg size: 32636
      2020-09-23T13:07:57.960045Z 0 [Note] [MY-000000] [Galera] Shifting CLOSED -> OPEN (TO: 0)
      2020-09-23T13:07:57.960501Z 0 [Note] [MY-000000] [Galera] Opened channel 'cluster1'
      2020-09-23T13:07:57.961243Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
      2020-09-23T13:07:57.961856Z 1 [Note] [MY-000000] [WSREP] Starting rollbacker thread 1
      2020-09-23T13:07:57.962599Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID.
      2020-09-23T13:07:57.963216Z 2 [Note] [MY-000000] [WSREP] Starting applier thread 2
      2020-09-23T13:07:57.963976Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: ccb9a7ea-fd9d-11ea-99ff-4a526cdb7ca0
      2020-09-23T13:07:57.964640Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: ccb9a7ea-fd9d-11ea-99ff-4a526cdb7ca0 from 0 (cluster1-pxc-0)
      2020-09-23T13:07:57.965228Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: ccb9a7ea-fd9d-11ea-99ff-4a526cdb7ca0 from 1 (cluster1-pxc-2)
      2020-09-23T13:07:57.965808Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: ccb9a7ea-fd9d-11ea-99ff-4a526cdb7ca0 from 2 (cluster1-pxc-1)
      2020-09-23T13:07:57.966349Z 0 [Note] [MY-000000] [Galera] Quorum results:
       version = 6,
       component = PRIMARY,
       conf_id = 4,
       members = 2/3 (primary/total),
       act_id = 12696,
       last_appl. = 12694,
       protocols = 2/10/4 (gcs/repl/appl),
       vote policy= 0,
       group UUID = 1377ab55-fcf2-11ea-b3ed-aae34615dc4c
      2020-09-23T13:07:57.967007Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [173, 173]
      2020-09-23T13:07:57.967562Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> PRIMARY (TO: 12697)
      2020-09-23T13:07:57.968417Z 2 [Note] [MY-000000] [Galera] ####### processing CC 12697, local, ordered
      2020-09-23T13:07:57.969098Z 2 [Note] [MY-000000] [Galera] Drain monitors from -1 upto -1
      2020-09-23T13:07:57.969715Z 2 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5, 3)
      2020-09-23T13:07:57.970330Z 2 [Note] [MY-000000] [Galera] ####### My UUID: cc6ba230-fd9d-11ea-bcd6-ca53239c04de
      2020-09-23T13:07:57.970962Z 2 [Note] [MY-000000] [Galera] Server cluster1-pxc-1 connected to cluster at position 1377ab55-fcf2-11ea-b3ed-aae34615dc4c:12697 with ID cc6ba230-fd9d-11ea-bcd6-ca53239c04de
      2020-09-23T13:07:57.971573Z 2 [Note] [MY-000000] [WSREP] Server status change disconnected -> connected
      2020-09-23T13:07:57.971932Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
      2020-09-23T13:07:57.972207Z 2 [Note] [MY-000000] [Galera] State transfer required: 
       Group state: 1377ab55-fcf2-11ea-b3ed-aae34615dc4c:12697
       Local state: 00000000-0000-0000-0000-000000000000:-1
      2020-09-23T13:07:57.972458Z 2 [Note] [MY-000000] [WSREP] Server status change connected -> joiner
      2020-09-23T13:07:57.972710Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
      2020-09-23T13:07:57.973237Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '1' --mysqld-version '8.0.19-10' '' )
      2020-09-23T13:07:59.005118Z 2 [Note] [MY-000000] [WSREP] Prepared SST request: xtrabackup-v2|
      2020-09-23T13:07:59.005899Z 2 [Note] [MY-000000] [Galera] Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
      2020-09-23T13:07:59.006782Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
      2020-09-23T13:07:59.007806Z 2 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
      2020-09-23T13:07:59.008450Z 2 [Note] [MY-000000] [Galera] Check if state gap can be serviced using IST
      2020-09-23T13:07:59.009009Z 2 [Note] [MY-000000] [Galera] Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: 1377ab55-fcf2-11ea-b3ed-aae34615dc4c
      2020-09-23T13:07:59.009550Z 2 [Note] [MY-000000] [Galera] ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 12697, STRv: 3
      2020-09-23T13:07:59.010224Z 2 [Note] [MY-000000] [Galera] IST receiver addr using ssl://
      2020-09-23T13:07:59.010882Z 2 [Note] [MY-000000] [Galera] IST receiver using ssl
      2020-09-23T13:07:59.011969Z 2 [Note] [MY-000000] [Galera] Prepared IST receiver for 0-12697, listening at: ssl://
      2020-09-23T13:07:59.013687Z 0 [Note] [MY-000000] [Galera] Member 2.0 (cluster1-pxc-1) requested state transfer from 'cluster1-pxc-1,'. Selected 0.0 (cluster1-pxc-0)(SYNCED) as donor.
      2020-09-23T13:07:59.014607Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 12757)
      2020-09-23T13:07:59.015544Z 2 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 0
      2020-09-23T13:07:59.015973Z 2 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
      2020-09-23T13:07:59.016310Z 2 [Note] [MY-000000] [Galera] GCache history reset: old(00000000-0000-0000-0000-000000000000:0 -> 1377ab55-fcf2-11ea-b3ed-aae34615dc4c:12697
      2020-09-23T13:07:59.894933Z 0 [Note] [MY-000000] [WSREP-SST] Proceeding with SST.........
      2020-09-23T13:07:59.919673Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
      2020-09-23T13:08:00.459807Z 0 [Note] [MY-000000] [Galera] (cc6ba230, 'ssl://') turning message relay requesting off
      2020-09-23T13:08:54.956414Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:09:53.720732Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:10:54.169574Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:11:54.700660Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:13:00.012209Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:14:03.189252Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:15:06.869575Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:16:11.224241Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:17:15.282854Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:18:16.389946Z 0 [Note] [MY-000000] [Galera] Created page /var/lib/mysql/ of size 134217728 bytes
      2020-09-23T13:18:51.541317Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR ********************** 
      2020-09-23T13:18:51.542945Z 0 [Warning] [MY-000000] [Galera] 0.0 (cluster1-pxc-0): State transfer to 2.0 (cluster1-pxc-1) failed: -22 (Invalid argument)
      2020-09-23T13:18:51.543164Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1222: Will never receive state. Need to abort.
      2020-09-23T13:18:51.543247Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
      2020-09-23T13:18:51.543320Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
      2020-09-23T13:18:51.543871Z 0 [Note] [MY-000000] [Galera] gcomm: closing backend
      2020-09-23T13:18:51.545355Z 0 [ERROR] [MY-000000] [WSREP-SST] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log
      2020-09-23T13:18:51.545407Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 2068
      2020-09-23T13:18:51.545480Z 0 [ERROR] [MY-000000] [WSREP-SST] ****************************************************** 
      2020-09-23T13:18:51.545518Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:2
      2020-09-23T13:18:51.545693Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
      view (view_id(NON_PRIM,0dfacf76,5)
      memb {
      joined {
      left {
      partitioned {
      2020-09-23T13:18:51.545800Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
      2020-09-23T13:18:51.545849Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
      view ((empty))
      2020-09-23T13:18:51.546618Z 0 [Note] [MY-000000] [Galera] gcomm: closed
      2020-09-23T13:18:51.546681Z 0 [Note] [MY-000000] [Galera] Member 0.0 (cluster1-pxc-0) synced with group.
      2020-09-23T13:18:51.546743Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2020-09-23T13:18:51.546857Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]
      2020-09-23T13:18:51.546915Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.
      2020-09-23T13:18:51.546954Z 0 [Note] [MY-000000] [Galera] Shifting JOINER -> OPEN (TO: 66786)
      2020-09-23T13:18:51.547003Z 0 [Note] [MY-000000] [Galera] New SELF-LEAVE.
      2020-09-23T13:18:51.547068Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [0, 0]
      2020-09-23T13:18:51.547108Z 0 [Note] [MY-000000] [Galera] Received SELF-LEAVE. Closing connection.
      2020-09-23T13:18:51.547148Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: -1)
      2020-09-23T13:18:51.547192Z 0 [Note] [MY-000000] [Galera] RECV thread exiting 0: Success
      2020-09-23T13:18:51.551036Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '1' --mysqld-version '8.0.19-10' '' : 2 (No such file or directory)
      2020-09-23T13:18:51.551228Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
      2020-09-23T13:18:51.551315Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 2 (No such file or directory)
      2020-09-23T13:18:51.551747Z 3 [Note] [MY-000000] [Galera] Processing SST received
      2020-09-23T13:18:51.551826Z 3 [Note] [MY-000000] [Galera] SST received: 00000000-0000-0000-0000-000000000000:-1
      2020-09-23T13:18:51.551904Z 3 [System] [MY-000000] [WSREP] SST completed
      2020-09-23T13:18:51.552078Z 2 [Note] [MY-000000] [Galera] str_proto_ver_: 3 sst_seqno_: -1 cc_seqno: 12697 req->ist_len(): 72
      2020-09-23T13:18:51.552222Z 2 [ERROR] [MY-000000] [Galera] Application received wrong state: 
       Received: 00000000-0000-0000-0000-000000000000
       Required: 1377ab55-fcf2-11ea-b3ed-aae34615dc4c
      2020-09-23T13:18:51.552299Z 2 [ERROR] [MY-000000] [Galera] Application state transfer failed. This is unrecoverable condition, restart required.
      2020-09-23T13:18:51.552375Z 2 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
      2020-09-23T13:18:51.552441Z 2 [Note] [MY-000000] [Galera] Closing send monitor...
      2020-09-23T13:18:51.552514Z 2 [Note] [MY-000000] [Galera] Closed send monitor.
      2020-09-23T13:18:51.552595Z 2 [Note] [MY-000000] [Galera] recv_thread() joined.
      2020-09-23T13:18:51.552652Z 2 [Note] [MY-000000] [Galera] Closing replication queue.
      2020-09-23T13:18:51.552716Z 2 [Note] [MY-000000] [Galera] Closing slave action queue.
      2020-09-23T13:18:51.552782Z 2 [Note] [MY-000000] [Galera] mysqld: Terminated.
      2020-09-23T13:18:51.552846Z 2 [Note] [MY-000000] [WSREP] Initiating SST cancellation
      13:18:51 UTC - mysqld got signal 11 ;
      Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
      Thread pointer: 0x7f337c0008c0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 7f338fffec60 thread_stack 0x46000
      /usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x1fb00ed]
      /usr/sbin/mysqld(handle_fatal_signal+0x303) [0x11521f3]
      /lib64/ [0x7f33b474a630]
      /lib64/ [0x7f33b25eabc7]
      /usr/lib64/galera4/ [0x7f339f4546c7]
      /usr/lib64/galera4/ [0x7f339f6204fc]
      /usr/lib64/galera4/ [0x7f339f640c3c]
      /usr/lib64/galera4/ [0x7f339f62dd18]
      /usr/lib64/galera4/ [0x7f339f5fe8c8]
      /usr/lib64/galera4/ [0x7f339f5feb48]
      /usr/lib64/galera4/ [0x7f339f626f10]
      /usr/lib64/galera4/ [0x7f339f64621c]
      /usr/sbin/mysqld(wsrep::wsrep_provider_v26::run_applier(wsrep::high_priority_service*)+0xe) [0x26e4a4e]
      /usr/sbin/mysqld() [0x1193964]
      /usr/sbin/mysqld(start_wsrep_THD+0x327) [0xe9c3b7]
      /usr/sbin/mysqld() [0x24a26c9]
      /lib64/ [0x7f33b4742ea5]
      /lib64/ [0x7f33b26b18dd]
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0): is an invalid pointer
      Connection ID (thread ID): 2
      Status: NOT_KILLED
      You may download the Percona XtraDB Cluster operations manual by visiting You may find information
      in the manual which will help you identify the cause of the crash.
      Writing a core file
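
      For triage, a log like the one above can be reduced to its ERROR lines and the time the SST ran before failing. The following is a minimal sketch and not part of the original report: the function names `parse_line` and `summarize` are hypothetical, and the regular expression assumes only the line prefix visible in this log (timestamp, thread ID, level, code, subsystem, message).

```python
import re
from datetime import datetime

# Prefix format assumed from the log lines in this report:
# <ISO timestamp>Z <thread id> [<level>] [<code>] [<subsystem>] <message>
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z\s+"
    r"(?P<thread>\d+)\s+\[(?P<level>[^\]]+)\]\s+\[(?P<code>[^\]]+)\]\s+"
    r"\[(?P<subsystem>[^\]]+)\]\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for a timestamped log line, or None for continuation lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%dT%H:%M:%S.%f")
    return rec


def summarize(lines):
    """Return (error_records, seconds from 'Proceeding with SST' to first ERROR)."""
    records = [r for r in map(parse_line, lines) if r is not None]
    errors = [r for r in records if r["level"] == "ERROR"]
    # Hypothetical choice of marker: the joiner's "Proceeding with SST" message.
    start = next((r for r in records if "Proceeding with SST" in r["msg"]), None)
    elapsed = None
    if start is not None and errors:
        elapsed = (errors[0]["ts"] - start["ts"]).total_seconds()
    return errors, elapsed


if __name__ == "__main__":
    import sys

    errs, secs = summarize(sys.stdin)
    print(f"{len(errs)} ERROR lines; SST ran {secs} s before the first error")
    for r in errs:
        print(r["ts"].isoformat(), r["subsystem"], r["msg"].strip())
```

      Fed the log above on standard input, this should show that the joiner waited roughly 11 minutes between "Proceeding with SST" (13:07:59) and the first FATAL ERROR (13:18:51). The joiner's own message at that point says to check the donor log.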



