Details

    • Bug
    • Status: Done
    • Medium
    • Resolution: Cannot Reproduce
    • None
    • None
    • None
    • None
    • XtraDB Cluster 3 Nodes on Ubuntu Linux 16.04 amd64

      XTrabackup version: percona-xtrabackup-24 ,now 2.4.9-1.xenial amd64

      Percona XTraDB Cluster version: percona-xtradb-cluster-57 5.7.20-29.24-1.xenial amd64g

       

    Description

      SST stopped working on one of the nodes of my 3 node percona cluster, I think after an xtrabackup update. Apparently there is some kind of "broken pipe" error on the transfer, but actually on both sides joiner and donor.

      The thing that puzzles me is, that the joiner is very quick to quit. I think it tries to read the seqno and fails, before even the donor starts transmitting. The donor then tries to transmit and quits after some time because he has lost connection (at least I think so).

      Other things I noticed and can't make sense of:

      • The xtrabackup donor command fails with "22 (Invalid argument)"

      Log of the joiner:

       

      2018-02-18T12:21:07.745001Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '185.170.112.72' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my
      .cnf' --defaults-group-suffix '' --parent '3235' '' )
      2018-02-18T12:21:08.351007Z 2 [Note] WSREP: Prepared SST/IST request: xtrabackup-v2|185.170.112.72:4444/xtrabackup_sst//1
      2018-02-18T12:21:08.351063Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2018-02-18T12:21:08.351104Z 2 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2018-02-18T12:21:08.351117Z 2 [Note] WSREP: Assign initial position for certification: 4140010, protocol version: 3
      2018-02-18T12:21:08.351169Z 0 [Note] WSREP: Service thread queue flushed.
      2018-02-18T12:21:08.351191Z 2 [Note] WSREP: Check if state gap can be serviced using IST
      2018-02-18T12:21:08.351213Z 2 [Note] WSREP: Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: dce74b7a-210c-11e7-bf08-e2ca769708c5
      2018-02-18T12:21:08.351254Z 2 [Note] WSREP: State gap can't be serviced using IST. Switching to SST
      2018-02-18T12:21:08.352194Z 0 [Note] WSREP: Member 0.0 (mams_node_3) requested state transfer from '*any*'. Selected 1.0 (mams_node_1)(SYNCED) as donor.
      2018-02-18T12:21:08.352207Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 4140010)
      2018-02-18T12:21:08.352231Z 2 [Note] WSREP: Requesting state transfer: success, donor: 1
      2018-02-18T12:21:08.352266Z 2 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(dce74b7a-210c-11e7-bf08-e2ca769708c5:4140010)
      2018-02-18T12:21:08.985180Z WSREP_SST: [ERROR] ******************* FATAL ERROR ********************** 
      2018-02-18T12:21:08.986474Z WSREP_SST: [ERROR] xtrabackup process ended without creating '/var/tmp/pxc_sst_oXcWKzOT/joiner_ZnRWtOew/xtrabackup_galera_info'
      2018-02-18T12:21:08.987707Z WSREP_SST: [ERROR] ****************************************************** 
      2018-02-18T12:21:08.991696Z WSREP_SST: [ERROR] Cleanup after exit with status:32
      2018-02-18T12:21:09.001170Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '185.170.112.72' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '3235' '' : 32 (Broken pipe)
      2018-02-18T12:21:09.001286Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
      2018-02-18T12:21:09.001295Z 0 [ERROR] WSREP: SST script aborted with error 32 (Broken pipe)
      2018-02-18T12:21:09.001347Z 0 [ERROR] WSREP: SST failed: 32 (Broken pipe)
      2018-02-18T12:21:09.001357Z 0 [ERROR] Aborting
      
      

      Log of the donor:

       

       

      2018-02-18T12:21:08.350207Z 0 [Note] WSREP: Member 0.0 (mams_node_3) requested state transfer from '*any*'. Selected 1.0 (mams_node_1)(SYNCED) as donor.
      2018-02-18T12:21:08.350232Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 4140010)
      2018-02-18T12:21:08.350303Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2018-02-18T12:21:08.350438Z 0 [Note] WSREP: Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '185.170.112.72:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' '' --gtid 'dce74b7a-210c-11e7-bf08-e2ca769708c5:4140010')
      2018-02-18T12:21:08.351418Z 4 [Note] WSREP: DONOR thread signaled with 0
      2018-02-18T12:21:09.001109Z 0 [Note] WSREP: declaring dcda9141 at ssl://185.170.112.68:4567 stable
      2018-02-18T12:21:09.001148Z 0 [Note] WSREP: forgetting 31cc58be (ssl://185.170.112.72:4567)
      2018-02-18T12:21:09.002325Z 0 [Note] WSREP: Node 9a8b8287 state primary
      2018-02-18T12:21:09.002709Z 0 [Note] WSREP: Current view of cluster as seen by this node
      view (view_id(PRIM,9a8b8287,138)
      memb {
      9a8b8287,0
      dcda9141,0
      }
      joined {
      }
      left {
      }
      partitioned {
      31cc58be,0
      }
      )
      2018-02-18T12:21:09.002727Z 0 [Note] WSREP: Save the discovered primary-component to disk
      2018-02-18T12:21:09.002964Z 0 [Note] WSREP: forgetting 31cc58be (ssl://185.170.112.72:4567)
      2018-02-18T12:21:09.003170Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
      2018-02-18T12:21:09.003636Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 3325a4fb-14a6-11e8-a8dc-d72f76cc49ae
      2018-02-18T12:21:09.004100Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 3325a4fb-14a6-11e8-a8dc-d72f76cc49ae
      2018-02-18T12:21:09.004607Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 3325a4fb-14a6-11e8-a8dc-d72f76cc49ae from 0 (mams_node_1)
      2018-02-18T12:21:09.004619Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 3325a4fb-14a6-11e8-a8dc-d72f76cc49ae from 1 (mams_node_2)
      2018-02-18T12:21:09.004627Z 0 [Note] WSREP: Quorum results:
      version = 4,
      component = PRIMARY,
      conf_id = 113,
      members = 2/2 (primary/total),
      act_id = 4140010,
      last_appl. = 4139921,
      protocols = 0/7/3 (gcs/repl/appl),
      group UUID = dce74b7a-210c-11e7-bf08-e2ca769708c5
      2018-02-18T12:21:09.004643Z 0 [Note] WSREP: Flow-control interval: [141, 141]
      2018-02-18T12:21:09.004647Z 0 [Note] WSREP: Trying to continue unpaused monitor
      2018-02-18T12:21:09.004699Z 9 [Note] WSREP: New cluster view: global state: dce74b7a-210c-11e7-bf08-e2ca769708c5:4140010, view# 114: Primary, number of nodes: 2, my index: 0, protocol version 3
      2018-02-18T12:21:09.004726Z 9 [Note] WSREP: Setting wsrep_ready to true
      2018-02-18T12:21:09.004734Z 9 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2018-02-18T12:21:09.004746Z 9 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2018-02-18T12:21:09.004754Z 9 [Note] WSREP: Assign initial position for certification: 4140010, protocol version: 3
      2018-02-18T12:21:09.004789Z 0 [Note] WSREP: Service thread queue flushed.
      2018-02-18T12:21:09.887828Z 0 [Note] WSREP: (9a8b8287, 'ssl://0.0.0.0:4567') turning message relay requesting off
      2018-02-18T12:21:14.388663Z 0 [Note] WSREP: cleaning up 31cc58be (ssl://185.170.112.72:4567)
      2018-02-18T12:21:18.984724Z WSREP_SST: [INFO] Streaming the backup to joiner at 185.170.112.72 4444
      2018/02/18 13:21:49 socat[27048] E connect(7, AF=2 185.170.112.72:4444, 16): Connection refused
      2018-02-18T12:21:49.013335Z 13503 [Note] Aborted connection 13503 to db: 'unconnected' user: 'sstuser' host: 'localhost' (Got an error reading communication packets)
      2018-02-18T12:21:49.014779Z WSREP_SST: [ERROR] ******************* FATAL ERROR ********************** 
      2018-02-18T12:21:49.016024Z WSREP_SST: [ERROR] xtrabackup finished with error: 1. Check /var/lib/mysql//innobackup.backup.log
      --------------- innobackup.backup.log (START) --------------------
      180218 13:21:19 version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/var/run/mysqld/mysqld.sock' as 'sstuser' (using password: YES).
      180218 13:21:19 version_check Connected to MySQL server
      180218 13:21:19 version_check Executing a version check against the server...
      180218 13:21:19 version_check Done.
      180218 13:21:19 Connecting to MySQL server host: localhost, user: sstuser, password: set, port: not set, socket: /var/run/mysqld/mysqld.sock
      Using server version 5.7.19-17-57
      xtrabackup version 2.4.9 based on MySQL server 5.7.13 Linux (x86_64) (revision id: a467167cdd4)
      xtrabackup: uses posix_fadvise().
      xtrabackup: cd to /var/lib/mysql/
      xtrabackup: open files limit requested 0, set to 5000
      xtrabackup: using the following InnoDB configuration:
      xtrabackup: innodb_data_home_dir = .
      xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
      xtrabackup: innodb_log_group_home_dir = ./
      xtrabackup: innodb_log_files_in_group = 2
      xtrabackup: innodb_log_file_size = 50331648
      InnoDB: Number of pools: 1
      180218 13:21:19 >> log scanned up to (41548239055)
      xtrabackup: Generating a list of tablespaces
      InnoDB: Allocated tablespace ID 21 for sys/sys_config, old maximum was 0
      xtrabackup: Starting 4 threads for parallel data files transfer
      180218 13:21:19 [01] Streaming ./ibdata1
      180218 13:21:19 [02] Streaming ./sys/sys_config.ibd
      180218 13:21:19 [03] Streaming ./mysql/servers.ibd
      180218 13:21:19 [03] ...done
      180218 13:21:19 [04] Streaming ./mysql/innodb_table_stats.ibd
      180218 13:21:19 [02] ...done
      180218 13:21:19 [04] ...done
      180218 13:21:20 >> log scanned up to (41548239064)
      180218 13:21:21 >> log scanned up to (41548239064)
      180218 13:21:22 >> log scanned up to (41548239697)
      180218 13:21:23 >> log scanned up to (41548239697)
      180218 13:21:24 >> log scanned up to (41548239697)
      180218 13:21:25 >> log scanned up to (41548239744)
      180218 13:21:26 >> log scanned up to (41548239753)
      180218 13:21:27 >> log scanned up to (41548239753)
      180218 13:21:28 >> log scanned up to (41548239753)
      180218 13:21:29 >> log scanned up to (41548239753)
      180218 13:21:30 >> log scanned up to (41548239753)
      180218 13:21:31 >> log scanned up to (41548240402)
      180218 13:21:32 >> log scanned up to (41548240402)
      180218 13:21:34 >> log scanned up to (41548240449)
      180218 13:21:35 >> log scanned up to (41548240458)
      180218 13:21:36 >> log scanned up to (41548240458)
      180218 13:21:37 >> log scanned up to (41548241091)
      180218 13:21:38 >> log scanned up to (41548241091)
      180218 13:21:39 >> log scanned up to (41548241091)
      180218 13:21:40 >> log scanned up to (41548241138)
      180218 13:21:41 >> log scanned up to (41548241147)
      180218 13:21:42 >> log scanned up to (41548241147)
      180218 13:21:43 >> log scanned up to (41548241147)
      180218 13:21:44 >> log scanned up to (41548241147)
      180218 13:21:45 >> log scanned up to (41548241147)
      180218 13:21:46 >> log scanned up to (41548241780)
      180218 13:21:47 >> log scanned up to (41548244551)
      180218 13:21:48 >> log scanned up to (41548244551)
      xtrabackup: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
      [03] xtrabackup: Error: failed to copy datafile.
      --------------- innobackup.backup.log (END) ----------------------
      2018-02-18T12:21:49.017959Z WSREP_SST: [ERROR] ****************************************************** 
      2018-02-18T12:21:49.019306Z WSREP_SST: [ERROR] Cleanup after exit with status:22
      2018-02-18T12:21:49.028917Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '185.170.112.72:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' '' --gtid 'dce74b7a-210c-11e7-bf08-e2ca769708c5:4140010': 22 (Invalid argument)
      2018-02-18T12:21:49.029000Z 0 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '185.170.112.72:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' '' --gtid 'dce74b7a-210c-11e7-bf08-e2ca769708c5:4140010'
      2018-02-18T12:21:49.029787Z 0 [Warning] WSREP: Could not find peer: 31cc58be-14a6-11e8-bbe0-9b0216319a2d
      2018-02-18T12:21:49.029803Z 0 [Warning] WSREP: 0.0 (mams_node_1): State transfer to -1.-1 (left the group) failed: -22 (Invalid argument)
      2018-02-18T12:21:49.029811Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 4140015)
      2018-02-18T12:21:49.030250Z 0 [Note] WSREP: Member 0.0 (mams_node_1) synced with group.
      2018-02-18T12:21:49.030261Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 4140015)
      2018-02-18T12:21:49.030336Z 6 [Note] WSREP: Synchronized with group, ready for connections
      2018-02-18T12:21:49.030353Z 6 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      
      
      

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            oweise Oliver Weise
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Smart Checklist