Uploaded image for project: 'Percona XtraDB Cluster'
  1. Percona XtraDB Cluster
  2. PXC-1900

LP #1575959: Node consistency compromized during pt-online-schema-change

    XMLWordPrintable

    Details

      Description

      **Reported in Launchpad by Alexander Ilyin last update 29-06-2016 04:17:47

      This behaviou happend two times in row when I was altering two big tables. Used versions:

      CentOS release 6.7 (Final)

      pt-online-schema-change 2.2.17

      Percona-XtraDB-Cluster-client-56.x86_64 1:5.6.28-25.14.1.el6 @percona-release-x86_64
      Percona-XtraDB-Cluster-devel-56.x86_64 1:5.6.28-25.14.1.el6 @percona-release-x86_64
      Percona-XtraDB-Cluster-full-56.x86_64 1:5.6.28-25.14.1.el6 @percona-release-x86_64
      Percona-XtraDB-Cluster-galera-3.x86_64 3.14-1.rhel6 @percona-release-x86_64
      Percona-XtraDB-Cluster-garbd-3.x86_64 3.14-1.rhel6 @percona-release-x86_64
      Percona-XtraDB-Cluster-server-56.x86_64 1:5.6.28-25.14.1.el6 @percona-release-x86_64
      Percona-XtraDB-Cluster-shared-56.x86_64 1:5.6.28-25.14.1.el6 @percona-release-x86_64
      Percona-XtraDB-Cluster-test-56.x86_64 1:5.6.28-25.14.1.el6 @percona-release-x86_64

      We have 3 servers in Percona XtraDB

      On 1st node I've started alter with pt-online-schema-change:

      pt-online-schema-change --print --execute --alter-foreign-keys-method=drop_swap --no-drop-old-table --host "127.0.0.1" --user "root" --password "***" --port "3306" --alter "DROP COLUMN platform, ADD partner_id INT unsigned NOT NULL DEFAULT '0', ADD COLUMN flags INT unsigned NOT NULL DEFAULT '0';" D=elite,t=user

      And in 10 minutes 3rd node was shutdown:

      2016-04-27 15:39:45 4558 [Warning] Aborted connection 2857066 to db: 'elite' user: 'elite' host: '*****' (Got an error reading communication packets)
      BF-BF X lock conflict,mode: 1027 supremum: 0
      conflicts states: my 0 locked 0
      RECORD LOCKS space id 2623 page no 1340865 n bits 152 index `PRIMARY` of table `elite`.`device` trx id 4637663494 lock_mode X locks rec but not gap
      2016-04-27 15:42:08 4558 [ERROR] Slave SQL: Could not execute Update_rows event on table elite.device; Can't find record in 'device', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 457, Error_code: 1032
      2016-04-27 15:42:08 4558 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 1252701249
      2016-04-27 15:42:08 4558 [Warning] WSREP: Failed to apply app buffer: seqno: 1252701249, status: 1
      at galera/src/trx_handle.cpp:apply():351
      Retrying 2th time
      BF-BF X lock conflict,mode: 1027 supremum: 0
      conflicts states: my 0 locked 0
      RECORD LOCKS space id 2623 page no 1340865 n bits 152 index `PRIMARY` of table `elite`.`device` trx id 4637663494 lock_mode X locks rec but not gap
      2016-04-27 15:42:08 4558 [ERROR] Slave SQL: Could not execute Update_rows event on table elite.device; Can't find record in 'device', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 457, Error_code: 1032
      2016-04-27 15:42:08 4558 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 1252701249
      2016-04-27 15:42:08 4558 [Warning] WSREP: Failed to apply app buffer: seqno: 1252701249, status: 1
      at galera/src/trx_handle.cpp:apply():351
      Retrying 3th time
      BF-BF X lock conflict,mode: 1027 supremum: 0
      conflicts states: my 0 locked 0
      RECORD LOCKS space id 2623 page no 1340865 n bits 152 index `PRIMARY` of table `elite`.`device` trx id 4637663494 lock_mode X locks rec but not gap
      2016-04-27 15:42:08 4558 [ERROR] Slave SQL: Could not execute Update_rows event on table elite.device; Can't find record in 'device', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 457, Error_code: 1032
      2016-04-27 15:42:08 4558 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 1252701249
      2016-04-27 15:42:08 4558 [Warning] WSREP: Failed to apply app buffer: seqno: 1252701249, status: 1
      at galera/src/trx_handle.cpp:apply():351
      Retrying 4th time
      BF-BF X lock conflict,mode: 1027 supremum: 0
      conflicts states: my 0 locked 0
      RECORD LOCKS space id 2623 page no 1340865 n bits 152 index `PRIMARY` of table `elite`.`device` trx id 4637663494 lock_mode X locks rec but not gap
      2016-04-27 15:42:08 4558 [ERROR] Slave SQL: Could not execute Update_rows event on table elite.device; Can't find record in 'device', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 457, Error_code: 1032
      2016-04-27 15:42:08 4558 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 1252701249
      2016-04-27 15:42:08 4558 [ERROR] WSREP: Failed to apply trx: source: 135aa33a-0c9d-11e6-8474-4bed42818caa version: 3 local: 0 state: APPLYING flags: 1 conn_id: 638350 trx_id: 4637945874 seqnos (l: 1784028, g: 1252701249, s: 1252701248, d: 1252701243, ts: 23188580704163)
      2016-04-27 15:42:08 4558 [ERROR] WSREP: Failed to apply trx 1252701249 4 times
      2016-04-27 15:42:08 4558 [ERROR] WSREP: Node consistency compromized, aborting...
      2016-04-27 15:42:08 4558 [Note] WSREP: Closing send monitor...
      2016-04-27 15:42:08 4558 [Note] WSREP: Closed send monitor.
      2016-04-27 15:42:08 4558 [Note] WSREP: gcomm: terminating thread
      2016-04-27 15:42:08 4558 [Note] WSREP: gcomm: joining thread
      2016-04-27 15:42:08 4558 [Note] WSREP: gcomm: closing backend
      2016-04-27 15:42:08 4558 [Note] WSREP: view(view_id(NON_PRIM,135aa33a,11) memb {
      e89d9ff6,0
      } joined {
      } left {
      } partitioned {
      135aa33a,0
      138ce52e,0
      })
      2016-04-27 15:42:08 4558 [Note] WSREP: view((empty))
      2016-04-27 15:42:08 4558 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2016-04-27 15:42:08 4558 [Note] WSREP: gcomm: closed
      2016-04-27 15:42:08 4558 [Note] WSREP: Flow-control interval: [512, 512]
      2016-04-27 15:42:08 4558 [Note] WSREP: Received NON-PRIMARY.
      2016-04-27 15:42:08 4558 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1252701251)
      2016-04-27 15:42:08 4558 [Note] WSREP: Received self-leave message.
      2016-04-27 15:42:08 4558 [Note] WSREP: Flow-control interval: [0, 0]
      2016-04-27 15:42:08 4558 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2016-04-27 15:42:08 4558 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1252701251)
      2016-04-27 15:42:08 4558 [Note] WSREP: RECV thread exiting 0: Success
      2016-04-27 15:42:08 4558 [Note] WSREP: recv_thread() joined.
      2016-04-27 15:42:08 4558 [Note] WSREP: Closing replication queue.
      2016-04-27 15:42:08 4558 [Note] WSREP: Closing slave action queue.
      2016-04-27 15:42:08 4558 [Note] WSREP: /usr/sbin/mysqld: Terminated.
      160427 15:42:10 mysqld_safe Number of processes running now: 0
      160427 15:42:10 mysqld_safe WSREP: not restarting wsrep node automatically
      160427 15:42:10 mysqld_safe mysqld from pid file /var/run/mysql/mysql.pid ended

        Smart Checklist

          Attachments

            Activity

              People

              Assignee:
              krunal.bauskar Krunal Bauskar (Inactive)
              Reporter:
              lpjirasync lpjirasync (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Dates

                Created:
                Updated: