Uploaded image for project: 'Percona Toolkit'
  1. Percona Toolkit
  2. PT-229

pt-online-schema-change does not retry on deadlock error when using percona 5.7

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.6
    • Component/s: None
    • Labels:
      None
    • Environment:

      Debian 8 jessie

      percona-xtradb-cluster-server-5.7

      Server version: 5.7.19-17-57 Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22

       

      Description

      As per my forum post here: https://www.percona.com/forums/questions-discussions/percona-toolkit/50169-using-pt-online-schema-change-with-percona-5-7

      I was able to replicate the issue within a dev (Debian 8) vagrant environment using percona-xtradb-cluster-server-5.7:

      Server version: 5.7.19-17-57 Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22

      I set up a 3 node cluster with a basic test table:

      CREATE TABLE `test_a` (
      `test_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `column_a` varchar(80) DEFAULT NULL,
      `column_b` varchar(20) DEFAULT NULL,
      `active` tinyint(1) unsigned DEFAULT NULL,
      `created` timestamp NULL DEFAULT NULL,
      `modified` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`test_id`)
      ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
      

      Which I populated with ~400k records and then I set up 4 basic php processes that constantly updated random records within the table. The processes wrote to random nodes within the cluster in order to replicate what happens in our production environment.

      Then whilst the 4 processes were busy updating random records I ran the pt-online-schema-change tool on one of the nodes

      PTDEBUG=1 pt-online-schema-change --alter "DROP COLUMN column_b" D=test_pt_online_schema_change,t=test_a --execute --ask-pass --u root --alter-foreign-keys-method=auto --chunk-size=5000 -P 6000 -S /tmp/mysql.6000.sock

      I've attached the output of the script but it essentially reaches a deadlock/ conflict and does not retry the INSERT into the new table command

      ...
      # Retry:3823 7147 Try 1 of 10
      # pt_online_schema_change:11332 7147 INSERT LOW_PRIORITY IGNORE INTO `test_pt_online_schema_change`.`_test_a_new` (`test_id`, `column_a`, `active`, `created`, `modified`) SELECT `test_id`, `column_a`, `active`, `created`, `modified` FROM `test_pt_online_schema_change`.`test_a` FORCE INDEX(`PRIMARY`) WHERE ((`test_id` >= ?)) AND ((`test_id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 7147 copy nibble*/ lower boundary: 1 upper boundary: 14998
      # Retry:3829 7147 Try code failed: DBD::mysql::st execute failed: WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction [for Statement "INSERT LOW_PRIORITY IGNORE INTO `test_pt_online_schema_change`.`_test_a_new` (`test_id`, `column_a`, `active`, `created`, `modified`) SELECT `test_id`, `column_a`, `active`, `created`, `modified` FROM `test_pt_online_schema_change`.`test_a` FORCE INDEX(`PRIMARY`) WHERE ((`test_id` >= ?)) AND ((`test_id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 7147 copy nibble*/" with ParamValues: 0='1', 1='14998'] at /usr/bin/pt-online-schema-change line 11336.
      #
      # pt_online_schema_change:11242 7147 Retry fail: DBD::mysql::st execute failed: WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction [for Statement "INSERT LOW_PRIORITY IGNORE INTO `test_pt_online_schema_change`.`_test_a_new` (`test_id`, `column_a`, `active`, `created`, `modified`) SELECT `test_id`, `column_a`, `active`, `created`, `modified` FROM `test_pt_online_schema_change`.`test_a` FORCE INDEX(`PRIMARY`) WHERE ((`test_id` >= ?)) AND ((`test_id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 7147 copy nibble*/" with ParamValues: 0='1', 1='14998'] at /usr/bin/pt-online-schema-change line 11336.
      #
      # Retry:3845 7147 Try code did not succeed
      # CleanupTask:6550 7147 Calling cleanup task CODE(0x32f6878)
      # pt_online_schema_change:9319 7147 Clean up triggers
      ...

      We've tried lower chunk sizes on our production environment but we still hit this problem.

        Smart Checklist

          Attachments

            Activity

              People

              • Assignee:
                carlos.salguero Carlos Salguero
                Reporter:
                futuresam Sam Grant
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 2 days
                  2d