Uploaded image for project: 'Percona Toolkit'
  1. Percona Toolkit
  2. PT-229

pt-online-schema-change does not retry on deadlock error when using percona 5.7

Details

    • Bug
    • Status: Done
    • Medium
    • Resolution: Fixed
    • None
    • 3.0.6
    • None
    • None
    • Debian 8 jessie

      percona-xtradb-cluster-server-5.7

      Server version: 5.7.19-17-57 Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22

       

    Description

      As per my forum post here: https://www.percona.com/forums/questions-discussions/percona-toolkit/50169-using-pt-online-schema-change-with-percona-5-7

      I was able to replicate the issue within a dev (Debian 8) vagrant environment using percona-xtradb-cluster-server-5.7:

      Server version: 5.7.19-17-57 Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22

      I set up a 3 node cluster with a basic test table:

      CREATE TABLE `test_a` (
      `test_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `column_a` varchar(80) DEFAULT NULL,
      `column_b` varchar(20) DEFAULT NULL,
      `active` tinyint(1) unsigned DEFAULT NULL,
      `created` timestamp NULL DEFAULT NULL,
      `modified` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`test_id`)
      ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
      

      Which I populated with ~400k records and then I set up 4 basic php processes that constantly updated random records within the table. The processes wrote to random nodes within the cluster in order to replicate what happens in our production environment.

      Then whilst the 4 processes were busy updating random records I ran the pt-online-schema-change tool on one of the nodes

      PTDEBUG=1 pt-online-schema-change --alter "DROP COLUMN column_b" D=test_pt_online_schema_change,t=test_a --execute --ask-pass --u root --alter-foreign-keys-method=auto --chunk-size=5000 -P 6000 -S /tmp/mysql.6000.sock

      I've attached the output of the script but it essentially reaches a deadlock/ conflict and does not retry the INSERT into the new table command

      ...
      # Retry:3823 7147 Try 1 of 10
      # pt_online_schema_change:11332 7147 INSERT LOW_PRIORITY IGNORE INTO `test_pt_online_schema_change`.`_test_a_new` (`test_id`, `column_a`, `active`, `created`, `modified`) SELECT `test_id`, `column_a`, `active`, `created`, `modified` FROM `test_pt_online_schema_change`.`test_a` FORCE INDEX(`PRIMARY`) WHERE ((`test_id` >= ?)) AND ((`test_id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 7147 copy nibble*/ lower boundary: 1 upper boundary: 14998
      # Retry:3829 7147 Try code failed: DBD::mysql::st execute failed: WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction [for Statement "INSERT LOW_PRIORITY IGNORE INTO `test_pt_online_schema_change`.`_test_a_new` (`test_id`, `column_a`, `active`, `created`, `modified`) SELECT `test_id`, `column_a`, `active`, `created`, `modified` FROM `test_pt_online_schema_change`.`test_a` FORCE INDEX(`PRIMARY`) WHERE ((`test_id` >= ?)) AND ((`test_id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 7147 copy nibble*/" with ParamValues: 0='1', 1='14998'] at /usr/bin/pt-online-schema-change line 11336.
      #
      # pt_online_schema_change:11242 7147 Retry fail: DBD::mysql::st execute failed: WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction [for Statement "INSERT LOW_PRIORITY IGNORE INTO `test_pt_online_schema_change`.`_test_a_new` (`test_id`, `column_a`, `active`, `created`, `modified`) SELECT `test_id`, `column_a`, `active`, `created`, `modified` FROM `test_pt_online_schema_change`.`test_a` FORCE INDEX(`PRIMARY`) WHERE ((`test_id` >= ?)) AND ((`test_id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 7147 copy nibble*/" with ParamValues: 0='1', 1='14998'] at /usr/bin/pt-online-schema-change line 11336.
      #
      # Retry:3845 7147 Try code did not succeed
      # CleanupTask:6550 7147 Calling cleanup task CODE(0x32f6878)
      # pt_online_schema_change:9319 7147 Clean up triggers
      ...

      We've tried lower chunk sizes on our production environment but we still hit this problem.

      Attachments

        Activity

          People

            carlos.salguero Carlos Salguero (Inactive)
            futuresam Sam Grant
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 2 days
                2d

                Smart Checklist