Uploaded image for project: 'Percona Toolkit'
  1. Percona Toolkit
  2. PT-1747

pt-online-schema-change: metadata lock can break database for rebuild_constraints

Details

    • Yes
    • Yes
    • Yes

    Description

      Summary: 
      pt-online-schema-change was bringing the database into a broken state when applying the "rebuild_constraints" foreign keys modification method if any of child tables were blocked by the metadata lock

      The database left in broken state if any of child tables blocked by metadata lock:

      create table t_prnt(prnt_id int auto_increment primary key);
      create table t_chld(chld_id int auto_increment primary key, prnt_id int);
      alter table t_chld add constraint FOREIGN KEY (prnt_id) REFERENCES t_prnt(prnt_id) ON DELETE CASCADE;
      replace into t_prnt values (1);
      insert into t_chld (prnt_id) values (1);
      
      # create an mdl against child table:
      mysql -e 'begin; select sleep(3600) from test.t_chld;'
      
      PTDEBUG=1 ./pt-online-schema-change --execute --alter-foreign-keys-method rebuild_constraints D=test,t=t_prnt --alter 'add column c int
      

      There is a part from ptdebug output:

      2019-07-20T11:01:33 Swapping tables...
      # Retry:3823 6891 Try 1 of 10
      # pt_online_schema_change:10617 6891 RENAME TABLE `test`.`t_prnt` TO `test`.`_t_prnt_old`, `test`.`_t_prnt_new` TO `test`.`t_prnt`
      ....
      # Retry:3835 6891 Calling wait code
      # Retry:3823 6891 Try 10 of 10
      # pt_online_schema_change:10915 6891 ALTER TABLE `test`.`t_chld` DROP FOREIGN KEY `t_chld_ibfk_1`, ADD CONSTRAINT `_t_chld_ibfk_1` FOREIGN KEY (`prnt_id`) REFERENCES `test`.`t_prnt` (`prnt_id`) ON DELETE CASCADE
      # Retry:3829 6891 Try code failed: DBD::mysql::db do failed: Lock wait timeout exceeded; try restarting transaction [for Statement "ALTER TABLE `test`.`t_chld` DROP FOREIGN KEY `t_chld_ibfk_1`, ADD CONSTRAINT `_t_chld_ibfk_1` FOREIGN KEY (`prnt_id`) REFERENCES `test`.`t_prnt` (`prnt_id`) ON DELETE CASCADE"] at ./pt-online-schema-change line 10916.
      # 
      # Retry:3845 6891 Try code did not succeed
      # CleanupTask:6584 6891 Calling cleanup task CODE(0x5649d741e2f8)
      # pt_online_schema_change:9396 6891 Clean up triggers
      2019-07-20T11:11:48 Dropping triggers...
      # Retry:3823 6891 Try 1 of 10
      # pt_online_schema_change:11296 6891 DROP TRIGGER IF EXISTS `test`.`pt_osc_test_t_prnt_del`
      # Retry:3840 6891 Try code succeeded
      # Retry:3823 6891 Try 1 of 10
      # pt_online_schema_change:11296 6891 DROP TRIGGER IF EXISTS `test`.`pt_osc_test_t_prnt_upd`
      # Retry:3840 6891 Try code succeeded
      # Retry:3823 6891 Try 1 of 10
      # pt_online_schema_change:11296 6891 DROP TRIGGER IF EXISTS `test`.`pt_osc_test_t_prnt_ins`
      # Retry:3840 6891 Try code succeeded
      2019-07-20T11:11:49 Dropped triggers OK.
      # pt_online_schema_change:9159 6891 Clean up new table
      # TableParser:3445 6891 Checking `test`.`_t_prnt_new`
      # TableParser:3451 6891 SHOW TABLES FROM `test` LIKE '\_t\_prnt\_new'
      # TableParser:3462 6891 Table does not exist
      # pt_online_schema_change:9165 6891 New table exists: no
      # pt_online_schema_change:9036 6891 Clean up done, report if orig table was altered
      Altered `test`.`t_prnt` but there were errors or warnings.
      Error updating foreign key constraints: 2019-07-20T11:11:48 DBD::mysql::db do failed: Lock wait timeout exceeded; try restarting transaction [for Statement "ALTER TABLE `test`.`t_chld` DROP FOREIGN KEY `t_chld_ibfk_1`, ADD CONSTRAINT `_t_chld_ibfk_1` FOREIGN KEY (`prnt_id`) REFERENCES `test`.`t_prnt` (`prnt_id`) ON DELETE CASCADE"] at ./pt-online-schema-change line 10916.
      
      # NibbleIterator:6175 6891 Finish nibble_sth
      # NibbleIterator:6175 6891 Finish explain_nibble_sth
      # Cxn:4096 6891 Destroying cxn
      # Cxn:4106 6891 DBI::db=HASH(0x5649d74176b0) Disconnecting dbh on 2228f53cb7c8 
      # Cxn:4096 6891 Destroying cxn
      # Cxn:4106 6891 DBI::db=HASH(0x5649d7410848) Disconnecting dbh on 2228f53cb7c8 
      

      At the end there are:
      _t_prnt_old old table
      t_chld references to the old table: CONSTRAINT `t_chld_ibfk_1` FOREIGN KEY (`prnt_id`) REFERENCES `_t_prnt_old` (`prnt_id`) ON DELETE CASCADE

      Due to online nature of pt-osc, the application could continue rely on delete cascade:
      Unfortunately
      delete from t_prnt where prnt_id=1;
      will not delete rows from t_chld after rename stage.
      At the same time values inserted during rebuild constraints stage are not going to the old table.

      Thus in order to recover after such faults:
      1. stop any update against parent and child tables
      2. investigate every delete and update statement in binary logs against t_prnt and apply same changes against _t_prnt_old

      Possible fixes:
      Do not rename _t_prnt_new to t_prnt until rebuild finished:
      Application will get errors, but new and old tables will be exactly the same.

      Attachments

        Activity

          People

            Unassigned Unassigned
            nickolay.ihalainen Nickolay Ihalainen
            Votes:
            1 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - Not Specified
                Not Specified
                Logged:
                Time Spent - 3 days, 2 hours, 40 minutes
                3d 2h 40m

                Smart Checklist