Percona Toolkit / PT-178

pt-online-schema-change appears to ignore the --check-slave-lag option

Details

    • Bug
    • Status: Open
    • High
    • Resolution: Unresolved
    • 3.0.3
    • 3.6.0
    • None

    Description

      The --check-slave-lag option does not appear to check replication delay: with --max-lag=1 and a slave lagging by about 60 seconds, the run below completes without waiting.

      Sample OSC with check-slave-lag
      [mon1]> pt-online-schema-change --recursion-method=none \
      >   --check-slave-lag=h=10.0.1.14,D=percona,P=3306 \
      >   --max-lag=1 --alter="drop column data" \
      >   --execute h=10.0.1.13,D=sandbox,t=test_osc
      No slaves found.  See --recursion-method if host db1 has slaves.
      Will check slave lag on:
      db2 -> 10.0.1.14:3306
      Operation, tries, wait:
        analyze_table, 10, 1
        copy_rows, 10, 0.25
        create_triggers, 10, 1
        drop_triggers, 10, 1
        swap_tables, 10, 1
        update_foreign_keys, 10, 1
      Altering `sandbox`.`test_osc`...
      Creating new table...
      Created new table sandbox._test_osc_new OK.
      Altering new table...
      Altered `sandbox`.`_test_osc_new` OK.
      2017-07-19T22:46:52 Creating triggers...
      2017-07-19T22:46:52 Created triggers OK.
      2017-07-19T22:46:52 Copying approximately 1 rows...
      2017-07-19T22:46:52 Copied rows OK.
      2017-07-19T22:46:52 Analyzing new table...
      2017-07-19T22:46:52 Swapping tables...
      2017-07-19T22:46:53 Swapped original and new tables OK.
      2017-07-19T22:46:53 Dropping old table...
      2017-07-19T22:46:53 Dropped old table `sandbox`.`_test_osc_old` OK.
      2017-07-19T22:46:53 Dropping triggers...
      2017-07-19T22:46:53 Dropped triggers OK.
      Successfully altered `sandbox`.`test_osc`.
      
      Sample of the delayed slave
      [mon1]> pt-heartbeat --monitor --host=10.0.1.14 --database=percona
      60.00s [ 21.25s,  4.25s,  1.42s ]
      60.00s [ 22.25s,  4.45s,  1.48s ]
      60.00s [ 23.25s,  4.65s,  1.55s ]
      60.00s [ 24.25s,  4.85s,  1.62s ]
      60.00s [ 25.25s,  5.05s,  1.68s ]
      60.00s [ 26.25s,  5.25s,  1.75s ]
      60.00s [ 27.25s,  5.45s,  1.82s ]
      60.00s [ 28.25s,  5.65s,  1.88s ]
      60.00s [ 29.25s,  5.85s,  1.95s ]
      60.00s [ 30.25s,  6.05s,  2.02s ]
      60.00s [ 31.25s,  6.25s,  2.08s ]
      60.03s [ 32.25s,  6.45s,  2.15s ]
      61.00s [ 33.27s,  6.65s,  2.22s ]
      61.00s [ 34.28s,  6.86s,  2.29s ]
      61.00s [ 35.30s,  7.06s,  2.35s ]
      

      Switching to a DSN table to monitor the single slave works as expected and the OSC waits.

      Sample OSC with DSN table
      [mon1]> pt-online-schema-change --recursion-method=dsn=D=percona,t=dsns \
        --max-lag=1 --alter="drop column data" \
        --execute h=10.0.1.13,D=sandbox,t=test_osc
      Found 1 slaves:
      db2 -> 10.0.1.14:3306
      Will check slave lag on:
      db2 -> 10.0.1.14:3306
      Operation, tries, wait:
        analyze_table, 10, 1
        copy_rows, 10, 0.25
        create_triggers, 10, 1
        drop_triggers, 10, 1
        swap_tables, 10, 1
        update_foreign_keys, 10, 1
      Altering `sandbox`.`test_osc`...
      Creating new table...
      Created new table sandbox._test_osc_new OK.
      Waiting forever for new table `sandbox`.`_test_osc_new` to replicate to db2...
      Waiting for db2:   0% 00:00 remain
      Waiting for db2:   0% 00:00 remain
      Altering new table...
      Altered `sandbox`.`_test_osc_new` OK.
      2017-07-19T22:56:14 Creating triggers...
      2017-07-19T22:56:14 Created triggers OK.
      2017-07-19T22:56:14 Copying approximately 99 rows...
      Replica lag is 59 seconds on db2.  Waiting.
      Replica lag is 60 seconds on db2.  Waiting.
      Replica lag is 60 seconds on db2.  Waiting.
      Replica lag is 59 seconds on db2.  Waiting.
      Replica lag is 59 seconds on db2.  Waiting.
      Replica lag is 60 seconds on db2.  Waiting.
      Replica lag is 60 seconds on db2.  Waiting.
      Replica lag is 59 seconds on db2.  Waiting.
      Replica lag is 60 seconds on db2.  Waiting.
      Replica lag is 59 seconds on db2.  Waiting.
      Replica lag is 59 seconds on db2.  Waiting.
      Replica lag is 60 seconds on db2.  Waiting.
      Replica lag is 59 seconds on db2.  Waiting.
      Replica lag is 60 seconds on db2.  Waiting.
      Replica lag is 59 seconds on db2.  Waiting.
      Replica db2 is stopped.  Waiting.
      2017-07-19T23:04:39 Copied rows OK.
      2017-07-19T23:04:39 Analyzing new table...
      2017-07-19T23:04:39 Swapping tables...
      2017-07-19T23:04:39 Swapped original and new tables OK.
      2017-07-19T23:04:39 Dropping old table...
      2017-07-19T23:04:40 Dropped old table `sandbox`.`_test_osc_old` OK.
      2017-07-19T23:04:40 Dropping triggers...
      2017-07-19T23:04:40 Dropped triggers OK.
      Successfully altered `sandbox`.`test_osc`.
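
For anyone applying the workaround, the DSN table named in --recursion-method=dsn=D=percona,t=dsns could be set up roughly as sketched below. The column layout follows the table structure described in the pt-online-schema-change documentation for the dsn recursion method. The row value h=10.0.1.14,P=3306 and the file name dsns_setup.sql are assumptions based on the hosts shown in this report, not something taken from the original setup.

```shell
# Write the DSN-table setup SQL to a file so it can be reviewed before use.
# The file name dsns_setup.sql is illustrative only.
cat > dsns_setup.sql <<'SQL'
-- Table read by --recursion-method=dsn=D=percona,t=dsns
CREATE TABLE IF NOT EXISTS percona.dsns (
  id        INT NOT NULL AUTO_INCREMENT,
  parent_id INT DEFAULT NULL,
  dsn       VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);

-- One row per slave to monitor (assumed DSN for db2 in this report)
INSERT INTO percona.dsns (parent_id, dsn) VALUES (NULL, 'h=10.0.1.14,P=3306');
SQL

# After review, apply it on the host the tool connects to (the master here):
#   mysql -h 10.0.1.13 < dsns_setup.sql
```

The tool reads this table through the connection given on the command line, so the table should live on (or replicate from) the master at 10.0.1.13.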
      

      The slave was configured with MASTER_DELAY=60, and pt-heartbeat was used both to monitor the lag and to inject traffic that maintains the delay. Because the DSN table is available as a workaround, I set the priority to Low, but the issue itself is quite significant.
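
To reproduce the delayed-slave environment, something along these lines should work. This is a sketch under assumptions: the exact commands used for this report are not recorded, the file name delay_slave.sql is illustrative, and the pt-heartbeat invocation assumes the heartbeat table lives in the percona database, as the --monitor command above suggests.

```shell
# SQL that delays replication on the slave by 60 seconds.
# A full STOP SLAVE / START SLAVE is used so it works across MySQL versions
# that support MASTER_DELAY.
cat > delay_slave.sql <<'SQL'
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 60;
START SLAVE;
SQL

# After review, apply on the slave (10.0.1.14 in this report):
#   mysql -h 10.0.1.14 < delay_slave.sql
#
# Then write heartbeat rows on the master so there is steady traffic to delay:
#   pt-heartbeat --update --create-table --database percona --host 10.0.1.13 --daemonize
```

With both in place, pt-heartbeat --monitor against the slave should settle near 60 seconds, matching the sample above.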

            People

              carlos.salguero Carlos Salguero (Inactive)
              ceri.williams Ceri Williams