Percona Toolkit / PT-2048

pt-osc spawns excessive connections to the replica when executing on the source

Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: 3.3.1
    • Fix Version/s: 3.5.3
    • None
    • None
    • Yes
    • 2

    Description

      pt-osc opened roughly 1,600 new connections to the replica (TOTAL_CONNECTIONS rose from 1586 to 3167) within a few minutes of running on the source (master):

      -- TOTAL_CONNECTIONS on the replica when pt-osc starts running
      mysql> select users.*,now() from performance_schema.users where user like 'root%';
      +------+---------------------+-------------------+---------------------+
      | USER | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS | now()               |
      +------+---------------------+-------------------+---------------------+
      | root |                   2 |              1586 | 2022-02-12 05:29:06 |
      +------+---------------------+-------------------+---------------------+
      1 row in set (0.00 sec)
      
      -- TOTAL_CONNECTIONS on the replica when pt-osc ends
      mysql> select users.*,now() from performance_schema.users where user like 'root%';
      +------+---------------------+-------------------+---------------------+
      | USER | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS | now()               |
      +------+---------------------+-------------------+---------------------+
      | root |                   2 |              3167 | 2022-02-12 05:33:01 |
      +------+---------------------+-------------------+---------------------+
      1 row in set (0.00 sec)
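      The size of the jump can be checked with simple arithmetic on the two samples above. This minimal Python sketch uses only the counters and timestamps from those query results. Note that TOTAL_CONNECTIONS counts every connection made by `root`, including the mysql client session used to run the query itself, so the delta is an upper bound on what pt-osc opened.

```python
from datetime import datetime

# Samples copied from the two performance_schema.users queries above.
start_total, start_time = 1586, datetime(2022, 2, 12, 5, 29, 6)
end_total, end_time = 3167, datetime(2022, 2, 12, 5, 33, 1)

new_connections = end_total - start_total             # connections opened between samples
elapsed_s = (end_time - start_time).total_seconds()  # wall-clock seconds between samples
rate = new_connections / elapsed_s                    # average new connections per second

print(f"{new_connections} new connections in {elapsed_s:.0f} s (~{rate:.1f} per second)")
# prints: 1581 new connections in 235 s (~6.7 per second)
```

      CURRENT_CONNECTIONS stays at 2 in both samples, so these connections are short-lived: they are opened and closed again between the two queries.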

      The pt-osc run itself completes successfully:

      # pt-online-schema-change --alter-foreign-keys-method rebuild_constraints --channel $(hostname) --check-interval 60 --chunk-index PRIMARY --chunk-size-limit 255 --chunk-time 0.25 --critical-load Threads_running=1000 --host localhost --max-lag 60 --max-load Threads_running=100 --no-check-alter --no-check-plan --no-check-replication-filters --password verysecretpassword1^ --preserve-triggers --recurse 1 --recursion-method processlist --set-vars wait_timeout=10000 --slave-password verysecretpassword1^ --slave-user root --user root --execute --alter "force" D=dbtest,t=joinit1
      Found 1 slaves:
      phong-dinh-node4 -> phong-dinh-node4.lxd:socket
      Will check slave lag on:
      phong-dinh-node4 -> phong-dinh-node4.lxd:socket
      Operation, tries, wait:
        analyze_table, 10, 1
        copy_rows, 10, 0.25
        create_triggers, 10, 1
        drop_triggers, 10, 1
        swap_tables, 10, 1
        update_foreign_keys, 10, 1
      No foreign keys reference `dbtest`.`joinit1`; ignoring --alter-foreign-keys-method.
      Altering `dbtest`.`joinit1`...
      Creating new table...
      Created new table dbtest._joinit1_new OK.


      It is hard to catch what pt-osc does in the replica's processlist, but the slow log shows these connections doing nothing useful, only reading and setting session lock-wait timeouts:

      # administrator command: Quit;
      # Time: 2022-02-12T05:24:22.305769Z
      # User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
      # Schema: Last_errno: 0 Killed: 0
      # Query_time: 0.001378 Lock_time: 0.000052 Rows_sent: 1 Rows_examined: 1184 Rows_affected: 0 Bytes_sent: 211
      SET timestamp=1644643462;
      SHOW VARIABLES LIKE 'innodb\_lock_wait_timeout';
      # Time: 2022-02-12T05:24:22.305928Z
      # User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
      # Schema: Last_errno: 0 Killed: 0
      # Query_time: 0.000088 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0 Rows_affected: 0 Bytes_sent: 11
      SET timestamp=1644643462;
      SET SESSION innodb_lock_wait_timeout=1;
      # Time: 2022-02-12T05:24:22.307041Z
      # User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
      # Schema: Last_errno: 0 Killed: 0
      # Query_time: 0.001019 Lock_time: 0.000055 Rows_sent: 1 Rows_examined: 1184 Rows_affected: 0 Bytes_sent: 210
      SET timestamp=1644643462;
      SHOW VARIABLES LIKE 'lock\_wait_timeout';
      # Time: 2022-02-12T05:24:22.307221Z
      # User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
      # Schema: Last_errno: 0 Killed: 0
      # Query_time: 0.000102 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0 Rows_affected: 0 Bytes_sent: 11
      SET timestamp=1644643462;
      SET SESSION lock_wait_timeout=60; 
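      To quantify this from a full slow log rather than an excerpt, one option is to count distinct connection Ids per client host. The following is a minimal Python sketch, assuming the Percona Server slow-log header format shown above (`# User@Host: ... Id: N`). `connections_per_host` is a hypothetical helper, not part of Percona Toolkit. The sample string holds two entries copied from the excerpt, both from connection 1588.

```python
import re

# Two entries copied from the slow-log excerpt above (same connection Id).
SLOW_LOG = """\
# User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
SET SESSION innodb_lock_wait_timeout=1;
# User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
SET SESSION lock_wait_timeout=60;
"""

# Capture the client host and the connection Id from each entry header.
HEADER = re.compile(r"^# User@Host: \S+ @ (\S+) \[[^\]]*\]\s+Id:\s+(\d+)", re.MULTILINE)


def connections_per_host(log_text: str) -> dict[str, int]:
    """Count distinct connection Ids per client host in a slow log."""
    ids: dict[str, set[int]] = {}
    for host, conn_id in HEADER.findall(log_text):
        ids.setdefault(host, set()).add(int(conn_id))
    return {host: len(conn_ids) for host, conn_ids in ids.items()}


print(connections_per_host(SLOW_LOG))  # {'phong-dinh-node3.lxd': 1}
```

      Run against the whole log (for example `connections_per_host(open(path).read())`, where `path` is your slow-log location), a count in the thousands for the pt-osc host would be consistent with the connection churn described in this issue.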

      It seems pt-osc opens a new connection to the replica after each chunk just to check replication lag, which is unnecessarily expensive.
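      The difference between the suspected behaviour and the alternative can be illustrated with a toy model. This is not pt-osc code (pt-osc is written in Perl). It is a hypothetical Python sketch in which `FakeReplica`, `FakeConnection`, and both `lag_checks_*` functions are invented for illustration. It counts logins the way TOTAL_CONNECTIONS does, once when a connection is opened per chunk and once when a single connection is reused.

```python
class FakeReplica:
    """Hypothetical stand-in for the replica; counts logins like TOTAL_CONNECTIONS."""

    def __init__(self) -> None:
        self.total_connections = 0

    def connect(self) -> "FakeConnection":
        # Every call models a new login to the replica (plus its session setup).
        self.total_connections += 1
        return FakeConnection()


class FakeConnection:
    def lag_seconds(self) -> int:
        # A real lag check would query the replica's replication status here.
        return 0


def lag_checks_reconnecting(replica: FakeReplica, chunks: int) -> None:
    """Pattern suspected in this issue: a fresh connection for each chunk's lag check."""
    for _ in range(chunks):
        replica.connect().lag_seconds()


def lag_checks_reusing(replica: FakeReplica, chunks: int) -> None:
    """Alternative: connect once, then reuse the same handle for every check."""
    conn = replica.connect()
    for _ in range(chunks):
        conn.lag_seconds()


per_chunk, reused = FakeReplica(), FakeReplica()
lag_checks_reconnecting(per_chunk, 1000)
lag_checks_reusing(reused, 1000)
print(per_chunk.total_connections, reused.total_connections)  # 1000 1
```

      In a real tool the reused handle would also need to detect a dropped connection and reconnect. Reuse therefore trades a little error handling for avoiding the per-connection login and session setup (the SHOW VARIABLES / SET SESSION statements in the slow log above) on every chunk.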


            People

              Assignee: Unassigned
              Reporter: Phong Dinh (Inactive)
              Votes: 1
              Watchers: 4
