Details
Bug
Status: Done
Medium
Resolution: Fixed
3.3.1
None
None
Yes
2
Description
pt-osc opened about 1600 connections to the replica after running for about 3 minutes on the source/master:
--TOTAL_CONNECTIONS in the replica when pt-osc start running
mysql> select users.*,now() from performance_schema.users where user like 'root%';
+------+---------------------+-------------------+---------------------+
| USER | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS | now()               |
+------+---------------------+-------------------+---------------------+
| root |                   2 |              1586 | 2022-02-12 05:29:06 |
+------+---------------------+-------------------+---------------------+
1 row in set (0.00 sec)

--TOTAL_CONNECTIONS in the replica when pt-osc ends
mysql> select users.*,now() from performance_schema.users where user like 'root%';
+------+---------------------+-------------------+---------------------+
| USER | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS | now()               |
+------+---------------------+-------------------+---------------------+
| root |                   2 |              3167 | 2022-02-12 05:33:01 |
+------+---------------------+-------------------+---------------------+
1 row in set (0.00 sec)
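To put the two samples above in perspective, the connection rate can be worked out directly. This is only a rough sketch: it assumes every new `root` connection counted between the two queries was opened by pt-osc, which the counters alone cannot prove. The function name `connection_rate` is made up for this illustration.

```python
from datetime import datetime

# (timestamp, TOTAL_CONNECTIONS) copied from the two query results above.
start = (datetime(2022, 2, 12, 5, 29, 6), 1586)
end = (datetime(2022, 2, 12, 5, 33, 1), 3167)


def connection_rate(first, second):
    """Return (new_connections, elapsed_seconds, connections_per_second)."""
    new_connections = second[1] - first[1]
    elapsed = (second[0] - first[0]).total_seconds()
    return new_connections, elapsed, new_connections / elapsed


new_connections, elapsed, rate = connection_rate(start, end)
print(f"{new_connections} new connections in {elapsed:.0f} s ({rate:.1f}/s)")
# prints: 1581 new connections in 235 s (6.7/s)
```

CURRENT_CONNECTIONS stays at 2 in both samples, so these connections are short-lived: they are opened and closed again rather than accumulating.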
pt-osc completes successfully:
# pt-online-schema-change --alter-foreign-keys-method rebuild_constraints --channel $(hostname) --check-interval 60 --chunk-index PRIMARY --chunk-size-limit 255 --chunk-time 0.25 --critical-load Threads_running=1000 --host localhost --max-lag 60 --max-load Threads_running=100 --no-check-alter --no-check-plan --no-check-replication-filters --password verysecretpassword1^ --preserve-triggers --recurse 1 --recursion-method processlist --set-vars wait_timeout=10000 --slave-password verysecretpassword1^ --slave-user root --user root --execute --alter "force" D=dbtest,t=joinit1
Found 1 slaves:
phong-dinh-node4 -> phong-dinh-node4.lxd:socket
Will check slave lag on:
phong-dinh-node4 -> phong-dinh-node4.lxd:socket
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
No foreign keys reference `dbtest`.`joinit1`; ignoring --alter-foreign-keys-method.
Altering `dbtest`.`joinit1`...
Creating new table...
Created new table dbtest._joinit1_new OK.
It's hard to capture what pt-osc did in the replica's processlist, but the slow log shows that these connections do nothing useful:
# administrator command: Quit;
# Time: 2022-02-12T05:24:22.305769Z
# User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
# Schema: Last_errno: 0 Killed: 0
# Query_time: 0.001378 Lock_time: 0.000052 Rows_sent: 1 Rows_examined: 1184 Rows_affected: 0 Bytes_sent: 211
SET timestamp=1644643462;
SHOW VARIABLES LIKE 'innodb\_lock_wait_timeout';
# Time: 2022-02-12T05:24:22.305928Z
# User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
# Schema: Last_errno: 0 Killed: 0
# Query_time: 0.000088 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0 Rows_affected: 0 Bytes_sent: 11
SET timestamp=1644643462;
SET SESSION innodb_lock_wait_timeout=1;
# Time: 2022-02-12T05:24:22.307041Z
# User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
# Schema: Last_errno: 0 Killed: 0
# Query_time: 0.001019 Lock_time: 0.000055 Rows_sent: 1 Rows_examined: 1184 Rows_affected: 0 Bytes_sent: 210
SET timestamp=1644643462;
SHOW VARIABLES LIKE 'lock\_wait_timeout';
# Time: 2022-02-12T05:24:22.307221Z
# User@Host: root[root] @ phong-dinh-node3.lxd [10.124.33.40] Id: 1588
# Schema: Last_errno: 0 Killed: 0
# Query_time: 0.000102 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0 Rows_affected: 0 Bytes_sent: 11
SET timestamp=1644643462;
SET SESSION lock_wait_timeout=60;
It seems pt-osc opens a new connection to the replica after each chunk just to check replication lag, which is unnecessarily expensive.
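The cost difference between the suspected behavior and the expected one can be illustrated with a small simulation. This is not pt-osc code and does not talk to MySQL: `FakeReplica`, `FakeConnection`, `copy_with_reconnect` and `copy_with_reuse` are hypothetical names invented for this sketch, and the chunk count is arbitrary.

```python
class FakeConnection:
    """Stand-in for a replica connection (hypothetical, no real MySQL)."""

    def seconds_behind(self):
        # Pretend the replica never lags so the loops below never wait.
        return 0


class FakeReplica:
    """Stand-in for a replica that counts every connection opened to it."""

    def __init__(self):
        self.total_connections = 0

    def connect(self):
        self.total_connections += 1
        return FakeConnection()


def copy_with_reconnect(replica, chunks):
    """Suspected behavior: open a fresh connection for each lag check."""
    for _ in range(chunks):
        conn = replica.connect()
        conn.seconds_behind()


def copy_with_reuse(replica, chunks):
    """Expected behavior: open one connection and reuse it for every check."""
    conn = replica.connect()
    for _ in range(chunks):
        conn.seconds_behind()


reconnecting = FakeReplica()
copy_with_reconnect(reconnecting, chunks=1000)

reusing = FakeReplica()
copy_with_reuse(reusing, chunks=1000)

print(reconnecting.total_connections, reusing.total_connections)
# prints: 1000 1
```

With a per-chunk reconnect, the connection count grows linearly with the number of chunks, and each new connection also repeats the session setup statements seen in the slow log.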