-
Type:
Bug
-
Status: Done
-
Priority:
Medium
-
Resolution: Fixed
-
Affects Version/s: 5.6.45-28.36, 5.7.27-31.39
-
Fix Version/s: 5.6.47-28.40, 5.7.29-31.43
-
Component/s: None
-
Labels:None
In the situation when a node runs a procedure, accessing a table receives update from another node which could be resolved using InnoDB Buffer Pool Prefetch cache, this node behaves slowly having many "WSREP: BF lock wait long"
How to Repeat.
1. Download *tar.gz package, cd to the mysql-test directory:
2. Start MTR:
]$ ./mtr --suite=galera_3nodes --mysqld=--wsrep-provider=`pwd`/../lib/libgalera_smm.so --start galera_certification_ccc --mysqld=--wsrep_debug=on &
3. Load attached dump:
../bin/mysql -h127.0.0.1 -P13010 -uroot test < t1.sql
4. Connect to node 1 and create a procedure:
create procedure p1() begin DECLARE done INT DEFAULT FALSE; DECLARE c INT DEFAULT 0; DECLARE pk1v VARCHAR(32); DECLARE pk2v VARCHAR(127); DECLARE orderval INT DEFAULT 0; DECLARE printed INT DEFAULT FALSE; declare pk2c cursor for SELECT DISTINCT pk2, pk1, ordercol FROM t1 ORDER BY ordercol; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE; start transaction; update t1 set sp1=0, sp2=UNIX_TIMESTAMP() where sp1=1; open pk2c; pk2c_loop: LOOP FETCH pk2c INTO pk2v, pk1v, orderval; IF done THEN LEAVE pk2c_loop; END IF; SET c=c+1; UPDATE t1 SET sp1=1, sp2=UNIX_TIMESTAMP() where pk2=pk2v and pk1=pk1v; IF orderval > 1075858151 THEN IF NOT printed THEN SELECT 'Now run updates'; SELECT SLEEP(100); SET printed=TRUE; ELSE LEAVE pk2c_loop; END IF; END IF; END LOOP; close pk2c; update t1 set sp1=10, sp2=UNIX_TIMESTAMP() WHERE pk1='animi iste et odio cupiditate qu' and pk2='voluptatem ipsa dolores ducimus.'; update t1 set sp1=11, sp2=UNIX_TIMESTAMP() WHERE pk1='animi iste et odio cupiditate qu' and pk2='voluptatem ipsa dolores ducimus.'; COMMIT; END
5. Run procedure for the first time:
mysql> call p1()| +-----------------+ | Now run updates | +-----------------+ | Now run updates | +-----------------+ 1 row in set (3.22 sec) +------------+ | SLEEP(100) | +------------+ | 0 | +------------+ 1 row in set (1 min 43.22 sec) Query OK, 0 rows affected (1 min 43.36 sec)
6. Once you see "Now run updates" go to another terminal, connect to node 2 and run updates:
../bin/mysql -h127.0.0.1 -P13014 -uroot test -e "update t1 set u=2 WHERE pk1='a alias rerum consequuntur eveni' and pk2='excepturi laboriosam incidunt error in.';" & ../bin/mysql -h127.0.0.1 -P13014 -uroot test -e "update t1 set u=1 WHERE pk1='a alias rerum consequuntur eveni' and pk2='excepturi laboriosam incidunt error in.';" &
So far, so good. You won't see any issue.
7. Once the procedure finishes executing run it second time:
mysql> call p1()| +-----------------+ | Now run updates | +-----------------+ | Now run updates | +-----------------+ 1 row in set (3.36 sec)
8. Once you see "Now run updates" go to another window and run:
$ ../bin/mysql -h127.0.0.1 -P13014 -uroot test -e "update t1 set u=2 WHERE pk1='a alias rerum consequuntur eveni' and pk2='excepturi laboriosam incidunt error in.';" &
9. Check SHOW PROCESSLIST on the first node:
$ time ../bin/mysql -h127.0.0.1 -P13010 -uroot -e "show processlist" +----+-------------+-----------------+------+---------+------+-------------------------------------------------+------------------+-----------+---------------+ | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | +----+-------------+-----------------+------+---------+------+-------------------------------------------------+------------------+-----------+---------------+ | 1 | system user | | NULL | Sleep | 2064 | wsrep: aborter idle | NULL | 0 | 0 | | 2 | system user | | NULL | Sleep | 1593 | System lock | NULL | 0 | 0 | | 11 | root | localhost:34824 | test | Query | 1593 | wsrep: initiating pre-commit for write set (12) | COMMIT | 2 | 35016 | | 14 | root | localhost:49304 | NULL | Query | 0 | starting | show processlist | 0 | 0 | +----+-------------+-----------------+------+---------+------+-------------------------------------------------+------------------+-----------+---------------+ real 0m30.022s user 0m0.010s sys 0m0.008s
Notice that thread in "System lock" and "wsrep: initiating pre-commit for write set (12)" states are runing for too long. Also SHOW PROCESSLIST command runs ridicoulosly 30 seconds.
10. Now switch to the terminal where you have procedure on the first node running and find it was sleeping less than requested 100 seconds:
+------------+ | SLEEP(100) | +------------+ | 1 | +------------+ 1 row in set (6.87 sec)
11. Check error log:
2019-10-24 15:00:37 41723 [Note] WSREP: cluster conflict due to high priority abort for threads: 2019-10-24 15:00:37 41723 [Note] WSREP: Winning thread: THD: 2, mode: applier, state: executing, conflict: no conflict, seqno: 11 SQL: (null) 2019-10-24 15:00:37 41723 [Note] WSREP: Victim thread: THD: 8, mode: local, state: executing, conflict: no conflict, seqno: -1 SQL: SELECT SLEEP(100) 2019-10-24 15:00:37 41723 [Note] WSREP: BF kill (1, seqno: 11), victim: (8) trx: 1385 2019-10-24 15:00:37 41723 [Note] WSREP: Aborting query: SELECT SLEEP(100) conf 0 trx: 1385 2019-10-24 15:00:37 41723 [Note] WSREP: kill trx QUERY_EXEC for 1385 2019-10-24 15:00:37 41723 [Note] WSREP: kill query for: 8 2019-10-24 15:00:40 41723 [Note] WSREP: wsrep_sync_wait: thd->variables.wsrep_sync_wait = 15, mask = 1 WSREP: BF lock wait long
Killing query, executing the procedure, does not help. All subsequent accesses to this node continue running slowly, queries on the t1 table hanging or getting canceled due to lock wait timeout.
Log provided for version 5.6.45, but the issue is repeatable with version 5.7.27 too.
- relates to
-
PXC-3243 Modify the BF-abort process to propagate and abort and retry the Stored Procedure instead of the statement
-
- Done
-
-
PXC-2687 BF abort of a SP causes a write-after-free error
-
- On Hold
-
-
PXC-2910 BF aborting a stored procedure with an exception handler can hang PXC
-
- On Hold
-
-
PXC-2729 BF-aborting a SLEEP in an SP can cause the cluster to deadlock
-
- Done
-