-
Type:
Bug
-
Status: Done
-
Priority:
Medium
-
Resolution: Fixed
-
Affects Version/s: 5.7.24-31.33
-
Fix Version/s: 5.7.27-31.39
-
Component/s: None
-
Labels:None
REPLACE statements changing the same row running in parallel on multiple nodes locking alter table with MDL.
mysql> show processlist; +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+ | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+ | 1 | system user | | NULL | Sleep | 13955 | wsrep: applier idle | NULL | 0 | 0 | | 2 | system user | | NULL | Sleep | 13955 | wsrep: aborter idle | NULL | 0 | 0 | | 4 | system user | | NULL | Sleep | 13954 | wsrep: applier idle | NULL | 0 | 0 | | 26 | root | localhost | NULL | Query | 3424 | Waiting for table metadata lock | alter table test.t add key(pad) | 0 | 0 | | 27 | root | localhost | NULL | Query | 2998 | wsrep: initiating pre-commit for write set (8065701) | commit | 0 | 0 | | 28 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 29 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 30 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 31 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 32 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 33 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 34 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 44 | root | localhost | NULL | Query | 0 | starting | show processlist | 0 | 0 | +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+
Second node:
mysql> show processlist; +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+ | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+ | 1 | system user | | NULL | Sleep | 11203 | wsrep: aborter idle | NULL | 0 | 0 | | 2 | system user | | | Sleep | 3725 | Waiting for table metadata lock | alter table test.t add key(pad) | 0 | 0 | | 4 | system user | | NULL | Sleep | 3767 | wsrep: committed write set (8000164) | NULL | 0 | 0 | | 7 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 8 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 9 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 10 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 11 | root | localhost | NULL | Killed | 3299 | wsrep: initiating pre-commit for write set (8065700) | commit | 0 | 0 | | 12 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 13 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 14 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 | | 21 | root | localhost | NULL | Query | 0 | starting | show processlist | 0 | 0 | +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+
Very similar issue happens on production system:
- There is an application thread executing several replace queries in transaction
- If commit returns error application re-sends last replace statement and executing commit statement
- One of these commit staled in "wsrep: initiating pre-commit for write set" (doing galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::self_cancel)
- alter table statement (online index creation) is trying to get exclusive MDL.
How to Reproduce
I have a script:
#!/bin/bash for i in `seq 8` ; do docker exec pxc1 bash -c "while true ; do echo \"replace test.t (id,pad) values(1000000,rand());commit;\";done|mysql -uroot -psecret -f" &> pxc1.out.$i & docker exec pxc2 bash -c "while true ; do echo \"replace test.t (id,pad) values(1000000,rand());commit;\";done|mysql -uroot -psecret -f" &> pxc2.out.$i & done wait
and a cluster created from:
docker run -it --name pxc1 -e CLUSTER_NAME=pxc -e MYSQL_ROOT_PASSWORD=secret percona/percona-xtradb-cluster docker run -it --name pxc2 -v $PWD/entrypoint.sh:/entrypoint.sh -e CLUSTER_NAME=pxc -e MYSQL_ROOT_PASSWORD=secret percona/percona-xtradb-cluster --wsrep_cluster_address=gcomm://172.17.0.2
entrypoint.sh in my local directory is a modified one from percona/percona-xtradb-cluster:
I've commented out mysql_tzinfo_to_sql and added SET @@SESSION.wsrep_on=0; before SET @@SESSION.SQL_LOG_BIN=0;
On both nodes I've set global trx_commit=0, sync_binlog=0, autocommit=0
The table
create table t(id int primary key, pad char(200));
filled the table with 8M records (numbers from 1-8M, pad = id).
when load.sh workload started, executed alter table test.t add key(pad); on the first node.
after 2k seconds killed commit in "wsrep: initiating replication for write set (-1) " state on the first node and after that commit (non-system thread) on the second node.
From this point I have a stale with MDL waiting