Details
-
Bug
-
Status: Done
-
Medium
-
Resolution: Fixed
-
5.7.30-31.43, 5.7.31-31.45
-
None
-
None
Description
It is possible to hit a writer node deadlock when ALTER TABLE is run few times on another node, concurrent with a read/write load on the writer node.
Reproduction steps:
$ dbdeployer --sandbox-binary /opt/percona_xtradb_cluster deploy --topology=pxc replication 5.7.30
Change wsrep_slave_threads to 8 or 16 (not sure how critical it is to do this, default with dbdeployer is 2).
$ sysbench /usr/share/sysbench/oltp_read_write.lua \ --db-driver=mysql --auto_inc=off \ --tables=1 --table_size=1000000 \ --mysql-host=127.0.0.1 --mysql-port=26630 \ --mysql-user=root --mysql-password='msandbox' \ --mysql-db=test prepare $ sysbench /usr/share/sysbench/oltp_read_write.lua \ --db-driver=mysql --db-ps-mode=disable --skip_trx=on \ --mysql-user=root --mysql-password='msandbox' \ --mysql-db=test --mysql-host=127.0.0.1 --mysql-port=26630 \ --mysql-ignore-errors=all --table_size=1000000 \ --tables=1 --threads=40 --report-interval=1 \ --time=0 --events=0 --rate=100 run | grep tps
On any other node run (default, TOI mode):
ALTER TABLE sbtest1 ENGINE=InnoDB; ALTER TABLE sbtest1 ENGINE=InnoDB; ... copy as many times as you want, 10 should do ...
After a while, writer node will get locked up with a similar processlist:
+-----+-------------+-----------------+------+---------+-------+------------------------------------------------------+----------------------------------------- | Id | User | Host | db | Command | Time | State | Info +-----+-------------+-----------------+------+---------+-------+------------------------------------------------------+----------------------------------------- | 1 | system user | | NULL | Sleep | 36435 | wsrep: aborter idle | NULL | 2 | system user | | NULL | Sleep | 32702 | wsrep: committed write set (1266293) | NULL | 4 | system user | | NULL | Sleep | 32693 | wsrep: committed write set (1269834) | NULL | 48 | root | localhost:42650 | test | Query | 32654 | wsrep: initiating pre-commit for write set (1336952) | UPDATE sbtest1 SET k=k+1 WHERE id=500510 ... | 86 | root | localhost:42742 | test | Query | 32654 | wsrep: initiating pre-commit for write set (1336940) | UPDATE sbtest1 SET k=k+1 WHERE id=404705 | 87 | root | localhost:42744 | test | Query | 32654 | wsrep: initiating pre-commit for write set (1336942) | UPDATE sbtest1 SET k=k+1 WHERE id=531825 | 88 | system user | | NULL | Sleep | 32719 | wsrep: applier idle | NULL ... | 98 | system user | | NULL | Sleep | 32701 | wsrep: committed write set (1266747) | NULL | 99 | system user | | | Sleep | 32689 | Waiting for table metadata lock | alter table test.sbtest1 engine=innodb | 100 | system user | | NULL | Sleep | 32696 | wsrep: committed write set (1268536) | NULL | 101 | system user | | NULL | Sleep | 32692 | wsrep: committed write set (1270277) | NULL | 105 | root | localhost:56760 | test | Sleep | 1266 | | NULL | 110 | root | localhost:59474 | test | Sleep | 1124 | | NULL | 111 | root | localhost:59476 | test | Sleep | 1124 | | NULL | 112 | root | localhost:52090 | NULL | Query | 0 | starting | show processlist +-----+-------------+-----------------+------+---------+-------+------------------------------------------------------+-----------------------------------------
Issue looks similar to PXC-2382.
Attached are:
- thread apply (all) bt outputs
- processlist
Over at highram there are few sets of core dumps and all the other details I gathered.