Uploaded image for project: 'Percona XtraDB Cluster'
  1. Percona XtraDB Cluster
  2. PXC-2382

Deadlock for alter table (online DDL) and conflicting replace statement

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: 5.7.24-31.33
    • Fix Version/s: 5.7.27-31.39
    • Component/s: None
    • Labels:
      None

      Description

      REPLACE statements changing the same row running in parallel on multiple nodes locking alter table with MDL.

      mysql> show processlist;
      +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+
      | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
      +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+
      | 1 | system user | | NULL | Sleep | 13955 | wsrep: applier idle | NULL | 0 | 0 |
      | 2 | system user | | NULL | Sleep | 13955 | wsrep: aborter idle | NULL | 0 | 0 |
      | 4 | system user | | NULL | Sleep | 13954 | wsrep: applier idle | NULL | 0 | 0 |
      | 26 | root | localhost | NULL | Query | 3424 | Waiting for table metadata lock | alter table test.t add key(pad) | 0 | 0 |
      | 27 | root | localhost | NULL | Query | 2998 | wsrep: initiating pre-commit for write set (8065701) | commit | 0 | 0 |
      | 28 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 29 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 30 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 31 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 32 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 33 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 34 | root | localhost | NULL | Query | 1250 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 44 | root | localhost | NULL | Query | 0 | starting | show processlist | 0 | 0 |
      +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+

      Second node:

       mysql> show processlist;
      +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+
      | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
      +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+
      | 1 | system user | | NULL | Sleep | 11203 | wsrep: aborter idle | NULL | 0 | 0 |
      | 2 | system user | | | Sleep | 3725 | Waiting for table metadata lock | alter table test.t add key(pad) | 0 | 0 |
      | 4 | system user | | NULL | Sleep | 3767 | wsrep: committed write set (8000164) | NULL | 0 | 0 |
      | 7 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 8 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 9 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 10 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 11 | root | localhost | NULL | Killed | 3299 | wsrep: initiating pre-commit for write set (8065700) | commit | 0 | 0 |
      | 12 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 13 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 14 | root | localhost | NULL | Query | 1633 | Waiting for table metadata lock | replace test.t (id,pad) values(1000000,rand()) | 0 | 0 |
      | 21 | root | localhost | NULL | Query | 0 | starting | show processlist | 0 | 0 |
      +----+-------------+-----------+------+---------+-------+------------------------------------------------------+------------------------------------------------+-----------+---------------+
      

       Very similar issue happens on production system:

      1. There is an application thread executing several replace queries in transaction
      2. If commit returns error application re-sends last replace statement and executing commit statement
      3. One of these commit staled in "wsrep: initiating pre-commit for write set" (doing galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::self_cancel)
      4. alter table statement (online index creation) is trying to get exclusive MDL.

      How to Reproduce

      I have a script:

      #!/bin/bash
      for i in `seq 8` ; do
       docker exec pxc1 bash -c "while true ; do echo \"replace test.t (id,pad) values(1000000,rand());commit;\";done|mysql -uroot -psecret -f" &> pxc1.out.$i &
       docker exec pxc2 bash -c "while true ; do echo \"replace test.t (id,pad) values(1000000,rand());commit;\";done|mysql -uroot -psecret -f" &> pxc2.out.$i &
      done
      wait

      and a cluster created from:

      docker run -it --name pxc1 -e CLUSTER_NAME=pxc -e MYSQL_ROOT_PASSWORD=secret percona/percona-xtradb-cluster
      docker run -it --name pxc2 -v $PWD/entrypoint.sh:/entrypoint.sh -e CLUSTER_NAME=pxc -e MYSQL_ROOT_PASSWORD=secret percona/percona-xtradb-cluster --wsrep_cluster_address=gcomm://172.17.0.2

      entrypoint.sh in my local directory is a modified one from percona/percona-xtradb-cluster:
      I've commented out mysql_tzinfo_to_sql and added SET @@SESSION.wsrep_on=0; before SET @@SESSION.SQL_LOG_BIN=0;

      On both nodes I've set global trx_commit=0, sync_binlog=0, autocommit=0

      The table

      create table t(id int primary key, pad char(200));

      filled the table with 8M records (numbers from 1-8M, pad = id).
      when load.sh workload started, executed alter table test.t add key(pad); on the first node.
      after 2k seconds killed commit in "wsrep: initiating replication for write set (-1) " state on the first node and after that commit (non-system thread) on the second node.

      From this point I have a stale with MDL waiting

        Attachments

          Activity

            People

            Assignee:
            krunal.bauskar Krunal Bauskar (Inactive)
            Reporter:
            nickolay.ihalainen Nickolay Ihalainen
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - Not Specified
                Not Specified
                Logged:
                Time Spent - 1 day, 5 hours, 45 minutes
                1d 5h 45m

                  Smart Checklist