Uploaded image for project: 'Percona Server for MySQL'
  1. Percona Server for MySQL
  2. PS-7197

Multi-threaded Replica hangs when slave_trans_retires gets exhausted

Details

    • Bug
    • Status: Done
    • Medium
    • Resolution: Fixed
    • 8.0.19-10, 5.7.30-33
    • 8.0.21-12, 5.7.32-35
    • None
    • None

    Description

      Originally reported at https://bugs.mysql.com/bug.php?id=99440

      Description

      Multi-threaded slave randomly stacks, fails with lock wait timeout, or asserts.

      How to Repeat

      1. Start replication setup using MTR:

      $ ./mtr --suite=rpl --start --mysqld=--binlog_format=mixed --mysqld=--slave_preserve_commit_order=1 --mysqld=--slave_parallel_workers=16 --mysqld=--slave_parallel_type=logical_clock --mysqld=--binlog_order_commits=1 --mysqld=--log_slave_updates=1 rpl_alter &
      

      2. Connect to the slave and run:

      set global innodb_lock_wait_timeout=1;
      CHANGE MASTER TO master_host='127.0.0.1', master_port=13000, master_user='root';
      start slave;
      

      3. On the master:

      CREATE TABLE `table_name` (   `id` bigint(20) NOT NULL AUTO_INCREMENT,   `a_id` bigint(20) NOT NULL,   `b_id` bigint(20) NOT NULL,   `c_id` smallint(5) NOT NULL,   `d_id` varchar(255) NOT NULL DEFAULT '',   `created_at` datetime NOT NULL,   PRIMARY KEY (`id`),   UNIQUE KEY `unique_table_name` (`a_id`,`b_id`),   KEY `index_b` (`b_id`) ) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COMMENT='table definition';
      

      4. Run mysqlslap job:

      $ ../bin/mysqlslap -uroot -h127.0.0.1 -P13000  --query=' INSERT IGNORE INTO table_name (d_id, b_id, c_id, a_id) VALUES ('20123', 1234203, 3, 12345212);' --create-schema=test --iterations=100 --concurrency=100
      

      5. On the slave observe:

      5.1. Slowly moving Exec_Master_Log_Pos in the SHOW SLAVE STATUS output
      5.2. Long-waiting threads for dependent or preceding transaction
      5.3. After some time slave error:

                     Last_SQL_Errno: 3030
                     Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 4 failed executing transaction 'ANONYMOUS' at master log master-bin.000001, end_log_pos 14803. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
      

      5.4. Then restart slave (if you are not lucky - multiple times) and observe the crash:

      2020-07-21T14:22:45.507956Z 56 [Warning] [MY-010584] [Repl] Slave SQL for channel '': Worker 10 failed executing transaction 'ANONYMOUS' at master log master-bin.000001, end_log_pos 30875; Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'INSERT IGNORE INTO table_name (d_id, b_id, c_id, a_id) VALUES (20123, 1234203, 3, 12345212)', Error_code: MY-001205
      mysqld: /home/sveta/src/percona-server/sql/sql_error.h:381: uint Diagnostics_area::mysql_errno() const: Assertion `m_status == DA_ERROR' failed.
      14:22:45 UTC - mysqld got signal 6 ;
      Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
      Thread pointer: 0x7f7bb40008d0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 7f7c583cfcd0 thread_stack 0x46000
      /home/sveta/build/ps-8.0/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x59) [0x56520ac5ce7f]
      /home/sveta/build/ps-8.0/bin/mysqld(handle_fatal_signal+0x2ac) [0x565209a2351a]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f7c9579e3c0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7f7c94e2d18b]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7f7c94e0c859]
      /lib/x86_64-linux-gnu/libc.so.6(+0x25729) [0x7f7c94e0c729]
      /lib/x86_64-linux-gnu/libc.so.6(+0x36f36) [0x7f7c94e1df36]
      /home/sveta/build/ps-8.0/bin/mysqld(Diagnostics_area::mysql_errno() const+0x3e) [0x5652093afa50]
      /home/sveta/build/ps-8.0/bin/mysqld(Slave_worker::check_and_report_end_of_retries(THD*)+0x1fa) [0x56520a906b14]
      /home/sveta/build/ps-8.0/bin/mysqld(Slave_worker::retry_transaction(unsigned int, unsigned long long, unsigned int, unsigned long long)+0xc3) [0x56520a906c7b]
      /home/sveta/build/ps-8.0/bin/mysqld(slave_worker_exec_job_group(Slave_worker*, Relay_log_info*)+0x4c9) [0x56520a9091ea]
      /home/sveta/build/ps-8.0/bin/mysqld(+0x462d60d) [0x56520a92360d]
      /home/sveta/build/ps-8.0/bin/mysqld(+0x51495a1) [0x56520b43f5a1]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f7c95792609]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f7c94f09103]
      
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0): Connection ID (thread ID): 48
      Status: NOT_KILLED
      

      Attachments

        Issue Links

          Activity

            People

              venkatesh.prasad Venkatesh Prasad
              sveta.smirnova Sveta Smirnova
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - Not Specified
                  Not Specified
                  Logged:
                  Time Spent - 2 weeks, 3 days, 2 hours, 30 minutes
                  2w 3d 2h 30m

                  Smart Checklist