Details
- Bug
- Status: Done
- Medium
- Resolution: Fixed
- 5.7.x, 8.0.x
- None
- Yes
Description
Assertion failure `m_status == DA_ERROR`
This assertion is observed only when both slave_transaction_retries and slave_preserve_commit_order are enabled on the replica server, and it is more likely to happen when slave_transaction_retries is set to a low value.
In general, if a replication applier thread fails to execute a transaction because of an InnoDB deadlock or because the transaction's execution time exceeded InnoDB's innodb_lock_wait_timeout, it automatically retries slave_transaction_retries times before stopping with an error.
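The retry behaviour described above can be sketched as a small loop. This is a minimal illustration, not the server's code: `ApplyResult`, `RetryOutcome` and `apply_with_retries` are hypothetical names, and the exact way the server counts attempts against slave_transaction_retries may differ from this model (one initial attempt plus up to `max_retries` retries).

```cpp
#include <functional>

// Hypothetical result of one attempt to apply a transaction.
enum class ApplyResult { kOk, kTemporaryError, kFatalError };

// Hypothetical summary of the whole retry sequence.
struct RetryOutcome {
  bool applied;            // true if some attempt succeeded
  unsigned long attempts;  // total attempts made, including the first
};

// Runs apply_once until it succeeds, fails fatally, or the retry budget
// (modelled on slave_transaction_retries) is used up. Only temporary
// errors, such as a deadlock or a lock wait timeout, are retried.
RetryOutcome apply_with_retries(const std::function<ApplyResult()> &apply_once,
                                unsigned long max_retries) {
  unsigned long attempts = 0;
  for (;;) {
    ++attempts;
    const ApplyResult result = apply_once();
    if (result == ApplyResult::kOk) return {true, attempts};
    if (result == ApplyResult::kFatalError) return {false, attempts};
    // Temporary error: stop once the initial attempt plus max_retries
    // retries have all failed; the caller then reports the error.
    if (attempts > max_retries) return {false, attempts};
  }
}
```

With a low `max_retries`, the "budget exhausted" branch is reached after fewer temporary errors, which matches the observation that the assertion is more likely with a low slave_transaction_retries.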
When slave_preserve_commit_order is enabled, the replica server ensures that transactions are externalized on the replica in the same order as they appear in the replica's relay log, and prevents gaps in the sequence of transactions that have been executed from the relay log. If a thread finishes executing before the thread handling a preceding transaction, it waits until all previous transactions are committed before committing.
If the waiting thread has locked rows that are needed by the thread executing the previous transaction, the InnoDB deadlock detection algorithm kicks in and the preceding thread asks the waiting thread to roll back (only if its sequence number is lower than that of the waiting thread).
When this happens, the waiting thread wakes up from the cond_wait, learns that it was asked to roll back by the other transaction because it was holding a lock that the other transaction needs in order to progress, and goes ahead with retrying the transaction.
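The wait-and-wake behaviour can be modelled roughly as follows. This is only a sketch under assumptions: `CommitOrderSlot` and its members are hypothetical and do not correspond to the server's commit order manager; the point is that a single condition-variable wait can end for two different reasons, and the woken thread has to check which one applies.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical per-worker slot used while waiting for its turn to commit.
class CommitOrderSlot {
 public:
  enum class Wake { kMyTurn, kRollbackRequested };

  // Blocks until either all preceding transactions have committed or a
  // preceding transaction has asked this worker to roll back.
  Wake wait_for_turn() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return m_my_turn || m_rollback_requested; });
    // A rollback request takes priority: the worker must release its locks
    // and retry instead of committing.
    return m_rollback_requested ? Wake::kRollbackRequested : Wake::kMyTurn;
  }

  // Called when the preceding transaction has committed.
  void signal_turn() {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_my_turn = true;
    }
    m_cond.notify_one();
  }

  // Called on behalf of a preceding transaction that needs locks held by
  // the waiting worker.
  void request_rollback() {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_rollback_requested = true;
    }
    m_cond.notify_one();
  }

 private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  bool m_my_turn = false;
  bool m_rollback_requested = false;
};
```

Note that in the `kRollbackRequested` case no SQL-level error has been raised on the waiting thread itself, which is why the retry path later has to treat the situation as a simulated deadlock.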
But just before the transaction is retried, the worker checks whether it encountered any errors during its execution. If there is no error, it simulates an ER_LOCK_DEADLOCK error so that the failure is treated as a temporary error and the worker thread retries the transaction.
However, when the retries are exhausted, the worker thread writes an error to the error log, reading the thread's diagnostics area by calling `thd->get_stmt_da()->mysql_errno()`. If the error had been simulated (not raised through a `my_error` function call), the diagnostics area would still be empty, causing the assertion `DBUG_ASSERT(m_status == DA_ERROR);` to fail.
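The failure mode can be reproduced in a small standalone model. Everything below is a simplified stand-in written for illustration: `DiagnosticsAreaModel`, `WorkerModel` and `errno_for_end_of_retries_report` are hypothetical names, and the guard shown is one possible way to avoid reading an empty diagnostics area, not necessarily the fix that was applied for this bug.

```cpp
#include <cassert>

// Simplified stand-in for the server's diagnostics area (not the real class).
class DiagnosticsAreaModel {
 public:
  enum Status { DA_EMPTY, DA_ERROR };

  bool is_error() const { return m_status == DA_ERROR; }

  // Like the asserted accessor in the report: only valid once an error has
  // actually been recorded.
  unsigned mysql_errno() const {
    assert(m_status == DA_ERROR);
    return m_errno;
  }

  // Stand-in for the effect of raising an error through my_error().
  void set_error(unsigned err) {
    m_status = DA_ERROR;
    m_errno = err;
  }

 private:
  Status m_status = DA_EMPTY;
  unsigned m_errno = 0;
};

constexpr unsigned kErLockDeadlock = 1213;  // numeric value of ER_LOCK_DEADLOCK

// Hypothetical worker state: the flag records that a deadlock was only
// simulated, so nothing was written to the diagnostics area.
struct WorkerModel {
  DiagnosticsAreaModel da;
  bool simulated_deadlock = false;
};

// Picks the error number to log once retries are exhausted, without touching
// mysql_errno() while the diagnostics area is still empty.
unsigned errno_for_end_of_retries_report(const WorkerModel &worker) {
  if (worker.da.is_error()) return worker.da.mysql_errno();
  return worker.simulated_deadlock ? kErLockDeadlock : 0;
}
```

Calling `mysql_errno()` directly on a worker whose deadlock was only simulated would abort on the assertion in this model, which mirrors the crash in the stack trace; going through the guarded helper returns the simulated error number instead.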
Thread pointer: 0x7f7bb40008d0
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7f7c583cfcd0 thread_stack 0x46000
/home/sveta/build/ps-8.0/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x59) [0x56520ac5ce7f]
/home/sveta/build/ps-8.0/bin/mysqld(handle_fatal_signal+0x2ac) [0x565209a2351a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f7c9579e3c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7f7c94e2d18b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7f7c94e0c859]
/lib/x86_64-linux-gnu/libc.so.6(+0x25729) [0x7f7c94e0c729]
/lib/x86_64-linux-gnu/libc.so.6(+0x36f36) [0x7f7c94e1df36]
/home/sveta/build/ps-8.0/bin/mysqld(Diagnostics_area::mysql_errno() const+0x3e) [0x5652093afa50]
/home/sveta/build/ps-8.0/bin/mysqld(Slave_worker::check_and_report_end_of_retries(THD*)+0x1fa) [0x56520a906b14]
/home/sveta/build/ps-8.0/bin/mysqld(Slave_worker::retry_transaction(unsigned int, unsigned long long, unsigned int, unsigned long long)+0xc3) [0x56520a906c7b]
/home/sveta/build/ps-8.0/bin/mysqld(slave_worker_exec_job_group(Slave_worker*, Relay_log_info*)+0x4c9) [0x56520a9091ea]
/home/sveta/build/ps-8.0/bin/mysqld(+0x462d60d) [0x56520a92360d]
/home/sveta/build/ps-8.0/bin/mysqld(+0x51495a1) [0x56520b43f5a1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f7c95792609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f7c94f09103]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0):
Connection ID (thread ID): 48
Status: NOT_KILLED
Attachments
Issue Links
- relates to
- PS-7197 Multi-threaded Replica hangs when slave_trans_retires gets exhausted
- Done