Affects Version/s: None
Fix Version/s: 5.7.22-22
**Reported in Launchpad by Kenny Gryp, last update 19-01-2018 16:24:54**
The problem that we are facing is similar to what is described in https://bugs.mysql.com/bug.php?id=85447, which was fixed in 5.7.18 (and the test case from that bug no longer fails).
This is what happens:
There are 2 XA transactions that deadlock, which causes replication to fail with the error:
There are several ongoing XA Transactions at the same time.
When we look at the locks that are being held, we see:
There is an S lock on a supremum pseudo-record in one XA transaction, which 'conflicts' with the X lock that another XA transaction wants to take.
How does this happen?
What happens on the original master is as follows...
We have the following rows: ...,80000,100000,...
Row 80000 is on, for example, InnoDB page 99, and row 100000 is on page 100. There is still some space available to add a row to page 99.
Make sure the slave is using log_slave_updates and binlog_format=MIXED.
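A quick way to confirm these settings on the slave before running the test. This is a minimal check, assuming the MySQL 5.7 variable names (`tx_isolation` was later renamed); it is not part of the original report:

```sql
-- Run on the slave. For this test, log_slave_updates should be 1
-- and binlog_format should be MIXED.
SELECT @@global.log_slave_updates AS log_slave_updates,
       @@global.binlog_format     AS binlog_format,
       @@global.tx_isolation      AS tx_isolation;
```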
You can simulate it like this:
Now depending on binlog_format and tx_isolation, you will get different results:
All this is to be expected. (This basically means always use RBR with XA)
I have tried everything I can think of to reproduce the issue the customer is facing, but I have not been able to reproduce it yet.
The customer is using binlog_format=MIXED with tx_isolation=READ COMMITTED, so the events are RBR and the test case above does not fail. The customer's workload, which looks similar, does fail.
With verbose lock output enabled, `SHOW ENGINE INNODB STATUS` shows an S lock on a row in page 938216, followed by a supremum S lock on page 938215:
It must be that somehow replication is putting an S lock on a supremum record, even though the TRX is in READ-COMMITTED. I do not yet know when this happens.
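For anyone trying to gather the same lock information on an affected slave, the statements below are a sketch of one way to do it, assuming MySQL 5.7. They are not the exact commands used for this report, and the `\G` terminator assumes the `mysql` command-line client:

```sql
-- List XA transactions that are currently in the PREPARED state.
XA RECOVER;

-- Include per-transaction lock details in the InnoDB status output.
SET GLOBAL innodb_status_output_locks = ON;
SHOW ENGINE INNODB STATUS\G

-- Locks that are being waited for, and who blocks whom
-- (available in 5.7; these tables were removed in 8.0).
SELECT * FROM information_schema.innodb_locks;
SELECT * FROM information_schema.innodb_lock_waits;
```

Note that the `information_schema` tables only list locks involved in a wait, so a supremum S lock that is held but not currently blocking anything would only be visible in the `SHOW ENGINE INNODB STATUS` output.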