Percona Server for MySQL / PS-1494

LP #1308016: Duplicate UK values in READ-COMMITTED (again)



    • Type: Bug
    • Status: Done
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None


      Reported in Launchpad by Przemek; last update 18-01-2018 08:29:32

      On a table with both a primary key (PK) and a unique key (UK) defined, it is possible to crash nodes with consistency errors or to lock the whole cluster for writes.
      This is a result of the InnoDB behaviour reported in the upstream MySQL bug: bugs.mysql.com/bug.php?id=69979

      The good news is that a Galera cluster is harder to break under these conditions than normal asynchronous replication. For example, I am not able to break a PXC cluster with just two concurrent sessions running the p1() procedure from Kevin Lewis' test case. But with three or more sessions running on the same node, PXC crashes because node consistency is compromised. This happens faster at the READ-COMMITTED isolation level, but it also happens with REPEATABLE-READ. The resulting error looks like this:

      2014-04-15 13:19:53 9570 [ERROR] Slave SQL: Could not execute Write_rows event on table test.t1; Duplicate entry '1' for key 'a', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 219, Error_code: 1062
      2014-04-15 13:19:53 9570 [Warning] WSREP: RBR event 4 Write_rows apply warning: 121, 649369
      2014-04-15 13:19:53 9570 [ERROR] WSREP: Failed to apply trx: source: fe30ab01-c48a-11e3-95e6-933a2f300241 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 5 trx_id: 2442782 seqnos (l: 302939, g: 649369, s: 649368, d: 649368, ts: 6527240877214333)
      2014-04-15 13:19:53 9570 [ERROR] WSREP: Failed to apply trx 649369 4 times
      2014-04-15 13:19:53 9570 [ERROR] WSREP: Node consistency compromized, aborting...
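
      The failure signature above could be picked out of an error log with a small script. The following is only a sketch: the function name summarize_apply_failure and the regular expressions are assumptions derived from the five log lines in this report, not from any documented log format, so other versions may word these messages differently.

```python
import re

# Patterns are assumptions based solely on the log excerpt in this report.
DUPLICATE_KEY = re.compile(
    r"Duplicate entry '(?P<value>[^']*)' for key '(?P<key>[^']*)'"
)
FAILED_APPLY = re.compile(
    r"WSREP: Failed to apply trx (?P<seqno>\d+) (?P<attempts>\d+) times"
)


def summarize_apply_failure(log_lines):
    """Collect duplicate-key errors, failed seqnos, and the abort marker."""
    summary = {"duplicates": [], "failed_seqnos": [], "aborted": False}
    for line in log_lines:
        dup = DUPLICATE_KEY.search(line)
        if dup:
            summary["duplicates"].append((dup.group("key"), dup.group("value")))
        failed = FAILED_APPLY.search(line)
        if failed:
            summary["failed_seqnos"].append(int(failed.group("seqno")))
        # The log spells it "compromized"; match on the stable parts only.
        if "WSREP: Node consistency" in line and "aborting" in line:
            summary["aborted"] = True
    return summary


# Lines copied from the excerpt above.
sample = [
    "2014-04-15 13:19:53 9570 [ERROR] Slave SQL: Could not execute Write_rows event on table test.t1; Duplicate entry '1' for key 'a', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 219, Error_code: 1062",
    "2014-04-15 13:19:53 9570 [Warning] WSREP: RBR event 4 Write_rows apply warning: 121, 649369",
    "2014-04-15 13:19:53 9570 [ERROR] WSREP: Failed to apply trx 649369 4 times",
    "2014-04-15 13:19:53 9570 [ERROR] WSREP: Node consistency compromized, aborting...",
]

result = summarize_apply_failure(sample)
print(result)
```

      A node that reports a duplicate-key apply error followed by the abort marker has hit the condition described in this report; the recorded seqno identifies the write set that could not be applied.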

      When I call the procedure from two different nodes in REPEATABLE-READ, the cluster gets locked for writes, and the only way to recover is to kill -9 mysqld on one of the nodes. Example of the hung state:

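      percona1 mysql> show processlist;
      +----+-------------+----------------+------+---------+------+---------------------------+-------------------------------------------+-----------+---------------+
      | Id | User        | Host           | db   | Command | Time | State                     | Info                                      | Rows_sent | Rows_examined |
      +----+-------------+----------------+------+---------+------+---------------------------+-------------------------------------------+-----------+---------------+
      |  1 | system user |                | NULL | Sleep   | 3905 | wsrep aborter idle        | NULL                                      |         0 |             0 |
      |  2 | system user |                | NULL | Sleep   |  508 | System lock               | NULL                                      |         0 |             0 |
      |  3 | system user |                | NULL | Sleep   |  508 | committed 356488          | NULL                                      |         0 |             0 |
      |  4 | cmon        | percona5:48670 | NULL | Sleep   |   79 |                           | NULL                                      |         1 |             1 |
      |  8 | root        | localhost      | test | Killed  |  266 | wsrep in pre-commit stage | insert into t1 values (22,22,22,22,22,22) |         0 |             0 |
      |  9 | root        | localhost      | test | Killed  |  508 | wsrep in pre-commit stage | start transaction                         |         0 |            16 |
      | 15 | root        | localhost      | test | Query   |    0 | init                      | show processlist                          |         0 |             0 |
      +----+-------------+----------------+------+---------+------+---------------------------+-------------------------------------------+-----------+---------------+
      7 rows in set (0.00 sec)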

      Unfortunately, the upstream bug is marked as "not a bug", but maybe there is a way to fix this in Galera replication?



