Details
-
Improvement
-
Status: Done
-
High
-
Resolution: Fixed
-
8.0.x
-
None
-
Yes
-
Yes
Description
MODIFIED TO SPLIT INTO 2 TICKETS - One for each parameter
A number of cases were reported where PXC nodes fail with the message "WSREP: Node consistency compromised, aborting" although no data inconsistency is confirmed. In each case foreign key relationships are present and parallel apply is enabled (wsrep_slave_threads>1).
A typical case is a child table update/delete/insert that fails as if the dependent row in the parent table had not yet been updated. At the same time, the writer node is seen to have performed the writes in the proper order, and the failed secondary node does not appear to have a data issue.
For instance, for errors of the type "[ERROR] Slave SQL: Could not execute Update_rows event on table", the table in question was the parent of a child table that references it with an ON DELETE CASCADE clause.
Another example, seen when more complex relationships exist: "[ERROR] Slave SQL: Could not execute Delete_rows event on table (...); Cannot delete or update a parent row: a foreign key constraint fails".
All these cases, in which there is no real data inconsistency between the nodes, lead to the conclusion that the default algorithms for parallel applying and certification are not safe for workloads with foreign key constraints.
Therefore, the right solution appears to be to change the defaults to safer ones, that is:
wsrep_provider_options="cert.optimistic_pa=NO"
Or, if possible, change the defaults only when foreign keys are used in the cluster.
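As a minimal sketch of what the proposed setting looks like on a node today, the option can be placed in the server configuration file. The [mysqld] section is the usual location for wsrep settings; the file path and any other provider options already present on a given node are assumptions and will differ per deployment.

```
[mysqld]
# Proposed safer setting from this ticket: disable optimistic parallel apply.
# wsrep_provider_options is a single semicolon-separated string, so any
# provider options already configured on the node must be merged into it
# rather than set on a second line.
wsrep_provider_options="cert.optimistic_pa=NO"
```

The effective value can be checked after a restart with SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'.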
Neither is described in the Percona docs, hence https://jira.percona.com/browse/PXC-2958
An example of a possibly related report: https://jira.percona.com/browse/PXC-1863