  Percona XtraDB Cluster / PXC-1990

LP #1698863: One node flapping makes whole cluster enter NON-PRIMARY states


    • Type: Bug
    • Status: On Hold
    • Priority: Low
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:


      Reported in Launchpad by Przemek; last update 20-06-2017 10:12:15

      When just one node has a flapping network connection, the whole cluster sometimes enters a non-Primary state.
      This is fully repeatable with default wsrep settings on a 3- or 4-node cluster.
      Tested with PXC 5.7.18/Galera 3.20, using the following example bash script, run on node pxc4:

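      # cat flap.sh
      # The loop range and the destination addresses were lost from this report;
      # COUNT and PEERS are placeholders, not the original values.
      COUNT=...    # number of flap cycles (not preserved)
      PEERS=...    # comma-separated IPs of the other nodes (not preserved)
      for i in $(seq 1 "$COUNT"); do
        iptables -I OUTPUT -p all -d "$PEERS" -j DROP   # drop all outbound traffic to the peers
        sleep 5
        iptables -F                                     # flush rules, restoring connectivity
        sleep 3
      done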

      Example error logs from pxc4 and from one of the other nodes are attached.

      What is worrisome is that, instead of simply expelling the faulty node (with writes paused until the timeouts are reached), the cluster gets confused and at some point enters a non-Primary state, even though the physical connections between the remaining three nodes are fine, as in this log:

      2017-06-19T13:22:21.158663Z 0 [Note] WSREP: (87f15ec8, 'tcp://') connection to peer 40019a16 with addr tcp:// timed out, no messages seen in PT3S
      2017-06-19T13:22:21.159054Z 0 [Note] WSREP: (87f15ec8, 'tcp://') turning message relay requesting on, nonlive peers: tcp://
      2017-06-19T13:22:21.364134Z 0 [Note] WSREP: Current view of cluster as seen by this node
      view (view_id(NON_PRIM,0c4b3f62,202)
      memb { 87f15ec8,0 b7f30d85,0 }
      joined { }
      left { }
      partitioned { 0c4b3f62,0 40019a16,3 }
      )
      2017-06-19T13:22:21.364252Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
      2017-06-19T13:22:21.364287Z 0 [Note] WSREP: Flow-control interval: [141, 141]
      2017-06-19T13:22:21.364304Z 0 [Note] WSREP: Received NON-PRIMARY.
      2017-06-19T13:22:21.364317Z 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 751686)
      2017-06-19T13:22:21.364392Z 2 [Note] WSREP: New cluster view: global state: 3efe5400-aa6d-11e6-b772-625f9abee4ba:751686, view# -1: non-Primary, number of nodes: 2, my index: 0, protocol version 3
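The non-Primary result follows from Galera's quorum rule: a new component stays Primary only if its members carry more than half of the weight of the last Primary Component, after subtracting nodes that left gracefully. In the log above this node's component has memb_num = 2 out of 4 nodes, which is exactly half and therefore not a majority; had only the flapping node been expelled, the three survivors would have kept quorum. The sketch below assumes every node keeps the default pc.weight of 1; is_primary is a hypothetical helper for illustration, not a PXC or Galera command.

```bash
#!/usr/bin/env bash
# Sketch of Galera's weighted-quorum rule, ASSUMING equal weights (pc.weight=1).
# is_primary is a hypothetical helper, not part of PXC or Galera.
#   prev_primary : nodes in the last Primary Component
#   left_clean   : nodes that left that component gracefully
#   current      : nodes in the new component
# The new component is Primary only if current > (prev_primary - left_clean) / 2.
is_primary() {
  local prev_primary=$1 left_clean=$2 current=$3
  # Compare 2*current with the remaining weight to avoid integer-division rounding.
  if (( 2 * current > prev_primary - left_clean )); then
    echo yes
  else
    echo no
  fi
}

is_primary 4 0 3   # 4-node cluster, only the flapping node lost: prints yes
is_primary 4 0 2   # the logged case, two nodes partitioned: prints no
is_primary 3 0 2   # 3-node cluster, only the flapping node lost: prints yes
```

Doubling the member count keeps the check in integer arithmetic, so the exact-half case (2 of 4) is correctly treated as a loss of quorum.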

    • Assignee: lpjirasync lpjirasync (Inactive)


                • Created: