Percona XtraDB Cluster / PXC-3020

Cluster goes down if a node is restarted during load

    Details

    • Type: Bug
    • Status: Done
    • Priority: High
    • Resolution: Duplicate
    • Affects Version/s: 8.0.18-internal
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Start a 3-node cluster (8.0.18-26.4.3.1) in Vagrant.
      Run a load on all three nodes:

      Node1:
      sysbench /usr/share/sysbench/tpcc.lua --mysql-db=test --mysql-user=root --time=30 --threads=64 --report-interval=1 --tables=3 --scale=5 --db-driver=mysql prepare
      Node2:
      sysbench /usr/share/sysbench/tpcc.lua --mysql-db=test2 --mysql-user=root --time=30 --threads=64 --report-interval=1 --tables=3 --scale=5 --db-driver=mysql prepare
      Node3:
      sysbench /usr/share/sysbench/tpcc.lua --mysql-db=test3 --mysql-user=root --time=30 --threads=64 --report-interval=1 --tables=3 --scale=5 --db-driver=mysql prepare

      Kill the mysqld process on node3 with kill -9.
      Start the mysqld server on node3 using sudo systemctl start mysql. The server starts but then shuts down.
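
The kill-and-restart step above could be scripted roughly as follows, to be run on node3 while the load is active. This is only a sketch: the `run` wrapper, the `DRY_RUN` and `WAIT_SECS` variables and the 60-second default wait are illustrative assumptions, not part of the original report.

```shell
#!/usr/bin/env bash
# Sketch of the kill-and-restart step, intended to run on node3.
# Assumptions: the systemd unit is called "mysql" (as in the report) and pkill is installed.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_after_kill() {
    # Hard-kill mysqld so it cannot leave the cluster gracefully.
    run sudo pkill -9 -x mysqld
    # Start the server the same way the report does.
    run sudo systemctl start mysql
    # Wait for the rejoin attempt (WAIT_SECS is a guess), then print the unit state.
    run sleep "${WAIT_SECS:-60}"
    run systemctl is-active mysql
}
```

Calling `DRY_RUN=1 restart_after_kill` prints the four commands without touching the server.
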
      Node3 logs display:

      2020-02-28T14:41:21.885811Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
       at gcomm/src/pc.cpp:connect():159
      2020-02-28T14:41:21.885853Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)
      2020-02-28T14:41:21.886179Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs.cpp:gcs_open():1694: Failed to open channel 'cluster_one' at 'gcomm://192.168.100.10,192.168.100.20,192.168.100.30': -110 (Connection timed out)
      2020-02-28T14:41:21.886201Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out
      2020-02-28T14:41:21.886223Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://192.168.100.10,192.168.100.20,192.168.100.30) failed to establish connection with cluster (reason: 7)
      2020-02-28T14:41:21.886349Z 0 [ERROR] [MY-010119] [Server] Aborting
      2020-02-28T14:41:21.886431Z 0 [Note] [MY-010120] [Server] Binlog end
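
To pull lines like these out of a longer error log, a small filter along the following lines may help. It is a sketch that assumes the log line layout shown above; the function name `galera_errors` is made up for illustration.

```shell
# Print only ERROR-level lines from the Galera, WSREP or Server subsystems.
# Reads an error log on stdin; matches the "[ERROR] [MY-nnnnnn] [Subsystem]" layout shown above.
galera_errors() {
    grep -E '\[ERROR\] \[MY-[0-9]+\] \[(Galera|WSREP|Server)\]'
}
```

For example, `galera_errors < /path/to/error.log`, where the path is whatever `log_error` points to on the node.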

      Node1 server stops processing the load:

      mysql> show global status like 'wsrep_local_state_comment';
      +---------------------------+-------------+
      | Variable_name             | Value       |
      +---------------------------+-------------+
      | wsrep_local_state_comment | Initialized |
      +---------------------------+-------------+
      mysql> select * from mysql.wsrep_cluster_members;
      ERROR 1047 (08S01): WSREP has not yet prepared node for application use
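
The same check can be scripted so that it can be repeated on every node. The sketch below assumes the status is fetched with `mysql -N -B`, which prints tab-separated rows without a header; the helper names `wsrep_state` and `node_is_synced` are hypothetical.

```shell
# Extract the value of wsrep_local_state_comment from tab-separated
# `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"` output on stdin.
wsrep_state() {
    awk -F'\t' '$1 == "wsrep_local_state_comment" { print $2 }'
}

# Succeed only when the node reports Synced, the state of a healthy, usable node.
node_is_synced() {
    [ "$(wsrep_state)" = "Synced" ]
}
```

For example: `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'" | node_is_synced || echo "node not ready"`. With the output shown above, the value is `Initialized` and the check fails.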

      Node1 logs display:

      2020-02-28T14:41:27.763094Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.
      2020-02-28T14:41:27.763137Z 2 [Note] [MY-000000] [Galera] ####### processing CC -1, local, ordered
      2020-02-28T14:41:27.763158Z 2 [Note] [MY-000000] [Galera] ####### My UUID: eab0c960-5a34-11ea-915b-12cf86275d03
      2020-02-28T14:41:27.763169Z 2 [Note] [MY-000000] [Galera] ####### ST not required
      2020-02-28T14:41:27.763191Z 2 [Note] [MY-000000] [Galera] ================================================
      View:
       id: eab19b21-5a34-11ea-a691-7a377f1f506a:-1
       status: non-primary
       protocol_version: 4
       capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
       final: no
       own_index: 1
       members(2):
       0: 726fd247-5a35-11ea-bc59-3e4f26152d22, unspecified
       1: eab0c960-5a34-11ea-915b-12cf86275d03, node1
      =================================================
      2020-02-28T14:41:27.763203Z 2 [Note] [MY-000000] [Galera] Non-primary view
      2020-02-28T14:41:27.763213Z 2 [Note] [MY-000000] [Galera] server node1 state change: connected -> connected
      2020-02-28T14:41:27.763222Z 2 [Note] [MY-000000] [WSREP] Server status change connected -> connected
      2020-02-28T14:41:27.763235Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
      2020-02-28T14:42:07.755852Z 0 [Note] [MY-000000] [Galera] (eab0c960, 'tcp://0.0.0.0:4567') reconnecting to 50d0f5e9 (tcp://192.168.100.30:4567), attempt 30
      2020-02-28T14:42:52.271723Z 0 [Note] [MY-000000] [Galera] (eab0c960, 'tcp://0.0.0.0:4567') reconnecting to 50d0f5e9 (tcp://192.168.100.30:4567), attempt 60
      ...
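
The key facts in this log are the view status and the member count: two of the three nodes are still connected to each other, yet the view is non-primary. A rough way to extract both values from a log is sketched below; it assumes the View block layout shown above, and the function name `view_summary` is illustrative.

```shell
# Print "<status> <member count>" for every View block found on stdin.
view_summary() {
    awk '
        /^[[:space:]]*status:/   { status = $2 }
        /^[[:space:]]*members\(/ {
            n = $1
            gsub(/[^0-9]/, "", n)   # "members(2):" -> "2"
            print status, n
        }
    '
}
```

Fed the View block above, this prints `non-primary 2`.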

      Node2 server also stops processing the load:

      mysql> show global status like 'wsrep_local_state_comment';
      +---------------------------+-------------+
      | Variable_name             | Value       |
      +---------------------------+-------------+
      | wsrep_local_state_comment | Initialized |
      +---------------------------+-------------+
      mysql> select * from mysql.wsrep_cluster_members;
      ERROR 1047 (08S01): WSREP has not yet prepared node for application use

      Node2 logs display:

      2020-02-28T14:41:27.757451Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
      2020-02-28T14:41:27.757561Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
      2020-02-28T14:41:27.757576Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.
      2020-02-28T14:41:27.757621Z 20 [Note] [MY-000000] [Galera] ####### processing CC -1, local, ordered
      2020-02-28T14:41:27.757645Z 20 [Note] [MY-000000] [Galera] ####### My UUID: 726fd247-5a35-11ea-bc59-3e4f26152d22
      2020-02-28T14:41:27.757655Z 20 [Note] [MY-000000] [Galera] ####### ST not required
      2020-02-28T14:41:27.757680Z 20 [Note] [MY-000000] [Galera] ================================================
      View:
       id: eab19b21-5a34-11ea-a691-7a377f1f506a:-1
       status: non-primary
       protocol_version: 4
       capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
       final: no
       own_index: 0
       members(2):
       0: 726fd247-5a35-11ea-bc59-3e4f26152d22, node2
       1: eab0c960-5a34-11ea-915b-12cf86275d03, unspecified
      =================================================
      2020-02-28T14:41:27.757691Z 20 [Note] [MY-000000] [Galera] Non-primary view
      2020-02-28T14:41:27.757702Z 20 [Note] [MY-000000] [Galera] server node2 state change: connected -> connected
      2020-02-28T14:41:27.757711Z 20 [Note] [MY-000000] [WSREP] Server status change connected -> connected
      2020-02-28T14:41:27.757726Z 20 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
      2020-02-28T14:42:07.860595Z 0 [Note] [MY-000000] [Galera] (726fd247, 'tcp://0.0.0.0:4567') reconnecting to 50d0f5e9 (tcp://192.168.100.30:4567), attempt 30
      2020-02-28T14:42:52.373421Z 0 [Note] [MY-000000] [Galera] (726fd247, 'tcp://0.0.0.0:4567') reconnecting to 50d0f5e9 (tcp://192.168.100.30:4567), attempt 60
      ...

      The whole cluster becomes unavailable.

                People

                Assignee: Kenn Takara (kenn.takara)
                Reporter: Manish Chawla (manish.chawla)
                Votes: 1
                Watchers: 4

                    Time Tracking

                    Estimated: Not Specified
                    Remaining: Not Specified
                    Logged: 4h 30m