Percona XtraDB Cluster / PXC-3936

state transfer with SSL disabled in wsrep_provider_options crashes Receiver and Donor node

Details

    • Bug
    • Status: Done
    • Medium
    • Resolution: Fixed
    • Not 5.7.x, 8.0.30-22 (Q3 2022)
    • 8.0.32-24 (Q1 2023)
    • None
    • Ubuntu 20

    • Yes

    Description

      Hey,

      When socket.ssl = NO is set in wsrep_provider_options without explicitly specifying in the [sst] section of my.cnf whether SSL is used, the state transfer not only fails but also crashes both the donor node and the receiver node.

      The relevant parts of my.cnf:

      [mysqld]
      wsrep_provider_options = 'gcache.size=8G;cert.log_conflicts=YES;gmcast.segment=2;socket.ssl = NO;socket.ssl_cipher=AES128-SHA'
      
      
      [sst]
      sst-syslog=1
      tca=/etc/mysql/certs/ca.pem
      tcert=/etc/mysql/certs/ca.pem
      tkey=/etc/mysql/certs/server-key.pem
      

      Logs from the donor:

      2022-05-07T11:08:19.714667Z 0 [Note] [MY-000000] [Galera] Member 0.2 (receiver.mydomain.com) requested state transfer from '*any*'. Selected 2.2 (donor.mydomain.com)(SYNCED) as donor.
      2022-05-07T11:08:19.715002Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> DONOR/DESYNCED (TO: 18)
      2022-05-07T11:08:19.715214Z 1 [Note] [MY-000000] [Galera] Detected STR version: 1, req_len: 129, req: STRv1
      2022-05-07T11:08:19.715443Z 1 [Note] [MY-000000] [Galera] Cert index preload: 18 -> 18
      2022-05-07T11:08:19.716135Z 1 [Note] [MY-000000] [WSREP] Initiating SST cancellation
      11:08:19 UTC - mysqld got signal 11 ;
      Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
      
      Build ID: 2165eff2f1909b2f032b76b423382ec097755ae3
      Server Version: 8.0.27-18.1 Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3, wsrep_26.4.3
      
      Thread pointer: 0x7efd20000b60
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 7efd28ee9d80 thread_stack 0x100000
      /usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x559a8aa39ea1]
      /usr/sbin/mysqld(handle_fatal_signal+0x393) [0x559a89a58f63]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0) [0x7efd789913c0]
      /usr/lib/galera4/libgalera_smm.so(+0xaee83) [0x7efd6c1c4e83]
      /usr/lib/galera4/libgalera_smm.so(+0xa2200) [0x7efd6c1b8200]
      /usr/lib/galera4/libgalera_smm.so(+0xa5c16) [0x7efd6c1bbc16]
      /usr/lib/galera4/libgalera_smm.so(+0x1f43dd) [0x7efd6c30a3dd]
      /usr/lib/galera4/libgalera_smm.so(+0x1f48be) [0x7efd6c30a8be]
      /usr/lib/galera4/libgalera_smm.so(+0x2247c0) [0x7efd6c33a7c0]
      /usr/lib/galera4/libgalera_smm.so(+0x22e7c8) [0x7efd6c3447c8]
      /usr/lib/galera4/libgalera_smm.so(+0x1efc28) [0x7efd6c305c28]
      /usr/lib/galera4/libgalera_smm.so(+0x1efe02) [0x7efd6c305e02]
      /usr/lib/galera4/libgalera_smm.so(+0x215443) [0x7efd6c32b443]
      /usr/lib/galera4/libgalera_smm.so(+0x233282) [0x7efd6c349282]
      /usr/sbin/mysqld(wsrep::wsrep_provider_v26::run_applier(wsrep::high_priority_service*)+0x12) [0x559a8b1f86b2]
      /usr/sbin/mysqld(+0x14813e8) [0x559a89aa83e8]
      /usr/sbin/mysqld(start_wsrep_THD+0x359) [0x559a89780629]
      /usr/sbin/mysqld(+0x29a0fc1) [0x559a8afc7fc1]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7efd78985609]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7efd78145163]
      
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0): Connection ID (thread ID): 1
      Status: NOT_KILLED
      
      You may download the Percona XtraDB Cluster operations manual by visiting
      http://www.percona.com/software/percona-xtradb-cluster/. You may find information
      in the manual which will help you identify the cause of the crash.
      

      Logs from the receiver:

      2022-05-07T10:37:30.578584Z 1 [Note] [MY-000000] [Galera] State transfer required:
      	Group state: ce384ef2-cd80-11ec-9112-e351ac0863d2:11
      	Local state: 11d34f37-cdf1-11ec-a4cc-f6f7eba2c6cf:68
      2022-05-07T10:37:30.578634Z 1 [Note] [MY-000000] [WSREP] Server status change connected -> joiner
      2022-05-07T10:37:30.578685Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
      2022-05-07T10:37:30.578854Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '<receiverIP>' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '60917' --mysqld-version '8.0.27-18.1'  --binlog '/var/lib/mysql/mysql-bin' )
      2022-05-07T10:37:31.637723Z 1 [Note] [MY-000000] [WSREP] Prepared SST request: xtrabackup-v2|10.16.64.92:4444/xtrabackup_sst//1
      2022-05-07T10:37:31.637929Z 1 [Note] [MY-000000] [Galera] Check if state gap can be serviced using IST
      2022-05-07T10:37:31.638013Z 1 [Note] [MY-000000] [Galera] Local UUID: 11d34f37-cdf1-11ec-a4cc-f6f7eba2c6cf != Group UUID: ce384ef2-cd80-11ec-9112-e351ac0863d2
      2022-05-07T10:37:31.638112Z 1 [Note] [MY-000000] [Galera] ####### IST uuid:11d34f37-cdf1-11ec-a4cc-f6f7eba2c6cf f: 0, l: 11, STRv: 3
      2022-05-07T10:37:31.638272Z 1 [Note] [MY-000000] [Galera] IST receiver addr using ssl://<receiverIP>:4568
      2022-05-07T10:37:31.638386Z 1 [Note] [MY-000000] [Galera] IST receiver using ssl
      2022-05-07T10:37:31.638569Z 1 [Note] [MY-000000] [Galera] Prepared IST receiver for 0-11, listening at: ssl://<receiverIP>:4568
      2022-05-07T10:37:31.642303Z 0 [Note] [MY-000000] [Galera] Member 2.2 (receiver.mydomain.com) requested state transfer from '*any*'. Selected 0.2 (donor.mydomain.com)(SYNCED) as donor.
      2022-05-07T10:37:31.642422Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 11)
      2022-05-07T10:37:31.642515Z 1 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 0
      2022-05-07T10:37:31.643064Z 1 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
      2022-05-07T10:37:31.643135Z 1 [Note] [MY-000000] [Galera] GCache history reset: 11d34f37-cdf1-11ec-a4cc-f6f7eba2c6cf:68 -> ce384ef2-cd80-11ec-9112-e351ac0863d2:11
      2022-05-07T10:37:31.643294Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
      2022-05-07T10:37:31.643362Z 0 [Note] [MY-000000] [WSREP] Terminating SST process
      2022-05-07T10:37:31.645039Z 0 [ERROR] [MY-000000] [WSREP-SST] Removing /var/lib/mysql//xtrabackup_galera_info file due to signal
      2022-05-07T10:37:31.645564Z 1 [Note] [MY-000000] [Galera] GCache DEBUG: RingBuffer::seqno_reset(): discarded 17808 bytes
      2022-05-07T10:37:31.645631Z 1 [Note] [MY-000000] [Galera] GCache DEBUG: RingBuffer::seqno_reset(): found 1/2 locked buffers
      10:37:31 UTC - mysqld got signal 11 ;
      Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
      
      Build ID: 2165eff2f1909b2f032b76b423382ec097755ae3
      Server Version: 8.0.27-18.1 Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3, wsrep_26.4.3
      
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      2022-05-07T10:37:31.647815Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
      2022-05-07T10:37:31.647898Z 0 [ERROR] [MY-000000] [WSREP-SST] SST script interrupted
      2022-05-07T10:37:31.647957Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
      2022-05-07T10:37:31.648005Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
      stack_bottom = 0 thread_stack 0x100000
      2022-05-07T10:37:31.659201Z 0 [ERROR] [MY-000000] [WSREP] Process was aborted.
      2022-05-07T10:37:31.659310Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '<receiverIP>' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '60917' --mysqld-version '8.0.27-18.1'  --binlog '/var/lib/mysql/mysql-bin' : 2 (No such file or directory)
      2022-05-07T10:37:31.659378Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
      2022-05-07T10:37:31.659428Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 2 (No such file or directory)
      2022-05-07T10:37:31.659538Z 3 [Note] [MY-000000] [Galera] Processing SST received
      2022-05-07T10:37:31.659599Z 3 [Note] [MY-000000] [Galera] SST received: 00000000-0000-0000-0000-000000000000:-1
      2022-05-07T10:37:31.659655Z 3 [System] [MY-000000] [WSREP] SST completed
      2022-05-07T10:37:31.659814Z 1 [Note] [MY-000000] [Galera]  str_proto_ver_: 3 sst_seqno_: -1 cc_seqno: 11 req->ist_len(): 66
      2022-05-07T10:37:31.659882Z 1 [ERROR] [MY-000000] [Galera] Application received wrong state:
      	Received: 00000000-0000-0000-0000-000000000000
      	Required: ce384ef2-cd80-11ec-9112-e351ac0863d2
      2022-05-07T10:37:31.659934Z 1 [ERROR] [MY-000000] [Galera] Application state transfer failed. This is unrecoverable condition, restart required.
      2022-05-07T10:37:31.660014Z 1 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
      2022-05-07T10:37:31.660065Z 1 [Note] [MY-000000] [Galera] Closing send monitor...
      2022-05-07T10:37:31.660217Z 1 [Note] [MY-000000] [Galera] Closed send monitor.
      2022-05-07T10:37:31.660271Z 1 [Note] [MY-000000] [Galera] gcomm: terminating thread
      2022-05-07T10:37:31.660325Z 1 [Note] [MY-000000] [Galera] gcomm: joining thread
      /usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x559ac9be2ea1]
      2022-05-07T10:37:31.661689Z 1 [Note] [MY-000000] [Galera] gcomm: closing backend
      /usr/sbin/mysqld(handle_fatal_signal+0x393) [0x559ac8c01f63]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0) [0x7f0a2ba283c0]
      /usr/lib/galera4/libgalera_smm.so(+0xaee83) [0x7f0a1f25be83]
      /usr/lib/galera4/libgalera_smm.so(+0xa2200) [0x7f0a1f24f200]
      /usr/lib/galera4/libgalera_smm.so(+0xa3adf) [0x7f0a1f250adf]
      /usr/lib/galera4/libgalera_smm.so(+0x1f824b) [0x7f0a1f3a524b]
      /usr/lib/galera4/libgalera_smm.so(+0x1fa0a6) [0x7f0a1f3a70a6]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f0a2ba1c609]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f0a2b1dc163]
      You may download the Percona XtraDB Cluster operations manual by visiting
      http://www.percona.com/software/percona-xtradb-cluster/. You may find information
      in the manual which will help you identify the cause of the crash.
      

      To us it looks like the cause is:

      • With socket.ssl = NO, the donor does not expect SSL traffic on state transfer connections.
      • Unless [sst] is explicitly configured not to use SSL, the state transfer still uses SSL.
      • As a result, neither end can read the packets the other sends.
      • This causes crashes on both the donor and the receiver.

      On a three-node cluster where node 3 dropped out and tried to rejoin with an automatic state transfer from node 2:

      • Node 3 crashes and hangs.
      • Node 2 also crashes and hangs.
      • Node 1 is left alone and therefore transitions to Non-Primary.
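
      Why the last surviving node goes Non-Primary can be sketched with a simplified majority rule. This is only an illustration: the function name is hypothetical, it assumes all nodes carry equal weight and that the other nodes exited ungracefully, and it is not Galera's actual quorum implementation.

```python
def has_quorum(reachable_nodes, previous_component_size):
    """Simplified rule: a strict majority of the previous component is required."""
    return 2 * reachable_nodes > previous_component_size


# Node 1 alone after nodes 2 and 3 crashed: 1 of 3 is not a majority.
print(has_quorum(1, 3))  # False

# If only the joining node had failed, nodes 1 and 2 would have kept quorum.
print(has_quorum(2, 3))  # True
```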

      How we would expect it to behave:

      • Error handling on both the sender and the receiver side.
      • Ideally, explicit detection of mismatched SSL configuration, with an error message that points to it.
      • But definitely no MySQL crash.
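
      The mismatch detection suggested above could be approximated outside the server with a small pre-flight check on my.cnf. The sketch below is not how the server or the SST script works; the function names are hypothetical, and it assumes that encrypt=0 in the [sst] section is the setting that explicitly turns off SST encryption, which should be verified against the documentation for the PXC version in use.

```python
import configparser

FALSE_VALUES = {"no", "off", "false", "0"}


def parse_provider_options(raw):
    """Split a wsrep_provider_options string into a dict with lower-cased keys."""
    options = {}
    for item in raw.strip().strip("'\"").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            options[key.strip().lower()] = value.strip()
    return options


def find_section(parser, name):
    """Look up a section case-insensitively; return an empty dict if absent."""
    for section in parser.sections():
        if section.lower() == name:
            return parser[section]
    return {}


def check_ssl_alignment(cnf_text):
    """Return warnings when socket.ssl is off but [sst] does not disable SSL."""
    parser = configparser.ConfigParser(
        allow_no_value=True, interpolation=None, strict=False, delimiters=("=",)
    )
    parser.read_string(cnf_text)
    mysqld = find_section(parser, "mysqld")
    sst = find_section(parser, "sst")

    provider = parse_provider_options(mysqld.get("wsrep_provider_options") or "")
    socket_ssl_off = provider.get("socket.ssl", "").lower() in FALSE_VALUES
    # Assumption: encrypt=0 is the explicit "no SSL" setting for SST.
    sst_ssl_off = (sst.get("encrypt") or "").strip() == "0"

    warnings = []
    if socket_ssl_off and not sst_ssl_off:
        warnings.append(
            "socket.ssl is disabled in wsrep_provider_options, "
            "but [sst] does not explicitly disable SSL"
        )
    return warnings


if __name__ == "__main__":
    sample = (
        "[mysqld]\n"
        "wsrep_provider_options = 'gcache.size=8G;socket.ssl = NO'\n"
        "\n"
        "[sst]\n"
        "sst-syslog=1\n"
        "tca=/etc/mysql/certs/ca.pem\n"
    )
    for warning in check_ssl_alignment(sample):
        print("WARNING:", warning)
```

      Run against the configuration from this report, the check emits one warning; it stays silent once socket.ssl is enabled or the [sst] section carries the assumed explicit opt-out.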

            People

              amonar Anton Matvienko
              Izzy Isobel Smith