Details
Bug
Status: Done
Medium
Resolution: Fixed
Not 5.7.x, 8.0.30-22 (Q3 2022)
None
Ubuntu 20
Yes
Description
Hey,
When socket.ssl = NO is set in wsrep_provider_options and the [sst] section of my.cnf does not explicitly state whether SSL is used, the state transfer not only fails but also crashes both the donor node and the receiver node.
The relevant parts of my.cnf:
[mysqld]
wsrep_provider_options = 'gcache.size=8G;cert.log_conflicts=YES;gmcast.segment=2;socket.ssl = NO;socket.ssl_cipher=AES128-SHA'
[sst]
sst-syslog=1
tca=/etc/mysql/certs/ca.pem
tcert=/etc/mysql/certs/ca.pem
tkey=/etc/mysql/certs/server-key.pem
Logs from the donor:
2022-05-07T11:08:19.714667Z 0 [Note] [MY-000000] [Galera] Member 0.2 (receiver.mydomain.com) requested state transfer from '*any*'. Selected 2.2 (donor.mydomain.com)(SYNCED) as donor.
2022-05-07T11:08:19.715002Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> DONOR/DESYNCED (TO: 18)
2022-05-07T11:08:19.715214Z 1 [Note] [MY-000000] [Galera] Detected STR version: 1, req_len: 129, req: STRv1
2022-05-07T11:08:19.715443Z 1 [Note] [MY-000000] [Galera] Cert index preload: 18 -> 18
2022-05-07T11:08:19.716135Z 1 [Note] [MY-000000] [WSREP] Initiating SST cancellation
11:08:19 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Build ID: 2165eff2f1909b2f032b76b423382ec097755ae3
Server Version: 8.0.27-18.1 Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3, wsrep_26.4.3
Thread pointer: 0x7efd20000b60
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7efd28ee9d80 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x559a8aa39ea1]
/usr/sbin/mysqld(handle_fatal_signal+0x393) [0x559a89a58f63]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0) [0x7efd789913c0]
/usr/lib/galera4/libgalera_smm.so(+0xaee83) [0x7efd6c1c4e83]
/usr/lib/galera4/libgalera_smm.so(+0xa2200) [0x7efd6c1b8200]
/usr/lib/galera4/libgalera_smm.so(+0xa5c16) [0x7efd6c1bbc16]
/usr/lib/galera4/libgalera_smm.so(+0x1f43dd) [0x7efd6c30a3dd]
/usr/lib/galera4/libgalera_smm.so(+0x1f48be) [0x7efd6c30a8be]
/usr/lib/galera4/libgalera_smm.so(+0x2247c0) [0x7efd6c33a7c0]
/usr/lib/galera4/libgalera_smm.so(+0x22e7c8) [0x7efd6c3447c8]
/usr/lib/galera4/libgalera_smm.so(+0x1efc28) [0x7efd6c305c28]
/usr/lib/galera4/libgalera_smm.so(+0x1efe02) [0x7efd6c305e02]
/usr/lib/galera4/libgalera_smm.so(+0x215443) [0x7efd6c32b443]
/usr/lib/galera4/libgalera_smm.so(+0x233282) [0x7efd6c349282]
/usr/sbin/mysqld(wsrep::wsrep_provider_v26::run_applier(wsrep::high_priority_service*)+0x12) [0x559a8b1f86b2]
/usr/sbin/mysqld(+0x14813e8) [0x559a89aa83e8]
/usr/sbin/mysqld(start_wsrep_THD+0x359) [0x559a89780629]
/usr/sbin/mysqld(+0x29a0fc1) [0x559a8afc7fc1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7efd78985609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7efd78145163]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0):
Connection ID (thread ID): 1
Status: NOT_KILLED
You may download the Percona XtraDB Cluster operations manual by visiting http://www.percona.com/software/percona-xtradb-cluster/. You may find information in the manual which will help you identify the cause of the crash.
Logs of the receiver:
2022-05-07T10:37:30.578584Z 1 [Note] [MY-000000] [Galera] State transfer required: Group state: ce384ef2-cd80-11ec-9112-e351ac0863d2:11 Local state: 11d34f37-cdf1-11ec-a4cc-f6f7eba2c6cf:68
2022-05-07T10:37:30.578634Z 1 [Note] [MY-000000] [WSREP] Server status change connected -> joiner
2022-05-07T10:37:30.578685Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2022-05-07T10:37:30.578854Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '<receiverIP>' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '60917' --mysqld-version '8.0.27-18.1' --binlog '/var/lib/mysql/mysql-bin' )
2022-05-07T10:37:31.637723Z 1 [Note] [MY-000000] [WSREP] Prepared SST request: xtrabackup-v2|10.16.64.92:4444/xtrabackup_sst//1
2022-05-07T10:37:31.637929Z 1 [Note] [MY-000000] [Galera] Check if state gap can be serviced using IST
2022-05-07T10:37:31.638013Z 1 [Note] [MY-000000] [Galera] Local UUID: 11d34f37-cdf1-11ec-a4cc-f6f7eba2c6cf != Group UUID: ce384ef2-cd80-11ec-9112-e351ac0863d2
2022-05-07T10:37:31.638112Z 1 [Note] [MY-000000] [Galera] ####### IST uuid:11d34f37-cdf1-11ec-a4cc-f6f7eba2c6cf f: 0, l: 11, STRv: 3
2022-05-07T10:37:31.638272Z 1 [Note] [MY-000000] [Galera] IST receiver addr using ssl://<receiverIP>:4568
2022-05-07T10:37:31.638386Z 1 [Note] [MY-000000] [Galera] IST receiver using ssl
2022-05-07T10:37:31.638569Z 1 [Note] [MY-000000] [Galera] Prepared IST receiver for 0-11, listening at: ssl://<receiverIP>:4568
2022-05-07T10:37:31.642303Z 0 [Note] [MY-000000] [Galera] Member 2.2 (receiver.mydomain.com) requested state transfer from '*any*'. Selected 0.2 (donor.mydomain.com)(SYNCED) as donor.
2022-05-07T10:37:31.642422Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 11)
2022-05-07T10:37:31.642515Z 1 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 0
2022-05-07T10:37:31.643064Z 1 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
2022-05-07T10:37:31.643135Z 1 [Note] [MY-000000] [Galera] GCache history reset: 11d34f37-cdf1-11ec-a4cc-f6f7eba2c6cf:68 -> ce384ef2-cd80-11ec-9112-e351ac0863d2:11
2022-05-07T10:37:31.643294Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2022-05-07T10:37:31.643362Z 0 [Note] [MY-000000] [WSREP] Terminating SST process
2022-05-07T10:37:31.645039Z 0 [ERROR] [MY-000000] [WSREP-SST] Removing /var/lib/mysql//xtrabackup_galera_info file due to signal
2022-05-07T10:37:31.645564Z 1 [Note] [MY-000000] [Galera] GCache DEBUG: RingBuffer::seqno_reset(): discarded 17808 bytes
2022-05-07T10:37:31.645631Z 1 [Note] [MY-000000] [Galera] GCache DEBUG: RingBuffer::seqno_reset(): found 1/2 locked buffers
10:37:31 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Build ID: 2165eff2f1909b2f032b76b423382ec097755ae3
Server Version: 8.0.27-18.1 Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3, wsrep_26.4.3
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
2022-05-07T10:37:31.647815Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2022-05-07T10:37:31.647898Z 0 [ERROR] [MY-000000] [WSREP-SST] SST script interrupted
2022-05-07T10:37:31.647957Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2022-05-07T10:37:31.648005Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
stack_bottom = 0 thread_stack 0x100000
2022-05-07T10:37:31.659201Z 0 [ERROR] [MY-000000] [WSREP] Process was aborted.
2022-05-07T10:37:31.659310Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '<receiverIP>' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '60917' --mysqld-version '8.0.27-18.1' --binlog '/var/lib/mysql/mysql-bin' : 2 (No such file or directory)
2022-05-07T10:37:31.659378Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2022-05-07T10:37:31.659428Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 2 (No such file or directory)
2022-05-07T10:37:31.659538Z 3 [Note] [MY-000000] [Galera] Processing SST received
2022-05-07T10:37:31.659599Z 3 [Note] [MY-000000] [Galera] SST received: 00000000-0000-0000-0000-000000000000:-1
2022-05-07T10:37:31.659655Z 3 [System] [MY-000000] [WSREP] SST completed
2022-05-07T10:37:31.659814Z 1 [Note] [MY-000000] [Galera] str_proto_ver_: 3 sst_seqno_: -1 cc_seqno: 11 req->ist_len(): 66
2022-05-07T10:37:31.659882Z 1 [ERROR] [MY-000000] [Galera] Application received wrong state: Received: 00000000-0000-0000-0000-000000000000 Required: ce384ef2-cd80-11ec-9112-e351ac0863d2
2022-05-07T10:37:31.659934Z 1 [ERROR] [MY-000000] [Galera] Application state transfer failed. This is unrecoverable condition, restart required.
2022-05-07T10:37:31.660014Z 1 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2022-05-07T10:37:31.660065Z 1 [Note] [MY-000000] [Galera] Closing send monitor...
2022-05-07T10:37:31.660217Z 1 [Note] [MY-000000] [Galera] Closed send monitor.
2022-05-07T10:37:31.660271Z 1 [Note] [MY-000000] [Galera] gcomm: terminating thread
2022-05-07T10:37:31.660325Z 1 [Note] [MY-000000] [Galera] gcomm: joining thread
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x559ac9be2ea1]
2022-05-07T10:37:31.661689Z 1 [Note] [MY-000000] [Galera] gcomm: closing backend
/usr/sbin/mysqld(handle_fatal_signal+0x393) [0x559ac8c01f63]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0) [0x7f0a2ba283c0]
/usr/lib/galera4/libgalera_smm.so(+0xaee83) [0x7f0a1f25be83]
/usr/lib/galera4/libgalera_smm.so(+0xa2200) [0x7f0a1f24f200]
/usr/lib/galera4/libgalera_smm.so(+0xa3adf) [0x7f0a1f250adf]
/usr/lib/galera4/libgalera_smm.so(+0x1f824b) [0x7f0a1f3a524b]
/usr/lib/galera4/libgalera_smm.so(+0x1fa0a6) [0x7f0a1f3a70a6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f0a2ba1c609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f0a2b1dc163]
You may download the Percona XtraDB Cluster operations manual by visiting http://www.percona.com/software/percona-xtradb-cluster/. You may find information in the manual which will help you identify the cause of the crash.
To us, the cause looks like this:
- With socket.ssl = NO, the donor does not expect SSL traffic on state transfer connections.
- Unless [sst] is explicitly configured not to use SSL, the state transfer uses SSL anyway.
- As a result, neither end can read the packets the other sends.
- This crashes both the donor and the receiver.
On a three-node cluster where Node 3 dropped out and tried to rejoin with an automatic state transfer from Node 2:
- Node 3 crashes and hangs.
- Node 2 also crashes and hangs.
- Node 1 is left alone and therefore transitions to Non-Primary.
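Why the surviving node goes Non-Primary can be illustrated with a simplified majority check. This is only a sketch: it assumes equal node weights and nodes that crash rather than leave gracefully, and has_quorum is a hypothetical helper, not part of Galera or PXC.

```python
def has_quorum(reachable_nodes, previous_component_size):
    """Simplified majority rule: the reachable nodes must be strictly more
    than half of the last known primary component (equal weights assumed)."""
    return 2 * reachable_nodes > previous_component_size


# Three-node cluster, donor and joiner both crashed: one node remains.
print(has_quorum(1, 3))  # one of three nodes is not a majority
# If only the joiner had failed, two nodes would still form a majority.
print(has_quorum(2, 3))
```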
How we would expect it to behave:
- Error handling on both the sender and the receiver side.
- Ideally, explicit detection of the mismatched SSL configuration, with an error message pointing to it.
- Definitely no MySQL crash.
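Until the server detects this itself, the mismatch could be caught before startup with a small pre-flight check on my.cnf. The following is a minimal sketch, not how Percona fixed the bug: check_ssl_alignment and parse_provider_options are hypothetical helpers, and the check assumes that an explicit encrypt option in [sst] is what decides whether the state transfer uses SSL.

```python
import configparser


def parse_provider_options(raw):
    """Split a wsrep_provider_options string into a dict of key -> value."""
    options = {}
    for item in raw.strip().strip("'\"").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def check_ssl_alignment(cnf_text):
    """Return warnings if socket.ssl is disabled while [sst] leaves SSL implicit.

    Assumption: an 'encrypt' key in [sst] is the explicit SST SSL setting.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    raw = parser.get("mysqld", "wsrep_provider_options", fallback="") or ""
    socket_ssl = parse_provider_options(raw).get("socket.ssl", "").upper()
    warnings = []
    if socket_ssl in ("NO", "0", "FALSE", "OFF") and not parser.has_option("sst", "encrypt"):
        warnings.append(
            "socket.ssl is disabled in wsrep_provider_options, but [sst] "
            "does not state explicitly whether SST should use SSL"
        )
    return warnings


# The configuration from this report, used as sample input.
REPORTED_CNF = """
[mysqld]
wsrep_provider_options = 'gcache.size=8G;cert.log_conflicts=YES;gmcast.segment=2;socket.ssl = NO;socket.ssl_cipher=AES128-SHA'
[sst]
sst-syslog=1
tca=/etc/mysql/certs/ca.pem
tcert=/etc/mysql/certs/ca.pem
tkey=/etc/mysql/certs/server-key.pem
"""

for warning in check_ssl_alignment(REPORTED_CNF):
    print("WARNING:", warning)
```

Run against the configuration above, the check flags the implicit [sst] setting. It says nothing about whether the certificates themselves are valid.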