Percona XtraDB Cluster / PXC-1630

LP #1284803: Joining node connects and crashes under load | Assertion `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.


      Description

      Reported in Launchpad by shinguz; last update 24-06-2014 18:20:54

      We stopped one node in a clean cluster, started an import, and after a while restarted the node while the import was still running. The node joined the cluster and crashed shortly afterwards.

      We will try to reproduce the crash with core dumps enabled.

      } partitioned {
      })
      140221 13:29:45 [Note] WSREP: gcomm: connected
      140221 13:29:45 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      140221 13:29:45 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      140221 13:29:45 [Note] WSREP: Opened channel 'Galera Dev Cluster'
      140221 13:29:45 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
      140221 13:29:45 [Note] WSREP: Waiting for SST to complete.
      140221 13:29:45 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 3a734d78-9afc-11e3-975c-c7405de7027b
      140221 13:29:45 [Note] WSREP: STATE EXCHANGE: sent state msg: 3a734d78-9afc-11e3-975c-c7405de7027b
      140221 13:29:45 [Note] WSREP: STATE EXCHANGE: got state msg: 3a734d78-9afc-11e3-975c-c7405de7027b from 0 (Node C)
      140221 13:29:45 [Note] WSREP: STATE EXCHANGE: got state msg: 3a734d78-9afc-11e3-975c-c7405de7027b from 1 (Node B)
      140221 13:29:45 [Note] WSREP: STATE EXCHANGE: got state msg: 3a734d78-9afc-11e3-975c-c7405de7027b from 2 (Node A)
      140221 13:29:45 [Note] WSREP: Quorum results:
      version = 3,
      component = PRIMARY,
      conf_id = 10,
      members = 2/3 (joined/total),
      act_id = 175711,
      last_appl. = -1,
      protocols = 0/5/2 (gcs/repl/appl),
      group UUID = 646d078b-98a2-11e3-b7a6-93cfe86c89f3
      140221 13:29:45 [Note] WSREP: Flow-control interval: [28, 28]
      140221 13:29:45 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 175711)
      140221 13:29:45 [Note] WSREP: State transfer required:
      Group state: 646d078b-98a2-11e3-b7a6-93cfe86c89f3:175711
      Local state: 646d078b-98a2-11e3-b7a6-93cfe86c89f3:156159
      140221 13:29:45 [Note] WSREP: New cluster view: global state: 646d078b-98a2-11e3-b7a6-93cfe86c89f3:175711, view# 11: Primary, number of nodes: 3, my index: 0, protocol version 2
      140221 13:29:45 [Warning] WSREP: Gap in state sequence. Need state transfer.
      140221 13:29:47 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.10.17.151' --auth 'sst:secret' --datadir '/var/lib/mysql/datadir/' --defaults-file '/etc/mysql/my.cnf' --parent '5892''
      140221 13:29:47 [Note] WSREP: Prepared SST request: rsync|10.10.17.151:4444/rsync_sst
      140221 13:29:47 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      140221 13:29:47 [Note] WSREP: REPL Protocols: 5 (3, 1)
      140221 13:29:47 [Note] WSREP: Assign initial position for certification: 175711, protocol version: 3
      140221 13:29:47 [Note] WSREP: Service thread queue flushed.
      140221 13:29:47 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.10.17.151:4568
      140221 13:29:47 [Note] WSREP: Node 0.0 (Node C) requested state transfer from 'any'. Selected 1.0 (Node B)(SYNCED) as donor.
      140221 13:29:47 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 175711)
      140221 13:29:47 [Note] WSREP: Requesting state transfer: success, donor: 1
      WSREP_SST: [INFO] Joiner cleanup. (20140221 13:29:48.482)
      WSREP_SST: [INFO] Joiner cleanup done. (20140221 13:29:48.988)
      140221 13:29:48 [Note] WSREP: SST complete, seqno: 156159
      140221 13:29:48 [Note] Plugin 'FEDERATED' is disabled.
      140221 13:29:48 InnoDB: The InnoDB memory heap is disabled
      140221 13:29:48 InnoDB: Mutexes and rw_locks use GCC atomic builtins
      140221 13:29:48 InnoDB: Compressed tables use zlib 1.2.3.3
      140221 13:29:48 InnoDB: Using Linux native AIO
      140221 13:29:49 InnoDB: Initializing buffer pool, size = 48.0G
      140221 13:29:51 InnoDB: Completed initialization of buffer pool
      140221 13:29:51 InnoDB: highest supported file format is Barracuda.
      140221 13:29:54 InnoDB: Waiting for the background threads to start
      140221 13:29:55 InnoDB: 5.5.34 started; log sequence number 90436392703
      140221 13:29:55 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
      140221 13:29:55 [Note] - '0.0.0.0' resolves to '0.0.0.0';
      140221 13:29:55 [Note] Server socket created on IP: '0.0.0.0'.
      140221 13:29:55 [Note] Event Scheduler: Loaded 0 events
      140221 13:29:55 [Note] WSREP: Signalling provider to continue.
      140221 13:29:55 [Note] WSREP: SST received: 646d078b-98a2-11e3-b7a6-93cfe86c89f3:156159
      140221 13:29:55 [Note] WSREP: Receiving IST: 19552 writesets, seqnos 156159-175711
      140221 13:29:55 [Note] /usr/sbin/mysqld: ready for connections.
      Version: '5.5.34-log' socket: '/var/lib/mysql/datadir/mysql.sock' port: 3306 MySQL Community Server (GPL), wsrep_25.9.r3928
      140221 13:29:55 [Warning] IP address '10.10.0.203' could not be resolved: Name or service not known
      140221 13:29:58 [Warning] IP address '10.10.17.24' could not be resolved: Name or service not known
      mysqld: /tmp/mysql-5.5.34/sql/wsrep_applier.cc:309: wsrep_cb_status_t wsrep_commit_cb(void*, uint32_t, const wsrep_trx_meta_t*, wsrep_bool_t*, bool): Assertion `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.
      13:30:03 UTC - mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      key_buffer_size=8388608
      read_buffer_size=131072
      max_used_connections=67
      max_threads=1024
      thread_count=66
      connection_count=66
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2248680 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x7f442000d9f0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 7f44f4d8be68 thread_stack 0x40000
      /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x828ce5]
      /usr/sbin/mysqld(handle_fatal_signal+0x403)[0x6ae413]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f53b8201cb0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f53b682f425]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f53b6832b8b]
      /lib/x86_64-linux-gnu/libc.so.6(+0x2f0ee)[0x7f53b68280ee]
      /lib/x86_64-linux-gnu/libc.so.6(+0x2f192)[0x7f53b6828192]
      /usr/sbin/mysqld(_Z15wsrep_commit_cbPvjPK14wsrep_trx_metaPbb+0x1ec)[0x67050c]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x110)[0x7f53b532def0]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM8recv_ISTEPv+0x24e)[0x7f53b533c0ee]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x308)[0x7f53b5330fa8]
      /usr/lib/galera/libgalera_smm.so(galera_recv+0x23)[0x7f53b5340eb3]
      /usr/sbin/mysqld[0x671542]
      /usr/sbin/mysqld(start_wsrep_THD+0x3a9)[0x524ad9]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f53b81f9e9a]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f53b68ecccd]

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0): is an invalid pointer
      Connection ID (thread ID): 65
      Status: NOT_KILLED

      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      140221 13:30:03 mysqld_safe Number of processes running now: 0
      140221 13:30:03 mysqld_safe WSREP: not restarting wsrep node automatically
      140221 13:30:03 mysqld_safe mysqld from pid file /var/lib/mysql/datadir/ip-10-10-17-151.pid ended

      show global variables like '%version%';
      +-------------------------+-----------------------------------------------+
      | Variable_name           | Value                                         |
      +-------------------------+-----------------------------------------------+
      | innodb_version          | 5.5.34                                        |
      | version                 | 5.5.34-log                                    |
      | version_comment         | MySQL Community Server (GPL),wsrep_25.9.r3928 |
      | version_compile_machine | x86_64                                        |
      | version_compile_os      | Linux                                         |
      +-------------------------+-----------------------------------------------+

      show global status like '%version%';
      +------------------------+--------------+
      | Variable_name          | Value        |
      +------------------------+--------------+
      | wsrep_protocol_version | 5            |
      | wsrep_provider_version | 25.3.2(r170) |
      +------------------------+--------------+

      show global variables like '%wsrep_pro%';
      wsrep_provider,/usr/lib/galera/libgalera_smm.so
      wsrep_provider_options,"base_host = 10.10.9.76; base_port = 4567;
      cert.log_conflicts = no; evs.causal_keepalive_period = PT1S;
      evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S;
      evs.inactive_timeout = PT15S; evs.info_log_mask = 0;
      evs.install_timeout = PT15S; evs.join_retrans_period = PT1S;
      evs.keepalive_period = PT1S; evs.max_install_timeouts = 1;
      evs.send_window = 4; evs.stats_report_period = PT1M;
      evs.suspect_timeout = PT5S; evs.use_aggregate = true;
      evs.user_send_window = 2; evs.version = 0;
      evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/datadir/;
      gcache.keep_pages_size = 0; gcache.mem_size = 0;
      gcache.name = /var/lib/mysql/datadir//galera.cache; gcache.page_size = 128M;
      gcache.size = 8G; gcs.fc_debug = 0; gcs.fc_factor = 1;
      gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500;
      gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807;
      gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO;
      gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ;
      gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.segment = 0;
      gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.10.9.76;
      pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false;
      pc.linger = PT20S; pc.npvo = false; pc.version = 0; pc.weight = 1;
      protonet.backend = asio; protonet.version = 0;
      repl.causal_read_timeout = PT30S; repl.commit_order = 3;
      repl.key_format = FLAT8; repl.proto_max = 5; socket.checksum = 2"

      People

      • Assignee: Krunal Bauskar (krunal.bauskar)
      • Reporter: lpjirasync (Inactive)
      • Votes: 0
      • Watchers: 1