  Percona Server / PS-4716

MySQL connection exhaustion due to too many in CLOSE_WAIT state


    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: 5.7.22-22, 5.7.23-23
    • Fix Version/s: 5.7.25-28
    • Component/s: None
    • Labels:
    • Environment:

      Linux - Ubuntu Trusty

      3.13.0-88-generic

      Ver 5.7.22-22 for debian-linux-gnu on x86_64 (Percona Server (GPL), Release '22', Revision 'f62d93c')

      Description

      The affected node was recently upgraded to v5.7.22 from v5.6. This is the second time we have noticed a max-connections error on this host. While troubleshooting, a high number of TCP connections in the CLOSE_WAIT state was noticed. Most are from localhost, and we assume they come from Percona's mysqld exporter. During the issue, the exporter's log reported hitting the open-file limit.

      http: Accept error: accept tcp 127.0.0.1:9104: accept4: too many open files; retrying in 1s

      Restarting the exporter didn't resolve the issue, though it no longer complained about `too many open files`.
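      For illustration, a minimal sketch (assuming a Linux host with a readable /proc; only IPv4 sockets in /proc/net/tcp are counted) of how the CLOSE_WAIT build-up and a process's open-descriptor count can be checked:

      #!/usr/bin/env python3
      """Sketch: count CLOSE_WAIT sockets and a process's open file descriptors.
      Assumes Linux /proc; the PID comes from e.g. `pidof mysqld` or the exporter."""
      import os
      import sys

      CLOSE_WAIT = "08"  # TCP state code used in /proc/net/tcp

      def count_close_wait(path="/proc/net/tcp"):
          """Count IPv4 sockets currently in CLOSE_WAIT."""
          with open(path) as f:
              next(f)  # skip the header line
              return sum(1 for line in f if line.split()[3] == CLOSE_WAIT)

      def open_fds(pid):
          """Number of file descriptors currently held by the given process."""
          return len(os.listdir(f"/proc/{pid}/fd"))

      if __name__ == "__main__":
          pid = int(sys.argv[1])
          print("CLOSE_WAIT sockets:", count_close_wait())
          print(f"open fds for pid {pid}:", open_fds(pid))

      Run against the exporter's PID during such an incident, the descriptor count would be expected to approach the process's open-file limit.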

      pt-pmp produced the following output.

      3489 __lll_lock_wait(libpthread.so.0),_L_lock_1081(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),THD::init,THD::THD,Channel_info::create_thd,Channel_info_tcpip_socket::create_thd,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      510 __lll_lock_wait(libpthread.so.0),_L_lock_1081(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),THD::init,THD::THD,Channel_info::create_thd,Channel_info_local_socket::create_thd,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      18 libaio::??(libaio.so.1),LinuxAIOHandler::collect,LinuxAIOHandler::poll,os_aio_handler,fil_aio_wait,io_handler_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      8 nanosleep(libpthread.so.0),os_thread_sleep,buf_lru_manager,start_thread(libpthread.so.0),clone(libc.so.6)
      7 pthread_cond_wait,os_event::wait_low,srv_worker_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_wait,os_event::wait_low,buf_flush_page_cleaner_worker,start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwait(libpthread.so.0),signal_hand,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwaitinfo(libc.so.6),::??,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,os_event::wait_low,srv_purge_coordinator_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,os_event::wait_low,buf_resize_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,os_event::wait_low,buf_dump_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,Event_queue::cond_wait,Event_queue::get_top_for_execution_if_time,Event_scheduler::run,event_scheduler_thread,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,compress_gtid_table,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,srv_monitor_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,srv_error_monitor_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,lock_wait_timeout_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,ib_wqueue_timedwait,fts_optimize_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,dict_stats_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,buf_flush_page_cleaner_coordinator,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,audit_log::??(audit_log.so),start_thread(libpthread.so.0),clone(libc.so.6)
      1 poll(libc.so.6),vio_io_wait,vio_socket_io_wait,vio_read,vio_read_buff,::??,::??,my_net_read,cli_safe_read_with_ok,handle_slave_io,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 poll(libc.so.6),Mysqld_socket_listener::listen_for_connection_event,mysqld_main,__libc_start_main(libc.so.6),_start
      1 nanosleep(libpthread.so.0),os_thread_sleep,srv_master_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 __lll_lock_wait(libpthread.so.0),_L_lock_909(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),show_binlogs,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 __lll_lock_wait(libpthread.so.0),_L_lock_909(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),PolyLock_lock_log::rdlock,sys_var::value_ptr,get_one_variable_ext,System_variable::init,PFS_system_variable_cache::do_materialize_all,table_session_variables::rnd_init,ha_perfschema::rnd_init,handler::ha_rnd_init,init_read_record,join_init_read_record,sub_select,JOIN::exec,TABLE_LIST::materialize_derived,join_materialize_derived,QEP_TAB::prepare_scan,sub_select,JOIN::exec,handle_query,::??,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 __lll_lock_wait(libpthread.so.0),_L_lock_1081(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),Log_in_use::operator,Global_THD_manager::do_for_all_thd,MYSQL_BIN_LOG::purge_logs_before_date,MYSQL_BIN_LOG::purge,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit,Xid_log_event::do_commit,Xid_apply_log_event::do_apply_event,Log_event::apply_event,apply_event_and_update_pos,handle_slave_sql,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)

      Strace of the mysqld process and its threads revealed that the slave io_thread was functional and data was being written to the relay log files, but the sql_thread seemed down: changes were not being applied, the mysql-bin.* position had been stalled since the issue showed up, and the relay logs kept growing.

      The same issue has been witnessed on two other nodes, although there the process wouldn't even respond. I don't have a gdb trace for them, but strace data did show the same symptoms: a growing relay log but no growth in the node's own binary log. The issue seems related to PS-4590.
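      For illustration, a minimal sketch of how the same stall could be flagged from SQL while the server still accepts connections, by comparing Read_Master_Log_Pos and Exec_Master_Log_Pos over time (the host, credentials, and mysql-connector-python dependency are assumptions; log-file rotation is ignored for brevity):

      #!/usr/bin/env python3
      """Sketch: flag a stalled replication SQL thread by polling SHOW SLAVE STATUS.
      Host/credentials are placeholders; requires mysql-connector-python."""
      import time
      import mysql.connector

      def slave_positions(conn):
          """Return (Read_Master_Log_Pos, Exec_Master_Log_Pos) as integers."""
          cur = conn.cursor(dictionary=True)
          cur.execute("SHOW SLAVE STATUS")
          row = cur.fetchone()
          cur.close()
          return int(row["Read_Master_Log_Pos"]), int(row["Exec_Master_Log_Pos"])

      if __name__ == "__main__":
          conn = mysql.connector.connect(host="127.0.0.1", user="monitor", password="...")
          read1, exec1 = slave_positions(conn)
          time.sleep(60)
          read2, exec2 = slave_positions(conn)
          # IO thread still reading (relay logs grow) but SQL thread not applying:
          # the same symptom described above.
          if read2 > read1 and exec2 == exec1:
              print("SQL thread appears stalled")
          conn.close()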

      Resolution

      • mysqladmin shutdown didn't work since the user couldn't log in due to `Too many connections`.
      • kill `pidof mysqld` didn't work. There were no messages in error.log about receiving a shutdown signal.
      • kill -9 `pidof mysqld` killed the mysqld process. It was immediately respawned by mysqld_safe, and after a while the system stopped responding; I couldn't ssh in.

    People

    • Assignee: Yura Sorokin (yura.sorokin)
    • Reporter: Shashank Sahni (shashanksahni12@gmail.com)
    • Votes: 2
    • Watchers: 8

    Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: Not Specified
    • Time Spent: 6h 15m