Uploaded image for project: 'Percona Monitoring and Management'
  1. Percona Monitoring and Management
  2. PMM-5538

Heavy Load with Distinct Queries on Slowlog enabled could cause no data being reported

Details

    • 1
    • 08 - Server Features
    • Yes
    • Yes
    • Server Features

    Description

      PZ found that If we try to run heavy load on a mysql instance and run distinct queries, then pmm-agent is most likely to fail to send these buckets to QAN, the Failed message could be seen in PMM-agent.log file, for example in this case

      Mar 01 20:15:37 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:15:37.449+00:00] time="2020-03-01T20:15:37Z" level=error msg="Error scraping for collect.info_schema.innodb_tablespaces: Error 1109:
      Mar 01 20:15:37 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:15:37.538+00:00] time="2020-03-01T20:15:37Z" level=error msg="Error scraping for collect.engine_tokudb_status: Error 1286: Unknown st
      Mar 01 20:15:37 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:15:37.580+00:00] time="2020-03-01T20:15:37Z" level=error msg="Error scraping for collect.heartbeat: Error 1049: Unknown database 'hea
      Mar 01 20:16:03 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:03.206+00:00] Sending 27378 buckets.                        agentID=/agent_id/5a3b6350-7bc5-4bc9-925e-6a1d4a500667 component=agent
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: WARN[2020-03-01T20:16:12.882+00:00] Failed to send QanCollect request.            component=client
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.885+00:00] Connection closed.                            component=client
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.886+00:00] Process: context canceled.                    agentID=/agent_id/dcfff9bb-957d-4736-96f4-720c623aeb17 component=agent
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.891+00:00] Sending status: STOPPING (port 42001).        agentID=/agent_id/dcfff9bb-957d-4736-96f4-720c623aeb17 component=agent
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: WARN[2020-03-01T20:16:12.891+00:00] Failed to send StateChanged request.          component=client
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.893+00:00] Done.                                         component=actions-runner
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.893+00:00] Starting 0, restarting 0, and stopping 2 agent processes.  component=supervisor
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.893+00:00] Context done with context canceled. Reader closed with <nil>.  agentID=/agent_id/5a3b6350-7bc5-4bc9-925e-6a1d4a50066
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.893+00:00] Stopped.                                      component=local-server/JSON
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.894+00:00] Process: context canceled.                    agentID=/agent_id/c5e3a293-637c-41b4-bc4f-46a5bb653757 component=agent
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.902+00:00] Sending status: STOPPING (port 42000).        agentID=/agent_id/c5e3a293-637c-41b4-bc4f-46a5bb653757 component=agent
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: WARN[2020-03-01T20:16:12.902+00:00] Failed to send StateChanged request.          component=client
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.903+00:00] Sending status: WAITING.                      agentID=/agent_id/5a3b6350-7bc5-4bc9-925e-6a1d4a500667 component=agent
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.904+00:00] Sending status: DONE.                         agentID=/agent_id/5a3b6350-7bc5-4bc9-925e-6a1d4a500667 component=agent
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: WARN[2020-03-01T20:16:12.904+00:00] Failed to send StateChanged request.          component=client
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: WARN[2020-03-01T20:16:12.904+00:00] Failed to send StateChanged request.          component=client
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.905+00:00] Done.                                         component=local-server
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.906+00:00] Process: exited: signal: terminated.          agentID=/agent_id/dcfff9bb-957d-4736-96f4-720c623aeb17 component=agent
      Mar 01 20:16:12 ps8-1-n-1-stq-tb1-s100000x pmm-agent[4346]: INFO[2020-03-01T20:16:12.906+00:00] Sending status: DONE (port 42001).          
      

       

      The problem is when, > 
       "Sending 27378 buckets"  and fails to do that due to too short timeout or anything else, 

      This has potential of system magically stopping working (reporting data) as user workload increases (generating more distinct queries) 
       
      STR: 

      1. started pmm, dev-latest version
      2. PS 5.7.x, with slow log enabled
      3. set long_query_time=0
      4. set log_slow_rate_limit=1;
      5. start running load script from here
      6. added the instance to monitoring with --query-source=slowlog
      7. check buckets in pmm-agent.log
      8. check for Failed message in pmm-agent.log
      9. check for err message in pmm-agent.log

      Update: 

      The issue was recreated with : 
      `Server version: 8.0.18-9 Percona Server (GPL), Release 9, Revision 53e606f` 
      Slow_log enabled, log_slow_rate_limit=1, long_query_time=0

      The pmm-workload PHP script was executed with the following: 

      DB=10, Tables=50, Queries=1000, basically ended up with generating more then 42k buckets which failed just as mentioned in the description, I am attaching log files and screenshot. 

       

      summary_ip-10-178-0-198_2020_03_06_08_28_21.zip

       

      Attachments

        Issue Links

          Activity

            People

              david.mikus David Mikus (Inactive)
              puneet.kala Puneet Kala
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Smart Checklist