Uploaded image for project: 'Percona Monitoring and Management'
  1. Percona Monitoring and Management
  2. PMM-5365

Client fails to send non-UTF-8 query analytics content to server

Details

    • Bug
    • Status: Done
    • High
    • Resolution: Fixed
    • 2.2.1, 2.3.0, 2.4.0, 2.6.1
    • 2.8.0
    • PMM Agent
    • 1
    • Platform Sprint 17, Platform Sprint 18
    • Yes
    • Yes
    • No
    • No

    Description

      Hello All,

      We are currently testing PMM2, it is working quiet well so far, however we experiencing some problems with one of our Nodes. We currently monitor a Galera Cluster (Percona XtraDB Cluster 5.7.28-31.41.1.el8 running on CentOS Linux release 8.1.1911 (Core)) of 5 Nodes, 4 are already in PMM2, 3 of this 4 nodes are working well, system metric, mysql metrics and slow query logs are collected/sent to the server well.

      One node only sends its metrics but not the slow query log, after enabling debug mode, I've seen the following messages:

      Jan 24 08:38:59 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] No need to reset: same file, old position 609104731, new file size 609104731.  #033[37magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[37mcomponent#033[0m=slowlog/reader #033[37mfile#033[0m=/var/log/mysql/mysql-slow.log #033[37mtype#033[0m=qan_mysql_slowlog_agent
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Made 231 buckets out of 231 classes in 08:38:25+34 interval. Wait time: 34.389665111s.  #033[37magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[37mcomponent#033[0m=agent-builtin #033[37mtype#033[0m=qan_mysql_slowlog_agent
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Scheduling next aggregation in 59.983990147s at 08:40:00.  #033[37magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[37mcomponent#033[0m=agent-builtin #033[37mtype#033[0m=qan_mysql_slowlog_agent
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Sending 231 buckets.                          #033[36magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[36mcomponent#033[0m=agent-builtin #033[36mtype#033[0m=qan_mysql_slowlog_agent
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Sending message (461889 bytes):
      Jan 24 08:39:00 node1 pmm-agent[2]: id: 8
      Jan 24 08:39:00 node1 pmm-agent[2]: qan_collect: <
      Jan 24 08:39:00 node1 pmm-agent[2]:  metrics_bucket: <
      Jan 24 08:39:00 node1 pmm-agent[2]:    common: <
      ... SLOW QUERY LOG ENTRIES ...
      Jan 24 08:39:00 node1 pmm-agent[2]:  #033[37mcomponent#033[0m=channel
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Closing with error: rpc error: code = Internal desc = grpc: error while marshaling: proto: field "agent.MetricsBucket.Common.Example" contains invalid UTF-8
      Jan 24 08:39:00 node1 pmm-agent[2]: failed to send message
      Jan 24 08:39:00 node1 pmm-agent[2]: github.com/percona/pmm-agent/client/channel.(*Channel).send
      Jan 24 08:39:00 node1 pmm-agent[2]: #011/tmp/go/src/github.com/percona/pmm-agent/client/channel/channel.go:183
      Jan 24 08:39:00 node1 pmm-agent[2]: github.com/percona/pmm-agent/client/channel.(*Channel).SendRequest
      Jan 24 08:39:00 node1 pmm-agent[2]: #011/tmp/go/src/github.com/percona/pmm-agent/client/channel/channel.go:156
      Jan 24 08:39:00 node1 pmm-agent[2]: github.com/percona/pmm-agent/client.(*Client).processSupervisorRequests.func2
      Jan 24 08:39:00 node1 pmm-agent[2]: #011/tmp/go/src/github.com/percona/pmm-agent/client/client.go:234
      Jan 24 08:39:00 node1 pmm-agent[2]: runtime.goexit
      Jan 24 08:39:00 node1 pmm-agent[2]: #011/usr/local/go/src/runtime/asm_amd64.s:1337  #033[37mcomponent#033[0m=channel
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[33mWARN#033[0m[date] Failed to send QanCollect request.            #033[33mcomponent#033[0m=client
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Exiting receiver goroutine.                   #033[37mcomponent#033[0m=channel
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Channel closed: failed to send message: rpc error: code = Internal desc = grpc: error while marshaling: proto: field "agent.MetricsBucket.Common.Example" contains invalid UTF-8.  #033[37mcomponent#033[0m=client
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Connection closed.                            #033[36mcomponent#033[0m=client
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Process: context canceled.                    #033[36magentID#033[0m=/agent_id/9C263F77-FFF7-49A9-B9B8-07B868566795 #033[36mcomponent#033[0m=agent-process #033[36mtype#033[0m=mysqld_exporter
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Process: context canceled.                    #033[36magentID#033[0m=/agent_id/0F3F8677-ADAF-4FAF-989F-5FC230CBE29F #033[36mcomponent#033[0m=agent-process #033[36mtype#033[0m=node_exporter
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Context done with context canceled. Reader closed with <nil>.  #033[36magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[36mcomponent#033[0m=agent-builtin #033[36mtype#033[0m=qan_mysql_slowlog_agent
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Stopped.                                      #033[36mcomponent#033[0m=local-server/JSON
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Starting 0, restarting 0, and stopping 2 agent processes.  #033[36mcomponent#033[0m=supervisor
      Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Done.                                         #033[36mcomponent#033[0m=actions-runner// Code-Platzhalter
      

      Based on the messages, it seems, the rpc agent used internally is not able to marshall the message containing the slow query log, because of a non UTF-8 Character, however I analysed the slow query log of all servers and all slow query logs contain non UTF-8 characters, so why is only this one node not working?

      May a bug in the pmm-agent? I am not sure, please help.

      Additional Info:
      We now turned off slow query log on that server, cleared the logfile entirely, then we enabled the slow query log again and restarted pmm-agent, afterward the slow queries were reported to the server for a few minutes until the same error appeared again. Seems like a bug in pmm-agent while marshalling slow query logs containing non UTF-8 chars and this is causing a cacellation of the rpc connection to the server which ends up in a whole restart of the agent it seems.

      If you need any further Information, feel free to contact me.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              romulus Thomas bruckmann
              Votes:
              1 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - Not Specified
                  Not Specified
                  Logged:
                  Time Spent - 2 days, 3 hours
                  2d 3h

                  Smart Checklist