Details
-
Bug
-
Status: Done
-
High
-
Resolution: Fixed
-
2.2.1, 2.3.0, 2.4.0, 2.6.1
-
OS: CentOS Linux release 8.1.1911 (Core) (minimal installations with epel repo activated and nagios plugins for mysql)
MySQL Galera: Percona XtraDB Cluster 5.7.28-31.41.1.el8
Langpacks:
glibc-all-langpacks.x86_64 2.28-72.el8 @BaseOS
glibc-langpack-de.x86_64 2.28-72.el8 @BaseOS
glibc-langpack-en.x86_64 2.28-72.el8 @BaseOS
langpacks-de.noarch 1.0-12.el8 @AppStream
langpacks-en.noarch 1.0-12.el8 @AppStream
slang.x86_64 2.3.2-3.el8 @anacondaMost recent updates are installed.
OS: CentOS Linux release 8.1.1911 (Core) (minimal installations with epel repo activated and nagios plugins for mysql) MySQL Galera: Percona XtraDB Cluster 5.7.28-31.41.1.el8 Langpacks: glibc-all-langpacks.x86_64 2.28-72.el8 @BaseOS glibc-langpack-de.x86_64 2.28-72.el8 @BaseOS glibc-langpack-en.x86_64 2.28-72.el8 @BaseOS langpacks-de.noarch 1.0-12.el8 @AppStream langpacks-en.noarch 1.0-12.el8 @AppStream slang.x86_64 2.3.2-3.el8 @anaconda Most recent updates are installed.
-
1
-
Platform Sprint 17, Platform Sprint 18
-
Yes
-
Yes
-
No
-
No
Description
Hello All,
We are currently testing PMM2, it is working quiet well so far, however we experiencing some problems with one of our Nodes. We currently monitor a Galera Cluster (Percona XtraDB Cluster 5.7.28-31.41.1.el8 running on CentOS Linux release 8.1.1911 (Core)) of 5 Nodes, 4 are already in PMM2, 3 of this 4 nodes are working well, system metric, mysql metrics and slow query logs are collected/sent to the server well.
One node only sends its metrics but not the slow query log, after enabling debug mode, I've seen the following messages:
Jan 24 08:38:59 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] No need to reset: same file, old position 609104731, new file size 609104731. #033[37magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[37mcomponent#033[0m=slowlog/reader #033[37mfile#033[0m=/var/log/mysql/mysql-slow.log #033[37mtype#033[0m=qan_mysql_slowlog_agent Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Made 231 buckets out of 231 classes in 08:38:25+34 interval. Wait time: 34.389665111s. #033[37magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[37mcomponent#033[0m=agent-builtin #033[37mtype#033[0m=qan_mysql_slowlog_agent Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Scheduling next aggregation in 59.983990147s at 08:40:00. #033[37magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[37mcomponent#033[0m=agent-builtin #033[37mtype#033[0m=qan_mysql_slowlog_agent Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Sending 231 buckets. #033[36magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[36mcomponent#033[0m=agent-builtin #033[36mtype#033[0m=qan_mysql_slowlog_agent Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Sending message (461889 bytes): Jan 24 08:39:00 node1 pmm-agent[2]: id: 8 Jan 24 08:39:00 node1 pmm-agent[2]: qan_collect: < Jan 24 08:39:00 node1 pmm-agent[2]: metrics_bucket: < Jan 24 08:39:00 node1 pmm-agent[2]: common: < ... SLOW QUERY LOG ENTRIES ... Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mcomponent#033[0m=channel Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Closing with error: rpc error: code = Internal desc = grpc: error while marshaling: proto: field "agent.MetricsBucket.Common.Example" contains invalid UTF-8 Jan 24 08:39:00 node1 pmm-agent[2]: failed to send message Jan 24 08:39:00 node1 pmm-agent[2]: github.com/percona/pmm-agent/client/channel.(*Channel).send Jan 24 08:39:00 node1 pmm-agent[2]: #011/tmp/go/src/github.com/percona/pmm-agent/client/channel/channel.go:183 Jan 24 08:39:00 node1 pmm-agent[2]: github.com/percona/pmm-agent/client/channel.(*Channel).SendRequest Jan 24 08:39:00 node1 pmm-agent[2]: #011/tmp/go/src/github.com/percona/pmm-agent/client/channel/channel.go:156 Jan 24 08:39:00 node1 pmm-agent[2]: github.com/percona/pmm-agent/client.(*Client).processSupervisorRequests.func2 Jan 24 08:39:00 node1 pmm-agent[2]: #011/tmp/go/src/github.com/percona/pmm-agent/client/client.go:234 Jan 24 08:39:00 node1 pmm-agent[2]: runtime.goexit Jan 24 08:39:00 node1 pmm-agent[2]: #011/usr/local/go/src/runtime/asm_amd64.s:1337 #033[37mcomponent#033[0m=channel Jan 24 08:39:00 node1 pmm-agent[2]: #033[33mWARN#033[0m[date] Failed to send QanCollect request. #033[33mcomponent#033[0m=client Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Exiting receiver goroutine. #033[37mcomponent#033[0m=channel Jan 24 08:39:00 node1 pmm-agent[2]: #033[37mDEBU#033[0m[date] Channel closed: failed to send message: rpc error: code = Internal desc = grpc: error while marshaling: proto: field "agent.MetricsBucket.Common.Example" contains invalid UTF-8. #033[37mcomponent#033[0m=client Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Connection closed. #033[36mcomponent#033[0m=client Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Process: context canceled. #033[36magentID#033[0m=/agent_id/9C263F77-FFF7-49A9-B9B8-07B868566795 #033[36mcomponent#033[0m=agent-process #033[36mtype#033[0m=mysqld_exporter Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Process: context canceled. #033[36magentID#033[0m=/agent_id/0F3F8677-ADAF-4FAF-989F-5FC230CBE29F #033[36mcomponent#033[0m=agent-process #033[36mtype#033[0m=node_exporter Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Context done with context canceled. Reader closed with <nil>. #033[36magentID#033[0m=/agent_id/641159FD-8C33-4778-9CCF-013EBE92D1A4 #033[36mcomponent#033[0m=agent-builtin #033[36mtype#033[0m=qan_mysql_slowlog_agent Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Stopped. #033[36mcomponent#033[0m=local-server/JSON Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Starting 0, restarting 0, and stopping 2 agent processes. #033[36mcomponent#033[0m=supervisor Jan 24 08:39:00 node1 pmm-agent[2]: #033[36mINFO#033[0m[date] Done. #033[36mcomponent#033[0m=actions-runner// Code-Platzhalter
Based on the messages, it seems, the rpc agent used internally is not able to marshall the message containing the slow query log, because of a non UTF-8 Character, however I analysed the slow query log of all servers and all slow query logs contain non UTF-8 characters, so why is only this one node not working?
May a bug in the pmm-agent? I am not sure, please help.
Additional Info:
We now turned off slow query log on that server, cleared the logfile entirely, then we enabled the slow query log again and restarted pmm-agent, afterward the slow queries were reported to the server for a few minutes until the same error appeared again. Seems like a bug in pmm-agent while marshalling slow query logs containing non UTF-8 chars and this is causing a cacellation of the rpc connection to the server which ends up in a whole restart of the agent it seems.
If you need any further Information, feel free to contact me.
Attachments
Issue Links
- relates to
-
PMM-5283 Inconsistency with lengths of Example and Fingerprints in Query Analytics
-
- Done
-