vm-agent restarts many times and communication between pmm-agent and pmm-managed is blocked
The problem: whenever we add or remove multiple services on the same pmm-agent running in push mode, each change triggers a stateChange request from pmm-managed to pmm-agent asking it to update the VMAgent config. Because there can be multiple such requests and pmm-agent can't process them in parallel, they all get stuck in a queue.
This causes timeouts on several operations, which in turn causes restarts.
What is the impact?
Customers running multiple services on the same pmm-agent could face this issue. If different teams have set up automation tasks, the automation could also break, since pmm-agent could time out on several such requests sent in parallel. Users with pull mode configured are not affected.
- Setup PMM Server and PMM Agent
- Add a bunch of services with metrics mode push
- Restart pmm-agent
- pmm-admin status takes too long to respond
- Open PMM Inventory Page
- VMAgent stays stuck in the starting state for too long
- Try to remove one service via API
- The response takes too long