PMM-9085

PMM Server crashes every 4 hours after upgrading to 2.22


    Description

      Issue:
      --------
      After upgrading to 2.22, the pmm-managed component crashes with the following panic:

      panic: interface conversion: agentpb.AgentResponsePayload is nil, not *agentpb.GetVersionsResponse
      
      goroutine 268 [running]:
      github.com/percona/pmm-managed/services/agents.(*VersionerService).GetVersions(0xc0000bc550, 0xc000664600, 0x2e, 0xc000bd0380, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0, ...)
       /home/builder/rpm/BUILD/pmm-managed-1ef5c4435934798165ff937e2f4c916b4329558a/src/github.com/percona/pmm-managed/services/agents/versioner.go:122 +0x5c9
      github.com/percona/pmm-managed/services/versioncache.(*Service).updateVersionsForNextService(0xc000313bf0, 0xc000000004, 0x1639d1b, 0x14)
       /home/builder/rpm/BUILD/pmm-managed-1ef5c4435934798165ff937e2f4c916b4329558a/src/github.com/percona/pmm-managed/services/versioncache/versioncache.go:164 +0x137
      github.com/percona/pmm-managed/services/versioncache.(*Service).Run(0xc000313bf0, 0x18897d8, 0xc00062aae0)
       /home/builder/rpm/BUILD/pmm-managed-1ef5c4435934798165ff937e2f4c916b4329558a/src/github.com/percona/pmm-managed/services/versioncache/versioncache.go:235 +0x308
      main.main.func13(0xc000756c40, 0xc000313bf0, 0x18897d8, 0xc00062aae0)
       /home/builder/rpm/BUILD/pmm-managed-1ef5c4435934798165ff937e2f4c916b4329558a/src/github.com/percona/pmm-managed/main.go:831 +0x6b
      created by main.main
       /home/builder/rpm/BUILD/pmm-managed-1ef5c4435934798165ff937e2f4c916b4329558a/src/github.com/percona/pmm-managed/main.go:829 +0x3bdd
      

      The panic recurs every 4 hours, and each occurrence crashes pmm-managed.

      Cause:
      ----------
      The service version check functionality was added in 2.22 (PMM-8460).

      The code does not check whether the response is nil before the type assertion:

      	response, err := agent.channel.SendAndWaitResponse(request)
      	if err != nil {
      		return nil, errors.WithStack(err)
      	}
      
      	versionsResponse := response.(*agentpb.GetVersionsResponse).Versions
      	if len(versionsResponse) != len(softwaresRequest) {
      		return nil, errors.Errorf("response and request slice length mismatch %d != %d",
      			len(versionsResponse), len(softwaresRequest))
      	}
      

      The response can be nil after a communication failure, or when the remote pmm-agent runs an older version that does not know this kind of request. The unchecked type assertion then panics on the nil interface value.
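
      The failure mode can be reproduced in isolation. The sketch below is not pmm-managed code: Payload and VersionsResponse are hypothetical, simplified stand-ins for agentpb.AgentResponsePayload and agentpb.GetVersionsResponse, and tryUnsafe is a helper invented here to capture the panic message instead of crashing.

```go
package main

import "fmt"

// Payload is a hypothetical stand-in for agentpb.AgentResponsePayload.
type Payload interface{ isPayload() }

// VersionsResponse is a hypothetical stand-in for agentpb.GetVersionsResponse.
type VersionsResponse struct{ Versions []string }

func (*VersionsResponse) isPayload() {}

// unsafeVersions mirrors the failing line: a single-value type assertion
// panics when the interface value is nil.
func unsafeVersions(p Payload) []string {
	return p.(*VersionsResponse).Versions
}

// tryUnsafe calls unsafeVersions and returns the panic message,
// or an empty string if no panic occurred.
func tryUnsafe(p Payload) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	unsafeVersions(p)
	return ""
}

func main() {
	// A populated response passes through without a panic.
	fmt.Printf("valid payload: %q\n", tryUnsafe(&VersionsResponse{Versions: []string{"8.0.26"}}))
	// A nil payload triggers the interface conversion panic.
	fmt.Printf("nil payload:   %q\n", tryUnsafe(nil))
}
```

      In pmm-managed nothing recovers from this panic in the versioncache goroutine, so the whole process exits, as the stack trace above shows.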

      Possible solution:
      ------------------------
      Add a simple "if response == nil" check before the type assertion so that pmm-managed handles these situations without crashing. For example, it could log an ERROR or WARNING message and continue.
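
      A minimal sketch of such a check follows. It is an illustration under assumptions, not the actual patch: the types are simplified stand-ins for the agentpb ones, extractVersions is a hypothetical helper, and the two-value form of the type assertion is an addition beyond the plain nil check, so that an unexpected payload type also yields an error instead of a panic.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// AgentResponsePayload and GetVersionsResponse are simplified stand-ins
// for the agentpb types (assumptions made for this sketch).
type AgentResponsePayload interface{ isPayload() }

type GetVersionsResponse struct{ Versions []string }

func (*GetVersionsResponse) isPayload() {}

// extractVersions (hypothetical helper) validates the payload before use.
// want is the number of entries in the request.
func extractVersions(response AgentResponsePayload, want int) ([]string, error) {
	if response == nil {
		return nil, errors.New("empty response to GetVersions request")
	}
	// The two-value assertion never panics; ok is false on a type mismatch.
	versionsResponse, ok := response.(*GetVersionsResponse)
	if !ok || versionsResponse == nil {
		return nil, fmt.Errorf("unexpected or empty response of type %T", response)
	}
	if len(versionsResponse.Versions) != want {
		return nil, fmt.Errorf("response and request slice length mismatch %d != %d",
			len(versionsResponse.Versions), want)
	}
	return versionsResponse.Versions, nil
}

func main() {
	// A nil response becomes an error that the caller can log before continuing.
	if _, err := extractVersions(nil, 1); err != nil {
		log.Printf("WARNING: skipping version update: %v", err)
	}

	versions, err := extractVersions(&GetVersionsResponse{Versions: []string{"8.0.26"}}, 1)
	fmt.Println(versions, err)
}
```

      Returning an error rather than logging inside the helper leaves the decision to the caller, which can log a warning and move on to the next service.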

            People

              maksym.hilliaka Maksym Hilliaka (Inactive)
              maksim.larin Maksim Larin (Inactive)