Percona Monitoring and Management / PMM-5915

Supervisord not restarting after restart of PMM Server virtual appliances (OVF/AMI)

    Details

    • Type: Bug
    • Status: Done
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 2.5.0, 2.6.0
    • Fix Version/s: 2.6.1
    • Component/s: Virtual Appliance
    • Labels:
      None
    • Story Points:
      2
    • Sprint:
      Platform Sprint 16
    • Needs Review:
      Yes
    • Needs QA:
      Yes
    • Needs Doc:
      No

      Description

      User Impact:
      The PMM Server UI is not available.
      Steps to reproduce:

      1. Install PMM 2.5.0
      2. Enable laboratory and testing repositories
      3. Upgrade to 2.6.0
      4. Log in to the PMM Server and reboot it
      5. Wait for a few minutes
      6. Run
         supervisorctl status

         supervisord is running

      7. Wait for a few minutes

      Actual result: supervisord is stopped

      [admin@localhost ~]$ sudo supervisorctl status
      unix:///var/run/supervisor/supervisor.sock no such file
      [admin@localhost ~]$ sudo service supervisord start
      Redirecting to /bin/systemctl start supervisord.service
      Job for supervisord.service failed because a timeout was exceeded. See "systemctl status supervisord.service" and "journalctl -xe" for details.
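
The check in the transcript above can be scripted when the reproduction has to be repeated across several upgrade paths. The following is only a minimal sketch: `classify_supervisorctl_output` is a hypothetical helper name, not part of PMM or supervisor, and it assumes the missing-socket message has the same wording as in the transcript.

```shell
#!/bin/sh
# classify_supervisorctl_output: hypothetical helper (not part of PMM or supervisor).
# Reads the text printed by `supervisorctl status` on stdin and prints
#   "socket-missing" if the output reports that the control socket does not exist,
#   "reachable"      otherwise.
# Assumption: the error wording matches the transcript in this report.
classify_supervisorctl_output() {
    if grep -q 'supervisor\.sock no such file'; then
        echo "socket-missing"
    else
        echo "reachable"
    fi
}
```

On an affected server it could be used as `sudo supervisorctl status 2>&1 | classify_supervisorctl_output`, run once shortly after the reboot and again a few minutes later, since the report indicates that supervisord is initially running and stops afterwards.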

      Journal output:

      May 11 11:20:25 localhost crond[1568]: (CRON) INFO (Shutting down)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,582 INFO exited: grafana (exit status 0; expected)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,583 INFO exited: nginx (exit status 0; expected)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,583 INFO exited: cron (exit status 0; expected)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,583 INFO exited: alertmanager (exit status 0; expected)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,583 INFO exited: pmm-agent (exit status 0; expected)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,583 WARN received SIGTERM indicating exit request
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,583 INFO waiting for postgresql, prometheus, pmm-managed, qan-api2, clickhouse to die
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,591 INFO exited: pmm-managed (exit status 0; expected)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,591 INFO exited: qan-api2 (exit status 0; expected)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,595 INFO exited: prometheus (exit status 0; expected)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,609 INFO exited: postgresql (exit status 0; expected)
      May 11 11:20:25 localhost supervisord[1559]: 2020-05-11 11:20:25,830 INFO exited: clickhouse (exit status 0; expected)
      May 11 11:20:25 localhost systemd[1]: Failed to start Process Monitoring and Control Daemon.
      -- Subject: Unit supervisord.service has failed
      -- Defined-By: systemd
      -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
      -- 
      -- Unit supervisord.service has failed.
      -- 
      -- The result is failed.
      May 11 11:20:25 localhost systemd[1]: Unit supervisord.service entered failed state.
      May 11 11:20:25 localhost systemd[1]: supervisord.service failed.
      May 11 11:20:25 localhost polkitd[767]: Unregistered Authentication Agent for unix-process:1541:63960 (system bus name :1.23, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UT
      May 11 11:20:25 localhost sudo[1539]: pam_unix(sudo:session): session closed for user root
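
To see at a glance which managed programs supervisord shut down before systemd marked the unit as failed, the journal excerpt above can be reduced to a list of program names. This is a sketch under stated assumptions: `list_exited_programs` is a hypothetical helper, and it relies on the `INFO exited: <name> (exit status ...)` line format shown in the excerpt.

```shell
#!/bin/sh
# list_exited_programs: hypothetical helper (not part of PMM or supervisor).
# Reads journal lines on stdin and prints the name of every program that
# supervisord logged as "exited", one per line, in log order.
# Lines that do not match the "INFO exited: <name> (exit status" pattern
# (for example the SIGTERM warning or systemd messages) are ignored.
list_exited_programs() {
    sed -n 's/.* INFO exited: \([^ ]*\) (exit status.*/\1/p'
}
```

On the server itself the input could come from `journalctl -u supervisord.service | list_exited_programs`. Comparing the resulting list with the programs expected in the supervisord configuration may help confirm that every child process stopped cleanly, as the `exit status 0; expected` entries in this report suggest.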

      The same problem occurs when upgrading from 2.4.0 to 2.5.0.

      Logs collected after the upgrade are attached.

      The same problem occurs on the AMI after an upgrade.

      Additional information: the problem occurs only after restarting an upgraded OVF/AMI instance.

              People

              Assignee:
              Unassigned
              Reporter:
              nailya.kutlubaeva Nailya Kutlubaeva
                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  Not Specified
                  Logged:
                  4h 30m