  1. Percona Monitoring and Management
  2. PMM-1519

Find a way to correctly stop an extremely large Prometheus instance

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: PMM Server
    • Labels:
      None
    • Story Points:
      0.5

      Description

      On large Prometheus instances (more than 50 targets), Prometheus shutdown can take a long time (several minutes).
      However, supervisord and systemd do not wait that long and kill Prometheus once their stop timeout expires.
      On the next start, Prometheus detects the unclean shutdown and runs crash recovery, which can take up to 20 minutes on large installations.

      So supervisord and systemd need to be configured to wait 5 minutes after sending SIGTERM before sending SIGKILL.
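
      A minimal sketch of what such a configuration might look like, not necessarily the change that shipped in 1.4.0. The supervisord program name and the idea of a systemd drop-in for the unit that runs Prometheus (or supervisord itself) are assumptions for illustration. `stopwaitsecs` and `TimeoutStopSec` are the standard stop-timeout options of supervisord and systemd respectively.

      ```ini
      ; supervisord: program section (the name "prometheus" is assumed)
      [program:prometheus]
      stopsignal=TERM      ; signal sent first on stop (TERM is the default)
      stopwaitsecs=300     ; wait up to 5 minutes before supervisord sends SIGKILL

      ; systemd: drop-in override for the relevant unit (unit name is assumed)
      [Service]
      KillSignal=SIGTERM   ; signal sent first on stop (SIGTERM is the default)
      TimeoutStopSec=300   ; wait up to 5 minutes before systemd sends SIGKILL
      ```

      If supervisord runs under systemd, the systemd timeout probably needs to be at least as long as the supervisord one. Otherwise systemd may kill the whole process tree before Prometheus finishes shutting down.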

      The initial request was made at PL Dublin 2017; see also https://www.percona.com/forums/questions-discussions/percona-monitoring-and-management/49655-pmm-shutdown-flow

              People

              • Assignee:
                borys.belinsky Borys Belinsky
                Reporter:
                mykola.marzhan Mykola Marzhan
              • Votes:
                1
                Watchers:
                5
