Percona Monitoring and Management / PMM-7350

Restart DB Cluster restarts only one PXC server and one ProxySQL, not all of them


    • 3
    • Yes
    • [obsolete] Server Integrations

      Follow the steps to reproduce. The issue should no longer be reproducible.


      Impact on the user:

      • The user does not get what they intended by restarting the DB cluster: most pods are not restarted

      Steps to reproduce:

      1. Set up a PXC cluster in PMM
      2. Wait until it is ready
      3. Restart it
      4. Wait until it has restarted
      5. Check the pod status in k8s

      Actual result:
      spronin-test-proxysql-0   3/3   Running             6   13h
      spronin-test-proxysql-1   3/3   Running             0   13h
      spronin-test-proxysql-2   0/3   ContainerCreating   0   1s
      spronin-test-pxc-0        1/1   Running             0   13h
      spronin-test-pxc-1        1/1   Running             8   13h
      spronin-test-pxc-2        1/1   Terminating         0   2m29s

      As you can see, only one PXC pod and one ProxySQL pod were restarted recently; the others have been up for 13h.

      Expected Result:
      All pods have recent uptime
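One way to check the expected result is to scan the pod listing for pods whose age exceeds a threshold. The sketch below is only an illustration: the function `stalePods`, the helper `demoStale`, and the five-minute threshold are assumptions, not part of PMM. It also assumes each row has five whitespace-separated columns, as in the listing above, with the age in the last one and in a form `time.ParseDuration` understands (ages expressed in days, such as `2d`, would need extra handling).

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// issueListing is the pod listing from the "Actual result" section.
const issueListing = `
spronin-test-proxysql-0 3/3 Running 6 13h
spronin-test-proxysql-1 3/3 Running 0 13h
spronin-test-proxysql-2 0/3 ContainerCreating 0 1s
spronin-test-pxc-0 1/1 Running 0 13h
spronin-test-pxc-1 1/1 Running 8 13h
spronin-test-pxc-2 1/1 Terminating 0 2m29s
`

// stalePods returns the names of pods older than maxAge.
// Hypothetical helper: it assumes five columns per row, age last.
func stalePods(listing string, maxAge time.Duration) ([]string, error) {
	var stale []string
	for _, line := range strings.Split(listing, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 5 {
			return nil, fmt.Errorf("unexpected row %q", line)
		}
		age, err := time.ParseDuration(fields[4])
		if err != nil {
			return nil, fmt.Errorf("parse age in %q: %w", line, err)
		}
		if age > maxAge {
			stale = append(stale, fields[0])
		}
	}
	return stale, nil
}

// demoStale applies an assumed 5-minute threshold to the listing above.
func demoStale() ([]string, error) {
	return stalePods(issueListing, 5*time.Minute)
}

func main() {
	stale, err := demoStale()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("pods that were not restarted:", stale)
}
```

With the listing from this issue, the four pods with an age of 13h are reported as not restarted.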

      Workaround: restart the pods with kubectl

      Suggested Implementation:

      • When dbaas-controller gets a Restart request, it should:
        • Pause the DB cluster
        • Wait until the DB cluster is paused
        • Start a new goroutine to resume the DB cluster
        • Return the response
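The suggested flow could be sketched roughly as follows. This is not the dbaas-controller code: the `ClusterClient` interface, its methods, the `Restart` signature, and the in-memory `fakeCluster` are all hypothetical names introduced for illustration. The sketch pauses the cluster, polls until it reports paused, starts a goroutine that resumes it, and returns without waiting for the resume.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// ClusterClient is a hypothetical abstraction over the operator API.
type ClusterClient interface {
	SetPaused(ctx context.Context, name string, paused bool) error
	IsPaused(ctx context.Context, name string) (bool, error)
}

// Restart pauses the cluster, waits until it is paused, then resumes it in
// a background goroutine. The returned channel receives the resume result.
func Restart(ctx context.Context, c ClusterClient, name string, poll time.Duration) (<-chan error, error) {
	if err := c.SetPaused(ctx, name, true); err != nil {
		return nil, fmt.Errorf("pause %s: %w", name, err)
	}
	ticker := time.NewTicker(poll)
	defer ticker.Stop()
	for {
		paused, err := c.IsPaused(ctx, name)
		if err != nil {
			return nil, fmt.Errorf("check %s: %w", name, err)
		}
		if paused {
			break
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-ticker.C:
		}
	}
	done := make(chan error, 1)
	go func() {
		// Fresh context: the request context may be cancelled once the
		// response has been returned to the caller.
		done <- c.SetPaused(context.Background(), name, false)
	}()
	return done, nil
}

// fakeCluster is an in-memory stand-in that reports "paused" only after a
// few status checks, imitating a cluster that takes time to stop.
type fakeCluster struct {
	mu            sync.Mutex
	paused        bool
	pendingChecks int
}

func (f *fakeCluster) SetPaused(_ context.Context, _ string, paused bool) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.paused = paused
	return nil
}

func (f *fakeCluster) IsPaused(_ context.Context, _ string) (bool, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.pendingChecks > 0 {
		f.pendingChecks--
		return false, nil
	}
	return f.paused, nil
}

// runDemo restarts the fake cluster and reports whether it ended up resumed.
func runDemo() (bool, error) {
	c := &fakeCluster{pendingChecks: 2}
	done, err := Restart(context.Background(), c, "my-cluster", time.Millisecond)
	if err != nil {
		return false, err
	}
	if err := <-done; err != nil {
		return false, err
	}
	return !c.paused, nil
}

func main() {
	resumed, err := runDemo()
	fmt.Println("resumed:", resumed, "error:", err)
}
```

Because the resume runs in a goroutine after the response is returned, a controller restart between pause and resume would leave the cluster paused. A real implementation would likely need to persist the pending resume or reconcile it on startup.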

      Possible issues:

      • If dbaas-controller restarts during the restart, the DB cluster may get stuck in the paused state
      • After the DB cluster is paused, it may still report active status for a few seconds (PMM-7397)


      PMM-6824 implemented database cluster restart with kubectl rollout restart. It is good enough for alpha but not for beta: we can lose data this way, and the operators team confirmed it is not safe.

      For beta1, we should implement restart via full cluster pause/resume: https://www.percona.com/doc/kubernetes-operator-for-pxc/pause.html
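As a rough illustration of pause/resume at the custom resource level: the linked operator documentation describes pausing by setting `spec.pause` in the cluster custom resource. The sketch below only builds the JSON merge patch body for that field. The type and function names (`clusterPatch`, `pauseSpec`, `pausePatch`) are hypothetical, and how the patch is sent to the Kubernetes API is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pauseSpec and clusterPatch model only the field this sketch touches.
type pauseSpec struct {
	Pause bool `json:"pause"`
}

type clusterPatch struct {
	Spec pauseSpec `json:"spec"`
}

// pausePatch returns a JSON merge patch that sets spec.pause.
// pause=true stops the cluster; pause=false resumes it.
func pausePatch(pause bool) ([]byte, error) {
	return json.Marshal(clusterPatch{Spec: pauseSpec{Pause: pause}})
}

func main() {
	for _, pause := range []bool{true, false} {
		body, err := pausePatch(pause)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(body))
	}
}
```

Running it prints `{"spec":{"pause":true}}` followed by `{"spec":{"pause":false}}`, the two bodies a restart would apply in sequence.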

      The same functionality will be available for PSMDB.


              andrew.minkin Andrew Minkin
              roma.novikov Roma Novikov