Details
- Type: Bug
- Status: Done
- Priority: Medium
- Resolution: Fixed
- None
- 3
- Yes
- Component: Server Integrations
Description
Impact on the user:
- the user will not get what they tried to achieve by restarting the DB cluster, because most pods are never actually restarted
Steps to reproduce:
- Set up a PXC cluster in PMM
- Wait until it is ready
- Restart it
- Wait until the restart completes
- Check the pod status in Kubernetes
Actual result:
NAME                      READY   STATUS              RESTARTS   AGE
spronin-test-proxysql-0   3/3     Running             6          13h
spronin-test-proxysql-1   3/3     Running             0          13h
spronin-test-proxysql-2   0/3     ContainerCreating   0          1s
spronin-test-pxc-0        1/1     Running             0          13h
spronin-test-pxc-1        1/1     Running             8          13h
spronin-test-pxc-2        1/1     Terminating         0          2m29s
As the output shows, only one PXC pod and one ProxySQL pod were started recently; the others have been up for 13 hours.
Expected result:
All pods have a recent uptime, i.e. every pod in the cluster was restarted.
Workaround:
Restart the pods manually with kubectl (e.g., by deleting them so the StatefulSet recreates them).
Suggested Implementation:
- When dbaas-controller receives a Restart request (see the Go sketch after this list):
- Pause the DB cluster
- Wait until the DB cluster is paused
- Start a new goroutine to resume the DB cluster
- Return the response
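A minimal sketch of this flow in Go, assuming hypothetical pauseCluster, resumeCluster, and isPaused helpers (the real dbaas-controller API will differ; these would wrap the operator's pause mechanism described in the Details section below):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"
)

// Hypothetical helpers: in the real dbaas-controller these would wrap
// the operator's pause/resume mechanism.
func pauseCluster(ctx context.Context, name string) error     { return nil }
func resumeCluster(ctx context.Context, name string) error    { return nil }
func isPaused(ctx context.Context, name string) (bool, error) { return true, nil }

// RestartCluster implements the suggested flow: pause, wait until the
// pause takes effect, resume in the background, return immediately.
func RestartCluster(ctx context.Context, name string) error {
	if err := pauseCluster(ctx, name); err != nil {
		return fmt.Errorf("pause %s: %w", name, err)
	}

	// Poll until the operator reports the cluster as paused.
	for {
		paused, err := isPaused(ctx, name)
		if err != nil {
			return fmt.Errorf("check pause state of %s: %w", name, err)
		}
		if paused {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(5 * time.Second):
		}
	}

	// Resume in a new goroutine so the Restart RPC can return now.
	go func() {
		// The request context is canceled once the response is sent,
		// so the background resume needs its own context.
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
		defer cancel()
		if err := resumeCluster(ctx, name); err != nil {
			log.Printf("resume %s: %v", name, err)
		}
	}()
	return nil
}

func main() {
	if err := RestartCluster(context.Background(), "spronin-test"); err != nil {
		log.Fatal(err)
	}
}
```

Resuming in a goroutine keeps the RPC fast; the fresh context matters because the request context is canceled as soon as the response is returned.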
Possible issues:
- If dbaas-controller itself restarts while a cluster restart is in progress, the DB cluster may get stuck in the paused state (a possible mitigation is sketched below)
- After pausing, the DB cluster may still report an active status for a few seconds (PMM-7397)
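One possible mitigation for the first issue, extending the sketch above (the annotation key is made up for illustration): mark the cluster before pausing, clear the mark after resuming, and on controller startup resume anything still marked.

```go
// Cluster is a minimal illustrative view of a managed DB cluster.
type Cluster struct {
	Name        string
	Annotations map[string]string
}

// resumeStuckClusters is a startup recovery pass: any cluster still
// carrying the (hypothetical) restart-in-progress annotation was paused
// by a restart that never finished, so resume it now.
func resumeStuckClusters(ctx context.Context, clusters []Cluster) {
	for _, c := range clusters {
		if c.Annotations["dbaas.percona.com/restart-in-progress"] == "true" {
			if err := resumeCluster(ctx, c.Name); err != nil {
				log.Printf("recover paused cluster %s: %v", c.Name, err)
			}
		}
	}
}
```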
Details:
PMM-6824 implemented database cluster restart with kubectl rollout restart. That is good enough for alpha, but not for beta: we can lose data this way, and the operators team confirmed it is not safe.
For beta1, we should implement restart via full cluster pause/resume: https://www.percona.com/doc/kubernetes-operator-for-pxc/pause.html
The same functionality will be available for PSMDB.
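For reference, the pause/resume in the linked documentation works by toggling the spec.pause field of the PerconaXtraDBCluster custom resource. A sketch of flipping that field with client-go's dynamic client (the GVR matches the PXC operator's CRD; setPause is an illustrative name, not the dbaas-controller API):

```go
package dbaas

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

// pxcGVR addresses PerconaXtraDBCluster custom resources.
var pxcGVR = schema.GroupVersionResource{
	Group:    "pxc.percona.com",
	Version:  "v1",
	Resource: "perconaxtradbclusters",
}

// setPause toggles spec.pause on the cluster's custom resource;
// pause=true stops the cluster, pause=false resumes it.
func setPause(ctx context.Context, client dynamic.Interface, namespace, name string, pause bool) error {
	patch := []byte(fmt.Sprintf(`{"spec":{"pause":%t}}`, pause))
	_, err := client.Resource(pxcGVR).Namespace(namespace).
		Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```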