Details
- Type: Improvement
- Status: Done
- Priority: Medium
- Resolution: Fixed
- Fix Version: 1.11.0
Description
1. pause
Currently, when we set "pause: true" for a cluster, this is what we see in the operator logs:
{"level":"info","ts":1647445131.6547358,"logger":"controller_psmdb","msg":"initiating replset","replset":"rs0","pod":"my-cluster-name-rs0-1"}
{"level":"info","ts":1647445138.8284142,"logger":"controller_psmdb","msg":"replset was initialized","replset":"rs0","pod":"my-cluster-name-rs0-1"}
{"level":"info","ts":1647445186.9312146,"logger":"controller_psmdb","msg":"update Mongo version to 5.0.6-5 (fetched from db)"}
{"level":"info","ts":1647445851.3521862,"logger":"controller_psmdb","msg":"StatefulSet is changed, starting smart update","name":"my-cluster-name-rs0"}
{"level":"info","ts":1647445851.41177,"logger":"controller_psmdb","msg":"primary pod is my-cluster-name-rs0-1.my-cluster-name-rs0.psmdb-test.svc.cluster.local:27017"}
{"level":"info","ts":1647445851.4118285,"logger":"controller_psmdb","msg":"apply changes to secondary pod","pod":"my-cluster-name-rs0-2"}
{"level":"error","ts":1647445912.7067978,"logger":"controller.psmdb-controller","msg":"Reconciler error","name":"my-cluster-name","namespace":"psmdb-test","error":"reconcile StatefulSet for rs0: failed to run smartUpdate: failed to apply changes: wait pod restart: reach pod wait limit","errorVerbose":"reconcile StatefulSet for rs0: failed to run smartUpdate: failed to apply changes: wait pod restart: reach pod wait limit\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:371\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 "}
So the log only mentions that the StatefulSet has changed; to find out what actually happened, I have to diff the StatefulSet myself and see that "replicas: 3" changed to "replicas: 0".
In addition, a misleading error is logged on every reconcile: "reconcile StatefulSet for rs0: failed to run smartUpdate: failed to apply changes: wait pod restart: reach pod wait limit".
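For context, the pause in question is the `pause` field of the PerconaServerMongoDB custom resource. A minimal fragment (the exact `apiVersion` string depends on the operator version):

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: my-cluster-name
spec:
  pause: true   # scales the cluster's StatefulSets down to 0 replicas
```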
2. unpause
When we unpause the cluster, there are no messages in the operator logs at all. For debugging purposes, it would be good to at least log a line saying that the operator is starting to unpause the cluster.
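To illustrate the kind of message that would help, here is a sketch of pause/unpause transition logging. This is not the operator's actual code; the `ClusterSpec` type and `pauseTransition` helper are hypothetical stand-ins for the comparison the reconciler could make between the previous and current spec:

```go
package main

import "fmt"

// ClusterSpec is a minimal stand-in for the PSMDB custom resource spec;
// only the Pause field matters for this sketch.
type ClusterSpec struct {
	Pause bool
}

// pauseTransition returns an explicit log message when the pause state
// changes between two reconciles, or "" when nothing changed.
func pauseTransition(old, cur ClusterSpec) string {
	switch {
	case !old.Pause && cur.Pause:
		return "pausing cluster: scaling statefulsets to 0 replicas"
	case old.Pause && !cur.Pause:
		return "unpausing cluster: restoring statefulset replicas"
	default:
		return "" // no transition, nothing worth logging
	}
}

func main() {
	fmt.Println(pauseTransition(ClusterSpec{Pause: false}, ClusterSpec{Pause: true}))
	fmt.Println(pauseTransition(ClusterSpec{Pause: true}, ClusterSpec{Pause: false}))
}
```

Emitting one such line at the start of each transition would make both the pause and the unpause visible without having to diff the StatefulSet.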
3. This is how it looks in a sharded cluster:
{"level":"info","ts":1647450208.4828126,"logger":"controller.psmdb-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1647450208.5847135,"logger":"controller.perconaservermongodbrestore-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1647450208.5847921,"logger":"controller.perconaservermongodbbackup-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1647450208.7911856,"logger":"KubeAPIWarningLogger","msg":"batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob"}
{"level":"info","ts":1647450208.897378,"logger":"KubeAPIWarningLogger","msg":"policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget"}
#
# startup
#
{"level":"info","ts":1647450219.105958,"logger":"controller_psmdb","msg":"initiating replset","replset":"cfg","pod":"my-cluster-name-cfg-0"}
{"level":"info","ts":1647450225.1640992,"logger":"controller_psmdb","msg":"replset was initialized","replset":"cfg","pod":"my-cluster-name-cfg-0"}
{"level":"info","ts":1647450235.7624125,"logger":"controller_psmdb","msg":"initiating replset","replset":"rs0","pod":"my-cluster-name-rs0-0"}
{"level":"info","ts":1647450241.4913213,"logger":"controller_psmdb","msg":"replset was initialized","replset":"rs0","pod":"my-cluster-name-rs0-0"}
{"level":"info","ts":1647450251.741539,"logger":"controller_psmdb","msg":"initiating replset","replset":"rs1","pod":"my-cluster-name-rs1-0"}
{"level":"info","ts":1647450257.3567595,"logger":"controller_psmdb","msg":"replset was initialized","replset":"rs1","pod":"my-cluster-name-rs1-0"}
{"level":"info","ts":1647450278.3918908,"logger":"controller_psmdb","msg":"update Mongo version to 5.0.6-5 (fetched from db)"}
{"level":"info","ts":1647450278.4973407,"logger":"controller_psmdb","msg":"adding rs to shard","rs":"rs0"}
{"level":"info","ts":1647450280.8158197,"logger":"controller_psmdb","msg":"added to shard","rs":"rs0"}
{"level":"info","ts":1647450280.9134772,"logger":"controller_psmdb","msg":"adding rs to shard","rs":"rs1"}
{"level":"info","ts":1647450283.2842658,"logger":"controller_psmdb","msg":"added to shard","rs":"rs1"}
#
# pause
#
{"level":"info","ts":1647450479.270014,"logger":"controller_psmdb","msg":"waiting for config RS update"}
{"level":"info","ts":1647450479.365072,"logger":"controller_psmdb","msg":"waiting for config RS update"}
{"level":"info","ts":1647450546.8712163,"logger":"controller_psmdb","msg":"waiting for mongos update"}
#
# unpause
#
So here it is even more confusing: the pause phase logs only "waiting for..." messages, and the unpause phase logs nothing at all.