Details

Type: Bug
Status: Open
Priority: High
Resolution: Fixed
Affects Version/s: 1.13.0
Fix Version/s: None
Component/s: None
Needs QA: Yes
Needs Doc: Yes
Description
A backup gets stuck in the requested state without any meaningful errors.
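One way to see the stuck state (resource and backup names are taken from this reproduction; it is assumed the PerconaServerMongoDBBackup object reports its phase in .status.state):

# List the backup custom resources and print the state of the stuck one.
kubectl get perconaservermongodbbackups.psmdb.percona.com
kubectl get perconaservermongodbbackups.psmdb.percona.com \
  cron-my-cluster-name-20230202130500-862p2 -o jsonpath='{.status.state}'
# the state stays at "requested" instead of eventually reaching "ready"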
STR:
1. Run a default cluster (with updateStrategy: Disabled).
2. Update the expose options for rs and cfg (see the patch sketch below):
enabled: false -> true
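A minimal sketch of step 2, assuming the flags live at spec.replsets[].expose.enabled and spec.sharding.configsvrReplSet.expose.enabled in the PerconaServerMongoDB resource and that the psmdb short name is registered; editing cr.yaml and re-applying it works the same way:

# Flip expose.enabled from false to true for both the data replset and the config replset.
kubectl patch psmdb my-cluster-name --type=json -p='[
  {"op": "replace", "path": "/spec/replsets/0/expose/enabled", "value": true},
  {"op": "replace", "path": "/spec/sharding/configsvrReplSet/expose/enabled", "value": true}
]'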
In the operator logs we see the following error (it does not look like the actual issue, and it was resolved before the backup started):
2023-02-02T13:02:01.599Z ERROR controller_psmdb failed to reconcile cluster {"Request.Namespace": "default", "Request.Name": "my-cluster-name", "replset": "cfg", "error": "fix hosts: write mongo config: replSetReconfig: connection(my-cluster-name-cfg-0.my-cluster-name-cfg.default.svc.cluster.local:27017[-109709]) incomplete read of message header: read tcp 10.4.1.9:54072->10.4.0.13:27017: use of closed network connection", "errorVerbose": "connection(my-cluster-name-cfg-0.my-cluster-name-cfg.default.svc.cluster.local:27017[-109709]) incomplete read of message header: read tcp 10.4.1.9:54072->10.4.0.13:27017: use of closed network connection\nreplSetReconfig\ngithub.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo.WriteConfig\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo/mongo.go:248\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:308\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:491\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\nfix hosts: write mongo config\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:310\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:491\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
2023-02-02T13:02:03.712Z INFO controller_psmdb Cluster state changed {"previous": "initializing", "current": "ready"}
2023-02-02T13:05:26.593Z INFO controller_perconaservermongodbbackup Waiting for backup metadata {"PBM name": "2023-02-02T13:05:21Z", "backup": "cron-my-cluster-name-20230202130500-862p2"}
If scheduled backups are enabled, we see this error in the backup-agent container:
k logs my-cluster-name-rs0-2 -c backup-agent
2023-02-02T13:13:52.000+0000 E [agentCheckup] get replSetGetStatus: not found
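For context, "scheduled backups" here means the cron tasks under spec.backup.tasks in the custom resource; a quick way to check whether they are enabled (field paths assumed from the CR layout):

# Show the backup enable flag and the configured cron tasks.
kubectl get psmdb my-cluster-name -o jsonpath='{.spec.backup.enabled}{"\n"}{.spec.backup.tasks}{"\n"}'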
But at the same time we can easily get the replica set status from the mongo shell:
rs0:SECONDARY> db.adminCommand( { replSetGetStatus: 1 } )
{
  "set" : "rs0",
  "date" : ISODate("2023-02-02T13:16:30.560Z"),
  "myState" : 2,
  "term" : NumberLong(1),
  "syncSourceHost" : "10.8.14.28:27017",
  "syncSourceId" : 0,
  "heartbeatIntervalMillis" : NumberLong(2000),
  "majorityVoteCount" : 2,
  "writeMajorityCount" : 2,
  "votingMembersCount" : 3,
  "writableVotingMembersCount" : 3,
  "optimes" : { "lastCommittedOpTime" : { "ts" : Timestamp(1675343789, 2), "t" : NumberLong(1) }, "lastCommittedWallTime" : ISODate("2023-02-02T13:16:29.850Z"), "readConcernMajorityOpTime" : { "ts" : Timestamp(1675343789, 2), "t" : NumberLong(1) }, "appliedOpTime" : { "ts" : Timestamp(1675343789, 2), "t" : NumberLong(1) }, "durableOpTime" : { "ts" : Timestamp(1675343789, 2), "t" : NumberLong(1) }, "lastAppliedWallTime" : ISODate("2023-02-02T13:16:29.850Z"), "lastDurableWallTime" : ISODate("2023-02-02T13:16:29.850Z") },
  "lastStableRecoveryTimestamp" : Timestamp(1675343737, 1),
  "members" : [
    { "_id" : 0, "name" : "10.8.14.28:27017", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 871, "optime" : { "ts" : Timestamp(1675343789, 2), "t" : NumberLong(1) }, "optimeDurable" : { "ts" : Timestamp(1675343789, 2), "t" : NumberLong(1) }, "optimeDate" : ISODate("2023-02-02T13:16:29Z"), "optimeDurableDate" : ISODate("2023-02-02T13:16:29Z"), "lastAppliedWallTime" : ISODate("2023-02-02T13:16:29.850Z"), "lastDurableWallTime" : ISODate("2023-02-02T13:16:29.850Z"), "lastHeartbeat" : ISODate("2023-02-02T13:16:30.114Z"), "lastHeartbeatRecv" : ISODate("2023-02-02T13:16:29.383Z"), "pingMs" : NumberLong(0), "lastHeartbeatMessage" : "", "syncSourceHost" : "", "syncSourceId" : -1, "infoMessage" : "", "electionTime" : Timestamp(1675333495, 2), "electionDate" : ISODate("2023-02-02T10:24:55Z"), "configVersion" : 8, "configTerm" : 1 },
    { "_id" : 1, "name" : "10.8.8.136:27017", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 871, "optime" : { "ts" : Timestamp(1675343789, 2), "t" : NumberLong(1) }, "optimeDurable" : { "ts" : Timestamp(1675343789, 2), "t" : NumberLong(1) }, "optimeDate" : ISODate("2023-02-02T13:16:29Z"), "optimeDurableDate" : ISODate("2023-02-02T13:16:29Z"), "lastAppliedWallTime" : ISODate("2023-02-02T13:16:29.850Z"), "lastDurableWallTime" : ISODate("2023-02-02T13:16:29.850Z"), "lastHeartbeat" : ISODate("2023-02-02T13:16:29.888Z"), "lastHeartbeatRecv" : ISODate("2023-02-02T13:16:29.558Z"), "pingMs" : NumberLong(8), "lastHeartbeatMessage" : "", "syncSourceHost" : "10.8.11.221:27017", "syncSourceId" : 2, "infoMessage" : "", "configVersion" : 8, "configTerm" : 1 },
    { "_id" : 2, "name" : "10.8.11.221:27017", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 10339, "optime" : { "ts" : Timestamp(1675343789, 2), "t" : NumberLong(1) }, "optimeDate" : ISODate("2023-02-02T13:16:29Z"), "lastAppliedWallTime" : ISODate("2023-02-02T13:16:29.850Z"), "lastDurableWallTime" : ISODate("2023-02-02T13:16:29.850Z"), "syncSourceHost" : "10.8.14.28:27017", "syncSourceId" : 0, "infoMessage" : "", "configVersion" : 8, "configTerm" : 1, "self" : true, "lastHeartbeatMessage" : "" }
  ],
  "ok" : 1,
  "$gleStats" : { "lastOpTime" : Timestamp(0, 0), "electionId" : ObjectId("000000000000000000000000") },
  "lastCommittedOpTime" : Timestamp(1675343789, 2),
  "$configServerState" : { "opTime" : { "ts" : Timestamp(1675343788, 4), "t" : NumberLong(-1) } },
  "$clusterTime" : { "clusterTime" : Timestamp(1675343789, 7), "signature" : { "hash" : BinData(0,"gA3VMnxb8CI6tr2U67Sr9QVBqIQ="), "keyId" : NumberLong("7195502652522758164") } },
  "operationTime" : Timestamp(1675343789, 2)
}
If scheduled backups are disabled, the error about
get replSetGetStatus: not found
disappears.
k logs my-cluster-name-rs0-2 -c backup-agent
**********************************************
2023-02-02T13:47:00.000+0000 I starting PITR routine
2023-02-02T13:47:00.000+0000 I node: rs0/10.8.8.136:27017
2023-02-02T13:47:00.000+0000 I listening for the commands
2023-02-02T13:47:05.000+0000 W [agentCheckup] get current storage status: query mongo: mongo: no documents in result
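The last warning suggests the agent cannot read the PBM storage configuration yet; a hedged way to inspect what PBM sees (assuming the pbm CLI is shipped in the backup-agent container and picks up the connection string from the PBM_MONGODB_URI environment variable):

# Check PBM's view of the agents, storage and configuration from inside the backup-agent container.
kubectl exec my-cluster-name-rs0-2 -c backup-agent -- pbm status
kubectl exec my-cluster-name-rs0-2 -c backup-agent -- pbm config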