Percona Operator for MongoDB / K8SPSMDB-854

Backup gets stuck after the cluster was exposed.

Details

    • Type: Bug
    • Status: Done
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 1.13.0
    • Fix Version/s: 1.15.0
    • Components: None
    • Labels: None
    • Needs QA: Yes
    • Needs Doc: Yes

    Description

      The backup gets stuck in the requested state without any meaningful errors.
      Steps to reproduce (STR):
      1. Run a default cluster (with updateStrategy: Disabled)
      2. Update the expose options for rs and cfg (a YAML sketch follows the snippet below):

      enabled: false -> true
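
      A minimal sketch of that change in the PerconaServerMongoDB custom resource (cr.yaml), assuming a replset named rs0 and ClusterIP services; the exposeType values are not taken from the original report:

      spec:
        replsets:
          - name: rs0
            expose:
              enabled: true            # was: false
              exposeType: ClusterIP    # assumed service type
        sharding:
          configsvrReplSet:
            expose:
              enabled: true            # was: false
              exposeType: ClusterIP    # assumed service type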
      

      In the operator logs we see the following error (it does not look like the issue, though, and it was fixed before the backup started):

      2023-02-02T13:02:01.599Z	ERROR	controller_psmdb	failed to reconcile cluster	{"Request.Namespace": "default", "Request.Name": "my-cluster-name", "replset": "cfg", "error": "fix hosts: write mongo config: replSetReconfig: connection(my-cluster-name-cfg-0.my-cluster-name-cfg.default.svc.cluster.local:27017[-109709]) incomplete read of message header: read tcp 10.4.1.9:54072->10.4.0.13:27017: use of closed network connection", "errorVerbose": "connection(my-cluster-name-cfg-0.my-cluster-name-cfg.default.svc.cluster.local:27017[-109709]) incomplete read of message header: read tcp 10.4.1.9:54072->10.4.0.13:27017: use of closed network connection\nreplSetReconfig\ngithub.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo.WriteConfig\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo/mongo.go:248\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:308\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:491\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\nfix hosts: write mongo config\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:310\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:491\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
      	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
      	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
      	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
      2023-02-02T13:02:03.712Z	INFO	controller_psmdb	Cluster state changed	{"previous": "initializing", "current": "ready"}
      2023-02-02T13:05:26.593Z	INFO	controller_perconaservermongodbbackup	Waiting for backup metadata	{"PBM name": "2023-02-02T13:05:21Z", "backup": "cron-my-cluster-name-20230202130500-862p2"}
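
      The cron-* backup above was created by a scheduled backup task. A minimal sketch of such a configuration in the custom resource, where the task name, schedule and storage name are assumptions:

      spec:
        backup:
          enabled: true
          tasks:
            - name: cron-backup            # assumed task name
              enabled: true                # set to false to disable the scheduled backups
              schedule: "*/5 * * * *"      # assumed schedule
              storageName: my-s3-storage   # assumed storage name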
      

      If scheduled backups are enabled, we see this error in the backup-agent container:

      k logs my-cluster-name-rs0-2 -c backup-agent
      2023-02-02T13:13:52.000+0000 E [agentCheckup] get replSetGetStatus: not found
      

      But at the same time we can easily get the replica set status from mongo:

      rs0:SECONDARY> db.adminCommand( { replSetGetStatus: 1 } )
      {
      	"set" : "rs0",
      	"date" : ISODate("2023-02-02T13:16:30.560Z"),
      	"myState" : 2,
      	"term" : NumberLong(1),
      	"syncSourceHost" : "10.8.14.28:27017",
      	"syncSourceId" : 0,
      	"heartbeatIntervalMillis" : NumberLong(2000),
      	"majorityVoteCount" : 2,
      	"writeMajorityCount" : 2,
      	"votingMembersCount" : 3,
      	"writableVotingMembersCount" : 3,
      	"optimes" : {
      		"lastCommittedOpTime" : {
      			"ts" : Timestamp(1675343789, 2),
      			"t" : NumberLong(1)
      		},
      		"lastCommittedWallTime" : ISODate("2023-02-02T13:16:29.850Z"),
      		"readConcernMajorityOpTime" : {
      			"ts" : Timestamp(1675343789, 2),
      			"t" : NumberLong(1)
      		},
      		"appliedOpTime" : {
      			"ts" : Timestamp(1675343789, 2),
      			"t" : NumberLong(1)
      		},
      		"durableOpTime" : {
      			"ts" : Timestamp(1675343789, 2),
      			"t" : NumberLong(1)
      		},
      		"lastAppliedWallTime" : ISODate("2023-02-02T13:16:29.850Z"),
      		"lastDurableWallTime" : ISODate("2023-02-02T13:16:29.850Z")
      	},
      	"lastStableRecoveryTimestamp" : Timestamp(1675343737, 1),
      	"members" : [
      		{
      			"_id" : 0,
      			"name" : "10.8.14.28:27017",
      			"health" : 1,
      			"state" : 1,
      			"stateStr" : "PRIMARY",
      			"uptime" : 871,
      			"optime" : {
      				"ts" : Timestamp(1675343789, 2),
      				"t" : NumberLong(1)
      			},
      			"optimeDurable" : {
      				"ts" : Timestamp(1675343789, 2),
      				"t" : NumberLong(1)
      			},
      			"optimeDate" : ISODate("2023-02-02T13:16:29Z"),
      			"optimeDurableDate" : ISODate("2023-02-02T13:16:29Z"),
      			"lastAppliedWallTime" : ISODate("2023-02-02T13:16:29.850Z"),
      			"lastDurableWallTime" : ISODate("2023-02-02T13:16:29.850Z"),
      			"lastHeartbeat" : ISODate("2023-02-02T13:16:30.114Z"),
      			"lastHeartbeatRecv" : ISODate("2023-02-02T13:16:29.383Z"),
      			"pingMs" : NumberLong(0),
      			"lastHeartbeatMessage" : "",
      			"syncSourceHost" : "",
      			"syncSourceId" : -1,
      			"infoMessage" : "",
      			"electionTime" : Timestamp(1675333495, 2),
      			"electionDate" : ISODate("2023-02-02T10:24:55Z"),
      			"configVersion" : 8,
      			"configTerm" : 1
      		},
      		{
      			"_id" : 1,
      			"name" : "10.8.8.136:27017",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 871,
      			"optime" : {
      				"ts" : Timestamp(1675343789, 2),
      				"t" : NumberLong(1)
      			},
      			"optimeDurable" : {
      				"ts" : Timestamp(1675343789, 2),
      				"t" : NumberLong(1)
      			},
      			"optimeDate" : ISODate("2023-02-02T13:16:29Z"),
      			"optimeDurableDate" : ISODate("2023-02-02T13:16:29Z"),
      			"lastAppliedWallTime" : ISODate("2023-02-02T13:16:29.850Z"),
      			"lastDurableWallTime" : ISODate("2023-02-02T13:16:29.850Z"),
      			"lastHeartbeat" : ISODate("2023-02-02T13:16:29.888Z"),
      			"lastHeartbeatRecv" : ISODate("2023-02-02T13:16:29.558Z"),
      			"pingMs" : NumberLong(8),
      			"lastHeartbeatMessage" : "",
      			"syncSourceHost" : "10.8.11.221:27017",
      			"syncSourceId" : 2,
      			"infoMessage" : "",
      			"configVersion" : 8,
      			"configTerm" : 1
      		},
      		{
      			"_id" : 2,
      			"name" : "10.8.11.221:27017",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 10339,
      			"optime" : {
      				"ts" : Timestamp(1675343789, 2),
      				"t" : NumberLong(1)
      			},
      			"optimeDate" : ISODate("2023-02-02T13:16:29Z"),
      			"lastAppliedWallTime" : ISODate("2023-02-02T13:16:29.850Z"),
      			"lastDurableWallTime" : ISODate("2023-02-02T13:16:29.850Z"),
      			"syncSourceHost" : "10.8.14.28:27017",
      			"syncSourceId" : 0,
      			"infoMessage" : "",
      			"configVersion" : 8,
      			"configTerm" : 1,
      			"self" : true,
      			"lastHeartbeatMessage" : ""
      		}
      	],
      	"ok" : 1,
      	"$gleStats" : {
      		"lastOpTime" : Timestamp(0, 0),
      		"electionId" : ObjectId("000000000000000000000000")
      	},
      	"lastCommittedOpTime" : Timestamp(1675343789, 2),
      	"$configServerState" : {
      		"opTime" : {
      			"ts" : Timestamp(1675343788, 4),
      			"t" : NumberLong(-1)
      		}
      	},
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1675343789, 7),
      		"signature" : {
      			"hash" : BinData(0,"gA3VMnxb8CI6tr2U67Sr9QVBqIQ="),
      			"keyId" : NumberLong("7195502652522758164")
      		}
      	},
      	"operationTime" : Timestamp(1675343789, 2)
      }
      

      If scheduled backups are disabled, the error

       get replSetGetStatus: not found 

      disappears.

      k logs my-cluster-name-rs0-2 -c backup-agent
      **********************************************
      
      2023-02-02T13:47:00.000+0000 I starting PITR routine
      2023-02-02T13:47:00.000+0000 I node: rs0/10.8.8.136:27017
      2023-02-02T13:47:00.000+0000 I listening for the commands
      2023-02-02T13:47:05.000+0000 W [agentCheckup] get current storage status: query mongo: mongo: no documents in result
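
      Disabling the scheduled backups corresponds to turning the task off in the custom resource, e.g. (a sketch reusing the assumed task from the earlier example):

      spec:
        backup:
          enabled: true
          tasks:
            - name: cron-backup   # assumed task name
              enabled: false      # scheduled backups disabled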
      


          People

            Assignee: Unassigned
            Reporter: Natalia Marukovich
            Votes: 0
            Watchers: 3
