Uploaded image for project: 'Cloud Dev'
  1. Cloud Dev
  2. CLOUD-96

[PSMDB] Watchdog crash after re-startup of CR w/old PVC data

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Pending Release
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: PSMDB 0.2.0
    • Component/s: recreate
    • Labels:
      None

      Description

      Occasionally the operator is crashing after a re-create of a CR that has old PVC data (that causes replset init to be skipped):

      time="2019-01-17T18:47:20Z" level=info msg="changed state to initialised for replset rs0"
      time="2019-01-17T18:47:21Z" level=info msg="Starting watchdog" go=go1.11.4 service=my-cluster-name source=k8s version=0.4.1
      time="2019-01-17T18:47:21Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:47:21Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:47:21Z" level=info msg="Starting replset watcher" replset=rs0
      time="2019-01-17T18:47:21Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:47:21Z" level=info msg="Adding new mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:26Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:47:26Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:47:26Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:47:26Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:26Z" level=error msg="Error connecting to mongodb replset: no reachable servers" addrs="[my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017]" replset=rs0 ssl=false
      time="2019-01-17T18:47:26Z" level=info msg="Watching replset" interval=5s replset=rs0
      time="2019-01-17T18:47:31Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:47:31Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:47:31Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:47:31Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:36Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:47:36Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:47:36Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:47:36Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:36Z" level=error msg="Error connecting to mongodb replset: no reachable servers" addrs="[my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017]" replset=rs0 ssl=false
      time="2019-01-17T18:47:41Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:47:41Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:47:41Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:47:41Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:41Z" level=error msg="Error connecting to mongodb replset: no reachable servers" addrs="[my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017]" replset=rs0 ssl=false
      time="2019-01-17T18:47:46Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:47:46Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:47:46Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:47:46Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:46Z" level=error msg="Error connecting to mongodb replset: no reachable servers" addrs="[my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017]" replset=rs0 ssl=false
      time="2019-01-17T18:47:51Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:47:51Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:47:51Z" level=info msg="Adding new mongod task" host="my-cluster-name-rs0-1.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-1 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:51Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:47:51Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:51Z" level=error msg="Error connecting to mongodb replset: no reachable servers" addrs="[my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017 my-cluster-name-rs0-1.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017]" replset=rs0 ssl=false
      time="2019-01-17T18:47:56Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:47:56Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:47:56Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-1.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-1 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:56Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:47:56Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:47:56Z" level=error msg="Error connecting to mongodb replset: no reachable servers" addrs="[my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017 my-cluster-name-rs0-1.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017]" replset=rs0 ssl=false
      time="2019-01-17T18:48:01Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:48:01Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-2
      time="2019-01-17T18:48:01Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:48:01Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:48:01Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:48:01Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-1.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-1 replset=rs0 state=RUNNING
      time="2019-01-17T18:48:01Z" level=error msg="Error connecting to mongodb replset: no reachable servers" addrs="[my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017 my-cluster-name-rs0-1.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017]" replset=rs0 ssl=false
      time="2019-01-17T18:48:06Z" level=info msg="Getting pods from source" source=k8s url="tcp://172.30.0.1:443"
      time="2019-01-17T18:48:06Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-2
      time="2019-01-17T18:48:06Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-0
      time="2019-01-17T18:48:06Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-0.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-0 replset=rs0 state=RUNNING
      time="2019-01-17T18:48:06Z" level=info msg="Getting tasks for pod" pod=my-cluster-name-rs0-1
      time="2019-01-17T18:48:06Z" level=info msg="Updating running mongod task" host="my-cluster-name-rs0-1.my-cluster-name-rs0.tim-psmdb.svc.cluster.local:27017" name=my-cluster-name-rs0-1 replset=rs0 state=RUNNING
      time="2019-01-17T18:48:06Z" level=info msg="Updating replset config and status" replset=rs0
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x10ae5ef]
      
      
      goroutine 674 [running]:
      github.com/Percona-Lab/percona-server-mongodb-operator/vendor/github.com/percona/mongodb-orchestration-tools/watchdog/watcher.(*Watcher).getScaledDownMembers(0xc0002ddea0, 0x20ec8b8, 0x0, 0x0)
      	/go/src/github.com/Percona-Lab/percona-server-mongodb-operator/vendor/github.com/percona/mongodb-orchestration-tools/watchdog/watcher/watcher.go:192 +0x16f
      github.com/Percona-Lab/percona-server-mongodb-operator/vendor/github.com/percona/mongodb-orchestration-tools/watchdog/watcher.(*Watcher).Run(0xc0002ddea0)
      	/go/src/github.com/Percona-Lab/percona-server-mongodb-operator/vendor/github.com/percona/mongodb-orchestration-tools/watchdog/watcher/watcher.go:358 +0x4f8
      created by github.com/Percona-Lab/percona-server-mongodb-operator/vendor/github.com/percona/mongodb-orchestration-tools/watchdog/watcher.(*WatcherManager).Watch
      	/go/src/github.com/Percona-Lab/percona-server-mongodb-operator/vendor/github.com/percona/mongodb-orchestration-tools/watchdog/watcher/manager.go:74 +0x401
      [tim@centos7 percona-server-mongodb-operator]$  

        Smart Checklist

          Attachments

            Activity

              People

              • Assignee:
                andrew.pogrebnoi Andrii Pogrebnoi
                Reporter:
                tim.vaillancourt Tim Vaillancourt
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 1 day, 2 hours, 30 minutes
                  1d 2h 30m