Uploaded image for project: 'Percona Backup for MongoDB'
  1. Percona Backup for MongoDB
  2. PBM-700

Improve backup/pitr tasks synchronization and align oplogs creation

Details

    • Improvement
    • Status: Done
    • Medium
    • Resolution: Fixed
    • None
    • 1.6.0
    • None
    • None

    Description

      Hi,

      There is no hard STR, however from time to time PITR doesn't restart automatically after backup.

      STR:
      1) perform backup

      pbm logs -e "backup/2021-08-04T11:35:56Z" -t 0 -n rs1 -s D
      2021-08-04T11:36:26Z D [rs1/localhost:27020] [backup/2021-08-04T11:35:56Z] releasing lock
      2021-08-04T11:36:26Z I [rs1/localhost:27020] [backup/2021-08-04T11:35:56Z] backup finished
      2021-08-04T11:36:26Z I [rs1/localhost:27020] [backup/2021-08-04T11:35:56Z] dropping tmp collections
      2021-08-04T11:36:21Z I [rs1/localhost:27020] [backup/2021-08-04T11:35:56Z] s3.uploadPartSize is set to 10485760 (~10Mb)
      2021-08-04T11:36:21Z D [rs1/localhost:27020] [backup/2021-08-04T11:35:56Z] set oplog span to {1628076973 1} / {1628076978 15}
      2021-08-04T11:36:18Z I [rs1/localhost:27020] [backup/2021-08-04T11:35:56Z] mongodump finished, waiting for the oplog
      2021-08-04T11:36:17Z I [rs1/localhost:27020] [backup/2021-08-04T11:35:56Z] s3.uploadPartSize is set to 10485760 (~10Mb)
      2021-08-04T11:36:16Z D [rs1/localhost:27020] [backup/2021-08-04T11:35:56Z] wait for tmp users {1628076976 7}
      2021-08-04T11:36:13Z D [rs1/localhost:27019] [backup/2021-08-04T11:35:56Z] skip after nomination, probably started by another node
      2021-08-04T11:36:13Z D [rs1/localhost:27018] [backup/2021-08-04T11:35:56Z] skip after nomination, probably started by another node
      2021-08-04T11:36:13Z I [rs1/localhost:27020] [backup/2021-08-04T11:35:56Z] backup started
      

      2) start PITR

      2021-08-04T11:36:56Z I [rs1/localhost:27020] [pitr] streaming started from 2021-08-04 11:36:18 +0000 UTC / 1628076978
      2021-08-04T11:36:56Z D [rs1/localhost:27020] [pitr] start_ok
      2021-08-04T11:36:56Z D [rs1/localhost:27020] [pitr] lastTS set to {1628076978 15} 2021-08-04T11:36:18
      2021-08-04T11:36:56Z I [rs1/localhost:27020] [pitr] restore after 2021-08-04T07:54:49Z the chunk &{rs1 pbmPitr/rs1/20210804/20210804075637-19.20210804080421-1.oplog.snappy s2 {%!s(uint32=1628063797) %!s(uint32=19)} {%!s(uint32=1628064261) %!s(uint32=1)} %!s(int64=0)}, skip
      

      3) perform backup

      pbm logs -e "backup/2021-08-04T11:37:14Z" -t 0 -n rs1 -s D
      2021-08-04T11:37:42Z D [rs1/localhost:27020] [backup/2021-08-04T11:37:14Z] releasing lock
      2021-08-04T11:37:42Z I [rs1/localhost:27020] [backup/2021-08-04T11:37:14Z] backup finished
      2021-08-04T11:37:42Z I [rs1/localhost:27020] [backup/2021-08-04T11:37:14Z] dropping tmp collections
      2021-08-04T11:37:39Z I [rs1/localhost:27020] [backup/2021-08-04T11:37:14Z] s3.uploadPartSize is set to 10485760 (~10Mb)
      2021-08-04T11:37:39Z D [rs1/localhost:27020] [backup/2021-08-04T11:37:14Z] set oplog span to {1628077053 1} / {1628077056 7}
      2021-08-04T11:37:35Z I [rs1/localhost:27020] [backup/2021-08-04T11:37:14Z] mongodump finished, waiting for the oplog
      2021-08-04T11:37:35Z I [rs1/localhost:27020] [backup/2021-08-04T11:37:14Z] s3.uploadPartSize is set to 10485760 (~10Mb)
      2021-08-04T11:37:35Z D [rs1/localhost:27020] [backup/2021-08-04T11:37:14Z] wait for tmp users {1628077055 4}
      2021-08-04T11:37:32Z D [rs1/localhost:27019] [backup/2021-08-04T11:37:14Z] skip after nomination, probably started by another node
      2021-08-04T11:37:31Z D [rs1/localhost:27018] [backup/2021-08-04T11:37:14Z] skip after nomination, probably started by another node
      2021-08-04T11:37:31Z I [rs1/localhost:27020] [backup/2021-08-04T11:37:14Z] backup started
      

      PITR temporary stopped for the backup, but never restarts again:

      2021-08-04T11:37:35Z I [rs1/localhost:27020] [pitr] pausing/stopping with last_ts 2021-08-04 11:37:33 +0000 UTC
      2021-08-04T11:37:35Z I [rs1/localhost:27020] [pitr] created chunk 2021-08-04T11:36:18 - 2021-08-04T11:37:33
      2021-08-04T11:37:35Z I [rs1/localhost:27020] [pitr] s3.uploadPartSize is set to 10485760 (~10Mb)
      2021-08-04T11:37:31Z I [rs1/localhost:27020] [pitr] got wake_up signal
      

      Current list of backups and PITR chunks:

      pbm list --full
      Backup snapshots:
        2021-08-04T07:54:49Z [complete: 2021-08-04T07:55:11]
        2021-08-04T07:56:15Z [complete: 2021-08-04T07:56:37]
        2021-08-04T11:35:56Z [complete: 2021-08-04T11:36:18]
        2021-08-04T11:37:14Z [complete: 2021-08-04T11:37:36]
        2021-08-04T13:54:25Z [complete: 2021-08-04T13:54:47]
        2021-08-04T13:56:03Z [complete: 2021-08-04T13:56:25]
      
      PITR <on>:
        2021-08-04T07:55:11 - 2021-08-04T08:04:21
        2021-08-04T11:36:18 - 2021-08-04T11:37:33
        2021-08-04T13:54:47 - 2021-08-04T14:06:32
      
        config: [{ 2021-08-04T07:55:11 - 2021-08-04T08:04:26} { 2021-08-04T11:36:18 - 2021-08-04T11:38:00} { 2021-08-04T13:54:47 - 2021-08-04T14:06:32}]
        rs1: [{ 2021-08-04T07:55:11 - 2021-08-04T08:04:21} { 2021-08-04T11:36:18 - 2021-08-04T11:37:33} { 2021-08-04T13:54:47 - 2021-08-04T14:06:33}]
        rs2: [{ 2021-08-04T07:55:11 - 2021-08-04T08:04:25} { 2021-08-04T11:36:18 - 2021-08-04T11:37:57} { 2021-08-04T13:54:47 - 2021-08-04T14:06:32}]
      

      On rs2 PITR was successfully restarted:

      2021-08-04T11:37:42Z I [rs2/localhost:28018] [pitr] streaming started from 2021-08-04 11:37:36 +0000 UTC / 1628077056
      2021-08-04T11:37:42Z D [rs2/localhost:28018] [pitr] start_ok
      2021-08-04T11:37:42Z D [rs2/localhost:28018] [pitr] lastTS set to {1628077056 7} 2021-08-04T11:37:36
      2021-08-04T11:37:42Z I [rs2/localhost:28018] [pitr] copied chunk 2021-08-04T11:37:33 - 2021-08-04T11:37:36
      2021-08-04T11:37:35Z I [rs2/localhost:28018] [pitr] pausing/stopping with last_ts 2021-08-04 11:37:34 +0000 UTC
      2021-08-04T11:37:35Z I [rs2/localhost:28018] [pitr] created chunk 2021-08-04T11:36:18 - 2021-08-04T11:37:34
      2021-08-04T11:37:35Z I [rs2/localhost:28018] [pitr] s3.uploadPartSize is set to 10485760 (~10Mb)
      2021-08-04T11:37:31Z I [rs2/localhost:28018] [pitr] got wake_up signal
      

      Attachments

        Activity

          People

            andrew.pogrebnoi Andrew Pogrebnoi
            sandra.romanchenko Sandra Romanchenko
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Smart Checklist