PBM restore takes a long time

Description

Team,

I’m doing a selective restore: restoring a single collection (6GB) from a FULL backup (2.5TB).
It has been running for 11+ hours (and is still progressing).

Is this expected behaviour? Please help me fix this.

command: pbm restore 2023-05-09T14:56:30Z --ns=ctr.trademarks.gecn 2>&1
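While a restore like this is running, a few commands can help inspect its progress (a sketch; the restore name below is the one from this ticket's logs, and `pbm describe-restore` availability depends on the PBM version installed):

```shell
# Overall cluster state plus the currently running operation
pbm status

# Debug-level agent logs scoped to this restore only
pbm logs -s D -e restore/2023-05-16T04:30:23.527718212Z

# Detailed state of the restore itself (PBM 2.x)
pbm describe-restore 2023-05-16T04:30:23.527718212Z
```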

[mongod@ip-10-80-11-253 ~]$ pbm status

Cluster:

shard1ReplSet:
  • shard1ReplSet/10.80.11.253:27038 [P]: pbm-agent v2.1.0 OK
configReplSet:
  • configReplSet/10.80.11.253:27039 [P]: pbm-agent v2.1.0 OK
shard3ReplSet:
  • shard3ReplSet/10.80.10.159:27038 [P]: pbm-agent v2.1.0 OK
shard2ReplSet:
  • shard2ReplSet/10.80.10.186:27038 [P]: pbm-agent v2.1.0 OK

PITR incremental backup:

Status [OFF]

Currently running:

Snapshot restore “2023-05-09T14:56:30Z”, started at 2023-05-16T04:30:23Z. Status: oplog restore. [op id: 646306df173cdb6959b600ac]

Backups:

S3 us-east-1 s3://xxxxxxxxxx/pbm/backup/
Snapshots:
2023-05-12T07:58:25Z 92.68KB <logical, selective> [restore_to_time: 2023-05-12T07:58:33Z]
2023-05-12T07:49:04Z 6.53GB <logical, selective> [restore_to_time: 2023-05-12T07:54:17Z]
2023-05-12T06:56:45Z 75.56KB <logical, selective> [restore_to_time: 2023-05-12T06:56:52Z]
2023-05-11T14:17:04Z 752.69GB <logical, selective> [restore_to_time: 2023-05-11T17:33:36Z]
2023-05-09T14:56:30Z 2.68TB [restore_to_time: 2023-05-10T01:41:15Z]


[mongod@ip-10-80-11-253 ~]$ pbm logs
2023-05-16T04:30:23Z I [shard1ReplSet/10.80.11.253:27038] got epoch {1684211244 3}
2023-05-16T04:30:23Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] backup: 2023-05-09T14:56:30Z
2023-05-16T04:30:23Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] restore started
2023-05-16T04:30:24Z I [configReplSet/10.80.11.253:27039] [restore/2023-05-16T04:30:23.527718212Z] moving to state running
2023-05-16T04:30:24Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state running
2023-05-16T04:30:24Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state running
2023-05-16T04:30:27Z I [configReplSet/10.80.11.253:27039] [restore/2023-05-16T04:30:23.527718212Z] moving to state dumpDone
2023-05-16T04:32:47Z I [shard3ReplSet/10.80.10.159:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state dumpDone
2023-05-16T04:32:52Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state dumpDone
2023-05-16T04:32:55Z I [shard3ReplSet/10.80.10.159:27038] [restore/2023-05-16T04:30:23.527718212Z] starting oplog replay
2023-05-16T04:32:56Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] moving to state dumpDone
2023-05-16T04:32:57Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] starting oplog replay
2023-05-16T04:32:57Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] starting oplog replay
2023-05-16T04:32:58Z I [configReplSet/10.80.11.253:27039] [restore/2023-05-16T04:30:23.527718212Z] starting oplog replay
2023-05-16T04:33:53Z I [shard3ReplSet/10.80.10.159:27038] [restore/2023-05-16T04:30:23.527718212Z] oplog replay finished on {1683682875 266}
2023-05-16T04:33:53Z I [shard3ReplSet/10.80.10.159:27038] [restore/2023-05-16T04:30:23.527718212Z] restore finished successfully
2023-05-16T04:35:58Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] oplog replay finished on {1683682875 261}
2023-05-16T04:35:58Z I [shard2ReplSet/10.80.10.186:27038] [restore/2023-05-16T04:30:23.527718212Z] restore finished successfully
2023-05-16T04:40:36Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] oplog replay finished on {1683682875 267}
2023-05-16T04:40:36Z I [shard1ReplSet/10.80.11.253:27038] [restore/2023-05-16T04:30:23.527718212Z] restore finished successfully

Environment

None

Attachments

8

Activity

Ranjith August 8, 2023 at 7:58 AM

Hi @Kai Wagner,

As you said, I was unable to reproduce this issue during our phone call. That was because I mistakenly used the wrong backup (a backup of the selective namespace instead of the full backup). I have already sent an email to you all. Please let me know if you can schedule another call so that I can reproduce the issue and show you. Thanks.

Please let me know if I should open a new Jira request so that someone can take a look at this.

Kai Wagner August 8, 2023 at 7:39 AM

Hello,

I'll close this ticket for now, as we cannot reproduce your issue, nor do we have reports of anything along those lines from any other community member or customer.

You were also unable to reproduce the issue during our phone call, so we can't spend more time on it for now.

If you still need further assistance, please reach out via https://www.percona.com/about/contact to ask about a support contract.

Ranjith August 8, 2023 at 7:24 AM

Hi @Dmytro,

Attached are the mongod logs between 09:38:00 and 09:42:00 (2023-08-04).


Appreciate your help on this.

Dmytro Zghoba August 8, 2023 at 5:51 AM

Unfortunately, the MongoDB logs you sent are not usable: they do not cover the time when the backup's oplog was being applied.
You sent me the last 100 lines on 2023-08-07, but the restore started on 2023-08-04, and the oplog apply has been stuck at "2023-08-04 09:40:04".

I expected mongod logs between 09:38:00 and 09:42:00 on 2023-08-04. With those I could investigate whether there is an error or other information during the oplog apply of that restore.
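For pulling just that window out of a structured mongod log, here is a small sketch (it assumes the JSON-per-line log format mongod emits since 4.4; the file path and window bounds are example values, not from this ticket):

```python
#!/usr/bin/env python3
"""Sketch: extract mongod structured-log lines inside a time window."""
import json
import sys

def lines_in_window(lines, start, end):
    """Yield raw log lines whose 't.$date' timestamp falls in [start, end).
    ISO-8601 timestamps compare correctly as plain strings."""
    for line in lines:
        try:
            ts = json.loads(line)["t"]["$date"]
        except (ValueError, KeyError):
            continue  # skip lines that are not structured JSON log entries
        if start <= ts < end:
            yield line

if __name__ == "__main__":
    # usage: ./logwindow.py /var/log/mongodb/mongod.log > restore_window.log
    with open(sys.argv[1]) as f:
        for line in lines_in_window(f, "2023-08-04T09:38:00",
                                    "2023-08-04T09:42:00"):
            sys.stdout.write(line)
```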

Ranjith August 7, 2023 at 8:41 AM

Another run is still hung after 3+ days.

Currently running:
==================
Snapshot restore "2023-08-03T13:52:31Z", started at 2023-08-04T09:40:04Z. Status: oplog restore. [op id: 64ccc774a242d43c2d5e857c]
[mongod@ip-10-80-10-159 ~]$ date
Mon Aug 7 08:40:02 GMT 2023

[mongod@ip-10-80-10-159 ~]$ pbm logs -s D -n "configReplSet/10.80.10.159:27039" --follow

2023-08-04T09:40:04Z I [restore/2023-08-04T09:40:04.070146102Z] recovery started
2023-08-04T09:40:04Z D [restore/2023-08-04T09:40:04.070146102Z] waiting for 'starting' status
2023-08-04T09:40:05Z I [restore/2023-08-04T09:40:04.070146102Z] moving to state running
2023-08-04T09:40:08Z I [restore/2023-08-04T09:40:04.070146102Z] moving to state dumpDone
2023-08-04T09:42:04Z I [restore/2023-08-04T09:40:04.070146102Z] starting oplog replay
2023-08-04T09:42:04Z D [restore/2023-08-04T09:40:04.070146102Z] + applying {configReplSet 2023-08-03T13:52:31Z/configReplSet/local.oplog.rs.bson.s2 s2 {1691070753 30} {1691084125 1} 0}

Also attached the MongoDB logs for the configReplSet.

Cannot Reproduce

Details

Created May 16, 2023 at 4:04 PM
Updated March 5, 2024 at 6:39 PM
Resolved August 8, 2023 at 7:39 AM