Details
Type: Improvement
Status: Done
Priority: Medium
Resolution: Fixed
Description
A backup is started with the "pbm backup" subcommand.
As of v1.0 it launches the backup asynchronously. A backup may take hours when the database is large and/or the upload speed is slow, and we don't want to block the shell terminal by default. (We may add a "--wait", "--blocking-mode" or "--show-progress" option in later versions to make it synchronous, but it is asynchronous by default.)
When a backup has been launched it means (in the normal case): The pbm-agents have begun the upload to the remote storage.
When a backup has been launched but there has been an error, it means: the backup command object has been created by the pbm CLI (currently in admin.pbmCmd), but one or more of the replica sets do not have any pbm-agent doing the work that is supposed to happen. We expect that the pbm-agents will detect this and self-abort (i.e. requiring no input from the user), but A) they might lack handling for a given error, and B) if all pbm-agents are dead or are killed simultaneously, nothing can happen automatically anyway. So the pbm user will probably always need a way to stop/abort/cancel a backup stuck in an erroneous state.
Let's make a "pbm abort-backup" or "pbm cancel-backup" or similar (or "pbm backup --abort"?) subcommand that will either:
- Stop a normally running backup and log that it was canceled by the user.
- For a backup stuck in error, stop the backup processes still being run by pbm-agents (if any) and update the pbm control collections to record that the backup finished in error.
- In both cases, at the end of "pbm cancel-backup", make sure the pbm-agent processes are in a state ready to start the next backup request.
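The requested behaviour can be sketched as a small state transition in Go. This is only an illustration under stated assumptions: the Backup struct, the state names, the ActiveAgents counter and cancelBackup are hypothetical and do not reflect the actual layout of the pbm control collections or the agent protocol.

```go
package main

import (
	"errors"
	"fmt"
)

// BackupState values are hypothetical labels for this sketch.
type BackupState string

const (
	StateRunning  BackupState = "running"
	StateStuck    BackupState = "stuck"
	StateDone     BackupState = "done"
	StateCanceled BackupState = "canceled"
	StateError    BackupState = "error"
)

// Backup is a hypothetical in-memory stand-in for a control record.
type Backup struct {
	Name         string
	State        BackupState
	ActiveAgents int    // agents still running backup work
	Note         string // what gets logged about the outcome
}

// ErrNotCancelable is returned when there is nothing to cancel.
var ErrNotCancelable = errors.New("backup is not in a cancelable state")

// cancelBackup applies the two cases from the description, then clears
// ActiveAgents so that the agents count as ready for the next request.
func cancelBackup(b *Backup) error {
	switch b.State {
	case StateRunning:
		b.State = StateCanceled
		b.Note = "canceled by user"
	case StateStuck:
		b.State = StateError
		b.Note = "aborted by user; backup finished in error"
	default:
		return ErrNotCancelable
	}
	// A real implementation would have to signal each agent to stop here.
	b.ActiveAgents = 0
	return nil
}

func main() {
	b := &Backup{Name: "example", State: StateStuck, ActiveAgents: 1}
	if err := cancelBackup(b); err != nil {
		fmt.Println("cancel failed:", err)
		return
	}
	fmt.Println(b.State, "-", b.Note)
}
```

Returning an error for already finished backups keeps the command safe to run repeatedly, which seems useful when the user is unsure whether an earlier cancel took effect.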
Issue Links
- is blocked by PBM-145 Make 'run backup' asynchronous (Done)