'pmm-admin status --wait' option added to allow for configurable delay in checking status of pmm-agent

Description

For better integration with k8s, there should be a way to wait for pmm-agent to become fully configured and connected.

Add a --wait flag to the pmm-admin status command. The flag's default value (0s) should not change the command's behavior. Otherwise, the flag's value should contain a duration (as in pmm-admin status --wait=30s). If given, pmm-admin should wait up to that maximum time for pmm-agent to become connected. It should retry checking pmm-agent's status with a one-second delay after the previous request failed or returned information about pmm-agent not being connected. It should retry silently unless --debug or --trace is given. After pmm-agent is connected or the time has elapsed, it should behave as it does now (i.e. produce normal output, or exit with an appropriate error message and a non-zero status code).

That flag should cover the following cases:

  • when pmm-agent is not running (which currently produces the "Failed to get PMM Agent status from local pmm-agent: Post "http://127.0.0.1:7777/local/Status": dial tcp 127.0.0.1:7777: connect: connection refused." error);

  • when pmm-agent is running but not configured (which currently produces the "Failed to get PMM Agent status from local pmm-agent: pmm-agent is running, but not set up." error);

  • when pmm-agent is running and configured, but not connected to PMM Server (which currently produces the "Failed to get PMM Agent status from local pmm-agent: pmm-agent is not connected to PMM Server." error; this is a separate case from the previous one).

How to test:

  • Set up PMM Server

  • Run pmm-admin status --wait=30s

    • After 30s should return error "Failed to get PMM Agent status from local pmm-agent: pmm-agent is running, but not set up."

  • Configure pmm-agent

  • Run pmm-admin status --wait=30s

    • Should return status immediately

  • Stop PMM Server

  • Run pmm-admin status --wait=30s

    • After 30s should return error "Failed to get PMM Agent status from local pmm-agent: pmm-agent is not connected to PMM Server."

  • Run pmm-admin status --wait=30s and start PMM Server

    • After a few seconds status should be returned

  • Stop pmm-agent

  • Run pmm-admin status --wait=30s

    • After 30s should return error "Failed to get PMM Agent status from local pmm-agent: Post "http://127.0.0.1:7777/local/Status": dial tcp 127.0.0.1:7777: connect: connection refused."

  • Run pmm-admin status --wait=30s and start pmm-agent

    • After a few seconds status should be returned

Documentation:

Documentation should be added to pmm-admin man page.

--wait values are expected in Go duration format, e.g. `30s`, `5m`, `1h`.

Activity

Paul Jacobs November 13, 2020 at 6:25 AM

 Is this new option wait or timeout? I'd understood it was changed from the former to the latter.

Former user November 12, 2020 at 1:31 PM

Paul Jacobs November 9, 2020 at 11:47 AM

'Deadline' is usually associated with a point in time. This 'wait' is a period of time irrespective of when it happens. --timeout seems natural and would be familiar to anyone using commands like ping. (In which case, why not add the short form -t to match?)

Alexey Palazhchenko November 4, 2020 at 6:22 PM

After 30s should return error // TBD

If we can improve that error message now – we should (see https://github.com/percona/pmm-admin/pull/123). But the exit code is more important right now.

Done

Details

Needs QA

Yes

Needs Doc

Yes

Created October 26, 2020 at 2:17 PM
Updated March 6, 2024 at 3:46 AM
Resolved December 2, 2020 at 11:37 PM