Details
Bug
Status: Done
High
Resolution: Fixed
None
4
Yes
Yes
[obsolete] C/S Core
Description
If you are monitoring a MongoDB instance with SSL enabled and upgrade to PMM 2.26, all monitoring on that server breaks.
All pmm-admin commands fail with "Internal server error."
The service can't be removed and re-added, as removing it fails with:
pmm-admin remove mongodb leonardo-bacchi-fernandes-default-mongodb
Internal server error.
It is not possible to remove the service through the GUI (PMM Inventory -> Services) either.
The pmm-managed logs show continuous entries of the following error:
ERRO[2022-02-16T21:32:24.341+00:00] Failed to update configuration, will retry: sql: Scan error on column index 32, name "mongo_db_tls_options": failed to unmarshal JSON column: json: cannot unmarshal string into Go struct field MongoDBOptions.stats_collections of type []string
github.com/percona/pmm-managed/models.FindAgentsForScrapeConfig
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/models/agent_helpers.go:454
github.com/percona/pmm-managed/services/victoriametrics.AddScrapeConfigs
Steps to reproduce:
With PMM Server 2.25 and PMM Client 2.25:
Create a MongoDB with SSL enabled:
net:
  port: 27017
  bindIp: 0.0.0.0
  ssl:
    mode: requireSSL
    PEMKeyFile: /etc/ssl/mongodb.pem
    CAFile: /etc/ssl/rootCA.pem
    allowInvalidCertificates: true
Configure and monitor this MongoDB instance using SSL:
pmm-admin add mongodb --tls --tls-skip-verify --tls-certificate-key-file=/etc/ssl/mongodb.pem --tls-ca-file=/etc/ssl/rootCA.pem --host=127.0.0.1
Upgrade PMM Server to 2.26 (this was reproduced through the GUI upgrade; other upgrade methods have not been tested).
Connect to the PMM container and check:
tail -100f /srv/logs/pmm-managed.log
With either PMM Client 2.25 or 2.26, all commands fail on that instance. You can't remove the services or the nodes to re-add the instance.
Root Cause
In PMM 2.26.0, the type of the stats_collections field in the MongoDB configuration JSON was changed from a string to a string array. This change breaks backward compatibility, so a database migration is required to ensure a smooth upgrade from any version older than 2.26.0 to a newer version. That migration was missed in 2.26.0.
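The failure mode can be reproduced outside PMM. The following is a minimal sketch, not the pmm-managed code: `mongoDBOptions`, `decodeOptions`, and `isTypeMismatch` are hypothetical names, and the struct is trimmed down to the one field discussed here. It shows that a row written in the old format (a plain string) cannot be decoded into a struct that expects a string array.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// mongoDBOptions is a hypothetical, trimmed-down stand-in for the options
// stored in the mongo_db_tls_options column. Only the field from this
// ticket is modeled, with the 2.26.0 type (string array).
type mongoDBOptions struct {
	StatsCollections []string `json:"stats_collections"`
}

// decodeOptions unmarshals a stored JSON document into the new struct.
func decodeOptions(raw string) (mongoDBOptions, error) {
	var opts mongoDBOptions
	err := json.Unmarshal([]byte(raw), &opts)
	return opts, err
}

// isTypeMismatch reports whether err is a JSON type mismatch, which is the
// class of error shown in the pmm-managed log above.
func isTypeMismatch(err error) bool {
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr)
}

func main() {
	// Document as stored by 2.25.0 or earlier: stats_collections is a string.
	_, err := decodeOptions(`{"stats_collections":"db1.col1,db2.col2"}`)
	fmt.Println("old format error:", err)

	// Document as stored by 2.26.0: stats_collections is a string array.
	opts, err := decodeOptions(`{"stats_collections":["db1.col1","db2.col2"]}`)
	fmt.Println("new format:", opts.StatsCollections, err)
}
```

The collection names in the sample documents are made up for illustration.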
Solution
Workaround
The problem is a result of a missing database migration. The migration would normally be applied as part of PMM upgrade. Applying the migration to the database manually is an available workaround.
UPDATE agents SET mongo_db_tls_options = jsonb_set(mongo_db_tls_options, '{stats_collections}', to_jsonb(string_to_array(mongo_db_tls_options->>'stats_collections', ','))) WHERE mongo_db_tls_options IS NOT NULL AND jsonb_typeof(mongo_db_tls_options->'stats_collections') = 'string';
Fix
Analysis: The missing migration will be part of 2.27.0. This is problematic, since the migration now has to handle two different types for the stats_collections field in the configuration JSON. Configuration data added by version 2.25.0 or earlier stores this field as a string, whereas 2.26.0 stores it as a string array.
Hence the migration needs to convert a string to a string array, and skip the row if the field is already a string array.
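The convert-or-skip rule can be sketched in Go as follows. This is an illustration of the logic only, under the assumption that splitting on commas is the intended conversion (as in the SQL workaround); `migrateStatsCollections` is a hypothetical helper and is not part of pmm-managed, where the migration runs as SQL.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// migrateStatsCollections converts a string-valued stats_collections field
// into a string array and returns every other document unchanged.
// Hypothetical helper for illustration; the real migration is a SQL UPDATE.
func migrateStatsCollections(raw []byte) ([]byte, error) {
	// RawMessage keeps all other fields byte-for-byte as stored.
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	field, ok := doc["stats_collections"]
	if !ok {
		return raw, nil // field absent: nothing to migrate
	}
	var value interface{}
	if err := json.Unmarshal(field, &value); err != nil {
		return nil, err
	}
	s, isString := value.(string)
	if !isString {
		return raw, nil // already an array (2.26.0 format) or null: skip
	}
	// An empty string becomes an empty array rather than [""].
	parts := []string{}
	if s != "" {
		parts = strings.Split(s, ",")
	}
	converted, err := json.Marshal(parts)
	if err != nil {
		return nil, err
	}
	doc["stats_collections"] = converted
	return json.Marshal(doc)
}

func main() {
	for _, in := range []string{
		`{"stats_collections":"db1.col1,db2.col2"}`, // written by 2.25.0 or earlier
		`{"stats_collections":["db1.col1"]}`,        // written by 2.26.0
	} {
		out, err := migrateStatsCollections([]byte(in))
		fmt.Println(string(out), err)
	}
}
```

Checking the stored type before converting is what makes the operation safe to run against a mix of rows from both versions.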
The following migrations were considered as a solution as part of our fix process:
Status | Migration | Issues |
---|---|---|
REJECTED | UPDATE agents SET mongo_db_tls_options = jsonb_set(mongo_db_tls_options, '{stats_collections}', '[]') WHERE mongo_db_tls_options IS NOT NULL | This causes loss of configuration data (stats_collections), which is not acceptable. |
REJECTED | UPDATE agents SET mongo_db_tls_options = jsonb_set(mongo_db_tls_options, '{stats_collections}', to_jsonb(string_to_array(mongo_db_tls_options->>'stats_collections', ','))) WHERE mongo_db_tls_options IS NOT NULL | This doesn't cause loss of configuration data. However, for an entry added in PMM 2.26.0, it would also rewrite a stats_collections value that is already an array, which should not happen. |
ACCEPTED | UPDATE agents SET mongo_db_tls_options = jsonb_set(mongo_db_tls_options, '{stats_collections}', to_jsonb(string_to_array(mongo_db_tls_options->>'stats_collections', ','))) WHERE mongo_db_tls_options IS NOT NULL AND jsonb_typeof(mongo_db_tls_options->'stats_collections') = 'string' | No issues reported. |
Future improvements
Whenever a struct that is marshaled to or unmarshaled from the database is modified, the change should be accompanied by a corresponding migration. We need to add a checklist to our code review process to ensure this. QA should also be informed of any such changes so that test coverage can be extended.
Acceptance Criteria
Scenario | Migration from PMM 2.25.0 to any PMM version after 2.26.0 |
When | PMM is monitoring a MongoDB instance where the stats_collections parameter is used |
Then | The newer PMM version is able to monitor the MongoDB instance after migration |
And | The MongoDB exporter has the stats_collections field with the correct parameters |
Scenario | Migration from PMM 2.26.0 to any PMM version after 2.26.0 |
When | PMM is monitoring a MongoDB instance where the stats_collections parameter is used |
Then | The newer PMM version is able to monitor the MongoDB instance after migration |
And | The MongoDB exporter has the stats_collections field with the correct parameters |
Issue Links
- is duplicated by PMM-9657 Upgrading to PMM-Server 2.26.0 causes pmm-managed Panic errors (Done)