PMM-9614 (Percona Monitoring and Management)

Upgrading PMM Server from 2.25 to 2.26 while monitoring a MongoDB instance with SSL enabled breaks the agents.

Details

    • 4
    • Yes
    • Yes
    • [obsolete] C/S Core

    Description

      If PMM is monitoring a MongoDB instance with SSL enabled, upgrading PMM Server to 2.26 breaks all monitoring on that server.

      All pmm-admin commands fail with "Internal server error."

      The service can't be removed and re-added, as removing it fails with:

      pmm-admin remove mongodb leonardo-bacchi-fernandes-default-mongodb
      Internal server error.
      

      It is not possible to remove the service through the GUI (PMM Inventory -> Services) either. 

       The pmm-managed logs show continuous entries of the following error:

      ERRO[2022-02-16T21:32:24.341+00:00] Failed to update configuration, will retry: sql: Scan error on column index 32, name "mongo_db_tls_options": failed to unmarshal JSON column: json: cannot unmarshal string into Go struct field MongoDBOptions.stats_collections of type []string
      github.com/percona/pmm-managed/models.FindAgentsForScrapeConfig
              /home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/models/agent_helpers.go:454
      github.com/percona/pmm-managed/services/victoriametrics.AddScrapeConfigs
      

       

      Steps to reproduce: 

      With PMM Server 2.25 and PMM Client 2.25:

      Start a MongoDB instance with SSL enabled:

      net:
       port: 27017
       bindIp: 0.0.0.0
       ssl:
         mode: requireSSL
         PEMKeyFile: /etc/ssl/mongodb.pem
         CAFile: /etc/ssl/rootCA.pem
         allowInvalidCertificates: true
      

      Configure and monitor this MongoDB instance using SSL:

      pmm-admin add mongodb --tls --tls-skip-verify --tls-certificate-key-file=/etc/ssl/mongodb.pem --tls-ca-file=/etc/ssl/rootCA.pem --host=127.0.0.1
      

      Upgrade PMM Server to 2.26 (reproduced with the GUI upgrade; other upgrade methods have not been tested).

      Connect to the PMM Server container and check the pmm-managed log:

      tail -100f /srv/logs/pmm-managed.log
      

      With either PMM Client 2.25 or 2.26, all commands fail on that instance, and neither the services nor the node can be removed in order to re-add the instance.

      Root Cause

      PMM 2.26.0 changed the JSON structure of the stored MongoDB configuration: the stats_collections field changed from a string to a string array. This change breaks backward compatibility, so a database migration is required for a smooth upgrade from any version older than 2.26.0 to newer versions. However, that migration was missed in 2.26.0, so configuration saved by older versions can no longer be decoded by pmm-managed, which produces the "cannot unmarshal string into Go struct field MongoDBOptions.stats_collections of type []string" error shown in the log above.

      Solution

      Workaround

      The problem is caused by a missing database migration, which would normally be applied as part of the PMM upgrade. As a workaround, the migration can be applied to the PMM Server database manually:

      UPDATE agents SET mongo_db_tls_options = jsonb_set(mongo_db_tls_options, '{stats_collections}', to_jsonb(string_to_array(mongo_db_tls_options->>'stats_collections', ','))) WHERE mongo_db_tls_options IS NOT NULL AND jsonb_typeof(mongo_db_tls_options->'stats_collections') = 'string';

      Fix

      Analysis: The missing migration will be part of 2.27.0. This complicates the migration, since it now has to handle two different types of the stats_collections field in the configuration JSON: configuration data added by 2.25.0 or earlier stores this field as a string, whereas 2.26.0 stores it as a string array.

      Hence the migration must convert the field from a string to a string array, and skip rows where it is already a string array.
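The same string-versus-array rule can be sketched in Go on a single options document. This illustrates the logic of the accepted SQL migration; it is not the code PMM ships, and the function name migrateStatsCollections is hypothetical. Only a JSON string is converted; an array (2.26.0 shape), a null, or a missing key is left as-is, mirroring the jsonb_typeof(...) = 'string' filter.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// migrateStatsCollections (hypothetical) mirrors, on one options document, the
// accepted SQL migration: a string stats_collections value is split on commas
// into a JSON array; any other type (array, null) or a missing key is skipped.
func migrateStatsCollections(raw []byte) ([]byte, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	field, ok := doc["stats_collections"]
	if !ok {
		return raw, nil // key absent: nothing to migrate
	}
	var value any
	if err := json.Unmarshal(field, &value); err != nil {
		return nil, err
	}
	s, isString := value.(string)
	if !isString {
		return raw, nil // already an array (2.26.0 shape) or null: skip
	}
	// Like PostgreSQL's string_to_array, an empty string becomes an empty array.
	parts := []string{}
	if s != "" {
		parts = strings.Split(s, ",")
	}
	encoded, err := json.Marshal(parts)
	if err != nil {
		return nil, err
	}
	doc["stats_collections"] = encoded
	return json.Marshal(doc)
}

func main() {
	for _, row := range []string{
		`{"stats_collections":"db1.col1,db2.col2"}`,     // written by 2.25.0
		`{"stats_collections":["db1.col1","db2.col2"]}`, // written by 2.26.0
	} {
		out, err := migrateStatsCollections([]byte(row))
		fmt.Println(string(out), err)
	}
}
```

Both sample rows come out in the same array form, and running the function a second time on its own output changes nothing, which is the property the accepted migration needs. Re-encoding the document sorts its keys; this does not matter for jsonb, which does not preserve key order.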

      The following migrations were considered during the fix:

      Status: REJECTED
      Migration: UPDATE agents SET mongo_db_tls_options = jsonb_set(mongo_db_tls_options, '{stats_collections}', '[]') WHERE mongo_db_tls_options IS NOT NULL;
      Issues: This discards the existing configuration data (stats_collections), which is not acceptable.

      Status: REJECTED
      Migration: UPDATE agents SET mongo_db_tls_options = jsonb_set(mongo_db_tls_options, '{stats_collections}', to_jsonb(string_to_array(mongo_db_tls_options->>'stats_collections', ','))) WHERE mongo_db_tls_options IS NOT NULL;
      Issues: This preserves the configuration data. However, it also rewrites entries added in PMM 2.26.0, where stats_collections is already an array: ->> returns the array's JSON text, which would then be split on commas into a corrupted array.

      Status: ACCEPTED
      Migration: UPDATE agents SET mongo_db_tls_options = jsonb_set(mongo_db_tls_options, '{stats_collections}', to_jsonb(string_to_array(mongo_db_tls_options->>'stats_collections', ','))) WHERE mongo_db_tls_options IS NOT NULL AND jsonb_typeof(mongo_db_tls_options->'stats_collections') = 'string';
      Issues: None reported.

      Future improvements

      Whenever a struct that is marshaled to or unmarshaled from the database is modified, the change must be accompanied by a corresponding migration. A checklist item should be added to the code review process to enforce this, and QA should be informed of any such change so that test coverage can be extended.

      Acceptance Criteria

      Case 1. PMM 2.25.0 to PMM 2.26.0+
      Scenario: Migration from PMM 2.25.0 to any PMM version after 2.26.0
      When PMM is monitoring a MongoDB instance where the stats_collections parameter is used
      Then the newer PMM version is able to monitor the MongoDB instance after the migration
      And the MongoDB exporter has the stats_collections field with the correct parameters

      Case 2. PMM 2.26.0 to PMM 2.26.0+
      Scenario: Migration from PMM 2.26.0 to any PMM version after 2.26.0
      When PMM is monitoring a MongoDB instance where the stats_collections parameter is used
      Then the newer PMM version is able to monitor the MongoDB instance after the migration
      And the MongoDB exporter has the stats_collections field with the correct parameters
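A post-upgrade check for these scenarios can be sketched in Go: decode every stored options document into the 2.26.0 struct shape and list the agents that still fail. The struct, the function name undecodableAgents, and the agent IDs are hypothetical; in practice the documents would come from the mongo_db_tls_options column of the agents table. This only verifies that the stored configuration is readable; it does not check the exporter's stats_collections parameters.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// MongoDBOptions is a hypothetical, trimmed-down stand-in for the struct that
// pmm-managed decodes mongo_db_tls_options into (2.26.0 shape).
type MongoDBOptions struct {
	StatsCollections []string `json:"stats_collections"`
}

// undecodableAgents returns, sorted, the IDs of agents whose stored options
// fail to decode into the current struct. An empty result means every row is
// readable after the upgrade.
func undecodableAgents(rows map[string][]byte) []string {
	bad := []string{}
	for id, raw := range rows {
		var opts MongoDBOptions
		if err := json.Unmarshal(raw, &opts); err != nil {
			bad = append(bad, id)
		}
	}
	sort.Strings(bad) // map iteration order is random; sort for stable output
	return bad
}

func main() {
	// Hypothetical agent IDs and option documents.
	rows := map[string][]byte{
		"agent-upgraded-from-2.25": []byte(`{"stats_collections":["db1.col1","db2.col2"]}`),
		"agent-added-in-2.26":      []byte(`{"stats_collections":["db3.col3"]}`),
		"agent-not-migrated":       []byte(`{"stats_collections":"db1.col1"}`),
	}
	fmt.Println(undecodableAgents(rows)) // prints [agent-not-migrated]
}
```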

            People

              Assignee: Unassigned
              Reporter: Leonardo Bacchi Fernandes