Percona Monitoring and Management
PMM-10443

Improvements on Vacuum Monitoring (follow up on PoC)


    • Type: Improvement
    • Status: Done
    • Priority: Medium
    • Resolution: Done
    • Fix Version: 2.31.0
    • Component: [obsolete] C/S Core




      The PoC for vacuum monitoring was implemented in PMM-10244.

      Everyone can review the changes and test them in PMM using the following images:


      server docker - perconalab/pmm-server-fb:PR-2643-72e8e6e
      client docker - perconalab/pmm-client-fb:PR-2643-72e8e6e
      client - https://s3.us-east-2.amazonaws.com/pmm-build-cache/PR-BUILDS/pmm2-client/pmm2-client-PR-2643-72e8e6e.tar.gz
      Create Staging Instance: https://pmm.cd.percona.com/job/aws-staging-start/parambuild/?DOCKER_VERSION=perconalab/pmm-server-fb:PR-2643-72e8e6e&CLIENT_VERSION=https://s3.us-east-2.amazonaws.com/pmm-build-cache/PR-BUILDS/pmm2-client/pmm2-client-PR-2643-72e8e6e.tar.gz

      My inputs for further steps


       I think it would be good to have:

      1. A collector for these metrics with additional parameters for databases and tables, for instances that have more than 1000 tables
      2. Labels like oid, table, schema, and database can produce high-cardinality issues in the future on large-scale databases with many tables, or on many clusters with many tables.
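      For the first point, a minimal sketch of what a capped collection query could look like — the 1000-row limit and the column choice are assumptions for illustration, not the exporter's actual query:

      ```sql
      -- Hypothetical capped collection: only the 1000 tables with the most
      -- dead tuples, to bound series count on instances with many tables.
      SELECT schemaname, relname, n_dead_tup, last_autovacuum, autovacuum_count
      FROM pg_stat_user_tables
      ORDER BY n_dead_tup DESC
      LIMIT 1000;
      ```

      A cap like this bounds cardinality per scrape, but as noted above it shifts rather than removes the problem on very large instances.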

      We also discussed that the Postgres exporter needs to handle PG connections more gently, because PG creates one process per connection, and in the case of 50 databases the exporter needs to open 50 connections (I'm not sure about this point).

      Jobin's inputs

      When it comes to autovacuum, the ultimate source of information is the PostgreSQL logs.

      Generally, every PostgreSQL user will set the log_autovacuum_min_duration parameter and capture the details in the PostgreSQL logs.

      It is important to understand which tables become candidates for autovacuum too frequently, and the load this creates.
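      For reference, this parameter can be enabled cluster-wide from SQL; a value of 0 logs every autovacuum run, while a positive value (in milliseconds) logs only the slower ones:

      ```sql
      -- Log every autovacuum action (threshold is in milliseconds; 0 = log all).
      ALTER SYSTEM SET log_autovacuum_min_duration = 0;
      -- Reload the configuration so the change takes effect without a restart.
      SELECT pg_reload_conf();
      ```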

      For example,


      2022-07-30 16:15:01.497 IST [62272] LOG:  automatic vacuum of table "postgres.public.pgbench_accounts": index scans: 1
              pages: 0 removed, 32787 remain, 0 skipped due to pins, 0 skipped frozen
              tuples: 1000000 removed, 1000000 remain, 0 are dead but not yet removable, oldest xmin: 820290
              index scan needed: 16394 pages from table (50.00% of total) had 1000000 dead item identifiers removed
              index "pgbench_accounts_pkey": pages: 5486 in total, 0 newly deleted, 0 currently deleted, 0 reusable
              index "idx_pgbench_accounts_bid": pages: 1629 in total, 836 newly deleted, 836 currently deleted, 0 reusable
              avg read rate: 101.908 MB/s, avg write rate: 114.909 MB/s
              buffer usage: 59385 hits, 36348 misses, 40985 dirtied
              WAL usage: 73567 records, 828 full page images, 16119666 bytes
              system usage: CPU: user: 0.82 s, system: 0.40 s, elapsed: 2.78 s 

      Charly's inputs


      • We need to be careful with the load that will be added to the database we are monitoring. A few comments above mention databases with a large number of tables, but there are other cases, for example an instance with many databases. Postgres requires one connection per database, and the more databases in the same instance, the more connections PMM will open. It's a global problem for PMM and PG, but it can be even more harmful here;
      • Some queries can take longer depending on the number of objects (schemas, tables, etc.) in the database, and the 1k-table cap may not solve the issue;
      • This is a great start for vacuum monitoring, but we need to start thinking about other sources of information, for example the one mentioned by Jobin, the log files. It may be for another PoC or for feature updates, but are they on the roadmap?
      • Configuration and granularity: would it be possible for the user to choose different granularity for the collected information? For example, could the collector be enabled only for one specific database (similar to per-schema filtering in MySQL), or disabled for a list of them? Maybe add this as a future improvement?
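      On the connection-count concern, the per-database connections the exporter holds can be inspected from pg_stat_activity; the 'pmm' user name below is an assumption — substitute whatever monitoring user is actually configured:

      ```sql
      -- Connections held per database by the monitoring user
      -- ('pmm' is a placeholder, not necessarily the configured user).
      SELECT datname, count(*) AS connections
      FROM pg_stat_activity
      WHERE usename = 'pmm'
      GROUP BY datname
      ORDER BY connections DESC;
      ```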




            jiri.ctvrtka Jiří Čtvrtka
            andrew.minkin Andrew Minkin