Uploaded image for project: 'Percona Toolkit'
  1. Percona Toolkit
  2. PT-157

Specifying the index to use for pt-archiver ignores --primary-key-only

Details

    • Improvement
    • Status: Done
    • Medium
    • Resolution: Fixed
    • 3.0.3
    • 3.0.13
    • None
    • None

    Description

      We have a customer where the SELECT queries are taking a long time for pt-archiver because it is choosing the PRIMARY KEY, so it is desirable to force an index.

      Sample table
      CREATE TABLE `purge_test` (
        `id` bigint unsigned NOT NULL AUTO_INCREMENT,
        `data` varchar(100) NOT NULL DEFAULT '',
        `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (`id`),
        KEY `created_at` (`created_at`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1
      
      Example without specifying the index
      > /usr/bin/pt-archiver --source=h=localhost,D=sandbox,t=purge_test,b=0 --where="created_at < date_sub(now(), interval 1 minute)" --primary-key-only --bulk-delete --limit=1000 --dry-run --purge      
      
      SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sandbox`.`purge_test` FORCE INDEX(`PRIMARY`) WHERE (created_at < date_sub(now(), interval 1 minute)) AND (`id` < '100') ORDER BY `id` LIMIT 1000
      
      SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sandbox`.`purge_test` FORCE INDEX(`PRIMARY`) WHERE (created_at < date_sub(now(), interval 1 minute)) AND (`id` < '100') AND ((`id` >= ?)) ORDER BY `id` LIMIT 1000
      
      DELETE FROM `sandbox`.`purge_test` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (created_at < date_sub(now(), interval 1 minute)) LIMIT 1000
      
      Example specifying the index
      > /usr/bin/pt-archiver --source=h=localhost,D=sandbox,t=purge_test,b=0,i=created_at --where="created_at < date_sub(now(), interval 1 minute)" --primary-key-only --bulk-delete --limit=1000 --dry-run --purge 
      
      SELECT /*!40001 SQL_NO_CACHE */ `id`,`data`,`created_at` FROM `sandbox`.`purge_test` FORCE INDEX(`created_at`) WHERE (created_at < date_sub(now(), interval 1 minute)) ORDER BY `created_at` LIMIT 1000
      
      SELECT /*!40001 SQL_NO_CACHE */ `id`,`data`,`created_at` FROM `sandbox`.`purge_test` FORCE INDEX(`created_at`) WHERE (created_at < date_sub(now(), interval 1 minute)) AND ((`created_at` >= ?)) ORDER BY `created_at` LIMIT 1000
      
      DELETE FROM `sandbox`.`purge_test` WHERE (((`created_at` >= ?))) AND (((`created_at` <= ?))) AND (created_at < date_sub(now(), interval 1 minute)) LIMIT 1000
      

      Here 2 problems exist:

      • it requests all columns even though --primary-key-only is selected (which could be large amounts of data)
      • despite having selected the PK field it does not delete by it and instead choses the index specified for the select

      This is causing issues for the customer as the SELECT queries are showing up as slow ones.

      Attachments

        Issue Links

          Activity

            People

              carlos.salguero Carlos Salguero (Inactive)
              ceri.williams Ceri Williams
              Ceri Williams Ceri Williams
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - Not Specified
                  Not Specified
                  Logged:
                  Time Spent - 4 hours
                  4h

                  Smart Checklist