Uploaded image for project: 'Percona Toolkit'
  1. Percona Toolkit
  2. PT-157

Specifying the index to use for pt-archiver ignores --primary-key-only

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Done
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: 3.0.3
    • Fix Version/s: 3.0.13
    • Component/s: None
    • Labels:
      None

      Description

      We have a customer where the SELECT queries are taking a long time for pt-archiver because it is choosing the PRIMARY KEY, so it is desirable to force an index.

      Sample table
      CREATE TABLE `purge_test` (
        `id` bigint unsigned NOT NULL AUTO_INCREMENT,
        `data` varchar(100) NOT NULL DEFAULT '',
        `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (`id`),
        KEY `created_at` (`created_at`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1
      
      Example without specifying the index
      > /usr/bin/pt-archiver --source=h=localhost,D=sandbox,t=purge_test,b=0 --where="created_at < date_sub(now(), interval 1 minute)" --primary-key-only --bulk-delete --limit=1000 --dry-run --purge      
      
      SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sandbox`.`purge_test` FORCE INDEX(`PRIMARY`) WHERE (created_at < date_sub(now(), interval 1 minute)) AND (`id` < '100') ORDER BY `id` LIMIT 1000
      
      SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sandbox`.`purge_test` FORCE INDEX(`PRIMARY`) WHERE (created_at < date_sub(now(), interval 1 minute)) AND (`id` < '100') AND ((`id` >= ?)) ORDER BY `id` LIMIT 1000
      
      DELETE FROM `sandbox`.`purge_test` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (created_at < date_sub(now(), interval 1 minute)) LIMIT 1000
      
      Example specifying the index
      > /usr/bin/pt-archiver --source=h=localhost,D=sandbox,t=purge_test,b=0,i=created_at --where="created_at < date_sub(now(), interval 1 minute)" --primary-key-only --bulk-delete --limit=1000 --dry-run --purge 
      
      SELECT /*!40001 SQL_NO_CACHE */ `id`,`data`,`created_at` FROM `sandbox`.`purge_test` FORCE INDEX(`created_at`) WHERE (created_at < date_sub(now(), interval 1 minute)) ORDER BY `created_at` LIMIT 1000
      
      SELECT /*!40001 SQL_NO_CACHE */ `id`,`data`,`created_at` FROM `sandbox`.`purge_test` FORCE INDEX(`created_at`) WHERE (created_at < date_sub(now(), interval 1 minute)) AND ((`created_at` >= ?)) ORDER BY `created_at` LIMIT 1000
      
      DELETE FROM `sandbox`.`purge_test` WHERE (((`created_at` >= ?))) AND (((`created_at` <= ?))) AND (created_at < date_sub(now(), interval 1 minute)) LIMIT 1000
      

      Here 2 problems exist:

      • it requests all columns even though --primary-key-only is selected (which could be large amounts of data)
      • despite having selected the PK field it does not delete by it and instead choses the index specified for the select

      This is causing issues for the customer as the SELECT queries are showing up as slow ones.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                carlos.salguero Carlos Salguero
                Reporter:
                ceri.williams Ceri Williams
                Reviewer:
                Ceri Williams
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - Not Specified
                  Not Specified
                  Logged:
                  Time Spent - 4 hours
                  4h