Details
-
Improvement
-
Status: Done
-
Medium
-
Resolution: Fixed
-
3.0.3
-
None
-
None
Description
We have a customer where the SELECT queries are taking a long time for pt-archiver because it is choosing the PRIMARY KEY, so it is desirable to force an index.
Sample table
CREATE TABLE `purge_test` ( `id` bigint unsigned NOT NULL AUTO_INCREMENT, `data` varchar(100) NOT NULL DEFAULT '', `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (`id`), KEY `created_at` (`created_at`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
Example without specifying the index
> /usr/bin/pt-archiver --source=h=localhost,D=sandbox,t=purge_test,b=0 --where="created_at < date_sub(now(), interval 1 minute)" --primary-key-only --bulk-delete --limit=1000 --dry-run --purge SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sandbox`.`purge_test` FORCE INDEX(`PRIMARY`) WHERE (created_at < date_sub(now(), interval 1 minute)) AND (`id` < '100') ORDER BY `id` LIMIT 1000 SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sandbox`.`purge_test` FORCE INDEX(`PRIMARY`) WHERE (created_at < date_sub(now(), interval 1 minute)) AND (`id` < '100') AND ((`id` >= ?)) ORDER BY `id` LIMIT 1000 DELETE FROM `sandbox`.`purge_test` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (created_at < date_sub(now(), interval 1 minute)) LIMIT 1000
Example specifying the index
> /usr/bin/pt-archiver --source=h=localhost,D=sandbox,t=purge_test,b=0,i=created_at --where="created_at < date_sub(now(), interval 1 minute)" --primary-key-only --bulk-delete --limit=1000 --dry-run --purge SELECT /*!40001 SQL_NO_CACHE */ `id`,`data`,`created_at` FROM `sandbox`.`purge_test` FORCE INDEX(`created_at`) WHERE (created_at < date_sub(now(), interval 1 minute)) ORDER BY `created_at` LIMIT 1000 SELECT /*!40001 SQL_NO_CACHE */ `id`,`data`,`created_at` FROM `sandbox`.`purge_test` FORCE INDEX(`created_at`) WHERE (created_at < date_sub(now(), interval 1 minute)) AND ((`created_at` >= ?)) ORDER BY `created_at` LIMIT 1000 DELETE FROM `sandbox`.`purge_test` WHERE (((`created_at` >= ?))) AND (((`created_at` <= ?))) AND (created_at < date_sub(now(), interval 1 minute)) LIMIT 1000
Here 2 problems exist:
- it requests all columns even though --primary-key-only is selected (which could be large amounts of data)
- despite having selected the PK field it does not delete by it and instead choses the index specified for the select
This is causing issues for the customer as the SELECT queries are showing up as slow ones.