Uploaded image for project: 'Percona Server for MySQL'
  1. Percona Server for MySQL
  2. PS-6995

Introduce a new optimizer switch to allow the user to reduce the cost of a range scan to determine best execution plan for Primary Key lookup

Details

    • Bug
    • Status: Done
    • Medium
    • Resolution: Fixed
    • 5.6.25-73.1, 5.7.28-31
    • 5.6.49-89.0, 5.7.31-34, 8.0.21-12
    • None
    • None
    • Ubuntu 16.04.6

      Kernel 4.10.0-40-generic

      2 x Intel Xeon E5-2630 v4

      380G RAM 

       

    • Yes

    Description

      In some cases optimizer decides to perform an index scan instead of direct lookup for queries produced by pt-table-checksum that should do a better lookup. 

      Consider following table definition:
      CREATE TABLE `testdata` (
        `testtype` binary(16) NOT NULL,
        `gid` binary(16) NOT NULL,
        `gid2` varbinary(16) NOT NULL,
        `revision` bigint(20) NOT NULL,
        `data_version` tinyint(3) NOT NULL DEFAULT '0',
        `data` mediumblob,
        PRIMARY KEY (`testtype`,`gid`,`gid2`) KEY_BLOCK_SIZE=8,
        UNIQUE KEY `testdata_key_v3` (`testtype`,`gid`,`revision`) KEY_BLOCK_SIZE=8
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 STATS_SAMPLE_PAGES=2000 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
      1 row in set (0.00 sec)

      mysql> show table status like 'testdata'\G
      *************************** 1. row ***************************
                 Name: }}{{testdata{{}}
               Engine: InnoDB
              Version: 10
           Row_format: Compressed
                 Rows: 9077248562
       Avg_row_length: 89
          Data_length: 816283123712
      Max_data_length: 0
         Index_length: 230263353344
            Data_free: 19950993408
       Auto_increment: NULL
          Create_time: 2020-04-16 21:04:57
          Update_time: NULL
           Check_time: NULL
            Collation: utf8_general_ci
             Checksum: NULL
       Create_options: stats_auto_recalc=1 stats_sample_pages=10000 row_format=COMPRESSED KEY_BLOCK_SIZE=4
              Comment:
      1 row in set (0.00 sec)

      mysql> show indexes from testdata\G
      *************************** 1. row ***************************
      {{ Table: testdata}}
      {{ Non_unique: 0}}
      {{ Key_name: PRIMARY}}
      {{ Seq_in_index: 1}}
      {{ Column_name: testtype}}
      {{ Collation: A}}
      {{ Cardinality: 74120}}
      {{ Sub_part: NULL}}
      {{ Packed: NULL}}
      {{ Null:}}
      {{ Index_type: BTREE}}
      {{ Comment:}}
      Index_comment:
      *************************** 2. row ***************************
      {{ Table: testdata}}
      {{ Non_unique: 0}}
      {{ Key_name: PRIMARY}}
      {{ Seq_in_index: 2}}
      {{ Column_name: gid}}
      {{ Collation: A}}
      {{ Cardinality: 3704556800}}
      {{ Sub_part: NULL}}
      {{ Packed: NULL}}
      {{ Null:}}
      {{ Index_type: BTREE}}
      {{ Comment:}}
      Index_comment:
      *************************** 3. row ***************************
      {{ Table: testdata}}
      {{ Non_unique: 0}}
      {{ Key_name: PRIMARY}}
      {{ Seq_in_index: 3}}
      {{ Column_name: gid2}}
      {{ Collation: A}}
      {{ Cardinality: 9077249024}}
      {{ Sub_part: NULL}}
      {{ Packed: NULL}}
      {{ Null:}}
      {{ Index_type: BTREE}}
      {{ Comment:}}
      Index_comment:
      *************************** 4. row ***************************
      {{ Table: testdata}}
      {{ Non_unique: 0}}
      {{ Key_name: testdata_key_v3}}
      {{ Seq_in_index: 1}}
      {{ Column_name: testtype}}
      {{ Collation: A}}
      {{ Cardinality: 63873}}
      {{ Sub_part: NULL}}
      {{ Packed: NULL}}
      {{ Null:}}
      {{ Index_type: BTREE}}
      {{ Comment:}}
      Index_comment:
      *************************** 5. row ***************************
      {{ Table: testdata}}
      {{ Non_unique: 0}}
      {{ Key_name: testdata_key_v3}}
      {{ Seq_in_index: 2}}
      {{ Column_name: gid}}
      {{ Collation: A}}
      {{ Cardinality: 3126331392}}
      {{ Sub_part: NULL}}
      {{ Packed: NULL}}
      {{ Null:}}
      {{ Index_type: BTREE}}
      {{ Comment:}}
      Index_comment:
      *************************** 6. row ***************************
      {{ Table: testdata}}
      {{ Non_unique: 0}}
      {{ Key_name: testdata_key_v3}}
      {{ Seq_in_index: 3}}
      {{ Column_name: revision}}
      {{ Collation: A}}
      {{ Cardinality: 9077249024}}
      {{ Sub_part: NULL}}
      {{ Packed: NULL}}
      {{ Null:}}
      {{ Index_type: BTREE}}
      {{ Comment:}}
      Index_comment:
      6 rows in set (0.00 sec)

       

      Following query that should be super fast as is doing a lookup through Primary Key in order to get the first row to start checksuming data:

      SELECT /!40001 SQL_NO_CACHE */ `testtype`, `testtype`, `gid`, `testtype`, `gid`, `gid2` FROM `shard01`.`testdata` FORCE INDEX(`PRIMARY`) WHERE ((`testtype` > 0x2122919859220CC00000000000000001) OR (`testtype` = 0x2122919859220CC00000000000000001 AND `gid` > 0x744C9F6116DD92A30000000000000000) OR (`testtype` =0x2122919859220CC00000000000000001 AND `gid` = 0x744C9F6116DD92A30000000000000000 AND `gid2` > 0x003450DA7BC00E400000000000000000)) ORDER BY `testtype`, `gid`, `gid2` LIMIT 1 /*resume lower boundary/;

      Execution plan shows that it going to use Primary Key: 
      mysql> explain format=json SELECT /!40001 SQL_NO_CACHE */ `testtype`, `testtype`, `gid`, `testtype`, `gid`, `gid2` FROM `shard01`.`testdata` FORCE INDEX(`PRIMARY`) WHERE ((`testtype` > X'2122919859220CC00000000000000001') OR (`testtype` = X'2122919859220CC00000000000000001' AND `gid` > X'744C9F6116DD92A30000000000000000') OR (`testtype` =X'2122919859220CC00000000000000001' AND `gid` = X'744C9F6116DD92A30000000000000000' AND `gid2` > X'003450DA7BC00E400000000000000000')) ORDER BY `testtype`, `gid`, `gid2` LIMIT 1 /*resume lower boundary/\G
      *************************** 1. row ***************************
      EXPLAIN: {
        "query_block": {
          "select_id": 1,
          "cost_info": {
            "query_cost": "10892698275.40"
          },
          "ordering_operation": {
            "using_filesort": false,
            "table": {
              "table_name": "testdata",
              "access_type": "index",
              "possible_keys": [
                "PRIMARY"
              ],
              "key": "PRIMARY",
              "used_key_parts": [
                "testtype",
                "gid",
                "gid2"
              ],
              "key_length": "50",
              "rows_examined_per_scan": 1,
              "rows_produced_per_join": 9077248562,
              "filtered": "100.00",
              "using_index": true,
              "cost_info": {
                "read_cost": "9077248563.00",
                "eval_cost": "1815449712.40",
                "prefix_cost": "10892698275.40",
                "data_read_per_join": "608G"
              },
              "used_columns": [
                "testtype",
                "gid",
                "gid2"
              ],
              "attached_condition": "((`shard01`.`testdata`.`testtype` > 0x2122919859220cc00000000000000001) or ((`shard01`.`testdata`.`testtype` = 0x2122919859220cc00000000000000001) and (`shard01`.`testdata`.`gid` > 0x744c9f6116dd92a30000000000000000)) or ((`shard01`.`testdata`.`gid` = 0x744c9f6116dd92a30000000000000000) and (`shard01`.`testdata`.`testtype` = 0x2122919859220cc00000000000000001) and (`shard01`.`testdata`.`gid2` > 0x003450da7bc00e400000000000000000)))"
      {{      }}}
      {{    }}}
      {{  }}}
      }

      Which is demonstrated by the Handler% counters 

      mysql> show status like 'handle%read%';
      -------------------------------+
      | Variable_name         | Value    |
      -------------------------------+
      | Handler_read_first    | 1        |
      | Handler_read_key      | 2        |
      | Handler_read_last     | 0        |
      | Handler_read_next     | 15028951 | <- here
      | Handler_read_prev     | 0        |
      | Handler_read_rnd      | 0        |
      | Handler_read_rnd_next | 0        |
      -------------------------------+
      7 rows in set (0.00 sec)

      (I stoped the query after few secods but number is way higher)

      This is not expected behavior as all lookups are done through PK in order so is expected to see query finishing really quick, actually if I change the value in gid2 column to use same value than gid column (just picked randomly) I got a way better performance: 

      mysql> explain format=json SELECT /!40001 SQL_NO_CACHE */ `testtype`, `testtype`, `gid`, `testtype`, `gid`, `gid2` FROM `shard01`.`testdata` FORCE INDEX(`PRIMARY`) WHERE ((`testtype` > X'2122919859220CC00000000000000001') OR (`testtype` = X'2122919859220CC00000000000000001' AND `gid` > X'744C9F6116DD92A30000000000000000') OR (`testtype` =X'2122919859220CC00000000000000001' AND `gid` = X'744C9F6116DD92A30000000000000000' AND `gid2` > X'744C9F6116DD92A30000000000000000')) ORDER BY `testtype`, `gid`, `gid2` LIMIT 1 /*resume lower boundary/\G
      *************************** 1. row ***************************
      EXPLAIN: {
        "query_block": {
          "select_id": 1,
          "cost_info": {
            "query_cost": "3817859223.28"
          },
          "ordering_operation": {
            "using_filesort": false,
            "table": {
              "table_name": "testdata",
              "access_type": "range",
              "possible_keys": [
                "PRIMARY"
              ],
              "key": "PRIMARY",
              "used_key_parts": [
                "testtype",
                "gid",
                "gid2"
              ],
              "key_length": "50",
              "rows_examined_per_scan": 9493095212,
              "rows_produced_per_join": 9493095212,
              "filtered": "100.00",
              "using_index": true,
              "cost_info": {
                "read_cost": "1919240180.88",
                "eval_cost": "1898619042.40",
                "prefix_cost": "3817859223.28",
                "data_read_per_join": "636G"
              },
              "used_columns": [
                "testtype",
                "gid",
                "gid2"
              ],
              "attached_condition": "((`shard01`.`testdata`.`testtype` > 0x2122919859220cc00000000000000001) or ((`shard01`.`testdata`.`testtype` = 0x2122919859220cc00000000000000001) and (`shard01`.`testdata`.`gid` > 0x744c9f6116dd92a30000000000000000)) or ((`shard01`.`testdata`.`gid` = 0x744c9f6116dd92a30000000000000000) and (`shard01`.`testdata`.`testtype` = 0x2122919859220cc00000000000000001) and (`shard01`.`testdata`.`gid2` > 0x744c9f6116dd92a30000000000000000)))"
      {{      }}}
      {{    }}}
        }}}{mysql> SELECT /!40001 SQL_NO_CACHE */ `testtype`, `testtype`, `gid`, `testtype`, `gid`, `gid2` FROM `shard01`.`testdata` FORCE INDEX(`PRIMARY`) WHERE ((`testtype` > X'2122919859220CC00000000000000001') OR (`testtype` = X'2122919859220CC00000000000000001' AND `gid` > X'744C9F6116DD92A30000000000000000') OR (`testtype` =X'2122919859220CC00000000000000001' AND `gid` = X'744C9F6116DD92A30000000000000000' AND `gid2` > X'744C9F6116DD92A30000000000000000')) ORDER BY `testtype`, `gid`, `gid2` LIMIT 1 /*resume lower boundary/\G
      *************************** 1. row ***************************
      testtype: !"��Y"
      {{ �}}
      testtype: !"��Y"
      {{ �}}
      {{ gid: tL�aݒ�}}
      testtype: !"��Y"
      {{ �}}
      {{ gid: tL�aݒ�}}
      {{ gid2: tL�aݒ� ^*��}}
      1 row in set, 1 warning (0.00 sec)mysql> show status like 'Handler%re%';
      ----------------------------+
      | Variable_name | Value |
      ----------------------------+
      | Handler_prepare | 0 |
      | Handler_read_first | 0 |
      | Handler_read_key | 1 |<- expected behavior
      | Handler_read_last | 0 |
      | Handler_read_next | 0 |
      | Handler_read_prev | 0 |
      | Handler_read_rnd | 0 |
      | Handler_read_rnd_next | 0 |
      ----------------------------+
      8 rows in set (0.00 sec)

      I'm attaching two optimizer trace outputs with good and bad execution plans. 

      This problem is affecting Percona Server 5.6.25 and Percona Server 5.7.28.  

      Some interesting/important facts: 

      • Query parser/json render functions are casting values so we see a difference in the output of explain and optimizer trace from X'744C9F6116DD92A30000000000000000' to 0x744c9f6116dd92a30000000000000000. Is important to remark that in 5.6 this output is also truncated to shorted values like: `shard01`.`testdata`.`gid2` > 0x0000000000000000 which makes things even harder to debug. 
      • Changing amount of sample pages for persistent statistics and forcing ANALYZE TABLE don't make any difference in the execution plan.   
      • No optimizer switches makes any difference here. 

       

      Which is odd is that there is no way to predict when query will decide to make an index scan instead of a direct index lookup. 

      I have a test environment where we can run what's needed to track this down, so please feel free to make any questions about this. FWIW I don't know a way to reproduce the problem as this appears very randomly.  

      Attachments

        Activity

          People

            Unassigned Unassigned
            francisco.bordenave Francisco Bordenave
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - Not Specified
                Not Specified
                Logged:
                Time Spent - 1 day, 7 hours, 40 minutes
                1d 7h 40m

                Smart Checklist