Uploaded image for project: 'Percona Server for MySQL'
  1. Percona Server for MySQL
  2. PS-7325

Modify SELECT to correct situation when data is missing from MyRocks table when GROUP BY is used

Details

    • Bug
    • Status: Done
    • Medium
    • Resolution: Fixed
    • 8.0.20-11
    • 8.0.21-12
    • MyRocks
    • None
    • Yes

    Description

      Easy to reproduce, starting with a simple RocksDB table with a few rows:

      CREATE TABLE `test` (
      `col1` bigint unsigned NOT NULL AUTO_INCREMENT,
      `col2` binary(16) NOT NULL,
      `col3` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
      `col4` bigint NOT NULL,
      `col5` mediumblob NOT NULL,
      PRIMARY KEY (`col1`),
      UNIQUE KEY `uc_key` (`col2`,`col3`,`col4`)
      ) ENGINE=ROCKSDB;
      insert into test (col2,col3,col4,col5) values('aa', 'aa', 100, 'aa');
      insert into test (col2,col3,col4,col5) values('aa', 'bb', 100, 'bb');
      insert into test (col2,col3,col4,col5) values('aa', 'cc', 100, 'cc');
      insert into test (col2,col3,col4,col5) values('aa', 'dd', 100, 'dd');
      insert into test (col2,col3,col4,col5) values('aa', 'cc', 99, 'cc');
      

      Then run: 

      SELECT col2, col3, max(col4) AS col4 FROM test t WHERE t.col2 IN (x'61610000000000000000000000000000') AND t.col3 IN ('aa','bb','cc','dd') GROUP BY t.col2, t.col3;
      +------------------------------------+------+------+
      | col2 | col3 | col4 |
      +------------------------------------+------+------+
      | 0x61610000000000000000000000000000 | aa | 100 |
      | 0x61610000000000000000000000000000 | bb | 100 |
      | 0x61610000000000000000000000000000 | cc | 100 |
      | 0x61610000000000000000000000000000 | dd | 100 |
      +------------------------------------+------+------+
      4 rows in set (0.00 sec)

      Rebuild the table: 

      alter table test engine=rocksdb;

      and run the SELECT again (it might take a few tries to obtain a different result set):

      +------------------------------------+------+------+
      | col2 | col3 | col4 |
      +------------------------------------+------+------+
      | 0x61610000000000000000000000000000 | aa | 100 |
      +------------------------------------+------+------+
      1 row in set (0.00 sec)

      The difference from EXPLAIN plans between a "good" plan (returning the expected result set) and a "bad" one:

      --- explain.good 2020-09-16 15:16:39.410862546 -0300
      +++ explain.bad 2020-09-16 15:14:59.152933716 -0300
      @@ -2,7 +2,7 @@
       "query_block": {
       "select_id": 1,
       "cost_info": {
      - "query_cost": "1.35"
      + "query_cost": "2.00"
       },
       "grouping_operation": {
       "using_filesort": false,
      @@ -18,15 +18,15 @@
       "col3"
       ],
       "key_length": "782",
      - "rows_examined_per_scan": 5,
      - "rows_produced_per_join": 5,
      + "rows_examined_per_scan": 7,
      + "rows_produced_per_join": 7,
       "filtered": "100.00",
      - "using_index": true,
      + "using_index_for_group_by": true,
       "cost_info": {
      - "read_cost": "0.85",
      - "eval_cost": "0.50",
      - "prefix_cost": "1.35",
      - "data_read_per_join": "3K"
      + "read_cost": "1.30",
      + "eval_cost": "0.70",
      + "prefix_cost": "2.00",
      + "data_read_per_join": "5K"
       },
       "used_columns": [
       "col1",

      Note that forcing the optimizer to avoid range optimization on the unique key seems to assure the correct dataset to be always retrieved:

      SELECT /*+ NO_RANGE_OPTIMIZATION(t uc_key)*/ col2, col3, max(col4) AS col4 FROM test t WHERE t.col2 IN (x'61610000000000000000000000000000') AND t.col3 IN ('aa','bb','cc','dd') GROUP BY t.col2, t.col3;

      Attachments

        Activity

          People

            marcelo.altmann Marcelo Altmann
            fernando.laudares@percona.com Fernando Laudares
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - Not Specified
                Not Specified
                Logged:
                Time Spent - 2 days, 6 hours, 30 minutes
                2d 6h 30m

                Smart Checklist