Description
Easy to reproduce, starting with a simple RocksDB table with a few rows:
CREATE TABLE `test` ( `col1` bigint unsigned NOT NULL AUTO_INCREMENT, `col2` binary(16) NOT NULL, `col3` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL, `col4` bigint NOT NULL, `col5` mediumblob NOT NULL, PRIMARY KEY (`col1`), UNIQUE KEY `uc_key` (`col2`,`col3`,`col4`) ) ENGINE=ROCKSDB; insert into test (col2,col3,col4,col5) values('aa', 'aa', 100, 'aa'); insert into test (col2,col3,col4,col5) values('aa', 'bb', 100, 'bb'); insert into test (col2,col3,col4,col5) values('aa', 'cc', 100, 'cc'); insert into test (col2,col3,col4,col5) values('aa', 'dd', 100, 'dd'); insert into test (col2,col3,col4,col5) values('aa', 'cc', 99, 'cc');
Then run:
SELECT col2, col3, max(col4) AS col4 FROM test t WHERE t.col2 IN (x'61610000000000000000000000000000') AND t.col3 IN ('aa','bb','cc','dd') GROUP BY t.col2, t.col3;
+------------------------------------+------+------+ | col2 | col3 | col4 | +------------------------------------+------+------+ | 0x61610000000000000000000000000000 | aa | 100 | | 0x61610000000000000000000000000000 | bb | 100 | | 0x61610000000000000000000000000000 | cc | 100 | | 0x61610000000000000000000000000000 | dd | 100 | +------------------------------------+------+------+ 4 rows in set (0.00 sec)
Rebuild the table:
alter table test engine=rocksdb;
and run the SELECT again (it might take a few tries to obtain a different result set):
+------------------------------------+------+------+ | col2 | col3 | col4 | +------------------------------------+------+------+ | 0x61610000000000000000000000000000 | aa | 100 | +------------------------------------+------+------+ 1 row in set (0.00 sec)
The difference from EXPLAIN plans between a "good" plan (returning the expected result set) and a "bad" one:
--- explain.good 2020-09-16 15:16:39.410862546 -0300 +++ explain.bad 2020-09-16 15:14:59.152933716 -0300 @@ -2,7 +2,7 @@ "query_block": { "select_id": 1, "cost_info": { - "query_cost": "1.35" + "query_cost": "2.00" }, "grouping_operation": { "using_filesort": false, @@ -18,15 +18,15 @@ "col3" ], "key_length": "782", - "rows_examined_per_scan": 5, - "rows_produced_per_join": 5, + "rows_examined_per_scan": 7, + "rows_produced_per_join": 7, "filtered": "100.00", - "using_index": true, + "using_index_for_group_by": true, "cost_info": { - "read_cost": "0.85", - "eval_cost": "0.50", - "prefix_cost": "1.35", - "data_read_per_join": "3K" + "read_cost": "1.30", + "eval_cost": "0.70", + "prefix_cost": "2.00", + "data_read_per_join": "5K" }, "used_columns": [ "col1",
Note that forcing the optimizer to avoid range optimization on the unique key seems to assure the correct dataset to be always retrieved:
SELECT /*+ NO_RANGE_OPTIMIZATION(t uc_key)*/ col2, col3, max(col4) AS col4 FROM test t WHERE t.col2 IN (x'61610000000000000000000000000000') AND t.col3 IN ('aa','bb','cc','dd') GROUP BY t.col2, t.col3;