Uploaded image for project: 'Percona Server'
  1. Percona Server
  2. PS-3928

fulltext search on ngram indexes on columns which use collation utf8mb4_bin cannot be performed for certain uppercase letters

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: 5.7, Not 5.5, Not 5.6
    • Fix Version/s: 5.7.21-21
    • Component/s: None
    • Labels:

      Description

      How to repeat:

      set ft_query_extra_word_chars=1;
      
      CREATE TABLE `ngram_simple` ( 
      `i` int(11) NOT NULL AUTO_INCREMENT, 
      `txt` text COLLATE utf8mb4_bin NOT NULL, 
      PRIMARY KEY (`i`), 
      FULLTEXT KEY `fx_txts` (`txt`) /*!50100 WITH PARSER `ngram` */ 
      ) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
      
      insert into ngram_simple (txt) values ('CompP&C01');
      insert into ngram_simple (txt) values ('CompP&C02');
      insert into ngram_simple (txt) values ('CompP&C03');
      insert into ngram_simple (txt) values ('CompP&C04');
      insert into ngram_simple (txt) values ('CompP&C05');
      insert into ngram_simple (txt) values ('CompP&C06');
      insert into ngram_simple (txt) values ('abc*efg');
      insert into ngram_simple (txt) values ('abc&efg');
      insert into ngram_simple (txt) values ('abC&Efg');
      
      mysql> SELECT * FROM ngram_simple WHERE MATCH (txt) against ('C04' IN boolean MODE);
      Empty set (0,00 sec)
      
      mysql> select * from ngram_simple where match(txt) against ('C&E' in boolean mode);
      Empty set (0,00 sec)
      
      mysql> select * from ngram_simple where match(txt) against ('c&e' in boolean mode);
      +----+---------+
      | i  | txt     |
      +----+---------+
      | 17 | abc&efg |
      +----+---------+
      1 row in set (0,00 sec)
      
      mysql> alter table ngram_simple modify `txt` text COLLATE utf8mb4_general_ci NOT NULL;
      Query OK, 9 rows affected (6,30 sec)
      Records: 9  Duplicates: 0  Warnings: 0
      
      mysql> SELECT * FROM ngram_simple WHERE MATCH (txt) against ('C04' IN boolean MODE);
      +----+-----------+
      | i  | txt       |
      +----+-----------+
      | 13 | CompP&C04 |
      +----+-----------+
      1 row in set (0,00 sec)
      
      mysql> select * from ngram_simple where match(txt) against ('C&E' in boolean mode);
      +----+---------+
      | i  | txt     |
      +----+---------+
      | 17 | abc&efg |
      | 18 | abC&Efg |
      +----+---------+
      2 rows in set (0,01 sec)
      
      mysql> select * from ngram_simple where match(txt) against ('c&e' in boolean mode);
      +----+---------+
      | i  | txt     |
      +----+---------+
      | 17 | abc&efg |
      | 18 | abC&Efg |
      +----+---------+
      2 rows in set (0,00 sec)
      

        Smart Checklist

          Attachments

            Activity

              People

              • Assignee:
                zsolt.parragi Zsolt Parragi
                Reporter:
                sveta.smirnova Sveta Smirnova
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 1 day
                  1d