Uploaded image for project: 'Percona Server for MySQL'
  1. Percona Server for MySQL
  2. PS-2501

LP #1689268: Fulltext search can not find word which contains punctuation marks

    XMLWordPrintable

    Details

      Description

      **Reported in Launchpad by Sveta Smirnova last update 23-01-2018 13:08:07

      Originally reported at https://bugs.mysql.com/bug.php?id=86164

      Description:
      Fulltext search can not find word which contains punctuation marks like "!?,.{}/" in boolean mode BUT it does in natural language mode.

      There is one case reported for "." and "," but according our tests it includes much more punctuation marks.

      The reported case is https://bugs.mysql.com/bug.php?id=85876

      How to repeat:
      The problem can be reproduced as follows:

      =================================
      CREATE TABLE `test` (
      `seq` int(11) NOT NULL AUTO_INCREMENT,
      `cont` text COLLATE utf8mb4_bin,
      PRIMARY KEY (`seq`),
      FULLTEXT KEY `fx_txts` (`cont`) /*!50100 WITH PARSER `ngram` */
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

      insert into test (cont) values ('aBc!efg');
      insert into test (cont) values ('aBc@efg');
      insert into test (cont) values ('aBc#efg');
      insert into test (cont) values ('aBc$efg');
      insert into test (cont) values ('aBc%efg');
      insert into test (cont) values ('abc^efg');
      insert into test (cont) values ('abc&efg');
      insert into test (cont) values ('abc*efg');
      insert into test (cont) values ('abc(efg');
      insert into test (cont) values ('abc)efg');

      Then try to select whatever value using boolean mode:
      mysql [localhost]

      {msandbox} (test) > select * from test where match(cont) against ('"c!e"' in boolean mode);
      Empty set (0.00 sec)

      mysql [localhost] {msandbox}

      (test) > select * from test where match(cont) against ('"c#e"' in boolean mode);
      Empty set (0.00 sec)

      mysql [localhost]

      {msandbox} (test) > select * from test where match(cont) against ('"c$e"' in boolean mode);
      Empty set (0.00 sec)

      mysql [localhost] {msandbox}

      (test) > select * from test where match(cont) against ('"c!e"' in boolean mode);
      Empty set (0.00 sec)

      If change to natural language mode it works:
      mysql [localhost]

      {msandbox} (test) > select * from test where match(cont) against ('"c!e"' in natural language mode);
      ------------+
      | seq | cont |
      ------------+
      | 1 | aBc!efg |
      | 11 | abc!efg |
      ------------+
      2 rows in set (0.00 sec)

      The index entries are created as can be seen in INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE:

      mysql [localhost] {msandbox}

      (test) > SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE; ----------------------------------------------------+

      WORD FIRST_DOC_ID LAST_DOC_ID DOC_COUNT DOC_ID POSITION

      ----------------------------------------------------+

      !e 2 12 2 2 3
      !e 2 12 2 12 3
      #e 4 4 1 4 3
      $e 5 5 1 5 3
      %e 6 6 1 6 3
      &e 8 8 1 8 3
      (e 10 10 1 10 3
      )e 11 11 1 11 3
      *e 9 9 1 9 3
      @e 3 3 1 3 3
      ^e 7 7 1 7 3

      <...>

      c! 2 12 2 2 2
      c! 2 12 2 12 2
      c# 4 4 1 4 2
      c$ 5 5 1 5 2
      c% 6 6 1 6 2
      c& 8 8 1 8 2
      c( 10 10 1 10 2
      c) 11 11 1 11 2
      c* 9 9 1 9 2
      c@ 3 3 1 3 2
      c^ 7 7 1 7 2

      <...>
      ----------------------------------------------------+
      55 rows in set (0.00 sec)

        Smart Checklist

          Attachments

            Issue Links

              Activity

                People

                Assignee:
                Unassigned
                Reporter:
                lpjirasync lpjirasync (Inactive)
                Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                  Dates

                  Created:
                  Updated:
                  Resolved:

                    Time Tracking

                    Estimated:
                    Original Estimate - Not Specified
                    Not Specified
                    Remaining:
                    Remaining Estimate - 0 minutes
                    0m
                    Logged:
                    Time Spent - 3 minutes
                    3m