Percona Operator for MySQL based on Percona XtraDB Cluster / K8SPXC-1036

Liveness check fails when XtraBackup is running and wsrep_sync_wait is set

Details

    • Bug
    • Status: Pending Release
    • Medium
    • Resolution: Done
    • 1.10.0
    • 1.12.0
    • None
    • None
    • Yes

    Description

      Backups run on non-writer nodes. When XtraBackup runs, it takes a LOCK TABLES FOR BACKUP on the instance. If the writer instance performs a DDL statement, such as creating a table, that DDL is replicated to the node where the backup is running and waits there behind the backup lock until the lock is released. If wsrep_sync_wait is not 0, subsequent SELECTs on that node are held until the node has caught up with the write-set queue, that is, until it is in sync.

      The MySQL client automatically performs two SELECTs when it connects to an instance:

      select @@version_comment limit 1
      select USER()

      When it connects using the --execute flag, it only performs select @@version_comment. I didn't find a MySQL client option to skip this remaining query.

      The default timeout for causal reads (repl.causal_read_timeout) is 30 seconds, while the timeout of the MySQL liveness check script is 10 seconds. The liveness check therefore times out first and fails without ever being able to connect to the database. This eventually causes the node to be rejected from the cluster.
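
      The two settings involved can be inspected on the affected node. This is a minimal sketch, assuming a session that is allowed to read global variables. SHOW statements are used here because the reproduction below shows show processlist still responding on the blocked node.

      ```sql
      -- Current causality-check bitmask (0 disables the check)
      SHOW GLOBAL VARIABLES LIKE 'wsrep_sync_wait';

      -- Galera provider options, returned as one long string;
      -- look for the repl.causal_read_timeout entry in the value
      SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
      ```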

      How to Reproduce:

      On node 1:

      mysql> SET GLOBAL wsrep_sync_wait = 7 ;
      mysql> LOCK TABLES FOR BACKUP ;

      – Keep the session open

      On node 2:

      mysql> CREATE TABLE test.t1 (c1 integer primary key) ;

      – The statement succeeds on node 2, but on node 1 the connections opened by the monitor user's script start to pile up on the select @@version_comment query, similar to the output below:

      mysql> show processlist ;
      +------+-----------------+------------------+------+---------+------+---------------------------------------+------------------------------------------+---------+-----------+---------------+
      | Id   | User            | Host             | db   | Command | Time | State                                 | Info                                     | Time_ms | Rows_sent | Rows_examined |
      +------+-----------------+------------------+------+---------+------+---------------------------------------+------------------------------------------+---------+-----------+---------------+
      |    1 | system user     |                  | test | Query   |  196 | Waiting for table backup lock         | create table t1 (c1 integer primary key) |  196396 |         0 |             0 |
      |    2 | system user     |                  | NULL | Sleep   | 2846 | wsrep aborter idle                    | NULL                                     | 2845832 |         0 |             0 |
      |    8 | event_scheduler | localhost        | NULL | Daemon  | 2823 | Waiting on empty queue                | NULL                                     | 2823381 |         0 |             0 |
      |   10 | system user     |                  | NULL | Query   |  208 | wsrep: committed TOI write set (8482) | NULL                                     |  207779 |         0 |             0 |
      | 1990 | root            | localhost        | NULL | Sleep   | 1251 |                                       | NULL                                     | 1251017 |         0 |             1 |
      | 3707 | root            | localhost        | NULL | Query   |    0 | init                                  | show processlist                         |       0 |         0 |             0 |
      | 3899 | monitor         | 10.96.1.12:37350 | NULL | Query   |   29 | starting                              | select @@version_comment limit 1         |   28964 |         0 |             0 |
      | 3900 | monitor         | 10.96.1.12:37352 | NULL | Query   |   29 | starting                              | select @@version_comment limit 1         |   28961 |         0 |             0 |
      | 3901 | monitor         | 10.96.1.12:37354 | NULL | Query   |   29 | starting                              | select @@version_comment limit 1         |   28938 |         0 |             0 |
      | 3902 | monitor         | 10.96.1.12:37356 | NULL | Query   |   29 | starting                              | select @@version_comment limit 1         |   28938 |         0 |             0 |
      ... 
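
      To watch the pile-up without reading the full process list, the stuck probes can be counted. This is a sketch, assuming the monitor user name seen in the output above and that information_schema.processlist is available on the server version in use. The session-level SET is there so that this SELECT is not itself held by the causality check.

      ```sql
      -- Disable the causality check for this session only,
      -- otherwise the SELECT below would wait like the probes do
      SET SESSION wsrep_sync_wait = 0;

      -- Count liveness-check connections stuck on the client's automatic query
      SELECT COUNT(*) AS stuck_probes
      FROM information_schema.processlist
      WHERE user = 'monitor'
        AND info LIKE 'select @@version_comment%';
      ```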

      Possible workarounds are setting wsrep_sync_wait to 0 while the backup is running, or migrating the script to another MySQL connector. For example, the Go connector does not perform these queries when it connects to MySQL.
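
      The first workaround could look like the following sketch. It assumes that 7, the value from the reproduction steps, is the one to restore, and that something outside the scope of this report (for example the backup job) runs the statements on the backup node before and after XtraBackup. SET GLOBAL only affects sessions opened afterwards, which should be enough here because the monitor script opens a new connection for every check.

      ```sql
      -- Before the backup starts: disable causality checks for new sessions
      SET GLOBAL wsrep_sync_wait = 0;

      -- ... XtraBackup runs here ...

      -- After the backup finishes: restore the previous value
      -- (7 is an assumption taken from the reproduction steps)
      SET GLOBAL wsrep_sync_wait = 7;
      ```

      While the value is 0, application reads on that node lose their causality guarantee as well, so the window should be kept as short as the backup allows.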

          People

            slava.sarzhan Slava Sarzhan
            juan.arruti Juan Arruti