Uploaded image for project: 'Percona Server'
  1. Percona Server
  2. PS-5423

MyRocks - rocksdb_flush_log_at_trx_commit 0 and 2 seem to have the same behavior

    Details

    • Type: Bug
    • Status: On Hold
    • Priority: Medium
    • Resolution: Unresolved
    • Affects Version/s: Not 5.6, 5.7.23-25, 8.0.13-4
    • Fix Version/s: None
    • Component/s: MyRocks
    • Labels:
      None
    • Environment:

      Description

      I did a little testing with rocksdb_flush_log_at_trx_commit in order to get a better idea of how it works, especially considering the value of 0 in InnoDB where you are reliant on innodb_flush_log_at_timeout to help establish when log flushing should occur.

      For test 1 I set the value of  rocksdb_flush_log_at_trx_commit to 1, performed 10 inserts and checked for approximately 10 WAL syncs. No issues here.

       

      mysql> show global variables like 'rocksdb_flush_log_at_trx_commit'; show global variables like 'rocksdb_wal_bytes_per_sync';
      +---------------------------------+-------+
      | Variable_name                   | Value |
      +---------------------------------+-------+
      | rocksdb_flush_log_at_trx_commit | 1     |
      +---------------------------------+-------+
      1 row in set (0.01 sec)+----------------------------+-------+
      | Variable_name              | Value |
      +----------------------------+-------+
      | rocksdb_wal_bytes_per_sync | 0     |
      +----------------------------+-------+
      1 row in set (0.00 sec)mysql> show global status like 'rocksdb_wal%';                                   
      +-------------------------+--------+
      | Variable_name           | Value  |
      +-------------------------+--------+
      | rocksdb_wal_bytes       | 116383 |
      | rocksdb_wal_group_syncs | 0      |
      | rocksdb_wal_synced      | 64     |
      +-------------------------+--------+
      3 rows in set (0.00 sec)mysql> select count(*) from localtest.t1;
      +----------+
      | count(*) |
      +----------+
      |       49 |
      +----------+
      1 row in set (0.00 sec)mysql> show global status like 'rocksdb_wal%';
      +-------------------------+--------+
      | Variable_name           | Value  |
      +-------------------------+--------+
      | rocksdb_wal_bytes       | 116383 |
      | rocksdb_wal_group_syncs | 0      |
      | rocksdb_wal_synced      | 64     |
      +-------------------------+--------+
      3 rows in set (0.00 sec)mysql> select count(*) from localtest.t1;
      +----------+
      | count(*) |
      +----------+
      |       59 |
      +----------+
      1 row in set (0.00 sec)mysql> show global status like 'rocksdb_wal%';
      +-------------------------+--------+
      | Variable_name           | Value  |
      +-------------------------+--------+
      | rocksdb_wal_bytes       | 140003 |
      | rocksdb_wal_group_syncs | 0      |
      | rocksdb_wal_synced      | 75     |
      +-------------------------+--------+
      3 rows in set (0.00 sec)
      

       

      For my second test I set the value to 2 and performed 500 inserts. I also set rocksdb_wal_bytes_per_sync to a very high value to ensure no extra syncs would occur as a result of WAL byte writing. The insert batch took 19 seconds. There were 23 syncs. Accounting for human time I would consider this to be close enough. I also manually checked the rocksdb_wal_synced status variable during this time and during periods of no activity I still saw it increment once per second. This appears to be good as well.

       

      mysql> show global variables like 'rocksdb_flush_log_at_trx_commit'; show global variables like 'rocksdb_wal_bytes_per_sync';
      +---------------------------------+-------+
      | Variable_name                   | Value |
      +---------------------------------+-------+
      | rocksdb_flush_log_at_trx_commit | 2     |
      +---------------------------------+-------+
      1 row in set (0.01 sec)+----------------------------+-----------+
      | Variable_name              | Value     |
      +----------------------------+-----------+
      | rocksdb_wal_bytes_per_sync | 104857600 |
      +----------------------------+-----------+
      1 row in set (0.00 sec)mysql> select count(*) from localtest.t1;
      +----------+
      | count(*) |
      +----------+
      |      559 |
      +----------+
      1 row in set (0.00 sec)mysql> show global status like 'rocksdb_wal%';
      +-------------------------+---------+
      | Variable_name           | Value   |
      +-------------------------+---------+
      | rocksdb_wal_bytes       | 1315723 |
      | rocksdb_wal_group_syncs | 0       |
      | rocksdb_wal_synced      | 438     |
      +-------------------------+---------+
      3 rows in set (0.00 sec)[root@cent2 MySQLDataLoader-master]# time ./DataLoader.py --host=10.0.2.12 -d loc altest -u root -p password --safety_off --rows_per_insert=1 -r 500
      Loading information about localtest schema.  Please wait.
      Creating Data.  Please wait.real    0m19.441s
      user    0m1.353s
      sys     0m0.416s
      mysql> show global status like 'rocksdb_wal%';
      +-------------------------+---------+
      | Variable_name           | Value   |
      +-------------------------+---------+
      | rocksdb_wal_bytes       | 2491333 |
      | rocksdb_wal_group_syncs | 0       |
      | rocksdb_wal_synced      | 461     |
      +-------------------------+---------+
      3 rows in set (0.00 sec)
      

      For my third and final set I set the value of rocksdb_flush_log_at_trx_commit to 0. Per the documentation this indicates "Do not sync on transaction commit". However it doesn't really state when the sync should occur. For InnoDB this is controled by innodb_flush_log_at_timeout, but there doesn't seem to be a similar variable for MyRocks. I figured this might be controlled by rocksdb_wal_bytes_per_sync, but this did not appear to be the case.

      As soon as I configured this variable to 0 I monitored the status of rocksdb_wal_synced and noticed that it was increment once per second, just like value 2. I decided to test further.

      For test 4 I figured that In order to test the syncing behavior of rocksdb_wal_bytes_per_sync I need to write data where rocksdb_wal_bytes_per_sync is greater than the amount of data in 1 row, but less that the amount of data that can be written in 1 second. Per my initial single record write tests, 2461 bytes is written per record. Per a batch test 500 records (individually inserted) takes 20 seconds and consumed 1183053 WAL bytes, meaning it consumes about 60000 bytes per second.

       

      With this taken into consideration, assuming that rocksdb_wal_bytes_per_sync acts as the WAL in-memory buffer size, If I insert 500 records with a value of rocksdb_wal_bytes_per_sync = 30000 I should see 40 syncs for cache fill and 20 syncs for the timed flush, assuming the timed flush still happens

      This failed. Regardless of how I configured rocksdb_wal_bytes_per_sync, rocksdb_wal_synced always incremented once per second when the rocksdb_flush_log_at_trx_commit variable was set to 0

      In conclusion it looks like value 0 and 2 for rocksdb_flush_log_at_trx_commit do the same thing. It's possible that rocksdb_wal_bytes_per_sync is not part of the equation for this as this was just an educated guess on my part. I would suggest including a variable similar to innodb_flush_log_at_timeout, or at least update the documentation to better show under what circumstances the WAL will be synced with rocksdb_flush_log_at_trx_commit set to 0.

        Smart Checklist

          Attachments

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                Sylvester Peter Sylvester
              • Votes:
                1 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: