Uploaded image for project: 'Percona Server'
  1. Percona Server
  2. PS-4706

MyRocks Data Being Silently Black-Holed After a Disk Full Incident

    XMLWordPrintable

    Details

      Description

      After a data partition gets full, MyRocks returns error to the client and logs errors to the log, but later once more disk space is added, it continues to accept following writes, which however are missing! No errors later - new rows silently disappear!

      Fully reproducible test case as follows.

      Install latest PS (5.7.22) sandbox and enable RocksDB engine:

       

      $ dbdeployer --sandbox-binary=percona deploy single ps5.7.22
      Database installed in $HOME/sandboxes/msb_ps5_7_22
      run 'dbdeployer usage single' for basic instructions'
      . sandbox server started
      $ percona/ps5.7.22/bin/ps-admin --enable-rocksdb -S /tmp/mysql_sandbox5723.sock --user=root --password=msandbox
      Checking if RocksDB plugin is available for installation ...
      INFO: ha_rocksdb.so library for RocksDB found at /home/przemek/sandboxes/percona/ps5.7.22/lib/mysql/plugin/ha_rocksdb.so.
      Checking RocksDB engine plugin status...
      INFO: RocksDB engine plugin is not installed.
      Installing RocksDB engine...
      INFO: Successfully installed RocksDB engine plugin.
      

      Prepare small 5GB partition and mount it under data directory:

       

      $ msb_ps5_7_22/stop 
      stop /home/przemek/sandboxes/msb_ps5_7_22
      $ mv msb_ps5_7_22/data msb_ps5_7_22/data2
      mkfs.ext4 -m10 /dev/przemek-dbg-vg/data1 
      $ sudo mount /dev/przemek-dbg-vg/data1 msb_ps5_7_22/data
      $ sudo chown przemek.przemek -R msb_ps5_7_22/data
      $ rsync -ax msb_ps5_7_22/data2/ msb_ps5_7_22/data/
      

      Limit the visible partition size using ext4 reserved space:

      # tune2fs -m50 /dev/przemek-dbg-vg/data1
      tune2fs 1.44.2 (14-May-2018)
      Setting reserved blocks percentage to 50% (655360 blocks)
      
      $ df -h /dev/mapper/przemek--dbg--vg-data1
      Filesystem Size Used Avail Use% Mounted on
      /dev/mapper/przemek--dbg--vg-data1 4.9G 142M 2.3G 6% /home/przemek/sandboxes/msb_ps5_7_22/data

      Start instance and create example databases:

      $ msb_ps5_7_22/start
      . sandbox server started
      mysql [localhost] {msandbox} (none) > create database db1;
      Query OK, 1 row affected (0.00 sec)
      
      mysql [localhost] {msandbox} (none) > create database db2;
      Query OK, 1 row affected (0.01 sec)
      

      Fill up some data with sysbench:

       

      $ sysbench /usr/share/sysbench/oltp_insert.lua --mysql_storage_engine=rocksdb --table-size=1000000 --tables=4 --mysql-db=db1 --mysql-user=root --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox5723.sock --threads=4 --time=200 --report-interval=1 --events=0 --db-driver=mysql prepare
      sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
      Initializing worker threads...
      Creating table 'sbtest4'...
      Creating table 'sbtest1'...
      Creating table 'sbtest2'...
      Creating table 'sbtest3'...
      Inserting 1000000 records into 'sbtest4'
      Inserting 1000000 records into 'sbtest3'
      Inserting 1000000 records into 'sbtest2'
      Inserting 1000000 records into 'sbtest1'
      Creating a secondary index on 'sbtest1'...
      Creating a secondary index on 'sbtest3'...
      Creating a secondary index on 'sbtest4'...
      Creating a secondary index on 'sbtest2'...
      

       

      Free disk space reduced:

      $ df -h /dev/mapper/przemek--dbg--vg-data1
      Filesystem Size Used Avail Use% Mounted on
      /dev/mapper/przemek--dbg--vg-data1 4.9G 1.7G 725M 70% /home/przemek/sandboxes/msb_ps5_7_22/data

      Second sysbench, filling second database, fails after a while due to max disk space reached:

       

      przemek@przemek-dbg:~/sandboxes$ sysbench /usr/share/sysbench/oltp_insert.lua --mysql_storage_engine=rocksdb --table-size=1000000 --tables=4 --mysql-db=db2 --mysql-user=root --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox5723.sock --threads=4 --time=200 --report-interval=1 --events=0 --db-driver=mysql prepare
      sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
      Initializing worker threads...
      Creating table 'sbtest3'...
      Creating table 'sbtest2'...
      Creating table 'sbtest4'...
      Creating table 'sbtest1'...
      Inserting 1000000 records into 'sbtest4'
      Inserting 1000000 records into 'sbtest3'
      Inserting 1000000 records into 'sbtest2'
      Inserting 1000000 records into 'sbtest1'
      Creating a secondary index on 'sbtest1'...
      FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest1_k_1_4_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000180.log: No space left on device) for query 'CREATE INDEX k_1 ON sbtest1(k)'
      FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest1_k_1_4_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000180.log: No space left on device
      Creating a secondary index on 'sbtest4'...
      Creating a secondary index on 'sbtest2'...
      Creating a secondary index on 'sbtest3'...
      FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest4_k_4_5_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000180.log: No space left on device) for query 'CREATE INDEX k_4 ON sbtest4(k)'
      FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest4_k_4_5_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000180.log: No space left on device
      FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest2_k_2_6_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000180.log: No space left on device) for query 'CREATE INDEX k_2 ON sbtest2(k)'
      FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest2_k_2_6_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000180.log: No space left on device
      FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest3_k_3_7_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000180.log: No space left on device) for query 'CREATE INDEX k_3 ON sbtest3(k)'
      FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest3_k_3_7_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000180.log: No space left on device
      

       

      MySQL keeps running so far. And disk space was:

      $ df -h /dev/mapper/przemek--dbg--vg-data1
      Filesystem Size Used Avail Use% Mounted on
      /dev/mapper/przemek--dbg--vg-data1 4.9G 2.2G 192M 93% /home/przemek/sandboxes/msb_ps5_7_22/data

      So let's "add" more disk space:

      $ sudo tune2fs -m30 /dev/przemek-dbg-vg/data1
      tune2fs 1.44.2 (14-May-2018)
      Setting reserved blocks percentage to 30% (393216 blocks)
      
      $ df -h /dev/mapper/przemek--dbg--vg-data1
      Filesystem Size Used Avail Use% Mounted on
      /dev/mapper/przemek--dbg--vg-data1 4.9G 2.2G 1.2G 65% /home/przemek/sandboxes/msb_ps5_7_22/data

      Now, let's try writing:

       

      mysql [localhost] {msandbox} (db1) > select count(*) from db2.sbtest4;
      +----------+
      | count(*) |
      +----------+
      | 352128 |
      +----------+
      1 row in set (0.12 sec)
      mysql [localhost] {msandbox} (db1) > insert into db2.sbtest4 (k) values (100),(200),(300);
      Query OK, 3 rows affected (0.00 sec)
      Records: 3 Duplicates: 0 Warnings: 0
      mysql [localhost] {msandbox} (db1) > 
      mysql [localhost] {msandbox} (db1) > select count(*) from db2.sbtest4;
      +----------+
      | count(*) |
      +----------+
      | 352128 |
      +----------+
      1 row in set (0.13 sec)
      mysql [localhost] {msandbox} (db1) > select * from db2.sbtest4 where k=100;
      Empty set (0.14 sec)
      

      No matter what we write, all new writes are accepted but LOST!

      On server restart, mysqld crashes and comes up without RocksDB engine active:

      mysql [localhost] {msandbox} (db1) > show create table sbtest1\G
      ERROR 1286 (42000): Unknown storage engine 'ROCKSDB'

      in error log we can see:

      2018-08-02T15:13:19.420666Z 0 [ERROR] RocksDB: Error opening instance, Status Code: 2, Status: Corruption: checksum mismatch
      2018-08-02T15:13:19.420731Z 0 [ERROR] Plugin 'ROCKSDB' init function returned error.
      2018-08-02T15:13:19.420751Z 0 [ERROR] Plugin 'ROCKSDB' registration as a STORAGE ENGINE failed.

      Full error log in attachment.

       

       

        Smart Checklist

          Attachments

            Issue Links

              Activity

                People

                • Assignee:
                  przemyslaw.malkowski@percona.com Przemyslaw Malkowski
                  Reporter:
                  przemyslaw.malkowski@percona.com Przemyslaw Malkowski
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  3 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Time Tracking

                    Estimated:
                    Original Estimate - 0 minutes
                    0m
                    Remaining:
                    Remaining Estimate - 0 minutes
                    0m
                    Logged:
                    Time Spent - 45 minutes
                    45m