Uploaded image for project: 'Percona Server for MySQL'
  1. Percona Server for MySQL
  2. PS-5789

Concurrent access attempts to non-shareable parallel doublewrite buffer shards

    XMLWordPrintable

    Details

      Description

      In 8.0, some code paths may result in concurrent threads accessing the same parallel doublewrite shard. Since the shards have no locking due to no-sharing requirement, this results in hard-to-track flushing failures, but luckily not during regular server operation.

      The main culprit is InnoDB initialization before the cleaner thread start, e.g. trx_rseg_adjust_rollback_segments called from srv_start. This may result in preflushing attempt by this thread, e.g.

      Thread 2 (Thread 0x7f23be7a3700 (LWP 47964)):
      #0  0x00007f23c35909f3 in futex_wait_cancelable (private=<optimized out>, expected=0, 
          futex_word=0x7f23b0019724) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
      #1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f23b00196d0, cond=0x7f23b00196f8)
          at pthread_cond_wait.c:502
      #2  __pthread_cond_wait (cond=cond@entry=0x7f23b00196f8, mutex=0x7f23b00196d0) at pthread_cond_wait.c:655
      #3  0x000056304dd208f1 in os_event::wait (this=this@entry=0x7f23b00196c8)
          at ../../../percona-server/storage/innobase/include/os0event.h:172
      #4  0x000056304dd20a6a in os_event::wait_low (this=0x7f23b00196c8, reset_sig_count=4, reset_sig_count@entry=0)
          at ../../../percona-server/storage/innobase/os/os0event.cc:135
      #5  0x000056304dd20af5 in os_event_wait_low (event=<optimized out>, reset_sig_count=reset_sig_count@entry=0)
          at ../../../percona-server/storage/innobase/os/os0event.cc:320
      #6  0x000056304de4d18e in buf_flush_wait_batch_end (buf_pool=buf_pool@entry=0x0, 
          type=type@entry=BUF_FLUSH_LIST) at ../../../percona-server/storage/innobase/buf/buf0flu.cc:2028
      #7  0x000056304de528f9 in buf_flush_sync_all_buf_pools ()
          at ../../../percona-server/storage/innobase/buf/buf0flu.cc:3305
      #8  0x000056304dce67ea in log_preflush_pool_modified_pages (log=..., new_oldest=new_oldest@entry=165991)
          at ../../../percona-server/storage/innobase/log/log0chkp.cc:717
      #9  0x000056304dce6853 in log_request_checkpoint (log=..., sync=sync@entry=true, lsn=165991)
          at ../../../percona-server/storage/innobase/log/log0chkp.cc:644
      #10 0x000056304dce2d81 in <lambda(bool)>::operator()(bool) const (__closure=__closure@entry=0x7f23be79b4c0)
          at ../../../percona-server/storage/innobase/log/log0buf.cc:647
      #11 0x000056304dce2eaf in ut_wait_for<log_free_check_wait(log_t&, sn_t)::<lambda(bool)> >(<lambda(bool)>, uint64_t, uint64_t) (condition=..., sleep=<optimized out>, spins_limit=0)
          at ../../../percona-server/storage/innobase/include/ut0ut.ic:141
      #12 0x000056304dce2f0f in log_free_check_wait (log=..., sn=<optimized out>)
          at ../../../percona-server/storage/innobase/log/log0buf.cc:653
      #13 0x000056304dcbfdc7 in log_free_check () at ../../../percona-server/storage/innobase/include/log0log.ic:228
      #14 0x000056304ddd9d26 in trx_rseg_create(unsigned int, unsigned long) ()
          at ../../../percona-server/storage/innobase/trx/trx0rseg.cc:373
      #15 0x000056304dddb0ce in trx_rseg_add_rollback_segments(unsigned int, unsigned long, Rsegs*, unsigned long*)
          () at ../../../percona-server/storage/innobase/trx/trx0rseg.cc:518
      #16 0x000056304dddb884 in trx_rseg_adjust_rollback_segments (target_rollback_segments=128)
          at ../../../percona-server/storage/innobase/trx/trx0rseg.cc:656
      #17 0x000056304ddb46b9 in srv_start(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () at ../../../percona-server/storage/innobase/srv/srv0start.cc:2883
      #18 0x000056304dc85853 in innobase_init_files(dict_init_mode_t, List<Plugin_tablespace const>*) ()
          at ../../../percona-server/storage/innobase/handler/ha_innodb.cc:5554
      #19 0x000056304dc85f82 in innobase_ddse_dict_init(dict_init_mode_t, unsigned int, List<dd::Object_table const>*, List<Plugin_tablespace const>*) () at ../../../percona-server/storage/innobase/handler/ha_innodb.cc:12783
      #20 0x000056304d9ff02c in dd::bootstrap::DDSE_dict_init(THD*, dict_init_mode_t, unsigned int) ()
          at ../../percona-server/sql/dd/impl/bootstrap/bootstrapper.cc:746
      #21 0x000056304da010fe in dd::bootstrap::initialize(THD*) ()
          at ../../percona-server/sql/dd/impl/bootstrap/bootstrapper.cc:903
      #22 0x000056304d0f5a45 in bootstrap::handle_bootstrap (arg=arg@entry=0x563051161560)
          at ../../percona-server/sql/bootstrap.cc:347
      #23 0x000056304dbbaf1f in pfs_spawn_thread (arg=0x5630510f1240)
          at ../../../percona-server/storage/perfschema/pfs.cc:2836
      #24 0x00007f23c358a6db in start_thread (arg=0x7f23be7a3700) at pthread_create.c:463
      #25 0x00007f23c0af988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      while the low log space situation would be at the same time noticed by a log checkpointer thread, and it would try do the same, i.e.

      Thread 15 (Thread 0x7f23aa7f6700 (LWP 48240)):
      #0  0x00007f23c35909f3 in futex_wait_cancelable (private=<optimized out>, expected=0, 
          futex_word=0x7f23b0019724) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
      #1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f23b00196d0, cond=0x7f23b00196f8)
          at pthread_cond_wait.c:502
      #2  __pthread_cond_wait (cond=cond@entry=0x7f23b00196f8, mutex=0x7f23b00196d0) at pthread_cond_wait.c:655
      #3  0x000056304dd208f1 in os_event::wait (this=this@entry=0x7f23b00196c8)
          at ../../../percona-server/storage/innobase/include/os0event.h:172
      #4  0x000056304dd20a6a in os_event::wait_low (this=0x7f23b00196c8, reset_sig_count=4, reset_sig_count@entry=0)
          at ../../../percona-server/storage/innobase/os/os0event.cc:135
      #5  0x000056304dd20af5 in os_event_wait_low (event=<optimized out>, reset_sig_count=reset_sig_count@entry=0)
          at ../../../percona-server/storage/innobase/os/os0event.cc:320
      #6  0x000056304de4d18e in buf_flush_wait_batch_end (buf_pool=buf_pool@entry=0x0, 
          type=type@entry=BUF_FLUSH_LIST) at ../../../percona-server/storage/innobase/buf/buf0flu.cc:2028
      #7  0x000056304de528f9 in buf_flush_sync_all_buf_pools ()
          at ../../../percona-server/storage/innobase/buf/buf0flu.cc:3305
      #8  0x000056304dce67ea in log_preflush_pool_modified_pages (log=..., new_oldest=new_oldest@entry=177545)
          at ../../../percona-server/storage/innobase/log/log0chkp.cc:717
      #9  0x000056304dce6d72 in log_consider_sync_flush (log=...)
          at ../../../percona-server/storage/innobase/log/log0chkp.cc:764
      #10 0x000056304dce7972 in log_checkpointer(log_t*)::{lambda()#1}::operator()() const [clone .isra.34] ()
          at ../../../percona-server/storage/innobase/log/log0chkp.cc:912
      #11 0x000056304dce7ab4 in log_checkpointer (log_ptr=0x7f23b86a8fc0)
          at ../../../percona-server/storage/innobase/log/log0chkp.cc:928
      #12 0x000056304dcf1427 in Runnable::operator()<void (*)(log_t*), log_t*> (this=0x7f23b87792a8, 
          f=<optimized out>, args#0=<optimized out>)
          at ../../../percona-server/storage/innobase/include/os0thread-create.h:86
      #13 0x00007f23c143c66f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #14 0x00007f23c358a6db in start_thread (arg=0x7f23aa7f6700) at pthread_create.c:463
      #15 0x00007f23c0af988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Other callers - such as buf_flush_list_now_set in debug builds - are suspect too.

      This is visible on MTR as an intermittent hang on innodb_undo.i_s_files_32k, innodb.create_tablespace_64k, innodb.create_tablespace_32k as analyzed by Slava Sarzhan and Yura Sorokin

      This appears to be a 8.0 regression caused by interplay of upstream redo logging rewrite and our parallel doublewrite, but check 5.7 too.

        Attachments

          Activity

            People

            Assignee:
            Unassigned
            Reporter:
            laurynas.biveinis Laurynas Biveinis (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - Not Specified
                Not Specified
                Logged:
                Time Spent - 4 days, 29 minutes
                4d 29m

                  Smart Checklist