**Reported in Launchpad by Sveta Smirnova, last update 15-12-2017 14:14:19**
Originally reported at https://bugs.mysql.com/bug.php?id=85158
With master_info_repository=TABLE and sync_master_info=0, the table mysql.slave_master_info should only be updated on log file rotation. That is the documented and intended design, and it can significantly reduce load on slaves, allowing much greater replication throughput. A similar issue presumably exists with the FILE repository, except there it manifests as fsyncs of the file rather than writes and commits to the table; TABLE is used for context here.
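For reference, a minimal config fragment with the settings described above (these are the standard server options):

```ini
# my.cnf on the slave: store master info in mysql.slave_master_info,
# and disable periodic syncs (sync only on log rotation, etc.)
[mysqld]
master_info_repository = TABLE
sync_master_info       = 0
```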
However, there is a case where two code paths interact badly. The code handling [already-executed] gtid event skips uses a dummy heartbeat (treated as a fake rotate event) to signal the end of a skip period. Sounds great, until you get to the slave handling code, which takes that event and forces a master info sync on every heartbeat - even these dummy ones.
This can, even in a somewhat simple setup, lead to a ~5X increase in write load on a slave. That is a huge amount of wasted resources, and very limiting to replication throughput.
The fake heartbeat handling on the slave IO thread (in rpl_slave.cc) in turn calls flush_master_info(mi, TRUE). Note the TRUE here: it is a force flush, applied regardless of sync settings. Force is only intended to be used for things like repo init, STOP SLAVE, and real log rotates.
How to repeat:
Generate a lot of non-sequential skips to demonstrate. A very basic multi-source replication (MSR) setup will do that under load:
gtid on, of course
- Server A, writable master. msr replication from B and C.
- Servers B and C, backup masters. msr replication from A,C and A,B respectively.
- Server D, msr slave of all three, no log_bin (keep it fast).
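The channels on D could be set up along these lines (hostnames, ports, and channel names below are placeholders, and MASTER_AUTO_POSITION=1 assumes the GTID setup above):

```sql
-- On server D, one channel per source:
CHANGE MASTER TO MASTER_HOST='server_a', MASTER_AUTO_POSITION=1
  FOR CHANNEL 'chan_a';
CHANGE MASTER TO MASTER_HOST='server_b', MASTER_AUTO_POSITION=1
  FOR CHANNEL 'chan_b';
CHANGE MASTER TO MASTER_HOST='server_c', MASTER_AUTO_POSITION=1
  FOR CHANNEL 'chan_c';

START SLAVE FOR CHANNEL 'chan_a';
START SLAVE FOR CHANNEL 'chan_b';
START SLAVE FOR CHANNEL 'chan_c';
```

B and C get the analogous two-channel setup described in the list above.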
Now, apply a significant write load to A (sysbench works fine). I found it convenient to let B and C lag organically under the load on A, but you can also stop their SQL threads for a period to simulate lag. If you do that, restart them both after a while.
I recommend stopping the writes to A at this point, so that D catches up and is not executing any row changes - this helps isolate the load caused by these heartbeats.
Now B and C will be chugging through a lot of duplicate (already-executed) transactions and sending them down to D as heartbeats. In a pretty short test, I saw ~60 million heartbeats received on D (from p_s.replication_connection_status) - ~30M from each of B and C, and only a few from A. Checking the same values on B and C, there were still just a few on each channel.
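The per-channel heartbeat counts mentioned above can be read with:

```sql
-- On D: heartbeats received per replication channel
SELECT CHANNEL_NAME, COUNT_RECEIVED_HEARTBEATS
  FROM performance_schema.replication_connection_status;
```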
Now there's a lot of IO happening, considering D isn't actually executing any changes. Checking p_s.table_io_waits_summary_by_table, the vast majority of load is on slave_master_info. The wait count was 4X as much as for the actual data table being written to. The total iowait time was 6X as much.
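A query along these lines shows where the IO is going; in the test above, mysql.slave_master_info dominated:

```sql
-- Top tables by IO wait; expect mysql.slave_master_info near the top
SELECT OBJECT_SCHEMA, OBJECT_NAME, COUNT_WRITE, SUM_TIMER_WAIT
  FROM performance_schema.table_io_waits_summary_by_table
 ORDER BY SUM_TIMER_WAIT DESC
 LIMIT 5;
```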
There are two ways to go about this, depending on the intent of this process.
First, is it really necessary to send so many heartbeats just for gtid skips? I don't see them adding value, even if you do sync master info on every transaction. They're skips; they're specifically intended to do nothing. Eliminating them (on the rpl_master.cc side) would be the best solution. But that code was written for a purpose and someone must want them, so a setting to control the behavior would be good - or they may be important for a reason I don't see in the code!
Second, if it's actually important, and even turning it off via a setting is impractical, take away the force sync: change that "true" in rpl_slave.cc to "false" so this event obeys the same rules other events do (the documented rules!).
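Assuming the call site looks as described above, the second option amounts to a one-word patch (the file path and surrounding context here are a paraphrase, not verbatim source):

```diff
--- a/sql/rpl_slave.cc
+++ b/sql/rpl_slave.cc
-  if (flush_master_info(mi, TRUE))
+  if (flush_master_info(mi, FALSE))
```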
A test case for MTR will be attached.