Percona Server / PS-3998

Investigate how to take in Facebook MySQL rpl writebatch

    Details

    • Type: Admin & Maintenance Task
    • Status: Done
    • Priority: Low
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.7.21-21
    • Component/s: MyRocks
    • Labels:

      Description

      Upstream Facebook has a feature implemented here: https://github.com/facebook/mysql-5.6/commit/cd1496323cfe587b25ddc7e26abd2d60bb2737de

      This feature allows MyRocks to use special write batches through RocksDB when running as a replication slave. It relies on a number of assumptions to do so.

      The feature breaks a slave if the master binlog format is not ROW. That breakage is fixed here:

      https://github.com/facebook/mysql-5.6/commit/1834a84b2f6e64453a5b46848c05eddd1338aa2d

      The problem is that this fix is invasive to the server code: it tries to detect, inside the server, whether the slave thread is receiving non-ROW events while this optimization is enabled. The optimization is very specific to MyRocks/RocksDB, yet the fix moves a MyRocks option up into server space.

      We need to determine if we want to:
      1 - Accept the full upstream feature, both server and MyRocks portions
      2 - Try to keep the original feature but implement it wholly within MyRocks
        • How would MyRocks be able to detect that non-ROW events are coming in and stop the slave thread? Maybe instead we can introduce the !ROW detection logic in the second patch and expose a new handler virtual method like "virtual bool rpl_can_handle_event(rowtype)". I don't know whether this is easy or even possible at that layer of the code.
      3 - Keep the original faulty feature in MyRocks and document the issues
      4 - Remove the feature entirely
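
      To make the handler-method idea under option 2 concrete, here is a rough, standalone sketch. Everything in it is hypothetical: `Handler`, `RocksLikeHandler`, `RplEventType` and `apply_event` are simplified stand-ins invented for illustration, not the real server `handler` class, the real MyRocks handler, or the upstream patch. It only shows the shape of the idea: the server asks the engine whether it can take an event, and the engine-specific option stays inside the engine.

```cpp
#include <string>

// Hypothetical classification of what the slave applier is about to apply.
// The ticket only says "rowtype"; this enum is an assumption.
enum class RplEventType { kRowWrite, kRowUpdate, kRowDelete, kStatementQuery };

// Hypothetical, heavily reduced stand-in for the server-side handler interface.
class Handler {
 public:
  virtual ~Handler() = default;

  // Proposed hook: by default an engine accepts every event type, so
  // engines that do not care need no changes.
  virtual bool rpl_can_handle_event(RplEventType /*type*/) const { return true; }
};

// Hypothetical engine handler with a write-batch style replication optimization.
class RocksLikeHandler : public Handler {
 public:
  explicit RocksLikeHandler(bool write_batch_opt_enabled)
      : write_batch_opt_enabled_(write_batch_opt_enabled) {}

  // With the optimization on, only ROW events are safe to apply.
  bool rpl_can_handle_event(RplEventType type) const override {
    if (!write_batch_opt_enabled_) return true;
    return type != RplEventType::kStatementQuery;
  }

 private:
  bool write_batch_opt_enabled_;  // engine option stays in engine space
};

// Stand-in for the slave applier: returns false (and fills *error) when the
// slave thread should stop instead of applying the event.
bool apply_event(const Handler& h, RplEventType type, std::string* error) {
  if (!h.rpl_can_handle_event(type)) {
    *error = "engine cannot apply non-ROW event while its replication "
             "optimization is enabled";
    return false;
  }
  return true;  // the real applier would go on to execute the event here
}
```

      In this sketch the server side only gains one generic call, and a handler constructed with the optimization enabled rejects `RplEventType::kStatementQuery` while still accepting row events. Whether the real replication code has a suitable place to make such a call is exactly the open question above.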

      Unless there are clear answers to option 2 that can be implemented in a relatively short time and ported cleanly to 5.7, I am inclined to go for option 4 for now and put it to Facebook to fix the feature properly with minimal involvement of the server code.

      Laurynas Biveinis, may I borrow your knowledge of the replication code for a few minutes to form an opinion on the cleanest route to proceed? The promise of a potential 50% performance gain is too much to dismiss, but the threat of broken replication due to an unwary user is a showstopper IMHO.

              People

              • Assignee: George Lorch (george.lorch)
              • Reporter: George Lorch (george.lorch)
              • Votes: 0
              • Watchers: 3

              Time Tracking

              • Original Estimate: Not Specified
              • Remaining Estimate: 0m
              • Time Spent: 1d 1h 15m