This looks very similar to
PS-6939 and PS-6000 (the customer is on Percona Server 5.7.26 exactly, where PS-6000 should not happen). I can easily reproduce it with a PS 5.7.29 sandbox set up in replication:
So, with the audit plugin enabled, set up this simple RSS watch on both the master and the replica:
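The original watch script is not reproduced here; a minimal sketch of what such a monitor could look like (process lookup via `pidof` and the 5-second interval are assumptions):

```shell
# Print a pid's resident set size (kB) as reported by ps.
get_rss_kb() {
  ps -o rss= -p "$1" | tr -d ' '
}

# Sample a pid's RSS every 5 seconds until the process exits.
watch_rss() {
  while kill -0 "$1" 2>/dev/null; do
    printf '%s %s kB\n' "$(date +%T)" "$(get_rss_kb "$1")"
    sleep 5
  done
}

# On each server, for example:
#   watch_rss "$(pidof mysqld)"
```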
Then simply run the following queries:
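The exact statements are not preserved in this excerpt; a hypothetical workload of the shape described below (bulk INSERTs into freshly created tables) would be something like, with illustrative table and column names:

```sql
-- Hypothetical repro workload; names and sizes are illustrative only.
CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, payload VARCHAR(1024));
INSERT INTO t1 (payload)
SELECT REPEAT('x', 1024) FROM information_schema.columns LIMIT 10000;

-- Re-running the INSERT on the same table may not grow RSS further;
-- an identical table under a new name reproduces the growth.
CREATE TABLE t2 LIKE t1;
INSERT INTO t2 (payload) SELECT payload FROM t1;
```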
You should get something like:
And the RSS monitors in my case showed, on the master:
And in the replica:
We can be certain it is not the buffer pool, as the RSS is well beyond its size:
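A generic way to compare the configured pool against the observed RSS (not the customer's exact check) is:

```sql
-- Configured buffer pool size, in MB:
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS pool_mb;

-- Or inspect the "BUFFER POOL AND MEMORY" section:
SHOW ENGINE INNODB STATUS\G
```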
And we are highly confident that it is the audit plugin, because:
1) The customer got the following on one of the writers:
So it was allocating all that memory (we are still working to identify the statement).
2) We ran perf record -e 'faults' during the time of the event, and it clearly shows that the stack producing the most page faults is the one related to audit_log_notify (see attached).
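Roughly the capture that was taken (pid lookup, sample duration, and the grep filter are assumptions, not the exact invocation):

```shell
# Sample page-fault stacks from mysqld for 60 seconds, then inspect the report.
perf record -e faults -g -p "$(pidof mysqld)" -- sleep 60
perf report --stdio | grep -A5 audit_log
```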
3) UNINSTALL PLUGIN audit_log; immediately resulted in all memory being released.
- Sometimes repeating one of those large INSERTs on the same table does not make the RSS grow any further, but creating an identical table with a different name and INSERTing into it does reproduce the problem.
- Stopping the replication SQL thread clears most of the memory on the replica (consistently).
- FLUSH TABLES appears to clear most of the memory on the master (erratically).
Is there a way to skip this particular plugin hook when it runs within a replication thread? The plugin will discard the event anyway, as it does not log replication events. (The documentation does not state what the behavior should be; it is clearer for ROW, but for STATEMENT...?)
In the master's audit log we see, for example: