Description
Problem
When the PXC operator crashes, we should have as much information as possible to analyze the crash. Two sources of information are currently problematic:
- Coredumps - They are disabled by default at the kernel level. To enable them we need to adjust kernel variables, which requires users to run in privileged mode. This won't be an option on multiple vendors.
- Unresolved stacktraces - Resolving them in place would mean shipping the binary with symbols. This will increase the image size, and operators are really sensitive to space constraints.
Proposed Solution
Unresolved stacktraces
We decided to print extra information under the stack trace so that it can be analyzed offline. Since operators may rotate the log, that information should be part of the stack trace itself rather than printed elsewhere in the log. The new information is:
- Print binary buildID.
- Print MySQL version.
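As a rough illustration of the two items above, the sketch below extracts a GNU build ID from a buffer of ELF notes and formats a footer to print under the stack trace. The function names (find_build_id, crash_footer) and the footer labels are hypothetical and not taken from the actual patch. It also assumes the note data has already been located (for example from the binary's PT_NOTE segment), which is not shown. Since std::string is not async-signal-safe, a real crash handler would likely compute the footer once at startup and only write the prepared bytes from the handler.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// ELF note header: three 32-bit words, followed by the name and the
// descriptor, each padded to a 4-byte boundary.
struct NoteHeader {
  std::uint32_t namesz;
  std::uint32_t descsz;
  std::uint32_t type;
};

constexpr std::uint32_t kNtGnuBuildId = 3;  // NT_GNU_BUILD_ID

static std::size_t align4(std::size_t n) {
  return (n + 3) & ~static_cast<std::size_t>(3);
}

// Hypothetical helper: scans a buffer of ELF notes and returns the GNU
// build ID as lowercase hex, or an empty string if none is found.
std::string find_build_id(const unsigned char *notes, std::size_t size) {
  std::size_t off = 0;
  while (off + sizeof(NoteHeader) <= size) {
    NoteHeader h;
    std::memcpy(&h, notes + off, sizeof(h));
    const std::size_t name_off = off + sizeof(h);
    const std::size_t desc_off = name_off + align4(h.namesz);
    // Stop on a truncated or malformed note.
    if (desc_off > size || h.descsz > size - desc_off) break;
    if (h.type == kNtGnuBuildId && h.namesz == 4 &&
        std::memcmp(notes + name_off, "GNU", 4) == 0) {
      static const char hex[] = "0123456789abcdef";
      std::string out;
      for (std::uint32_t i = 0; i < h.descsz; ++i) {
        const unsigned char b = notes[desc_off + i];
        out.push_back(hex[b >> 4]);
        out.push_back(hex[b & 0x0f]);
      }
      return out;
    }
    off = desc_off + align4(h.descsz);
  }
  return std::string();
}

// Hypothetical helper: builds the text appended under the stack trace.
// The labels are illustrative only.
std::string crash_footer(const std::string &build_id,
                         const std::string &server_version) {
  return "Build ID: " + (build_id.empty() ? "Not Available" : build_id) +
         "\nServer Version: " + server_version + "\n";
}
```

With the build ID in the log next to the stack trace, the matching debug symbols can be looked up offline even if earlier log lines have been rotated away.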
Coredumps
- We are using https://code.google.com/archive/p/google-coredumper/ . The original source code didn't compile. zsolt.parragi fixed the issues and created a fork containing the fixes, https://github.com/dutow/coredumper , which has now officially moved to https://github.com/Percona-Lab/coredumper
- The idea is to ship it as an experimental feature, pulled in as a submodule hosted at Percona-Lab.
- We will add compiler flags to control when this is enabled or disabled.
- We will link it statically in order to simplify packaging.
- The default behavior when core-file is enabled will be to rely on the kernel core dump.
- A new variable will be introduced to control whether the library coredump is used. This variable will behave like log-bin. If it is specified without a value, a core.timestamp file will be saved under the datadir. If a path is specified, the coredump will be saved under the specified path.
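The log-bin-like behavior of the new variable can be sketched as a small path-resolution function. This is only an illustration under stated assumptions: the names resolve_core_path and join_path are hypothetical, the timestamp is assumed to be seconds since the epoch (the description above does not fix a format), and an empty return value is used here to mean "library coredump not requested, rely on the kernel core dump".

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: joins a directory and a file name with one '/'.
std::string join_path(const std::string &dir, const std::string &file) {
  if (dir.empty() || dir.back() == '/') return dir + file;
  return dir + "/" + file;
}

// Hypothetical helper: decides where the library coredump would be written.
//   option_given == false       -> "" (fall back to the kernel core dump)
//   option given without value  -> <datadir>/core.<timestamp>
//   option given with a path    -> <path>/core.<timestamp>
std::string resolve_core_path(bool option_given,
                              const std::string &option_value,
                              const std::string &datadir,
                              std::time_t now) {
  if (!option_given) return std::string();
  // Assumption: the "timestamp" suffix is seconds since the epoch.
  const std::string name =
      "core." + std::to_string(static_cast<long long>(now));
  const std::string &dir = option_value.empty() ? datadir : option_value;
  return join_path(dir, name);
}
```

For example, calling resolve_core_path(true, "", "/var/lib/mysql", std::time(nullptr)) yields a file under /var/lib/mysql, while passing a non-empty option_value redirects the dump to that directory instead.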