[PT-1571] pt-secure-collect uses incorrect regex for hostname obfuscation Created: 25/Jun/18 Updated: 09/Jul/18 Resolved: 02/Jul/18
|Reporter:||Agustín Gallego||Assignee:||Carlos Salguero|
|Remaining Estimate:||0 minutes|
|Time Spent:||1 hour|
|Original Estimate:||Not Specified|
When using pt-secure-collect, some outputs become unreadable because the regex used for hostname detection is incorrect.
This is what we get in the output for 2018_06_25_19_49_32-top. Note that the "load average" section contains two "hostname" substitutions, and the RES and Time columns contain one, for instance. The expected output should be:
This happens because floating point numbers (in this case) are being treated as hostnames. Note that only the floating point numbers that are followed by another character, in this case, are substituted (see regex below).
The regex used is:
which in turn is used by:
and the replace function:
I haven't had time to check the regex further, but if I'm not mistaken, the following would match 0.01, for instance:
which is a subset of the regex mentioned above. We can use the following URL to test whether this is the case:
If we use that last regex with the following string:
you will see that the match is exactly the text that appears above replaced with "hostname".
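The failure mode described above can be sketched in Go. This is a minimal illustration only: both patterns below are hypothetical and are not the regex used by pt-secure-collect, and the `obfuscate` helper and the sample line are invented for the example. The loose pattern accepts any dot-separated run of alphanumerics, so it also swallows floating point numbers; the stricter one assumes that a real hostname ends in an alphabetic label of at least two letters.

```go
package main

import (
	"fmt"
	"regexp"
)

// loosePattern is a hypothetical, deliberately permissive hostname pattern.
// It is NOT the pattern from pt-secure-collect; it only reproduces the same
// class of mistake: labels may be purely numeric, so "0.01" qualifies.
var loosePattern = regexp.MustCompile(`[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+`)

// stricterPattern is one possible tightening (an assumption, not the fix
// that was shipped): the final label must be alphabetic, two letters or more.
var stricterPattern = regexp.MustCompile(`[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}`)

// obfuscate replaces every match of re in line with the literal "hostname".
func obfuscate(re *regexp.Regexp, line string) string {
	return re.ReplaceAllString(line, "hostname")
}

func main() {
	// Sample line invented for this sketch, loosely modelled on top's header.
	line := "load average: 0.01, 0.05, 0.10 db1.example.com"

	fmt.Println(obfuscate(loosePattern, line))
	fmt.Println(obfuscate(stricterPattern, line))
}
```

With the loose pattern every load average value is replaced; with the stricter one only `db1.example.com` is. The stricter pattern would still miss bare, single-label hostnames, so it is a trade-off rather than a complete answer.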
How to reproduce:
Run the tool with the following command:
and use whatever password you want for encrypting. Then decrypt and decompress the outputs, and check the generated files. In this case, the file for pt-stalk's `top` output was used: 2018_06_25_19_49_32-top. However, the problem appears in many other files, so an exhaustive check should be done (for example, grep -R 'hostname' * to check all generated files).
I'm setting this as "high" priority, since it renders many outputs meaningless, which prevents us from correctly assessing server performance and could mean we miss the window in which to capture data.