Percona Monitoring and Management / PMM-5783

Bulk failure of SHOW ALL SLAVES STATUS scraping on PS/MySQL distributions triggers errors

    Details

    • Type: Bug
    • Status: Done
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 1.17.3, 2.5.0
    • Fix Version/s: 2.9.1
    • Component/s: MySQLd_Exporter
    • Labels:
      None
    • Story Points:
      3
    • Sprint:
      Platform Sprint 21
    • Needs Review:
      Yes
    • Needs QA:
      Yes

      Description

      The Problem:
      PMM clients run queries that are not valid on some distributions.
      The queries listed below are executed against the database and fail. Because they fail about 34560 times a day, standard error monitoring raises alarms.

      List of known queries (failing against MySQL/PS in different versions):

      • SHOW ALL SLAVES STATUS
      • SHOW ALL SLAVES STATUS NONBLOCKING
      • SHOW ALL SLAVES STATUS NOLOCK
      • SELECT @@query_response_time_stats

      Example of audit log:

      <AUDIT_RECORD 
      NAME="Query" 
      RECORD="159_2020-04-16T09:00:01" 
      TIMESTAMP="2020-04-16T09:00:04 UTC" 
      COMMAND_CLASS="error" 
      CONNECTION_ID="1282167" 
      STATUS="1064" 
      SQLTEXT="SHOW ALL SLAVES STATUS NONBLOCKING" 
      USER="mysqladm[mysqladm] @ localhost []" 
      HOST="localhost" 
      OS_USER="" 
      IP="" 
      DB="" 
      /> 
      <AUDIT_RECORD 
      NAME="Query" 
      RECORD="160_2020-04-16T09:00:01" 
      TIMESTAMP="2020-04-16T09:00:04 UTC" 
      COMMAND_CLASS="error" 
      CONNECTION_ID="1282167" 
      STATUS="1064" 
      SQLTEXT="SHOW ALL SLAVES STATUS NOLOCK" 
      USER="mysqladm[mysqladm] @ localhost []" 
      HOST="localhost" 
      OS_USER="" 
      IP="" 
      DB="" 
      /> 

      Desired State:
      PMM executes queries without generating errors in the Audit log.

      Note: based on an initial discussion, if a complete fix turns out to be much harder to implement, we can consider a solution that significantly decreases the number of errors instead.

      ---------------------

      Additional technical details:

      The following logic is used for scraping status of slaves:

      var slaveStatusQueries = [2]string{"SHOW ALL SLAVES STATUS", "SHOW SLAVE STATUS"}
      var slaveStatusQuerySuffixes = [3]string{" NONBLOCKING", " NOLOCK", ""}
      ...
      	for _, query := range slaveStatusQueries {
      		slaveStatusRows, err = db.QueryContext(ctx, query)
      		if err != nil { // MySQL/Percona
      			// Leverage lock-free SHOW SLAVE STATUS by guessing the right suffix
      			for _, suffix := range slaveStatusQuerySuffixes {
      				slaveStatusRows, err = db.QueryContext(ctx, fmt.Sprint(query, suffix))
      				if err == nil {
      					break
      				}
      			}
      		} else { // MariaDB
      			break
      		}
      	}
      
      

      On Percona Server/MySQL, this logic generates errors on every scrape, and those errors are then recorded by any audit plugin. As a result, monitoring systems can raise alerts, since alerting on audit-log errors is industry-standard practice.

                People

                Assignee:
                Unassigned
                Reporter:
                iwo.panowicz Iwo Panowicz
                Votes:
                0
                Watchers:
                8

                    Time Tracking

                    Estimated:
                    Not Specified
                    Remaining:
                    Not Specified
                    Logged:
                    3d 3h 20m