Percona Monitoring and Management / PMM-5783

Bulk failure of SHOW ALL SLAVES STATUS scraping on PS/MySQL distributions triggers errors

    Details

    • Type: Bug
    • Status: Done
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 1.17.3, 2.5.0
    • Fix Version/s: 2.9.1
    • Component/s: MySQLd_Exporter
    • Labels:
      None
    • Story Points:
      3
    • Sprint:
      Platform Sprint 21
    • Needs Review:
      Yes
    • Needs QA:
      Yes

      Description

      The Problem:
      PMM clients run queries that are not valid on some distributions.
      The queries listed below are executed against the database and fail. Because they fail roughly 34,560 times a day, standard error monitoring raises alarms.

      List of known queries (they fail against MySQL/PS in different versions):

      • SHOW ALL SLAVES STATUS
      • SHOW ALL SLAVES STATUS NONBLOCKING
      • SHOW ALL SLAVES STATUS NOLOCK
      • SELECT @@query_response_time_stats

      Example of Audit Log:

      <AUDIT_RECORD 
      NAME="Query" 
      RECORD="159_2020-04-16T09:00:01" 
      TIMESTAMP="2020-04-16T09:00:04 UTC" 
      COMMAND_CLASS="error" 
      CONNECTION_ID="1282167" 
      STATUS="1064" 
      SQLTEXT="SHOW ALL SLAVES STATUS NONBLOCKING" 
      USER="mysqladm[mysqladm] @ localhost []" 
      HOST="localhost" 
      OS_USER="" 
      IP="" 
      DB="" 
      /> 
      <AUDIT_RECORD 
      NAME="Query" 
      RECORD="160_2020-04-16T09:00:01" 
      TIMESTAMP="2020-04-16T09:00:04 UTC" 
      COMMAND_CLASS="error" 
      CONNECTION_ID="1282167" 
      STATUS="1064" 
      SQLTEXT="SHOW ALL SLAVES STATUS NOLOCK" 
      USER="mysqladm[mysqladm] @ localhost []" 
      HOST="localhost" 
      OS_USER="" 
      IP="" 
      DB="" 
      /> 

      Desired State:
      PMM executes queries without generating errors in the Audit log.

      Note: based on an initial discussion, if a complete fix turns out to be much harder to implement, a solution that significantly reduces the number of errors is acceptable.

      ---------------------

      Additional Technical Details:

      The following logic is used for scraping status of slaves:

      var slaveStatusQueries = [2]string{"SHOW ALL SLAVES STATUS", "SHOW SLAVE STATUS"}
      var slaveStatusQuerySuffixes = [3]string{" NONBLOCKING", " NOLOCK", ""}
      ...
      	for _, query := range slaveStatusQueries {
      		slaveStatusRows, err = db.QueryContext(ctx, query)
      		if err != nil { // MySQL/Percona
      			// Leverage lock-free SHOW SLAVE STATUS by guessing the right suffix
      			for _, suffix := range slaveStatusQuerySuffixes {
      				slaveStatusRows, err = db.QueryContext(ctx, fmt.Sprint(query, suffix))
      				if err == nil {
      					break
      				}
      			}
      		} else { // MariaDB
      			break
      		}
      	}
      
      

      On Percona Server/MySQL this logic generates errors, which are then recorded by any audit plugin. As a result, monitoring systems that alert on such errors (an industry-standard practice) can raise alarms.
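
One possible way to cut down the failed statements is to choose the query from the server's reported version instead of probing by trial and error. The following is a minimal sketch, not the fix that was actually shipped: `slaveStatusCandidates` is a hypothetical helper, it assumes the exporter has already read the result of `SELECT VERSION()`, and it assumes MariaDB builds can be recognized by "MariaDB" appearing in that string. The lock-free suffixes are left out on purpose to keep the example small.

```go
package main

import (
	"fmt"
	"strings"
)

// slaveStatusCandidates returns the statements to try, in order, for a
// server whose VERSION() string is given. Hypothetical helper for
// illustration only.
func slaveStatusCandidates(version string) []string {
	// Assumption: MariaDB identifies itself in VERSION(),
	// for example "10.4.12-MariaDB-log".
	if strings.Contains(strings.ToLower(version), "mariadb") {
		// Multi-source syntax first, plain syntax as a fallback for
		// MariaDB releases that do not support it.
		return []string{"SHOW ALL SLAVES STATUS", "SHOW SLAVE STATUS"}
	}
	// MySQL and Percona Server: never send the MariaDB-only statement.
	return []string{"SHOW SLAVE STATUS"}
}

func main() {
	for _, v := range []string{"8.0.19", "5.7.29-32-log", "10.4.12-MariaDB-log"} {
		fmt.Println(v, slaveStatusCandidates(v))
	}
}
```

With a selection like this, a MySQL or Percona Server instance would only ever receive `SHOW SLAVE STATUS`, so the `SHOW ALL SLAVES STATUS` variants would no longer show up in the audit log. Caching the result per connection would avoid repeating the version lookup on every scrape.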

              People

              Assignee:
              Unassigned
              Reporter:
              iwo.panowicz Iwo Panowicz
              Votes:
              0
              Watchers:
              8

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  Not Specified
                  Logged:
                  3d 3h 20m
