Percona Monitoring and Management / PMM-6189

Disk Details Dashboard: Disk IO Size chart larger by factor of 512

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.1
    • Component/s: Grafana Dashboards
    • Labels: None
    • Story Points: 0
    • Sprint: Platform Sprint 21
    • Needs Review: Yes
    • Needs QA: Yes

      Description

      The "Disk IO Size" panel on the "OS/Disk Details" dashboard shows wrong data: the average IO size is multiplied by 512.

      Disk IO Size panel source
      sum(
      (rate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) * 512 / 
      rate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0 or 
      (irate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[5m]) * 512 / 
      irate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
      )
      sum(
      (rate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) * 512 / 
      rate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0 or 
      (irate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[5m]) * 512 / 
      irate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
      )
      

      Since early 2018 the Prometheus node_exporter has reported these metrics in actual bytes instead of sectors, so the multiplication by 512 is no longer needed. See https://github.com/prometheus/node_exporter/blob/master/collector/diskstats_linux.go#L105 and https://github.com/prometheus/node_exporter/pull/787
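
      Because the byte counters already report bytes, the panel expressions would presumably only need the "* 512" factor removed. The following is a minimal sketch under that assumption, with everything else in the original queries left unchanged; it is not necessarily the exact fix that shipped.

      ```promql
      # Average read size in bytes: bytes read per second / reads completed per second.
      # Assumption: identical to the current panel query except that "* 512" is dropped.
      sum(
      (rate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) /
      rate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0 or
      (irate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[5m]) /
      irate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
      )

      # Average write size in bytes, with the same change applied.
      sum(
      (rate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) /
      rate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0 or
      (irate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[5m]) /
      irate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
      )
      ```

      With a steady 16 KiB workload, these expressions should evaluate to roughly 16384 per device rather than 8388608.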

      This can be easily confirmed by running a fio test on an otherwise idle server.

      fio command
      # fio --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=4 --rwmixread=30 --size=1G --runtime=1200 --group_reporting --time_based
      

      This should (and does) generate a steady stream of 16 KiB IO requests, but the Disk IO Size panel shows a steady stream of 8 MiB IO requests, exactly 512 times larger (16 KiB × 512 = 8192 KiB = 8 MiB).

              People

              Assignee:
              Unassigned
              Reporter:
              sergey.kuzmichev Sergey Kuzmichev
              Votes:
              0
              Watchers:
              3

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  Not Specified
                  Logged:
                  1h 45m