Percona Monitoring and Management / PMM-6189

Disk Details Dashboard: Disk IO Size chart shows values 512 times too large

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.1
    • Component/s: Grafana Dashboards
    • Labels:
      None
    • Story Points:
      0
    • Sprint:
      Platform Sprint 21
    • Needs Review:
      Yes
    • Needs QA:
      Yes

      Description

      The "Disk IO Size" panel on the "OS/Disk Details" dashboard shows wrong data: it multiplies the average IO size by 512.

      Disk IO Size panel source (read query, then write query):
      sum(
      (rate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) * 512 / 
      rate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0 or 
      (irate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[5m]) * 512 / 
      irate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
      )
      sum(
      (rate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) * 512 / 
      rate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0 or 
      (irate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[5m]) * 512 / 
      irate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
      )
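      Written out, the first branch of the read query computes the following. The irate branch and the write query have the same shape. The ratio of the two rates is already the average size of a completed read, in the units of the byte counter, and the panel then multiplies it by 512:

      ```latex
      \[
      \text{Disk IO Size}_{\text{read}}
        = \frac{\operatorname{rate}(\texttt{node\_disk\_read\_bytes\_total}) \times 512}
               {\operatorname{rate}(\texttt{node\_disk\_reads\_completed\_total})}
        = 512 \times \frac{\text{bytes read per second}}{\text{reads completed per second}}
        = 512 \times \text{average bytes per read}
      \]
      ```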
      

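      Since early 2018, the Prometheus node_exporter has reported these metrics in bytes rather than in 512-byte sectors. As a result, the panel's "* 512" sector-to-byte conversion is applied to values that are already in bytes. See https://github.com/prometheus/node_exporter/blob/master/collector/diskstats_linux.go#L105 and https://github.com/prometheus/node_exporter/pull/787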
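      Because the byte counters need no conversion, a corrected panel would divide bytes by completed operations directly. The following is a minimal sketch. It assumes the only change needed is dropping the "* 512" factor and that the rest of the panel logic stays as it is (the $__interval rate with a 5-minute irate fallback). It is not necessarily the exact query that shipped in 2.9.1. Each sum(...) is a separate panel query, and the lines starting with # are PromQL comments.

      ```promql
      # Reads: average size of a completed read, in bytes (no sector conversion)
      sum(
        (rate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) /
         rate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0 or
        (irate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[5m]) /
         irate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
      )

      # Writes: average size of a completed write, in bytes (no sector conversion)
      sum(
        (rate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) /
         rate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0 or
        (irate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[5m]) /
         irate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
      )
      ```

      The "> 0" filters are kept on purpose. For an idle device both rates are 0, so 0/0 evaluates to NaN, and the comparison drops that series. The "or" then falls back to the irate branch when the first branch returns nothing for a series.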

      This is easy to confirm by running an fio test on an otherwise idle server:

      fio command
      # fio --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=4 --rwmixread=30 --size=1G --runtime=1200 --group_reporting --time_based
      

      This should (and does) generate a steady stream of 16 KiB IO requests, but the Disk IO Size panel shows a steady stream of 8 MiB requests, exactly 512 times larger.
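      For the fio run above, with --bs=16k (fio treats "k" as 1024 bytes by default), the displayed value follows directly from the extra factor:

      ```latex
      \[
      16\ \text{KiB} \times 512 = 16384\ \text{B} \times 512 = 8388608\ \text{B} = 8\ \text{MiB}
      \]
      ```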

            People

            Assignee:
            Unassigned
            Reporter:
            sergey.kuzmichev Sergey Kuzmichev
            Votes:
            0
            Watchers:
            3

                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                Not Specified
                Logged:
                1h 45m