Details
Type: Bug
Status: Done
Priority: Medium
Resolution: Fixed
Affects Version/s: 2.8.0
Description
Panel "Disk IO Size" on "OS/Disk Details" dashboard shows wrong data, multiplying average IO size by 512.
Disk IO Size panel source
sum(
  (rate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) * 512
   / rate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0
  or
  (irate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[5m]) * 512
   / irate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
)

sum(
  (rate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[$__interval]) * 512
   / rate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0
  or
  (irate(node_disk_written_bytes_total{node_name="$node_name", device=~"$device"}[5m]) * 512
   / irate(node_disk_writes_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
)
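For reference, a minimal sketch of the corrected read-side expression, assuming the only change needed is dropping the * 512 multiplier (node_disk_read_bytes_total already reports bytes); the write-side expression would change the same way:

# same expression as above, with the sector-to-byte multiplier removed
sum(
  (rate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[$__interval])
   / rate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[$__interval])) > 0
  or
  (irate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[5m])
   / irate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[5m])) > 0
)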
Since early 2018, node_exporter has reported these metrics in actual bytes rather than 512-byte sectors, so the extra multiplier is no longer needed. See https://github.com/prometheus/node_exporter/blob/master/collector/diskstats_linux.go#L105 and https://github.com/prometheus/node_exporter/pull/787
This is easy to confirm by running a fio test on an otherwise idle server.
fio command
# fio --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=4 --rwmixread=30 --size=1G --runtime=1200 --group_reporting --time_based
This should (and does) generate a steady stream of 16 KiB IO requests, but the Disk IO Size panel shows a steady stream of 8 MiB IO requests (16 KiB x 512), exactly 512 times larger.
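As a quick cross-check while the fio job is running, an ad-hoc query without the 512 multiplier should return roughly 16384 bytes per read. This is a sketch assuming the same metric names and labels as the panel query above; when run outside Grafana, the $node_name and $device template variables would be replaced with concrete values:

# average read size in bytes over the last 5 minutes; expect ~16384 during the fio run
rate(node_disk_read_bytes_total{node_name="$node_name", device=~"$device"}[5m])
  / rate(node_disk_reads_completed_total{node_name="$node_name", device=~"$device"}[5m])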