Details
Bug
Status: Done
High
Resolution: Fixed
2.1.0, 2.2.0
None
2
Platform Sprint 15
Yes
Yes
Description
User Impact:
The AVG values for Query Time in the Details and Profile sections differ. One of them is wrong, so the user cannot see the real numbers.
Reproducible test case:
# pmm-admin add mysql ps5728 --query-source=slowlog --username=msandbox --password=msandbox 127.0.0.1:5728 --environment=bugtest
MySQL Service added.
Service ID  : /service_id/d4324fc4-6504-4bd3-ae32-fcde18a9920a
Service name: ps5728
Load some data:
sysbench /usr/share/sysbench/oltp_read_write.lua --mysql_storage_engine=innodb --table-size=10000 --tables=1 --mysql-db=test --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox5728.sock --threads=10 --time=900 --report-interval=1 --events=0 --db-driver=mysql prepare
sysbench /usr/share/sysbench/oltp_read_write.lua --mysql_storage_engine=innodb --table-size=10000 --tables=1 --mysql-db=test --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox5728.sock --threads=10 --time=100 --report-interval=1 --events=0 --db-driver=mysql run
Example 1: The average times differ in a way that suggests a decimal-point or unit error.
QAN Query: select c from sbtest1 where id=?
Example 2:
QAN Query: update sbtest1 set k=k? where id=?
Expected Behavior:
Avg values should be the same and correct in both places.
Original report:
---------
QAN consistently reports latency inconsistently, generally in the average.
I've included an example where the individual query stats are clearly wrong: the average is higher than the p99. In the second screenshot, the value appears to be correct when hovering on the chart in the list, and the average is a completely different number.
The next example shows a similar problem with the average, but this time the average isn't a completely different number; something appears to be wrong with the decimal point or units.
DOD:
The average values in object details should be the same as in the top table, where the average is computed as SUM({metric_name}_sum)/SUM(query_numbers).
Suggested implementation:
Move the calculation of the average from the Go code into the ClickHouse query.
Instead of SUM({metric_name}_sum)/SUM({metric_name}_cnt), the average should be computed as SUM({metric_name}_sum)/SUM(query_numbers).
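To illustrate why the two formulas can disagree, here is a minimal sketch in Python rather than the actual Go or ClickHouse code. The Bucket type, its field names, and the sample numbers are hypothetical; they only mirror the {metric_name}_sum, {metric_name}_cnt, and query_numbers placeholders used above. It assumes {metric_name}_cnt counts only the queries for which the metric was recorded, so it can be smaller than query_numbers.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """One aggregation bucket (hypothetical shape, for illustration only)."""
    query_numbers: int     # total number of queries in the bucket
    query_time_sum: float  # sum of query time in seconds
    query_time_cnt: int    # queries for which query time was recorded (assumed)


def avg_per_metric_count(buckets):
    """Old formula: SUM({metric_name}_sum) / SUM({metric_name}_cnt)."""
    total_cnt = sum(b.query_time_cnt for b in buckets)
    total_sum = sum(b.query_time_sum for b in buckets)
    return total_sum / total_cnt if total_cnt else 0.0


def avg_per_query(buckets):
    """Proposed formula: SUM({metric_name}_sum) / SUM(query_numbers)."""
    total_queries = sum(b.query_numbers for b in buckets)
    total_sum = sum(b.query_time_sum for b in buckets)
    return total_sum / total_queries if total_queries else 0.0


# Made-up sample data: the metric was recorded for only a fraction of queries.
buckets = [
    Bucket(query_numbers=1000, query_time_sum=2.0, query_time_cnt=100),
    Bucket(query_numbers=3000, query_time_sum=4.0, query_time_cnt=200),
]

old_avg = avg_per_metric_count(buckets)
new_avg = avg_per_query(buckets)
print(f"sum/cnt: {old_avg:.4f}s, sum/query_numbers: {new_avg:.4f}s")
```

With these made-up numbers the two averages differ by more than a factor of ten, which resembles the decimal-point discrepancy described in the report. Using the same denominator in both the top table and object details removes the mismatch regardless of how the counters are populated.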