Percona Monitoring and Management / PMM-5547

PMM dashboards were failing when presenting data from more than 100 monitored instances.

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.5.0
    • Component/s: Grafana Dashboards
    • Labels:
      None
    • Story Points:
      2
    • Sprint:
      Platform Sprint 12, Platform Sprint 13
    • Needs Review:
      Yes
    • Needs QA:
      Yes
    • Needs Packaging:
      No
    • Needs Doc:
      No

      Description

      Issue:

      With 100+ instances in monitoring, dashboards whose default view pulls information about all servers (such as Nodes Overview, MySQL Instances Overview, and the Percona PMM home page) take forever to load graphs, and for a few graphs they throw an error (see attachment).

      The Grafana log contains the following warnings:

       

      mysql-instances-overview : service_name=All&var-environment=All&var-cluster=All
      2020/03/06 10:10:52 http: proxy error: context canceled
      t=2020-03-06T10:10:52+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/series status=502 remote_addr=127.0.0.1 time_ms=2 size=0 referer="https://<pmm-server-ip>/graph/d/mysql-instance-overview/mysql-instances-overview?orgId=1&refresh=1m"
      2020/03/06 10:11:13 http: proxy error: context canceled
      t=2020-03-06T10:11:13+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/query_range status=502 remote_addr=127.0.0.1 time_ms=15 size=0 referer="https://<pmm-server-ip>/graph/d/mysql-instance-overview/mysql-instances-overview?orgId=1&from=now-12h&to=now&refresh=1m"

       

       

      Steps to reproduce:

      1. docker create -v /srv --name pmm-data percona/pmm-server:2.3.0 /bin/true
      2. docker run -d -p 80:80 -p 443:443 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:2.3.0

      Configure PMM client

      1. pmm-admin config --server-insecure-tls --server-url=https://admin:admin@172.17.0.2:443

       

      For testing, rather than installing 100+ MySQL servers, start a few MySQL servers and add them to PMM monitoring under different service names.

      Option 1: add them all at once

      for i in {1..102} ; do pmm-admin add mysql node_$i 127.0.0.1:3306 --username=msandbox --password=msandbox; done

      Option 2: add them in batches using 4 different MySQL servers

       

      for i in {1..25} ; do pmm-admin add mysql node_$i 127.0.0.1:22604 --username=msandbox --password=msandbox; done
      for i in {26..50} ; do pmm-admin add mysql node_$i 127.0.0.1:22605 --username=msandbox --password=msandbox; done
      for i in {51..75} ; do pmm-admin add mysql node_$i 127.0.0.1:22606 --username=msandbox --password=msandbox; done
      for i in {76..102} ; do pmm-admin add mysql node_$i 127.0.0.1:22606 --username=msandbox --password=msandbox; done
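The batch loops above can also be generated from a small table, which makes it easier to change the instance count or ports when reproducing. The following is a minimal sketch, not part of the original report: the function name `pmm_add_commands` and the `BATCHES` table are hypothetical, while the ranges, ports, and credentials are copied from the loops above. It only builds the command lines and does not execute them.

```python
# Batches copied from the loops in this issue: (first node, last node, MySQL port).
BATCHES = [(1, 25, 22604), (26, 50, 22605), (51, 75, 22606), (76, 102, 22606)]


def pmm_add_commands(batches, host="127.0.0.1", user="msandbox", password="msandbox"):
    """Build one `pmm-admin add mysql` command line per service name."""
    commands = []
    for first, last, port in batches:
        for i in range(first, last + 1):
            commands.append(
                f"pmm-admin add mysql node_{i} {host}:{port} "
                f"--username={user} --password={password}"
            )
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed or piped to a shell.
    for command in pmm_add_commands(BATCHES):
        print(command)
```

Printing instead of executing keeps the sketch safe to run anywhere; the output can be piped to `sh` on a host where `pmm-admin` is configured.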
      

       

       

      Open PMM MySQL -> MySQL Instances Overview. You will notice multiple errors, and a few graphs will take forever to load.

       

      grafana.log

       

      ----- Same issue when the number of nodes (hosts) is 100+
      nodes-overview : var-service_name=All&var-environment=All&var-cluster=All&var-replication_set=All
      http: proxy error: context canceled
      t=2020-03-02T23:17:27+0000 lvl=info msg="Request Completed" logger=context userId=3 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/query_range status=502 remote_addr=127.0.0.1 time_ms=17 size=0 referer="https://<pmm-server-ip>/graph/d/node-instance-overview/nodes-overview?orgId=1&refresh=1m"
      http: proxy error: context canceled
      http: proxy error: context canceled
      http: proxy error: context canceled
      http: proxy error: context canceled
      http: proxy error: context canceled
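To gauge how many proxied requests fail, log lines like the ones in this issue can be tallied per path. This is a rough sketch under assumptions: the `key=value` layout is inferred only from the lines quoted here, and the helper names `parse_line` and `count_failed_proxy_requests` are hypothetical. The sample lines are copied verbatim from the report.

```python
import re
from collections import Counter

# key=value pairs as they appear in the Grafana log lines quoted in this issue;
# a value is either a double-quoted string or a bare token without spaces.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Return the key=value fields of one log line as a dict (quotes stripped)."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def count_failed_proxy_requests(lines):
    """Count status=502 responses per datasource proxy path."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        path = fields.get("path", "")
        if fields.get("status") == "502" and "/api/datasources/proxy/" in path:
            counts[path] += 1
    return counts


SAMPLE = [
    '2020/03/06 10:10:52 http: proxy error: context canceled',
    't=2020-03-06T10:10:52+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/series status=502 remote_addr=127.0.0.1 time_ms=2 size=0 referer="https://<pmm-server-ip>/graph/d/mysql-instance-overview/mysql-instances-overview?orgId=1&refresh=1m"',
    '2020/03/06 10:11:13 http: proxy error: context canceled',
    't=2020-03-06T10:11:13+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/query_range status=502 remote_addr=127.0.0.1 time_ms=15 size=0 referer="https://<pmm-server-ip>/graph/d/mysql-instance-overview/mysql-instances-overview?orgId=1&from=now-12h&to=now&refresh=1m"',
]

if __name__ == "__main__":
    for path, count in count_failed_proxy_requests(SAMPLE).items():
        print(count, path)
```

For a real run, the same function could be fed the lines of the actual grafana.log file instead of `SAMPLE`.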
      

       

      nginx.access.log
      46.149.95.219 - - [06/Mar/2020:11:28:17 +0000] "" 000 0 "" "" ""

      Cleanup (removes all 102 services added above):
      for i in {1..102} ; do pmm-admin remove mysql node_$i; done

       

      Expected behavior: dashboards should not throw errors and should be able to load graphs when the host/instance count is high, in this case 100+.

      Suggested solution:
      Please try adding

      data['jsonData']['httpMethod']='POST'

      to https://github.com/percona/pmm-server/blob/master/import-dashboards.py#L200 and check whether it works.
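The suggested assignment can be sketched in isolation as follows. This is only an illustration under assumptions: the shape of `data` inside import-dashboards.py is not shown in this issue, so the `example` dictionary and the helper name `with_post_method` are hypothetical. The likely rationale (also an assumption, not stated in the report) is that with POST the query parameters travel in the request body rather than in the URL, which may matter when selecting All expands to 100+ service names.

```python
import copy


def with_post_method(datasource):
    """Return a copy of a datasource definition with httpMethod set to POST."""
    data = copy.deepcopy(datasource)
    # The assignment suggested in this issue; setdefault guards against a
    # definition that has no 'jsonData' section yet.
    data.setdefault('jsonData', {})['httpMethod'] = 'POST'
    return data


# Hypothetical datasource definition, for illustration only.
example = {'name': 'Prometheus', 'type': 'prometheus', 'jsonData': {'timeInterval': '1s'}}
patched = with_post_method(example)
```

Working on a copy leaves the input untouched, so the original and patched definitions can be compared before anything is sent to Grafana.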

       

        Attachments

        1. grafana_dashboard_error1.png (150 kB)
        2. grafana_mysql_instances_overview_error2.png (167 kB)
        3. image-2020-03-06-18-38-04-277.png (85 kB)
        4. mysql_102_instances.png (168 kB)
        5. pmm_dashboard_error.png (217 kB)
        6. pmm_dashboard_grafana_error2.png (457 kB)
        7. Screenshot 2020-03-25 at 14.02.26.png (419 kB)
        8. Screenshot 2020-03-25 at 14.02.39.png (386 kB)
        9. working_dashboard_1.png (267 kB)
        10. Working_dashboard_2.png (233 kB)

              People

              Assignee:
              Unassigned
              Reporter:
              lalit.choudhary Lalit Choudhary
              Votes:
              0
              Watchers:
              7

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  Not Specified
                  Logged:
                  2h 20m
