Details
- Bug
- Status: Done
- Medium
- Resolution: Cannot Reproduce
- 2.26.0
- None
- None
- 1
- Yes
- Yes
- Yes
- C/S Core
Description
PMM periodically shows the bogus status "Exporter is not Connected" at the "ReplSet States" graph on the "MongoDB ReplSet Summary" dashboard.
How to Repeat
- Set up a 3-node MongoDB replica set, for example with mlaunch.
- Install PMM
- Set up the agents. At this step it does not matter whether they can connect to the MongoDB instances or not. They just need to be added to PMM.
sudo pmm-admin add mongodb --replication-set=myset --port=27017 127.0.0.1:27017 --service-name=automongo --metrics-mode=auto
sudo pmm-admin add mongodb --replication-set=myset --port=27018 127.0.0.1:27018 --service-name=pushmongo --metrics-mode=push
sudo pmm-admin add mongodb --replication-set=myset --port=27019 127.0.0.1:27019 --service-name=pullmongo --metrics-mode=pull
- Wait 5 minutes or more, then remove these agents:
sudo pmm-admin remove mongodb pullmongo
sudo pmm-admin remove mongodb pushmongo
sudo pmm-admin remove mongodb automongo
- Add agents with the same names.
sudo pmm-admin add mongodb --replication-set=myset --port=27017 127.0.0.1:27017 --service-name=automongo --metrics-mode=auto
sudo pmm-admin add mongodb --replication-set=myset --port=27018 127.0.0.1:27018 --service-name=pushmongo --metrics-mode=push
sudo pmm-admin add mongodb --replication-set=myset --port=27019 127.0.0.1:27019 --service-name=pullmongo --metrics-mode=pull
- Now ensure that agents can connect to MongoDB and PMM shows status for them on the "MongoDB ReplSet Summary" dashboard.
- So far so good.
- Wait 12 hours, so that the agents you removed no longer show up in the "Last 12 hours" interval. The bug can also be seen in the "Last 6 hours" and "Last 3 hours" intervals, but the larger the interval, the higher the chance of seeing the issue.
- Now comes the tricky part: the replica set needs to change its PRIMARY. I tried to force this manually, but the most reliable way to reproduce the issue is to put the laptop into sleep mode and then wake it up. Eventually you will see the picture shown in the attached screenshot bogus_status.png: pushmongo is PRIMARY, pullmongo is SECONDARY, and automongo is in the bogus status "Exporter is not Connected".
- Now check the data for the server that is in the bogus status, wait 5 minutes (for the 12-hour interval) or 1 minute (for the 3-hour interval, as in my example), and check the data again. Sort the data by time, descending. You will see that only the top (most recent) row has no data, no matter what the current time is. Screenshot 20-28-18.png, then 20-29-20.png
- Now switch to a smaller interval, say 15 minutes, and see that it shows the status of the same server properly. Data for the most recent moment is also present. Screenshots last15.png and last15_data.png
- So I would say this is not an exporter issue, but rather an issue with how PMM uses low- and medium-resolution metrics.
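The add, remove, and re-add cycle from the steps above can be scripted. The following is only a minimal sketch: the helper names (run, add_services, remove_services, repro_cycle) and the DRY_RUN and AUTORUN variables are made up for illustration, while the pmm-admin commands, ports, and service names are the ones from this report. It removes the services in the order they were added, which differs from the order used above. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper that replays the add/remove/add cycle from this report.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
REPLSET="myset"

# name:port:metrics-mode triples taken from the report
SERVICES="automongo:27017:auto pushmongo:27018:push pullmongo:27019:pull"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_services() {
    for entry in $SERVICES; do
        name=${entry%%:*}     # text before the first colon
        rest=${entry#*:}      # text after the first colon
        port=${rest%%:*}
        mode=${rest#*:}
        run sudo pmm-admin add mongodb --replication-set="$REPLSET" \
            --port="$port" "127.0.0.1:$port" \
            --service-name="$name" --metrics-mode="$mode"
    done
}

remove_services() {
    for entry in $SERVICES; do
        run sudo pmm-admin remove mongodb "${entry%%:*}"
    done
}

# Add the services, wait 5 minutes, remove them, then add them again.
repro_cycle() {
    add_services
    run sleep 300
    remove_services
    add_services
}

# Only start the cycle when explicitly asked to (assumed opt-in variable).
if [ "${AUTORUN:-0}" = "1" ]; then
    repro_cycle
fi
```

Running the script with AUTORUN=1 prints the ten commands of one cycle; adding DRY_RUN=0 executes them. The 12-hour wait and the PRIMARY change still have to be done by hand.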