pperreault

collector fail count

Question

One of our collectors appears to be having connectivity issues. Common symptoms: it loses communication with the LM cloud, and remote sessions to it or to monitored devices fail to complete. I have also noticed that the collector heartbeat fail datapoint keeps increasing over time; I've seen its value exceed 6000. Support hasn't been able to tell me what this value actually is, other than providing developer notes, which are unfortunately unhelpful. Can anyone provide some insight into what this failure count is actually counting? Has anyone seen and resolved this symptom?

We are planning on rebuilding the host server and recreating the collector.


1 answer to this question


I don't know anything about the heartbeat fail datapoint other than what the description says: "Number of failed attempts to execute the heartbeat task". What I've set up, though, is for all our collectors to ping LM (x.logicmonitor.com) and 8.8.8.8, and for each collector to ping all the other collectors. That helps us determine whether, for example, the internet is down vs. LM SaaS itself is down vs. the VPN is down vs. there are internal networking issues.

Perhaps it might even make sense to temporarily add the collector server as a resource a second time, monitored by a different collector. But if you have the option to rebuild the server and collector, that might be the simplest option.
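For anyone who wants to try the ping comparison by hand on the collector host, here is a rough Python sketch of the idea. It is not a LogicMonitor feature or API, just an illustration: the `ping` and `classify` helpers, the peer collector hostnames, and the wording of the verdicts are all made up for the example, and `x.logicmonitor.com` is a placeholder for your own portal name. It also assumes ICMP is allowed to each target.

```python
import platform
import subprocess
from typing import Mapping


def ping(host: str, timeout_s: int = 3) -> bool:
    """Send one ICMP echo via the system ping command; True if it succeeds."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def classify(lm_ok: bool, internet_ok: bool, peers_ok: Mapping[str, bool]) -> str:
    """Turn the three kinds of ping result into a rough guess at the failure domain."""
    any_peer = any(peers_ok.values())
    all_peers = all(peers_ok.values())
    if not lm_ok and not internet_ok:
        # Nothing external answers; peers tell us whether the LAN/VPN still works.
        if any_peer:
            return "internet uplink down"
        return "collector host or local network down"
    if not lm_ok:
        return "LM portal unreachable, internet OK"
    if not all_peers:
        return "internal network or VPN issue"
    if not internet_ok:
        return "public reference host unreachable, LM OK"
    return "all reachable"


if __name__ == "__main__":
    # Hypothetical peer collector hostnames; replace with your own.
    peers = ["collector-02.example.internal", "collector-03.example.internal"]
    verdict = classify(
        lm_ok=ping("x.logicmonitor.com"),  # placeholder portal name
        internet_ok=ping("8.8.8.8"),
        peers_ok={p: ping(p) for p in peers},
    )
    print(verdict)
```

Running something like this on a schedule and comparing the verdicts against the times the heartbeat fail count jumps might show which leg of the path is failing, though it says nothing about what the datapoint itself counts.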

 
