• 0

Ping Failing from collector to device and back?


Dominique
Question

Hello 
I am getting errors:
“VIPEIEEMP01 is suffering ping loss. 100.0% of pings are not returning, placing the host into critical state.”
“The host VIPEIEEMP01 is down. No data has been received”
 

If I ping the Collector from the command prompt of the client, it works.

The firewall is wide open!!!

What did I miss?

Thanks,

Dom


17 answers to this question


  • 1
6 minutes ago, Dominique said:
Hello 
I am getting errors:
“VIPEIEEMP01 is suffering ping loss. 100.0% of pings are not returning, placing the host into critical state.”
“The host VIPEIEEMP01 is down. No data has been received”
 

If I ping the Collector from the command prompt of the client, it works.

The firewall is wide open!!!

What did I miss?

Thanks,

Dom

This is an old bug. I have been trying for years to get it fixed, with no luck so far. The problem is that any stateful firewall keeps a session table. Ping is not session-based like TCP, but firewalls still keep track of the ICMP ID and use timers to invalidate sessions. The same goes for UDP (SNMP). The LM collector code is "lazy" and does not generate new session-equivalents for successive checks, so eventually the firewall drops the traffic because it is matched to an invalid session. You can work around this by restarting the collector. For SNMP they have recently added some knobs in the collector config to help, but for ICMP it is still messed up.
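To make the session-table argument concrete, here is a minimal Python sketch of a toy stateful firewall. It is a model under stated assumptions, not any vendor's actual behavior and not LM's code: the `ToyFirewall` class, the 30-second idle timeout, the addresses, and the rule that a packet matching a stale session is dropped while still refreshing that session's timer are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFirewall:
    """Toy stateful firewall (hypothetical model, not real vendor behavior)."""
    idle_timeout: float = 30.0  # seconds before an idle session ages out (assumed)
    sessions: dict = field(default_factory=dict)  # (src, dst, icmp_id) -> [last_seen, valid]

    def invalidate_all(self):
        """Simulate an event (e.g. a site outage) that leaves sessions stale."""
        for entry in self.sessions.values():
            entry[1] = False

    def echo_request(self, src, dst, icmp_id, now):
        """Return True if the echo request is forwarded, False if dropped."""
        key = (src, dst, icmp_id)
        entry = self.sessions.get(key)
        if entry is not None and now - entry[0] > self.idle_timeout:
            entry = None  # idle long enough: the old session has aged out
        if entry is None:
            entry = [now, True]  # unknown flow: create a fresh, valid session
            self.sessions[key] = entry
        entry[0] = now  # assumption: any matching packet refreshes the idle timer
        return entry[1]


fw = ToyFirewall()
collector, device = "10.0.0.5", "10.1.0.9"  # made-up addresses

# Pinging every 10 s with one fixed ID works until the session goes stale.
before = [fw.echo_request(collector, device, 1, t) for t in (0, 10, 20)]
fw.invalidate_all()
# Same ID: each ping matches the stale session and keeps it from aging out.
reused = [fw.echo_request(collector, device, 1, t) for t in (30, 40, 50)]
# Fresh ID per check: each ping creates a new session and is forwarded.
fresh = [fw.echo_request(collector, device, 100 + t, t) for t in (60, 70, 80)]

print(before, reused, fresh)
```

In this model the fixed ID only recovers once the flow has been quiet for longer than the idle timeout, which is one way to read why restarting the collector clears the alert.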


  • 1
2 minutes ago, Mike Moniz said:

Interesting, I haven't run into that yet myself. I typically have collectors located on the same network segment as the devices being monitored so not hitting firewalls, but some situations do go thru some. Is this specific to Windows or Linux Collectors?

The world is passing LM by there -- most organizations these days are moving to internal compartmentalization, which means firewalls of some sort. We have generally seen it at smaller remote sites that have no desire or facilities for a local collector. I don't think the collector platform matters, since the checks are all the same (Java/Groovy). I tried a lot with our CSM back in 2018, and while they agreed in principle, it was considered a "feature request" and, well, we know where those generally end up. I was also treated to circular logic: the problem was determined to affect Palo Alto firewalls only, which is completely untrue, and the evidence was an LM support page saying that Palo Altos have that problem.


  • 1
Guest Stuart Weenig
1 hour ago, Mike Moniz said:

have collectors located on the same network segment as the devices being monitored so not hitting firewalls

This is the recommended architecture.

1 hour ago, mnagel said:

most organizations these days are moving to internal compartmentalization, which means firewalls of some sort

If your firewalls are blocking legitimate business traffic, they need to not do that.


  • 1
5 minutes ago, Stuart Weenig said:

This is the recommended architecture.

If your firewalls are blocking legitimate business traffic, they need to not do that.

Folks are not going to place collectors in every subnet and due to increased security concerns, there will be more and more situations where this will be an issue.

As far as "blocking legitimate traffic" goes, that is not what is happening here (OP specifically said the firewall was wide open). The firewall is allowing the traffic, but firewalls track sessions and due to bad programming, LM triggers firewalls to block traffic in some cases. For example, we had a remote location (all WAN sites transit firewalls, a very common architecture) that had suffered a power outage. Pings began failing because LM reuses the same ICMP ID forever and the original session established previously was no longer valid.

As I mentioned, I escalated this to our CSM in 2018 and got back "I get it, but you need to open a feature request".  Since then, someone in LM has at least figured out this needs support for SNMP -- this is what we were provided and it works fairly well (that seems to only target SNMPv3, but we tend to use that when possible so it is OK).

By default, the collector does not change the SNMP library session until a collector restart, which is why that resolves the issue. You may be able to work around this by adjusting the following fields in the collector debug.

snmp.shareThreads.impl.v3.switchport.enable=true
snmp.shareThreads.impl.v3.initialCheckDelay.minutes=3

 

We get frequent annoyed tickets from clients who are told by LM that a host is not responding to ping, which is trivially proved wrong by them.  Our only solution when it happens is to restart the collector.  You know, rather than LM fixing broken code.


  • 1
Guest Stuart Weenig
20 minutes ago, mnagel said:

triggers firewalls to block traffic in some cases

I get and agree that LM is causing the issue, but if the firewall is eventually blocking the traffic, the firewall is blocking legitimate traffic (that looks illegitimate). Do you know the FR number?


  • 1
19 minutes ago, Stuart Weenig said:

I get and agree that LM is causing the issue, but if the firewall is eventually blocking the traffic, the firewall is blocking legitimate traffic (that looks illegitimate). Do you know the FR number?

It is no longer legitimate traffic after the "session" is invalid -- this is common firewall behavior that would be avoided by using a fresh ICMP ID for each ping check, like everyone else does.  
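For anyone wondering what "a fresh ICMP ID for each ping check" looks like at the packet level, here is a minimal Python sketch that only builds the bytes of an ICMPv4 echo request. It is not the collector's code. The helper names `internet_checksum`, `build_echo_request`, and `new_check_id` are made up for this example, and picking the identifier at random is just one possible way to vary it between checks.

```python
import secrets
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(icmp_id: int, seq: int, payload: bytes = b"ping") -> bytes:
    """Build an ICMPv4 echo request: type 8, code 0, checksum, identifier, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, icmp_id, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, icmp_id, seq) + payload


def new_check_id() -> int:
    """Pick a fresh 16-bit identifier for one ping check (hypothetical policy)."""
    return secrets.randbelow(0x10000)


# Each check gets its own identifier instead of one ID reused forever.
packet = build_echo_request(new_check_id(), seq=1)
print(packet.hex())
```

Actually sending these bytes needs a raw or ICMP datagram socket and usually elevated privileges, so that part is left out. With only 16 bits, random identifiers can occasionally repeat; a counter would be another option.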

FR number?  When did that start being a thing?  I thought you just create them in the forum and cross your fingers someone sees them.  If feedback, then no, those are usually one-way -- I know tickets are generated internally because one of our CSMs actually shared them with me to help prioritize, but usually they are invisible with no followup (with one exception historically for API issues when Sarah Terry was there). The ticket ID for the last time I tried to get help on this is 107847 (last updated 7/23/2018).  The SNMP info above came from a more recent interaction by someone else on my team -- not sure of the ticket ID on that one.


  • 1
Guest Stuart Weenig

We're arguing semantics now, so I'll bow out.

As far as FR numbers, if you only put it here in the community, it likely didn't get entered into the system, so product didn't even know about it. If you spoke to your CSM about it, they would have put it into the system and good CSMs keep track of those entries. If you did it through the feedback system in the product, it would have made it into the system, but your CSM might not have seen it. Granted, the FR system needs a major overhaul. It's one of the big priorities of our upcoming focus on community (including a new hosting platform).


  • 1
9 minutes ago, mnagel said:

It is no longer legitimate traffic after the "session" is invalid -- this is common firewall behavior that would be avoided by using a fresh ICMP ID for each ping check, like everyone else does.  

FR number?  When did that start being a thing?  I thought you just create them in the forum and cross your fingers someone sees them.  If feedback, then no, those are usually one-way -- I know tickets are generated internally because one of our CSMs actually shared them with me to help prioritize, but usually they are invisible with no followup (with one exception historically for API issues when Sarah Terry was there). The ticket ID for the last time I tried to get help on this is 107847 (last updated 7/23/2018).  The SNMP info above came from a more recent interaction by someone else on my team -- not sure of the ticket ID on that one.

I have been told "Other ticket numbers for the ping/SNMP issue are 286866 and the latest - 337366." Generally we are told to open an FR each time.


  • 1
2 minutes ago, Stuart Weenig said:

We're arguing semantics now, so I'll bow out.

As far as FR numbers, if you only put it here in the community, it likely didn't get entered into the system, so product didn't even know about it. If you spoke to your CSM about it, they would have put it into the system and good CSMs keep track of those entries. If you did it through the feedback system in the product, it would have made it into the system, but your CSM might not have seen it. Granted, the FR system needs a major overhaul. It's one of the big priorities of our upcoming focus on community (including a new hosting platform).

You can say it is semantics, but ICMP is connectionless, and firewalls generally need to do inspection to identify sessions. For ICMP, the ICMP ID together with the src and dst addresses lets a firewall permit an echo reply after it has seen the outgoing echo. An echo reply that does not match is dropped, since unsolicited ICMP packets should not just be sent to targets. Because LM has this bug where it uses the same ICMP ID for all ping checks, it trips firewalls that do inspection. If you argue folks should not use firewalls internally, that is battling windmills -- it is very common, and getting more so, to limit lateral attacks. That each of our tickets has generated zero understanding and a punt to open a feature request is just sad. I am not going to get into the general abilities of our successive CSMs, but if you look at the first ticket I referenced you will see the way these things tend to end up.
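The reply-matching rule described above can be sketched in a few lines of Python. This is a simplified illustration only: `EchoTracker` and the addresses are hypothetical, and a real firewall also tracks timers, sequence numbers, and NAT state.

```python
class EchoTracker:
    """Toy ICMP inspection: remember outgoing echoes, admit only matching replies."""

    def __init__(self):
        self.outstanding = set()  # {(src, dst, icmp_id)} of echo requests seen

    def saw_request(self, src, dst, icmp_id):
        """Record an outgoing echo request."""
        self.outstanding.add((src, dst, icmp_id))

    def allow_reply(self, src, dst, icmp_id):
        """Admit an echo reply only if it mirrors a recorded request."""
        # A reply travels the other way, so swap src and dst before the lookup.
        return (dst, src, icmp_id) in self.outstanding


tracker = EchoTracker()
tracker.saw_request("10.0.0.5", "10.1.0.9", icmp_id=4242)

matching = tracker.allow_reply("10.1.0.9", "10.0.0.5", icmp_id=4242)
wrong_id = tracker.allow_reply("10.1.0.9", "10.0.0.5", icmp_id=1)
unsolicited = tracker.allow_reply("10.9.9.9", "10.0.0.5", icmp_id=4242)
print(matching, wrong_id, unsolicited)  # True False False
```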


  • 1
Guest Stuart Weenig

How about this: you come to Elevate in June and we'll get some fluffy sumo wrestling suits on and duke it out? haha. We'll make one of the sales guys expense it. We'll collapse in the end and realize we're agreeing with each other.


  • 1
41 minutes ago, Michael Rodrigues said:

Collector PM here. A quick glance at the tickets suggests the realization about this ICMP firewall issue never made it to the collector team. I do recall the similar issue with SNMP.

I'll bring this up with the collector team tomorrow night. While it is best not to ping through a FW, if all we have to do to fix it is randomize the ICMP ID, that seems like a reasonable ask. I'm finding old TS tickets that appear to be this same issue, whether it was recognized at the time or not.

Sarah Terry is still here, by the way; she is Senior Director of Product these days.

 

Thank you for jumping in!  I did not have any idea where to acquire a Sumo suit :).


  • 0

Interesting, I haven't run into that yet myself. I typically have collectors located on the same network segment as the devices being monitored so not hitting firewalls, but some situations do go thru some. Is this specific to Windows or Linux Collectors?


  • 0
  • LogicMonitor Staff

Collector PM here. A quick glance at the tickets suggests the realization about this ICMP firewall issue never made it to the collector team. I do recall the similar issue with SNMP.

I'll bring this up with the collector team tomorrow night. While it is best not to ping through a FW, if all we have to do to fix it is randomize the ICMP ID, that seems like a reasonable ask. I'm finding old TS tickets that appear to be this same issue, whether it was recognized at the time or not.

Sarah Terry is still here, by the way; she is Senior Director of Product these days.


  • 0
On 4/1/2022 at 3:18 PM, Mike Moniz said:

Interesting, I haven't run into that yet myself. I typically have collectors located on the same network segment as the devices being monitored so not hitting firewalls, but some situations do go thru some. Is this specific to Windows or Linux Collectors?


We've already faced this internally with some customers too (where having one collector per subnet is not feasible due to size, licensing, etc.).
If I'm not mistaken, the OS is irrelevant to this issue, since it comes from the way LM currently works.

