Vitor Santos

Cisco EIGRP Peer alarm(s) not being suppressed?

Question

Hello,

We've noticed that the Cisco EIGRP PeerDown alarm(s) aren't suppressed when the device itself goes down in LM.
When we lost SNMP connectivity to one of our routers, it started raising PeerDown alarms (SNMP wasn't responding, which caused the 'NoData' condition on the 'upTime' datapoint).
This becomes an issue because the datapoint that checks the peer status is based on the data retrieved by the 'upTime' datapoint (which, at this point, is 'NoData').

image.thumb.png.e03693a6341866d67de1ccdeeb615e86.png

Basically, if 'upTime' doesn't return data (which happens when the device itself goes down), it triggers an alarm for the PeerDown instances (since the expression always returns False).
LogicMonitor only sees the device as 'down' after 5 minutes without data. This DS alarms first (since PeerDown raises an alarm after 2 consecutive polls, which means 3 minutes).

As per the documentation, all alarm(s) emanating from the host will be suppressed. My question (just to make sure): this only applies to alarms raised 'AFTER' the host down condition, correct?
If that's true, how can we work around this without increasing the time that 'PeerDown' alarms take to appear in the console?

Is there any expression that we can use in that ComplexDatapoint (instead of the current one)?
Currently, this one device being down caused 100 alarm(s) on the console (since it's a central point for our EIGRP routing).

Thank you!

Regards,


9 answers to this question


  • 0

That is my understanding too. LM has server-side logic to declare a device dead after 6 minutes (but Host Status will alert after 5 minutes), so any alerts that occur before those 6 minutes will cause notifications.
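The timing gap can be sketched in a few lines of Python. This is only a model of the behaviour as described in this thread, not LogicMonitor code: the 3-minute and 6-minute figures are the ones quoted by the posters, and `is_suppressed` is a hypothetical helper that assumes only alerts raised at or after the device-dead mark are suppressed.

```python
# Timings quoted in this thread, in minutes (assumptions, not verified values).
PEER_DOWN_ALERT_AFTER_MIN = 3  # PeerDown alerts after 2 consecutive polls
HOST_DEAD_AFTER_MIN = 6        # server-side "device dead" declaration


def is_suppressed(alert_minute: float,
                  host_dead_minute: float = HOST_DEAD_AFTER_MIN) -> bool:
    """Assumed model: an alert is suppressed only if it is raised
    at or after the moment the host is declared dead."""
    return alert_minute >= host_dead_minute


if __name__ == "__main__":
    # PeerDown fires before the host is declared dead, so it still notifies.
    print("PeerDown suppressed:", is_suppressed(PEER_DOWN_ALERT_AFTER_MIN))
```

Under these assumptions, every PeerDown instance on the router notifies during the window between minute 3 and minute 6.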

PeerDown uses the un() function, so it's specifically checking whether the value is NaN or not. I don't know how this particular DataSource or Cisco EIGRP works, so I'm not clear whether upTime can tell the difference between peer down and switch down; there might be a trick to do so. As a more generic solution, and since this is a script-based DataSource, I would likely add a new datapoint, something like snmpDown, that reports 1 if SNMP isn't working (i.e., the device will be dead soon), and then modify PeerDown to also check that SNMP is working before alerting.

  • 0

After checking the OIDs, I don't believe upTime can tell that difference.
I'll try to leverage that 'general' change and see if it works for us. That's a great idea!

Basically, we could just add a new complex datapoint (via Groovy) and try to poll a basic OID. If it doesn't return data, then assume SNMP isn't replying (snmpDown == 1).
From there, just tweak PeerDown to take that value into account before returning 0.

Am I on the right path? Or did you have something simpler in mind?

Thank you for the input on this!

  • 0
Posted (edited)

That's the basic idea. You can't create a complex datapoint via Groovy, so snmpDown would be a normal datapoint, which you can then refer to in PeerDown. Also, I think you can just wrap the snmp.get/walk line or section in a try/catch, and that will let you know when the SNMP request failed.
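The try/catch idea can be illustrated with a small Python sketch. The real DataSource script is Groovy and uses the collector's SNMP helpers, so everything here is a stand-in: `collect`, `failing_walk`, and `working_walk` are hypothetical names, and the walk is passed in as a plain callable so the pattern can be shown without a device.

```python
from typing import Callable, Dict, Tuple


def collect(walk: Callable[[], Dict[str, str]]) -> Tuple[Dict[str, str], int]:
    """Run the SNMP walk and return (rows, snmpDown).

    snmpDown is 0 when the walk succeeds and 1 when it raises.
    A real script might catch only the timeout exception.
    """
    try:
        return walk(), 0
    except Exception:
        return {}, 1


def failing_walk() -> Dict[str, str]:
    # Hypothetical stand-in for a walk against an unreachable device.
    raise TimeoutError("SNMP request timed out")


def working_walk() -> Dict[str, str]:
    # Hypothetical stand-in for a successful walk (instance -> value).
    return {"1.0": "12345"}


if __name__ == "__main__":
    for walk in (working_walk, failing_walk):
        rows, snmp_down = collect(walk)
        print(f"snmpDown={snmp_down} rows={rows}")
```

The script would then emit snmpDown alongside its other values, so a normal datapoint can pick it up and PeerDown can reference it.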

Edited by Mike Moniz

  • 0

OK, so I've added that try/catch to the script.

It returns 0 if the SNMP portion goes well and 1 if it catches the timeout exception.
I just moved the SNMP walk code into the try{} and added the catch() below:

image.png.72b878fbfb4f6cad20bda6795a42cc3f.png

So now we're able to tell when SNMP isn't working. I'm a bit lost on what to do in the 'PeerDown' datapoint (in terms of expressions). Can you help?
I've never used the complex datapoint features before.

  • 0

Basically I want to do what the PeerDown expression currently does:
image.png.4a2ab3661482ca3ff853223476e63c8a.png
 

Only if snmpDown == 0; otherwise, return 2 (or any value other than 0).

  • 0

You can nest ifs in much the same way you do in Excel. This is just off the top of my head and untested, but you would do something like:

if(snmpDown,1,if(un(upTime),0,1))

 

  • 0
Posted (edited)

OK, so I ended up doing it like this:

- if(eq(snmpDown,1),2,if(un(upTime),0,1))

It does the trick, thank you! 
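For readers new to complex datapoints, the expression above can be modelled in Python to see what each input combination yields. This is a sketch, not LogicMonitor code: `un` and `peer_down` are hypothetical Python stand-ins, and it assumes un() returns 1 when its argument has no data (NaN), as described earlier in the thread.

```python
import math

NO_DATA = float("nan")  # stands in for LM's 'NoData' / NaN


def un(value: float) -> int:
    """Mimic un(): 1 when the value is NaN (no data), otherwise 0."""
    return 1 if math.isnan(value) else 0


def peer_down(snmp_down: float, up_time: float) -> int:
    """Python rendering of if(eq(snmpDown,1),2,if(un(upTime),0,1))."""
    if snmp_down == 1:
        return 2  # SNMP itself failed: report a distinct, non-alerting value
    return 0 if un(up_time) else 1  # 0 = no data for the peer, 1 = peer has data


if __name__ == "__main__":
    cases = [(0, 1234.0), (0, NO_DATA), (1, NO_DATA)]
    for snmp_down, up_time in cases:
        print(snmp_down, up_time, "->", peer_down(snmp_down, up_time))
```

Only the middle case (SNMP working, but no upTime for the peer) returns 0, which is the value the PeerDown alert keys on in this thread.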

I've disabled SNMP on the device (to force the condition); however, LM doesn't see the device as dead.
What exactly is needed for LM to consider the device dead? Does it rely on ICMP as well?

Edited by Vitor Santos

  • 0

It will rely on ICMP and other things too...

"...the idleinterval datapoint within the HostStatus DataSource measures the amount of time in seconds since the LogicMonitor Collector was able to collect data from your host via built-in collection methods (SNMP, Ping, WMI, ESX, etc.)... Note that data collected by script DataSources does not affect the value of the idleinterval datapoint."

https://www.logicmonitor.com/support/logicmodules/datasources/creating-managing-datasources/host-status-host-behavior/

  • 0
2 hours ago, Mike Moniz said:

It will rely on ICMP and other things too...

"...the idleinterval datapoint within the HostStatus DataSource measures the amount of time in seconds since the LogicMonitor Collector was able to collect data from your host via built-in collection methods (SNMP, Ping, WMI, ESX, etc.)... Note that data collected by script DataSources does not affect the value of the idleinterval datapoint."

https://www.logicmonitor.com/support/logicmodules/datasources/creating-managing-datasources/host-status-host-behavior/

 

Got it. I'll raise this internally, because it could be an issue for us.
We have clients that deliberately don't give us ICMP access (though we do have SNMP access).

Thank you for the info!
