Vitor Santos

Members

  • Content Count: 53
  • Days Won: 7
  • Community Reputation: 24 (Excellent)
  • Followers: 2

About Vitor Santos

  • Rank: Explorer


  1. Got it. I'll raise this internally, because this could be an issue for us. We have clients that deliberately don't give us ICMP access (though we do have SNMP access). Thank you for the info!
  2. OK, so I ended up doing it like this:

     if(eq(snmpDown,1),2,if(un(upTime),0,1))

     It does the trick, thank you! I've disabled SNMP on the device (to force the condition); however, LM doesn't see that device as dead. What exactly is needed for LM to consider a device dead? Does it rely on ICMP as well?
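     For anyone reading along, here's that same expression unpacked (eq() and un() are standard LogicMonitor complex-datapoint functions; un() returns 1 when a datapoint has no data; the comments are annotations only and aren't valid in the actual expression field):

         if(eq(snmpDown,1),       // SNMP itself is down on this device...
            2,                    // ...so return 2, a value the PeerDown threshold ignores
            if(un(upTime),        // SNMP is up but this peer returned no upTime...
               0,                 // ...so the peer really is down (0 fires the alert)
               1))                // otherwise the peer is up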
  3. Basically I want to do what the PeerDown expression currently does, but only if snmpDown == 0; otherwise, return 2 (or anything != 0).
  4. OK, so I've added that try/catch to the actual script. It pretty much returns 0 if the SNMP portion goes well & returns 1 if it catches the timeout exception. I just moved the actual SNMP walk code into the try {} & added the catch below it. So now we're able to know when SNMP isn't working. I'm kinda lost on what to do at the 'PeerDown' datapoint (in terms of expressions). Can you help? I've never used the complex datapoint features before.
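     A minimal sketch of that shape, assuming the collector-side Groovy SNMP helper (com.santaba.agent.groovyapi.snmp.Snmp); the probe OID and output lines are illustrative, and the real DataSource's EIGRP peer walk would live in the same try block:

         import com.santaba.agent.groovyapi.snmp.Snmp

         def host = hostProps.get("system.hostname")

         try {
             // cheap reachability probe: sysUpTime (the "basic OID" idea);
             // the DataSource's existing EIGRP peer walk goes here too
             Snmp.get(host, ".1.3.6.1.2.1.1.3.0")
             println "snmpDown=0"   // SNMP answered, agent reachable
         } catch (Exception e) {
             println "snmpDown=1"   // timed out/failed: flag it instead of going NoData
         }
         return 0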
  5. After checking the OIDs, I don't believe upTime can tell that difference. I'll try to leverage that 'general' change & see if it works for us. That's a great idea! Basically we could just add a new complex datapoint (via Groovy) & try to poll a basic OID. If it doesn't return data, then assume SNMP isn't replying (snmpDown == 1). From there, just tweak the actual PeerDown to take that value into account before returning 0. Am I on the right path? Or did you have something simpler in mind? Thank you anyway for the input on this!
  6. Hello, We've noticed the Cisco EIGRP PeerDown alarm(s) aren't being suppressed when the actual device goes down in LM. When we lost SNMP connectivity to one of our routers, it started returning PeerDown alarms (since SNMP wasn't responding, causing the 'NoData' condition on the 'upTime' datapoint). This becomes an issue because the datapoint that checks the peer status bases itself on the data retrieved by the 'upTime' datapoint (which at this point is 'NoData'). Basically, if 'upTime' doesn't return data (which happens when the device goes down), it'll trigger an alarm for the PeerDown instances (since the check will always return false).

     LogicMonitor only marks the device as 'down' after 5 minutes of not retrieving data, so this DS will alarm first (PeerDown alerts after 2 consecutive polls, which means 3 minutes). As per the documentation, all alarms emanating from the host will then be suppressed. My question here (just to make sure) is: does that only apply to alarms that hit AFTER the host-down condition? If so, how can we get around this without increasing the time PeerDown alarms take to appear in the console? Is there any type of expression we can use in that ComplexDatapoint (instead of the current one)? Currently, this device being down caused 100 alarm(s) in the console (since it's a central point for our EIGRP routing). Thank you! Regards,
  7. Hello Michael, Actually, the legacy SNMP DataSources cover pretty much everything we need in terms of the UCS C-Series stuff (hardware, temperature, power consumption, etc.). I've re-applied those only to the UCS C stuff by tweaking the AppliesTo. Therefore, I don't think there's a need to develop new ones.
  8. So after reading the UCS monitoring article, I found that we actually need to add the Manager & Fabrics (making use of the API keys for those). However, it requires a few tweaks on the DataSources side in order to avoid duplicated monitoring. Since the 'sysinfo' on the UCS stuff contains 'NX-OS', it grabs a bunch of extra stuff that we don't actually need for UCS (those parts are already monitored via the new DataSource suite). The solution for us was tweaking all those DataSources' AppliesTo to keep their original condition & only apply when the condition below is also true:

     !hasCategory("CiscoUCSManager") && !hasCategory("CiscoUCSFabricInterconnect")

     We needed to do this because, for a few UCS devices, 'snmp.community' is also set via the same dynamic group (it's a common group for the CIMCs as well). Therefore, we need this filter to keep SNMP DataSources from being applied to UCS Manager resources. This does exactly the trick for us. Not sure if it'll help anyone else; just sharing it.

     On a side note, it seems LM also forgot about the CIMC (UCS C-Series) devices: they advise disabling the legacy SNMP DataSources, but those are also used by the CIMC devices. Once again, we followed the same logic & applied the same filter on those. Thanks,
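     For illustration, each tweaked AppliesTo ends up looking like this (hasCategory("snmp") is a stand-in for whatever condition the DataSource already had; only the two negated categories are the addition):

         hasCategory("snmp") && !hasCategory("CiscoUCSManager") && !hasCategory("CiscoUCSFabricInterconnect")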
  9. Hello, Now that a new suite of DataSources is available for the UCS stuff, we have some doubts about the monitoring of the Cisco UCS Manager & its respective components (Fabric Interconnects, etc.). I'm sure the new suite removes the need for the legacy SNMP DataSources (that's great, btw!!!), however we're now kinda lost on what we should add into LM (in terms of components). I'm pretty sure that in the past we had to add the UCS Manager + the Fabric Interconnect devices (to fetch some hardware-related info that wasn't fetched via the UCS Manager), but currently I see the same DataSources applied to the UCS Manager & both Fabrics. I'm assuming we only need to monitor the actual UCS Manager from now on (everything else will be monitored under it), especially the hardware portion of both Fabric Interconnects (at least from what I'm seeing). Is my assumption correct? Or is there anything else we need to keep in mind regarding the Fabrics? Appreciate the clarification, guys! Regards,
  10. Thank you for the feedback, Stuart. Just dropped him a message. Regards,
  11. Hello guys, We have created a few alarm 'views' for our different monitoring teams. Is there any way to share alarm console filters (across users, roles, etc.)? If not, is that possibility coming in the near future? Appreciate your time. Regards,
  12. Hello, It doesn't make sense to have multiple alarms each time a poll fails; if polling keeps failing at the same datapoint, it'll alarm. If an alarm is already in the console for that specific failure, why would you need duplicate alarms for the exact same problem?
  13. Hey Stuart, I've deployed it & it works just fine! Thanks a lot for sharing.
  14. I raised this concern as well yesterday ( https://communities.logicmonitor.com/topic/5892-monitoring-linux-processes-their-status-resource-usage-etc-via-ssh/#comment-13628 ). Stuart shared a DS he created that basically monitors the processes. I've deployed it to our environment & it's working fine (applied some tweaks according to our needs, but overall it's working). Hope it helps you guys.
  15. Hmm... yeah, probably we'll just wait for something official. However, thanks again for sharing. I'll poke at it once I have some free time. Regards,