The Other Josh

Active Discovery and instance deletion

Question

I keep waffling on whether this is a bug, feature request or I'm just thinking about it wrong.

 

Currently, if you configure instances to delete automatically it will prevent any alarms (the instance doesn't exist any more).  At first, support told me to 'delete after 30 days', which makes sense at a quick glance, but doesn't actually work; the instance doesn't exist, so there is no incoming data and hence no alerts to trigger.  The 30 days is just a way to preserve data in case the instance comes back (failed hardware, or intermittent service, etc).

 

This means that you cannot enable automatic deletion for any instance where you need to alert on a change that would result in it being filtered from active discovery.

 

Initially I had enabled automatic deletion for network interfaces, in order to automatically keep things clean if modules are added or removed, or interfaces change status.  However, we found that this meant we would never get an alert for an interface being down (since those are filtered from AD).  I disabled automatic deletion, but that leaves 2 problems:

 

1) We have to manually clean things (at the very least when changing hardware, modules, etc)

2) Even though the instance still exists, none of its properties are updated, because AD no longer picks it up.  We use an 'alert_enable' string, but AD never updates the description, so the instance continues to alert even after the description has been changed on the device.

 

Options:

  1) Stop filtering in AD (expands instance count, collector resources, etc)

  2) Have LM alarm on removal of an instance (possibly overload the 'no data' rules, or add a different flag, possibly related to whether instance deletion is immediate or after 30 days)

  3) If not deleting instances, possibly have AD update any properties or other values that exist before doing the filtering.  This would let AD update descriptions, admin status, etc on an interface (allowing it to clear things like description, operational status, etc).  That may be undesirable if people want to see the properties and values as they existed when the instance was 'live'.

 

I'm just not that happy with any of the current solutions, but not convinced whether these possible features would be any better or more desirable.


1 answer to this question

On 5/29/2020 at 9:35 AM, The Other Josh said:

Currently, if you configure instances to delete automatically it will prevent any alarms (the instance doesn't exist any more).  At first, support told me to 'delete after 30 days', which makes sense at a quick glance, but doesn't actually work; the instance doesn't exist, so there is no incoming data and hence no alerts to trigger.  The 30 days is just a way to preserve data in case the instance comes back (failed hardware, or intermittent service, etc).

That's exactly the reason for this feature. The other reason is that in case the instance comes back, it just shows up as a hole in data instead of losing historical data and resuming as a "new" instance.

On 5/29/2020 at 9:35 AM, The Other Josh said:

This means that you cannot enable automatic deletion for any instance where you need to alert on a change that would result in it being filtered from active discovery.

That's right. If an object has 4 states and you need to monitor for two of those states, those two states shouldn't be used as discovery filters, period. You'd need to find another attribute you can use to filter interfaces. AdminStatus is probably the best one for programmatically distinguishing between interfaces that are down on purpose vs. those that are accidentally down.
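To make that concrete, here is a rough sketch of the idea in Python (not LogicMonitor code, just the logic): discovery keeps every interface that is administratively up, and the alert condition looks at operational status separately. The integer codes are the IF-MIB values for ifAdminStatus and ifOperStatus (1 = up, 2 = down); the function names and the sample interface list are made up for illustration.

```python
# IF-MIB codes: ifAdminStatus and ifOperStatus both use 1 for "up", 2 for "down".
ADMIN_UP = 1
OPER_UP = 1


def discover(interfaces):
    """Discovery filter: keep interfaces that are administratively up.

    Operational status is deliberately NOT used here, so an interface
    that goes down unexpectedly stays discovered and can still alert.
    """
    return [i for i in interfaces if i["admin_status"] == ADMIN_UP]


def should_alert(interface):
    """Alert when an interface is meant to be up but is not operational."""
    return (
        interface["admin_status"] == ADMIN_UP
        and interface["oper_status"] != OPER_UP
    )


# Hypothetical sample data, for illustration only.
interfaces = [
    {"name": "Gi0/1", "admin_status": 1, "oper_status": 1},  # healthy
    {"name": "Gi0/2", "admin_status": 1, "oper_status": 2},  # down by accident
    {"name": "Gi0/3", "admin_status": 2, "oper_status": 2},  # down on purpose
]

discovered = discover(interfaces)
alerting = [i["name"] for i in discovered if should_alert(i)]
print([i["name"] for i in discovered], alerting)
```

With this split, Gi0/3 (shut down on purpose) is filtered out of discovery, while Gi0/2 stays discovered and is the only one that alerts.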

 

Theoretical rambling (needs to be thoroughly thought through):
You could get fancy and create an automatic aging property that excludes instances from discovery X days after they reach the inactive state. On the first discovery cycle where the instance is down, you could set a timestamp in a property. The discovery script could then check for this property and leave the instance out of the discovery output once the difference between the property's value and now is greater than X days (X could also be set as a property). This would allow alarming on an instance for X days after it goes down, after which it would be excluded from discovery. Since the old interfaces DS uses SNMP as the collector, you'd have to switch over to a scripted DS. I think there is one coming out soon (or already available?) that is scripted to help with performance on very large devices. The property would likely need to be a device-level property holding a list of key/value pairs, since 'instanceProps.get()' doesn't work (or does it?) in batchscript.
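A minimal sketch of that aging logic, written in Python purely to show the bookkeeping (a real discovery script would be written for the collector, and how the property gets written back between discovery cycles is not addressed here). The `age_out` function, the `down_since` mapping standing in for the device-level key/value property, and the sample values are all assumptions for illustration.

```python
import time

SECONDS_PER_DAY = 86400


def age_out(interfaces, down_since, max_age_days, now=None):
    """Decide which instances discovery should report.

    interfaces:   dict of name -> True if up, False if down
    down_since:   dict of name -> epoch seconds when first seen down
                  (stand-in for the hypothetical device-level property)
    max_age_days: the "X days" threshold
    Returns (reported_names, updated_down_since).
    """
    now = time.time() if now is None else now
    reported = []
    updated = {}
    for name, is_up in interfaces.items():
        if is_up:
            # Up again: report it and drop any old timestamp.
            reported.append(name)
            continue
        # First cycle seen down: stamp it with the current time.
        first_down = down_since.get(name, now)
        updated[name] = first_down
        # Keep reporting (and alerting) until it has been down more than X days.
        if now - first_down <= max_age_days * SECONDS_PER_DAY:
            reported.append(name)
    return reported, updated


# Hypothetical example with a fixed clock so the result is repeatable.
now = 1_000_000_000
interfaces = {"Gi0/1": True, "Gi0/2": False, "Gi0/3": False}
down_since = {"Gi0/3": now - 40 * SECONDS_PER_DAY}  # down for 40 days already

reported, updated = age_out(interfaces, down_since, max_age_days=30, now=now)
print(reported, updated)
```

Here Gi0/2 has just gone down, so it gets a timestamp and stays in the discovery output, while Gi0/3 has been down longer than 30 days and is left out. Its timestamp is kept so it stays excluded until it comes back up.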
