Eric Johnson

SNMP Trap Event Consolidation


It would be useful for SNMP traps that arrive within a specific timeframe to be treated as the same alert.  We have a few cases where devices start throwing traps every minute, and by the time we react to fix the issue we already have dozens of alerts.  Treating repeats of the same trap within a time window as a single alert would avoid this alert flood.

I completely agree with this.  Different vendors use traps differently, and in my experience a device may send the same trap multiple times, sometimes several times a minute or more.  The trap functionality in LogicMonitor is not usable in these cases because the noise it creates is a huge distraction for any NOC to handle.  Multiple duplicate alerts for the same issue take their eyes off other, possibly critical, events.

Let's take Barracuda, for instance.  Their NextGen Firewalls have a trap for HA Partner Unreachable.  We received that trap every 5 minutes for about 2 hours while the situation was occurring.  From Barracuda's standpoint, this was a single event, with notifications going out every 5 minutes until the error went away.  They don't have a pollable "GET" MIB to track this scenario either.

I would propose the logic this way:  LM receives a trap that matches an EventSource's criteria and triggers the configured alert.  That EventSource is configured with a timeout value (let's say 60 minutes).  If another trap from the same device with the same content comes in before the timeout expires, don't create a new alert, but rather increment a "count" on the existing alert AND RESET THE TIMER.  Once no new traps have come in within the configured timeout (60 minutes in this example), the alert clears like normal.  If a new trap comes in after the timer has expired, a new alert is generated.  You may need to provide an interface to view all the trap data associated with that one event alert, since there will now be multiple traps behind it.
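To illustrate, here's a rough sketch of that flow in Python; the names (handle_trap, active_alerts, etc.) are just placeholders I made up, not anything that exists in LM:

```python
import time

TIMEOUT_SECONDS = 60 * 60  # the per-EventSource timeout (60 minutes in this example)

# (device, trap content) -> {"count": ..., "last_seen": ...}
active_alerts = {}

def handle_trap(device, trap_content):
    """Consolidate repeated traps from the same device with the same content."""
    key = (device, trap_content)
    now = time.time()
    alert = active_alerts.get(key)

    if alert is not None and now - alert["last_seen"] < TIMEOUT_SECONDS:
        # Duplicate within the timeout: bump the count and reset the timer
        # instead of raising a new alert.
        alert["count"] += 1
        alert["last_seen"] = now
        return alert

    # First trap, or the previous alert already aged out and cleared:
    # raise a fresh alert.
    alert = {"count": 1, "last_seen": now}
    active_alerts[key] = alert
    return alert
```

(Clearing the alert once the timeout passes with no new traps would happen in a separate expiry pass, omitted here.)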

This issue is plaguing our company right now, and we are a very large MSP.  We are at the mercy of vendors who don't provide pollable MIBs for critical conditions like this, which is why SNMP traps become more of a necessity.

Right on, Jeff and Eric.  We like Barracuda and need a solution where LM takes 1 or 100 traps matching the same EventSource criteria and understands it is one problem, not 100 separate problems.  Has anyone out there been able to solve this another way?  If not, can we get what Jeff suggests?

I like the timeout idea and would like to see it applied not just to SNMP trap EventSources but to all EventSources.  A similar thing can happen with Windows event logs, where an event repeats but is actually the same incident.

40 minutes ago, Mosh said:

I like the timeout idea and would like to see it applied not just to SNMP trap EventSources but to all EventSources.  A similar thing can happen with Windows event logs, where an event repeats but is actually the same incident.

That's a very good point.  We do have the same problems with Windows events and likely any other supported event types.  It comes down to handling asynchronous event technologies in a way that accommodates their nature, so that a NOC or IT team can handle them well.

Reviving an old thread, but we're currently reevaluating EventSource suppression logic.

Some of the other EventSource types already use a timeout-like mechanism to avoid duplicates, but we don't do anything like that for SNMP traps.

The general idea right now is to let the user decide which fields indicate a duplicate event, and suppress anything within the "effective interval" of the original alert. I think it makes sense to have the timer-reset logic be optional. I also like the idea of providing more visibility into how many events were suppressed.
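To make that a bit more concrete, the configuration surface we're picturing is roughly along these lines; the field names below are purely illustrative, not actual product settings:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SuppressionConfig:
    # Event fields whose values must all match for two events to be
    # considered duplicates of each other.
    duplicate_fields: List[str] = field(default_factory=lambda: ["host", "trap_oid"])
    # Suppress duplicates that arrive within the effective interval of the
    # original alert.
    effective_interval_minutes: int = 60
    # Optionally push the interval out again each time a duplicate is suppressed.
    reset_timer_on_duplicate: bool = False
    # Surface how many events were suppressed on the original alert.
    report_suppressed_count: bool = True
```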

We've also had a fair number of requests for a mechanism like the DataSource "trigger interval", where we only trigger an alert if we see the same event N times in the interval.
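For that part, the behavior we have in mind is roughly the following (again a sketch with made-up names, not a committed design):

```python
import time
from collections import defaultdict, deque

TRIGGER_COUNT = 3               # N matching events required...
TRIGGER_INTERVAL_SECONDS = 300  # ...within this window, before alerting

# event key -> timestamps of recent matching events
recent_events = defaultdict(deque)

def should_alert(event_key):
    now = time.time()
    window = recent_events[event_key]
    window.append(now)
    # Drop occurrences that have fallen out of the trigger interval.
    while window and now - window[0] > TRIGGER_INTERVAL_SECONDS:
        window.popleft()
    return len(window) >= TRIGGER_COUNT
```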

Anyway, any additional feedback is appreciated.

Syslog is another area where we have this issue. We have received thousands of duplicate alerts in the span of minutes and had to create special escalation chains just to throttle them properly when it happens.
