  • Administrators

Back at it again, we're having another Live Training Webinar to introduce some new functionality in your LogicMonitor portal. We would like to invite you to join us at 11am PST (1pm CST), Wednesday July 22nd, for another "What's new at LogicMonitor" webinar. This release is all about our Early Warning System with phase two of the Dynamic Thresholds feature. Specifically, you'll learn how you can reduce the administrative overhead of tuning alert thresholds while still getting alerts for the right metrics at the right time, in many cases with issues being caught sooner and more accurately.

Register here.

  • 2 weeks later...

Thank you for the presentation today. Great work. The features look fantastic and we want to implement them as soon as possible. Will we be able to get a copy of the recording? I'd like to watch it again to make notes for our teams.

Edited by Kevin W
  • Administrators

Q&A Transcript:

Q: How can Dynamic Thresholds minimize false-positive deltas caused by:
- Memory utilization of a system over time (normal increases in utilization vs. memory leakage), and sudden increases in memory utilization after a firmware/OS upgrade that vary widely from the normalized trend/reference point of utilization
- CPU jumps caused by turning on features and services, or by upgrades in a system
- Interface utilization jumps caused by our customers' overnight backups
A: We'll touch on how this is done over the next slide or so. Please let us know if it's not clear afterward and we can cover it in more detail or with specific context. Live answered at 17:33.

Q: Would Dynamic Thresholds also have seasonality buckets that are a week, two weeks, or a month long? For example, a client having a VERY busy Christmas week, or a VERY busy mid-January to mid-February.
A: Live answered at 32:58
Followup pending from Chris in Product management

Q: Is this a licensed feature or included with what we already have?
A: Dynamic thresholds are available to LogicMonitor Enterprise accounts

Q: What happens then if that's a pattern? Let's say we have clients doing backups during the weekend (which fills the disks temporarily over the weekend), or an interface that gets huge usage over the weekend. Would the AI learn that behavior and expect it over the weekend?

A: Yes that should fall under the “seasonality” data training - “Daily and weekly trends also factor into dynamic threshold calculations. For example, a load balancer with high traffic volumes Monday through Friday, but significantly decreased volumes on Saturdays and Sundays, will have expected data ranges that adjust accordingly between the workweek and weekends.”
https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints
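
To make the workweek/weekend idea concrete, here is a minimal sketch of seasonality-aware expected ranges. It is only an illustration under stated assumptions: the `bucket`, `seasonal_bands`, and `is_anomalous` helpers are hypothetical, and a mean plus or minus `k` standard deviations band is a stand-in, since the thread does not describe the algorithm LogicMonitor actually uses.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def bucket(ts: datetime) -> str:
    """Assign a timestamp to a coarse seasonality bucket (assumed: two buckets)."""
    return "weekend" if ts.weekday() >= 5 else "workweek"


def seasonal_bands(samples, k=3.0):
    """Build one expected range per bucket from (timestamp, value) samples.

    The band is mean +/- k standard deviations, an assumption made for
    illustration only.
    """
    grouped = defaultdict(list)
    for ts, value in samples:
        grouped[bucket(ts)].append(value)

    bands = {}
    for name, values in grouped.items():
        centre = mean(values)
        spread = k * pstdev(values)
        bands[name] = (centre - spread, centre + spread)
    return bands


def is_anomalous(ts, value, bands):
    """A value is anomalous if it falls outside the band for its own bucket."""
    lower, upper = bands[bucket(ts)]
    return value < lower or value > upper


# Hypothetical load balancer traffic: busy on weekdays, quiet on weekends.
history = [
    (datetime(2020, 7, 18), 110),   # Saturday
    (datetime(2020, 7, 20), 950),   # Monday
    (datetime(2020, 7, 21), 1000),  # Tuesday
    (datetime(2020, 7, 22), 1050),  # Wednesday
    (datetime(2020, 7, 25), 90),    # Saturday
    (datetime(2020, 7, 26), 100),   # Sunday
]
bands = seasonal_bands(history)
```

With these made-up numbers, a reading of 1000 is expected on a Monday but would be flagged on a Saturday, which is the adjustment the quoted documentation describes.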

Q: Can we specify/Configure Dynamic Thresholds by Device Groups? For example when a Group is a Client company for MSP
A: At the moment I believe they can only be configured at the Global or Instance level. Each resource would determine its own bounds for “normal” data. https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints
Chris Sternberg has informed me that device-group dynamic threshold configuration is planned for release in the near future.

Q: Our portal does not have Dynamic Thresholds in the Datapoint config pages. What is needed to enable this?
A: Dynamic Thresholds are available to LogicMonitor Enterprise accounts. If you DM me your portal name, I can check your subscription level if you’re unsure of it. https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints

Q: When setting dynamic thresholds, does the alert come across to the resolver groups clearly stating that it was delivered as a dynamic alert rather than a static one? Or would we need to add a value to be passed through the API to tools such as ServiceNow? Thanks!
A: The alert description will include a note that it is triggered via a dynamic threshold. Check out the Viewing Alerts for Dynamic Thresholds section here for some more detail. https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints

Q: How often is the "normal band" recalculated? For example, if a disk is filling up VERY slowly, 1.5 of "normal" might never be triggered, depending on when (or if) normal is recalculated.
A: Live answered at 35:24.

Q: Thanks!
A: Glad that we can help! Please feel free to reach out to our support team any time if you have additional questions.

Q: Using reports was mentioned on how to tune- would that be done in the same way as static thresholds? I.e. turn thresholds on and run alert reports to view number of alerts?
A: Live answered at 37:30.

Q: Do static thresholds remain in place when dynamic ones are enabled?  Use case:   I've set a 75% POE utilization for a switch,  and I would still want to know if that is exceeded but would the dynamic threshold be able to operate concurrently and trip an alert if the utilization dropped to 0?
A: Yes, with the initial release of Dynamic Thresholds, alerts still trigger at static thresholds, but the notification is suppressed unless the data is determined to be anomalous. There’s a more in-depth explanation under “Assigning both Static and Dynamic Thresholds to a Datapoint” here: https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints
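
As a rough illustration of the behavior described in this answer, the interaction between a static threshold and a dynamic band can be sketched as below. This is not LogicMonitor's implementation: the `Band` class, the `evaluate` function, and the sample numbers are all hypothetical, and the sketch assumes both thresholds sit at the same severity.

```python
from dataclasses import dataclass


@dataclass
class Band:
    """Expected ("normal") range learned for a datapoint (hypothetical type)."""
    lower: float
    upper: float


def evaluate(value: float, static_threshold: float, band: Band) -> dict:
    """Return whether an alert triggers and whether a notification is sent.

    Assumed rule, taken from the answer above: the static threshold still
    triggers the alert, but the notification is suppressed unless the value
    is also outside the expected band.
    """
    alert_triggered = value >= static_threshold
    anomalous = value < band.lower or value > band.upper
    return {
        "alert": alert_triggered,
        "notify": alert_triggered and anomalous,
    }


# PoE utilization at 80% against a 75% static threshold.
usual_for_this_switch = evaluate(80, 75, Band(lower=70, upper=90))
unusual_for_this_switch = evaluate(80, 75, Band(lower=10, upper=60))
```

In the first case the alert triggers but the notification is suppressed because 80% is normal for that switch. In the second case the same value is outside the band, so the notification goes out.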

Q: To follow up on the baseline creep question, how often is normal band recalculated. Is it continually, or daily, or weekly, etc. Maybe I missed (or misunderstood) the explanation?
A: Live answered at 39:20.

Q: How do we determine the number of alerts that were historically suppressed by the dynamic threshold? That data does not get captured in the alerts report.
A: Live answered at 40:41.

Q: Thanks. This is a problematic situation because I need to know if the top value is exceeded, but I simultaneously need to know if the value drops to 0 (this consistently indicates POE hardware failure in the switch). I haven't figured out how to accomplish this duality with existing static thresholds, considering that not all switches deliver POE normally; 0 is the expected value for them. I hope that explanation is understandable.
A: Live answered at 42:53.

Q: You mentioned that v2 learns faster. How much faster? Before it needed 3 days to figure out what is normal.
A: Live answered at 42:23.

Q: For the alert frequency report where do I find it - it does not appear in my Start a new report.  Can you show us on screen perhaps?
A: Live answered at 44:36.

Q: As an MSP, we've had Dynamic Thresholds enabled on thousands of endpoints across various datapoints since Phase One, which worked very well and as expected – basically for tuning alerts out on datasets that were not considered anomalous.  As Phase Two of Dynamic Thresholds went live, we encountered a large increase in alerts thrown that were still considered anomalies based off historical data, but well underneath our previously configured static thresholds.  The “upper bound” selection was selected by default on this move from Phase One to Phase Two.
Question is – is there a best practice for handling these previous datapoints that had dynamic thresholds moved from Phase One to Phase Two?
A: Live answered at 46:16.
Followup pending from Chris in Product management

Q: Thanks. Do we need to enable the dynamic thresholds first before being able to see the offsets options?
A: Those options should be available once the data has trained (i.e. after Dynamic Thresholds have been enabled). At that point there is a Dynamic Threshold Advanced Config option.

Q: How long would a datapoint have to be at an “abnormal” level before it would become the new “normal” level?  Would an existing alert clear when this happens?
A: Live answered at 48:09.

Q: Followup: What determines "as little as" 13 hours?
A: Live answered at 48:55

Q: Does it look at historical data for the instance, say, if we have just turned it on?
A: Live answered at 51:21

Q: These are very helpful and we find them valuable. Please continue these webinars.
A: We’re glad you enjoy them! Thanks for joining us and feel free to reach out to our support team with any other questions you might have.

Q: Thanks! Great webinar.
A: Thanks for joining! Feel free to reach out to our support team anytime if you have additional questions.


Is there a plan to add the ability to manage dynamic thresholds at the group level?

Right now it seems to be all-or-nothing -- either we have to modify the datasource and apply it everywhere or we have to modify every instance we want to be dynamic individually.

It would be great if we could apply different thresholds for different groups (i.e. MSP customers).


Also, what happens if you have, say, a dynamic threshold set for disk usage and the usage grows very slowly and steadily.  If I also have a static threshold set for when the disk is dangerously full will it still send notifications or will they be suppressed since the value isn't anomalous?

  • Administrators
37 minutes ago, David Good said:

Is there a plan to add the ability to manage dynamic thresholds at the group level?

Right now it seems to be all-or-nothing -- either we have to modify the datasource and apply it everywhere or we have to modify every instance we want to be dynamic individually.

It would be great if we could apply different thresholds for different groups (i.e. MSP customers).

Most definitely.

  • LogicMonitor Staff
18 hours ago, David Good said:

Also, what happens if you have, say, a dynamic threshold set for disk usage and the usage grows very slowly and steadily.  If I also have a static threshold set for when the disk is dangerously full will it still send notifications or will they be suppressed since the value isn't anomalous?

If the usage is slowly and steadily increasing, it's possible that the band will adjust to that usage. You can avoid suppression by putting the static and dynamic thresholds at separate severities (e.g. a static threshold for critical severity and a dynamic threshold for warning or error). Metrics for which you know that certain good and bad ranges always hold true (e.g. above 90% is always bad, below 90% is always good) are sometimes a better fit for static thresholds. Dynamic Thresholds are most useful when it is difficult or impossible to identify a range that applies globally across all instances (e.g. above 10MB/s may be bad for instance A, whereas for instance B the limit should be above 100MB/s).
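
The slow-drift case can be sketched with a small simulation. This is only an illustration under assumptions: the band here is a trailing-window mean plus or minus three standard deviations, which is a stand-in and not the algorithm LogicMonitor uses, and `rolling_band`, `simulate`, and all the numbers are hypothetical.

```python
from statistics import mean, pstdev


def rolling_band(history, k=3.0):
    """Expected range from a trailing window: mean +/- k standard deviations."""
    centre = mean(history)
    spread = k * pstdev(history)
    return centre - spread, centre + spread


def simulate(start=50.0, step=0.1, samples=500, window=48, static_critical=90.0):
    """Disk usage (%) grows by `step` per sample; count hits for each threshold type."""
    values = [start + step * i for i in range(samples)]
    dynamic_hits = 0
    static_hits = 0
    for i in range(window, samples):
        lower, upper = rolling_band(values[i - window:i])
        if not (lower <= values[i] <= upper):
            dynamic_hits += 1  # value left the learned band
        if values[i] >= static_critical:
            static_hits += 1   # value crossed the fixed critical threshold
    return dynamic_hits, static_hits


dynamic_hits, static_hits = simulate()
```

In this sketch the band keeps following the slow growth, so the dynamic check never fires, while the fixed critical threshold still catches the disk once it passes 90%. That is the reason for keeping a static threshold at its own severity for this kind of metric.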


If I understand correctly, when initially enabling dynamic thresholds there is a possibility that false positives will result in alarms, at the very least while the algorithm is detecting patterns. Would it then be reasonable to keep alarms disabled for at least the first 24 hours if possible (or for whatever period appears to form a pattern based on an alarms report)?


In the instance threshold menu, you can see a graph simulating the initial offset in blue/grey before you hit save. If you see a lot of red in there, it might not be a good idea to enable dynamic thresholds yet, since those will be your alerts. 

 
