Search the Community

Showing results for tags 'alerts'.

Found 53 results

  1. This is not an advertisement by any means, just an offer to help anyone else who struggles with this. As an MSP, we have struggled with how to handle alert tuning in bulk when it comes to things like interfaces (instances). Some interfaces you want to alarm as critical, some as error, and others you don't care about at all. LM provided a partial fix for that with their Groovy-based "Status" alarm based on the interface description, but it didn't go far enough. We started creating manual interface groups called "Critical" and performing Alert Tuning on that "parent", only to find out that it doesn't work as interfaces move in and out of it. I was beyond disappointed, but it said so right at the top of the page: "Changes made to Alerting or Thresholds will only affect existing instances currently in this Instance Group. Instances added later will not be subject to the changes." Anyway, long story short, we finally decided to write our own application to do it and built it in Azure. We built it to handle multiple data sources so we could group other instances (like VMware vDisks) and make the same bulk changes. It was written to be a data source in your environment, so that you can apply it to whatever devices you want and just call out to the API with the device name. If you have any interest in using it, let me know. There are costs associated, as Azure bills based on usage, but they are pretty small for us (< $200/mo). Trust me, I wish LM solved this without us having to write the app!
  2. So is there no way to easily export the alerts page? Am I missing something, or is there really no way to export the alerts I need to share with folks to a CSV, PDF, etc.? Why is there no way to do this on the main Alerts page? Also, is there any way to list or show the actual alert details? I don't want to have to click on every alert to view its details. Is there a way to show this as one of the columns? So I'm looking for a way to export the alerts I am viewing after creating my filter, and to include the alert message details as well. Please don't tell me I have to use the horrific "Reports" section and build a report based on the "Alerts" template to export alerts. As far as I can tell, that doesn't even show the alert message details, which I need included in the export/report.
  3. The ability to modify alert notes en masse, the same way you can acknowledge multiple alerts at once, would be a nice thing to have. When multiple (20+) alerts have an incorrect note put into them, it is time consuming to go back and manually fix them one by one. I see that in the new UI you can mass tag them, but when you go to modify the note it tells you they're already acknowledged. Thanks!
  4. We have begun implementing a tagging standard in our cloud accounts to better control discovered resources and route alerts accordingly. I would like to be able to route alerts by default based on the value of a tag. I'm aware that I can already set up specific users and then achieve exactly what I'm requesting, but I would much prefer to have a blanket rule that uses the tag's value as the recipient email address(es) directly. Some examples below: See the screenshots below for a visual example of how I'd like to structure this automation.
  5. I noticed that the requests rights to "show notifications". It would be awesome if our LogicMonitor instances could do the same and that we could configure in settings, so that we could push style notifications to our workstations on alerts.
  6. A feature enhancement that enables alerts to be limited to certain days of the week, as well as hours/mins, would be very beneficial, as there are often occasions when an alert is needed in the working week but not at the weekend. An example is NetApp snapmirror lagtime. Mon-Sat these are set to replicate, but not on a Sunday. We look for 24 hour lag most of the time to see an issue, but on a Monday this would be 48 hours (as there would have been no snapmirror since the Sat). I appreciate I can create ways to manage alerts using time-based escalations; however, there is no way to affect the alerts view on the dashboard with this approach. Hopefully this is something that others might also want and that can be added in the future?
  7. For JDBC datasources, please create a token that would enable us to include the JDBC driver exception message in the alert for Query Status datapoint alerts, the ones that are based on: Query status - 1=ok, 2=credential invalid, 3=connection string invalid, 4=connection rejected, 5=driver not supported, 6=connection failure, 7=query failure. This would greatly help us achieve faster time to resolution of incidents when the status code is 6 or 7.
  8. Windows Drive Space Alerts

    By default, LogicMonitor alerts on the percentage used on any drive. In general this is fine, but sometimes not. Let's imagine you have a 2.2 terabyte drive. You might have your critical threshold set at 90%, which sounds fine, until you realise that you are going to get a critical alert when you still have 220 GB free. In my case that would be a cause for some celebration, not really an urgent need to get up at 3 A.M. and delete files so the world doesn't end.

    Now imagine your 2.2 TB drive is divided up as:

        C: 10 GB (OS)
        D: 500 GB (Mission critical applications)
        E: 1 TB (Backups)
        F: 510 GB (Other Applications)

    A 90% alert will give you a critical at 1 GB, 50 GB, 100 GB and 51 GB free respectively. The C: drive may be a cause for concern, but the others not so much. For the two application drives you might only be concerned if they have less than 4 GB free, and for the backup drive less than 10 GB. So, we decide to alert on the following:

        C: freespace is <1 GB
        D: freespace is <4 GB
        E: freespace is <10 GB
        F: freespace is <4 GB

    You could clone the datasource so you have four copies, one for each drive, but this is harder to maintain in the future and does not scale well. It would be better if you could somehow get the drive letter and assign a threshold based on that. LogicMonitor's scripted complex datapoint using Groovy to the rescue.

    The disks datasource queries the class Win32_Volume. We need to use the raw drive letter output from the WMI class, so we would write a Groovy script like:

        drive = output["DRIVELETTER"];
        return(drive);

    This returns C:, D:, E: and F:, which is not much use, as LogicMonitor doesn't deal with text, only metrics. Let's beef up the script:

        // Map each drive letter to its free-space lower limit in GB.
        drive = output['DRIVELETTER'];
        freeSpaceLowerLimitGigabyte = '0';
        if (drive == 'C:') { freeSpaceLowerLimitGigabyte = '1'; }
        if (drive == 'D:' || drive == 'F:') { freeSpaceLowerLimitGigabyte = '4'; }
        if (drive == 'E:') { freeSpaceLowerLimitGigabyte = '10'; }
        return freeSpaceLowerLimitGigabyte;

    This returns 1, 4, 10 and 4 for the four drives, so now we have a complex datapoint (FreeSpaceLowerLimitGigabyte) that returns the lower limit in GB for each drive, dependent on the drive letter. We can't alert on this directly either, so we need another datapoint that checks whether FreeSpace is less than FreeSpaceLowerLimitGigabyte. To do that, create a CapacityAlert datapoint using this expression:

        if ( lt (FreeSpace, FreeSpaceLowerLimitGigabyte * 1024000000) , 1, 0)

    This breaks down as: if free space is less than the assigned limit for that drive letter, return 1 (which you alert on); otherwise return 0. With the alert threshold set at = 1 1 1, we get critical alerts if:

        C: freespace is <1 GB
        D: freespace is <4 GB
        E: freespace is <10 GB
        F: freespace is <4 GB
  9. Hi I'm pretty new to LM and am struggling with the big number widget. I have a need to show alert counts for a specific subscription, showing new (unacknowledged/cleared) alerts and then show some history i.e. unacknowledged/cleared over last 7 days, current month etc. Any guidance appreciated
  10. We've been seeing an issue where we get a critical alert, we are notified through our escalation chains, and we acknowledge the alert. However, the action we take to resolve the alert is only enough to drop the severity to error or warning, not to clear it entirely. If that alert crosses the critical threshold again, it shows up as acknowledged from the first time it went critical, which prevents all notification. For example, we have a threshold for percent used on a volume at >=90 95 98. The volume hits 98%, we are notified and ack the alert, but are only able to clear enough space to bring the volume down to 92%. If that volume hits 98% again, it will show up as already acknowledged, which prevents all notifications (see below). This is the expected behavior according to LM, but I don't see a benefit in this behavior, and it seems risky if you expect to get alerted any time a threshold is crossed. We'd like to be able to receive a notification any time an alert crosses a threshold, regardless of whether it has been acknowledged at a higher severity for that alert "instance."
  11. I have a device group that has the same datasource applied. This datasource auto-discovers and will spin up matching instances across all devices in the group. I would like to have clustered alerts based on the matched instances across all devices in the group. For example, (pardon the ASCII-like visualization)

        ClusterGroup
        |__ Device1
        |   |__ DatasourceA
        |       |_ Instance_ABC
        |          |_ Datapoint_I
        |          |_ Datapoint_II
        |       |_ Instance_DEF
        |          |_ Datapoint_I
        |          |_ Datapoint_II
        |       |_ Instance_GHI
        |          |_ Datapoint_I
        |          |_ Datapoint_II
        |__ Device2
        |   |__ DatasourceA
        |       |_ Instance_ABC
        |          |_ Datapoint_I
        |          |_ Datapoint_II
        |       |_ Instance_DEF
        |          |_ Datapoint_I
        |          |_ Datapoint_II
        |       |_ Instance_GHI
        |          |_ Datapoint_I
        |          |_ Datapoint_II
        |__ Device3
            |__ DatasourceA
                |_ Instance_ABC
                   |_ Datapoint_I
                   |_ Datapoint_II
                |_ Instance_DEF
                   |_ Datapoint_I
                   |_ Datapoint_II
                |_ Instance_GHI
                   |_ Datapoint_I
                   |_ Datapoint_II

    If Instance_ABC's Datapoint_I is alerting at the specified cluster threshold in my hypothetical group, I want to generate a cluster alert. If some time afterwards, the situation in my environment gets worse and Instance_GHI's Datapoint_II is alerting at the specified cluster threshold, I want another cluster alert for that instance-datapoint as well.
  12. I took a working groovy script datasource and am now trying to adjust it to some needs we have. This data will end up giving us alert totals for each month so we can build reports. Any ideas? Here is what I have so far.

        import javax.crypto.Mac;
        import javax.crypto.spec.SecretKeySpec;
        import org.apache.commons.codec.binary.Hex;
        import groovy.json.JsonSlurper;

        //define credentials and url
        def accessId = hostProps.get('');
        def accessKey = hostProps.get('lmaccess.key');
        def account = hostProps.get('lmaccount');
        def alertgroup = hostProps.get('');

        def collectionFailures = 0
        def failures = [:]

        def client = new LogicMonitorRestClient(accessId, accessKey, account, this.&println)

        try {
            def alerts = client.get("/device/groups/" + alertgroup + "/alerts", fields: "severity", filter: "startEpoch>:1538370000,endEpoch<:1541048399,cleared:*")
            //warnings = alerts.findAll {it.severity == 2}.size()
            println "WarningCount: ${alerts.findAll {it.severity == 2}.size()}"
            println "ErrorCount: ${alerts.findAll { it.severity == 3 }.size()}"
            println "CriticalCount: ${alerts.findAll { it.severity == 4 }.size()}"
            println "TotalAlerts: ${alerts.size()}"
        } catch (Throwable e) {
            failures["alerts"] = e.toString()
            collectionFailures += 1
        }

        // Do error reporting
        println "CollectionFailures:${collectionFailures}"
        failures.each{ query, exception ->
            println "Exception while querying $query:"
            println exception
        }

        return 0

        //////////////////////
        // HELPER FUNCTIONS //
        //////////////////////
        class LogicMonitorRestClient {
            String userKey
            String userId
            String account
            int maxPages = 20
            int itemsPerPage = 1000
            def println

            LogicMonitorRestClient(userId, userKey, account, printFunction) {
                this.userId = userId
                this.userKey = userKey
                this.account = account
                this.println = printFunction
            }

            def generateHeaders(verb, path) {
                def headers = [:]
                def epoch = System.currentTimeMillis()
                def requestVars = verb + epoch + path

                // Calculate signature
                def hmac = Mac.getInstance('HmacSHA256')
                def secret = new SecretKeySpec(userKey.getBytes(), 'HmacSHA256')
                hmac.init(secret)

                // Sign the request
                def hmac_signed = Hex.encodeHexString(hmac.doFinal(requestVars.getBytes()))
                def signature = hmac_signed.bytes.encodeBase64()

                headers["Authorization"] = "LMv1 " + userId + ":" + signature + ":" + epoch
                headers["Content-Type"] = "application/json"
                return headers
            }

            def packParams(params) {
                def pairs = []
                params.each{ k, v -> pairs << ("${k}=${v}")}
                return pairs.join("&")
            }

            // Non paginating, raw version of the get function
            def _rawGet(path, params) {
                def baseUrl = 'https://' + account + '' + '/santaba/rest' + path
                def packedParams = ""
                if(params) {
                    packedParams = "?"+packParams(params)
                }
                def query = baseUrl+packedParams
                def url = query.toURL()
                def response = url.getText(useCaches: true, allowUserInteraction: false, requestProperties: generateHeaders("GET", path))
                return response
            }

            // Public interface for getting stuff.
            def get(Map args=[:], path) {
                def itemsReceived = []
                def pageReads = 0

                // Impose our own paging parameters.
                args.size = itemsPerPage
                args.offset = 0

                while(true) {
                    // Do da nastieh
                    def response = new JsonSlurper().parseText(_rawGet(path, args))
                    if (response.errmsg == "OK") {
                        // Catch individual items
                        if ( == null) {
                            return
                        }
                        itemsReceived +=

                        // Check if there are more items
                        // if ( > itemsReceived.size())
                        // {
                        args.offset = args.size + args.offset
                        // }
                        // else
                        // {
                        //     break // we are done
                        // }
                    } else {
                        // Throw an exception with whatever error message we got.
                        throw new Exception(response.errmsg)
                    }

                    pageReads += 1

                    // Check that we don't exceed max pages.
                    if (pageReads >= maxPages) {
                        break
                    }
                    if ( > 0) {
                        break
                    }
                }
                return itemsReceived
            }
        }

    If I run the URL with the API creds in my test powershell script, it works perfectly. When I test it in LM as a datasource, I get the attached error.
  13. Hi, I already raised this with LogicMonitor via email, but am just reiterating it here. For some datapoints where we want to generate warning/error/critical alerts, you can use the collection interval and the alert trigger interval to set the amount of time that must elapse before a datapoint threshold triggers an alert. But it is currently not possible to set a completely custom interval based on duration. For example, if I want to generate a warning alert after 3 hours and an error alert after 4 hours, I have to use a combination of the two settings above to get close enough to the duration I want. It would be great if you could, regardless of the collection interval, have more options in the alert trigger interval (currently 1 to 10, and 20, 30, 40, 50, 60). So, if I have a collection interval of 5 minutes, I can currently achieve 2.5 hours or 5 hours using an alert trigger interval of 30 or 60 respectively. Couldn't there be a regular number input rather than a drop-down with predefined options for the alert trigger interval? Or a separate option that allows a completely flexible duration? Also, can a custom interval already be set using the API, regardless of the UI? If so, I could try that. If there's another way to achieve what I want, I would be happy to hear it. :-) Thanks, Roland
  14. Hey All, Couldn't find a way to do this using the alert tokens available so I figured I would post it here. I noticed some cool features from other monitoring tools that allow graphs to be sent in the alert body to PagerDuty. So when I receive a PagerDuty page from LM it would be nice to see the associated graph with the data point that is alerting. While just the alert text is good enough for most scenarios I think seeing how big of a jump/spike the data point made before alerting would be useful. The alert "C drive is 90% full" is all fine and good but when you see a graph showing it go from 12% to that 90% in just a minute or two then you know something is really up and might need more expediency as it will probably continue to fill up at that rate.
  15. Archana

    LM - alerts

    When I went through the documents regarding Alerts, I understood that the threshold for a particular metric/datapoint can be set in the Datasource itself. I have made an example alert and got a clear idea of the concept. I am unsure whether we can create an alert for a widget. For example, I have a Gauge widget that shows CPU percentage. How can I set an alert on that particular widget alone if my CPU percentage crosses 60% or 90%?
  16. Hello all, Recently, I mounted a RHEL ISO on a /data/rhel_iso directory, on a system that is monitored with LogicMonitor. 5 minutes later I received an alert about 105% utilization of /data/rhel_iso, which is understandable but strange, as an ISO takes the same space as the files inside it. When I unmounted the ISO, I got an alert about a filesystem that is not responding. How can I disable those ISO-related alerts? They are irrational. Many thanks in advance, Szymon
  17. IFTTT is a free SaaS platform that helps you "do more with all your apps and devices" - by providing an integration point between commonly used services and platforms. In the following example, we're using the IFTTT Applet webhooks "trigger" to activate a Philips Hue wireless lighting "action" - blinking the lights of the connected Hue platform as a result of a LogicMonitor alert!

    Other things you might be able to do with LogicMonitor alerts, through IFTTT (lots of untested possibilities!):

      • Change lighting colors based on alert status (red for new, green for cleared, etc.)
      • Receive alert notifications to connected systems like Skype, Twitter, Evernote, or Google.
      • Play music on a connected Sonos system after triggering an alert.
      • Turn on a connected Smart Plug like the Wemo from Belkin.

    The Finished Result

    The following tutorial assumes that you have an IFTTT account created and permissions to add an integration to your LogicMonitor account.

    Step 1: Log into your IFTTT account and create a new 'Applet'.
    Step 2: Search for and choose the 'Webhooks' service.
    Step 3: Choose the 'Receive a Web Request' trigger.
    Step 4: Configure (and remember) the event name that will be recognized by the incoming webhook to trigger the event.
    Step 5: Configure the 'Action' that will be taken when this event is triggered in IFTTT - lots of intriguing possibilities!
    Step 6: Once you've added and configured the 'Action,' review the applet settings and click 'Finish' to save the Applet.
    Step 7: Select 'Services' from the account dropdown - we will be looking up the incoming webhook URL for our account so we know where to send our alerts.
    Step 8: Search for the 'Webhooks' service and select it to proceed.
    Step 9: Select the 'Documentation' link from the 'Webhooks' services page.
    Step 10: Copy the incoming Event trigger URL along with the key for your account. You will replace {event} in the URL with the one you configured above.
    Step 11: Moving to your LogicMonitor account, navigate to 'Settings -> Integrations' and add a new 'Custom HTTP Delivery' integration using the event name from Step 4 and the URL (with key) from Step 10.
    Step 12: IFTTT allows you to include an (optional!) payload - which will show in the 'Activity Log' of the IFTTT Applet.
    Step 13: Test Alert Delivery and you should see output similar to below in the IFTTT Activity Log.
    Step 14: Save your integration, assign it to an Escalation Chain, and assign the Escalation Chain to an Alert Rule - and now we've configured a simple integration between LogicMonitor and IFTTT that could form the basis of a handful of interesting alert actions!
  18. The majority of the alerts we get every day are for high volume usage, and sometimes it's hard to work with these because you don't know how large the volume is. For example, if you are working on a 50 TB system and your threshold is set at 90%, you will begin to receive alerts when you still have 5 TB left. Would it be possible to have a feature that would allow you to see the size of the drive and set the alert in GB instead of as a percentage? This would allow for faster tuning of thresholds on drive alerts.
  19. Currently, the hyperlink in an alert notification email requires that users have permission to view the all Alerts view. We don't want users to have access to this view. Please make it so that the notification hyperlink to ack an alert works without the need for this permission.
  20. Hello all, I am trying to get updates on alerts that span multiple days. So, our normal code will grab new alert data for today, let's say. However, we need to go back to old alerts and see if they have been resolved yet, so we can do accurate reporting. What I want to do is provide a list of the IDs for a handful of alerts and get more than 1 row back. Any ideas on how I can formulate an alert query so that I can get 2 rows back for, say ID=DS1234567 and ID=1234568 in the same request? Here is my example call:,id:DS1472582 My hope is to combine a batch of calls so I don't flood the service and so I can get results faster. Given a list of unique IDs, any thoughts on requesting a batch? E
  21. One of the biggest frustrations for us with LogicMonitor is breaking a bunch of dashboards and alerts if we move device groups to another location in the overall device group tree. For example: Say we have a nested device group called "Infrastructure/Hosts". Now our environment has changed a bit, and we want to add better organization to support the new changes. We move the hosts group to the following location: "Infrastructure/PhysicalDevices/Hosts". All alert rules and dashboards that were filtering on "Infrastructure/Hosts" have now been broken, even though the devices in the group need the same alerting and dashboards. Now we have to go through and fix each Alert Rule and Dashboard Widget that used "Infrastructure/Hosts" to point to "Infrastructure/PhysicalDevices/Hosts". As you can imagine, as environments scale up and evolve, subgroups are going to be moved around all the time. Redoing dashboards and alerts every time this happens adds a tremendous amount of labor, and can lead to people missing changes, leaving behind broken alerts or dashboards that you may not find out about until an emergency has already happened. What we're proposing: "Sticky" device group handling - If a group or subgroup used in an Alert Rule or Dashboard changes location, the new location should automatically be reflected in that Alert Rule or Dashboard. This is how most modern applications handle this sort of thing anyway, and it's a huge time saver. Given the critical nature of this tool's function, this would go a long way towards preventing accidentally breaking monitoring that companies rely on to keep their environments running.
  22. Hi to all: I have problems with "StatusFlap" type alerts. The platform sends many messages to my email, so does anyone know if they are false positives? My network devices have no problems on the interfaces. How can I stop this? Let me know please, kind regards, Iván Martínez
  23. When we reboot a Server or an Application Set, our NOC does not know all the Devices, Instances and/or Services impacted, so we get flooded with alerts for a known event. Example: I need to reboot device WebServer-xyz. The Server, the Switch ports, Storage Sessions and HTTP/S Service are all monitored in LM. We'd like to be able to SDT just these items with one SDT, and not entire switches or devices. So we'd like to be able to create an "Alert-Group", e.g. "WebServer-xyz", where you can add Instances from multiple devices, entire Devices, Services, Instance Groups, Device Groups, i.e. anything defined in LM. Then just add one SDT to the Alert-Group, aka one-stop shopping.
  24. Currently, Alert sounds are persistent even when an instance is placed in SDT. We need functionality added to suppress all alert sounds while an instance is in SDT.
  25. Looking at the alerts list in my account, the scroll bars exist, but in order to see them I have to scroll down. Try determining the escalation chain (rightmost column) on the 3rd row. You will have to scroll down to see the horizontal scroll bar, which moves the 3rd row off screen. Then, after you scroll right and scroll back up, you no longer see the name, as the 1st column has moved off screen. The scroll bars (and column headers) should always be visible, allowing me to scroll without losing my train of thought.
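
Result 8 above (Windows Drive Space Alerts) chains two datapoints: a per-drive free-space floor and a 0/1 flag that is alerted on. The sketch below replays that logic in Python so candidate floors can be checked against sample values before the datapoints are built. It is not LogicMonitor code: `LOWER_LIMIT_GB`, `BYTES_PER_GB`, `lower_limit_gb`, and `capacity_alert` are hypothetical names, and the multiplier 1024000000 is copied from the post's expression (it sits between 10^9 and 2^30, so swap it if an exact GB or GiB is needed).

```python
# Per-drive free-space floor in GB, mirroring the post's Groovy complex datapoint.
LOWER_LIMIT_GB = {"C:": 1, "D:": 4, "E:": 10, "F:": 4}

# Multiplier taken verbatim from the post's CapacityAlert expression (assumption:
# FreeSpace is reported in bytes).
BYTES_PER_GB = 1024000000


def lower_limit_gb(drive_letter: str) -> int:
    """Return the floor for a drive; unknown drives get 0, so they never alert."""
    return LOWER_LIMIT_GB.get(drive_letter, 0)


def capacity_alert(drive_letter: str, free_space_bytes: int) -> int:
    """Return 1 when free space is below the drive's floor, else 0."""
    floor_bytes = lower_limit_gb(drive_letter) * BYTES_PER_GB
    return 1 if free_space_bytes < floor_bytes else 0


if __name__ == "__main__":
    samples = [
        ("C:", 500_000_000),
        ("D:", 50_000_000_000),
        ("E:", 9_000_000_000),
        ("G:", 1),
    ]
    for drive, free in samples:
        print(drive, capacity_alert(drive, free))
```

Keeping the floors in one lookup table means a new drive letter is a one-line change, which is the same maintenance argument the post makes against cloning the datasource per drive.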
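
The `generateHeaders` helper in result 12 above signs each request by hex-encoding an HMAC-SHA256 of `verb + epoch + path` and then Base64-encoding that hex string. When a datasource script fails but a standalone script works, reproducing the header outside LogicMonitor can help narrow down where the two differ. This is a minimal Python sketch of the same steps as that Groovy helper, for GET requests only; the function name `lmv1_auth_header` and the sample credentials are hypothetical.

```python
import base64
import hashlib
import hmac
import time


def lmv1_auth_header(access_id, access_key, verb, resource_path, epoch_ms=None):
    """Build the Authorization value the way the post's generateHeaders does."""
    # Epoch in milliseconds, as a string; injectable so results are reproducible.
    epoch = str(int(time.time() * 1000) if epoch_ms is None else epoch_ms)

    # String to sign: verb + epoch + path. Query parameters are NOT included;
    # the post's script signs only the path and appends parameters to the URL.
    request_vars = verb + epoch + resource_path

    # HMAC-SHA256 keyed with the access key, rendered as lowercase hex.
    digest_hex = hmac.new(
        access_key.encode(), request_vars.encode(), hashlib.sha256
    ).hexdigest()

    # The hex string itself (not the raw digest bytes) is Base64-encoded.
    signature = base64.b64encode(digest_hex.encode()).decode()

    return "LMv1 " + access_id + ":" + signature + ":" + epoch


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(lmv1_auth_header("exampleId", "exampleKey", "GET", "/device/groups/1/alerts"))
```

Passing a fixed `epoch_ms` makes the output deterministic, so the same inputs can be fed to another implementation and the two headers compared character by character.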
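
The arithmetic in result 13 above (a 5 minute collection interval with trigger intervals of 30 and 60 giving 2.5 and 5 hours) can be tabulated to see which durations are reachable today. The sketch below assumes, as the post does, that the delay before an alert is simply collection interval multiplied by alert trigger interval; `ALLOWED_TRIGGER_INTERVALS`, `reachable_durations`, and `closest_setting` are hypothetical names, and the allowed values are the ones the post lists.

```python
# Drop-down values quoted in the post: 1 to 10, then 20, 30, 40, 50, 60.
ALLOWED_TRIGGER_INTERVALS = list(range(1, 11)) + [20, 30, 40, 50, 60]


def reachable_durations(collection_interval_min):
    """Map each allowed trigger interval to the resulting delay in minutes."""
    return {n: n * collection_interval_min for n in ALLOWED_TRIGGER_INTERVALS}


def closest_setting(collection_interval_min, target_min):
    """Return (trigger_interval, delay_min) nearest to the target delay.

    Ties are broken in favour of the smaller trigger interval.
    """
    durations = reachable_durations(collection_interval_min)
    return min(
        durations.items(),
        key=lambda item: (abs(item[1] - target_min), item[0]),
    )


if __name__ == "__main__":
    for target_hours in (3, 4):
        interval, delay = closest_setting(5, target_hours * 60)
        print(f"target {target_hours} h: trigger interval {interval} gives {delay} min")
```

Under that assumption, a 5 minute collection interval cannot hit exactly 3 or 4 hours with the listed values, which is the gap the post is asking LogicMonitor to close.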
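
Steps 10 to 12 of result 17 above come down to a URL containing the event name and account key, plus an optional JSON payload. The sketch below only builds that request so the pieces can be inspected before they are pasted into the 'Custom HTTP Delivery' integration; it sends nothing. The URL pattern and the `value1`/`value2`/`value3` payload keys reflect my understanding of the IFTTT Webhooks documentation page and should be checked against the URL copied in Step 10. `build_ifttt_request` is a hypothetical helper, and the `##HOST##`, `##LEVEL##`, and `##DATAPOINT##` tokens are assumed examples of LogicMonitor alert tokens.

```python
import json
from urllib.parse import quote

# Assumed URL shape; use the exact URL shown on your Webhooks documentation page.
IFTTT_URL_TEMPLATE = "https://maker.ifttt.com/trigger/{event}/with/key/{key}"


def build_ifttt_request(event, key, value1="", value2="", value3=""):
    """Return (url, headers, body) for an IFTTT Webhooks trigger request."""
    url = IFTTT_URL_TEMPLATE.format(
        event=quote(event, safe=""),  # event name from Step 4
        key=quote(key, safe=""),      # account key from Step 10
    )
    headers = {"Content-Type": "application/json"}
    # Optional payload (Step 12); it appears in the Applet's Activity Log.
    body = json.dumps({"value1": value1, "value2": value2, "value3": value3})
    return url, headers, body


if __name__ == "__main__":
    url, headers, body = build_ifttt_request(
        "lm_alert", "YOUR_KEY", "##HOST##", "##LEVEL##", "##DATAPOINT##"
    )
    print(url)
    print(headers)
    print(body)
```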