Search the Community

Showing results for tags 'alerts'.





Found 47 results

  1. David Lee

    Windows Drive Space Alerts

    By default, LogicMonitor alerts on the percentage used on any drive. In general this is fine, but sometimes it is not. Imagine you have a 2.2 TB drive. You might have your critical threshold set at 90%, which sounds fine until you realise that you are going to get a critical alert while you still have 220 GB free. In my case that would be cause for some celebration, not really an urgent need to get up at 3 A.M. and delete files so the world doesn't end.

    Now imagine your 2.2 TB drive is divided up as:

    C: 10 GB (OS)
    D: 500 GB (Mission-critical applications)
    E: 1 TB (Backups)
    F: 510 GB (Other applications)

    A 90% threshold will give you a critical at 1 GB, 50 GB, 100 GB and 51 GB free, respectively. The C: drive may be a cause for concern, but the others not so much. For the two application drives you might only be concerned if they have less than 4 GB free, and for the backup drive less than 10 GB. So we decide to alert on the following:

    C: free space is <1 GB
    D: free space is <4 GB
    E: free space is <10 GB
    F: free space is <4 GB

    You could clone the datasource so you have four copies, one for each drive, but this is harder to maintain in the future and does not scale well. It would be better if you could somehow get the drive letter and assign a threshold based on that. LogicMonitor's scripted complex datapoint, using Groovy, to the rescue.

    The disks datasource queries the WMI class Win32_Volume. We need to use the raw drive letter output from the WMI class, so we would write a Groovy script like:

    ```groovy
    Drive = output["DRIVELETTER"];
    return(Drive);
    ```

    This returns C:, D:, E: and F:. Not much use on its own, as LogicMonitor doesn't deal with text, only metrics. Let's beef up the script:

    ```groovy
    drive = output['DRIVELETTER'];
    freeSpaceLowerLimitGigabyte = '0';
    if (drive == 'C:') { freeSpaceLowerLimitGigabyte = '1'; }
    if (drive == 'D:' || drive == 'F:') { freeSpaceLowerLimitGigabyte = '4'; }
    if (drive == 'E:') { freeSpaceLowerLimitGigabyte = '10'; }
    return freeSpaceLowerLimitGigabyte;
    ```

    This returns 1, 4, 10 and 4 for the respective drives. We now have a complex datapoint that returns the lower limit in GB for each drive, dependent on the drive letter. Again, we can't alert on this directly, so we need another datapoint to check whether free space is less than freeSpaceLowerLimitGigabyte. To do that, create a CapacityAlert datapoint using this expression:

    if ( lt (FreeSpace, FreeSpaceLowerLimitGigabyte * 1024000000) , 1, 0)

    This breaks down as: if free space is less than the assigned limit for that drive letter, return 1 (which you alert on); otherwise return 0. With the alert threshold set at "= 1 1 1", we get critical alerts if:

    C: free space is <1 GB
    D: free space is <4 GB
    E: free space is <10 GB
    F: free space is <4 GB
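As an aside, the if-chain above is really a lookup table keyed on drive letter. A minimal Python sketch of the same logic (limits taken from the example above; the real datapoint would stay in Groovy inside the datasource):

```python
# Per-drive free-space lower limits in GB, mirroring the example thresholds.
LIMITS_GB = {"C:": 1, "D:": 4, "E:": 10, "F:": 4}

def capacity_alert(drive, free_bytes):
    """Return 1 if the drive is below its limit (alert), else 0."""
    # The post multiplies by 1024000000 to approximate bytes per GB;
    # the same constant is used here for consistency.
    limit_bytes = LIMITS_GB.get(drive, 0) * 1024000000
    return 1 if free_bytes < limit_bytes else 0
```

A drive letter not in the table gets a limit of 0 and therefore never alerts, matching the '0' default in the Groovy version.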
  2. Graham Bell

    Alert Count in Big Number widget

    Hi, I'm pretty new to LM and am struggling with the Big Number widget. I need to show alert counts for a specific subscription: new (unacknowledged/cleared) alerts, and then some history, i.e. unacknowledged/cleared over the last 7 days, the current month, etc. Any guidance appreciated.
  3. We've been seeing an issue where we get a critical alert, we are notified through our escalation chains, and we acknowledge the alert. However, the action we take to resolve the alert is only enough to drop the severity of the alert to error or warning, not clear it entirely. If that alert crosses a critical threshold again, it will show up as acknowledged from the first time it went critical, which will prevent all notification. For example, we have thresholds for percent used on a volume at >=90 95 98. The volume hits 98%, we are notified and ack the alert, but we are only able to clear enough space to drop the volume down to 92%. If that volume hits 98% again, it will show up as already acknowledged, which prevents all notifications (see below). This is the expected behavior according to LM, but I don't see a benefit in this behavior, and it seems risky if you expect to be alerted any time a threshold is crossed. We'd like to be able to receive a notification any time an alert crosses a threshold, regardless of whether it has been acknowledged at a higher severity for that alert "instance."
  4. I have a device group that has the same datasource applied. This datasource auto-discovers and will spin up matching instances across all devices in the group. I would like to have clustered alerts based on the matched instances across all devices in the group. For example (pardon the ASCII-like visualization):

    ClusterGroup
    |__ Device1
    |   |__ DatasourceA
    |       |_ Instance_ABC
    |       |   |_ Datapoint_I
    |       |   |_ Datapoint_II
    |       |_ Instance_DEF
    |       |   |_ Datapoint_I
    |       |   |_ Datapoint_II
    |       |_ Instance_GHI
    |           |_ Datapoint_I
    |           |_ Datapoint_II
    |__ Device2
    |   |__ DatasourceA
    |       |_ Instance_ABC
    |       |   |_ Datapoint_I
    |       |   |_ Datapoint_II
    |       |_ Instance_DEF
    |       |   |_ Datapoint_I
    |       |   |_ Datapoint_II
    |       |_ Instance_GHI
    |           |_ Datapoint_I
    |           |_ Datapoint_II
    |__ Device3
        |__ DatasourceA
            |_ Instance_ABC
            |   |_ Datapoint_I
            |   |_ Datapoint_II
            |_ Instance_DEF
            |   |_ Datapoint_I
            |   |_ Datapoint_II
            |_ Instance_GHI
                |_ Datapoint_I
                |_ Datapoint_II

    If Instance_ABC's Datapoint_I is alerting at the specified cluster threshold in my hypothetical group, I want to generate a cluster alert. If some time afterwards the situation in my environment gets worse and Instance_GHI's Datapoint_II is alerting at the specified cluster threshold, I want another cluster alert for that instance-datapoint pair as well.
  5. Joe Williams

    Issues With Creating A Datasource

    I took a working Groovy script datasource and am now trying to adjust it to some needs we have. This data will end up giving us alert totals for each month so we can build reports. Any ideas? Here is what I have so far:

    ```groovy
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import org.apache.commons.codec.binary.Hex;
    import groovy.json.JsonSlurper;

    // define credentials and url
    def accessId = hostProps.get('lmaccess.id');
    def accessKey = hostProps.get('lmaccess.key');
    def account = hostProps.get('lmaccount');
    def alertgroup = hostProps.get('lmaccess.group');

    def collectionFailures = 0
    def failures = [:]

    def client = new LogicMonitorRestClient(accessId, accessKey, account, this.&println)

    try {
        def alerts = client.get("/device/groups/" + alertgroup + "/alerts",
                                fields: "severity",
                                filter: "startEpoch>:1538370000,endEpoch<:1541048399,cleared:*")
        //warnings = alerts.findAll {it.severity == 2}.size()
        println "WarningCount: ${alerts.findAll { it.severity == 2 }.size()}"
        println "ErrorCount: ${alerts.findAll { it.severity == 3 }.size()}"
        println "CriticalCount: ${alerts.findAll { it.severity == 4 }.size()}"
        println "TotalAlerts: ${alerts.size()}"
    } catch (Throwable e) {
        failures["alerts"] = e.toString()
        collectionFailures += 1
    }

    // Do error reporting
    println "CollectionFailures:${collectionFailures}"
    failures.each { query, exception ->
        println "Exception while querying $query:"
        println exception
    }

    return 0

    //////////////////////
    // HELPER FUNCTIONS //
    //////////////////////

    class LogicMonitorRestClient {
        String userKey
        String userId
        String account
        int maxPages = 20
        int itemsPerPage = 1000
        def println

        LogicMonitorRestClient(userId, userKey, account, printFunction) {
            this.userId = userId
            this.userKey = userKey
            this.account = account
            this.println = printFunction
        }

        def generateHeaders(verb, path) {
            def headers = [:]
            def epoch = System.currentTimeMillis()
            def requestVars = verb + epoch + path

            // Calculate signature
            def hmac = Mac.getInstance('HmacSHA256')
            def secret = new SecretKeySpec(userKey.getBytes(), 'HmacSHA256')
            hmac.init(secret)

            // Sign the request
            def hmac_signed = Hex.encodeHexString(hmac.doFinal(requestVars.getBytes()))
            def signature = hmac_signed.bytes.encodeBase64()

            headers["Authorization"] = "LMv1 " + userId + ":" + signature + ":" + epoch
            headers["Content-Type"] = "application/json"
            return headers
        }

        def packParams(params) {
            def pairs = []
            params.each { k, v -> pairs << ("${k}=${v}") }
            return pairs.join("&")
        }

        // Non-paginating, raw version of the get function
        def _rawGet(path, params) {
            def baseUrl = 'https://' + account + '.logicmonitor.com' + '/santaba/rest' + path
            def packedParams = ""
            if (params) {
                packedParams = "?" + packParams(params)
            }
            def query = baseUrl + packedParams
            def url = query.toURL()
            def response = url.getText(useCaches: true, allowUserInteraction: false,
                                       requestProperties: generateHeaders("GET", path))
            return response
        }

        // Public interface for getting stuff.
        def get(Map args = [:], path) {
            def itemsReceived = []
            def pageReads = 0

            // Impose our own paging parameters.
            args.size = itemsPerPage
            args.offset = 0

            while (true) {
                def response = new JsonSlurper().parseText(_rawGet(path, args))
                if (response.errmsg == "OK") {
                    // Catch individual items
                    if (response.data.items == null) {
                        return response.data
                    }
                    itemsReceived += response.data.items
                    // Check if there are more items
                    // if (response.data.total > itemsReceived.size())
                    // {
                    args.offset = args.size + args.offset
                    // }
                    // else
                    // {
                    //     break // we are done
                    // }
                } else {
                    // Throw an exception with whatever error message we got.
                    throw new Exception(response.errmsg)
                }
                pageReads += 1
                // Check that we don't exceed max pages.
                if (pageReads >= maxPages) {
                    break
                }
                if (response.data.total > 0) {
                    break
                }
            }
            return itemsReceived
        }
    }
    ```

    If I run the URL with the API creds in my test PowerShell script, it works perfectly. When I test it in LM as a datasource, I get the attached error.
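A side note on the paging logic in the script above: the commented-out total check has been replaced by `if (response.data.total > 0) { break }`, which exits the loop after the first page whenever anything at all is returned, so multi-page results would be truncated. For reference, offset paging usually looks like the following minimal Python sketch (`fetch` is a hypothetical callback standing in for `_rawGet` plus JSON parsing):

```python
def paged_get(fetch, size=1000, max_pages=20):
    """Collect items page by page until we have them all (or hit max_pages).

    `fetch(offset, size)` is a hypothetical callback returning a dict like
    {"total": <int>, "items": [...]} - the shape of LM's response.data.
    """
    items, offset = [], 0
    for _ in range(max_pages):
        page = fetch(offset, size)
        items.extend(page["items"])
        if len(items) >= page["total"]:
            break  # we have everything
        offset += size
    return items
```

The loop only stops early once the collected item count reaches the server-reported total, which is what the commented-out Groovy check was trying to do.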
  6. Hi, I already raised this with LogicMonitor via email, but I'm just re-iterating here. For some datapoints where we want to generate warning/error/critical alerts, you can use the collection interval and the alert trigger interval to set the amount of time that should elapse before a datapoint threshold triggers an alert. But it's currently not possible to, for example, set a completely custom interval based on duration: if I want to generate a warning alert after 3 hours and an error alert after 4 hours, I have to use a combination of the two settings above to get close enough to the duration I want. It would be great if you could, regardless of the collection interval, have more options for the alert trigger interval (currently 1 to 10, and 20, 30, 40, 50, 60). With a collection interval of 5 minutes, I can currently achieve 2.5 hours or 5 hours using an alert trigger interval of 30 or 60 respectively. Couldn't there be a regular number input rather than a drop-down with predefined options for the alert trigger interval? Or a separate option that allows a completely flexible duration? Also, can a custom interval already be set using the API, regardless of the UI? I could try that. If there's another way to achieve what I want, I'd be happy to hear it.. :-) Thanks, Roland
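For reference, the arithmetic behind the workaround described above (time-to-alert = collection interval × alert trigger interval) can be sketched as follows; the interval list mirrors the drop-down options mentioned in the post:

```python
# Alert trigger intervals offered by the UI drop-down, per the post above.
TRIGGER_INTERVALS = list(range(1, 11)) + [20, 30, 40, 50, 60]

def time_to_alert_minutes(collection_minutes, trigger_interval):
    """Minutes a condition must persist before an alert is raised."""
    return collection_minutes * trigger_interval

# With a 5-minute collection interval, these are the only reachable durations;
# a 3-hour (180-minute) target falls in a gap between 150 and 200 minutes.
reachable = [time_to_alert_minutes(5, t) for t in TRIGGER_INTERVALS]
```

This makes the gap concrete: at a 5-minute collection interval you can hit 2.5 hours (30 intervals) or 5 hours (60 intervals), but nothing lands on exactly 3 hours.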
  7. Hey All, I couldn't find a way to do this using the available alert tokens, so I figured I would post it here. I noticed some cool features in other monitoring tools that allow graphs to be sent in the alert body to PagerDuty. When I receive a PagerDuty page from LM, it would be nice to see the associated graph for the datapoint that is alerting. While the alert text alone is good enough for most scenarios, I think seeing how big a jump/spike the datapoint made before alerting would be useful. The alert "C drive is 90% full" is all fine and good, but when you see a graph showing it go from 12% to 90% in just a minute or two, you know something is really up and might need more expediency, as it will probably continue to fill up at that rate.
  8. Archana

    LM - alerts

    When I go through the documentation regarding alerts, I understood that the threshold for a particular metric/datapoint can be set in the datasource itself. I made an example alert and got a clear idea of the concept. My doubt is whether we can create an alert for a widget. For example, I have a Gauge widget representing CPU percentage. How can I set an alert on that particular widget if my CPU percentage crosses 60% or 90%?
  9. Szymon Grzemski

    Alerts for mounted ISOs on Linux server

    Hello all, Recently I mounted a RHEL ISO on the /data/rhel_iso directory of a system that is monitored with LogicMonitor. Five minutes later I received an alert about 105% utilization of /data/rhel_iso, which is reasonable but strange, as an ISO takes the same space as the files inside it. When I unmounted the ISO, I got an alert about a filesystem that is not responding. How can I disable those ISO-related alerts? They are irrational. Many thanks in advance, Szymon
  10. IFTTT is a free SaaS platform that helps you "do more with all your apps and devices" by providing an integration point between commonly used services and platforms. In the following example, we're using the IFTTT Applet webhooks "trigger" to activate a Philips Hue wireless lighting "action" - blinking the lights of the connected Hue platform as a result of a LogicMonitor alert!

    Other things you might be able to do with LogicMonitor alerts through IFTTT (lots of untested possibilities!):

    - Change lighting colors based on alert status (red for new, green for cleared, etc.)
    - Receive alert notifications on connected systems like Skype, Twitter, Evernote, or Google.
    - Play music on a connected Sonos system after triggering an alert.
    - Turn on a connected smart plug like the Wemo from Belkin.

    The Finished Result

    The following tutorial assumes that you have an IFTTT account created and permissions to add an integration to your LogicMonitor account.

    Step 1: Log into your IFTTT account and create a new 'Applet'.
    Step 2: Search for and choose the 'Webhooks' service.
    Step 3: Choose the 'Receive a Web Request' trigger.
    Step 4: Configure (and remember) the event name that will be recognized by the incoming webhook to trigger the event.
    Step 5: Configure the 'Action' that will be taken when this event is triggered in IFTTT - lots of intriguing possibilities!
    Step 6: Once you've added and configured the 'Action', review the applet settings and click 'Finish' to save the Applet.
    Step 7: Select 'Services' from the account dropdown - we will be looking up the incoming webhook URL for our account so we know where to send our alerts.
    Step 8: Search for the 'Webhooks' service and select it to proceed.
    Step 9: Select the 'Documentation' link from the 'Webhooks' services page.
    Step 10: Copy the incoming event trigger URL along with the key for your account. You will replace {event} in the URL with the event name you configured above.
    Step 11: Moving to your LogicMonitor account, navigate to 'Settings -> Integrations' and add a new 'Custom HTTP Delivery' integration using the event name from Step 4 and the URL (with key) from Step 10.
    Step 12: IFTTT allows you to include an (optional!) payload, which will show in the 'Activity Log' of the IFTTT Applet.
    Step 13: Test Alert Delivery and you should see output similar to below in the IFTTT Activity Log.
    Step 14: Save your integration, assign it to an Escalation Chain, and assign the Escalation Chain to an Alert Rule - and now we've configured a simple integration between LogicMonitor and IFTTT that could form the basis of a handful of interesting alert actions!
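For reference, the incoming trigger URL copied in Step 10 has the form https://maker.ifttt.com/trigger/{event}/with/key/{key}, and the optional payload from Step 12 is a JSON body whose keys IFTTT expects to be value1 through value3. A rough Python sketch of the request the integration ends up making (the event name, key, and alert fields below are placeholders, not values from the tutorial):

```python
import json
import urllib.request

def ifttt_trigger_url(event, key):
    """Build the IFTTT Webhooks incoming-trigger URL (Step 10)."""
    return f"https://maker.ifttt.com/trigger/{event}/with/key/{key}"

def send_test_alert(event, key, host, status, detail):
    """POST the optional JSON payload (Step 12); IFTTT reads value1..value3."""
    body = json.dumps({"value1": host, "value2": status, "value3": detail}).encode()
    req = urllib.request.Request(
        ifttt_trigger_url(event, key),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)  # network call - only run against a real key
```

In the LogicMonitor integration itself, the value1..value3 fields would typically carry alert tokens such as ##HOST## and ##ALERTSTATUS## so the Applet's Activity Log shows which alert fired.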
  11. The majority of the alerts we get every day are for high volume usage, and sometimes it's hard to work with this because you don't know how large the volume is. For example, on a 50 TB system with the threshold set at 90%, you will begin to receive alerts when you still have 5 TB left. Would it be possible to have a feature that would let you see the size of the drive and set the alert in GB instead of a percentage? This would allow for faster use of thresholds on drive alerts.
  12. Currently the hyperlink in an alert notification email requires that users have permission to view the All Alerts view. We don't want users to have access to this view. Please make it so that the notification hyperlink to ack an alert works without the need for this permission.
  13. Hello all, I am trying to get updates on alerts that span multiple days. Our normal code will grab new alert data for, say, today. However, we need to go back to old alerts and see if they have been resolved yet, so we can do accurate reporting. What I want to do is provide a list of the IDs for a handful of alerts and get more than one row back. Any ideas on how I can formulate an alert query so that I can get two rows back for, say, ID=DS1234567 and ID=DS1234568 in the same request? Here is my example call: https://mysite.logicmonitor.com/santaba/rest/alert/alerts?filter=id:DS1275294,id:DS1472582 My hope is to combine a batch of calls so I don't flood the service and so I can get results faster. Given a list of unique IDs, any thoughts on requesting a batch? E
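One thing worth checking in the example call above: in LogicMonitor's REST filter syntax a comma acts as AND, so `id:DS1275294,id:DS1472582` asks for a single alert with both IDs at once; `||` is the OR operator, which is what a batch lookup needs. A hedged Python sketch of building such a batched request (account, access ID and key are placeholders; the LMv1 signature is HMAC-SHA256 of verb + epoch + body + path, hex-encoded then base64-encoded, matching LM's documented scheme):

```python
import base64
import hashlib
import hmac
import time

def batch_id_filter(alert_ids):
    """OR the ids together: ',' means AND in LM filters, '||' means OR."""
    return "||".join(f"id:{i}" for i in alert_ids)

def lmv1_header(access_id, access_key, verb, path, body=""):
    """Build an LMv1 Authorization header value for a REST request."""
    epoch = str(int(time.time() * 1000))
    msg = (verb + epoch + body + path).encode()
    hex_digest = hmac.new(access_key.encode(), msg, hashlib.sha256).hexdigest()
    signature = base64.b64encode(hex_digest.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch}"

# Hypothetical request line the two helpers would produce:
#   GET https://mysite.logicmonitor.com/santaba/rest/alert/alerts
#       ?filter=id:DS1275294||id:DS1472582
#   Authorization: <lmv1_header(access_id, access_key, "GET", "/alert/alerts")>
```

Batching a list of IDs into one OR filter keeps the request count down, which addresses the "don't flood the service" concern directly.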
  14. One of the biggest frustrations for us with LogicMonitor is breaking a bunch of dashboards and alerts if we move device groups to another location in the overall device group tree. For example: say we have a nested device group called "Infrastructure/Hosts". Our environment has changed a bit, and we want better organization to support the changes, so we move the hosts group to "Infrastructure/PhysicalDevices/Hosts". All alert rules and dashboards that were filtering on "Infrastructure/Hosts" are now broken, even though the devices in the group need the same alerting and dashboards, and we have to go through and fix each Alert Rule and Dashboard Widget that used "Infrastructure/Hosts" to point to "Infrastructure/PhysicalDevices/Hosts". As you can imagine, as environments scale up and evolve, subgroups get moved around all the time. Redoing dashboards and alerts every time this happens adds a tremendous amount of labor and can lead to people missing changes, leaving behind broken alerts or dashboards that you may not find out about until an emergency has already happened. What we're proposing: "sticky" device group handling - if a group or subgroup used in an Alert Rule or Dashboard changes location, the location should automatically be updated and reflected there. This is how most modern applications handle this sort of thing anyway, and it's a huge time saver. Given the critical nature of this tool's function, it would go a long way towards preventing accidental breakage of monitoring that companies rely on to keep their environments running.
  15. ivan_martinez

    Massive StatusFlap alerts

    Hi to all: I have problems with "StatusFlap" type alerts; the truth is that the platform sends many messages to my email. Does anyone know if they are false positives? My network devices have no problems on their interfaces. How can I stop this? Let me know please. Kind regards, Iván Martínez
  16. Dave Binford

    Custom Alert-Groups for SDT

    When we reboot a server or an application set, our NOC does not know all the devices, instances and/or services impacted, so we get flooded with alerts for a known event. Example: I need to reboot device WebServer-xyz. The server, the switch ports, the storage sessions and the HTTP/S service are all monitored in LM. I'd like to be able to SDT just these items with one SDT, not entire switches or devices. So: be able to create an "Alert-Group", e.g. "WebServer-xyz", to which you can add instances from multiple devices, an entire device, a service, instance groups, or a device group - anything defined in LM. Then just add one SDT to the Alert-Group, aka one-stop shopping.
  17. A feature enhancement that enables alerts to be limited to certain days of the week, as well as hours/minutes, would be very beneficial, as there are often occasions when an alert is needed in the working week but not at the weekend. An example is NetApp snapmirror lag time. Mon-Sat these are set to replicate, but not on a Sunday. We look for a 24-hour lag most of the time to spot an issue, but on a Monday this would be 48 hours (as there would have been no snapmirror since Saturday). I appreciate I can manage alerts using time-based escalations; however, there is no way to affect the alerts view on the dashboard with that approach. Hopefully this is something that others might also want and that can be added in the future.
  18. Currently, Alert sounds are persistent even when an instance is placed in SDT. We need functionality added to suppress all alert sounds while an instance is in SDT.
  19. DanAvni

    Alerts list scroll bars

    Looking at the alerts list in my account, the scroll bars exist, but in order to use them I have to scroll down. Try determining the escalation chain (right-most column) of the 3rd row: you have to scroll down to reach the horizontal scroll bar, which moves the 3rd line off screen. Then, after you scroll right and back up, you can no longer see the name, as it has moved off screen (1st column). The scroll bars (and column headers) should always be visible, allowing me to scroll without losing my flow of thought.
  20. Extending alert information from LogicMonitor to other 3rd-party systems is pretty common for us; however, the tokens available today to describe an alert are missing a few bits of data (we feel). It would be incredibly helpful to have an alert token that contains the LM user responsible for acknowledging the alert, and a separate token for the ack comment. Having these tokens would allow us to better map alerting details to upstream and downstream integrations.
  21. rixter

    Enhanced alert notes

    We have a team that handles all alert escalations, spanning three different shifts. If an alert can't be immediately corrected and requires a follow-up, a note is entered for the alert on the alerts dashboard with basic info: date, person contacted, and any information the shift can review to determine whether or not a follow-up is required from them. Unfortunately, once multiple notes are entered, legibility decreases, and unless you enter your name there's no way of easily determining who entered a note. The ability to enter a note with a record of the time of entry and the user would improve functionality; more like a log for the alert itself.
  22. apardiwalla

    Clearing Alerts Manually

    Hey guys! So I wanted to bring up the idea of clearing alerts manually. I searched the feature request threads and haven't really found an answer or a thread that matched what I was looking for, so I thought I would take a shot at writing one of these. Apologies in advance if this has been discussed already, or if I don't make much sense. I'm fairly new to using the platform, so I might not be fully up to speed with all the lingo.

    So let me explain a bit of what brought me to this request. I have set up monitoring on our virtual machines to monitor CPU usage by percentage (x\100). I then have an alert set up to indicate a stuck process, which fires if a datapoint hasn't changed (+/-3%) over the next 3 intervals (the interval being 3 minutes). The alert clears if the value changes within the next 4 intervals.

    The process above has been working great so far, but I quickly realized that we didn't really care about anything stuck between 0-50%; we only wanted to focus on values that were stuck at 50% or above. I then changed the valid value range to 5000-10000 (50-100%), which produced much more useful results. But I noticed that when a CPU that was stuck within the 50-100% range clears to a value outside the valid value range (x < 50%), this produces NO DATA, leaving the initial alert in limbo forever. You can manually clear such alerts by going to the device and toggling alerting off and on again, but doing that for a large number of alerts takes a lot of time.

    I'm okay with the way I have it set up (though I do believe the above may be a bug). I just wish we could manually clear alerts from the alerting window without having to take extra steps. Maybe something next to the acknowledge button? I might have jumbled this up, so please ask if I need to clarify any of the information above. I can provide screenshots if needed as well. Thanks for taking the time to read this!
TL;DR = Let us manually clear alerts from the alert window without having to go into specific devices and toggle alerting.
  23. Please add an option so that when a device is in the IdleInterval state (HostStatus DS), then all other alerts are automatically removed. At the moment some devices retain their ping loss alert even though the HostStatus DS has triggered the IdleInterval alert (no data being received). Our users are finding it confusing when some devices have both alerts, while others have only the IdleInterval alert.
  24. Jeffrey McGovern

    Add scheduling option to alerting

    Use Case (Provider): I am a provider with a substantial number of customers being monitored by the platform. A single customer requests monitoring to be suspended for 14 days while they do a physical DC move. The move will be 1:1, so all systems will come back up in the same logical location and only change physical locations. Requests are filed, meetings are had, and the day of the move comes; the NOC turns alerting off for the customer. Uneventful days go by, and on the day that alerting is supposed to be turned back on, a regional event happens that the provider NOC has to respond to for other customers (insert any normal, well-defined chaos that happens in a NOC here), and alerting does not get re-enabled for the customer with the physical DC move.

    Use Case (Enterprise): The Oracle team notifies the NOC that a weekend upgrade will be happening for an Oracle customer. The upgrade team does not want to be notified of any alarms, as they will have their hands full with the upgrade, and they will call back when the upgrade is complete. The NOC turns alerting off, and the upgrade team never calls to say that they are done working.

    Request: Much like SDT, enable calendaring and scheduling as an option for enabling/disabling alerting, as a backup in case of failure in manual processes.
  25. Kris Wolton

    SDT for minor alerts

    Hi, Every morning I have to clear a couple of hundred alerts from my inbox that come in while our customers' servers are running backups. We often get 'disk latency' and 'network latency' type alerts while the backups are running. As the backups run out of hours, we do not really need these. Please could you add a way of creating SDT based on alert severity, or better yet, build a mechanism to schedule backup window times to filter noise alerts like disk latency. I'm sure I'm not the only one to encounter this issue. Kris