Search the Community

Showing results for tags 'alerts'.
Found 53 results

  1. This is not an advertisement by any means, just an offer to help anyone else who struggles with this. As an MSP, we have struggled with how to handle alert tuning in bulk when it comes to things like interfaces (instances). Some interfaces you want to alarm as critical, some as error, and others you don't care about at all. LM provided a partial fix for that with their Groovy-based "Status" alarm driven by the interface description, but it didn't go far enough. We started creating manual interface groups called "Critical" and performing Alert Tuning on that "parent", only to find out that it doesn't work as interfaces move in and out of it. I was beyond disappointed, but it said it right at the top of the page: Changes made to Alerting or Thresholds will only affect existing instances currently in this Instance Group. Instances added later will not be subject to the changes. Anyway, long story short, we finally decided to write our own application to do it and built it in Azure. We built it to handle multiple data sources so we could group other instances (like VMware vDisks) and make the same bulk changes. It was written to act as a data source in your environment, so you can apply it to whatever devices you want and it just calls out to the API with the device name. If you have any interest in using it, let me know. There are costs associated, as Azure bills based on usage, but it is pretty small for us (< $200/mo). Trust me, I wish LM solved this without us having to write the app!
  2. So is there no way to easily export the Alerts page? Am I missing something, or is there really no way to export the alerts I need to share with folks to CSV, PDF, etc.? Why is there no way to do this from the main Alerts page? Also, is there any way to list or show the actual alert details? I don't want to have to click on every alert just to view its details. Is there a way to show this as one of the columns? In short, I'm looking for a way to export the alerts I am viewing after creating my filter, and to also include the alert message details. Please don't tell me I have to use the horrific "Reports" section and build a report based on the "Alerts" template to export alerts. As far as I can tell, that doesn't even allow showing the alert message details, which I need included in the export/report.
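     One possible workaround until the Alerts page gets an export button is to pull the same filtered view from the REST API and write it to CSV yourself. Below is a minimal Groovy sketch along those lines, reusing the LMv1 signing scheme shown in the script in result 8. The account name, credentials, and the exact alert field names (monitorObjectName, dataPointName, alertValue, etc.) are assumptions to check against the Alerts API documentation.

     ```groovy
     import javax.crypto.Mac
     import javax.crypto.spec.SecretKeySpec
     import org.apache.commons.codec.binary.Hex
     import groovy.json.JsonSlurper

     // Assumed account and API token; replace with real values.
     def account   = 'mysite'                      // <account>.logicmonitor.com
     def accessId  = 'REPLACE_WITH_ACCESS_ID'
     def accessKey = 'REPLACE_WITH_ACCESS_KEY'
     def path      = '/alert/alerts'
     def fields    = 'id,severity,startEpoch,monitorObjectName,dataPointName,alertValue'
     def query     = "size=1000&fields=${fields}"

     // Build the LMv1 Authorization header (same scheme as the other REST examples on this page).
     def epoch = System.currentTimeMillis()
     def hmac  = Mac.getInstance('HmacSHA256')
     hmac.init(new SecretKeySpec(accessKey.getBytes(), 'HmacSHA256'))
     def signature = Hex.encodeHexString(hmac.doFinal(('GET' + epoch + path).getBytes())).bytes.encodeBase64()

     def url = "https://${account}.logicmonitor.com/santaba/rest${path}?${query}".toString().toURL()
     def response = new JsonSlurper().parseText(url.getText(requestProperties: [
         'Authorization': "LMv1 ${accessId}:${signature}:${epoch}".toString(),
         'Content-Type' : 'application/json'
     ]))

     // Write one CSV row per alert, using the same field names requested above (no CSV quoting; fine for simple fields).
     new File('alerts.csv').withWriter { w ->
         w.writeLine(fields)
         response.data.items.each { a ->
             w.writeLine([a.id, a.severity, a.startEpoch, a.monitorObjectName, a.dataPointName, a.alertValue].join(','))
         }
     }
     ```

     Adding a filter parameter (as in the example call in result 15) would reproduce whatever filter is applied on the Alerts page.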
  3. The ability to modify alert notes en masse, the same way you can acknowledge multiple alerts at once, would be a nice thing to have. When multiple (20+) alerts have an incorrect note put into them, it is time consuming to go back and manually fix them one by one. I see that in the new UI you can tag them en masse, but when you go to modify the note it tells you they're already acknowledged. Thanks!
  4. We have begun implementing a tagging standard in our cloud accounts to better control discovered resources and route alerts accordingly. I would like to be able to route alerts by default based on the value of a tag. I'm aware that I can already set up specific users and then achieve exactly what I'm requesting, but I would much prefer to have a blanket rule that uses the tag's value as the recipient email address(es) directly. Some examples below:

     system.aws.tag.MonitorAlertEmail=ThisIsAnExample@PleaseOfferThisFeature.com
     system.aws.tag.SendAlertsHere=AnotherExcellentExample@WeWillPayExtraForThisFunctionality.gov AnotherEmailAddress@OkayNotReally.net

     See the screenshots below for a visual example of how I'd like to structure this automation.
  5. I noticed that communities.logicmonitor.com requests rights to "show notifications". It would be awesome if our LogicMonitor instances could do the same, configurable in settings, so that we could get push-style notifications on our workstations for alerts.
  6. For JDBC datasources, please create a token that would enable us to include the JDBC driver exception message in alerts for Query Status datapoint alerts, the ones based on: Query status - 1=ok, 2=credential invalid, 3=connection string invalid, 4=connection rejected, 5=driver not supported, 6=connection failure, 7=query failure. This would greatly help us achieve faster time to resolution of incidents when the status code is 6 or 7.
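     In the meantime, one way to see the driver's message is a scripted (Groovy) datasource or eventsource that runs the same query itself and prints the exception text alongside the status. The sketch below is only an illustration of that idea: the connection details are placeholders (a real script would read them from hostProps), and the status-code mapping is simplified.

     ```groovy
     import java.sql.DriverManager
     import java.sql.SQLException

     // Placeholder connection details; a real datasource would pull these from hostProps.get(...).
     def jdbcUrl  = 'jdbc:postgresql://dbhost:5432/appdb'
     def username = 'monitor'
     def password = 'secret'
     def query    = 'SELECT 1'

     def conn = null
     try {
         conn = DriverManager.getConnection(jdbcUrl, username, password)
     } catch (SQLException e) {
         println 'QueryStatus: 6'               // connection failure (could also map to 2/3/4 by inspecting e)
         println "DriverMessage: ${e.message}"  // the detail the built-in datapoint can't surface today
         return 0
     }
     try {
         conn.createStatement().executeQuery(query)
         println 'QueryStatus: 1'               // ok
     } catch (SQLException e) {
         println 'QueryStatus: 7'               // query failure
         println "DriverMessage: ${e.message}"
     } finally {
         conn.close()
     }
     return 0
     ```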
  7. Hi, I'm pretty new to LM and am struggling with the Big Number widget. I need to show alert counts for a specific subscription: new (unacknowledged/cleared) alerts, plus some history, i.e. unacknowledged/cleared over the last 7 days, the current month, etc. Any guidance appreciated.
  8. I took a working Groovy script datasource and am now trying to adjust it to some needs we have. This data will end up giving us alert totals for each month so we can build reports. Any ideas? Here is what I have so far:

     ```groovy
     import javax.crypto.Mac;
     import javax.crypto.spec.SecretKeySpec;
     import org.apache.commons.codec.binary.Hex;
     import groovy.json.JsonSlurper;

     //define credentials and url
     def accessId = hostProps.get('lmaccess.id');
     def accessKey = hostProps.get('lmaccess.key');
     def account = hostProps.get('lmaccount');
     def alertgroup = hostProps.get('lmaccess.group');

     def collectionFailures = 0
     def failures = [:]

     def client = new LogicMonitorRestClient(accessId, accessKey, account, this.&println)

     try {
         def alerts = client.get("/device/groups/" + alertgroup + "/alerts", fields: "severity", filter: "startEpoch>:1538370000,endEpoch<:1541048399,cleared:*")
         //warnings = alerts.findAll {it.severity == 2}.size()
         println "WarningCount: ${alerts.findAll {it.severity == 2}.size()}"
         println "ErrorCount: ${alerts.findAll { it.severity == 3 }.size()}"
         println "CriticalCount: ${alerts.findAll { it.severity == 4 }.size()}"
         println "TotalAlerts: ${alerts.size()}"
     } catch (Throwable e) {
         failures["alerts"] = e.toString()
         collectionFailures += 1
     }

     // Do error reporting
     println "CollectionFailures:${collectionFailures}"
     failures.each { query, exception ->
         println "Exception while querying $query:"
         println exception
     }

     return 0

     //////////////////////
     // HELPER FUNCTIONS //
     //////////////////////

     class LogicMonitorRestClient {
         String userKey
         String userId
         String account
         int maxPages = 20
         int itemsPerPage = 1000
         def println

         LogicMonitorRestClient(userId, userKey, account, printFunction) {
             this.userId = userId
             this.userKey = userKey
             this.account = account
             this.println = printFunction
         }

         def generateHeaders(verb, path) {
             def headers = [:]
             def epoch = System.currentTimeMillis()
             def requestVars = verb + epoch + path

             // Calculate signature
             def hmac = Mac.getInstance('HmacSHA256')
             def secret = new SecretKeySpec(userKey.getBytes(), 'HmacSHA256')
             hmac.init(secret)

             // Sign the request
             def hmac_signed = Hex.encodeHexString(hmac.doFinal(requestVars.getBytes()))
             def signature = hmac_signed.bytes.encodeBase64()

             headers["Authorization"] = "LMv1 " + userId + ":" + signature + ":" + epoch
             headers["Content-Type"] = "application/json"

             return headers
         }

         def packParams(params) {
             def pairs = []
             params.each { k, v -> pairs << ("${k}=${v}") }
             return pairs.join("&")
         }

         // Non paginating, raw version of the get function
         def _rawGet(path, params) {
             def baseUrl = 'https://' + account + '.logicmonitor.com' + '/santaba/rest' + path
             def packedParams = ""
             if (params) {
                 packedParams = "?" + packParams(params)
             }
             def query = baseUrl + packedParams
             def url = query.toURL()
             def response = url.getText(useCaches: true, allowUserInteraction: false, requestProperties: generateHeaders("GET", path))
             return response
         }

         // Public interface for getting stuff.
         def get(Map args = [:], path) {
             def itemsReceived = []
             def pageReads = 0

             // Impose our own paging parameters.
             args.size = itemsPerPage
             args.offset = 0

             while (true) {
                 // Do da nastieh
                 def response = new JsonSlurper().parseText(_rawGet(path, args))
                 if (response.errmsg == "OK") {
                     // Catch individual items
                     if (response.data.items == null) {
                         return response.data
                     }
                     itemsReceived += response.data.items
                     // Check if there are more items
                     // if (response.data.total > itemsReceived.size())
                     // {
                     args.offset = args.size + args.offset
                     // }
                     // else
                     // {
                     //     break // we are done
                     // }
                 } else {
                     // Throw an exception with whatever error message we got.
                     throw new Exception(response.errmsg)
                 }
                 pageReads += 1
                 // Check that we don't exceed max pages.
                 if (pageReads >= maxPages) {
                     break
                 }
                 if (response.data.total > 0) {
                     break
                 }
             }
             return itemsReceived
         }
     }
     ```

     If I run the URL with the API creds in my test PowerShell script, it works perfectly. When I test it in LM as a datasource, I get the attached error.
  9. Hi, I already raised this with LogicMonitor via email, but I'm just re-iterating it here. For some datapoints where we want to generate warning/error/critical alerts, you can use the collection interval and the alert trigger interval to set, in effect, the amount of time that should elapse before a datapoint threshold triggers an alert. But it's not currently possible to set a completely custom interval based on duration; e.g. if I want to generate a warning alert after 3 hours and an error alert after 4 hours, I have to use a combination of the two settings above to get close enough to the duration I want. It would be great if you could, regardless of the collection interval, have more options for the alert trigger interval (currently 1 to 10, plus 20, 30, 40, 50, and 60). So, with a collection interval of 5 minutes, I can currently achieve 2.5 hours or 5 hours using an alert trigger interval of 30 or 60 respectively. Couldn't there be a regular number input rather than a drop-down with predefined options for the alert trigger interval? Or a separate option that allows a completely flexible duration? Also, can a custom interval already be set using the API, regardless of the UI? If so, I could try that. If there's another way to achieve what I want, I'd be happy to hear it. :-) Thanks, Roland
  10. Archana

    LM - alerts

    When I went through the documentation on Alerts, I understood that the threshold for a particular metric/datapoint can be set in the datasource itself. I created an example alert and got a clear idea of the concept. My question is whether we can create an alert for a widget. For example, I have a Gauge widget that displays CPU percentage. How can I set my alert only for that particular widget when the CPU percentage crosses 60% or 90%?
  11. IFTTT is a free SaaS platform that helps you "do more with all your apps and devices" by providing an integration point between commonly used services and platforms. In the following example, we're using the IFTTT Applet Webhooks "trigger" to activate a Philips Hue wireless lighting "action", blinking the lights of the connected Hue platform as a result of a LogicMonitor alert! Other things you might be able to do with LogicMonitor alerts through IFTTT (lots of untested possibilities!):

     • Change lighting colors based on alert status (red for new, green for cleared, etc.)
     • Receive alert notifications to connected systems like Skype, Twitter, Evernote, or Google.
     • Play music on a connected Sonos system after triggering an alert.
     • Turn on a connected Smart Plug like the Wemo from Belkin.

     The Finished Result

     The following tutorial assumes that you have an IFTTT account created and permissions to add an integration to your LogicMonitor account.

     Step 1: Log into your IFTTT account and create a new 'Applet'.
     Step 2: Search for and choose the 'Webhooks' service.
     Step 3: Choose the 'Receive a Web Request' trigger.
     Step 4: Configure (and remember) the event name that will be recognized by the incoming webhook to trigger the event.
     Step 5: Configure the 'Action' that will be taken when this event is triggered in IFTTT - lots of intriguing possibilities!
     Step 6: Once you've added and configured the 'Action', review the applet settings and click 'Finish' to save the Applet.
     Step 7: Select 'Services' from the account dropdown - we will be looking up the incoming webhook URL for our account so we know where to send our alerts.
     Step 8: Search for the 'Webhooks' service and select it to proceed.
     Step 9: Select the 'Documentation' link from the 'Webhooks' services page.
     Step 10: Copy the incoming Event trigger URL along with the key for your account. You will replace {event} in the URL with the event name you configured above.
     Step 11: Moving to your LogicMonitor account, navigate to 'Settings -> Integrations' and add a new 'Custom HTTP Delivery' integration using the event name from Step 4 and the URL (with key) from Step 10.
     Step 12: IFTTT allows you to include an (optional!) payload, which will show in the 'Activity Log' of the IFTTT Applet - an example payload is shown below the steps.
     Step 13: Test Alert Delivery and you should see output similar to below in the IFTTT Activity Log.
     Step 14: Save your integration, assign it to an Escalation Chain, and assign the Escalation Chain to an Alert Rule - and now we've configured a simple integration between LogicMonitor and IFTTT that could form the basis of a handful of interesting alert actions!
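     For reference on Steps 10-12: the Webhooks trigger URL takes the form https://maker.ifttt.com/trigger/{event}/with/key/{your-key}, and the optional payload is a JSON body with up to three fields named value1, value2, and value3. The example payload below is only an illustration using common LogicMonitor alert tokens; substitute whichever tokens your applet should display.

     ```json
     {
       "value1": "##LEVEL## alert on ##HOST##",
       "value2": "##DATASOURCE## - ##DATAPOINT## = ##VALUE##",
       "value3": "##ALERTID##"
     }
     ```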
  12. Hello all, I recently mounted a RHEL ISO on a /data/rhel_iso directory on a system that is monitored with LogicMonitor. Five minutes later I received an alert about 105% utilization of /data/rhel_iso, which is reasonable but strange, as an ISO takes the same space as the files inside it. When I unmounted the ISO I got an alert about a filesystem that is not responding. How can I disable these ISO-related alerts? They are irrational. Many thanks in advance, Szymon
  13. The majority of the alerts we get every day are high volume-usage alerts, and sometimes it's hard to work with them because you don't know how large the volume is. For example, on a 50 TB volume with the threshold set at 90%, you will begin to receive alerts while you still have 5 TB left. Would it be possible to have a feature that lets you see the size of the drive and set the alert in GB instead of a percentage? This would allow for faster tuning of thresholds on drive alerts.
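     One interim approach, if the volume datasource already collects both total capacity and percent used, is a complex datapoint that converts those into absolute free space, with the threshold set on that value instead of the percentage. The datapoint names below are hypothetical; substitute whatever your datasource actually collects (and adjust the divisor if capacity is reported in KB or GB rather than bytes).

     ```
     (Capacity * (100 - PercentUsed) / 100) / 1073741824
     ```

     A warning threshold such as "< 500" on that complex datapoint would then alert when fewer than 500 GB remain, regardless of volume size.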
  14. Currently, the hyperlink in an alert notification email requires that users have permission to view the global Alerts view. We don't want users to have access to this view. Please make it so that the notification hyperlink to ack an alert works without the need for this permission.
  15. Hello all, I am trying to get updates on alerts that span multiple days. Our normal code grabs new alert data for today, say; however, we need to go back to old alerts and see if they have been resolved yet, so we can do accurate reporting. What I want to do is provide a list of IDs for a handful of alerts and get more than one row back. Any ideas on how I can formulate an alert query so that I get two rows back for, say, ID=DS1234567 and ID=DS1234568 in the same request? Here is my example call: https://mysite.logicmonitor.com/santaba/rest/alert/alerts?filter=id:DS1275294,id:DS1472582 My hope is to combine a batch of calls so I don't flood the service and so I can get results faster. Given a list of unique IDs, any thoughts on requesting a batch? E
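     One thing that may be worth trying, purely as a sketch: in the REST API filter syntax a comma between conditions is AND (which no single alert can satisfy for two different IDs), and there may be an OR separator such as "||". That operator is an assumption to verify against the Alerts resource documentation; if it is supported, a batch of IDs can be folded into one request, e.g.:

     ```groovy
     // Hypothetical: join a batch of alert IDs into a single filter, assuming '||' acts as OR
     def ids    = ['DS1275294', 'DS1472582']
     def filter = ids.collect { "id:${it}" }.join('||')
     def url    = "https://mysite.logicmonitor.com/santaba/rest/alert/alerts" +
                  "?filter=${URLEncoder.encode(filter, 'UTF-8')}&fields=id,severity,cleared,endEpoch"
     println url   // sign and GET this the same way as the other REST examples on this page
     ```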
  16. One of the biggest frustrations for us with LogicMonitor is breaking a bunch of dashboards and alerts when we move device groups to another location in the overall device group tree. For example: say we have a nested device group called "Infrastructure/Hosts". Now our environment has changed a bit, and we want to add better organization to support the new changes. We move the hosts group to "Infrastructure/PhysicalDevices/Hosts". All alert rules and dashboards that were filtering on "Infrastructure/Hosts" are now broken, even though the devices in the group need the same alerting and dashboards. Now we have to go through and fix each Alert Rule and Dashboard Widget that used "Infrastructure/Hosts" to point to "Infrastructure/PhysicalDevices/Hosts". As you can imagine, as environments scale up and evolve, subgroups are going to be moved around all the time. Redoing dashboards and alerts every time this happens adds a tremendous amount of labor, and can lead to people missing changes, leaving behind broken alerts or dashboards that you may not find out about until an emergency has already happened. What we're proposing: "sticky" device group handling. If a group or subgroup used in an Alert Rule or Dashboard changes location, that location should automatically be updated and reflected in the alert rule or dashboard. This is how most modern applications handle this sort of thing anyway, and it's a huge time saver. Given the critical nature of this tool's function, this would go a long way towards preventing accidental breakage of the monitoring that companies rely on to keep their environments running.
  17. Hey All, I couldn't find a way to do this using the available alert tokens, so I figured I would post it here. I noticed some cool features in other monitoring tools that allow graphs to be sent in the alert body to PagerDuty. So when I receive a PagerDuty page from LM, it would be nice to see the associated graph for the datapoint that is alerting. While just the alert text is good enough for most scenarios, I think seeing how big a jump/spike the datapoint made before alerting would be useful. The alert "C drive is 90% full" is all fine and good, but when you see a graph showing it go from 12% to 90% in just a minute or two, you know something is really up and might need more urgency, as it will probably continue to fill up at that rate.
  18. When we reboot a server or an application set, our NOC does not know all the devices, instances and/or services impacted, so we get flooded with alerts for a known event. Example: I need to reboot device WebServer-xyz. The server, the switch ports, the storage sessions and the HTTP/S Service are all monitored in LM. We'd like to be able to SDT just these items with one SDT, and not entire switches or devices. So, be able to create an "Alert Group", e.g. "WebServer-xyz", to which you can add instances from multiple devices, entire devices, Services, instance groups, or device groups (anything defined in LM). Then just add one SDT to the Alert Group: one-stop shopping.
  19. Looking at the alerts list in my account, the scroll bars exist, but in order to use them I have to scroll down. Try determining the escalation chain (rightmost column) on the 3rd row: you have to scroll down to reach the horizontal scroll bar, which moves the 3rd row off the screen. Then, after you scroll right and scroll back up, you can no longer see the name, as it has moved off the screen (1st column). The scroll bars (and column headers) should always be visible, allowing me to scroll without losing my flow of thought.
  20. Extending alert information from LogicMonitor to other 3rd-party systems is pretty common for us; however, the tokens available today to describe an alert are missing a few bits of data (we feel). It would be incredibly helpful to have an alert token that contains the LM user responsible for acknowledging the alert, and a separate token for the ack comment. Having these tokens would allow us to better map alerting details to upstream and downstream integrations.
  21. We have a team that handles all alert escalations across 3 different shifts. If an alert can't be immediately corrected and requires a follow-up, a note is entered for the alert on the alerts dashboard with basic info: the date, the person contacted, and any information the shift can review to determine whether or not a follow-up is required from them. Unfortunately, once multiple notes are entered, legibility decreases, and unless you enter your name, there's no easy way of determining who entered the note. The ability to enter a note with a record of the time of entry and the user would improve functionality; more like a log for the alert itself.
  22. Hey guys! I wanted to bring up the idea of clearing alerts manually. I searched the feature request threads and haven't really found an answer or a thread that matched what I was looking for, so I thought I would take a shot at writing one of these. Apologies in advance if this has been discussed already, or if I don't make much sense; I'm fairly new to using the platform, so I might not be fully up to speed on all the lingo.

     Let me explain a bit of what brought me to this request. I have set up monitoring on our virtual machines to monitor CPU usage by percentage (x/100). I then have an alert set up to indicate a stuck process, which fires if a datapoint hasn't changed (+/- 3%) over the next 3 intervals (the interval is set to 3 minutes). The alert clears if the value changes within the next 4 intervals. This has been working great so far, but I quickly realized that we didn't really care about anything stuck between 0-50%; we only wanted to focus on values stuck at 50% or above. I then changed the valid value range to 5000-10000 (50-100%), which produced much more useful results. However, I noticed that when a CPU that was stuck within the 50-100% range clears to a value outside the valid value range (x < 50), this produces NO DATA, leaving the initial alert stuck in limbo forever. You can manually clear such alerts by going to the device and toggling alerting off and on again, but doing that for a large number of alerts takes a lot of time. I'm okay with the way I have it set up (though I do believe the above may be a bug); I just wish we could manually clear alerts from the alerting window without having to take extra steps. Maybe something next to the acknowledge button? I may have jumbled this up, so please ask if I need to clarify any of the information above. I can provide screenshots if needed as well. Thanks for taking the time to read this!

     TL;DR: Let us manually clear alerts from the alert window without having to go into specific devices and toggle alerting.
  23. Please add an option so that when a device is in the IdleInterval state (HostStatus DS), all other alerts for it are automatically cleared. At the moment some devices retain their ping-loss alert even though the HostStatus DS has triggered the IdleInterval alert (no data being received). Our users find it confusing that some devices have both alerts, while others have only the IdleInterval alert.
  24. Use case (provider): I am a provider with a substantial number of customers being monitored by the platform. A single customer requests monitoring to be suspended for 14 days while they do a physical DC move. The move will be 1:1, so all systems will come back up in the same logical location and only change physical locations. Requests are filed, meetings are had, the day comes to move, and the NOC turns alerting off for the customer. Uneventful days go by, and on the day that alerting is supposed to be turned back on, a regional event happens that the provider NOC is responding to for other customers (you can insert any normal, well-defined chaos that happens in a NOC here), and alerting does not get re-enabled for the customer with the physical DC move.

     Use case (enterprise): The Oracle team notifies the NOC that a weekend upgrade will be happening on the Oracle environment, and the upgrade team does not want to be notified of any alarms, as they will have their hands full with the upgrade; they will call back when the upgrade is complete. The NOC turns alerting off, and the upgrade team never calls to say that they are done working.

     Request: Much like SDT, enable calendaring and scheduling as an option for enabling/disabling alerting, as a backup in case manual processes fail.
  25. Hi, every morning I have to clear a couple of hundred alerts from my inbox that come in while our customers' servers are running backups. We often get 'disk latency' and 'network latency' type alerts while the backups are running. As the backups run out of hours, we do not really need these. Please could you add a way of creating SDTs based on alert severity, or better yet, build a mechanism to schedule backup window times to filter out noise alerts like disk latency. I'm sure I'm not the only one to have encountered this issue. Kris