Search the Community

Showing results for tags 'alerts'.



Found 44 results

  1. Issues With Creating A Datasource

    I took a working groovy script datasource and am now trying to adjust it to some needs we have. This data will end up giving us alert totals for each month so we can build reports. Any ideas? Here is what I have so far.

        import javax.crypto.Mac;
        import javax.crypto.spec.SecretKeySpec;
        import org.apache.commons.codec.binary.Hex;
        import groovy.json.JsonSlurper;

        //define credentials and url
        def accessId = hostProps.get('lmaccess.id');
        def accessKey = hostProps.get('lmaccess.key');
        def account = hostProps.get('lmaccount');
        def alertgroup = hostProps.get('lmaccess.group');

        def collectionFailures = 0
        def failures = [:]

        def client = new LogicMonitorRestClient(accessId, accessKey, account, this.&println)

        try {
            def alerts = client.get("/device/groups/" + alertgroup + "/alerts", fields: "severity", filter: "startEpoch>:1538370000,endEpoch<:1541048399,cleared:*")
            //warnings = alerts.findAll {it.severity == 2}.size()
            println "WarningCount: ${alerts.findAll {it.severity == 2}.size()}"
            println "ErrorCount: ${alerts.findAll { it.severity == 3 }.size()}"
            println "CriticalCount: ${alerts.findAll { it.severity == 4 }.size()}"
            println "TotalAlerts: ${alerts.size()}"
        } catch (Throwable e) {
            failures["alerts"] = e.toString()
            collectionFailures += 1
        }

        // Do error reporting
        println "CollectionFailures:${collectionFailures}"
        failures.each{ query, exception ->
            println "Exception while querying $query:"
            println exception
        }

        return 0

        //////////////////////
        // HELPER FUNCTIONS //
        //////////////////////

        class LogicMonitorRestClient {
            String userKey
            String userId
            String account
            int maxPages = 20
            int itemsPerPage = 1000
            def println

            LogicMonitorRestClient(userId, userKey, account, printFunction) {
                this.userId = userId
                this.userKey = userKey
                this.account = account
                this.println = printFunction
            }

            def generateHeaders(verb, path) {
                def headers = [:]
                def epoch = System.currentTimeMillis()
                def requestVars = verb + epoch + path

                // Calculate signature
                def hmac = Mac.getInstance('HmacSHA256')
                def secret = new SecretKeySpec(userKey.getBytes(), 'HmacSHA256')
                hmac.init(secret)

                // Sign the request
                def hmac_signed = Hex.encodeHexString(hmac.doFinal(requestVars.getBytes()))
                def signature = hmac_signed.bytes.encodeBase64()

                headers["Authorization"] = "LMv1 " + userId + ":" + signature + ":" + epoch
                headers["Content-Type"] = "application/json"
                return headers
            }

            def packParams(params) {
                def pairs = []
                params.each{ k, v -> pairs << ("${k}=${v}")}
                return pairs.join("&")
            }

            // Non paginating, raw version of the get function
            def _rawGet(path, params) {
                def baseUrl = 'https://' + account + '.logicmonitor.com' + '/santaba/rest' + path
                def packedParams = ""
                if(params) {
                    packedParams = "?" + packParams(params)
                }
                def query = baseUrl + packedParams
                def url = query.toURL()
                def response = url.getText(useCaches: true, allowUserInteraction: false, requestProperties: generateHeaders("GET", path))
                return response
            }

            // Public interface for getting stuff.
            def get(Map args=[:], path) {
                def itemsReceived = []
                def pageReads = 0

                // Impose our own paging parameters.
                args.size = itemsPerPage
                args.offset = 0

                while(true) {
                    // Do da nastieh
                    def response = new JsonSlurper().parseText(_rawGet(path, args))
                    if (response.errmsg == "OK") {
                        // Catch individual items
                        if (response.data.items == null) {
                            return response.data
                        }
                        itemsReceived += response.data.items
                        // Check if there are more items
                        // if (response.data.total > itemsReceived.size())
                        // {
                        args.offset = args.size + args.offset
                        // }
                        // else
                        // {
                        //     break // we are done
                        // }
                    } else {
                        // Throw an exception with whatever error message we got.
                        throw new Exception(response.errmsg)
                    }
                    pageReads += 1
                    // Check that we don't exceed max pages.
                    if (pageReads >= maxPages) {
                        break
                    }
                    if (response.data.total > 0) {
                        break
                    }
                }
                return itemsReceived
            }
        }

    If I run the URL with the API creds in my test powershell script, it works perfectly. When I test it in LM as a datasource, I get the attached error.
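    When a script like this works outside of LogicMonitor but fails as a datasource, one common difference is that the datasource run relies on hostProps, so it is worth confirming the device actually has the properties the script expects (hostProps.get() returns null for a missing property, and building the LMv1 signature then fails). A minimal sketch of that check, reusing the same property names as above:

        // Minimal sanity check: report which of the expected device properties are missing.
        // Property names are the same ones the script above expects.
        def required = ['lmaccess.id', 'lmaccess.key', 'lmaccount', 'lmaccess.group']
        def missing = required.findAll { hostProps.get(it) == null }
        if (missing) {
            println "Missing device properties: ${missing.join(', ')}"
            return 1
        }
        println "All expected device properties are present."
        return 0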
  2. Hi, I already raised this with LogicMonitor via email, but just reiterating here. For some datapoints, where we want to generate warning/error/critical alerts, you can use the collection interval and alert trigger interval to basically set the amount of time that should elapse before a datapoint threshold triggers an alert. But it's currently not possible to, for example, set a completely custom interval based on duration. E.g. if I want to generate a warning alert after 3 hours, and an error alert after 4 hours, you have to use a combination of the two things above to get close enough to the duration you want. It would be great if you could, regardless of the collection interval, have more options in the alert trigger interval (currently 1 to 10, and 20, 30, 40, 50, 60). So, if I have a collection interval of 5 minutes, I can currently achieve 2.5 hours or 5 hours using an alert trigger interval of 30 and 60 respectively. Couldn't there be a regular number input rather than a drop-down with predefined options for the alert trigger interval? Or a separate option that allows a completely flexible duration? Also, can a custom interval already be set using the API, regardless of the UI, as I could try that? If there's another way to achieve what I want, I'd be happy to hear it.. :-) Thanks, Roland
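    As a concrete illustration of the gap: with a 5-minute collection interval, a 3-hour warning would need an alert trigger interval of 36 consecutive polls (36 x 5 = 180 minutes), which is not one of the selectable values; the closest you can currently pick are 30 (150 minutes) or 40 (200 minutes).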
  3. LM - alerts

    When I go through the documentation regarding Alerts, I understand that the threshold for a particular Metric/Datapoint can be set in the Datasource itself. I have made an example alert and got a clear idea of the concept. My doubt is whether we can create an alert for a widget. For example, I have a Gauge widget that represents CPU percentage. So how can I set my alert only on that particular widget if my CPU percentage crosses 60% or 90%?
  4. IFTTT is a free SaaS platform that helps you "do more with all your apps and devices" - by providing an integration point between commonly used services and platforms. In the following example, we're using the IFTTT Applet webhooks "trigger" to activate a Philips Hue wireless lighting "action" - blinking the lights of the connected Hue platform as a result of a LogicMonitor alert!

    Other things you might be able to do with LogicMonitor alerts, through IFTTT (lots of untested possibilities!):
      • Change lighting colors based on alert status (red for new, green for cleared, etc.)
      • Receive alert notifications to connected systems like Skype, Twitter, Evernote, or Google.
      • Play music on a connected Sonos system after triggering an alert.
      • Turn on a connected Smart Plug like the Wemo from Belkin.

    The Finished Result

    The following tutorial assumes that you have an IFTTT account created and permissions to add an integration to your LogicMonitor account.

    Step 1: Log into your IFTTT account and create a new 'Applet'.
    Step 2: Search for and choose the 'Webhooks' service.
    Step 3: Choose the 'Receive a Web Request' trigger.
    Step 4: Configure (and remember) the event name that will be recognized by the incoming webhook to trigger the event.
    Step 5: Configure the 'Action' that will be taken when this event is triggered in IFTTT - lots of intriguing possibilities!
    Step 6: Once you've added and configured the 'Action,' review the applet settings and click 'Finish' to save the Applet.
    Step 7: Select 'Services' from the account dropdown - we will be looking up the incoming webhook URL for our account so we know where to send our alerts.
    Step 8: Search for the 'Webhooks' service and select it to proceed.
    Step 9: Select the 'Documentation' link from the 'Webhooks' services page.
    Step 10: Copy the incoming Event trigger URL along with the key for your account. You will replace {event} in the URL with the one you configured above.
    Step 11: Moving to your LogicMonitor account, navigate to 'Settings -> Integrations' and add a new 'Custom HTTP Delivery' integration using the event name from Step 4 and the URL (with key) from Step 10.
    Step 12: IFTTT allows you to include an (optional!) payload - which will show in the 'Activity Log' of the IFTTT Applet.
    Step 13: Test Alert Delivery and you should see output similar to below in the IFTTT Activity Log.
    Step 14: Save your integration, assign it to an Escalation Chain, and assign the Escalation Chain to an Alert Rule - and now we've configured a simple integration between LogicMonitor and IFTTT that could form the basis of a handful of interesting alert actions!
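    If you do use the optional payload in Step 12, IFTTT's webhook trigger accepts up to three JSON fields named value1, value2 and value3, which you can fill with LogicMonitor alert tokens in the integration's payload. One possible payload shape, using common LogicMonitor tokens purely as an illustration:

        {
          "value1": "##ALERTSTATUS## - ##HOST##",
          "value2": "##DATASOURCE## / ##DATAPOINT##",
          "value3": "value ##VALUE## crossed threshold ##THRESHOLD##"
        }

    Whatever you put in value1-value3 is what shows up in the IFTTT Activity Log (Step 13) and is available to the Applet's action.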
  5. Alerts for mounted ISOs on Linux server

    Hello all, Recently I mounted a RHEL ISO on a /data/rhel_iso directory, on a system that is monitored with LogicMonitor. Five minutes later I received an alert about 105% utilization of /data/rhel_iso, which is understandable but strange, as an ISO takes the same space as the files inside it. When I unmounted the ISO I got an alert about a filesystem that is not responding. How can I disable those ISO-related alerts? They don't really make sense for an ISO mount. Many thanks in advance, Szymon
  6. The majority of the alerts we get every day are high volume usage, and sometimes it's hard to work with this because you don't know how large the volume is. For example, on a 50 TB system with the threshold set at 90%, you will begin to receive alerts while you still have 5 TB free. Would it be possible to have a feature that would allow you to see the size of the drive and set the alert in GB instead of percentage? This would allow for faster use of thresholds on drive alerts.
  7. Currently the hyperlink in an alert notification email requires that users have permission to view the All Alerts view. We don't want users to have access to this view. Please make it so that the notification hyperlink to acknowledge an alert works without the need for this permission.
  8. Hello all, I am trying to get updates on alerts that span multiple days. So, our normal code will grab new alert data for today, let's say. However, we need to go back to old alerts and see if they have been resolved yet, so we can do accurate reporting. What I want to do is provide a list of the IDs for a handful of alerts and get more than 1 row back. Any ideas on how I can formulate an alert query so that I can get 2 rows back for, say ID=DS1234567 and ID=1234568 in the same request? Here is my example call: https://mysite.logicmonitor.com/santaba/rest/alert/alerts?filter=id:DS1275294,id:DS1472582 My hope is to combine a batch of calls so I don't flood the service and so I can get results faster. Given a list of unique IDs, any thoughts on requesting a batch? E
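    One detail worth checking (an assumption to verify against the REST API documentation for the API version you're on): in LogicMonitor's filter syntax a comma between conditions is normally treated as AND, so filter=id:DS1275294,id:DS1472582 asks for a single alert matching both IDs and would return nothing. OR is typically written with ||, which may need URL-encoding as %7C%7C depending on your client, e.g.:

        https://mysite.logicmonitor.com/santaba/rest/alert/alerts?filter=id:DS1275294||id:DS1472582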
  9. One of the biggest frustrations for us with LogicMonitor is breaking a bunch of dashboards and alerts if we move device groups to another location in the overall device group tree. For example: Say we have a nested device group called "Infrastructure/Hosts". Now our environment has changed a bit, and we want to add better organization to support the new changes to our environment. We move the hosts group to the following location: "Infrastructure/PhysicalDevices/Hosts". All alert rules and dashboards that were filtering on "Infrastructure/Hosts" have now been broken, even though the devices in the group need the same alerting and dashboards. Now we have to go through and fix each Alert Rule and Dashboard Widget that used "Infrastructure/Hosts" to now point to "Infrastructure/PhysicalDevices/Hosts". As you can imagine, as environments scale up and evolve, subgroups are going to be moved around all the time. Redoing dashboards and alerts every time this happens adds a tremendous amount of labor, and can lead to people missing changes, leaving behind broken alerts or dashboards that you may not find out about until an emergency has already happened. What we're proposing: "Sticky" device group handling - if a group or subgroup used in an Alert Rule or Dashboard changes location, this location should automatically be updated and reflected in the dashboard. This is how most modern applications handle this sort of thing anyway, and it's a huge time saver. Given the critical nature of this tool's function, this would go a long way towards preventing accidentally breaking monitoring that companies rely on to keep their environments running.
  10. Hey All, I couldn't find a way to do this using the alert tokens available, so I figured I would post it here. I noticed some cool features in other monitoring tools that allow graphs to be sent in the alert body to PagerDuty. So when I receive a PagerDuty page from LM, it would be nice to see the associated graph for the datapoint that is alerting. While just the alert text is good enough for most scenarios, I think seeing how big of a jump/spike the datapoint made before alerting would be useful. The alert "C drive is 90% full" is all fine and good, but when you see a graph showing it go from 12% to 90% in just a minute or two, then you know something is really up and might need a faster response, as it will probably continue to fill up at that rate.
  11. Custom Alert-Groups for SDT

    When we reboot a Server or an Application Set, our NOC does not know all the Devices, Instances and/or Services impacted, so we get flooded with alerts for a known event. Example: I need to reboot device WebServer-xyz - the Server, the Switch ports, the Storage Sessions and the HTTP/S Service are all monitored in LM. I'd like to be able to SDT just these items with one SDT, and not entire switches or devices. So: be able to create an "Alert-Group", e.g. "WebServer-xyz", where you can then add Instances from multiple devices, entire Devices, Services, Instance Groups, Device Groups - anything defined in LM. Then just add one SDT to the Alert-Group - one-stop shopping.
  12. Alerts list scroll bars

    Looking at the alerts list in my account, the scroll bars exist, but in order to see them I have to scroll down. Try determining the escalation chain (rightmost column) of the 3rd row: you have to scroll down to reach the horizontal scroll bar, which moves the 3rd line off screen. Then, after you scroll right and scroll back up, you no longer see the name because it has moved off screen (1st column). The scroll bars (and column headers) should always be visible, allowing me to scroll without losing my flow of thought.
  13. Extending alert information from LogicMonitor to other 3rd-party systems is pretty common for us; however, the tokens available today to describe the alert are missing a few bits of data (we feel). It would be incredibly helpful to have an alert token that contains the LM user responsible for acknowledging the alert, and a separate token for the Ack comment. Having these tokens would allow us to better map alerting details to upstream and downstream integrations.
  14. Enhanced alert notes

    We have a team that handles all alert escalations, spanning 3 different shifts. If an alert can't be immediately corrected and requires a follow-up, a note is entered for the alert on the alerts dashboard with basic info: date, person contacted and any information the shift can review to determine whether or not a follow-up is required from them. Unfortunately, once multiple notes are entered, legibility decreases, and unless you enter your name, there's no way of easily determining who entered the note. The ability to enter a note where the time of entry and the user are recorded would improve functionality; more like a log for the alert itself.
  15. Clearing Alerts Manually

    Hey guys! So I wanted to bring up the idea of clearing alerts manually. I searched the feature request threads and haven't really found an answer or a thread that matched what I was looking for, so I thought I would take a shot at doing one of these. Apologies in advance if this has been discussed already, or if I don't make much sense. I'm fairly new to using the platform, so I might not be fully up to speed with all the lingo.

    So let me explain a bit of what brought me to this request. I have set up monitoring on our virtual machines to monitor CPU usage by percentage (x\100). I then have an alert set up to indicate a stuck process, which would shoot out an alert if a datapoint hasn't changed (+/-3%) over the next 3 intervals (which is set to 3 minutes). The alert clears if it changes over the next 4 intervals. The process above has been working great so far, but I quickly realized that we didn't really care about anything stuck between 0-50%; we only wanted to focus on values that were stuck at 50% or above. I then changed the valid value range to be between 5000-10000 (50-100%), which produced much more useful results. I did notice that if a CPU ends up stuck within the 50-100% range and then clears to a value outside of the valid value range (x<50), this produces NO DATA, leaving the initial alert stuck in limbo forever. You could manually clear them by going to the device and toggling alerting on the device off and on again, but doing that for a large number of alerts takes a lot of time.

    I'm okay with the way I have it set up (though I do believe the above may be a bug). I just kind of wish we could manually clear alerts from the alerting window without having to take extra steps. Maybe something next to the acknowledge button? I might have jumbled this up, so please ask if I need to clarify any of the information above. I can provide screenshots if needed as well. Thanks for taking the time to read this!

    TL;DR = Let us manually clear alerts from the alert window without having to go into specific devices and toggle alerting.
  16. Please add an option so that when a device is in the IdleInterval state (HostStatus DS), all other alerts are automatically cleared. At the moment some devices retain their ping loss alert even though the HostStatus DS has triggered the IdleInterval alert (no data being received). Our users find it confusing when some devices have both alerts, while others have only the IdleInterval alert.
  17. Add scheduling option to alerting

    Use Case: Provider
    I am a provider with a substantial number of customers being monitored by the platform. A single customer requests monitoring to be suspended for 14 days as they do a physical DC move. The move will be 1:1, so all systems will come back up in the same logical location and only move physical locations. Requests are filed, meetings are had, and the day comes to move and the NOC turns alerting off for the customer. Uneventful days go by, and on the day that alerting is supposed to be turned back on, a regional event happens that the provider NOC is responding to for other customers (you can insert any normal well-defined chaos that happens in a NOC here) and alerting does not get re-enabled for the customer with the physical DC move.

    Enterprise:
    The Oracle team notifies the NOC that a weekend upgrade will be happening for an Oracle customer and the upgrade team does not want to be notified of any alarms, as they will have their hands full with the upgrade and will call back when the upgrade is complete. The NOC turns alerting off and the upgrade team never calls to say that they are done working.

    Request:
    Much like SDT, enable calendaring and scheduling as an option for enabling/disabling alerting, as a backup in case of failure in manual processes.
  18. SDT for minor alerts

    Hi, Every morning I have to clear a couple of hundred alerts from my inbox that come in while our customers' servers are running backups. We often get 'disk latency' / 'network latency' type alerts while the backups are running. As they are run outside of hours, we do not really need these. Please could you add a way of creating SDT based on alert severity or, better yet, build a mechanism to schedule backup window times to filter noise alerts like disk latency. I'm sure I'm not the only one to encounter this issue. Kris
  19. Please document which properties of alerts are searched by the Search function of the alerts view. We sometimes see results for search strings and cannot explain why they are included. For example, if I search for "WMI" I get alerts that do not contain "WMI" anywhere that I can see. Perhaps a list of the properties that are matched against the search string could be displayed in the UI as a tooltip on the search input textbox.
  20. Windows Drive Space Alerts

    By default, LogicMonitor alerts on the percentage used on any drive. This in general is fine, but sometimes not. Let's imagine you have a 2.2 terabyte drive. You might have your critical threshold set at 90%, which sounds fine, until you realise that you are going to get a critical alert when you still have 220 GB free. In my case that would be a cause for some celebration, not really an urgent need to get up at 3 A.M. and delete files so the world doesn't end.

    Now imagine your 2.2 TB drive is divided up as:
    C: 10 GB (OS)
    D: 500 GB (Mission critical applications)
    E: 1 TB (Backups)
    F: 510 GB (Other Applications)

    A 90% alert will give you a critical at 1 GB, 50 GB, 100 GB and 51 GB free respectively. Now the C: drive may be a cause for concern, but the others not so much. For the two application drives you might only be concerned if they have less than 4 GB free, and for the backup drive less than 10 GB. So, we decide to alert on the following:
    C: freespace is <1 GB
    D: freespace is <4 GB
    E: freespace is <10 GB
    F: freespace is <4 GB

    You could clone the datasource so you have four copies, one for each drive, but this is harder to maintain in the future and does not scale well. It would be better if you could somehow get the drive letter and assign a threshold based on that. LogicMonitor's scripted complex datapoint using Groovy to the rescue.

    The disks datasource queries the class Win32_Volume. We need to use the raw drive letter output from the WMI class, so we would write a Groovy script like:

        Drive = output["DRIVELETTER"];
        return(Drive);

    This returns C:, D:, E: and F:. Not much use, as LogicMonitor doesn't deal with text, only metrics. Let's beef up the script:

        drive = output['DRIVELETTER'];
        freeSpaceLowerLimitGigabyte = '0';
        if (drive == 'C:') {freeSpaceLowerLimitGigabyte = '1';}
        if (drive == 'D:' || drive == 'F:') {freeSpaceLowerLimitGigabyte = '4';}
        if (drive == 'E:') {freeSpaceLowerLimitGigabyte = '10';}
        return freeSpaceLowerLimitGigabyte;

    This returns 1, 4, 10 and 4 for each drive; now we have a complex datapoint that returns the lower limit in GB for each drive, dependent on the drive letter. Again, we can't alert on this directly, so we need another datapoint to check whether freespace is less than freeSpaceLowerLimitGigabyte. To do that, create a CapacityAlert datapoint using this expression:

        if ( lt (FreeSpace, FreeSpaceLowerLimitGigabyte * 1024000000) , 1, 0)

    Which breaks down as: if freespace is less than the assigned limit for that drive letter, then return 1 (which you alert on), otherwise return 0. With the alert threshold set at = 1 1 1, we get critical alerts if:
    C: freespace is <1 GB
    D: freespace is <4 GB
    E: freespace is <10 GB
    F: freespace is <4 GB
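    As a quick worked example with the numbers above: for E: the script returns 10, so the expression compares FreeSpace with 10 * 1024000000 = 10,240,000,000 bytes (roughly 10 GB). If E: drops to 8 GB free (about 8,000,000,000 bytes), lt() is true and CapacityAlert returns 1, tripping the threshold, while the same 8 GB free on D: (limit 4 * 1024000000 = 4,096,000,000 bytes) returns 0.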
  21. You must have set up your Alert Rules & Escalation Chains hoping that they are set up correctly. What if they were not set up accurately and do not alert the right group, or even worse, do not alert at all? The worst thing is for you not to receive an alert when a device is down, or when a disk is filling up due to logging that was set to verbose mode and one of your teammates did not change the level back after troubleshooting. In this article, you will be guided through setting up an effective Alert Rule & Escalation Chain. In addition, we will show you how to deliver a live alert without creating any impact on the system in question.

    Before diving into the troubleshooting steps, below is the difference between Alert Rules and Escalation Chains. Alert Rules are used to tag the respective Escalation Chains when a certain device reaches the defined severity level. You could define an Alert Rule to use an Escalation Chain only when a certain datapoint is reached. Escalation Chains are used to set the delivery method for Alerts. This could be set to deliver your alerts via email, SMS, ticketing systems, custom HTTP integrations, etc. You may also set your Escalation Chain to be routed to different groups of people during different times/days. This is useful for different sets of standby engineers in a 24x7 operation. Alert Rules & Escalation Chains are very powerful if used correctly.

    To begin, we will first create an Escalation Chain. For this example, I will create it for Windows devices. We recommend enabling rate limiting, as you will not want to receive a flood of alerts. By doing so, it limits the maximum number of Alerts delivered in the defined time. If you are wondering, I created 3 stages for different delivery methods (email, Hipchat & voice). The duration that it takes to move from one chain to the other is defined within the Escalation Interval of the Alert Rule. There is also an optional section where we have the ability to route alerts to different people depending on the time and day. It is quite simple: just select the days & timing for the respective stages.

    The section below, the creation of Alert Rules, requires good planning. Alerts are triggered based on the priority level. It will start from the lowest to the highest number. It should go from the most granular to the most wildcards. A common use case is:
      • Create an Alert Rule to send interface-related Alerts to the network team
      • Create an Alert Rule to send hardware or performance Alerts to the sysadmin team
      • Create an Alert Rule to send Exchange Alerts to the messaging team
      • Create an Alert Rule to send all other alerts to the sysadmin team

    Another essential portion which we need to focus on is the Group which the rule is applied to. We get this question asked countless times. It's an easy fix, but it is knowing what to fix. If you set it to * it will apply to all groups - which is great. However, we know that we can't apply the Alert Rule to all devices. We might need to apply different alert rules to different types of devices (e.g. Servers, Switches, Routers, WAN Links, etc). Let's say you have a router "wan01" which resides in the group "Infrastructure -> Critical -> Networking -> Routers -> WAN". If you apply the Alert Rule to "Infrastructure/Critical/", your device will not pick up this Alert Rule as it resides in a subtree. The fix is simple: just apply the Alert Rule to "Infrastructure/Critical/*". This will apply to all subgroups under Critical.

    Now, once you have set that up, I'm sure you would like to verify that the Alert Rule is picked up by the datasource or instance in question. To do so, navigate to the datasource or instance in question. Click on the COG button and it will show you the Alert Rule, Escalation Chain and delivery method for each stage. This is how you can determine if your Alert Rule or Escalation Chain is picked up.

    The next thing is to validate the delivery of an Alert. Yes, we could click on "Send Test Alert", but I'm sure we prefer to have an actual alert to see how it works. My favourite datasource to use is the Ping datasource with the PingLossPercent datapoint. To trigger an alert, we could change this value to ">=0". What this will do is send an Alert when the Ping Loss is more than or equal to 0. To do so, it's quite easy too. Click on the pencil icon within the line of PingLossPercent. Click on the + sign, as this will create an instance level threshold. What you want to do is set the value to 0 for critical. You should receive the Alerts quite soon after. Once you have received the alerts and verified it's all working, remember to remove it, as you don't want to get flooded with alerts. I hope this article has provided you with sufficient information on how to set up an alert, and test and trigger the Alerts.
  22. No Alerts Icon Size

    Please can you make the "No Alerts" check mark icon fill at least 90% of the widget height for improved visibility at a distance. Otherwise it looks like a widget is not showing anything.
  23. Massive StatusFlap alerts

    Hi to all: I have problems with "StatusFlap" type alerts; the platform sends many messages to my email. Does anyone know if they are false positives? My network devices have no problems on the interfaces. How can I stop this? Let me know please. Kind regards, Iván Martínez
  24. It would be good to be able to set AlertRules based on DataSourceInstanceGroups and DataSourceInstanceProperties
  25. Currently, Alert sounds are persistent even when an instance is placed in SDT. We need functionality added to suppress all alert sounds while an instance is in SDT.