Steve Francis

LogicMonitor Staff
Everything posted by Steve Francis

  1. So if I understand you correctly - you could get the reported interface speed by adding a complex datapoint (RawSpeed, say) with this value: if(lt(##auto.BasicSpeed##,4294967295),##auto.BasicSpeed##/1000000,##auto.Speed##) - is that what you mean?
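For readers who want to check the behavior of that expression, here is the same fallback logic sketched in plain Python (illustrative only - the real datapoint uses LogicMonitor's expression syntax, and the mapping of BasicSpeed/Speed onto SNMP's ifSpeed/ifHighSpeed is my assumption):

```python
# Illustrative sketch, not LogicMonitor syntax: use the 32-bit counter
# unless it is pegged at its maximum, in which case fall back to the
# 64-bit value.

UINT32_MAX = 4294967295  # 2**32 - 1, the ceiling of a 32-bit ifSpeed

def raw_speed(basic_speed, speed):
    """Return the interface speed in Mbps.

    basic_speed: value of ##auto.BasicSpeed## (assumed ifSpeed, in bps)
    speed:       value of ##auto.Speed## (assumed ifHighSpeed, in Mbps)
    """
    if basic_speed < UINT32_MAX:
        return basic_speed / 1000000  # bps -> Mbps
    return speed                      # counter pegged; use the Mbps value

print(raw_speed(1000000000, 1000))   # 1 Gbps interface -> 1000.0
print(raw_speed(4294967295, 40000))  # 40 Gbps interface -> 40000
```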
  2. It's in core with the v102 release (so next week). I renamed the StatusRaw datapoint to OperStatus. (Good idea, thanks.) This datasource uses Groovy to do the admin/operational state alerting, as it also does regex matching, so I didn't run into the lack of neq - and oddly never have before. I'll open a ticket to get that added.
  3. The updated interfaces datasource - which you can get from the Registry via locator code RMP2DL - has a graph showing utilization as a percentage. (It will be in the core repository next release.)
  4. Well, you can't directly exceed them. Our web checks protect themselves from redirecting in an infinite loop, or in a circle, by imposing this maximum of 10 - which in real life is more than websites should normally subject their users to. (It's not a great user experience, in terms of latency, etc., to be redirected a bunch of times.) So the best solution would be to remove some of the redirects (why go from A to B to C to D, instead of just A to D?). If there are architectural reasons you can't do so, you could start your LM web check further down the redirect chain.
  5. Hey - I took the liberty of looking at the logs for your account - looks like you didn't actually import the new datasource version. (A mistake our UI makes easy to make... something we're fixing.) You want to make sure you import from the Exchange, not the repository, then give it the locator code RMP2DL. This is a slightly updated version of the one I mention above, with a few efficiency improvements in the code. The only substantive change is that the filter property used to filter in interfaces by description has been renamed to interface.description.alert_enable, as it's a bit more descriptive. The bandwidth properties are still ActualSpeed and ActualSpeedUpstream. Let me know if you have any issues.
  6. Published, locator 24KKNG. Updated the documentation article above. Please let me know if there are any other issues.
  7. Yep - I made a mistake in the device filtering, so it was only finding dependents that had depends_on set directly on the device, not those that were inheriting it via groups (although I was sure I tested that...). Anyway, I've found and fixed the error, and will publish tomorrow after a bit more testing. Sorry about that.
  8. OK, I can extend this to support Services... I've got a few other things I'm in the middle of, so it may be a week or two...
  9. As it's currently written, it doesn't support Services as either the dependent or the primary. (It could be made to support that, with some extra scripting - let me know if that's important.) And yes, you can set the depends_on property at the group level (and then override it on devices, if needed).
  10. Sorry - ActualSpeed and ActualSpeedUpstream should be set in Mbps. Should've documented that.
  11. So this set of datasources doesn't directly know anything about connections. It requires the user to set the depends_on property. If A is in Alert, and B depends on A (via the property), then B and all its interfaces will be in SDT.
  12. Yes - it puts the whole device into SDT - so all interfaces, etc.
  13. That looks like you are using the old locator code (as the new one is not v1.0.0). Can you try with locator JZ62NH? That should be what the above article shows - maybe there was some caching...
  14. Fixed - these are now importable. (Note that the locators changed - I edited the article above to the current ones. There was a slight improvement: using the globally unique displayname as the host reference in the depends_on property - the above article also reflects that.)
  15. Huh - thought I put that through review already... Let me fix that today.
  16. If you have an iframe that the video clip can be referenced in, you can put it in an HTML widget on a dashboard.
  17. Note: as of v100, instance level properties now work as tokens in alert messages. Development tells me they did prior to v100 - which I thought I tested, and found they didn't - but in any case they definitely work in v100.
  18. Agree - this has taken way too long to get into the product officially. (It is in the works, but as Mike said, is at least 6 months away. We're working on improving our processes and efficiencies, too...) In the interim, these two datasources, available from the registry with these locators, can achieve dependencies at a device level. Feedback appreciated!

SDT_Dependent_Devices: locator 24KKNG
SDT_Assign_Primary_For_Dependencies: locator NFTHXG

Creating Device Dependencies

With these two datasources, LogicMonitor supports device dependencies in order to help reduce alert noise. Dependent devices have a primary device. When the primary device reports a specific kind of alert (by default, a ping alert, but this is configurable), the dependent devices are placed in scheduled downtime. This means that if the dependent devices report alerts, they will not be escalated.

Dependent devices will be placed in Scheduled Downtime for 30 minutes at a time. If the primary device is still in alert, the Scheduled Downtime will be refreshed for another 30 minutes before the existing Scheduled Downtime period expires. Note: when the alerts clear on the primary device, the dependent devices will remain in Scheduled Downtime for the remainder of the existing 30 minute period - this is to allow circuits to re-establish, alerts to clear, etc.

Configuring Device Dependencies

Ensure your account has the SDT_Dependent_Devices and SDT_Assign_Primary_For_Dependencies datasources. Import them from the registry using the above locators if necessary. You will need a LogicMonitor API token for a user that has rights to manage the primary and dependent devices. Create two properties on the root level of your LogicMonitor account: and logicmonitor.access.key, and set their values to the API token's ID and Key, respectively.
To create a dependency on device A, so that devices B and C will be automatically placed in scheduled downtime when device A is in alert:

Navigate to device A, and determine the device's displayname as entered in LogicMonitor. Note: this is not the IP/DNS name, but the value of the name field when managing the device - e.g. in the below screen shot, the relevant name is ESXi1 - Dell iDRAC8.

Now simply navigate to devices B and C in LogicMonitor, add the property depends_on to each device, and set it to the value of the displayname of device A.

That's it. Within 30 minutes of the first device being set to have device A as a primary device, LogicMonitor will configure itself so that if device A has an alert on the ping datasource, it will place all dependent devices into scheduled downtime for 30 minutes, as described above. (Note: you can cause the reconfiguration to happen immediately if you run Poll Now for the SDT_Assign_Primary_For_Dependencies datasource on one of the dependent devices.)

Once the primary device is in an alert that matches the alert conditions (any ping alert, by default), it will SDT the dependent devices. You will see a property created on the primary device, dependents_sdtd, that contains a list of the devices that were most recently placed in SDT by the dependency action. There will also be another property, dependents_sdt_until, that contains the epoch time at which the last set SDT will expire. If the alert condition still exists 5 minutes before the expiration of the SDT, a new SDT will be created.

Note that devices that are primary for one set of devices can themselves be dependent on other devices. (e.g. a remote server can be dependent on a branch office router, but that router may be dependent on a VPN router.)

If a dependent device has a depends_on property that is set to a device that does not exist, a warning alert will be raised on that dependent device.
(Similarly, there will be a warning if the authentication credentials are not set correctly.)

Optional - changing the alert conditions for the primary device to trigger dependencies

By default, primary devices will trigger SDT for dependent devices if the primary device is in any ping alert (either packet loss or latency) of any level. You can change the conditions that trigger the dependency action by setting the property primaryalert on the primary device. This property can be set to any valid filter supported by the LogicMonitor REST API call that returns alerts for a device. The property is appended to the API query filter=resourceTemplateName: - thus the simple case is to simply set the property primaryalert to another datasource's Displayed As field (not name), to act on alerts about that datasource.

Setting the property primaryalert to each of these values will suppress dependent devices' alerts when the primary has the corresponding alert:

HTTPS- : any alerts about the HTTPS- datasource.
HTTPS-,instanceName:HTTPS-443 : alerts on the 443 instance of the HTTPS- datasource.
HTTPS-,instanceName:HTTPS-443,dataPointName:CantConnect : alerts on the datapoint CantConnect, on the 443 instance of the HTTPS- datasource.
HTTPS-,instanceName:HTTPS-443,dataPointName:CantConnect,severity:4|3 : also require that the alerts are of level Error (3) or Critical (4).

For details of alert fields that can be used in filtering, see

Removing Dependencies

The dependency configuration will be automatically removed once there are no devices that have the depends_on property pointing at a primary device - but not until the primary device next alerts. (You can manually remove the properties is_primary_device, dependents_sdt_until and dependents_sdtd to immediately remove the dependency datasource.) Feedback appreciated.
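The SDT renewal timing described above can be sketched in plain Python (a simplified model of the timing logic only - the actual datasources are Groovy scripts driving the LogicMonitor REST API, and the function name here is hypothetical):

```python
# Model of the SDT renewal semantics: dependents are SDT'd for 30
# minutes at a time, and the SDT is refreshed shortly before expiry
# while the primary is still in alert. Once the alert clears, the
# existing SDT simply runs out.

SDT_WINDOW = 30 * 60   # each SDT lasts 30 minutes (seconds)
RENEW_LEAD = 5 * 60    # renew 5 minutes before expiry (seconds)

def next_sdt_until(now, primary_in_alert, current_sdt_until):
    """Return the new dependents_sdt_until epoch time, or the existing
    value if no action is needed. current_sdt_until is None when no
    SDT has been set yet."""
    if primary_in_alert:
        if current_sdt_until is None:
            return now + SDT_WINDOW          # first SDT for dependents
        if current_sdt_until - now <= RENEW_LEAD:
            return now + SDT_WINDOW          # refresh before expiry
    # alert cleared (or SDT not close to expiring): leave it alone
    return current_sdt_until
```

So with a primary still alerting, a dependent's SDT is extended only inside the final 5-minute window; a cleared alert leaves the remaining SDT to expire naturally, which is what gives circuits time to re-establish.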
  19. I remember the days when 100 broadcast packets per second could hang a 486... Nowadays my inclination would be to just set a warning on "> 1000". Excess non-unicast is still bad, but unlikely to be traffic-impacting bad - and if it is, discards and other thresholds should trigger. So that would allow investigation in the case of a legitimately problematic non-unicast level, but not generate alerts for situations that are not impacting things and would otherwise be considered noise. And we should add a "top 25" graph for inbound non-unicast traffic on all interfaces to our default network dashboard, for people who are inclined to investigate this more closely... (On our infrastructure, we have 200 non-unicast pps on our busiest 10G Ethernet trunks...) Seem reasonable?
  20. I opened a ticket to allow ILPs to be used as tokens in alert messages - thanks for that. What was wrong with removing the operStatus filter? We're actually thinking of removing that in another update, once we can group interfaces automatically based on status (so you wouldn't have to look in the down ones...). What non-unicast threshold would you want? As a percent of unicast, or ...?
  21. Are you still seeing this? What version of the collector? We haven't heard this elsewhere... (There is a small delay - on the order of minutes - for such configuration changes to be pushed down to the collectors.)
  22. Yeah, as Tom suggests, (and you, in your opening post), dual NIC or VLAN collectors is probably the solution. Should work fine - LogicMonitor collectors just use the host's routing table, so no issue there.
  23. There is no way to apply regex expressions to instances that don't exist, as you note. I ran into a similar problem recently, which I solved by adding a Groovy complex datapoint that takes the instance name or description or what-have-you, tests it against a regular expression that is set as a property (so you can set it at group levels and have it be inherited), and if it matches, evaluates the alert as normal; if it doesn't, returns a value that makes the alert not trigger. Like in this case, testing against an interface description, and using a property interface_description_alert_enable to contain the regex:

instance = instanceProps.get("auto.intdescription");
filterText = taskProps.get("interface_description_alert_enable");
if (!(filterText) || ((instance) && instance.find(filterText))) {
    // return stuff that you want to alert on
} else {
    // return a value that won't trigger an alert
}

LMK if you have questions.
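For readers less comfortable with Groovy, the same suppress-by-description idea can be sketched in Python (an illustrative model, not the datasource code; the parameter names and the suppressed return value of 1 are my assumptions):

```python
import re

def status_value(description, filter_regex, oper_status, suppressed_value=1):
    """Mimic the Groovy complex datapoint: evaluate the real status only
    when no filter regex is set, or the interface description matches it;
    otherwise return a value that never trips the alert threshold."""
    if not filter_regex or (description and re.search(filter_regex, description)):
        return oper_status
    return suppressed_value

# With interface_description_alert_enable set to "core|uplink", only
# matching interfaces report their true (alertable) status:
print(status_value("uplink to sw2", "core|uplink", oper_status=2))  # alertable
print(status_value("desk port 14", "core|uplink", oper_status=2))   # suppressed
```

The key property of this pattern is that the regex lives in a device or group property, so the filtering inherits down the group tree even though it is evaluated per instance.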
  24. With the last release, we finally added the ability to manually set instance level properties through the UI, which lets us solve this issue. There is a new version of the snmp64_If- datasource - this is available in the registry, but not yet in core. Improvements in this datasource:

Custom interface speeds. Now that we support setting instance level properties through the UI (from the Info tab for any instance, via the Manage icon on the custom properties table), we can solve setting custom speeds for interfaces. Setting an instance level property ActualSpeed and/or ActualSpeedUpstream will override the Speed and BasicSpeed values used for interface utilization calculation. (If ActualSpeedUpstream is not set and ActualSpeed is set, ActualSpeed will be used for both upstream and downstream.) Another change: Speed and BasicSpeed are now set as discovered ILPs, rather than unchanging datapoints collected every collection cycle (a minor efficiency gain).

Backward compatible interface status filtering. LogicMonitor will by default alert when interfaces change status from up to down. This is helpful for alerts about switch ports that connect to servers or routers, or inter-switch links - but less useful if the ports connect to workstations that you expect to be shut off every day. To limit this behavior to a certain set of ports, you can now set the property interface_description_alert_enable. If a device has this property set, or inherits it, it will only trigger status alerts for interfaces where the interface description matches the regular expression contained in that property. All other active ports will be discovered and monitored, but will not have their status changes (or flapping) alerted on. (If the property is not set, the current behavior of alerting on status changes for all interfaces is maintained.)
For example, setting the property interface_description_alert_enable to the value “core|uplink” on a group will cause all network devices in that group to alert for status changes only on interfaces with the words “core” or “uplink” in the interface descriptions. All other interfaces will be monitored, but will not have status alerting enabled. (Other alerts, such as for excessive discards, will still be in effect.) To exclude all interfaces with the word bridge in the interface description, set the interface_description_alert_enable property to ^((?!bridge).)*$ (that's a regular expression negative lookahead...). All interfaces, except those with bridge in the description, will have status monitored as normal.

There is also a change in the way the discard percentage is calculated: it does not trigger until there are at least 50 discards per second, as well as the relevant percentage of drops. (This used to be 50 unicast packets per second, but that would still cause alerts on the standby interface of bonded interfaces.)

These changes are backward compatible, and do not lose any datapoint history. This new datasource is accessible from the registry using the locator code KYE6HN. Note: this datasource has not been through our final internal QA, but is believed reliable (we're running it internally!). It will be improved in a minor way shortly (a future server release will negate the need to collect interface descriptions as an instance level property), and released to core after that - but that change will be backward compatible, for those of you wishing to adopt this early.
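The ActualSpeed / ActualSpeedUpstream precedence rules described above can be sketched like this (an illustrative Python model, not the datasource's actual Groovy code; the function and parameter names are mine):

```python
def effective_speeds(actual_speed=None, actual_speed_upstream=None,
                     discovered_speed=None):
    """Resolve the (downstream, upstream) speeds in Mbps used for
    utilization calculations.

    ActualSpeed / ActualSpeedUpstream model the instance-level
    properties; discovered_speed models the SNMP-discovered value
    (Speed/BasicSpeed) used when no override is set.
    """
    down = actual_speed if actual_speed is not None else discovered_speed
    if actual_speed_upstream is not None:
        up = actual_speed_upstream          # explicit upstream override
    elif actual_speed is not None:
        up = actual_speed                   # ActualSpeed used both ways
    else:
        up = discovered_speed               # no overrides at all
    return down, up

# A 20/5 Mbps DSL circuit plugged into a 1000 Mbps switch port:
print(effective_speeds(actual_speed=20, actual_speed_upstream=5,
                       discovered_speed=1000))  # (20, 5)
```

This is why setting only ActualSpeed is enough for symmetric circuits, while asymmetric links (DSL, cable) also want ActualSpeedUpstream.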
  25. This has been available for a while - we apparently neglected to update this thread, though. Sorry!