Steve Francis

LogicMonitor Staff

Posts posted by Steve Francis

  1. So if I understand you correctly - you could get the reported interface speed by adding a complex datapoint (RawSpeed, say) with this value:

    if(lt(##auto.BasicSpeed##,4294967295),##auto.BasicSpeed##/1000000,##auto.Speed##)

    (That is, presumably: if the 32-bit ifSpeed-based BasicSpeed is below its 4294967295 ceiling, convert it from bps to Mbps; otherwise fall back to the ifHighSpeed-based Speed.)

    Is that what you mean?

  2. It's in core with the v102 release. (So, next week.)

    I renamed the StatusRaw datapoint to OperStatus. (Good idea, thanks.)

    This datasource uses Groovy to do the admin/operational state alerting, as it also does regex matching, so I didn't run into the lack of neq - and oddly never have before.

    I'll open a ticket to get that added.

  3. Well, you can't directly exceed them.

    Our web checks protect themselves from being stuck in an infinite loop, redirecting in a circle, by imposing this maximum of 10 - which in real life is more than websites should normally subject their users to. (It's not a great user experience, in terms of latency etc., to be redirected a bunch of times.)

    So the best solution would be to remove some of the redirects (why go from A to B to C to D, instead of just A to D?). If there are architectural reasons you can't do so, you could start your LM web check further down the redirect chain - the sketch below is one way to map out where a chain goes.
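    Purely as a convenience, here's a minimal standalone Groovy sketch (example.com is just a placeholder) that walks a redirect chain hop by hop, without following redirects automatically, so you can pick a later hop as the check target:

    def url = "http://example.com/"   // placeholder - your check's starting URL
    def hops = 0
    while (hops < 15) {
        def conn = new URL(url).openConnection()
        conn.instanceFollowRedirects = false
        def code = conn.responseCode
        def loc = conn.getHeaderField("Location")
        if (code < 300 || code >= 400 || !loc) break
        // resolve relative Location headers against the current URL
        url = new URL(new URL(url), loc).toString()
        println "hop ${++hops}: ${code} -> ${url}"
    }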

  4. Hey - I took the liberty of looking at the logs for your account - looks like you didn't actually import the new datasource version. (A mistake our UI makes easy to make... something we're fixing.)

    You want to make sure you import from the Exchange, not the repository, then give it the locator code RMP2DL.


    This is a slightly updated version of the one I mention above - a few efficiency improvements in the code.

    The only substantive change is that the property used to filter in interfaces by description has been renamed to interface.description.alert_enable, as it's a bit more descriptive.

    The bandwidth properties are still ActualSpeed and ActualSpeedUpstream.

     

    Let me know if you have any issues.

  5. Yep - I made a mistake in the device filtering, so it was only finding dependents that had the depends_on set directly on the device, not those that were inheriting it via groups (although I was sure I tested that...)

    Anyway, I've found the error, and fixed it, and will publish tomorrow after a bit more testing.

    Sorry about that.

  6. Agree - this has taken way too long to get into the product officially. (It is in the works, but as Mike said, is at least 6 months away. We're working on improving our processes and efficiencies, too...)

    In the interim, these two datasources, available from the registry with the locators below, can achieve dependencies at the device level. Feedback appreciated!

    SDT_Dependent_Devices: locator 24KKNG

    SDT_Assign_Primary_For_Dependencies: locator NFTHXG

    Creating Device Dependencies 

    With these two datasources, LogicMonitor supports device dependencies in order to help reduce alert noise.

    Dependent devices have a primary device. When the primary device reports a specific kind of alert (by default, a ping alert, but this is configurable), then the dependent devices are placed in scheduled downtime. This means that if the dependent devices report alerts, they will not be escalated.

    Dependent devices will be placed in Scheduled Downtime for 30 minutes at a time. If the primary device is still in alert, the Scheduled Downtime will be refreshed for another 30 minutes, before the existing Scheduled Downtime period expires. Note: when the alerts clear on the primary device, the dependent devices will remain in Scheduled Downtime for the remainder of the existing 30 minute period - this is to allow circuits to re-establish, and alerts to clear, etc. 

    Configuring Device Dependencies

    Ensure your account has the SDT_Dependent_Devices and SDT_Assign_Primary_For_Dependencies datasources. Import them from the registry using the above locators if necessary.

    You will need a LogicMonitor API token for a user that has rights to manage the primary and dependent devices. Create two properties on the root level of your LogicMonitor account: logicmonitor.access.id and logicmonitor.access.key, and set their values to the API token’s ID and Key, respectively.
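    (For what it's worth, the datasource scripts presumably pick these up via the standard collector property lookup - a sketch, not necessarily the exact code:)

    def accessId  = hostProps.get("logicmonitor.access.id")
    def accessKey = hostProps.get("logicmonitor.access.key")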

    To create a dependency on device A, so that devices B and C will be automatically placed in scheduled downtime when device A is in alert:

    Navigate to device A, and determine the device's display name as entered in LogicMonitor. Note: this is not the IP/DNS name, but the value of the name field when managing the device.

    e.g. in the below screen shot, the relevant name is ESXi1 - Dell iDRAC8

    [Screenshot: the device's Manage dialog, with the name field showing ESXi1 - Dell iDRAC8]

    Now simply navigate to devices B and C in LogicMonitor, and add the property depends_on to each device, and set it to the value of the displayName of device A.

    That’s it.
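    (If you have many dependent devices, you can also set the property through the REST API instead of the UI. The following is just a sketch, assuming an LMv1 API token; the portal name, device id, and property value are hypothetical - it's not part of the datasources themselves:)

    import javax.crypto.Mac
    import javax.crypto.spec.SecretKeySpec

    def account   = "yourportal"      // hypothetical: yourportal.logicmonitor.com
    def accessId  = "API_TOKEN_ID"    // hypothetical API token id
    def accessKey = "API_TOKEN_KEY"   // hypothetical API token key
    def deviceId  = 123               // hypothetical id of dependent device B or C
    def body      = '{"name":"depends_on","value":"ESXi1 - Dell iDRAC8"}'
    def path      = "/device/devices/${deviceId}/properties"

    // LMv1 signature: base64 of the hex HMAC-SHA256 over verb + epoch + body + resource path
    def epoch = System.currentTimeMillis().toString()
    def mac = Mac.getInstance("HmacSHA256")
    mac.init(new SecretKeySpec(accessKey.bytes, "HmacSHA256"))
    def hex = mac.doFinal(("POST" + epoch + body + path).bytes).encodeHex().toString()
    def sig = hex.bytes.encodeBase64().toString()

    def conn = new URL("https://${account}.logicmonitor.com/santaba/rest${path}").openConnection()
    conn.requestMethod = "POST"
    conn.doOutput = true
    conn.setRequestProperty("Authorization", "LMv1 ${accessId}:${sig}:${epoch}")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.outputStream.withWriter { it << body }
    println "HTTP ${conn.responseCode}"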

    Within 30 minutes of the first device being set with device A as its primary, LogicMonitor will configure itself so that if device A has an alert on the ping datasource, it will place all dependent devices into scheduled downtime for 30 minutes, as described above. (Note: you can cause the reconfiguration to happen immediately if you run Poll Now for the SDT_Assign_Primary_For_Dependencies datasource on one of the dependent devices.)

    Once the primary device is in an alert that matches the alert conditions (any ping alert, by default), it will SDT the dependent devices. You will see a property created on the primary device, dependents_sdtd, that contains a list of the devices that were most recently placed in SDT by the dependency action. There will also be another property, dependents_sdt_until, that contains the epoch time at which the last set SDT will expire. If the alert condition still exists 5 minutes before the expiration of the SDT, a new SDT will be created.

    Note that devices that are primary for one set of devices can themselves be dependent on other devices. (e.g. a remote server can be dependent on a branch office router, but that router may be dependent on a VPN router.)

    If a dependent device has a depends_on property that is set to a device that does not exist, a warning alert will be raised on that dependent device. (Similarly, there will be a warning if the authentication credentials are not set correctly.)

    Optional - changing the alert conditions for the primary device to trigger dependencies

    By default, primary devices will trigger SDT for dependent devices if the primary device is in any ping alert (either packet loss or latency) of any level. You can change the conditions that trigger the dependency action by setting the property primaryalert on the primary device.
    This property can be set to any valid filter supported by the LogicMonitor REST API call that returns alerts for a device; the property's value is appended to the API query filter=resourceTemplateName:
    Thus the simple case is to set primaryalert to another datasource's Displayed As field (not its name), to act on alerts about that datasource.

    Setting the property primaryalert to one of these values will suppress dependent devices' alerts when the primary has the corresponding alert:

    • HTTPS- : any alerts about the HTTPS- datasource.

    • HTTPS-,instanceName:HTTPS-443 : alerts on the 443 instance of the HTTPS- datasource.

    • HTTPS-,instanceName:HTTPS-443,dataPointName:CantConnect : alerts on the datapoint CantConnect, on the 443 instance of the HTTPS- datasource.

    • HTTPS-,instanceName:HTTPS-443,dataPointName:CantConnect,severity:4|3 : as above, but also requiring that the alerts are of level Error (3) or Critical (4).

     

    For details of alert fields that can be used in filtering, see https://www.logicmonitor.com/support/rest-api-developers-guide/alerts/about-the-alerts-resource/
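    (To make the composition concrete, this is roughly the query the dependency check ends up issuing - illustrative values, not the datasource's exact code:)

    def primaryalert = "HTTPS-,instanceName:HTTPS-443,severity:4|3"   // the property value
    def deviceId = 123                                                // hypothetical device id
    def query = "/device/devices/${deviceId}/alerts?filter=resourceTemplateName:" + primaryalert
    // i.e. GET https://yourportal.logicmonitor.com/santaba/rest/device/devices/123/alerts?filter=resourceTemplateName:HTTPS-,instanceName:HTTPS-443,severity:4|3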

    Removing Dependencies

    The dependency configuration will be automatically removed once there are no devices that have the depends_on property pointing at a primary device - but not until the primary device alerts next. (You can manually remove the properties is_primary_device, dependents_sdt_until and dependents_sdtd to immediately remove the dependency datasource).

    Feedback appreciated.


  7. I remember the days when 100 broadcast packets per second could hang a 486...

    Nowadays my inclination would be to just set a warning on "> 1000". Excess non-unicast is still bad, but unlikely to be bad enough to impact traffic - and if it is, discards and other thresholds should trigger.

    So that would allow investigation in the case of a legitimately problematic non-unicast level, but not generate alerts for situations that are not impacting things and would otherwise just be alert noise. And we should add a "top 25" graph for inbound non-unicast traffic on all interfaces to our default network dashboard, for people who are inclined to investigate this more closely....

    (On our infrastructure, we have 200 non-unicast pps on our busiest 10 G Ethernet trunks....)

    Seem reasonable?

  8. I opened a  ticket to allow ILPs to be used as tokens in alert messages - thanks for that.

    What was wrong with removing the operStatus filter? We're actually thinking of removing that in another update, once we can group interfaces automatically based on status (so you wouldn't have to look in the down ones...)

    What non-unicast threshold would you want? As a percent of unicast, or ...?

  9. There is no way to apply regex expressions to instances that don't exist, as you note.

    I ran into a similar problem recently, which I solved by adding a Groovy complex datapoint that takes the instance name or description or what-have-you and tests it against a regular expression that is set as a property (so you can set it at group levels and have it be inherited). If it matches, the alert is evaluated as normal; if it doesn't, a value is returned that makes the alert not trigger.

    Like in this case, testing against an interface description, and using a property interface_description_alert_enable to contain the regex:

    // instance description, and the regex filter set as a (group-inheritable) property
    def instance = instanceProps.get("auto.intdescription")
    def filterText = taskProps.get("interface_description_alert_enable")

    if (!filterText || (instance && instance.find(filterText))) {
        return 1   // substitute the value you actually want to alert on
    } else {
        return 0   // a value that won't trigger an alert
    }

    LMK if you have questions.

  10. With the last release, we finally added the ability to manually set instance level properties through the UI, which lets us solve this issue. 

    There is a new version of the snmp64_If- datasource - this is available in the registry, but not yet in core.

    Improvements in this datasource:

    • Now that we support setting instance level properties through the UI (from the Info tab for any instance, via the Manage icon on the custom properties table), we can support setting custom speeds for interfaces.

    Setting an instance level property ActualSpeed and/or ActualSpeedUpstream will override the Speed and BasicSpeed values used for the interface utilization calculation. (If ActualSpeedUpstream is not set but ActualSpeed is, ActualSpeed will be used for both upstream and downstream.)
    Another change: Speed and BasicSpeed are now set as discovered ILPs, rather than unchanging datapoints collected every cycle (a minor efficiency gain).

    • Backward compatible interface status filtering. 

    LogicMonitor will by default alert when interfaces change status from up to down. This is helpful for alerts about switch ports that connect to servers or routers, or inter-switch links - but less useful if the ports connect to workstations that you expect to be shut off every day.
    In order to limit this behavior to a certain set of ports, you can now set the property interface_description_alert_enable. If a device has this property set, or inherits it, it will only trigger status alerts for interfaces whose description matches the regular expression contained in that property. All other active ports will be discovered and monitored, but will not have their status changes (or flapping) alerted on. (If the property is not set, the current behavior of alerting on status changes for all interfaces is maintained.)

    For example, setting the property interface_description_alert_enable to the value “core|uplink” on a group will cause all network devices in that group to alert for status changes only on interfaces with the words “core” or “uplink” in the interface descriptions.
    All other interfaces will be monitored, but will not have status alerting enabled. (Other alerts, such as for excessive discards, will still be in effect.)

    To exclude all interfaces with the word bridge in the interface description, set the interface_description_alert_enable property to ^((?!bridge).)*$
    (That's a regular expression negative lookahead...) All interfaces, except those with bridge in the description, will have status monitored as normal. (There's a quick way to sanity-check these filters after this list.)

    • A change in the way the discard percentage is calculated, so it does not trigger until there are at least 50 discards per second, as well as the relevant percentage of drops. (This used to be 50 unicast packets per second, but that would still cause alerts on the standby interface of bonded interfaces.)

    These changes are backward compatible, and do not lose any datapoint history.
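    As mentioned above, here's a quick standalone Groovy sketch for sanity-checking a filter regex before you set the property (the interface descriptions are made up):

    def filter = '^((?!bridge).)*$'   // the negative-lookahead example above
    ["uplink to core-sw1", "bridge to lab", "server port 12"].each { desc ->
        println "${desc}: ${desc.find(filter) ? 'status alerting enabled' : 'status alerting filtered out'}"
    }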

    This new datasource is accessible from the registry using the locator code: 

    KYE6HN

    Note: This datasource has not been through our final internal QA, but is believed reliable (we're running it internally!). It will be improved in a minor way shortly (a future server release will negate the need to collect interface descriptions as an instance level property), and released to core after that - but that change will be backward compatible, for those of you wishing to adopt this early.