Richard Ortiz

Dependencies or Parent/Child Relationships


Thanks for your feedback and use cases - keep 'em coming. Dependencies is currently under active research, with a release planned in Q3 of 2018.

Edited by Mike Suding


I have customers who really need this feature, and they are quite upset to learn the throttling stand-in could cause loss of knowledge about the actual root cause.  This thread has been open since 2013.  Exactly where on the roadmap is this?

Mark



What features are being targeted for Q2 2018? It would be good to have some idea, so we know whether it's worth waiting or whether we should build something ourselves now.


@Mosh I had not considered building it myself, but now that you say that, I think I could put together at least a rudimentary solution using the API. To be effective, however, it would need to either poll very frequently or be triggered by alerts (and even then it would still be leaky). I'll have to mock something up, as the current situation is unbearable for us, let alone for our clients who receive alerts from the system...


Agree - this has taken way too long to get into the product officially. (It is in the works, but as Mike said, is at least 6 months away. We're working on improving our processes and efficiencies, too...)

In the interim, these two datasources, available from the registry with the locators below, can implement dependencies at the device level. Feedback appreciated!

SDT_Dependent_Devices: locator 24KKNG

SDT_Assign_Primary_For_Dependencies: locator NFTHXG

Creating Device Dependencies 

With these two datasources, LogicMonitor supports device dependencies in order to help reduce alert noise.

Dependent devices have a primary device. When the primary device reports a specific kind of alert (by default, a ping alert, but this is configurable), then the dependent devices are placed in scheduled downtime. This means that if the dependent devices report alerts, they will not be escalated.

Dependent devices are placed in Scheduled Downtime for 30 minutes at a time. If the primary device is still in alert, the Scheduled Downtime is refreshed for another 30 minutes before the existing Scheduled Downtime period expires. Note: when the alerts clear on the primary device, the dependent devices remain in Scheduled Downtime for the remainder of the current 30-minute period; this allows circuits to re-establish, alerts to clear, etc.
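The timing rules above (30-minute windows, refreshed while the primary stays in alert, and, as described further down, renewed once 5 minutes or less remain) can be modelled as a small state function. This is a hypothetical Python sketch, not the datasource's actual code: next_sdt_until and the constants are illustrative names, and it works in epoch seconds by assumption, since the post does not state the unit of the real dependents_sdt_until property.

```python
from typing import Optional

# Assumed constants, taken from the description in this post:
# SDT windows last 30 minutes and are renewed when 5 minutes or less remain.
SDT_LENGTH_S = 30 * 60
REFRESH_LEAD_S = 5 * 60


def next_sdt_until(now: int, primary_in_alert: bool,
                   sdt_until: Optional[int]) -> Optional[int]:
    """Return the epoch second at which the dependents' SDT should end.

    Hypothetical helper: returns None when no SDT is active or needed.
    """
    if not primary_in_alert:
        # Alert has cleared: let any existing SDT run out, never cut it short.
        if sdt_until is not None and sdt_until > now:
            return sdt_until
        return None
    if sdt_until is None or sdt_until - now <= REFRESH_LEAD_S:
        # No SDT yet, or it expires within 5 minutes: start a fresh 30-minute window.
        return now + SDT_LENGTH_S
    # An SDT with more than 5 minutes remaining is already in place.
    return sdt_until
```

For example, starting at t0 with the primary in alert gives an SDT ending at t0 + 1800; 26 minutes later, with only 4 minutes left and the alert still active, a new window ending 30 minutes from that moment is created. Once the alert clears, the function simply returns the existing end time until it passes.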

Configuring Device Dependencies

Ensure your account has the SDT_Dependent_Devices and SDT_Assign_Primary_For_Dependencies datasources. Import them from the registry using the above locators if necessary.

You will need a LogicMonitor API token for a user that has rights to manage the primary and dependent devices. Create two properties on the root level of your LogicMonitor account: logicmonitor.access.id and logicmonitor.access.key, and set their values to the API token’s ID and Key, respectively.
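If you want to check that the token works, or to script against the API yourself as discussed earlier in the thread, here is a minimal Python sketch of how an API token ID/Key pair is commonly turned into a request header, assuming LogicMonitor's LMv1 signing scheme (HMAC-SHA256 over verb + epoch in milliseconds + request body + resource path, then base64 of the hex digest). The function name and parameters are illustrative; this is not taken from the datasources' own code.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def lmv1_auth_header(access_id: str, access_key: str, http_verb: str,
                     resource_path: str, data: str = "",
                     epoch_ms: Optional[str] = None) -> str:
    """Build an LMv1 Authorization header value (assumed scheme, see lead-in)."""
    if epoch_ms is None:
        epoch_ms = str(int(time.time() * 1000))  # milliseconds since the epoch
    # String to sign: verb + epoch + request body + resource path (no query string).
    request_vars = http_verb + epoch_ms + data + resource_path
    digest_hex = hmac.new(access_key.encode(), request_vars.encode(),
                          hashlib.sha256).hexdigest()
    # LMv1 base64-encodes the hex digest string, not the raw digest bytes.
    signature = base64.b64encode(digest_hex.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch_ms}"
```

For example, lmv1_auth_header(access_id, access_key, "GET", "/device/devices") would produce a header for a GET against https://<account>.logicmonitor.com/santaba/rest/device/devices, where access_id and access_key hold the values you stored in logicmonitor.access.id and logicmonitor.access.key.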

To create a dependency on device A, so that devices B and C will be automatically placed in scheduled downtime when device A is in alert:

Navigate to device A and determine the device's displayname as entered in LogicMonitor. Note: this is not the IP/DNS name, but the value of the Name field when managing the device.

For example, in the screenshot below, the relevant name is ESXi1 - Dell iDRAC8.

[Screenshot of the device's Name field]

Now simply navigate to devices B and C in LogicMonitor, add the property depends_on to each device, and set it to the displayName of device A.

That’s it.
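To make the depends_on mechanism concrete, here is a simplified Python sketch of how dependents could be grouped by primary from their properties, including the group-level setting mentioned later in the thread (a device-level value overrides a group value). The input shape, the nearest-group-first ordering, and the function names are assumptions for illustration; real LogicMonitor property inheritance across multiple groups may behave differently.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def effective_depends_on(device_props: Dict[str, str],
                         group_props: List[Dict[str, str]]) -> Optional[str]:
    """A device-level depends_on wins; otherwise use the nearest group that sets it."""
    if "depends_on" in device_props:
        return device_props["depends_on"]
    for props in group_props:  # assumed ordered nearest group first
        if "depends_on" in props:
            return props["depends_on"]
    return None


def build_dependency_map(devices: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each primary's displayname to the sorted displaynames of its dependents."""
    primaries: Dict[str, List[str]] = defaultdict(list)
    for name, info in devices.items():
        primary = effective_depends_on(info.get("props", {}), info.get("groups", []))
        if primary:
            primaries[primary].append(name)
    return {p: sorted(deps) for p, deps in primaries.items()}
```

With devices B (depends_on set directly to A), C (inheriting depends_on = A from its group) and D (its own depends_on = X overriding a group value of A), the map comes out as A: [B, C] and X: [D].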

Within 30 minutes of the first device being configured with device A as its primary, LogicMonitor will configure itself so that if device A has an alert on the ping datasource, all dependent devices are placed into scheduled downtime for 30 minutes, as described above. (Note: you can trigger the reconfiguration immediately by running Poll Now for the SDT_Assign_Primary_For_Dependencies datasource on one of the dependent devices.)

Once the primary device is in an alert that matches the alert conditions (any Ping alert, by default), it will SDT the dependent devices. A property, dependents_sdtd, is created on the primary device; it contains a list of the devices most recently placed in SDT by the dependency action. Another property, dependents_sdt_until, contains the epoch time at which the most recently set SDT will expire. If the alert condition still exists 5 minutes before the SDT expires, a new SDT is created.

Note that devices that are primary for one set of devices can themselves be dependent on other devices. (For example, a remote server can be dependent on a branch office router, which may in turn be dependent on a VPN router.)
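When primaries are themselves dependents, it can help to see the whole chain a device sits in. Below is a hypothetical Python sketch you could run against your own depends_on values to list that chain and catch accidental loops. It only walks the configured properties: the post does not say whether SDT cascades down multi-level chains, or whether the datasources detect loops, and the device names are made up.

```python
from typing import Dict, List


def dependency_chain(device: str, depends_on: Dict[str, str]) -> List[str]:
    """Return [device, its primary, that primary's primary, ...].

    Raises ValueError if the depends_on settings loop back on themselves.
    """
    chain = [device]
    seen = {device}
    while chain[-1] in depends_on:
        nxt = depends_on[chain[-1]]
        if nxt in seen:
            raise ValueError("dependency loop: " + " -> ".join(chain + [nxt]))
        chain.append(nxt)
        seen.add(nxt)
    return chain
```

For the example in the post, {"remote-server": "branch-router", "branch-router": "vpn-router"} yields the chain remote-server, branch-router, vpn-router.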

If a dependent device has a depends_on property that is set to a device that does not exist, a warning alert will be raised on that dependent device. (Similarly, a warning will be raised if the authentication credentials are not set correctly.)
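The datasource raises that warning itself; if you want to catch typos before they turn into alerts, a pre-check along these lines is easy to script. This is a hypothetical Python sketch: it assumes you have already collected each device's depends_on value and the set of existing displaynames, and it matches names exactly, since depends_on refers to the displayname.

```python
from typing import Dict, List, Set


def depends_on_warnings(depends_on: Dict[str, str], known_names: Set[str]) -> List[str]:
    """Return one warning per device whose depends_on matches no known displayname."""
    return [
        f"{dev}: depends_on '{primary}' does not match any device displayname"
        for dev, primary in sorted(depends_on.items())
        if primary not in known_names
    ]
```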

Optional - changing the alert conditions for the primary device to trigger dependencies

By default, primary devices will trigger SDT for dependent devices if the primary device is in any ping alert (either packet loss or latency) of any level. You can change the conditions that trigger the dependency action by setting the property primaryalert on the primary device.
This property can be set to any valid filter supported by the LogicMonitor REST API call that returns alerts for a device.
The property's value is appended to the API query filter=resourceTemplateName: (so a value of HTTPS- produces filter=resourceTemplateName:HTTPS-).
In the simplest case, set primaryalert to another datasource's Displayed As field (not its Name) to act on alerts about that datasource.

Example primaryalert values, and the alerts on the primary device that will then suppress the dependent devices' alerts:

HTTPS-
    Any alert about the HTTPS- datasource.

HTTPS-,instanceName:HTTPS-443
    Alerts on the 443 instance of the HTTPS- datasource.

HTTPS-,instanceName:HTTPS-443,dataPointName:CantConnect
    Alerts on the CantConnect datapoint of the 443 instance of the HTTPS- datasource.

HTTPS-,instanceName:HTTPS-443,dataPointName:CantConnect,severity:4|3
    As above, but the alerts must also be of level Error (3) or Critical (4).

For details of alert fields that can be used in filtering, see https://www.logicmonitor.com/support/rest-api-developers-guide/alerts/about-the-alerts-resource/
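Because primaryalert is appended verbatim after resourceTemplateName:, the resulting filter can be previewed before setting the property. A minimal Python sketch, assuming the value is sent as a normal URL-encoded query parameter; the helper names are hypothetical, and the post does not state the value that represents the default ping condition, so none is assumed here.

```python
from urllib.parse import urlencode


def alert_filter(primaryalert: str) -> str:
    """The filter expression produced from a primaryalert value."""
    # The property's value is appended directly after 'resourceTemplateName:'.
    return "resourceTemplateName:" + primaryalert


def alert_filter_query(primaryalert: str) -> str:
    """The same filter as a URL-encoded query string (':', ',' and '|' get escaped)."""
    return urlencode({"filter": alert_filter(primaryalert)})
```

For instance, alert_filter("HTTPS-,instanceName:HTTPS-443") gives resourceTemplateName:HTTPS-,instanceName:HTTPS-443, which is a quick way to confirm that each comma-separated clause reads as intended.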

Removing Dependencies

The dependency configuration will be automatically removed once there are no devices that have the depends_on property pointing at a primary device - but not until the primary device alerts next. (You can manually remove the properties is_primary_device, dependents_sdt_until and dependents_sdtd to immediately remove the dependency datasource).
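If you prefer not to wait for the primary's next alert, the manual clean-up can be decided with a check like the one below. This is a hypothetical Python sketch: the property names come from this post, but the helper and its inputs (a map of device to depends_on value, and the primary's current properties) are illustrative.

```python
from typing import Dict, List

# Properties written on a primary device, as listed in this post.
DEPENDENCY_PROPERTIES = ["is_primary_device", "dependents_sdt_until", "dependents_sdtd"]


def stale_primary_properties(primary: str, depends_on: Dict[str, str],
                             primary_props: Dict[str, str]) -> List[str]:
    """Return the dependency properties to delete if nothing depends on `primary` anymore."""
    if any(target == primary for target in depends_on.values()):
        return []  # still has dependents: keep the configuration
    return [p for p in DEPENDENCY_PROPERTIES if p in primary_props]
```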

Feedback appreciated.


This will surely help a lot of customers, thank you for that. You cannot import these at the moment, though; the import throws an error. Any idea when they will be available (or similar functionality)?

 

503 : This LogicModule is currently undergoing security review. It will be available for import only after our engineers have validated the scripted elements.


 


Fixed - these are now importable.

(Note that the locators changed - I edited the article above to the current ones. There was also a slight improvement: the globally unique displayname is now used as the host reference in the depends_on property, and the article above reflects that too.)


That looks like you are using the old locator code (as the new one is not v1.0.0).

Can you try with locator JZ62NH?

That should be what the article above shows; maybe there was some caching...


Safe to assume this stitched relationship would also trickle into putting a node into SDT and having their uplinks (if switches...) also inherit SDT from parent?

On 2/9/2018 at 6:14 PM, Steve Francis said:

Yes - it puts the whole device into SDT - so all interfaces, etc.

Great. Not sure if this is assumed, but if we put node A into SDT, would the connecting interfaces that are NOT on this device, but are connected to it, also go into SDT?


So this set of datasources doesn't directly know anything about connections.

It requires the user to set the depends_on property.

If A is in Alert, and B depends on A (via the property), then B and all its interfaces will be in SDT.


This is awesome. Do these datasources allow a Service to be a parent for a bunch of dependent devices, and vice versa?

Can we set the depends_on property at the group level?


As it's currently written, it doesn't support Services as either the dependent or the primary. (It could be made to support that with some extra scripting. Let me know if that's important.)

And yes, you can set the depends_on property at the group level (and then override on devices, if needed.)

 


This is great!  I'm testing this now in our environment. 

One thing that would be useful for us is allowing us to suppress alerts when DNS resolution is failing to a particular device.  For example, we have a tunnel between two locations, and several monitors on the other side of the DNS server.  If DNS resolution breaks, all monitors using hostnames are going to break.  It would be nice to only be alerted once for the DNS problem, not the individual devices that are having DNS problems.

Edit: Oh shoot, I just noticed this doesn't currently support services. That is mostly what we'd use this for. Our use case is that we have a large number of URL monitors that monitor from the users' locations to systems running in the cloud. When we have a DNS problem, we get hundreds of alerts all at once for the same issue, which is obviously not ideal. We would definitely need this to support service monitors.

Edited by Bryan Fehl


Hmm... This doesn't seem to be working for me. It appears to be working up until the point an alert fires.

I've set the depends_on property on a group, then placed all relevant devices (which share the same upstream device) into that group. Everything appears to be working as expected: all devices in the group have inherited the depends_on property, and the primary device has the is_primary_device = true property set. However, when I kick off a test alert for PingLossPercent on the primary device, none of the dependent devices are put into SDT.

Upon investigation I found that the is_primary_device property was removed from the primary device. And, with the test alert still active, when I poll now from one of the dependent devices, I see it get recreated briefly but then disappear again. Once the alert clears, I am able to poll now from a dependent device and see the is_primary_device property get recreated.

???

 


Yep - I made a mistake in the device filtering, so it was only finding dependents that had the depends_on set directly on the device, not those that were inheriting it via groups (although I was sure I tested that...)

Anyway, I've found the error, and fixed it, and will publish tomorrow after a bit more testing.

Sorry about that.

