mnagel

Members
  • Content Count

    494
  • Days Won

    87

Posts posted by mnagel

  1. I found yesterday that LM explicitly does not support defining a threshold on an instance group so that all instances within it automatically inherit that threshold unless overridden at the instance level.  Inheritance would be far preferable to the way it actually works.  As it stands, you must remember to update the thresholds every time you add an instance, and you must remember which threshold you used for the group.  I assume this is simply because there is no storage associated with that setting at the group level.  I have seen other topics about using instance groups for alert routing, but nothing about this.  Please make this possible so we can manage thresholds for different instances in a smooth and predictable fashion.
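To make the requested behavior concrete, here is a minimal sketch of the resolution order being asked for: an instance-level override wins, otherwise the instance group's threshold applies, otherwise the datasource default. This is not how LM works today, and the names (`effective_threshold`, the `threshold` and `group` keys) are hypothetical, chosen only for illustration.

```python
def effective_threshold(instance, group_thresholds, datasource_default):
    """Resolve a threshold: instance override, then group, then default.

    `instance` is a dict with optional 'threshold' and 'group' keys
    (hypothetical structure, not an LM data model).
    """
    if instance.get("threshold") is not None:
        return instance["threshold"]           # explicit instance override
    group = instance.get("group")
    if group in group_thresholds:
        return group_thresholds[group]         # inherited from the group
    return datasource_default                  # nothing set anywhere else


groups = {"uplinks": 80}
print(effective_threshold({"group": "uplinks"}, groups, 90))                   # 80
print(effective_threshold({"group": "uplinks", "threshold": 95}, groups, 90))  # 95
print(effective_threshold({"group": "access"}, groups, 90))                    # 90
```

With this order, a newly added instance picks up the group threshold without anyone having to remember what it was.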

     

    Thanks,

    Mark

  2. There are quite a few datasources that require direct access to ESXi hosts even though the data is readily available in vCenter.  This is painful, since establishing permissions to hosts is tricky at best in most environments (confirmed after extended testing done yesterday), and the documentation provided is incomplete.  The problem is that if you just point the current DS definitions at vCenter (e.g., hardware), data is acquired, but the DS has no per-host data; it just grabs the data for one host and displays it in that subtree.

     

  3. Yes, please allow filtering of report elements as described above (perhaps using the same sort of filter as in Active Discovery?).  Another related idea is to be able to take a dashboard and send it as a report -- if you do all the hard layout work there, it sure is a shame not to be able to leverage that in the email rollup reports clients request!

    Thanks,

    Mark

  4. Please enhance reports that can include details on datapoints to permit filtering, for example based on expressions (e.g., WinVolumeUsage- > 70%) or any other method that might make sense there (something akin to the way filtering is done in Active Discovery).  Top 10 is a decent placeholder for now, but general filtering would be very helpful.

     

  5. Please provide a method to browse what alerts have been sent and by which alert rule if possible.  When a client asks why they received an alert (or if one was sent) it is critical to be able to look up this information.

  6. I was a bit surprised yesterday to find that there is no support (other than basic SNMP interfaces, etc.) for HPE Comware switches, like the HP 5900AF datacenter switch.  This seems like a big gap in coverage for one of the more common switch vendors on the platform that is by all accounts the future (Procurve is still a thing, but HPE is more and more moving toward Comware).  Please add support for HPE Comware switches similar to what we get on the Procurve side!  This would include FRU, sensors, clustering (IRF in the 5900AF, for example) and so on.

     

  7. Several related ideas:

    * please make it so Web Service checks allow detection of expiring SSL certificates, preferably via a parameter in the alert tuning, at least 30 days by default.

    * please adjust the SSL_Certs datasource to also check validity.  The current script is in a JAR file, so it is hard to see how to adjust it on my end -- it may already have what is needed.  We had a cert loaded yesterday with a broken chain, which made it invalid, but the DS happily reported it was expiring in 1110 days during that period.

    * same as above, for F5 VIP certificates, if possible (not clear that is exposed via F5 SNMP results)

    * the web service check should also report validity, but I think it may already (if SSL errors are detected)

    * allow SSL checks against virtual hosts via CN matching and SNI; this would, I assume, require a special manually configured multi-instance DS like PingMulti?
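To illustrate what the combined check in the list above might look like, here is a rough sketch using only the Python standard library. It connects with SNI, lets the default context verify the chain and hostname, and then computes days to expiry. It is not the SSL_Certs datasource logic (which I cannot see), and the function names, the `warn_days` default of 30, and the status strings are my own assumptions.

```python
import socket
import ssl
import time


def days_remaining(not_after, now=None):
    """Whole days until a cert's notAfter time, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    now = time.time() if now is None else now
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)


def check_cert(host, port=443, server_name=None, warn_days=30, timeout=10):
    """Return (status, detail); status is 'ok', 'expiring', or 'invalid'.

    `server_name` is sent via SNI, so one address can be checked for
    several virtual hosts. Network errors are left to propagate.
    """
    name = server_name or host
    context = ssl.create_default_context()  # verifies chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=name) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        # Broken chain, hostname mismatch, or already expired.
        return "invalid", exc.verify_message
    days = days_remaining(cert["notAfter"])
    return ("expiring" if days < warn_days else "ok"), f"{days} days left"
```

A call such as `check_cert("192.0.2.10", server_name="www.example.com")` (placeholder address and name) would check one virtual host on a shared address. A cert with a broken chain would come back as invalid instead of reporting a comfortable number of days remaining.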

    Thanks!

    Mark

  8. I agree we need a non-SNMP method to monitor Linux, even if that entails deploying a collector on each server.  Leveraging other tools like collectd would also be a nice option.  We have run into various environments where SNMP (even SNMPv3) is verboten due to perceived security concerns.  Even if allowed, manually configuring SNMP on many servers to enable basic data collection seems like a bad plan.

  9. In some cases, it is important to be able to alter the speed of an interface (speed up or down or both) to accurately reflect the connected device speed (which itself may be opaque, like a cable modem that runs at 20/1 on a gigabit port).  I asked TS about this a while back and was told "just change the interface", but when I explained that this is not always feasible (e.g., Cisco ASA), they recommended I open a feature request.  So, here it is -- I would like to be able to set speed up and speed down on an interface, overriding what comes back from ifSpeed.  If I can do this another way (via a cloned DS), that could work, but I would like to make this as simple as possible since it comes up quite a lot for Internet-edge equipment.
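A small worked example of why the override matters, as a sketch only. The function name and the `override_bps` parameter are hypothetical, and the 18 Mbps traffic figure is invented for illustration. It would be called once per direction with that direction's override.

```python
def utilization_percent(bits_per_second, if_speed_bps, override_bps=None):
    """Percent utilization against the override speed if set, else ifSpeed."""
    effective = override_bps if override_bps else if_speed_bps
    if not effective:
        raise ValueError("no usable interface speed")
    return 100.0 * bits_per_second / effective


GIGABIT = 1_000_000_000   # what ifSpeed reports for the port
CABLE_DOWN = 20_000_000   # real downstream capacity of the 20/1 cable modem

traffic = 18_000_000      # 18 Mbps inbound
print(utilization_percent(traffic, GIGABIT))              # 1.8
print(utilization_percent(traffic, GIGABIT, CABLE_DOWN))  # 90.0
```

Against ifSpeed the link looks idle at 1.8%, while against the real downstream speed it is at 90% and close to saturated.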

    Thanks,

    Mark

     

  10. For a datasource, we would like to be able to set the alert threshold over more than a single sample.  You can set the number of threshold violations needed for an alert, but this is far different in nature than setting a threshold over a time range.  For example, 60% CPU over 2 hours versus 60% CPU over 10 samples.  You might see CPU fluctuate within that period, preventing an alert, but the average over a longer period is valuable.  Similarly, we would like to get alerts not just on average over a time period, but also on slope over a time period, though perhaps the latter should be a separate request.
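To show the difference between the two approaches, here is a minimal sketch comparing a consecutive-violation count with an average and a least-squares slope over a trailing time window. Samples are assumed to be `(timestamp_seconds, value)` pairs, and all function names are hypothetical. This is not how LM evaluates thresholds.

```python
from statistics import mean


def consecutive_violations(samples, threshold, count):
    """True if the most recent `count` samples all exceed the threshold."""
    recent = [v for _, v in samples[-count:]]
    return len(recent) == count and all(v > threshold for v in recent)


def _in_window(samples, window_seconds, now):
    """Samples whose timestamp falls within the trailing window."""
    return [(t, v) for t, v in samples if 0 <= now - t <= window_seconds]


def window_average(samples, window_seconds, now):
    """Mean value over the trailing window, or None if it is empty."""
    values = [v for _, v in _in_window(samples, window_seconds, now)]
    return mean(values) if values else None


def window_slope(samples, window_seconds, now):
    """Least-squares slope (value units per second) over the window."""
    pts = _in_window(samples, window_seconds, now)
    if len(pts) < 2:
        return None
    t_bar = mean(t for t, _ in pts)
    v_bar = mean(v for _, v in pts)
    denom = sum((t - t_bar) ** 2 for t, _ in pts)
    if denom == 0:
        return None
    return sum((t - t_bar) * (v - v_bar) for t, v in pts) / denom


# CPU alternating between 90% and 40% every 5 minutes for 2 hours.
cpu = [(i * 300, 90 if i % 2 == 0 else 40) for i in range(24)]
now = cpu[-1][0]
print(consecutive_violations(cpu, 60, 10))  # False: never 10 in a row
print(window_average(cpu, 7200, now))       # 65: above 60 over the window
```

The fluctuating series never trips the 10-sample rule, yet its 2-hour average sits above 60%, which is exactly the case a time-range threshold would catch.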

    Thanks,

    Mark