Popular Content

Showing content with the highest reputation since 07/24/2019 in all areas

  1. 2 points
    I'm currently building out common dashboards for a large number of subgroups of server resources in our environment (we're a multi-tenant managed hosting company, so there is one for each of our customers' environments). Each is a simple dashboard with five widgets, and each widget drives device selection through a token holding our internal unique customer ID number. The problem will come if I ever need to change anything in the dashboard: I built a single dashboard initially and then cloned it 50 times, so making any incremental change 50 times sounds like a guaranteed day of annoyance in the future. I'd like a way to make a template dashboard that other dashboards can link to, so that changing a single master dashboard changes an entire set of child dashboards. In our case, that would be the server status dashboards for each of our clients.
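A minimal Python sketch of the requested master/child relationship, assuming a dashboard can be modelled as a name plus a list of widget settings and that the customer ID is the only per-child value. `MasterDashboard`, `render`, and the `$customer_id` placeholder are hypothetical illustrations, not part of any LogicMonitor API. Each child is rendered from the master on demand rather than cloned, so a single edit to the master reaches every child.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from string import Template


@dataclass
class MasterDashboard:
    """Hypothetical master dashboard: widget settings containing a $customer_id placeholder."""
    name_pattern: str
    widgets: list[dict] = field(default_factory=list)

    def render(self, customer_id: str) -> dict:
        """Produce one child dashboard by substituting the customer ID everywhere."""
        def sub(value):
            # Only string settings can carry the placeholder; leave others untouched.
            if isinstance(value, str):
                return Template(value).safe_substitute(customer_id=customer_id)
            return value

        return {
            "name": sub(self.name_pattern),
            "widgets": [{key: sub(val) for key, val in w.items()} for w in self.widgets],
        }


if __name__ == "__main__":
    master = MasterDashboard(
        "Server Status - $customer_id",
        [{"type": "graph", "devices": "customer_id=$customer_id"}],
    )
    customers = ["1001", "1002"]  # hypothetical customer IDs
    # One edit to the master...
    master.widgets.append({"type": "table", "devices": "customer_id=$customer_id"})
    # ...is picked up by every child the next time it is rendered.
    for cid in customers:
        child = master.render(cid)
        print(child["name"], len(child["widgets"]))
```

The design choice is inheritance by rendering: children store nothing but their token value, so there is no copy to drift out of sync.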
  2. 1 point
    Oh, it gets better :). We had an issue a while back (and still do) that could only be resolved via an internal debug command (update system.ips property) normally run in the collector debug context. This is entirely doable via the API. No MFA is required, and no IP restriction is possible. Chew on that one for a bit...
  3. 1 point
    I agree and raise you: there should be a general correlation facility. I would be excessively happy right now just to be able to reference the value of a different datapoint in the same datasource in an alert string. The right solution would be to define correlation rules similar to Zabbix (https://www.zabbix.com/documentation/4.2/manual/config/event_correlation), where you would suppress alerts depending on a complex evaluation of any LogicModule result. For events specifically, the events themselves need to be bucketed with a "correlation key", with counters and alerts tied to more than just an ephemeral point in time (see SEC, https://simple-evcorr.github.io/, for a great, simple-ish tool that does this for event streams).
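The bucketing idea can be sketched in Python as a sliding-window counter keyed by a correlation key. This is similar in spirit to SEC's threshold-style rules but is not a reimplementation of SEC. How the key is built (here a hypothetical `host:event` string), the threshold, and the window length are all assumptions.

```python
from collections import defaultdict, deque


class Correlator:
    """Bucket events by correlation key and signal an alert when one key
    sees `threshold` events within a sliding window of `window` seconds."""

    def __init__(self, threshold: int, window: float):
        self.threshold = threshold
        self.window = window
        self.buckets = defaultdict(deque)  # key -> timestamps of recent events

    def observe(self, key: str, ts: float) -> bool:
        q = self.buckets[key]
        q.append(ts)
        # Drop events that have fallen out of the window for this key.
        while q and ts - q[0] > self.window:
            q.popleft()
        if len(q) >= self.threshold:
            q.clear()  # reset so one burst yields one alert, not one per event
            return True
        return False


if __name__ == "__main__":
    c = Correlator(threshold=3, window=60)
    events = [("host1:linkdown", 0), ("host1:linkdown", 10),
              ("host2:linkdown", 15), ("host1:linkdown", 20)]
    for key, ts in events:
        print(key, ts, c.observe(key, ts))
```

Because state lives per key rather than per poll, an alert reflects a pattern over time instead of a single ephemeral sample.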
  4. 1 point
    I have a case (certificates) that could use a pair of <, > conditions to handle alerting. Some certs need to be in place and expired for the system to work properly and to fail certs made against those CAs; most of them have really long expirations. I'd like to raise an alert if daystoexpire is < 30 but > -100. Right now I'm having to disable alerting manually on thousands of certificates in our environment to get useful alerting on them. I'll also accept a good hacky workaround from anyone... I hate clicking.
  5. 1 point
    Sure. In that case I would remove the threshold on daystoexpire and keep it for information/graphing use. Then create a complex datapoint with the expression "daystoexpire" that has the valid range and thresholds for alerting.
  6. 1 point
    My previous answer assumed OR conditions (alert if threshold A OR threshold B is hit). If you want an AND-like condition, which on re-reading the question might be what you are asking for, that may depend on the situation. Here I'm assuming you want any cert with daystoexpire < -100 to be ignored in all cases; in other words, anything under -100 is invalid. So you can edit the "Valid value range" for daystoexpire to be "-100 to blank". That way you will just get a NaN if the value is less than -100, and still have an easily changed main threshold of < 30.
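A short Python check of the two approaches discussed above, comparing the direct AND condition (-100 < daystoexpire < 30) with the valid-value-range workaround followed by a single `< 30` threshold. Whether the "-100" bound of the valid range is inclusive is an assumption here, and the function names are illustrative only.

```python
import math

VALID_MIN = -100.0  # "Valid value range" lower bound; assumed inclusive
ALERT_BELOW = 30.0  # main threshold: alert when daystoexpire < 30


def band_alert(days: float) -> bool:
    """The condition asked for: alert when -100 < daystoexpire < 30."""
    return VALID_MIN < days < ALERT_BELOW


def apply_valid_range(days: float) -> float:
    """Values below the valid range are stored as NaN instead of the raw number."""
    return days if days >= VALID_MIN else math.nan


def threshold_alert(value: float) -> bool:
    """A single '< 30' threshold. Any comparison with NaN is False, so NaN never alerts."""
    return value < ALERT_BELOW


if __name__ == "__main__":
    for days in (400, 29, 0, -99, -100, -101, -5000):
        print(days, band_alert(days), threshold_alert(apply_valid_range(days)))
```

Under the inclusive-bound assumption, the two approaches agree everywhere except at exactly -100, which the valid-range approach still alerts on.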
  7. 1 point
    This same problem exists in much of LM: encouraging cloning while lacking an inheritance feature is the root cause. I agree this is needed, just as it is needed for LogicModules and pretty much everything else in the system.
  8. 1 point
    If a step fails in a website check, the step description should be included in the alert. I am very tired of fighting with the system to get it to do the correct, obvious thing, and my clients find it ridiculous to have to dig around to learn what is actually happening. Please make the computer do the work so we don't have to.
  9. 1 point
    I am not sure exactly how to describe this other than by example. A while back we created an API-based method to control alerting on interfaces based on the interface description. This arose because LM discovered interfaces that would come and go (e.g., laptop ports) and would then alarm about the port being down. With our change, those ports are labeled with a string that we examine to enable or disable alerting. The fly in the ointment is that if a port that was up and monitored goes down due to some change, our clients think they should be able to change the description to influence the behavior. Which they should. Unfortunately, because LM will not update the instance description due to the AD filter, the down condition is stuck until either the description is manually changed in LM or the instance is manually removed in LM. Manual either way, and very annoying. My proposal is that there should be a way to update the instance description even if the AD filter triggers, or a second AD filter for updates to existing instances. I am sure there are gotchas here, and perhaps a better way exists. I considered using a PropertySource, but I don't think that applies here. The only other option is a fake DS that uses the API to refresh the descriptions, but then you have to replicate the behavior of many different datasources for interfaces.
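A rough Python sketch of the behavior being proposed, assuming the stored description is refreshed from the device on every poll rather than only when Active Discovery adds the instance. The `[noalert]` marker, `InterfaceInstance`, and `refresh` are hypothetical stand-ins for the poster's own labeling scheme, not LogicMonitor features.

```python
from dataclasses import dataclass

NO_ALERT_MARKER = "[noalert]"  # hypothetical label clients put in the description


@dataclass
class InterfaceInstance:
    name: str
    description: str  # description as last stored by the monitoring system
    alerting: bool = True


def refresh(instance: InterfaceInstance, live_description: str) -> InterfaceInstance:
    """Copy the device's current description onto the existing instance, even
    if a discovery filter would otherwise skip it, and recompute alerting."""
    instance.description = live_description
    instance.alerting = NO_ALERT_MARKER not in live_description.lower()
    return instance


if __name__ == "__main__":
    port = InterfaceInstance("Gi1/0/7", "Uplink to core")
    refresh(port, "Laptop port [NOALERT]")  # client relabels the port on the switch
    print(port.description, port.alerting)
```

The point of the sketch is that the client's edit on the device is the only manual step; nothing has to be changed or deleted in the monitoring system itself.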
  10. 1 point
    As I sit here trying to explain to one of our internal partners, for what seems like the umpteenth time, how to read an alert threshold expression from a ##THRESHOLD## token, it would be great if there were individual message tokens for each of the thresholds: something like ##WARNINGTHRESHOLD##, ##ERRORTHRESHOLD##, and ##CRITICALTHRESHOLD##, each rendering the comparison operator and the respective threshold value. That way I could make it much clearer what this string of numbers actually means.
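A small Python sketch of how the proposed tokens could be derived, assuming ##THRESHOLD## renders as a comparison operator followed by the warning, error, and critical values in that order (for example `> 60 80 90`). The three per-level token names come from the post; `split_threshold` and `render` are hypothetical helpers, not an existing LogicMonitor feature.

```python
from __future__ import annotations

LEVELS = ("WARNINGTHRESHOLD", "ERRORTHRESHOLD", "CRITICALTHRESHOLD")


def split_threshold(expr: str) -> dict[str, str]:
    """Split e.g. '> 60 80 90' into one 'operator value' string per level.

    Assumes the expression is an operator followed by the warning, error and
    critical values in that order; missing trailing levels are omitted.
    """
    parts = expr.split()
    if not parts:
        return {}
    op, values = parts[0], parts[1:]
    return {level: f"{op} {value}" for level, value in zip(LEVELS, values)}


def render(message: str, expr: str) -> str:
    """Replace the hypothetical per-level tokens in an alert message."""
    for level, text in split_threshold(expr).items():
        message = message.replace(f"##{level}##", text)
    return message


if __name__ == "__main__":
    print(render("Warning at ##WARNINGTHRESHOLD##, critical at ##CRITICALTHRESHOLD##", "> 60 80 90"))
```

A level with no value leaves its token unreplaced, and an expression that skips a middle level cannot be represented by this simple whitespace split.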
  11. 1 point
    Please make it so that, when configuring a website monitor that has multiple steps, we can set it to alert only if all steps fail. In other words, if any one of the steps passes, then everything is still okay. For example, I have a primary URL for an API endpoint and a secondary URL. As long as either is available, I don't need an alert. Only if both steps fail do I want an alert.
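The requested option amounts to switching the aggregation over step results from "any step failed" to "every step failed". A minimal Python sketch, where `StepResult`, `should_alert`, and the mode names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    description: str
    ok: bool


def should_alert(steps: list, mode: str = "any_fail") -> bool:
    """'any_fail': alert if any step fails (the default behavior the post implies).
    'all_fail': alert only if every step fails, so steps act as fallbacks."""
    if not steps:
        return False
    failed = [not s.ok for s in steps]
    if mode == "any_fail":
        return any(failed)
    if mode == "all_fail":
        return all(failed)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    steps = [StepResult("primary API URL", False), StepResult("secondary API URL", True)]
    print(should_alert(steps), should_alert(steps, "all_fail"))
```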
  12. 1 point
    When there is a legitimate reason for disabling alerts for a device, it would be very useful to be able to leave a note saying why (and by whom). This would prevent confusion within teams, where the question "why would this be disabled?" comes up frequently. For example, a known bug with a certain version combination of ESXi and HPE servers triggers a false-positive hardware alert internally, so we disable alerts for that instance on servers that meet the criteria as we encounter them. Similarly, some QNAPs give false-positive alerts that their disk is full when it is only "full" because of a RAID configured as a LUN (we instead rely on the server alerting when the iSCSI volume is actually full). However, another technician may log in and switch alerting for these instances back on, assuming it was a mistake, and then we get flooded with these false-positive alerts, prompting technicians to look into them; this causes a loop of wasted time. Simply attaching a note to the "Alerting Off / On" switch and tagging it with the user who flipped it would easily solve issues like this. Something like what is shown for Acknowledgements would be adequate. Perhaps there could even be an admin option to require a note or not?
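A minimal Python sketch of the proposed audit trail, assuming each change to an instance's alerting switch is stored with the user and a note, and that a hypothetical admin option `require_note` controls whether the note is mandatory. `AlertToggle` and `InstanceAlerting` are illustrative names and do not correspond to an existing LogicMonitor API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AlertToggle:
    enabled: bool
    user: str
    note: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class InstanceAlerting:
    """Alerting switch for one instance, with a history of who changed it and why."""

    def __init__(self, require_note: bool = True):
        self.require_note = require_note  # hypothetical admin option
        self.history: list[AlertToggle] = []

    @property
    def enabled(self) -> bool:
        # Alerting is on until someone records a change.
        return self.history[-1].enabled if self.history else True

    def set_alerting(self, enabled: bool, user: str, note: str = "") -> AlertToggle:
        if self.require_note and not note.strip():
            raise ValueError("a note is required when changing alerting")
        entry = AlertToggle(enabled, user, note.strip())
        self.history.append(entry)
        return entry


if __name__ == "__main__":
    inst = InstanceAlerting()
    inst.set_alerting(False, "tech1", "Known ESXi/HPE false-positive hardware alert")
    last = inst.history[-1]
    print(inst.enabled, last.user, last.note)
```

Keeping the full history, rather than only the latest note, lets the next technician see why the switch was flipped before flipping it back.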
  13. 1 point
    Matthew, please let me know when this is out. None of our Cisco equipment works. In the meantime, we have started using a different syslog system that works fine.