Allen Chan

Members
  • Content Count

    40
  • Days Won

    5

Community Reputation

7 Neutral

About Allen Chan

  • Rank
    Community Whiz Kid


  1. LM team, are there any plans to support polling intervals shorter than 1 minute (e.g., 30 or 15 seconds)? Various teams in my org have requested this functionality.
  2. Our NOC is complaining about a large number of transient no-data alarms that are hard for the Ops teams to troubleshoot. Please allow administrators to set the number of consecutive no-data polls required before alarming, so that we reduce alert noise and engage the Ops team only for sustained issues.
  3. Does anyone know of a good and safe utility to test WMI from a LogicMonitor Windows collector to Windows servers? I know the collector has debug-mode features to test WMI, but I usually like to validate with third-party tools as a second source of information. I currently use the WMI Query tool by Ben Coleman, but it is hosted on semi-shady sites that I worry about visiting. If anyone has had a good experience with another utility, please let me know. Thanks.
  4. LogicMonitor team, any response on this request?
  5. This is a problem for us too. When there are a large number of datapoints, we cannot easily tell which server some of the lines relate to. Please fix ASAP.
  6. Mike, are you saying that if we set the alert trigger interval (consecutive polls), the no-data alert will honor it as well? I.e., if the trigger interval is set to 2, the no-data alarm would only fire if the collector receives no data for three consecutive polls?
  7. Steve, does the fact that you call your solution a hack mean that you guys are working on a better way to do this?
  8. We have network devices that have hundreds, if not thousands, of interfaces. We do not let LM automatically delete instances when they go down, because typically servers do not reboot a lot; it is the kind of event we need to be informed of. Once in a while we need to clean up the instances on these network devices, and scrolling through 26 pages of instances to delete the ones in alarm is a tedious task. Feature request: allow us to add a filter that displays only the instances in alarm. We could then check the select-all box and do a group delete. This would save us a lot of time.
  9. Most monitoring systems have a way to suspend monitoring of a host. Use case: we had an outage and the Ops team blamed monitoring. They asked us to stop polling to prove their theory. Unfortunately, right now the only options are to delete the host from monitoring or to hack around it by changing its IP to a fake one. Neither is a real solution. Please add this basic feature.
  10. Based on the scaling collectors page, it sounds like increasing the thread count is the first step in scaling collector capacity. It would be nice to provide metrics on the number of configured threads and the number of used threads for the popular collection types. With this information, we could see when we are getting close to running out of threads for a collection type and add more.
  11. I have brought this up before and was shot down with "works as designed." We 100% agree with this statement: "Second, when an alert crosses a threshold the second time a week after the original acknowledgement (as we saw in my first post), I think it is safe to assume that should be considered a new 'alert session.'" We have cases with the following sequence:
       1. Alert triggers on the warning threshold.
       2. NOC acks with "monitoring."
       3. Alert crosses the error threshold.
       4. NOC escalates to the SME.
       5. NOC acks with "escalating to SME."
       6. Alert crosses the critical threshold.
       7. NOC acks with "incident created. Management informed."
       8. SME remediates just enough to move the alert down to warning.
       9. SME informs the NOC that the issue is fixed.
       10. NOC closes the incident and resumes watching the alert page.
       11. Alert crosses the error threshold.
       12. No notification.
       13. Alert crosses the critical threshold.
       14. No notification.
       15. Server crashes.
       16. People ask why there was no alert.
      As a monitoring service, over-communication is 100x more acceptable than a server crashing.
  12. We used to have the ability to poll at much longer intervals than are currently offered. Please bring back polling intervals of 6 hours, 12 hours, and 1 day.
  13. Instead of every single host, wouldn't it be more efficient to iterate through the active datasources (those with more than 0 associated hosts) and print out only the details needed? Then we would get a concise list of the monitored items that relate to our infrastructure, and it is up to the administrators to explain the appliesTo. For example:

        datasource1   appliesTo
          datapoint1   description   threshold
          datapoint2   description   threshold
          ...
          datapointN   description   threshold
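The consecutive-poll behavior requested in posts 2 and 6 can be sketched as a simple counter. This is a minimal illustration under assumed semantics, not LogicMonitor's implementation: `NoDataDebouncer` and its `threshold` parameter are hypothetical names, a poll that returns no data is modeled as `None`, and the alert fires once, on the poll that completes the streak.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoDataDebouncer:
    """Hypothetical helper: alert only after `threshold` consecutive no-data polls."""

    threshold: int
    misses: int = 0  # length of the current run of no-data polls

    def observe(self, value: Optional[float]) -> bool:
        """Record one poll result (None = no data); return True if an alert fires now."""
        if value is None:
            self.misses += 1
        else:
            # Any successful poll ends the streak, so transient gaps never alert.
            self.misses = 0
        # Fire exactly once, on the poll that completes the streak.
        return self.misses == self.threshold


if __name__ == "__main__":
    debouncer = NoDataDebouncer(threshold=3)
    polls = [1.0, None, None, 2.0, None, None, None, None]
    print([debouncer.observe(p) for p in polls])
    # -> [False, False, False, False, False, False, True, False]
```

In this sketch the two-poll gap early in the sequence is absorbed, and only the sustained outage raises a single alert.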
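The re-notification policy argued for in post 11 can be reduced to one rule: notify on every upward threshold crossing, even if the alert was acknowledged at a higher severity earlier in the same session. The following is a hypothetical model, not how LogicMonitor evaluates alerts. `Severity` and `notifications` are illustrative names, and each list element stands for the severity observed on one evaluation (acknowledgements are omitted because they do not change severity).

```python
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    """Hypothetical severity scale; higher values are more severe."""
    CLEAR = 0
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


def notifications(levels: List[Severity]) -> List[Severity]:
    """Return the severities that would notify under the proposed policy.

    Hypothetical rule: notify whenever the severity rises above the level seen
    on the previous evaluation, regardless of any earlier acknowledgement.
    """
    sent: List[Severity] = []
    previous = Severity.CLEAR
    for level in levels:
        if level > previous:
            sent.append(level)  # upward crossing: always notify
        previous = level  # a drop (e.g. after remediation) re-arms higher levels
    return sent


if __name__ == "__main__":
    s = Severity
    # Sequence from post 11: escalation, partial remediation, re-escalation.
    history = [s.WARNING, s.ERROR, s.CRITICAL, s.WARNING, s.ERROR, s.CRITICAL]
    print([level.name for level in notifications(history)])
    # -> ['WARNING', 'ERROR', 'CRITICAL', 'ERROR', 'CRITICAL']
```

Replaying the post's sequence under this rule produces notifications for the two late crossings, which are exactly the steps the post reports as "No notification."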
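The report suggested in post 13 amounts to a filter plus a nested loop. A minimal sketch, assuming the datasource definitions have already been exported into plain objects: `Datasource`, `Datapoint`, `active_report`, and the sample values are hypothetical and do not correspond to any LogicMonitor API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Datapoint:
    name: str
    description: str
    threshold: str  # free text, e.g. "> 90"


@dataclass
class Datasource:
    name: str
    applies_to: str
    host_count: int  # number of hosts the datasource currently applies to
    datapoints: List[Datapoint] = field(default_factory=list)


def active_report(datasources: List[Datasource]) -> List[str]:
    """One header line per active datasource, then one line per datapoint."""
    lines: List[str] = []
    for ds in datasources:
        if ds.host_count <= 0:
            continue  # inactive: applies to no hosts, so leave it out
        lines.append(f"{ds.name}  appliesTo: {ds.applies_to}")
        for dp in ds.datapoints:
            lines.append(f"  {dp.name}  {dp.description}  threshold: {dp.threshold}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, not exported from a real portal.
    sample = [
        Datasource("ExampleCPU", "example expression A", 12,
                   [Datapoint("CPUBusyPercent", "Percent of CPU busy", "> 90")]),
        Datasource("UnusedSource", "example expression B", 0,
                   [Datapoint("Ignored", "Never printed", "n/a")]),
    ]
    print("\n".join(active_report(sample)))
```

Iterating over datasources rather than hosts means each datapoint appears once, no matter how many hosts it applies to.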