Mahlon Greene

Members
  • Content Count

    15
  • Days Won

    2

Community Reputation

6 Neutral

About Mahlon Greene

  • Rank
    Community Whiz Kid

  1. I see a need in the design to alert on deviation from a rolling average. Example 1: Temperature alerting in hardware is based on a fixed baseline (default or manually adjusted) or on a fixed delta. In real-world use it would make a LOT more sense to alert on deviation from a 5-day or 30-day rolling average temp of the box. The reason: units alarm on the weekends because the office shuts off the AC during the summer, or they alert during the week 9-5 because in the winter the offices crank the heat. Fixed thresholds ignore the nuance of RANGE and average expectation for the location... the alerting should just be on how FAR outside the average range for the site we are. My Nashville facility hovers from 56 to 59 all week. I have it set on 57, so I get alerts at least once a weekend. I could move it to 59... but that's a band-aid. The REAL solution would be to have the software TRACK the last 30 days and alert when we're outside the NORM for that location. Furthermore, with hardware it is not the specific temps that kill the hardware... it's the RATE at which the temp changes. So the alerts SHOULD be based on the average range the system has seen in the last 30 days, and alert ONLY when the rate of change accelerates... but I imagine THAT request would be more challenging to reduce to an algorithm. Example 2: PING times. I have sites where the latency range is EXTREME (Mumbai, Johannesburg, Taipei, etc.). I'd like PING to track the 30-day range and common deviation from the norm, and alert when a site sees latency that is way outside the expected fluctuation range. Say a site is typically at 30 ms 90% of the time, with 200-500 ms spikes 10% of the time: when ping times hit 300 ms for more than 10% of the last hour of sampling, send a warning to inform us of a change in TREND, not a fixed threshold on the immediate sample.
  2. Treating all events equally is a very bad policy. A dropped ping is very different from a syslog message about a parity failure in a RAID or a DBReplication failure alert. Note that in MOST (if not all) situations where we use syslog, the conditions reported would NEVER clear at all without a tech going in and doing work. The software sending a "cleared" email is the opposite of crying wolf... it's crying "all clear" when the wolf is right there chewing on your leg. I'm very frustrated that I have to explain this twice. The customer that suggested it doesn't understand what syslog "is". The engineers that honored the flawed request did that customer, and the rest of us, a grave disservice. You've forced me to train my people to ignore certain messages. The workaround we have now is to filter all your "cleared" messages. Still, I'm not sure why you didn't lead with that advice. You can't convince me to change my opinion on this topic: it was a poor design decision you've made here. In time, I am very certain you will see why I am right.
  3. Syslog issues: 1. Being bound to only the two syslog RFCs is nearsighted: syslog timestamp and formatting handling should be more flexible. 2. The biggest concern I have is that a syslog event should reflect the timestamp of the COLLECTOR'S NIC at the time the syslog packet ARRIVES at the collector, not the timestamp set by the system sending the message. This is especially important with systems where clock settings or NTP are currently failing. Alerting is based on the timestamp: if the timestamp says Jan 1st 2001 12:01am because the CMOS battery on the unit failed, then we NEVER see those syslog messages, because they fall outside the alerting range.
  4. Syslog issues: #1. The person who asked to have SYSLOG present a "cleared" message CLEARLY does not understand that a SYSLOG is NOT a tracked condition like an OID value is. It is a SINGLE SPOT in time, an event that "happened", and it does NOT "clear", as you can't change the past. #2. The programmers HONORING that (deeply flawed) request frustrates me to no end. Team, I get the mantra "the customer is always right"... except when they're wrong. It is in EVERYONE's best interest if you retrain the unskilled users in what a baseline understanding should be. I have no tolerance for bad design making it into development when people should know better. #3. You should have provided those of us who know better a way to OPT OUT of these bad design decisions.
  5. Check to see if the SNMP "Host Agent" was running at all during that time: or if the System was running DRF at the time of the problem. When dealing with CUCM : one constant I see is that DRF stops and starts services at will.....so all alerting should be disabled during DRF scheduling. - M
  6. I have another post elsewhere that also requests this. GLAD to see great minds think alike.
  7. I'd like to see the "Time Range" selection directly affect what ALERTS are shown, what "Raw Data" shows, and what the "Graph" shows, in a unified way. - M
  8. I have a number of events with lost ping, where the data source shows that ping is failing... but it happens WHILE I'm on both the collector and the end-target host, running manual pings between them with no drops at all. The only fix I've found is to reboot the collector that has the issue.
  9. Is there a pre-packaged data source or service that runs a constant TraceRt between collector and host, and graphs the results?
  10. I’d like to re-initiate this bug report: the Uptime counters resetting at 497 days or 469 days (historical). I just had a similar false alarm telling me that my devices rebooted, when they did not. Please have the DEV team review this specific monitor and determine how the system can display 497+ days “uptime”.

      SEP 11, 2015 | 01:56PM CDT
      Original message
      ________________ wrote:

      Support team at LogicMonitor,
      Is it possible to request an adjustment to the "Uptime" data source monitor so that it does not alert when the counter resets from 11111111111111111111111111111111 to 00000000000000000000000000000001? The developer was aware enough of the event cause to code an explanation into the system alert: could the alert be altered to not alert when the counter resets?
      - ________________

      From: ________________
      Sent: Friday, September 11, 2015 2:44 PM
      To:
      Subject: SC# Error: 6348 ________________ is reporting it has only been up for 0.43 minutes

      Hello ________________,
      We have received the following monitoring alert and a ticket #6348 has been created to track your issue. An engineer is assigned and is working to resolve this issue. Thank you.
      We are investigating if the VM really did reboot or if this alert is coming up for a different reason:
      ________________ is reporting it has only been up for 0.43 minutes, as of 2015-09-11 14:28:48 EDT. If this was an unexpected reboot, please investigate the system logs. NOTE: if ________________ has been up for 469 days without a reboot, this alert will trigger due to a counter wrap in the host. In this case, you may disregard this alert. (But the host is probably due for an OS update.)
      For any inquiries please contact our NOC at support@highpoint.com or call 1-855-485-8324 (TECH).
      Regards,
      ________________
      NOC Support Engineer
  11. For openers... the old reporting was way better. The new one forces a save before you run the report: WHY? The new one did NOT incorporate any really meaningful improvements:
      a. It doesn't allow us to customize on the fly... the report is what it is. In 2016 that alone is pretty disappointing.
      b. It doesn't allow drag-and-drop of graphical content. The "no graphs" in reports is, to me, a major hang-up.
      c. Slow... very slow. You select what you want and then watch the bottom half of the screen paint in, because there is no way to DISABLE the real-time view. Please separate the two types of reporting, "real-time viewing" and "batch-run reporting", so we have a lighter interface and get more done at a faster pace.
      d. I (attempt to) use this thing to track SYSLOG noise in a datasource we call "Syslog Archiving & Reporting". In the old version I'd run my report and get back 55,000+ rows, NO PROBLEM. It took 2 minutes to export it to CSV, and then I could run my macro in Excel locally to format, graph and report on trends. Just go ahead and TRY that with the new reports module. Last time I attempted it, it returned 8,000 rows for a time frame of 2 weeks where I KNEW we had more than 75,000 rows... and this is JUST ONE DATASOURCE/DATAPOINT that collects ONE TYPE of syslog from a group of 335 hosts for 4 clients. I shudder to think what the effect will be when I add the next 2 big clients we on-board this fall.
  12. ADDITIONAL note: I'd like a ONE-click button option to export the current view's background data straight to .csv. I'm seriously unhappy with the flow of the new reporting modules: I run through it (saving a copy or overwriting the settings) 20 times before I get EXACTLY the data I want in my CSV. If I could pull up an event source or a graph view, SEE the data that I want there for the time frame I am searching, hit ONE button (one time, one click), and have the system export THAT specific data to .csv formatted output, it would streamline my daily, weekly and monthly reporting efforts immensely.
  13. Is it possible to get EVENT Sources off the left menu bar and onto its OWN tab on the right side pane? I.e., between Graphs and SDT.
  14. Feature request: SDT now allows a date/time recurring SDT schedule. We are starting to use this option now. We would like to have the feature expanded to allow weekday-only or weekend-only granularity.
  15. Team, we are requesting the ability to measure data point changes as a percentage change from the prior poll output. For example, if the number of hosts was 93 on the last poll but is only 32 in the current poll, that would be a shift of 65.6% ((93 - 32) / 93 = 0.655913978, times 100 and rounded). We would like the ability to alert if there is a 20% or 30% change in poll output.
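
A rough sketch of two of the calculations requested above: the poll-to-poll percentage change from post 15 and the "outside the norm" check from post 1. This is only an illustration and not a LogicMonitor feature or API. The function names, the use of a standard deviation band, and the default of 3 standard deviations are all assumptions made for the example. The temperature history is made up to match the 56 to 59 range described for the Nashville site.

```python
from statistics import mean, stdev


def percent_change(previous, current):
    """Absolute percentage change from the previous poll to the current one.

    Hypothetical helper; a zero previous value has no defined percentage.
    """
    if previous == 0:
        raise ValueError("previous poll value is zero; percent change is undefined")
    return abs(current - previous) / abs(previous) * 100


def outside_rolling_norm(history, sample, k=3.0):
    """True if sample is more than k standard deviations from the mean of history.

    history stands in for the last N days of samples (for example 30 days).
    k=3.0 is an assumed default, not a value taken from the posts.
    """
    if len(history) < 2:
        return False  # not enough data to establish a norm
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return sample != mu
    return abs(sample - mu) > k * sigma


# Post 15: 93 hosts on the last poll, 32 on the current poll.
change = percent_change(93, 32)
print(round(change, 1))   # 65.6
print(change >= 20)       # True, so a 20% threshold would alert

# Post 1: a site that hovers between 56 and 59 (made-up samples).
temps = [56, 57, 58, 59, 57, 58, 56, 59]
print(outside_rolling_norm(temps, 59))  # False, inside the usual range
print(outside_rolling_norm(temps, 66))  # True, well outside the usual range
```

With a band like this, the threshold follows whatever the site normally does, so a reading of 59 at a site that routinely reaches 59 stays quiet. The rate-of-change and "more than 10% of the last hour" ideas from post 1 would need additional logic that is not shown here.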