mnagel

Members
  • Content Count

    325
  • Days Won

    62

Everything posted by mnagel

  1. Most of the JMX datasources I've seen use Groovy, which might be a better way to go (or the only way, unless and until this is changed in the JMX data type). I checked the full list of references to jmx: from one of our portals -- list below. Mark
     C3p0-; CassandraColumnFamilyStore -; CassandraCommitLog; CassandraCompactionManager; CassandraConcurrent -; Cassandra JVM Garbage Collection -; Cassandra JVM Heap and Threads and Uptime -; Cassandra JVM Memory Pools -; Cassandra Keyspace Cache -; JVM Garbage Collection-; JVM Memory Pools-; JVM status-;
     LogicMonitor_Collector_ActiveDiscoveryTasks; LogicMonitor_Collector_BufferDataConsumers; LogicMonitor_Collector_BufferDataReporter; LogicMonitor_Collector_ConfigCollectingTask; LogicMonitor_Collector_DataCollectingTasks; LogicMonitor_Collector_DNSResolving; LogicMonitor_Collector_EventSourceCollectionTasks; LogicMonitor_Collector_GetConfPerformance; LogicMonitor_Collector_GlobalStats; LogicMonitor_Collector_Heartbeat; LogicMonitor_Collector_InternalServiceFetcher; LogicMonitor_Collector_JVMGarbageCollection; LogicMonitor_Collector_JVMMemoryPools; LogicMonitor_Collector_JVMStatus; LogicMonitor_Collector_LoggerDaemon; LogicMonitor_Collector_NetflowMetrics; LogicMonitor_Collector_Netscan; LogicMonitor_Collector_NetScanTasks; LogicMonitor_Collector_NIOBufferPool; LogicMonitor_Collector_PollNow; LogicMonitor_Collector_PropertySourceScriptExecution; LogicMonitor_Collector_ReportCacheQueue; LogicMonitor_Collector_ReporterTask; LogicMonitor_Collector_SPSE; LogicMonitor_Collector_SSEJVM; LogicMonitor_Collector_ThreadCPUUsage; LogicMonitor_Collector_ThreadUsage; LogicMonitor_Collector_Throttler;
     Tomcat Cache-; Tomcat Datasources-; Tomcat Executor-; Tomcat JVM Garbage Collection-; Tomcat JVM Memory Pools -; Tomcat JVM status-; Tomcat Requests-; Tomcat Sessions-; Tomcat_Sessions; Tomcat Threads-; Zookeeper-
  2. So how long does beta generally take until released? I still see nothing in the DS repo for Ruckus. I am going to start with the module for LibreNMS and build something on my own.
  3. This is what our tool handles -- it uses the API to download, then checks the result in to git, which generates the reports. I published this on GitHub a while back and just refreshed it with code changes we've made in the interim, plus a change suggested by someone here to abstract the property name we use to split changes per client. https://github.com/willingminds/lmapi-scripts
  4. I am not sure if you are talking about LMConfig or device (resource) settings or something else. For LMConfig, we had to solve this by using the API to download configurations, then committing to a git repo (local gitlab instance) with post-commit hooks for notifications. Sadly, this reveals numerous flaws in the LMConfig process, which we have to work around by detecting and skipping bad updates prior to saving for repo commit. For device settings, we have a similar method where we regularly download devices and other endpoints from the API. If you mean yet another thing, like actual target device configuration details not handled by LMConfig, then LMConfig may be a good option for tracking those (but may require a custom module or two).
  5. I just cannot bring myself to paste the same complex code into multiple LogicModule scripts, leaving little land mines scattered randomly. I was working today on a general template for using the API from within LogicModules using code I found scattered around different modules (we keep backups of everything, making it somewhat easy to search for those). Just a few things I noticed:
     * all the code is different
     * nothing I found so far accounts for API rate limiting
     * various inefficiencies exist in at least some of what I found
     The correct solution to all of this is to make a library feature available so we can maintain Groovy functions and such in one place, calling them from LogicModule scripts. It is very sad to see how little re-use is possible within the framework at all levels, and this one is especially bad in terms of maintenance and things breaking easily when changes are made in the API backend.
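To make the rate-limiting point concrete, here is a minimal sketch of the kind of shared helper I mean. It is written in Python for illustration rather than the Groovy a LogicModule would use. It assumes the API signals a rate limit with HTTP 429 and reports the window length in an X-Rate-Limit-Window header; `send` is a hypothetical callable standing in for whatever performs one request.

```python
import time
from typing import Callable, Mapping, Tuple

# A response is modeled as (status_code, headers, body). `send` is a
# hypothetical callable that performs exactly one HTTP request to the API.
Response = Tuple[int, Mapping[str, str], str]


def call_with_rate_limit(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), waiting and retrying whenever the API answers 429."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts:
            break
        # Assumed header: length of the rate-limit window in seconds.
        wait = float(headers.get("X-Rate-Limit-Window", default_wait))
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

With something like this maintained in one place, each module would wrap its API calls instead of carrying its own copy of the retry logic.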
  6. Oh it is, but it is definitely a non-obvious side-effect of disabling alerts and re-enabling. I frequently get the feeling different aspects of LM were written by summer interns :).
  7. Right now, ACK and SDT work, but miss important functionality. Please consider addressing all of these:
     * ACK should be able to expire (critical issues that should not be lost forever, or to set a maximum expected recovery time period -- not possible with SDT).
     * ACK should be able to clear if a worse condition occurs (in Nagios, this is a non-sticky ACK).
     * ACK and SDT notices should be shipped to custom email integrations (this one is a bug as far as I am concerned).
  8. I have raised this numerous times with my CSM, account manager, etc. but it does not seem to be getting traction. It is a lawsuit waiting to happen, so it really deserves attention. Right now, the trigger for LMConfig is entirely arbitrary depending on who wrote which code. Most often it is based on ssh.user and ssh.pass being defined in a device. The problem with this is there are other reasons to have those properties (e.g., Err-Disabled port detection), so you can enable LMConfig across many devices and incur a large cost (especially with the new contract terms) without intent. It should be required to check off an "Enable LMConfig" option at the device or group level, and similarly for any other premium feature. Minimally, all the configsources should be changed to have an "enable_lmconfig" or similar property required in the Applies To logic.
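The gating I am asking for amounts to the check below. This is a Python model of the logic only, not Applies To syntax, and treating the property value "true" as the opt-in is an assumption for illustration.

```python
def lmconfig_applies(props: dict) -> bool:
    """Return True only when credentials exist AND LMConfig was opted into."""
    # Credentials alone are not enough: they may be set for other reasons,
    # such as Err-Disabled port detection.
    has_credentials = bool(props.get("ssh.user")) and bool(props.get("ssh.pass"))
    # Assumed convention: enable_lmconfig must be explicitly set to "true".
    opted_in = str(props.get("enable_lmconfig", "")).strip().lower() == "true"
    return has_credentials and opted_in
```

A device with only ssh.user and ssh.pass defined would no longer be picked up, which is the whole point.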
  9. There are at least two reasons not to use LMConfig. First is cost -- it is a premium feature and as applicable as it might be here, it is insane to incur an extra charge to get this basic concept implemented. Second (more important) is that LM does not actually tell you what changed. We work around this via the API to download, commit to a git repo and use a hook to get email on changes. That could also work, but again seems like a lot to ask of users. The file storage method could work, but if there is a collector failover or change you lose state. Building redis or similar into the toolset would help with this sort of thing.
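For what "tell you what changed" means in practice, here is a minimal sketch using Python's standard difflib. It is not how our git-based tool does it (git produces the diff there), and the function name and labels are made up for illustration.

```python
import difflib


def config_diff(old: str, new: str, name: str = "config") -> str:
    """Return a unified diff of two config versions, or "" if identical."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{name} (previous)",
        tofile=f"{name} (current)",
        lineterm="",
    )
    return "\n".join(lines)
```

An empty result means nothing changed, so a notification would only go out when the diff is non-empty.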
  10. I am trying to get an eventsource that reports when the firmware version has changed (this is something other tools "just do"). To do this, my "applies to" for auto.firmware_version works great, but then the script needs to use this logic:
        if auto.firmware_version != auto.firmware_version.prev then
            generate event that says "firmware version has changed from old to new"
            set auto.firmware_version.prev to auto.firmware_version
        end
      I imagine I could use the API for the "set" operation, but using the API in logicmodules always makes me cringe due to lack of library support. I detest maintaining the same code across many different modules as it is error-prone. If there could be a hostProps.set method, that would be very helpful. I understand this could be dangerous, so if it must have the same restrictions as propertysources, I can live with that.
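The logic above can be sketched as runnable code as follows. This is Python for illustration only. The `state` mapping is a stand-in for wherever the previous value would be stored (the requested hostProps.set does not exist today), and the function name is hypothetical.

```python
from typing import MutableMapping, Optional


def firmware_change_event(
    device: str,
    current: str,
    state: MutableMapping[str, str],
) -> Optional[str]:
    """Return an event message if the firmware version changed, else None.

    `state` plays the role of auto.firmware_version.prev and is updated
    on every call so the next run compares against the latest value.
    """
    previous = state.get(device)
    state[device] = current
    # First sighting or no change: nothing to report.
    if previous is None or previous == current:
        return None
    return f"firmware version has changed from {previous} to {current}"
```

The first run for a device only records the version, and later runs report a change exactly once.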
  11. I just tried this as well and it is definitely cumbersome. There is no completion when you start with !, and if you use completion, there is no opportunity to prepend the !. I would hope with such a major revamp that a complex expression editor would be part of the upgrade.
  12. Tossed this together today to track throughput license usage on platforms that license maximum levels (e.g., ISR4K) as the impact of exceeding this can be otherwise tricky to identify. Definitely could use more work, but a decent starting point. 7ZYRDH
  13. The key here is "if BGP was supported...". What if it is not? Do you think it would be given this specific case? I think it could be (i.e., peering topology identified), but to the extent it is not (or anything else is not), I think we need a way to reflect the dependency without serious programming effort to avoid alarm storms. I guess we have something to chat about next time we meet
  14. I believe this is now possible with Service Insight. Unfortunately, that is an expensive premium feature targeted at Kubernetes and such. This use case can also be handled, but should be part of the base product. We have other basic use cases, like total PRI channels in use across multiple voice gateways. I have conveyed this concern to anyone who will listen and have had some hopeful feedback, but no change yet.
  15. We continue to do battle with LM when alerts trigger due to dependent resource outages. I know the topology mapping team is working on alert suppression, but I am not convinced that will solve all problems regardless of how well they succeed. We really need a way to set up dependencies within logic modules and it should not need dozens of lines of API code each time (most of which should be made available as a library function IMO). One fresh specific example -- site with multiple firewalls in a VPN mesh running BGP. One firewall goes down, then all other firewalls report BGP is down. We care about BGP down, so we have alerts trigger escalation chains. It should be possible to define a dependency in the datapoint that suppresses the alert if the remote peer IP is in a down state. There is no way to express this in LM right now, and that leads to many alerts in batches, and that leads to numb customers who ignore ALL alerts.
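The dependency rule in the BGP example is small once stated as code. Below is a minimal Python sketch, assuming we already know which IPs belong to devices that are currently down. The BgpAlert type and the function name are hypothetical, not anything LM provides.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class BgpAlert:
    device: str   # device reporting the BGP session as down
    peer_ip: str  # remote peer address of that session


def actionable_alerts(alerts: Iterable[BgpAlert], down_ips: Set[str]) -> List[BgpAlert]:
    """Keep only BGP-down alerts whose remote peer is not itself down."""
    return [alert for alert in alerts if alert.peer_ip not in down_ips]
```

In the mesh case, one firewall failing would leave only the alerts that point at peers still believed to be up, instead of one alert per surviving firewall.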
  16. Just a word of caution -- we found long ago that using groups for taxonomy creates massive security problems if you also want to grant different users access to functional groups (e.g., SQL admins to SQL servers). With RBAC as it is, if a device is in two or more such groups, you cannot give access that way without giving access to all groups the device is in (this is apparently not considered a bug). There really needs to be an option to mark a group as a security group to avoid this. In the meantime, we have moved from static groups almost entirely to dynamic groups. Our biggest problem before was this one -- using location-based groups to organize devices and to avoid setting the location string many times. Now we use custom propertysources to set a location property value to define a dynamic location group, and that group gets the location string. As far as your issue, I assume you could recurse to get the data, but definitely there should be a way to do this in one shot, just like inherited properties.
  17. @Mike Suding Wanted to try this, but I guess it is very complicated -- still pending :(.
  18. We find at times the need to monitor usage on one device interface but show traffic information from another source. For example, we may get a utilization alarm from the physical crossconnect on an external switch to the ISP, but we have no useful traffic data (or no data) on that switch. The next step would be to go to traffic details on downstream devices, like firewalls. It would be helpful to have a "Related To" URL list available to avoid manual navigation each time. Ideally, this would be in the UI and available in alert tokens.
  19. In this case, yes. I never noticed myself, but can see why someone might take the instructions literally. I just hate too-strict systems that error out like this and frustrate users unnecessarily. We also link LM to ticketing in some cases, but found when it is done via email integration (easier with the ticketing system we use), LM made the decision that ACK and SDT notices are not sent via custom email integration, no way to fix short of development changes. Really need at least some folks over there focusing on the basics -- some of the new advanced stuff is nice, but poor alert handling (not this one specifically, which is annoying but at least can be worked around) is a shame.
  20. The current command-by-mail (when allowed, which is ONLY with the builtin mail transport) is a bit misleading especially to those not already familiar with LM.
        You may reply to this alert with these commands:
        - ACK (comment) - acknowledge alert
        - NEXT - escalate to next contact
        - SDT X - schedule downtime for this alert on this host for X hours.
        - SDT datasource X - SDT for all instances of the datasource on this host for X hours
        - SDT host X - SDT for entire host for X hours
      I had a customer literally put in:
        - ACK still working with century link
      because, well, that is what it says to do. Please fix so it is more clear, or fix the response handler to account for this use case. As always, the computer should be doing the work here, not offloading to busy humans.
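A more forgiving response handler does not need much. The following Python sketch is my own illustration and not LM's handler. It accepts a leading list marker, any letter case, and an optional pair of parentheses around the ACK comment.

```python
import re
from typing import Optional, Tuple

# Optional leading bullet, then one of the documented command words,
# then whatever follows as the argument.
_PATTERN = re.compile(r"^\s*[-*•]?\s*(ACK|NEXT|SDT)\b\s*(.*?)\s*$", re.IGNORECASE)


def parse_reply(line: str) -> Optional[Tuple[str, str]]:
    """Return (command, argument) for a reply line, or None if not a command."""
    match = _PATTERN.match(line)
    if match is None:
        return None
    command = match.group(1).upper()
    argument = match.group(2)
    # Accept "ACK (comment)" typed literally, parentheses included.
    if command == "ACK" and argument.startswith("(") and argument.endswith(")"):
        argument = argument[1:-1].strip()
    return command, argument
```

With this, the customer's "- ACK still working with century link" would be read as an ACK with that comment rather than rejected.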
  21. Sadly, no feedback at all on this from LM and it is a huge issue -- this is something we cannot work around on our own. The tokens for step descriptions and other related details must be added to avoid sending useless information to our clients.
  22. Right now, we get only Resource counts from the LogicMonitor_Collector_GlobalStats datasource. We need to be able to show our clients their usage on ALL chargeable elements, including Website checks, LMCloud, LMConfig, etc. I have cobbled together something via the API to try to track this offline, but we need to clearly show clients what they have via a dashboard widget and right now the only one we can show is Devices (Resources). At the same time, please set up a way to define inputs to that via standard properties that indicate a client's subscription level, if possible. We hack around this now with a datasource that pulls in a property defined at their top-level folder.
  23. I am definitely going to check that out, Mike -- always wished we could have that in LM! To answer the original question, there is a way to detect SNMP access failures, but it relies on you knowing something that is not well documented. If you check the Uptime datapoint, you will find it has no threshold, but will generate a warning on "No Data", so: We always have rules like the above for each of our clients as well as similar rules for WMI failure. I have recommended previously that datapoints with "No Data" alerts are indicated in the tuning page along with regular thresholds.
  24. Yes, it is much better now, but still room for improvement (isn't that always true?). Dynamic tables do not support text variable display -- this would be super helpful to create lists of inventory items, device versions, etc. Instead you have to use the report function, which just makes it less accessible for casual review. Other products like SolarWinds can report changes in variables like firmware version, for which there is no option in LM short of the premium LMConfig feature, and that doesn't actually report changes unless you use a custom API script.
  25. Oh it gets better :). We had an issue awhile back (still do) that could only be resolved via an internal debug command (update system.ips property) normally run in the collector debug context. This is entirely doable via the API. No MFA required, no IP restriction possible. Chew on that one for a bit...