mnagel

Members
  • Posts: 515
  • Days Won: 99

Everything posted by mnagel

  1. It was published to the Exchange as H4T9GH, but it is basically what LM support provided with some tweaks. As an Event Source, it has the same poor behavior as all Event Sources, that is, you cannot practically ACK them, only add SDT. It also is not universal since there are different ways to get this info on different platforms. I like the idea of converting to a DS version with instances like the first post mentioned, and of course we are all still waiting for that promised core LM release real soon now :).
  2. Please! I created a facility for this years ago with Nagios via callbacks in our notification template processor (actual templates with conditionals, etc.), but that would be tricky here. You basically need a trigger to run callbacks or similar when alerts fire, with the results placed into a token. My guess is it won't happen unless it can be monetized somehow :(.
  3. https://lmgtfy.app/?q=sql+IndexSearches+metric
  4. I think I have just gotten used to full or partial table scans and then run all the complex bits on the client side. Wasteful of resources and more time-intensive, but ya gotta do what ya gotta do.
  5. Something that strikes me as I delve into this further is that filter syntax is largely undocumented. There are examples in the legacy docs, nothing really in Swagger. I have figured it out mostly via trial and error and the occasional support ticket. It is clear you can use AND logic with comma-separated components, but it is not clear if you can reference the same LHS multiple times. There is really no indication you can implement OR via filter short of glob expression matching. The only documentation I can find on filters specifically relates to limitations added for embedded special characters in the v2 API. Perhaps the API team could document the various common parameters in Swagger or elsewhere?
  6. What I was thinking was that the filter is not valid -- you cannot match only on values. Well you can, but it is then detached from the property name and could match many properties. You need to match on name and value together. filter=customProperties.name:PROPNAME,customProperties.value:PROPVALUE The /device/groups idea is a good one if you are not matching on wildcards, like in this case (though you could use two passes to get an ID list, then iterate). We have found that sometimes that is necessary due to lack of endpoints (e.g., there is no direct way to map a device datasource instance ID back to the datasource ID), but if you can use one query to do your work you should try to do so.
  7. The good news is it seems dev has finally released new modules that are more configurable, but I have not looked at how much complexity was shifted to propertysources and how maintainable that will end up being. They are still tied to ssh.user/ssh.pass, though, so you will still run the risk of incurring costs unexpectedly if you use those for non-LMConfig reasons (like errdisabled port detection). I think it is possible to disable LMConfig modules in the subtree alert tuning, though, so that may mitigate the risk. OSS tools do a much better job than LM did previously, so I am hoping this brings some parity (and fixes for the non-change thrashing we see all the time). I would never dream of editing a 1200+ line module and then having to merge changes into updates later.
  8. Excellent point -- the other functions we need could likely be satisfied with an API user having manager and not admin role. I will see if we can leverage the library to avoid needing an admin API user -- thanks! That said, it should still be possible to bind an allowlist to any API user to limit the attack surface. I as well can dream...
  9. I have been aware of the debugger method for some time -- was not familiar with the secret debugger library, but you can access the debugger similarly via the API. So... sleep well knowing that any set of leaked admin API keys could expose your entire network to remote attack via arbitrary PowerShell scripts executed via the debugger API. I was forced at the time to use that method to set the system.ips list to fix NetFlow ingestion for Palo Alto firewalls at the 5000 series or higher. No alternate method of binding device NetFlow export has yet been provided. Recognizing how dangerous this was, I asked about having certain API calls like this locked to an allow list, but that went nowhere. I have also tried changing Windows collector service accounts to use the Performance Monitoring group rather than Domain Admins (especially after the SolarWinds hack), but I found too many things broke so had to move back. Even today, well after the damage done during the SolarWinds hack due to lateral movement from compromised servers, LM collector installation instructions still include "If this Collector is monitoring other Windows systems in the same domain, run the service as a domain account with local administrator permissions." Tick, tick, tick...
  10. My recommendation? Stay away from any wizards LM provides. This stuff happens here and with the "simple" netscan setup, you end up with a bunch of nonsense top-level groups if you are not careful. I think there should be a knob in the portal settings to disable wizards...
  11. No, you are correct -- datasources store time series numerical data only. Various datasources tie themselves into knots trying to work around this limitation via datapoint legends. I recommended a while back adding a per-datapoint enum facility so those could be properly displayed in charts as meaningful strings, especially since legends sometimes get so long and don't wrap that you literally have to open the DS source code to find out what a value means. I never saw even a peep from LM on that sensible fix, sadly.
  12. Typically this is done via autodiscovery, but if you add manual instances you can manually define properties for the instances. For the AD method, each instance is generated with the normal fields followed by an optional list of property/value strings. Assuming AD is run often enough, those strings should be current (more or less) for reference in custom alert messages via unconditional token substitution. You can also use PropertySources to add auto properties if you want to do that without editing an existing datasource. If you need examples, Arista_Sensor_Fans is one of many datasources that generates auto properties. Or, look at almost any PropertySource module. I would not add any manual properties to automatic instances as those would likely vanish at some point.
  13. There already is one, you don't need to add it. But, you do need to dig around to find where those are (grumble). Our rule:
  14. You cannot use a straight SNMP check for this, but you can use SNMP via Groovy to enumerate the disks and generate the sum as a datapoint value. There are many datasources that access SNMP via Groovy you can use as examples -- a quick search of our backups shows HP_Chassis_MemoryModules among many others. You will want to focus on this OID - http://oidref.com/1.3.6.1.2.1.25.3 Mark
  15. Sure, you can use Service Insight for this, but it is a premium feature, which is using an expensive mallet to handle something that should be available without that extra cost. Or, there should be a Service Insight light for this stuff, leaving the costly part for the intended enhanced features of Service Insight (like Kubernetes). My recommendation on this was to extend cluster alerts so you could at least match up instances. My use case at the time was to detect an AP offline on a controller cluster. There is no way to do this without SI, which as you say is complex, and it is an extra cost. We need stuff like this in the base product.
  16. There is a way to do this, but it is not well-documented and there is no UI exposure for "No Data" alerts, you have to dig around the module sources to find them (because it is very hard to put an indicator in the alert tuning thresholds I guess). We have standard alerts on 2 datapoints that have No Data alerts and no other alerts. The first is for "Host Uptime" -> SNMP_HostUptime_Singleton -> Uptime and the second is for Uptime- -> * -> UpTime. If a host stops responding to SNMP, those will trigger. We keep them near the end of our alert policy to generally report to our team across all clients.
  17. What you want is a dynamic template processor, but all we have is simple token substitution and no indication that will ever change (I have asked repeatedly for years). You can route alerts to an external integration, which is how we handle transformation of tokens, but you lose some stuff when you do that depending on how you integrate. For example, we use external email integration into our ticket system with a filter that handles the transformation, but custom email integrations do not get the same handling as builtin email for certain things (e.g., you do not get ACK or SDT notices).
  18. My two cents -- I gave up on using syslog and most other eventsources a long time ago due to lack of basic correlation features. At the time, Cisco logs weren't even parsed correctly in our client environments and it took forever to get that dealt with. Since then we have used SumoLogic for log processing because we can run queries on the data over time and get meaningful results (and if needed, tie to LM via the SumoLogic API). LM also realized the existing stuff was a bit limited so bought a company and added LMLogs as a premium addon. That is fine, but some basic ability to correlate "regular" events (even just counts over time based on custom cross-event ID extraction) should be included in the base license. We still use eventsources for Windows event logs to have the extra info visible, but we always have to warn folks not to bother ACK'ing any that generate email since that does nothing meaningful. I have asked that the ACK functionality be removed for eventsources as well (SDT still is helpful).
  19. So this has been an issue for us a lot -- everything was tossed into the topology umbrella for alert suppression with no easy way to manually create dependencies. There are many topologies that are simply not discoverable, like multipoint/mesh WAN topologies and really anything not handled by topology sources. The good news is that some kind support tech provided me a Manual_Topology module that linked various devices manually that eluded auto-discovery. The bad news is it is awkward and leverages hardcoded device names and MAC addresses. But, it is possible. IMO the UI and/or API should be extended to support manual links. It is a last resort of course, but there are common cases where it is the only resort.
  20. Check Mike Suding's blog page -- lots of cool stuff, including this. A bit old, but probably still works :). http://blog.mikesuding.com/2016/09/20/restart-a-service-alert-if-restart-fails/ As far as the debugger, yeah -- that stuff freaks me out a lot given that LM more or less requires Domain Admins on collectors (really should be Performance Monitoring Users, especially after the recent SolarWinds incident). You can run those debugger commands from the API as well, even more scary.
  21. I 100% agree this is needed -- we have to hack around this all the time with escalation chains that have one or more empty stages, and still that does not prevent alerts from registering in the system. But this is just one case that would be trivial to solve with DS inheritance, something I have been pushing for well over four years now. The issue with creating new DSes is they are then freestanding clones, meaning each must now be maintained independently (and this is commonly pushed by support as a solution, sadly). If we could just get inheritance done (not just for DSes, but that would be the highest impact) it would be easy to make a copy that does what you want with changes only to parameters you desire while still getting the benefit of updates on the parent module and minimal maintenance requirements. It would be important that child module applies-to expressions are automatically excluded from the parent chain, too. A related change for alerts that would not be solved by inheritance, but that I also benefited from in our previous tool, is threshold calculation over time. For example, I don't care if CPU is high on a Windows server for a few minutes, but I do care if it is high for an hour. I also need to know if the average is high over a period of time when the actual level may be oscillating during that period (LM would not generate any alerts otherwise). With Nagios we did this by calling back to the pnp4nagios RRD data to calculate averages, slopes, etc. This could be done in LM if using the API from within modules was supported properly, but I refuse to go there until there is library support within the module system.
  22. I guess not ASAP: "This LogicModule is currently undergoing security review. It will be available for import only after our engineers have validated the scripted elements." I guess I will check back at some indeterminate date in the future.
  23. Thank you! I have been asking for this via "proper" channels for some time with no results -- will try it out ASAP as I have an 8320 cluster waiting. FWIW, I recommend using a standard property name alongside the ssh.user/ssh.pass (e.g., lmconfig.enabled) to allow disabling this premium feature at the group (client) level when it has not been subscribed to. I know it is an uphill battle to get those all fixed, but I sure wish it could be done. We still cannot use the new AD and DHCP modules due to lack of ability to disable LMConfig per client.
  24. Almost certainly there is code, as Palo Alto checks virtually always require API access. In most cases I have been involved with, review has seemed to be a mostly ad hoc process (or if not, definitely opaque). I suggested in one of our UI/UX meetings that there be a "Request Review" button or similar to create or escalate a request for security review. As a bonus, use a ticketing system (this would be welcome for feedback as well, which as I understand generates internal-only tickets). A unified customer-visible ticket system for feedback and module review would be very helpful.
  25. Been there, done that -- you can't reference those in widgets, sadly. You have to just create your own datasource that sets values equal to the properties and then reference those. The first time I ran into this I wanted to chart device usage against subscription levels, the latter of which was a property. In ours, the collector is a Groovy script that does nothing (not sure why that was how we did it, but it works). The CDP is just equal to ##property## in each case. It is Groovy mode, but the code is literally just that.
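
For the name-plus-value filter described in post 6, a minimal Python sketch of building the query string might look like the following. The helper names (property_filter, filtered_query) are hypothetical, the resource path is passed in by the caller rather than assumed, and nothing here calls the API; it only shows the comma-separated AND form from the post.

```python
from urllib.parse import quote


def property_filter(name: str, value: str) -> str:
    """Pair a custom property name with its value in one filter.

    Comma-separated components are ANDed (per post 5), so the value
    match stays attached to the property name.
    """
    return f"customProperties.name:{name},customProperties.value:{value}"


def filtered_query(path: str, name: str, value: str) -> str:
    """Append the filter to a caller-supplied resource path (hypothetical helper)."""
    # Keep the filter's own separators readable; encode everything else.
    encoded = quote(property_filter(name, value), safe=":,.")
    return f"{path}?filter={encoded}"


if __name__ == "__main__":
    # "/device/groups" is the endpoint mentioned in the post; the property
    # name and value are placeholders.
    print(filtered_query("/device/groups", "PROPNAME", "PROPVALUE"))
```

Whether embedded special characters in a value need further escaping is exactly the kind of detail post 5 says is poorly documented, so treat the encoding step as an assumption to verify against your portal.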
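
The "threshold calculation over time" idea in post 21 can be illustrated with a small Python sketch. This is not how LM or the Nagios/pnp4nagios setup implements it; the function names, sample values, and the 80% threshold are all made up for illustration. It contrasts a windowed-average check with an "every sample is high" check on an oscillating series.

```python
from statistics import mean


def average_is_high(samples, threshold, window):
    """True if the mean of the most recent `window` samples exceeds threshold."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        return False  # not enough history yet to judge
    return mean(samples[-window:]) > threshold


def every_sample_is_high(samples, threshold, window):
    """True only if each of the most recent `window` samples exceeds threshold."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        return False
    return all(s > threshold for s in samples[-window:])


if __name__ == "__main__":
    oscillating = [95, 70, 96, 72, 97, 71]  # hypothetical CPU percentages
    brief_spike = [10, 10, 10, 10, 10, 99]
    print(average_is_high(oscillating, 80, 6))
    print(every_sample_is_high(oscillating, 80, 6))
    print(average_is_high(brief_spike, 80, 6))
```

The oscillating series never stays above the threshold for consecutive polls, yet its average over the window is high, which is the case the post says would otherwise go unalerted; the brief spike does not trip the averaged check.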