mnagel

Everything posted by mnagel

  1. No, you are correct -- datasources store time series numerical data only. Various datasources tie themselves into knots trying to work around this limitation via datapoint legends. I recommended a while back adding a per-datapoint enum facility so those could be properly displayed in charts as meaningful strings, especially since legends sometimes get so long (and don't wrap) that you literally have to open the DS source code to find out what a value means. I never saw even a peep from LM on that sensible fix, sadly.
  2. Typically this is done via autodiscovery, but if you add manual instances you can manually define properties for the instances. For the AD method, each instance is generated with the normal fields followed by an optional list of property/value strings. Assuming AD is run often enough, those strings should be current (more or less) for reference in custom alert messages via unconditional token substitution. You can also use PropertySources to add auto properties if you want to do that without editing an existing datasource. If you need examples, Arista_Sensor_Fans is one of many datasources that generates auto properties. Or, look at almost any PropertySource module. I would not add any manual properties to automatic instances as those would likely vanish at some point.
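As a rough sketch of the AD output described above (normal fields followed by an optional list of property/value strings), here is how one instance line could be assembled. The `##` / `####` / `&` delimiters reflect my understanding of the scripted Active Discovery output format and should be checked against the LM documentation; the function name and the `auto.fan.location` property are made up for illustration.

```python
def format_instance(wildvalue, wildalias, description="", props=None):
    """Build one Active Discovery output line (delimiters are an assumption)."""
    line = f"{wildvalue}##{wildalias}"
    if description or props:
        # The description slot is kept (even if empty) when properties follow.
        line += f"##{description}"
    if props:
        pairs = "&".join(f"{key}={value}" for key, value in props.items())
        line += f"####{pairs}"
    return line


if __name__ == "__main__":
    # Hypothetical instance with one auto property.
    print(format_instance("fan1", "Fan 1", "Front fan",
                          {"auto.fan.location": "front"}))
```

The auto property would then be available for token substitution in a custom alert message, assuming AD has run recently enough for it to be current.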
  3. There already is one, you don't need to add it. But, you do need to dig around to find where those are (grumble). Our rule:
  4. You cannot use a straight SNMP check for this, but you can use SNMP via Groovy to enumerate the disks and generate the sum as a datapoint value. There are many datasources that access SNMP via Groovy you can use as examples -- a quick search of our backups shows HP_Chassis_MemoryModules among many others. You will want to focus on this OID: http://oidref.com/1.3.6.1.2.1.25.3 Mark
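The summing step could look roughly like the sketch below. It is in Python rather than Groovy, and it assumes the value being summed is disk capacity, i.e. hrDiskStorageCapacity (1.3.6.1.2.1.25.3.6.1.4, reported in KBytes) under the hrDevice subtree linked above. The walk itself is not shown; `walk_rows` stands in for whatever (OID, value) pairs your SNMP walk returns.

```python
# hrDiskStorageCapacity column from the Host Resources MIB (values in KBytes).
HR_DISK_STORAGE_CAPACITY = "1.3.6.1.2.1.25.3.6.1.4"


def sum_disk_capacity_kb(walk_rows):
    """Sum the capacity column from (oid, value) pairs of an hrDevice walk."""
    prefix = HR_DISK_STORAGE_CAPACITY + "."
    total = 0
    for oid, value in walk_rows:
        # Tolerate OIDs written with a leading dot.
        if oid.lstrip(".").startswith(prefix):
            total += int(value)
    return total


if __name__ == "__main__":
    # Hypothetical walk output: two disks plus an unrelated column.
    rows = [
        ("1.3.6.1.2.1.25.3.6.1.4.1", "1000"),
        ("1.3.6.1.2.1.25.3.6.1.4.2", "2500"),
        ("1.3.6.1.2.1.25.3.6.1.1.1", "1"),
    ]
    print(sum_disk_capacity_kb(rows))
```

In a datasource, the single summed number is what you would emit as the datapoint value.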
  5. Sure, you can use Service Insight for this, but it is a premium feature, which is using an expensive mallet to handle something that should be available without that extra cost. Or, there should be a Service Insight light for this stuff, leaving the costly part for the intended enhanced features of Service Insight (like Kubernetes). My recommendation on this was to extend cluster alerts so you could at least match up instances. My use case at the time was to detect an AP offline on a controller cluster. There is no way to do this without SI, which as you say is complex, and it is an extra cost. We need stuff like this in the base product.
  6. There is a way to do this, but it is not well-documented and there is no UI exposure for "No Data" alerts, you have to dig around the module sources to find them (because it is very hard to put an indicator in the alert tuning thresholds I guess). We have standard alerts on 2 datapoints that have No Data alerts and no other alerts. The first is for "Host Uptime" -> SNMP_HostUptime_Singleton -> Uptime and the second is for Uptime- -> * -> UpTime. If a host stops responding to SNMP, those will trigger. We keep them near the end of our alert policy to generally report to our team across all clients.
  7. What you want is a dynamic template processor, but all we have is simple token substitution and no indication that will ever change (I have asked repeatedly for years). You can route alerts to an external integration, which is how we handle transformation of tokens, but you lose some stuff when you do that depending on how you integrate. For example, we use external email integration into our ticket system with a filter that handles the transformation, but custom email integrations do not get the same handling as builtin email for certain things (e.g., you do not get ACK or SDT notices).
  8. My two cents -- I gave up on using syslog and most other eventsources a long time ago due to lack of basic correlation features. At the time, Cisco logs weren't even parsed correctly in our client environments and it took forever to get that dealt with. We have used SumoLogic for log processing since then, because we can run queries on the data over time and get meaningful results (and if needed, tie to LM via the SumoLogic API). LM also realized the existing stuff was a bit limited, so it bought a company and added LMLogs as a premium addon. That is fine, but some basic ability to correlate "regular" events (even just counts over time based on custom cross-event ID extraction) should be included in the base license. We still use eventsources for Windows event logs to have the extra info visible, but we always have to warn folks not to bother ACK'ing any that generate email since that does nothing meaningful. I have asked that the ACK functionality be removed for eventsources as well (SDT still is helpful).
  9. So this has been an issue for us a lot -- everything was tossed into the topology umbrella for alert suppression with no easy way to manually create dependencies. There are many topologies that are simply not discoverable, like multipoint/mesh WAN topologies and really anything not handled by topology sources. The good news is that some kind support tech provided me a Manual_Topology module that linked various devices manually that eluded auto-discovery. The bad news is it is awkward and leverages hardcoded device names and MAC addresses. But, it is possible. IMO the UI and/or API should be extended to support manual links. It is a last resort of course, but there are common cases where it is the only resort.
  10. Check Mike Suding's blog page -- lots of cool stuff, including this. A bit old, but probably still works :). http://blog.mikesuding.com/2016/09/20/restart-a-service-alert-if-restart-fails/ As far as the debugger, yeah -- that stuff freaks me out a lot given that LM more or less requires Domain Admins on collectors (really should be Performance Monitoring Users, especially after the recent SolarWinds incident). You can run those debugger commands from the API as well, even more scary.
  11. I 100% agree this is needed -- we have to hack around this all the time with escalation chains that have one or more empty stages, and still that does not prevent alerts from registering in the system. But this is just one case that would be trivial to solve with DS inheritance, something I have been pushing for well over four years now. The issue with creating new DSes is they are then freestanding clones, meaning each must now be maintained independently (and this is commonly pushed by support as a solution, sadly). If we could just get inheritance done (not just for DSes, but that would be the highest impact) it would be easy to make a copy that does what you want with changes only to parameters you desire while still getting the benefit of updates on the parent module and minimal maintenance requirements. It would be important that child module applies-to expressions are automatically excluded from the parent chain, too. A related change for alerts that would not be solved by inheritance, but that I also benefited from in our previous tool, is threshold calculation over time. For example, I don't care if CPU is high on a Windows server for a few minutes, but I do care if it is high for an hour. I also need to know if the average is high over a period of time when the actual level may be oscillating during that period (and LM would not generate any alerts otherwise). With Nagios we did this by calling back to the pnp4nagios RRD data to calculate averages, slopes, etc. This could be done in LM if using the API from within modules was supported properly, but I refuse to go there until there is library support within the module system.
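To make the threshold-over-time idea concrete, here is a minimal Python sketch of the averaging check. It is not how LM or pnp4nagios does it; the function name, the sample layout and the one-hour default are all assumptions for illustration.

```python
def sustained_high(samples, threshold, window_seconds=3600):
    """Return True if the average over the trailing window exceeds threshold.

    samples: list of (timestamp_seconds, value) tuples sorted by timestamp.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    window_start = latest - window_seconds
    # Without history covering the whole window, do not alert yet.
    if samples[0][0] > window_start:
        return False
    window = [value for ts, value in samples if ts > window_start]
    return sum(window) / len(window) > threshold


if __name__ == "__main__":
    # CPU oscillating between 95 and 85 every 5 minutes for over an hour.
    oscillating = [(i * 300, 95.0 if i % 2 == 0 else 85.0) for i in range(14)]
    print(sustained_high(oscillating, threshold=88.0))
```

The oscillating series never stays above a fixed level for long, yet its hourly average is high, which is exactly the case that a point-in-time threshold misses.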
  12. I guess not ASAP: "This LogicModule is currently undergoing security review. It will be available for import only after our engineers have validated the scripted elements." I guess I will check back at some indeterminate date in the future.
  13. Thank you! I have been asking for this via "proper" channels for some time with no results -- will try it out ASAP as I have an 8320 cluster waiting. FWIW, I recommend using a standard property name alongside the ssh.user/ssh.pass (e.g., lmconfig.enabled) to allow disabling this premium feature at the group (client) level when it has not been subscribed to. I know it is an uphill battle to get those all fixed, but I sure wish it could be done. We still cannot use the new AD and DHCP modules due to lack of ability to disable LMConfig per client.
  14. Almost certainly there is code, as Palo Alto checks virtually always require API access. In most cases I have been involved with, review has seemed to be a mostly ad hoc process (or if not, definitely opaque). I suggested in one of our UI/UX meetings that there be a "Request Review" button or similar to create or escalate a request for security review. As a bonus, use a ticketing system (this would be welcome for feedback as well, which as I understand generates internal-only tickets). A unified customer-visible ticket system for feedback and module review would be very helpful.
  15. Been there, done that -- you can't reference those in widgets, sadly. You have to just create your own datasource that sets values equal to the properties and then reference those. First time I ran into this I wanted to chart device usage against subscription levels, latter of which was a property. In ours, the collector is a Groovy script that does nothing (not sure why that was how we did it, but it works). The CDP is just equal to ##property## in each case. It is Groovy mode, but the code is literally just that.
  16. Yes, and LM actually agreed with me and others (eventually) and fixed this in v133. And then they broke it sometime after that, no ETR that I am aware of.
  17. FWIW, having also come originally from Nagios, I miss the ability to transmit arbitrary string data back via alerts. Some of this can be emulated with auto properties, but those can be set only during discovery, not collection. I posted a feature request previously to allow definition of enums that can be bound to datapoints (global values and overridden values within specific datasources/datapoints). These could then be used to avoid the current awkward legend method and actually show the intended purpose of DP values where needed via tokens. Imagine a line that showed the actual meaning of the current value instead of a long truncated legend line that makes you dig around for what it means. I also think it should be possible to improve the property menus to leverage more advanced typing and UI. For example, a property might be just a string as now (preferably with better input box control), or it might be a radio button, selection menu, etc. so that folks using properties can easily find what is supported and what values/ranges are allowed. Those hints would be defined primarily within logicmodules, but it should be possible to define them more generally (at least the typing/UI definitions, which could then be bound to properties that are used within modules). This is not strictly related to the topic, but is about readability and usability so I tossed it in there, too :).
  18. Eventsources don't support embedded Powershell, though they certainly should. You can upload a script though. That said, eventsources are also almost entirely unsuited for monitoring, more like additional information to see along with monitoring. Among other things, you cannot ACK them in a meaningful way due to lack of correlation across eventsource results. I'm sure the yet-another-premium-module LMLogs will fix all those problems, though.
  19. You can do this under Alert Tuning at the group level. There is no similar option for specific devices short of editing the applies to code.
  20. I would not hold your breath -- I have had to fight just to get and keep SPF enabled on our email. Regardless, even if you could use the builtin alerts with a distinct From address, you would still have portal links embedded in the message that reveal it is LogicMonitor. You could do what we do and submit everything via a custom email integration (or a web integration via an API handler), then handle the data any way you like. In our case, we feed the tokens into an actual template system to format messages using conditional logic and all that stuff missing in the LM blind token substitution method available normally. We feed that transformed result into a ticket, but obviously it could be handled many different ways at that point, including re-routing via email with proper headers. The downside of a custom integration is that there is a bug -- certain things LM simply does not send to those that are sent with the builtin integration (e.g., ACK and SDT notices). I have asked about this and in theory it might be fixed one day, but it has not been in well over a year since reported.
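For a sense of what "conditional logic" buys you over blind token substitution, here is a small Python sketch of a handler on the receiving end of a custom integration. It is not our actual template system; the function name is made up, and the token names (HOST, LEVEL, ALERTSTATUS, DATAPOINT, VALUE) are assumed to be what your integration payload is configured to pass along.

```python
def render_alert(tokens):
    """Format an alert message from a dict of tokens using conditionals."""
    host = tokens.get("HOST", "unknown host")
    level = tokens.get("LEVEL", "").lower()
    status = tokens.get("ALERTSTATUS", "active").lower()

    # Pick a subject line based on state, which plain substitution cannot do.
    if status == "clear":
        subject = f"[CLEARED] {host}"
    elif level == "critical":
        subject = f"[CRITICAL] {host} needs attention now"
    else:
        subject = f"[{level.upper() or 'ALERT'}] {host}"

    body = [subject]
    datapoint = tokens.get("DATAPOINT")
    value = tokens.get("VALUE")
    # Only mention the datapoint when a value actually came through.
    if datapoint and value not in (None, ""):
        body.append(f"{datapoint} = {value}")
    return "\n".join(body)


if __name__ == "__main__":
    print(render_alert({"HOST": "sw1", "LEVEL": "critical",
                        "DATAPOINT": "Uptime", "VALUE": "0"}))
```

The rendered text can then go into a ticket or be re-sent by email with whatever headers you need.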
  21. Here are at least two items that need to be added to make the dashboard token feature more useful: (1) adjust widgets that cannot use tokens so they can (e.g., Alerts, Netflow, etc.); (2) allow arbitrary tokens to be inserted as needed within widget fields (e.g., device patterns, instance patterns, etc.). A concrete example of the latter came upon me this morning. We have multiple locations with similar equipment for which we want to display Internet usage details, one set per dashboard (cgraph and netflow widgets). The edge device names vary, as do the uplink ports to the ISPs in each location. Cloning this dashboard solves virtually nothing as every single widget still requires editing. If the tokens could be used, these dashboards could be cloned without the manual editing other than filling in the necessary tokens. In some cases the tokens are insertable, but most fields do not allow them. In this case, I defined various tokens like isp_1_name, isp_1_edge_device, isp_1_edge_port, etc. but could use them only in very limited ways, ultimately making the exercise pointless. As with many things, we can at least work around this with the API (at least I believe I could with some effort), but it would be much more accessible to folks if handled within the UI.
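The API workaround mentioned above might look something like this on the client side: take an exported widget definition, fill in ##token## placeholders in every string field, and push the result back. This Python sketch covers only the substitution step; the widget field names (`deviceGlob`, `instances`) are hypothetical and no API calls are shown.

```python
import re

# Matches ##name## placeholders such as ##isp_1_edge_device##.
TOKEN = re.compile(r"##([A-Za-z0-9_.]+)##")


def fill_tokens(node, tokens):
    """Recursively replace ##name## placeholders in any string field."""
    if isinstance(node, dict):
        return {key: fill_tokens(value, tokens) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_tokens(item, tokens) for item in node]
    if isinstance(node, str):
        # Unknown tokens are left in place so they are easy to spot later.
        return TOKEN.sub(lambda m: tokens.get(m.group(1), m.group(0)), node)
    return node


if __name__ == "__main__":
    widget = {
        "name": "##isp_1_name## usage",
        "deviceGlob": "##isp_1_edge_device##",
        "instances": ["##isp_1_edge_port##"],
    }
    site = {
        "isp_1_name": "ISP A",
        "isp_1_edge_device": "edge-fw-01",
        "isp_1_edge_port": "Gi0/1",
    }
    print(fill_tokens(widget, site))
```

With something like this, one dashboard definition plus a small per-location token map would stand in for hand-editing every widget after cloning.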
  22. Many examples of using WMI from Groovy, none that select from Win32_Service, but should be simple enough to adjust the query. See Microsoft_LyncServer_StorageService as one example.
  23. The normal way I monitor services is via AD, but you would end up with a new instance wildvalue each time it was changed if you use the normal option (WMI-based datasource). If you use Groovy script DS instead, you could strip the PID portion to build the wildvalue so that the data is stable. There should be some examples of that in the existing datasource repo, need to dig around....
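Until I find a real example, the PID-stripping idea is roughly the following (in Python here rather than Groovy). I am assuming the raw instance name carries the PID as a numeric suffix after a separator, e.g. "sqlservr:4312"; that format and the function name are assumptions, so adjust the pattern to whatever your discovery actually returns.

```python
import re


def stable_wildvalue(raw_name, separator=":"):
    """Drop a trailing '<separator><digits>' PID suffix from an instance name."""
    pattern = re.escape(separator) + r"\d+$"
    return re.sub(pattern, "", raw_name)


if __name__ == "__main__":
    # The same service before and after a restart maps to one wildvalue.
    print(stable_wildvalue("sqlservr:4312"))
    print(stable_wildvalue("sqlservr:9920"))
```

Because the wildvalue no longer changes across restarts, the instance keeps its history instead of being rediscovered as a new one.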
  24. Yeah, brings back horrible memories of me requesting repeatedly the documentation on how to pass parameters and getting the most insane response from support :).
  25. Best we have been able to do here is a script leveraging the API to download as many endpoints as we are able to access, with check-in to a git repo. Works, but needs frequent tweaking as things change on the backend. Having a way to revert to a previous snapshot or similar would be very handy. My script came about originally after I implemented alert rule resequencing with an error and lost some rules. My latest incarnation of this script also has an option to check the items for problems (e.g., broken widgets).
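This is not my actual script, but the snapshot-and-commit half of the idea can be sketched in a few lines of Python. Fetching from the API is left out; `payloads` stands in for whatever your downloader returns, keyed by a name you choose per endpoint (the names below are hypothetical). It assumes `git` is on the PATH and the target directory is already a repo.

```python
import json
import subprocess
from pathlib import Path


def write_snapshot(repo_dir, payloads):
    """Write each payload as pretty-printed JSON with sorted keys.

    Sorted keys and fixed indentation keep git diffs limited to real changes.
    """
    repo = Path(repo_dir)
    written = []
    for name, payload in sorted(payloads.items()):
        path = repo / f"{name}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        text = json.dumps(payload, indent=2, sort_keys=True) + "\n"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


def commit_snapshot(repo_dir, message):
    """Commit everything in repo_dir; return False when nothing changed."""
    status = subprocess.run(["git", "status", "--porcelain"], cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    if not status.stdout.strip():
        return False
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True
```

Run on a schedule, this gives you a history you can diff or restore from by hand, which is the closest thing to a revert we have.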