mnagel

Everything posted by mnagel

  1. @Sarah Terry I just watched your Level Up presentation on dashboard tokens, which I am familiar with. The idea of changing tokens on the fly to change the view is nice, but requiring dashboard editing makes it inaccessible to regular users. A fix would be to add a token settings dropdown that lets regular users adjust tokens from a list of pre-canned values that admins manage. This would also avoid changing the underlying tokens persistently, allowing anyone using the dashboard to change context.
  2. I have spent much time over the past few years grappling with how to handle alerting within LM. The "LM way" is to not send too many actual alerts (via email, etc.) and instead review aggregates on dashboards. But that is a mindless, repetitive task people should not have to do to pick up on problems, and flooding inboxes is the only other option. My suggestion is to implement a method within widgets to alert when the data they contain exceeds thresholds (or is abnormal). As a very specific example, if I set a widget to show all core switch interface error rates, I would want to set an alert on the widget itself to let folks know it needs investigation (rather than have every interface alarm individually). I understand that cluster alerts could handle some of this in a very rough manner (I have found cluster alerts can rarely be used due to limitations in how they are defined), but a widget that can alert when conditions are met would go above and beyond cluster alerts. In the port errors example, I might set the condition to "one or more ports with at least 1% errors in or out" for core switches and "3 or more ports with at least 2% errors in or out" for access switches. A widget in "look at me" state could also be indicated in the dashboard menu for drill-down purposes. That state should also be usable in alert rules, which would then represent the rollup condition in the widget instead of many alerts for its various datapoints.
  3. This is a specific case of the more general need for linked (inherited) clones -- this applies to anything that can be cloned, so that inheritance and overrides can be leveraged. My original request on this, many years back, applied to LogicModules (primarily DataSources), but the same applies here and to virtually any primary object type that can be cloned. I have been pushing for this repeatedly over the years. Instead of linked clones for modules, we now have SMR via Exchange -- much better than blind replacement, but linked clones would be far superior (including for this dashboard inheritance use case).
  4. There are many gaps like that -- you also cannot search by properties in the resource pane. My workaround is to leverage our API-based endpoint backup script to find things with grep. Clunky, but helps. Most often I am looking for code examples.
  5. This is a specific case of the more general problem that "RBAC and groups are not sufficient to support an MSP model", which I have been trying to get fixed for years. There needs to be structural support for multiple clients, not bolted on as is currently done. I never use the wizard, so I didn't realize this was how it worked :).
  6. I just checked, and it looks like none of the commands currently require special privileges, but not all of them may be appropriate for every Linux flavor. A few spot checks show some will not work on EL6 (which, to be fair, is EOL later this year). The commands in use:
     • Linux_SSH_BlockDevicePerformance: def command = "cat /proc/diskstats"
     • Linux_SSH_CPUCores: def command = 'cat /proc/cpuinfo'
     • Linux_SSH_CPUCores: def command = 'cat /proc/stat'
     • Linux_SSH_CPUMemory: def command = 'vmstat -s -S K; echo -n "Cores:";nproc --all; echo -n "load:"; uptime'
     • Linux_SSH_Filesystems: def command = "`which df` -P"
     • Linux_SSH_NetworkInterfaces: def command = "cat /proc/net/dev"
     • Linux_SSH_NetworkTCPUDP: def command = '`which netstat` -s'
     • Linux_SSH_ServiceStatus: def command = 'systemctl list-units --all --type=service --plain'
     • Linux_SSH_SystemClock: def command = 'date +%s'
     • Linux_SSH_TCPUDP: def command = 'nstat -a -j'
     • Linux_SSH_Uptime: def command = 'echo -n "Uptime:"; cat /proc/uptime'
  7. If the box does respond to SNMP, it will never discover the Linux_SSH property, even if you define credentials, because addCategory_Linux_SSH only applies if the system has no categories or only "collector" (which seems like an error). This may explain your AD problems if any category was added to those devices for any reason. Once that is detected, the various modules will work. I am not sure whether root is required for all of the modules, but I expect it is for at least a few, and as you say, this is not explicitly documented anywhere. Since the credentials must be defined at the device level, you would need to bind your public key to the root account even if only one of the modules requires root (we generally use public key access only for Linux systems). I would prefer to see sudo supported in all the modules, so that a non-root account can be used with restricted command access controlled by sudo. All that said, I have not tested these fully to see if we can get away with a regular user. We have none of these in use yet in any of our portals, but I am curious now, so I will be trying a few things :).
  8. Right -- I tried the same and it looks like reports don't handle ILPs. They certainly should -- probably will need to escalate to LM to get it fixed, and they may say it is a feature request :).
  9. Have you tried the alert threshold report? You cannot be very granular about which type of threshold you select, but if you export to CSV and open it in Excel or equivalent, you could probably create a data filter on anything that has a critical threshold defined (based on how many words are in the threshold). Or just write a script to dump the CSV lines that match the critical threshold pattern. It may take a while to execute :). I just tried it with HTML (by mistake), and the page started to render, then crashed the tab.
  10. You should be able to add the property to a custom column, but in my case it ends up with no data. I use the friendly name as the instance name if possible, otherwise the thumbprint, so my friendly names show in the report. I can't show auto.windowscerts.dnsnames, though: the report adds it as a column, but no values show. It seems ILPs are not valid for reports, which feels like a bug to me, though I am sure I will be told it is a feature request :).
  11. Ooh, shiny! Yes, there should be an on/off button to limit the view to just applicable devices. If you need to find something new, just slide it to off.
  12. By report, I assume you mean "alert message"? If so, the property name should work as a token. For me, it was generally empty, so I needed to structure the message to account for that as best I could, since there are no conditional output controls like those in template systems.
  13. As far as stars, sure. I would probably want a more general option to restrict views to include only technologies we need, though. Following to know when there are updates would be very nice. Right now, it kinda happens in batches :).
  14. I have something like this in Exchange. I did not include the friendlyname in the alert message, but it is there as an ILP and could be used easily; I just found that fairly often it was not defined. We have an update pending on this module, since we found some certs are refreshed very often (e.g., daily), and we need to add code to exclude those from discovery (or at least from alerting). Locator: KPNWGW
  15. I long ago despaired of ever doing anything with LMConfig modules, since without OOP and library support, each is provided as a 1000+ line blob. I assume that in the backend, developers have a portal-like harness to work in that does not involve editing in the UI as we must. If I ever did try to fix anything, the changes would be wiped on the next update (I have, and they were). Exchange makes it a bit more palatable, but with that much code, the safe import process could still be pretty painful. I am much more used to the idea of core overridable features in a library with a profile for each device type (e.g., Oxidized), or like our own getconfig script I wrote long ago, where I only had to override 3-4 methods for new device types (how to log in, what the prompt looks like, how to disable paging, etc.). I would even reference RANCID, but would not like to see that method used in LM :). Still, RANCID pulls much more useful detail for devices, and adding more detail here means editing the blob.
  16. I have been trying to figure out a way to auto-remove categories added by PropertySources when they are no longer applicable (e.g., when something stops being an IIS server, SQL server, DHCP server, etc.). I have recently found this to be an annoying problem, as categories added by PropertySources live forever until manually removed, and the modules cannot subtract AFAIK. OTOH, the method of adding a category is undocumented -- I only know about it from reviewing modules provided by LM, so perhaps there is a method that interprets a negation operator (like !) to remove the listed category? I can think of other options, but they would require replicating what a PropertySource does and using the debugger (via API) to change system.categories. I would prefer to avoid that, as it would be pretty complex, and using the debugger that way always makes me cringe when I consider the security implications.
  17. I hoped that since ArubaOS-CX is similar to HPE Procurve, I could just use the existing ConfigSource, but it times out in discovery. I am generally willing to jump in and code solutions, but the current from-scratch, monolithic coding methodology used for ConfigSources makes that effectively impossible for regular folks, so... please add a new ArubaOS-CX module or extend HPE Procurve to support that flavor. I have a pair of 8320s not yet in production that I can get developers into.
  18. Weather is in there, which would be fine. My recollection was that OWM was free for basic access with a query limit, and the new caching feature could keep us within that limit, so I will definitely revisit it now. Found it: 60 calls/minute, 1,000,000 calls/month. It sounds like a lot, but without caching.... yeah :).
  19. Exactly, I just stopped since I did not want to sign up for a much larger account than needed. I think there are other attributes I wanted to track, like precipitation, but pretty much that was put off since I could not restrict calls to once per zip code for each hour or so. We used to get that information (using caching) with Nagios from the Wunderground API and insert detail into alerts via our notification templating system. My philosophy has always been to include as much relevant information in alerts as possible to support Lazy Admin mode. A bit harder to do here without conditional templating, but I rebuilt that system into our ticketing system via an inbound transform. The main thing we lose is the ability to run callbacks to get stuff like top 5 process details, etc. Still, we only use that in some cases and most clients get alerts from LM, so still wishing for real templating one day :).
  20. Nice! I was trying to build out a DS for weather API checks some time back, but I ran into the same issue I do with Cisco PSIRT API checks: no way to cache results, short of local files, to avoid excessive usage due to duplicate calls. I guess I will just have to see what I can do with local files. I am not sure whether a SQLite binding is included with the provided Groovy libraries. If not, adding one could help.
  21. FWIW, here are our catchall rules for those and similar items.
  22. We have standard rules for this that rely on the no-data alert for the uptime datapoint, which otherwise has no alert thresholds. Finding which datapoint is appropriate is harder than it ought to be and I agree a dedicated troubleshooter DS would greatly simplify things. Right now we have several alert rules per client to capture the problem adequately.
  23. It is possible via various methods to set up a limited-access, read-only account, which we generally try to do, but understood completely regardless. To the extent LM relies on SSH, it has been problematic on our systems for various reasons (inadvertently enabling LMConfig is one, lack of public key support is another).
  24. With Exchange, child accounts should get somewhat better. I recommended in a recent UI/UX discussion that it be possible to keep modules in sync via Exchange across multiple child accounts; without that, they become very painful to deal with. It also seems that every year when we renew our agreement, features are disabled on child accounts without notice. I am currently devoting a bunch of coding time to detecting when that has happened, after silently losing LMConfig for the third year in a row -- that one in particular I can detect, because /setting/configsources returns no results when it happens.
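The token dropdown proposed in item 1 could work roughly like the minimal sketch below. It is under stated assumptions, not LogicMonitor's token API: `resolve_tokens`, the `allowed` mapping, and the token names are all hypothetical. Each viewer's selections are checked against admin-managed value lists and layered over a copy of the saved tokens, so the dashboard's stored tokens are never modified.

```python
def resolve_tokens(saved, allowed, selections):
    """Return the effective tokens for one viewer's session (hypothetical model).

    saved      -- tokens stored on the dashboard; never modified here
    allowed    -- admin-managed {token name: list of permitted values}
    selections -- values the viewer picked from the dropdowns
    """
    effective = dict(saved)  # work on a copy so nothing persists
    for name, value in selections.items():
        if value not in allowed.get(name, ()):
            raise ValueError(f"{value!r} is not an allowed value for token {name!r}")
        effective[name] = value
    return effective


saved = {"site": "*", "tier": "core"}           # hypothetical dashboard tokens
allowed = {"site": ["HQ", "Branch1", "*"]}      # hypothetical admin-managed choices
print(resolve_tokens(saved, allowed, {"site": "HQ"}))
```

Because the override lives only in the returned copy, two viewers can look at the same dashboard with different contexts at the same time.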
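The widget-level rollup in item 2 boils down to "N or more ports at or above X% errors in or out", with different N and X per switch tier. This is a hypothetical sketch of that evaluation, not an existing LM feature. `PortErrors`, `RULES`, and `widget_state` are illustrative names, and the rule values are the examples given in item 2.

```python
from dataclasses import dataclass


@dataclass
class PortErrors:
    name: str
    in_pct: float   # inbound error rate, percent
    out_pct: float  # outbound error rate, percent


# Example rules from item 2: tier -> (minimum offending ports, error % threshold)
RULES = {"core": (1, 1.0), "access": (3, 2.0)}


def widget_state(tier, ports):
    """Return (needs_attention, offending port names) for one widget."""
    min_ports, threshold = RULES[tier]
    # A port counts if either direction reaches the threshold ("in or out").
    bad = [p.name for p in ports if max(p.in_pct, p.out_pct) >= threshold]
    return len(bad) >= min_ports, bad


core = [PortErrors("Gi1/0/1", 0.2, 1.5), PortErrors("Gi1/0/2", 0.0, 0.0)]
access = [PortErrors("a1", 2.5, 0.0), PortErrors("a2", 0.0, 3.0), PortErrors("a3", 1.9, 1.9)]
print(widget_state("core", core))
print(widget_state("access", access))
```

A single boolean per widget is what an alert rule would match on, while the list of offending ports is what the drill-down would show.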
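The linked (inherited) clone idea in item 3 maps naturally onto a layered lookup. In this sketch, local overrides sit on top of a live reference to the parent, so upstream changes flow through unless they are explicitly overridden. It uses Python's `collections.ChainMap` purely as an illustration. `linked_clone` and the field names are hypothetical and do not describe how LM stores modules or dashboards.

```python
from collections import ChainMap


def linked_clone(parent):
    """Hypothetical linked clone: an override layer on top of the live parent."""
    # Writes and deletes on a ChainMap only touch the first mapping (the overrides).
    return ChainMap({}, parent)


parent = {"collectInterval": 300, "threshold": "> 90"}  # hypothetical module fields
clone = linked_clone(parent)
clone["threshold"] = "> 95"       # local override, parent untouched
parent["collectInterval"] = 120   # upstream update is inherited by the clone
print(clone["collectInterval"], clone["threshold"])
```

Deleting an override (`del clone["threshold"]`) makes the parent's value visible again, which is the "revert to inherited" behavior a linked clone would need.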
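To show what happens after one of the item 6 commands runs, here is a hypothetical parser for the Linux_SSH_Uptime output. It assumes the output is the literal `Uptime:` prefix followed by the two fields of /proc/uptime (seconds since boot and cumulative idle seconds). It is not the module's actual parsing code, and `parse_uptime` is an illustrative name.

```python
def parse_uptime(output: str) -> float:
    """Return seconds since boot from 'Uptime:<uptime> <idle>' output."""
    prefix = "Uptime:"
    line = output.strip()
    if not line.startswith(prefix):
        raise ValueError(f"unexpected output: {line!r}")
    # /proc/uptime holds two numbers: uptime and idle time, both in seconds
    uptime_s, _idle_s = line[len(prefix):].split()
    return float(uptime_s)


print(parse_uptime("Uptime:350735.47 234388.90\n"))
```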
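The addCategory_Linux_SSH behavior described in item 7 (it applies only when a system has no categories or only "collector") amounts to a subset test. This sketch mirrors that observed rule and is not taken from LM documentation. `linux_ssh_category_applies` and the "NetSNMP" category in the example are hypothetical, and the input is assumed to be a comma-separated system.categories value.

```python
def linux_ssh_category_applies(system_categories: str) -> bool:
    """Mirror the observed rule: only hosts with no categories, or just 'collector'."""
    cats = {c.strip() for c in system_categories.split(",") if c.strip()}
    return cats <= {"collector"}  # subset: empty set or exactly {"collector"}


for value in ["", "collector", "collector,NetSNMP"]:
    print(repr(value), linux_ssh_category_applies(value))
```

Any SNMP-derived category makes the set a non-subset, which is why an SNMP-responsive box never picks up the Linux_SSH category.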
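Item 9's "dump the CSV lines that match the critical threshold pattern" could be scripted as shown below. This is a minimal sketch under two assumptions that should be checked against a real export: the threshold column is named `Threshold` (hypothetical), and a threshold reads "operator warning error critical". Under that layout, four or more words means a critical level is defined, which is the word-count heuristic item 9 suggests.

```python
import csv
import io


def rows_with_critical(csv_text, column="Threshold"):
    """Yield report rows whose threshold appears to define a critical level.

    Assumes the threshold text is 'operator warning error critical', so four
    or more whitespace-separated words means a critical value is present.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if len((row.get(column) or "").split()) >= 4:
            yield row


sample = "Datapoint,Threshold\nCPUBusyPercent,> 60 80 90\nMemoryUsed,> 80\n"
print([r["Datapoint"] for r in rows_with_critical(sample)])
```

For a large export, passing `open(path, newline="")` to `csv.DictReader` streams the file instead of loading it into memory.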
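The pending change in item 14 (excluding certs that are refreshed very often from discovery) could be approached by skipping certificates with a short validity window. This is one possible heuristic, sketched under assumptions, not the published module's logic. The 7-day cutoff, the dict field names, and `discoverable` are all hypothetical.

```python
from datetime import datetime, timedelta

MIN_LIFETIME = timedelta(days=7)  # hypothetical cutoff; tune per environment


def discoverable(certs, min_lifetime=MIN_LIFETIME):
    """Yield certificates whose validity window is at least min_lifetime."""
    for cert in certs:
        if cert["not_after"] - cert["not_before"] >= min_lifetime:
            yield cert


certs = [
    {"thumbprint": "AA11", "not_before": datetime(2024, 1, 1), "not_after": datetime(2024, 1, 2)},
    {"thumbprint": "BB22", "not_before": datetime(2024, 1, 1), "not_after": datetime(2025, 1, 1)},
]
print([c["thumbprint"] for c in discoverable(certs)])
```

Filtering on lifetime rather than on days remaining avoids dropping a long-lived cert that is simply close to expiry, which is exactly the case you still want to alert on.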
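The library-plus-profile approach item 15 contrasts with the monolithic ConfigSources can be sketched as a base class that holds the shared flow, with each device type overriding only its quirks (prompt, paging, etc.). This is a hypothetical illustration, not Oxidized's API, LM's code, or the author's getconfig script. `DeviceProfile`, `ExampleSwitch`, and the `disable-paging` command are made up for the example.

```python
import re


class DeviceProfile:
    """Shared config-collection flow; subclasses override only the quirks."""
    prompt_pattern = r"[>#]\s*$"          # what the CLI prompt looks like
    paging_command = "terminal length 0"  # how to disable paging
    config_command = "show running-config"

    def is_prompt(self, line: str) -> bool:
        return re.search(self.prompt_pattern, line) is not None

    def command_sequence(self) -> list:
        # Common flow for every device type: disable paging, then dump the config.
        return [self.paging_command, self.config_command]


class ExampleSwitch(DeviceProfile):
    # Hypothetical device type: only the prompt and paging command differ.
    prompt_pattern = r"\]\s*$"
    paging_command = "disable-paging"


print(DeviceProfile().command_sequence())
print(ExampleSwitch().command_sequence(), ExampleSwitch().is_prompt("[switch01]"))
```

Adding a new platform then means writing a few class attributes instead of copying and editing a 1000+ line script.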
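Item 16 wonders whether a negation operator (like `!`) could let a PropertySource remove a category it previously added. No such operator is known to exist. This sketch only shows how such semantics might behave if they did. `apply_category_changes` and the category names are hypothetical.

```python
def apply_category_changes(current, requested):
    """Add plain entries; treat a leading '!' as 'remove this category'."""
    cats = list(current)
    for entry in requested:
        if entry.startswith("!"):
            cats = [c for c in cats if c != entry[1:]]  # hypothetical removal
        elif entry not in cats:
            cats.append(entry)  # normal additive behavior
    return cats


print(apply_category_changes(["collector", "IIS", "MSSQL"], ["!IIS", "DHCP"]))
```

With these semantics, a PropertySource that stops detecting IIS could simply emit `!IIS` instead of leaving the stale category behind.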
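The OWM limits quoted in item 18 (60 calls/minute, 1,000,000 calls/month) are easier to reason about as a budget. This small worked calculation assumes a 31-day month. The 200-resource, 5-minute polling scenario is hypothetical, and the function names are illustrative.

```python
MINUTES_PER_MONTH = 31 * 24 * 60  # assume a 31-day month (44,640 minutes)


def calls_per_month(callers, poll_minutes):
    """Monthly calls when every caller hits the API on each poll (no caching)."""
    return callers * (MINUTES_PER_MONTH // poll_minutes)


def hourly_locations_within(monthly_cap=1_000_000):
    """How many distinct locations can be refreshed once an hour under the cap."""
    return monthly_cap // (MINUTES_PER_MONTH // 60)


print(calls_per_month(60, 1))      # running at the per-minute limit all month
print(calls_per_month(200, 5))     # hypothetical: 200 resources every 5 minutes
print(hourly_locations_within())   # locations refreshable hourly under 1,000,000
```

Running at the per-minute limit all month would be about 2.68 million calls, so the monthly cap binds first. Uncached polling from a modest fleet also overshoots it, while caching once per location per hour leaves room for over a thousand locations.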
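For the local-file caching mentioned in item 20, a SQLite-backed cache with a time-to-live is one way to share API results between script runs so that duplicate calls are avoided. This is a minimal sketch in Python's standard `sqlite3` module. It says nothing about whether a SQLite binding ships with the collector's Groovy libraries. `TTLCache`, the key format, and `fetch_weather` are hypothetical.

```python
import json
import sqlite3
import time


class TTLCache:
    """Share API results between script runs through one SQLite database."""

    def __init__(self, path=":memory:", ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path, timeout=10)  # wait on locks held by other runs
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, stored REAL, value TEXT)"
        )

    def get_or_fetch(self, key, fetch, now=None):
        """Return a cached value younger than the TTL, else call fetch() and store it."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT stored, value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and now - row[0] < self.ttl:
            return json.loads(row[1])
        value = fetch()
        with self.db:  # commits the write on success
            self.db.execute(
                "INSERT OR REPLACE INTO cache (key, stored, value) VALUES (?, ?, ?)",
                (key, now, json.dumps(value)),
            )
        return value


calls = []

def fetch_weather():  # hypothetical stand-in for the real API call
    calls.append(1)
    return {"temp_c": 21}

cache = TTLCache(ttl_seconds=3600)
first = cache.get_or_fetch("weather:12345", fetch_weather, now=0)
again = cache.get_or_fetch("weather:12345", fetch_weather, now=600)   # cache hit
later = cache.get_or_fetch("weather:12345", fetch_weather, now=4000)  # expired, refetched
```

In real use, `path` would point at a file on the collector, since each `:memory:` connection is private to its process. Keying by location (e.g., zip code) rather than by device is what collapses duplicate calls.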
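The check described in item 24 (LMConfig was silently disabled when /setting/configsources returns no results) could be automated by inspecting the API response. This sketch covers only the decision step: the request and its authentication are omitted. The response shapes are assumptions to verify against your API version: a v1-style `{"status": ..., "data": {...}}` envelope or a top-level `{"total": ..., "items": [...]}` body. `lmconfig_appears_disabled` is a hypothetical name.

```python
def lmconfig_appears_disabled(payload: dict) -> bool:
    """Flag a /setting/configsources response that lists no ConfigSources.

    Assumes either a v1-style envelope ({"status": ..., "data": {...}}) or a
    top-level {"total": ..., "items": [...]} body; adjust to your API version.
    """
    body = payload.get("data") or payload
    return not body.get("items")


print(lmconfig_appears_disabled({"total": 0, "items": []}))
print(lmconfig_appears_disabled({"status": 200, "data": {"total": 1, "items": [{"name": "Cisco_IOS"}]}}))
```

Run per child account after each renewal, a `True` result is the signal to raise with LM before config backups quietly stop.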