Everything posted by mnagel

  1. Last time I considered this, I ended up selecting OneLogin to resolve it. I ultimately did not need it, but I think it will do the job, with domain matching used to select the correct SAML integration. I do agree each client should have a separate SAML profile based on username (e.g., the domain part of the email address) without paying extra, but that is not possible as it stands. I have been pushing as hard as I can to make MSP handling within LM more comprehensive. Mark
  2. Not sure how far you want to go, but there is a plugin for Nagios called check_x224 that you could leverage (if only for ideas). If you want to log in fully unattended, I found a few options on that recently in this thread
  3. I just tossed together some code to do the bare minimum for alert rules, which is to validate the module references in each. That turned out to be harder than expected: I quickly realized module matches are based on display name for some module types and name for others, and the patterns allowed in the module field of rules are not the same as what is handled by the filter API option. Fast forward: the script now properly checks for matches on any module. I synced this initial version and other script updates to our git repo.
  4. Please add the ability to produce an internal crosscheck page. This should include, but not be limited to:
     • misconfigured alert rules (referencing dead modules, modules whose names have changed, etc.)
     • misconfigured widgets in dashboards (referencing dead modules or datapoints, etc.)
     • any other place where string value references are used by UI elements instead of internal IDs, resulting in inconsistent and/or dead references
     I keep trying to find ways to detect these before our clients do (always at the worst time). I have some draft code for dead-widget detection that I came up with via trial and error, but this overall problem is something that should be solved within the system itself.
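The alert-rule part of the crosscheck above can be approximated against data pulled from the REST API. A minimal Python sketch, assuming the rules and module name lists have already been fetched (the `datasource` field name and simple glob semantics are illustrative, and note the post's caveat that LM matches display name for some module types and name for others):

```python
from fnmatch import fnmatchcase

def find_dead_rules(rules, module_names, display_names):
    """Return names of alert rules whose module glob matches neither a
    module name nor a display name. Field names are illustrative; real
    rules would come from the REST API."""
    all_names = set(module_names) | set(display_names)
    dead = []
    for rule in rules:
        pattern = rule.get("datasource", "*")
        if not any(fnmatchcase(name, pattern) for name in all_names):
            dead.append(rule["name"])
    return dead

rules = [
    {"name": "cpu-critical", "datasource": "CPU*"},
    {"name": "stale-rule",   "datasource": "OldModule*"},
]
print(find_dead_rules(rules, {"CPU", "CPUCore"}, {"CPU Usage"}))
# → ['stale-rule']
```

`fnmatchcase` is used instead of `fnmatch` so matching stays case-sensitive and deterministic across platforms.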
  5. I reported this via a ticket a few weeks ago. Here was the response:
  6. We have generally attacked this issue with priority range conventions (so far, each client has been NNXXX, where NN is the client number and XXX is the rule number; changing soon to NNXXXX). We have had a script for a while to renumber rules into a new range, and we are working on a way to ensure standard rules are in place for all clients. As @Stuart Weenig notes, Alert Rules are one of the monolithic areas for which we cannot delegate access -- having some way to partition them within the existing RBAC mechanism would be welcome (along with other monolithic settings, like escalation chains). Being able to have alert templates that could then be filled in with group-level properties would be welcome. Being able to clone rules would be welcome. Being able to select multiple severities in one rule would reduce the need for cloning :). Ultimately, as an MSP it is very hard to maintain consistency in rule design without scripting, which I assume is one of the main goals of this request.
  7. Yeah, presumably this is Groovy, so match code (without the ambiguity I have seen in some modules) would tend to look like property.split(',').contains(value). That said, please please please fix property management in the UI so long strings are manageable. I have many other property-related wishes :).
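The Groovy expression above checks exact membership in a comma-separated property rather than doing a substring match, which is the ambiguity being avoided. The same idea in Python (the trimming behavior is an assumption):

```python
def property_contains(prop_value: str, wanted: str) -> bool:
    """Mirror Groovy's property.split(',').contains(value): exact
    membership after splitting on commas, with whitespace trimmed."""
    return wanted in [item.strip() for item in prop_value.split(",")]

print(property_contains("web, db, cache", "db"))  # → True
print(property_contains("web, db, cache", "we"))  # → False (no substring match)
```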
  8. I have done the same often, marking those as v3 in our API library module. As for the original discussion, what might help is building out consistent dynamic groups within each customer, which is how we have approached this problem. Then it is always something like Customers/XXX/Automatic Groups/Linux Servers, etc. The input to the script is an array formatted like: { group => 'Windows Servers', appliesTo => 'isClient%s() && isWindows()', }. The key is to set up an AppliesTo function that lets you identify the client; then we can create the dynamic groups for each client via this script. We currently run it manually for each specific client, but it would not be much harder to run it automatically for all clients in a loop. We use the preferred-collector naming scheme (which has caused problems because apparently some things are not recalculated when changes happen, like a collector replacement, but we are basically hosed if that does not work since you can't use properties). The meat of each of those functions (*) is system.prefcollectordesc =~ "\\.domain$", and we organize clients into specific domain structures for the collector descriptions. Note: we use Perl for most of our API automation and I have been shunned a few times in these forums for it, but I don't care :). The one time I got mad was when someone in dev claimed my API result was not going to be looked at because Perl is not supported. Really. (*) I have a hard time calling something a function when it does not support parameters. In this case, we could have one actual function with the client name as a parameter and life would be even simpler.
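As a sketch of the template-driven group creation described above (the group names, the isClientXXX() convention, and the expansion step mirror the post; the real script is Perl against the LM REST API, and the POST call itself is omitted):

```python
# Template array, as described in the post: one entry per dynamic group,
# with %s standing in for the client identifier.
TEMPLATES = [
    {"group": "Windows Servers", "appliesTo": "isClient%s() && isWindows()"},
    {"group": "Linux Servers",   "appliesTo": "isClient%s() && isLinux()"},
]

def group_specs(client: str) -> list:
    """Expand the templates into one dynamic-group spec per template,
    ready to send to the device-group endpoint."""
    return [
        {
            "name": f"Customers/{client}/Automatic Groups/{t['group']}",
            "appliesTo": t["appliesTo"] % client,
        }
        for t in TEMPLATES
    ]

for spec in group_specs("ACME"):
    print(spec["name"], "->", spec["appliesTo"])
```

Running this for all clients is then just an outer loop over the client list, which is the automation step the post mentions.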
  9. We have an eventsource for this, originally provided by LM support and tuned a bit since then. It requires SSH access to devices, since Cisco lacks a MIB for detecting errdisabled ports (at least in the general case). This in turn means you must be very careful about deploying the eventsource, since defining ssh.user/ssh.pass would otherwise bring other things into scope you may not want, like LMConfig (we added an alternate name format to control for that). Lastly, since it is an eventsource, you cannot practically acknowledge issues. If you used syslog integration with LM (we do not, as it is too limited), you could perhaps feed device logs to the collector and detect them that way without SSH. Another option would be to use the same logic as the eventsource in a datasource, generating datapoint values based on the errdisable cause picked up from the command. I just published what we have now as H4T9GH. Since it is in Groovy, it will need to be reviewed.
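For the datasource variant mentioned above, the collection script would parse errdisable causes out of the CLI output. A rough Python sketch of just the parsing step (the column layout is illustrative; real `show interfaces status err-disabled` output varies by platform and usually includes a Name column, so the pattern would need adjusting):

```python
import re

def parse_errdisabled(show_output: str) -> dict:
    """Map interface -> errdisable cause from CLI lines like
    'Gi1/0/1   err-disabled   bpduguard' (layout is illustrative)."""
    causes = {}
    for line in show_output.splitlines():
        match = re.match(r"(\S+)\s+err-disabled\s+(\S+)", line.strip())
        if match:
            causes[match.group(1)] = match.group(2)
    return causes

sample = "Gi1/0/1   err-disabled   bpduguard\nGi1/0/2   connected      auto"
print(parse_errdisabled(sample))  # → {'Gi1/0/1': 'bpduguard'}
```

Each cause could then be emitted as a datapoint value (e.g., mapping each cause string to a numeric code), which is what would make acknowledgment practical compared to the eventsource.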
  10. As many are likely aware, there was a major Internet outage this morning due to a fault within the Level3/CenturyLink backbone. We lost a lot of data as a result, most obviously from Meraki API fetches that failed during the outage. In some cases, such as this one, the data could be backfilled after the outage, but the LM architecture is designed for near-realtime polling and does not currently support this. My request is thus: if the data involved CAN be fetched after the fact, I would like some option to enable backfilling to cover the lost data.
  11. Looks like rdesktop has been supplanted by other tools like vinagre and freerdp. I found this example for testing RDP authentication with the latter, though I also found that the version shipped with CentOS 7 has a bug that requires an X display even when one is not needed (as in this case):
      xfreerdp --ignore-certificate --authonly -u user -p pass host
      Long discussion on the bug and possible solutions:
  12. I just found this, but no idea if it works yet (and article is a bit dated):
  13. That would be awesome! I was hoping to find something, preferably in Java so it might be able to work via Groovy, but have not succeeded so far. I did find a commercial tool. If that would work, you could tie it into LM via the SQL results database they mention (perhaps other options as well). I am still looking to see if something more affordable (preferably free) exists. With Nagios, we used check_x224 to verify the RDP server was providing correct protocol responses, not just listening on port 3389. Something like that might be at least a step toward what you are trying to accomplish.
  14. Do you happen to know if the event log data is fetched once per eventsource, or fetched once and then filtered by each? The latter would be best, but this is not discussed anywhere I can find. Even so, if one log is growing fast and we want to skip it, the default method has no way to do that. We are considering a Groovy or PowerShell replacement to restrict the data pulled prior to filtering.
  15. @Michael Rodrigues Cool, thanks! Now I just need to figure out what to do about my new discovery of how all Windows Event types operate as table scans. Weirdly, we have never had an obvious impact from this until today, after all this time (it may explain some issues we have had before, though). Basically, someone had a server spewing ~120MB of events in the selection window, and now I know more than I did before today about how this data is collected. It would be nice to be able to narrow the query up front rather than repeatedly pull all logs and then filter them! Ticket 212001 if you want to poke your head in :).
  16. We discarded the default modules for Windows events long ago after realizing their filtering was unusable (events are identified by event source AND event ID, not just event ID as the default modules assume). To fix this, our modules use a regex matching both event source and ID, and we reference multiple properties so filters can be defined generally and for specific cases. This allows higher-level values to be overridden, or extended with lower-level values, as needed. I recently updated these to add two more filter properties so we can extend or override with better granularity (labeled universal, org, global, and local).
      Exchange: R7JXYE
      System: FAAYZ7
      Application: 94ML93
      There is more detail in the technical notes (as much as I could fit before hitting undocumented and obscure field-length restrictions). These were just marked for public sharing, so they will need security review since they use Groovy. One more point -- we do have some global hardcoded filters in at least one of the modules. If that is a problem for anyone, we could add a new property to enable those, leaving them disabled by default.
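The source-plus-ID matching described above can be illustrated in Python (the `Source:EventID` key format and the sample patterns are assumptions for illustration, not the modules' actual property syntax):

```python
import re

# Hypothetical filter patterns combining event source and event ID,
# so "event 7 from disk" cannot collide with "event 7 from another source".
FILTERS = [r"^Microsoft-Windows-DNS-Server-Service:4013$", r"^disk:7$"]

def event_matches(source: str, event_id: int) -> bool:
    """Match on event source AND event ID together, instead of the
    ID-only matching the default modules assumed."""
    key = f"{source}:{event_id}"
    return any(re.search(pattern, key) for pattern in FILTERS)

print(event_matches("disk", 7))   # → True
print(event_matches("disk", 51))  # → False (different ID, same source)
```

Layering multiple filter properties (universal/org/global/local) would then just mean concatenating pattern lists from several properties before matching.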
  17. Yes, understood, but most often it would be beneficial to allow this (with some sort of blocker for rapid add/remove iteration), and as it stands you must go through contortions to get around the restriction, like copying a property to an auto.X property via a propertysource. In this scenario it would behave the same, just perhaps more slowly :). I would be fine if the system disabled all dynamic groups associated with a device until the add/remove rate dropped to zero.
  18. Looks like a bug and some sort of error message that was never supposed to be shown to users. I recommend opening a support ticket. There are a bunch of undocumented limits in the system, but nothing there seems like it would trigger unless you have a resolution loop maybe?
  19. As far as I recall, you cannot define a dynamic group using inherited properties. We have had to do contortions to get around that restriction as well. You might be forced to create a propertysource that assigns auto.XX properties and then use those to define the dynamic group. I am not sure why this restriction exists, and I have never received a satisfactory answer.
  20. I have always used the first method, and it is actually documented to work unambiguously (until I checked, I thought perhaps it evaluated the value and could be false if the value is 0). The exists() function is documented to check the values of all properties and return true if one or more have the given value -- I don't know when that would be useful :). From the docs:
      Function: exists("<property value>") -- returns TRUE if the specified value is assigned to any of the resource's properties.
      Function: <property name> -- any property name can be referenced as an AppliesTo function. When used alone, it returns TRUE for any resource that has a value set for the specified property. It can be used with operators for comparison purposes.
  21. You may be right, I just could not see how that would make sense given the way LM does it or why it would be useful :).
  22. See lm-get-configs and the run-lm-get-configs wrapper. There are a fair number of workarounds in the main script due to various problems with module behavior. I am currently battling an apparent API bug where the query we use (basically, sort in reverse by version and take the first result) triggers a bizarre "too many predicates" error. Sent that back to dev when they wanted to wash their hands of it because our API code is in Perl and is "unsupported". To use the API module, you need a .lmapi file in the caller's home directory with one set of credentials per portal, in YAML. For example:
      ---
      companies:
        willingminds:
          access_id: '****'
          access_key: '****'
      The wrapper runs the script and checks the results into git -- not strictly necessary, but we want to track changes and have post-commit hooks to get email reports on what changed. That was the original reason we wrote it, but it is also super helpful to be able to scan all configs at once with grep or with template-validation tools, etc.
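For anyone reimplementing the credential handling in another language, here is a Python sketch of building an LMv1 Authorization header from the access_id/access_key pair stored in that file. This follows LM's documented LMv1 signing scheme as I understand it (HMAC-SHA256 over verb+epoch+body+path, hex digest, then base64); verify against the current API docs before relying on it:

```python
import base64
import hashlib
import hmac
import time

def lmv1_header(access_id: str, access_key: str, verb: str,
                path: str, data: str = "", epoch: str = None) -> str:
    """Construct 'LMv1 <id>:<signature>:<epoch>' for a REST request."""
    epoch = epoch or str(int(time.time() * 1000))
    message = f"{verb}{epoch}{data}{path}"
    digest = hmac.new(access_key.encode(), message.encode(),
                      hashlib.sha256).hexdigest()
    signature = base64.b64encode(digest.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch}"

print(lmv1_header("****", "****", "GET", "/setting/configsources"))
```

The header goes into the request's Authorization field; reading the id/key out of the YAML file is left to whichever YAML parser the caller prefers.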
  23. It will use whatever you provide. If it is a name and the collector can resolve the name, then it should work. I just looked at the code and I don't see where it would have emitted sftp:// at all -- that is a URL format, and the script wants just the hostname (or IP). If you included sftp:// in the hostname, please remove it :).
      def session = jsch.getSession(user, host, port.toInteger()) // get a session
      session.setPassword(pass)                                   // set the password to use for the session
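If the property might contain either a bare host or an accidental URL, the value could be normalized before handing it to JSch. A small defensive sketch (this cleanup step is my suggestion, not part of the original module):

```python
from urllib.parse import urlparse

def clean_host(value: str) -> str:
    """Return just the host part, whether given 'switch1' or 'sftp://switch1'."""
    if "://" in value:
        return urlparse(value).hostname or value
    return value

print(clean_host("sftp://10.0.0.5"))  # → 10.0.0.5
print(clean_host("switch1"))          # → switch1
```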
  24. PropertySources generally run only once per day, or when triggered manually (I don't think they yet have a definable execution interval, though I'm told that will come someday). However, you can run a WMI query looking for just a specific service as part of the query itself; you don't have to run a full table scan and then examine the results in code. If you do want to enumerate all services, then you might consider having that one PropertySource generate all the service-based categories you need. It is less modular, but more efficient.
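To make the targeted-query point concrete: scope the WQL to one service in the WHERE clause so the collector never enumerates the whole Win32_Service table. A sketch of building such a query (the quoting helper is my addition; WQL's backslash escaping is my assumption, so check the WQL docs for edge cases):

```python
def service_query(service_name: str) -> str:
    """Build a WQL query scoped to a single service instead of a full
    table scan; escape embedded quotes before interpolating the name."""
    escaped = service_name.replace("'", "\\'")
    return f"SELECT Name, State FROM Win32_Service WHERE Name = '{escaped}'"

print(service_query("W3SVC"))
# → SELECT Name, State FROM Win32_Service WHERE Name = 'W3SVC'
```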
  25. If that is literally what came out of the script, it sounds like the hostname is being confused with the password. If that is not the issue, I would add debugging statements to the code and exit 1 to ensure they are printed when you run Poll Now.