mnagel

Members
  • Content Count

    484
  • Days Won

    81

Everything posted by mnagel

  1. We have an eventsource for this, originally provided by LM support and tuned a bit since then. It requires SSH access to devices since Cisco lacks a MIB for detecting errdisabled ports (at least in the general case). This in turn means you must be very careful about deploying the eventsource since defining ssh.user/ssh.pass would otherwise bring other things into scope you may not want, like LMConfig (we added an alternate name format to control for that). Lastly, since it is an eventsource you cannot practically acknowledge issues. If you used syslog integration with LM (we do not as it is
  2. As many are likely aware, there was a major Internet outage this morning due to a fault within the Level3/CenturyLink backbone. We lost a lot of data as a result, most obviously from Meraki API fetches that failed during the outage. In some cases, such as this one, the data could be backfilled after the outage, but the LM architecture is designed for near-realtime polling and does not currently support this. My request is thus, if the data involved CAN be fetched after the fact, I would like some option to enable backfilling to cover the lost data at that time.
  3. Looks like rdesktop has been supplanted by other tools like vinagre and freerdp. I found this example for testing RDP authentication with the latter, though I also found that the version shipped with CentOS 7 has a bug that requires an X display even when one should not be needed (as in this case):
       xfreerdp --ignore-certificate --authonly -u user -p pass host
     Long discussion on the bug and possible solutions: https://serverfault.com/questions/878870/how-to-test-rdp-credentials-in-command-line-without-x-server-installed
  4. I just found this, but no idea if it works yet (and article is a bit dated): https://singularity.be/2008/03/28/using-rdesktop-to-script-windows/
  5. That would be awesome! I was hoping to find something, preferably in Java so it might be able to work via Groovy, but have not succeeded so far. I did find a commercial tool (https://www.rdpsoft.com/products/remote-desktop-canary/). If that would work, you could tie it into LM via the SQL results database they mention (perhaps other options). I am still looking around to see if something more affordable (preferably free) exists. With Nagios, we used check_x224 to verify the RDP server was providing correct protocol responses, not just listening on port 3389. Something like that
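A minimal sketch of a protocol-level RDP check in the spirit of the check_x224 idea above, shown in Python for illustration. It sends an X.224 Connection Request and accepts the port as "speaking RDP" only if the reply is an X.224 Connection Confirm. The function names and the choice of requested protocols are assumptions of this sketch, not taken from check_x224 itself.

```python
import socket

# TPKT header (version 3, total length 19) + X.224 Connection Request (0xE0)
# + RDP negotiation request asking for TLS or CredSSP (protocols = 0x03).
X224_CONNECTION_REQUEST = bytes.fromhex(
    "03000013"          # TPKT: version, reserved, length = 19
    "0ee00000000000"    # X.224: length indicator, CR code, dst-ref, src-ref, class
    "0100080003000000"  # RDP_NEG_REQ: type, flags, length, requested protocols
)


def is_x224_connection_confirm(data: bytes) -> bool:
    """Return True if data starts with a TPKT-wrapped X.224 Connection Confirm."""
    if len(data) < 6:
        return False
    tpkt_version = data[0]
    x224_code = data[5] & 0xF0  # upper nibble carries the PDU type
    return tpkt_version == 0x03 and x224_code == 0xD0


def check_rdp(host: str, port: int = 3389, timeout: float = 5.0) -> bool:
    """Hypothetical helper: connect, send the request, and inspect the reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(X224_CONNECTION_REQUEST)
            return is_x224_connection_confirm(sock.recv(1024))
    except OSError:
        return False
```

Something listening on 3389 that is not RDP (or a hung RDP service) would fail the Connection Confirm test even though a plain TCP port check would pass.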
  6. Do you happen to know if the event log data is fetched once per ES or is it done once and then filtered via each? The latter would be best, but this is not discussed anywhere I can find. Even so, if one log is growing fast and we want to skip it, the default method has no way to do that. We are considering a Groovy or PowerShell replacement to restrict data pulled prior to filtering.
  7. @Michael Rodrigues Cool, thanks! Now I just need to figure out what to do about my new discovery on how all Windows Event types operate as table scans. Weirdly, we have never had an obvious impact from this until today after all this time (it may explain some issues we have had before, though). Basically, someone had a server spewing ~120MB of events in the selection window and now I know more than I did before today about how this data is collected. It would be nice to be able to narrow the query up front rather than repeatedly pull all logs and then filter them! Ticket 212001 if you want to
  8. We discarded the default modules for Windows events long ago after realizing their filtering was unusable (events are identified by event source AND event ID, not just event ID as assumed by the default modules). Our modules use a regex matching both event source and ID to fix, and we reference multiple properties so there can be filters defined generally and for specific cases. This allows higher level values to be overridden if needed, or to extend those with lower level values, as needed. I recently updated these to add 2 more filter properties so we can extend or override with better gran
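To illustrate the source-plus-ID matching described above, here is a minimal sketch in Python. It is not the actual module code: the "Source:ID" key format, the function names, and the example patterns are all hypothetical, standing in for whatever filter properties are defined at the general and device-specific levels.

```python
import re


def event_key(source: str, event_id: int) -> str:
    """Combine event source and event ID into one string to match against."""
    return f"{source}:{event_id}"


def is_filtered(source: str, event_id: int, patterns) -> bool:
    """Return True if any non-empty filter pattern fully matches the event key.

    `patterns` stands in for several filter properties (for example one set
    globally and one set on a specific device); empty or unset ones are skipped.
    """
    key = event_key(source, event_id)
    return any(re.fullmatch(p, key) for p in patterns if p)


# Hypothetical property values: a general filter plus a device-level one.
general_filter = r"Service Control Manager:(7036|7040)"
device_filter = r"Schannel:\d+"
```

With these patterns, event ID 7036 is filtered only when it comes from Service Control Manager; the same ID from another source still gets through, which is the distinction an ID-only filter cannot make.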
  9. Yes, understood, but most often it would be beneficial to allow (with some sort of rapid add/remove iteration blocker) and you must go through contortions to get around the restriction. Like copying a property to an auto.X property via a propertysource. In this scenario, it would behave the same, just perhaps slower :). I would be fine if the system disabled all dynamic groups associated with a device until the add/remove rate dropped to zero.
  10. Looks like a bug and some sort of error message that was never supposed to be shown to users. I recommend opening a support ticket. There are a bunch of undocumented limits in the system, but nothing there seems like it would trigger unless you have a resolution loop maybe?
  11. As far as I recall, you cannot define a dynamic group using inherited properties. We have had to do contortions to get around that restriction as well. You might be forced to create a propertysource that assigns auto.XX properties and then use that to define the dynamic group. I am not sure why this restriction exists, never have received a satisfactory answer.
  12. I have always used the first method, and it is actually documented to work unambiguously (until I checked, I thought perhaps it evaluated it and could be false if the value is 0). The exists() function says that it will check the values of all properties and is true if one or more have that value. I don't know when that would be useful :).
      Function: exists("<property value>")
      This function returns TRUE if the specified value is assigned to any of the resource’s properties.
      Function: <property name>
      Any property name can be referenced as an AppliesTo function. Wh
  13. You may be right, I just could not see how that would make sense given the way LM does it or why it would be useful :).
  14. https://github.com/willingminds/lmapi-scripts See lm-get-configs and the run-lm-get-configs wrapper. There are a fair number of workarounds in the main script due to various problems with module behavior. I am currently battling an apparent API bug where the query we use (basically, sort in reverse by version and provide the first result) triggers a bizarre "too many predicates" error. Sent that back to dev when they wanted to wash their hands of it because our API code is in Perl and is "unsupported". To use the API module, you need a .lmapi file in the caller's home directory with
  15. It will use whatever you provide. If it is a name and the collector can resolve the name, then it should work. I just looked at the code and I don't see where it would have emitted sftp:// at all -- that is a URL format and it wants just the hostname (or IP). If you included sftp:// in the hostname, please remove it :).
      def session = jsch.getSession(user, host, port.toInteger()); // Get a session
      session.setPassword(pass) // Set the password to use for the session
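If you would rather guard against a URL being pasted into the host property, a small normalization step could strip the scheme before the value is handed to the session call. This is a sketch only, written in Python rather than Groovy for illustration; `bare_host` is a hypothetical helper and is not part of the module discussed above.

```python
from urllib.parse import urlsplit


def bare_host(value: str) -> str:
    """Return just the hostname or IP, dropping any scheme, port, or path."""
    value = value.strip()
    if "://" in value:
        # urlsplit parses scheme://user@host:port/path; hostname is the host part.
        return urlsplit(value).hostname or value
    return value
```

For example, `bare_host("sftp://files.example.com:2222/upload")` yields `files.example.com`, while a plain hostname or IP passes through unchanged.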
  16. PropertySources generally run only once per day or if triggered manually (I don't think they yet have an execution interval you can define, though I'm told that will be true someday). However, you can run a WMI query looking for just a specific service as part of the query itself, you don't have to run a full table scan and then examine the results in the code. If you do want to enumerate all services, then you might consider having that one PropertySource generate all the service-based categories you would need. It is not as modular, but is more efficient.
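As a sketch of the "query for just one service" approach, the WQL can carry the service name in its WHERE clause so only that row comes back. The helper below only builds the query string; it is shown in Python for illustration, the function name is hypothetical, and how you execute the query (Groovy WMI helper, PowerShell, etc.) is left out.

```python
def service_query(service_name: str) -> str:
    """Build a WQL query that returns only the named Windows service."""
    # WQL string literals use backslash escapes for backslashes and quotes.
    escaped = service_name.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT Name, State, StartMode FROM Win32_Service "
        f"WHERE Name = '{escaped}'"
    )
```

`service_query("Spooler")` produces `SELECT Name, State, StartMode FROM Win32_Service WHERE Name = 'Spooler'`, which avoids enumerating every service and filtering in script code.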
  17. If that is literally what came out of the script, it sounds like the hostname is being confused with the password. If that is not the issue, I would add debugging statements to the code and exit 1 to ensure they are printed when you run Poll Now.
  18. The main metric you would care about is related to time since last TCN per VLAN. I have a DS that will get this for the default VLAN, but had a lot of trouble with any others due to lack of context support in LM. I have been told there is context support in the 28.500 collector and later, but have not yet had a chance to test. I published what I have for now as 969G49, but the new version will have to be done as a Groovy script to leverage the context feature.
  19. The strings are host properties, so set them on the collector you want to run this from. Those would have to be bound to a collector host. As written, that supports only a single remote SFTP test. If you wanted to do more, you would need to rewrite that to handle instances either manual or active discovery. I do the latter often even with a manual property list as it is the only way to define automatic instance properties. It may be possible to do this via an internal "website" check, but I have not tried going full groovy on those yet :). Even then, each would be a separate copy of
  20. This is simple enough, except there are two issues: the default alert subject is annoying and confusing to clients -- it is easily fixed, but.... recovery alerts don't use the custom alert subject so you still get the annoying version for recoveries (unless you need them to ensure ticket closure). Luckily, for reboots you don't usually want a recovery alert -- just one alert to let you know it happened. Our catchall is this, with specific versions for each client: Our custom version of the alert subject is "##HOST## Rebooted ##VALUE## seconds ago"
  21. For SNMP faults (assuming that is the issue here), we have some standard rules in all the clients we manage: This seems to do the trick most of the time, but I am sure there are cases we are still missing. Those datapoints have "No Data" enabled and are not used for thresholds otherwise, making them unambiguous.
  22. Right. "no data" should be on par with critical, error and warning so the threshold can be overridden properly if needed for specific devices/groups. It is very hard to know without digging into each DP definition when "no data" will even alert (no indication in the tuning page) and it is definitely unclear when you can unambiguously check (most times at least, the "no data" status applies to a datapoint that otherwise has no competing alert threshold). I also run into embarrassing situations where data acquisition has silently failed due to collection faults -- LM has been adding more troubl
  23. I intend to extend this to include more track-worthy account attributes such as Expired, Locked, etc, but to start I wanted to enable expiration tracking for domain admin accounts as we can get caught off-guard on those when they happen unexpectedly. This involved creating a new propertysource that tags domain controllers with one or more categories tied to their FSMO roles, then for the PDCEmulator role (arbitrarily chosen, mainly wanted to pick just one), scans the Domain Admins group list and reports days until expiration. No graphs or thresholds yet, will be extending soon. May also gen
  24. We export pretty much everything regularly via the API and check into a git repo. This allows us to track changes (always fun to find LM devs testing module changes ) and more importantly, allows us to grep for things in modules. Among other things, this simplifies finding code examples for starting points on new modules. Our export/backup module does have per endpoint filters since we have found a number of fields that are ephemeral (e.g., timestamps) and must be suppressed to avoid checkin thrashing. Our current export list includes the following. There are a few other features in this
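The ephemeral-field suppression mentioned above can be sketched roughly as follows. This is an illustration in Python, not the actual export module: the field names in `EPHEMERAL_KEYS` are hypothetical placeholders, and the real per-endpoint filter lists would differ.

```python
import json

# Hypothetical examples of fields that change on every export.
EPHEMERAL_KEYS = frozenset({"updatedOn", "lastUpdated"})


def scrub(value, ephemeral=EPHEMERAL_KEYS):
    """Recursively drop ephemeral keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: scrub(v, ephemeral) for k, v in value.items() if k not in ephemeral}
    if isinstance(value, list):
        return [scrub(v, ephemeral) for v in value]
    return value


def to_stable_json(record) -> str:
    """Serialize with sorted keys so unchanged objects produce identical files."""
    return json.dumps(scrub(record), indent=2, sort_keys=True) + "\n"
```

Writing each exported object through something like `to_stable_json` before committing means a git diff shows only real changes, not timestamp churn or key reordering.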
  25. The datasource provided by LM does not handle superscopes (multiple subnets per scope). I wrote one that does handle superscopes and works properly. I don't have a way to monitor across split scopes (portions handled by different servers), but I think that if one of those independently was filling up you would still want to get an alert so it should not matter. With superscope monitoring, you will know only when all the subnet IPs are running out. I will see if I can get that one published in LM Exchange.
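For anyone wondering what the superscope calculation amounts to, here is a minimal sketch in Python, assuming you already have in-use and free address counts per member subnet. The `ScopeStats` type and function name are hypothetical and this is not the datasource code itself.

```python
from dataclasses import dataclass


@dataclass
class ScopeStats:
    in_use: int  # addresses currently leased in this subnet
    free: int    # addresses still available in this subnet


def superscope_percent_used(scopes) -> float:
    """Aggregate utilization across all member subnets of one superscope."""
    in_use = sum(s.in_use for s in scopes)
    total = in_use + sum(s.free for s in scopes)
    return 100.0 * in_use / total if total else 0.0
```

With one subnet full (250 in use, 0 free) and another mostly empty (50 in use, 200 free), the superscope as a whole is at 60%, so an alert fires only when the combined pool is running out rather than when a single member subnet fills.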