mnagel

Members
  • Content Count

    494
  • Days Won

    87

Posts posted by mnagel

  1. Last time I considered this, I ended up selecting OneLogin to resolve it. I ended up not needing it, but I think it will do the job, with domain matching used to select the correct SAML integration.  I do agree each client should have a separate SAML profile based on username (e.g., domain part of email address) without paying extra, but that is not possible as it stands.  I have been pushing as hard as I can for more comprehensive MSP handling within LM.

    Mark

    • Like 1
    • Upvote 2
  2. Not sure how far you want to go, but there is a plugin for Nagios called check_x224 that you could leverage (if only for ideas). If you want to log in fully unattended, I found a few options on that recently in this thread.


  3. I just tossed together some code to do the bare minimum for alert rules, which is to validate the module references for each. That turned out to be harder than expected as I quickly realized module matches are based on display name for some and name for others, and the patterns allowed in the module field in rules is not the same as what is handled by the filter API option. Fast forward, now the script checks for any matches on any module properly. Sync'ed this initial version and other script updates to our git repo (https://github.com/willingminds/lmapi-scripts).

  4. Please add an ability to produce an internal crosscheck page. This should include, but not be limited to:

    • misconfigured alert rules (referencing dead modules, modules whose names have changed, etc.)
    • misconfigured widgets in dashboards (referencing dead modules or datapoints, etc.)
    • any other place where string references are used by UI elements instead of internal IDs, resulting in inconsistent and/or dead references

    I keep trying to find ways to detect these before our clients do (always at the worst time).  I have some draft code for dead-widget detection that I came up with via trial and error, but this overall problem is something that should be solved within the system itself.

    • Upvote 1
  5. I reported this via a ticket a few weeks ago.  Here was the response:

    Quote

    Thank you for your observation; yes, this issue is currently being investigated and there is a scheduled fix for v141.

    The dashboard time range is not working as expected. As a workaround to download data, expand the graph and change the time range. 

    Download the data to csv and it will be the correct data. 

    However for data from June, you will have to set to last 3 months in the expanded graph time range.


    • Like 1
    • Upvote 1
  6. We have generally attacked this issue with priority range conventions (so far, each client has been NNXXX, where NN is the client number and XXX is the rule number; changing soon to NNXXXX). We have had one script for a while to renumber rules into a new range, and we are working on a way to ensure standard rules are in place for all clients.  As @Stuart Weenig notes, Alert Rules are one of the monolithic areas for which we cannot delegate access -- having some way to partition them within the existing RBAC mechanism would be welcome (along with other monolithic settings, like escalation chains).  Being able to have alert templates that could then be filled in with group-level properties would be welcome. Being able to clone rules would be welcome. Being able to select multiple severities in one rule would reduce the need for cloning :). 

    Ultimately, as an MSP it is very hard to maintain consistency in rule design without scripting, which I assume is one of the main goals of this request.
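    The priority convention above is easy to script. A minimal sketch (the helper name and exact arithmetic are my own illustration, not our production renumbering script):

    ```python
    def rule_priority(client_num, rule_num):
        """Map a client number NN and rule number XXX to an NNXXX priority.

        Illustration only: assumes rule numbers stay below 1000; the NNXXXX
        variant mentioned above would use a 10000 multiplier instead.
        """
        if not 0 <= rule_num < 1000:
            raise ValueError("rule number must fit in three digits")
        return client_num * 1000 + rule_num
    ```

    So client 12, rule 345 becomes priority 12345, and every client's rules live in a disjoint, predictable range.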

    • Upvote 1
  7. On 9/3/2020 at 4:59 AM, Vitor Santos said:


    About the API capabilities, just to add my thoughts to Stuart's reply (which you should adopt in the first place): I was asked to develop a few scripts that make use of LM. There are some features that aren't documented, but you can use your browser's developer tools to see the different API requests the GUI makes. That helped me a lot, and I was able to code some calls that weren't documented (even making use of API v3).

    Bear in mind those aren't supported by LM, but they might help you if you have urgent needs (like they helped me).

    Thank you!

    I have done the same often, with marking those as v3 in our API library module.

    As far as some of the original discussion, what might help is building out consistent dynamic groups within each customer, which is how we have approached this problem.  Then it is always something like Customers/XXX/Automatic Groups/Linux Servers, etc.  The input to the script is an array formatted like:

        {
            group => 'Windows Servers',
            appliesTo => 'isClient%s() && isWindows()',
        },

    The key is to set up an AppliesTo function that allows you to identify the client; then we can create the dynamic groups for each client via this script.  We currently run it manually for each specific client, but it would not be much harder to run it automatically for all clients in a loop.  We use the preferred-collector naming scheme (which has triggered problems, because apparently some things are not recalculated when changes happen, like a collector replacement, but we are basically hosed if that does not work since you can't use properties).  The meat of each of those functions (*) is system.prefcollectordesc =~ "\\.domain$", and we organize clients into specific domain structures for the collector descriptions.
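    As a concrete (hypothetical) example, the per-client AppliesTo function described above might be defined in Settings with a name like isClientAcme(), where "acme.com" stands in for a real client collector domain, and a body of:

    ```
    system.prefcollectordesc =~ "\\.acme\\.com$"
    ```

    The script then expands the 'isClient%s()' template into isClientAcme() && isWindows() when building that client's group appliesTo.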

    Note: we use Perl for most of our API automation and I have been shunned a few times in these forums for it, but I don't care :).  The one time I got mad was when someone in dev claimed my API result was not going to be looked at because Perl is not supported. Really.

    (*) I have a hard time calling something a function that does not support parameters.  In this case, we could have one actual function with the client name as a parameter and life would be even simpler.

  8. We have an eventsource for this, originally provided by LM support and tuned a bit since then.  It requires SSH access to devices, since Cisco lacks a MIB for detecting errdisabled ports (at least in the general case).  This in turn means you must be very careful about deploying the eventsource, since defining ssh.user/ssh.pass would otherwise bring other things into scope you may not want, like LMConfig (we added an alternate name format to control for that). Lastly, since it is an eventsource, you cannot practically acknowledge issues.  If you used syslog integration with LM (we do not, as it is too limited), you could perhaps feed device logs to the collector and detect them that way without SSH. Another option would be to use the same logic as the eventsource and create a datasource, generating datapoint values based on the errdisable cause picked up from the command. 

    I just published what we have now as H4T9GH. Since it is in Groovy, it will need to be reviewed.

    • Like 1
  9. As many are likely aware, there was a major Internet outage this morning due to a fault within the Level3/CenturyLink backbone.  We lost a lot of data as a result, most obviously from Meraki API fetches that failed during the outage.  In some cases, such as this one, the data could be backfilled after the outage, but the LM architecture is designed for near-realtime polling and does not currently support this.  My request is thus: if the data involved CAN be fetched after the fact, I would like an option to enable backfilling to cover the lost data.

    • Like 1
  10. Looks like rdesktop has been supplanted by other tools like vinagre and freerdp.  I found this example for testing RDP authentication with the latter, though I also found that the version shipped with CentOS 7 has a bug that requires an X display even when one is not needed (as in this case).

    xfreerdp --ignore-certificate --authonly -u user -p pass host

    Long discussion on the bug and possible solutions: https://serverfault.com/questions/878870/how-to-test-rdp-credentials-in-command-line-without-x-server-installed
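    If you want to wrap that check for monitoring, here is a rough sketch. The flags are the old freerdp 1.x syntax from the example above (newer 2.x builds use the /u: style options instead), and the helper name is my own:

    ```python
    import subprocess

    def rdp_auth_ok(host, user, password, xfreerdp="xfreerdp", timeout=30):
        """Return True when xfreerdp's auth-only probe exits 0 (credentials accepted).

        The xfreerdp parameter exists so an alternate binary path can be swapped in.
        """
        cmd = [xfreerdp, "--ignore-certificate", "--authonly",
               "-u", user, "-p", password, host]
        try:
            return subprocess.run(cmd, capture_output=True,
                                  timeout=timeout).returncode == 0
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # Missing binary or a hung connection both count as a failed check.
            return False
    ```

    The exit-status convention is the whole trick: --authonly tears the connection down right after authentication, so the return code alone tells you whether the credentials were accepted.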


    • Like 1
  11. 5 minutes ago, mnagel said:

    That would be awesome! I was hoping to find something, preferably in Java so it might be able to work via Groovy, but have not succeeded so far.  I did find a commercial tool (https://www.rdpsoft.com/products/remote-desktop-canary/). If that would work, you could tie it into LM via the SQL results database they mention (perhaps other options). 

    I am still looking around to see if something more affordable (preferably free) exists. 

    With Nagios, we used check_x224 to verify the RDP server was providing correct protocol responses, not just listening on port 3389.  Something like that might be at least a step toward what you are trying to accomplish.

    I just found this, but no idea if it works yet (and the article is a bit dated):

    https://singularity.be/2008/03/28/using-rdesktop-to-script-windows/

    • Like 1
  12. That would be awesome! I was hoping to find something, preferably in Java so it might be able to work via Groovy, but have not succeeded so far.  I did find a commercial tool (https://www.rdpsoft.com/products/remote-desktop-canary/). If that would work, you could tie it into LM via the SQL results database they mention (perhaps other options). 

    I am still looking around to see if something more affordable (preferably free) exists. 

    With Nagios, we used check_x224 to verify the RDP server was providing correct protocol responses, not just listening on port 3389.  Something like that might be at least a step toward what you are trying to accomplish.

    • Like 1
  13. 1 hour ago, mnagel said:

    @Michael Rodrigues Cool, thanks!  Now I just need to figure out what to do about my new discovery on how all Windows Event types operate as table scans. Weirdly, we have never had an obvious impact from this until today, after all this time (it may explain some issues we have had before, though). 

    Basically, someone had a server spewing ~120MB of events in the selection window, and now I know more than I did before today about how this data is collected. It would be nice to be able to narrow the query up front rather than repeatedly pulling all logs and then filtering them!  Ticket 212001 if you want to poke your head in :).

    Do you happen to know if the event log data is fetched once per eventsource, or fetched once and then filtered by each?  The latter would be best, but this is not discussed anywhere I can find.  Even so, if one log is growing fast and we want to skip it, the default method has no way to do that.  We are considering a Groovy or PowerShell replacement to restrict the data pulled prior to filtering.

  14. @Michael Rodrigues Cool, thanks!  Now I just need to figure out what to do about my new discovery on how all Windows Event types operate as table scans. Weirdly, we have never had an obvious impact from this until today, after all this time (it may explain some issues we have had before, though). 

    Basically, someone had a server spewing ~120MB of events in the selection window, and now I know more than I did before today about how this data is collected. It would be nice to be able to narrow the query up front rather than repeatedly pulling all logs and then filtering them!  Ticket 212001 if you want to poke your head in :).

  15. We discarded the default modules for Windows events long ago after realizing their filtering was unusable (events are identified by event source AND event ID, not just event ID as the default modules assume). Our modules use a regex matching both event source and ID to fix this, and we reference multiple properties so filters can be defined generally and for specific cases.  This allows higher-level values to be overridden if needed, or extended with lower-level values, as needed. I recently updated these to add 2 more filter properties so we can extend or override with better granularity (labeled universal, org, global, and local).

    • Exchange: R7JXYE
    • System: FAAYZ7
    • Application: 94ML93

    There is more detail in the technical notes (as much as I could fit before hitting undocumented and obscure field-length restrictions). These were just marked for public sharing, so they will need security review since they use Groovy.

    One more point -- we do have some global hardcoded filters in at least one of the modules. If that is a problem for anyone, we could add a new property to enable those, leaving them disabled by default.
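    To illustrate the source-plus-ID matching idea (the property value, key format, and event examples here are hypothetical illustrations, not the exact format our modules use):

    ```python
    import re

    # Hypothetical filter: each alternative is "<event source>|<event id>",
    # so event ID 7 only matches when it comes from the "disk" source.
    FILTER_PATTERN = r"^(disk\|7|Schannel\|36887|Microsoft-Windows-DistributedCOM\|10016)$"

    def event_filtered(source, event_id, pattern=FILTER_PATTERN):
        """True when the (source, id) pair matches one of the filter alternatives."""
        return re.match(pattern, "%s|%s" % (source, event_id)) is not None
    ```

    The point of keying on both fields is that two unrelated sources can reuse the same numeric ID, so an ID-only filter silently suppresses (or alerts on) the wrong events.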

  16. 26 minutes ago, Stuart Weenig said:

    You cannot use inherited properties for dynamic grouping. The reason for this is that an inherited property could cause a device to go into a group, which might change the inherited property (because now it's inheriting from a new additional group). This could cause the device to no longer match, causing it to be removed from the group. The removal would reset the property to its original state that matches the dynamic group rule, causing it to be added back in, causing it to inherit the property that removes it from the group. You get an infinite loop of adding then removing from the group.

    Yes, understood, but most often it would be beneficial to allow this (with some sort of rapid add/remove iteration blocker); as it stands, you must go through contortions to get around the restriction, like copying a property to an auto.X property via a propertysource.  In this scenario, it would behave the same, just perhaps slower :).  I would be fine if the system disabled all dynamic groups associated with a device until the add/remove rate dropped to zero.

  17. As far as I recall, you cannot define a dynamic group using inherited properties.  We have had to go through contortions to get around that restriction as well.  You might be forced to create a propertysource that assigns auto.XX properties and then use those to define the dynamic group.  I am not sure why this restriction exists; I have never received a satisfactory answer.

  18. I have always used the first method, and it is actually documented to work unambiguously (until I checked, I thought perhaps it evaluated the value and could be false if the value is 0).  The exists() function is documented to check the values of all properties and return true if one or more have that value.  I don't know when that would be useful :).

    Function: exists("<property value>")

    This function returns TRUE if the specified value is assigned to any of the resource’s properties.

    Function: <property name>

    Any property name can be referenced as an AppliesTo function. When used alone, it returns TRUE for any resource which has a value set for the specified property. It can be used with operators for comparison purposes.
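    Putting the two quoted forms side by side, using a hypothetical ssh.user property: the first line is true if any property on the resource has the value "admin" (rarely what you want), the second is true whenever ssh.user is set at all, and the third is the comparison form:

    ```
    exists("admin")
    ssh.user
    ssh.user == "admin"
    ```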


  19. 3 hours ago, Stuart Weenig said:

    Just to be sure, you're not talking about downloading from the device to LM. You're talking about downloading from LM to your laptop.  I thought the first until I saw @mnagel's response.

    You may be right, I just could not see how that would make sense given the way LM does it or why it would be useful :).

  20. https://github.com/willingminds/lmapi-scripts

    See lm-get-configs and the run-lm-get-configs wrapper.  There are a fair number of workarounds in the main script due to various problems with module behavior.  I am currently battling an apparent API bug where the query we use (basically, sort in reverse by version and take the first result) triggers a bizarre "too many predicates" error. I sent that back to dev after they wanted to wash their hands of it because our API code is in Perl and is "unsupported".

    To use the API module, you need a .lmapi file in the caller's home directory, with one set of credentials per portal in YAML.  For example:

    ---
    companies:
        willingminds:
            access_id: '****'
            access_key: '****'

    The wrapper runs the script and checks the results into git -- not strictly necessary, but we want to track changes, and we have post-commit hooks that email reports on what changed.  That was the original reason we wrote it, but it is also super helpful to be able to scan all configs at once with grep or with template-validation tools, etc.

  21. 9 minutes ago, SmokinIndo said:

    I just used the asterisks to hide the name of my sftp site. What it actually printed out was the name of the sftp site that I used for the host property. I tried debugging with print statements, and it looks like the code fails at session.connect(). Am I supposed to be using an IP address for the host? Or the actual name? Do I need to make sure something is installed on my collector for the script to make the sftp connection? Thanks for your help. 

    It will use whatever you provide.  If it is a name and the collector can resolve it, it should work.  I just looked at the code and I don't see where it would have emitted sftp:// at all -- that is URL format, and it wants just the hostname (or IP).  If you included sftp:// in the hostname, please remove it :).

        def session = jsch.getSession(user, host, port.toInteger()); // Get a session
        session.setPassword(pass) // Set the password to use for the session
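    If you wanted to make the script tolerant of both forms, normalizing the property before handing it to the session is easy. A sketch of the idea in Python (the Groovy equivalent in the datasource would be a one-liner too; this helper is hypothetical, not part of the original module):

    ```python
    from urllib.parse import urlparse

    def normalize_host(value):
        """Return a bare hostname whether given "host" or a URL like "sftp://host"."""
        if "://" in value:
            # urlparse extracts just the host, dropping scheme and any port.
            return urlparse(value).hostname or value
        return value
    ```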
  22. PropertySources generally run only once per day or when triggered manually (I don't think they yet have a definable execution interval, though I'm told that will come someday).

    However, you can run a WMI query that looks for just a specific service as part of the query itself; you don't have to run a full table scan and then examine the results in code. If you do want to enumerate all services, then you might consider having that one PropertySource generate all the service-based categories you would need.  It is not as modular, but it is more efficient.
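    For example, a targeted WQL query (the service name here is just an example) pulls back one row instead of the whole Win32_Service table:

    ```
    SELECT Name, State FROM Win32_Service WHERE Name = 'MSSQLSERVER'
    ```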

  23. 32 minutes ago, SmokinIndo said:

    Hi mnagel. I set the host properties on the collector, and when I test the script, I return the following: com.jcraft.jsch.JSchException: java.net.UnknownHostException: sftp://********  

    That sftp site is what I'm using for the sftp.site host property. That's the host I always use to connect to my sftp site, but it looks like it's not working. For science, I checked to make sure that I could connect to the sftp site directly from my collector host, and I was successfully able to.

    If that is literally what came out of the script, it sounds like the hostname is being confused with the password.  If that is not the issue, I would add debugging statements to the code (and an exit 1) to ensure they are printed when you run Poll Now.