mnagel

Members
  • Content Count

    484
  • Days Won

    81

Posts posted by mnagel

  1. I would not hold your breath -- I have had to fight just to get and keep SPF enabled on our email.  Regardless, even if you could use the builtin alerts with a distinct From address, you would still have portal links embedded in the message that reveal it is LogicMonitor.  You could do what we do and submit everything via a custom email integration (or a web integration via an API handler), then handle the data any way you like.  In our case, we feed the tokens into an actual template system to format messages using conditional logic and everything else missing from the blind token substitution LM normally provides.  We feed that transformed result into a ticket, but obviously it could be handled many different ways at that point, including re-routing via email with proper headers.  The downside of a custom integration is that there is a bug -- LM simply does not send certain things to custom integrations that it does send with the builtin integration (e.g., ACK and SDT notices). I have asked about this and in theory it might be fixed one day, but it has not been in well over a year since it was reported.
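
    As a rough illustration of what "conditional logic" on top of tokens can look like, here is a minimal Python sketch. It is not the template system described above; the token keys (HOST, LEVEL, DATAPOINT, VALUE) and the wording of the actions are assumptions made for the example.

```python
# Hypothetical sketch: format an alert from integration tokens with conditional logic.
# The token keys used here are assumed payload field names, not a documented schema.

def render_alert(tokens: dict) -> str:
    """Build a ticket body from a dict of alert tokens."""
    level = tokens.get("LEVEL", "unknown").lower()
    host = tokens.get("HOST", "unknown host")
    lines = [f"[{level.upper()}] {host}: {tokens.get('DATAPOINT', 'n/a')}"]
    # Conditional sections: something plain token substitution cannot express.
    if tokens.get("VALUE"):
        lines.append(f"Current value: {tokens['VALUE']}")
    if level == "critical":
        lines.append("Action: page the on-call engineer.")
    elif level in ("warn", "warning"):
        lines.append("Action: review during business hours.")
    return "\n".join(lines)


example = {"HOST": "edge-rtr-1", "LEVEL": "critical", "DATAPOINT": "InErrors", "VALUE": "42"}
print(render_alert(example))
```

    The rendered text could then go into a ticket or be re-sent by email with whatever headers are needed.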

    • Upvote 1
  2. Here are at least two items that need to be added to make the dashboard token feature more useful:

    • adjust widgets that cannot use tokens so they can (e.g., Alerts, Netflow, etc.)
    • allow arbitrary tokens to be inserted as needed within widget fields (e.g., device patterns, instance patterns, etc.)

    A concrete example of the latter came up this morning.  We have multiple locations with similar equipment for which we want to display Internet usage details, one set per dashboard (cgraph and netflow widgets).  The edge device names vary, as do the uplink ports to the ISPs in each location.  Cloning this dashboard solves virtually nothing, as every single widget still requires editing.  If the tokens could be used, these dashboards could be cloned with no manual editing other than filling in the necessary tokens.  In some cases the tokens are insertable, but most fields do not allow them.  In this case, I defined various tokens like isp_1_name, isp_1_edge_device, isp_1_edge_port, etc., but could use them only in very limited ways, ultimately making the exercise pointless.

    As with many things, we can at least work around this with the API (at least I believe I could with some effort), but it would be much more accessible to folks if handled within the UI.
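
    A minimal sketch of the API-side workaround, assuming a widget definition has already been fetched as JSON: substitute ##name## placeholders in every string field before writing the clone back. The widget field names (deviceName, instances) and the site values are hypothetical; only the token names come from the post.

```python
import re

# Matches placeholders of the form ##token_name##.
TOKEN = re.compile(r"##([A-Za-z0-9_.]+)##")

def fill_tokens(node, values):
    """Recursively replace ##name## placeholders in strings within nested dicts/lists."""
    if isinstance(node, str):
        # Unknown tokens are left untouched so they stay visible for review.
        return TOKEN.sub(lambda m: str(values.get(m.group(1), m.group(0))), node)
    if isinstance(node, dict):
        return {k: fill_tokens(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [fill_tokens(v, values) for v in node]
    return node  # numbers, booleans and None pass through unchanged

# Hypothetical widget template; the field names are illustrative only.
template_widget = {
    "name": "##isp_1_name## usage",
    "deviceName": "##isp_1_edge_device##",
    "instances": ["##isp_1_edge_port##"],
    "interval": 5,
}
site = {"isp_1_name": "ExampleISP", "isp_1_edge_device": "fw-nyc-1", "isp_1_edge_port": "ge-0/0/1"}
print(fill_tokens(template_widget, site))
```

    Running this once per location with a different `site` dict gives one filled-in widget definition per dashboard clone.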

    • Like 1
  3. 5 minutes ago, mnagel said:

    The normal way I monitor services is via AD, but you would end up with a new instance wildvalue each time it was changed if you use the normal option (WMI-based datasource).  If you use Groovy script DS instead, you could strip the PID portion to build the wildvalue so that the data is stable.  There should be some examples of that in the existing datasource repo, need to dig around....

    Many examples of using WMI from Groovy, none that select from Win32_Service, but should be simple enough to adjust the query.  See Microsoft_LyncServer_StorageService as one example.

    • Like 1
    • Upvote 1
  4. The normal way I monitor services is via AD, but you would end up with a new instance wildvalue each time it was changed if you use the normal option (WMI-based datasource).  If you use Groovy script DS instead, you could strip the PID portion to build the wildvalue so that the data is stable.  There should be some examples of that in the existing datasource repo, need to dig around....
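
    The stripping step itself is small. The sketch below is in Python rather than Groovy and assumes the changing part is a trailing "_<digits>" suffix on the service name; that format is an assumption for illustration, so adjust the pattern to whatever the real names look like.

```python
import re

# Assumption: the unstable part is a trailing "_<digits>" suffix, e.g. "SyncSvc_4812".
PID_SUFFIX = re.compile(r"_\d+$")

def stable_wildvalue(service_name: str) -> str:
    """Drop the changing suffix so the same service always maps to one instance."""
    return PID_SUFFIX.sub("", service_name)

for name in ("SyncSvc_4812", "SyncSvc_9920", "Spooler"):
    print(name, "->", stable_wildvalue(name))
```

    With a stable wildvalue, a restart that changes the suffix updates the existing instance instead of creating a new one with no history.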

  5. 2 minutes ago, Stuart Weenig said:

    AppliesTo Functions is a misnomer. They should be AppliesTo Aliases.

    Yeah, brings back horrible memories of me requesting repeatedly the documentation on how to pass parameters and getting the most insane response from support :).

  6. 6 minutes ago, dcyriac said:


    Wish for a feature to capture the LogicMonitor settings and state of all the resources, resource credentials, websites, mapping, reports, exchanges and settings.

    Perhaps an offline backup so that we can restore monitoring for environments that were deleted without having to go through the scans or process of adding devices individually.

    Or to revert LogicMonitor to a state before or after an account impacting change.

    Best we have been able to do here is a script leveraging the API to download as many endpoints as we are able to access with checkin to a git repo.  Works, but needs frequent tweaking as things change on the backend. Having a way to revert to a previous snapshot or similar would be very handy.  My script came about originally after I implemented alert rule resequencing with an error and lost some rules.  My latest incarnation of this script has an option to check the items as well for problems (e.g., broken widgets).
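
    One detail that makes this kind of backup usable in Git is writing each endpoint's data in a stable order, so a diff shows real changes rather than reordering. A minimal sketch, assuming the items for one endpoint have already been downloaded as a list of dicts with an "id" field; the endpoint path and file naming are illustrative, not the actual script:

```python
import json
import tempfile
from pathlib import Path

def write_snapshot(root: Path, endpoint: str, items: list) -> Path:
    """Write one API endpoint's items as stable, diff-friendly JSON."""
    path = root / (endpoint.strip("/").replace("/", "_") + ".json")
    # Sorting keys and items keeps diffs small when the backend reorders results.
    ordered = sorted(items, key=lambda item: item.get("id", 0))
    path.write_text(json.dumps(ordered, indent=2, sort_keys=True) + "\n")
    return path

# Demo against a temporary directory standing in for the Git working tree.
with tempfile.TemporaryDirectory() as tmp:
    out = write_snapshot(Path(tmp), "/setting/alert/rules",
                         [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}])
    saved_name = out.name
    saved = json.loads(out.read_text())
print(saved_name, [rule["id"] for rule in saved])
```

    A commit after each run then gives a history to compare against or restore from by hand.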

  7. Since I can't wait for this, we now have code to grab widget data for all supported widget types incorporated into our existing backup script (pulls virtually anything I can from the API into a Git repo regularly).  Most issues can be detected via exception (non-200 status code), some require a bit more analysis (no data in any line in a cgraph, for example). Working reasonably well now for the first phase, which is to be aware of busted widgets before we are embarrassed during client review. Next phase will be to analyze data more specifically to the context (once I figure out how to represent the widget check requirements).
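
    The two kinds of checks mentioned (non-200 status, and a graph with no data in any line) can be sketched as follows. The payload shape, a "lines" list whose entries carry a "data" list, is an assumption about the widget data response and should be checked against real output.

```python
def widget_problem(status_code: int, payload: dict):
    """Return a short problem description, or None if the widget looks healthy."""
    if status_code != 200:
        return f"HTTP {status_code}"
    lines = payload.get("lines")  # assumed to be present only for graph-style widgets
    if lines is not None:
        # A graph is treated as broken when no line has any numeric sample.
        has_data = any(
            isinstance(value, (int, float))
            for line in lines
            for value in line.get("data", [])
        )
        if not has_data:
            return "no data in any line"
    return None

print(widget_problem(404, {}))
print(widget_problem(200, {"lines": [{"data": [None, None]}]}))
print(widget_problem(200, {"lines": [{"data": [None, 1.5]}]}))
```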

  8. I am incorporating this into my resource check script.  For now, the first item I am testing is Windows_DHCP, which requires that the DHCP Server role is installed (we have an auto.winfeatures property that is populated by a PropertySource. I don't recall where we got it, somewhere in these forums IIRC :).  The PS code is simple:

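    import com.santaba.agent.groovyapi.win32.WMI

    // Resolve the target host from the device properties.
    def hostname = hostProps.get("system.hostname")
    def my_query = "Select NAME from Win32_serverfeature"
    // Open a WMI session and query the installed server features.
    def session = WMI.open(hostname)
    def result = session.queryAll("CIMv2", my_query, 15)
    // Emit the feature names as a key=value line.
    println "WinFeatures=" + result.NAME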

     

    If this list does not include DHCP Server and we have Windows_DHCP assigned, it will trigger a warning.  I plan to extend this to catch more stale categories.
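
    The check itself can be sketched like this in Python, assuming the device properties are available as a dict (for example from a backup of the API data). The mapping of category to required feature is hypothetical beyond the Windows_DHCP case described here.

```python
# Category -> Windows feature that must be installed for the category to make sense.
# Only the Windows_DHCP entry comes from the post; extend as needed.
REQUIRED_FEATURE = {"Windows_DHCP": "DHCP Server"}

def stale_categories(props: dict) -> list:
    """Return categories assigned to a device whose required feature is not installed."""
    categories = {c.strip() for c in props.get("system.categories", "").split(",") if c.strip()}
    features = props.get("auto.winfeatures", "").lower()
    return sorted(
        category
        for category, feature in REQUIRED_FEATURE.items()
        if category in categories and feature.lower() not in features
    )

device = {"system.categories": "snmp,Windows_DHCP",
          "auto.winfeatures": "[File Server, DNS Server]"}
print(stale_categories(device))
```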

    My check script tests a bunch of things, including lack of any FQDN or expected FQDNs, lack of NetFlow data (the new heartbeat datasource is not helpful there as it does not care if valid data arrives), and other stuff that can go wooorng.

  9. It seems like a bug to me that if you assign Thresholds in a role, you are allowed to edit thresholds but not toggle alerts or notifications on and off.  Please either include that within Thresholds, or add another permission to enable that without having to assign full Manager permissions.

    • Upvote 1
  10. Unfortunately, they will need to make instance groups into actual groups first, which I have requested many times.  As it stands they are basically limited tags where you can only have a single tag for any instance and then you can apply thresholds to the tag, but those don't apply to new items tagged the same way.  This is why they throw up the warning about how new instances won't get thresholds automatically.  We've had to work around that with API scripts to refresh those instance group tags and thresholds when new volumes are added to servers (as just one example). I assume if they ever are made into actual groups, properties would be part of that.

    • Upvote 3
  11. I would start here: https://openvpn.net/community-resources/management-interface/

    There is an example of how to use this (not in an LM-friendly way) at https://kifarunix.com/how-to-monitor-openvpn-connections-using-openvpn-monitor-tool/

    There is also an example (a bit dated) on how to expose data via SNMP here: https://github.com/Phhere/openvpn-snmp

    The problem seems to be a general lack of standard monitoring since OpenVPN runs on so many platforms.  If check_nrpe works, perhaps just punt and use that via an LM script datasource :).

    • Like 1
  12. It sounds like you are expecting a function to be able to take arguments, because "function".  I did as well many years ago when I wanted to create a single isClient function with the client name as an argument, but found after a very painful support ticket that they absolutely are not functions, just macros.

    You would need to write a separate version for each check.  In this case, you would have to add a new function like this and a new one for each version:

    system.virtualization=~"VMware" && auto.version_number =~"6\\.5\\.0*"
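
    Since each version needs its own macro, one way to avoid typing them by hand is to generate the expressions. This is only a sketch: the macro names (isVMware_6_5 and so on) are hypothetical, and the expressions would still have to be created in the portal or via the API.

```python
import re

def applies_to_for_version(version: str) -> str:
    """Build one AppliesTo expression matching a single VMware version prefix."""
    # The regex backslash is doubled inside the quoted string, as in the expression above.
    pattern = re.escape(version).replace("\\", "\\\\")
    return f'system.virtualization=~"VMware" && auto.version_number=~"{pattern}"'

# One hypothetical macro name per version, since a macro cannot take the version as an argument.
macros = {f"isVMware_{v.replace('.', '_')}": applies_to_for_version(v) for v in ("6.5", "6.7", "7.0")}
for name, expression in macros.items():
    print(name, "=", expression)
```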

     

    • Like 2
  13. I would like to have SSL expiration profiles I can apply to websites with automatic selection based on the CA. This is (for now) based on the fact that Let's Encrypt certificates tend to have short expiration intervals, but it seems like a good general solution to discover stuff automatically as elsewhere within LM.

    Thanks,
    Mark

    • Upvote 1
  14. We have for some time used LucidChart to present network diagrams to our clients via an iframe in a widget. The Topology Mapping feature has its place, but it is not a substitute for diagramming. Among other things, TM has no support for most WAN topologies.

    I had not looked into it much before, but LucidChart has a Data Service API and in theory, various device details could be replicated via that method (https://www.lucidchart.com/pages/api-data-service).  It will take me some time to digest that and figure out how that would be useful for presenting information from LM, but it sure looks like it could do it.

    • Like 1
  15. @Sarah Terry Just watched your Level Up presentation on dashboard tokens, which I am familiar with. The idea presented to change tokens on the fly to change the view is nice, but requiring dashboard editing makes it inaccessible to regular users. A fix for that would be to add a token settings dropdown to allow regular users to adjust tokens from a list of pre-canned values that admins can manage. This would also avoid changing the underlying tokens in a persistent manner, enabling context changes for anyone using the dashboard.

    • Like 1
  16. I have spent much time the past few years grappling with how to handle alerting within LM.  The "LM way" is to not send too many actual alerts (via email, etc.) and instead review aggregates on dashboards. But, that is a mindless repetitive task people should not have to do to pick up on problems, and flooding inboxes is the only other option.  My suggestion is to implement a method within widgets to alert when data contained exceeds thresholds (or is abnormal).  As a very specific example, if I set a widget to show all core switch interface error rates, I would want to set an alert for the widget itself to let folks know it needs investigation (rather than have all interfaces individually alarm).  I understand that cluster alerts could handle some of this in a very rough manner (I have found cluster alerts rarely can be used due to limitations in how they are defined), but having a widget be able to alert when conditions are met would be above and beyond cluster alerts.  In the port errors example, I might set the condition to "one or more ports with at least 1% errors in or out" for core switches and "3 or more ports with at least 2% errors in or out" for access switches.  A widget in "look at me" state could also be indicated in the dashboard menu for drill-down purposes.  That state should also be something that can be used in alert rules, which would then represent the rollup condition in the widget instead of many alerts for its various datapoints.
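
    To make the port errors example concrete, here is a minimal sketch of the rollup condition such a widget alert would evaluate. The data layout (port name mapped to in/out error percentages) and the sample values are invented for illustration.

```python
def widget_needs_attention(error_pcts: dict, min_ports: int, min_pct: float) -> bool:
    """True when at least min_ports ports have in or out errors >= min_pct."""
    offenders = [
        port
        for port, (err_in, err_out) in error_pcts.items()
        if max(err_in, err_out) >= min_pct
    ]
    return len(offenders) >= min_ports

# Port -> (percent errors in, percent errors out); sample values only.
core = {"Te1/1": (0.0, 1.2), "Te1/2": (0.1, 0.0)}
access = {"Gi1/0/1": (2.5, 0.0), "Gi1/0/2": (0.0, 2.1), "Gi1/0/3": (0.3, 0.0)}

core_alert = widget_needs_attention(core, min_ports=1, min_pct=1.0)      # core switch rule
access_alert = widget_needs_attention(access, min_ports=3, min_pct=2.0)  # access switch rule
print(core_alert, access_alert)
```

    The single boolean per widget is the "look at me" state; an alert rule would act on that instead of on each interface datapoint.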

    • Like 1
  17. This is a specific case of the more general needed feature of linked (inherited) clones -- this applies to anything that can be cloned so that inheritance and overrides can be leveraged. My original request on this many years back applied to LogicModules (datasources primarily), but the same applies here and to virtually any primary object type that can be cloned. I have been pushing for this repeatedly across the years. Instead of linked clones for modules we now have SMR via Exchange -- much better than blind replacement, but linked clones would be far superior (including this dashboard inheritance use case).

  18. 22 hours ago, DanB said:

    Hello, is there a way to search for a specific datasource metric from w/in the Datasources page(s)? For instance, in our previous tool they have online help and every probe listed every single metric that the probe could poll and a description of that metric from the specific technology it was probing. I don't see a way to do that in LogicMonitor. 

    Example I am looking for Exchange DAG copy queue length metric but searching for that metric "queue" "DAG" "copy" no results show up. The DataSource page search just seems to look at the DS names. It doesn't look at the defined datapoints defined w/in the DS. 

     

    There are many gaps like that -- you also cannot search by properties in the resource pane.  My workaround is to leverage our API-based endpoint backup script to find things with grep. Clunky, but helps.  Most often I am looking for code examples.
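
    A slightly more structured alternative to grep, assuming each datasource has been saved as one JSON file with a "dataPoints" list of objects carrying "name" and "description" (that layout is an assumption about the backup, and the sample datasource below is made up):

```python
import json
import tempfile
from pathlib import Path

def find_datapoints(backup_dir: Path, *terms: str):
    """Yield (datasource, datapoint) pairs whose name or description contains every term."""
    wanted = [term.lower() for term in terms]
    for path in sorted(backup_dir.glob("*.json")):
        ds = json.loads(path.read_text())
        for dp in ds.get("dataPoints", []):
            haystack = f"{dp.get('name', '')} {dp.get('description', '')}".lower()
            if all(term in haystack for term in wanted):
                yield ds.get("name", path.stem), dp.get("name", "")

# Demo with one made-up datasource file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"name": "Example_Exchange_DAG",
              "dataPoints": [{"name": "CopyQueueLength", "description": "DAG copy queue length"},
                             {"name": "Status", "description": "Health state"}]}
    (Path(tmp) / "Example_Exchange_DAG.json").write_text(json.dumps(sample))
    hits = list(find_datapoints(Path(tmp), "queue", "copy"))
print(hits)
```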

    • Like 1
  19. 10 minutes ago, Mike Moniz said:

    Please stop having the wizard add snmp and esxi and other properties to the root group when using the Add Device Wizard or respect RBAC permissions for users running the wizard.

    We try to use SNMP v3 when possible with all our customers and that doesn't use the snmp.community property. But if someone uses the wizard for a completely different customer for v2c, it sets snmp.community on root and via inheritance to all other customers' devices and it breaks them. We or our customers then get a bunch of false No Data alerts as LM switches over to using v2c, even with v3 creds provided or our attempts to force v3 with snmp.version. ESXi creds on root can also cause a problem because we sometimes use a domain account for vcenter access, so it looks like "customer/username" and then we end up leaking customer names and usernames to any customer who can look at any info page.

    Thanks!

     

    This is a specific case of the more general "RBAC and groups are not sufficient to support an MSP model", which I have been trying to get fixed for years.  There needs to be structural support for multiple clients, not bolted on as is currently done.

    I never use the wizard, so I didn't realize this was how it worked :).

  20. 5 minutes ago, mnagel said:

    If the box does respond to SNMP, then it will never discover the Linux_SSH property even if you define credentials because addCategory_Linux_SSH only applies if the system has no categories or only "collector" (which seems like an error). This may explain your AD problems if any category was added to those devices for any reason.

    Once that is detected, the various modules will work.  I am not sure if root is required for all of the modules, but I expect it is for at least a few and as you say, this is not documented anywhere explicitly.  Since it must be defined at the device level, you would need to bind your public key to the root account even if only one of the modules requires root (we generally use public key access only for Linux systems).  I would prefer to see sudo supported in all the modules so that a non-root account can be used with restricted command access controlled by sudo.  All that said, I have not tested these fully to see if we can get away with a regular user.  We have none of these yet in use in any of our portals, but I am curious now so will be trying a few things :).

    I just checked and it looks like currently all the commands require no special privileges, but also not all may be appropriate for every Linux flavor. I know a few spotchecks show some will not work on EL6 (which, to be fair, is EOL later this year).

    Linux_SSH_BlockDevicePerformance:    def command = \"cat /proc/diskstats\";
    Linux_SSH_CPUCores:    def command = 'cat /proc/cpuinfo'
    Linux_SSH_CPUCores:    def command = 'cat /proc/stat'
    Linux_SSH_CPUMemory:    def command = 'vmstat -s -S K; echo -n \"Cores:\";nproc --all; echo -n \"load:\"; uptime'
    Linux_SSH_Filesystems:    def command = \"`which df` -P\"
    Linux_SSH_NetworkInterfaces:    def command = \"cat /proc/net/dev\"
    Linux_SSH_NetworkTCPUDP:    def command = '`which netstat` -s'
    Linux_SSH_ServiceStatus:    def command = 'systemctl list-units --all --type=service --plain'
    Linux_SSH_SystemClock:    def command = 'date +%s'
    Linux_SSH_TCPUDP:    def command = 'nstat -a -j'
    Linux_SSH_Uptime:    def command = 'echo -n \"Uptime:\"; cat /proc/uptime'

     

  21. 21 minutes ago, DanB said:

    Hi Mike, maybe I'm not explaining enough.

    The only thing discovered by LM after applying the ssh.user\pass properties

    [screenshot]

    is nothing but the very basic metrics

    [screenshot]

    There's no CPU/Disk/Memory, etc...

    I'm asking if the user we created 'lmsvc' does it have to be part of the root group since after running "Active Discovery" again with the properties applied LM still doesn't find anything from any DS still. This is just a new local user on this box.

     

     

    If the box does respond to SNMP, then it will never discover the Linux_SSH property even if you define credentials because addCategory_Linux_SSH only applies if the system has no categories or only "collector" (which seems like an error). This may explain your AD problems if any category was added to those devices for any reason.

    Once that is detected, the various modules will work.  I am not sure if root is required for all of the modules, but I expect it is for at least a few and as you say, this is not documented anywhere explicitly.  Since it must be defined at the device level, you would need to bind your public key to the root account even if only one of the modules requires root (we generally use public key access only for Linux systems).  I would prefer to see sudo supported in all the modules so that a non-root account can be used with restricted command access controlled by sudo.  All that said, I have not tested these fully to see if we can get away with a regular user.  We have none of these yet in use in any of our portals, but I am curious now so will be trying a few things :).