mnagel

Members
  • Content Count

    494
  • Days Won

    87

Posts posted by mnagel

  1. @Sarah Terry Just watched your Level Up presentation on dashboard tokens, which I am familiar with. The idea presented to change tokens on the fly to change the view is nice, but requiring dashboard editing makes it inaccessible to regular users. A fix for that would be to add a token settings dropdown to allow regular users to adjust tokens from a list of pre-canned values that admins can manage. This would also avoid changing the underlying tokens in a persistent manner, enabling context changes for anyone using the dashboard.

    • Like 1
  2. I have spent much time the past few years grappling with how to handle alerting within LM.  The "LM way" is to not send too many actual alerts (via email, etc.) and instead review aggregates on dashboards. But, that is a mindless repetitive task people should not have to do to pick up on problems, and flooding inboxes is the only other option.  My suggestion is to implement a method within widgets to alert when data contained exceeds thresholds (or is abnormal).  As a very specific example, if I set a widget to show all core switch interface error rates, I would want to set an alert for the widget itself to let folks know it needs investigation (rather than have all interfaces individually alarm).  I understand that cluster alerts could handle some of this in a very rough manner (I have found cluster alerts rarely can be used due to limitations in how they are defined), but having a widget be able to alert when conditions are met would be above and beyond cluster alerts.  In the port errors example, I might set the condition to "one or more ports with at least 1% errors in or out" for core switches and "3 or more ports with at least 2% errors in or out" for access switches.  A widget in "look at me" state could also be indicated in the dashboard menu for drill-down purposes.  That state should also be something that can be used in alert rules, which would then represent the rollup condition in the widget instead of many alerts for its various datapoints.
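To make the rollup idea concrete, here is a minimal sketch of the kind of widget-level condition described above. Nothing here is an existing LM feature: `PortSample`, `widget_needs_attention`, and the sample interfaces are hypothetical names and made-up values used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """One interface's error rates as a percentage of packets (hypothetical shape)."""
    name: str
    in_error_pct: float
    out_error_pct: float


def widget_needs_attention(ports, min_ports, min_error_pct):
    """Return True when at least `min_ports` ports have in or out errors at or above `min_error_pct`."""
    offenders = [
        p for p in ports
        if max(p.in_error_pct, p.out_error_pct) >= min_error_pct
    ]
    return len(offenders) >= min_ports


# Made-up samples: one noisy core port, two noisy access ports.
core = [PortSample("Te1/1", 0.2, 1.4), PortSample("Te1/2", 0.0, 0.0)]
access = [
    PortSample("Gi1/0/1", 2.5, 0.0),
    PortSample("Gi1/0/2", 0.1, 2.0),
    PortSample("Gi1/0/3", 0.0, 0.3),
]

# "one or more ports with at least 1% errors in or out" for core switches
core_alert = widget_needs_attention(core, min_ports=1, min_error_pct=1.0)
# "3 or more ports with at least 2% errors in or out" for access switches
access_alert = widget_needs_attention(access, min_ports=3, min_error_pct=2.0)
```

With these samples the core widget would flag itself, while the access widget would not, since only two ports reach the 2% mark. A single boolean like this is what an alert rule could match on instead of the individual datapoints.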

    • Like 1
  3. This is a specific case of the more general needed feature of linked (inherited) clones -- this applies to anything that can be cloned so that inheritance and overrides can be leveraged. My original request on this many years back applied to LogicModules (datasources primarily), but the same applies here and to virtually any primary object type that can be cloned. I have been pushing for this repeatedly across the years. Instead of linked clones for modules we now have SMR via Exchange -- much better than blind replacement, but linked clones would be far superior (including this dashboard inheritance use case).

  4. 22 hours ago, DanB said:

    Hello, is there a way to search for a specific datasource metric from w/in the Datasources page(s)? For instance, in our previous tool there was online help, and every probe listed every single metric it could poll, with a description of that metric for the specific technology being probed. I don't see a way to do that in LogicMonitor. 

    For example, I am looking for the Exchange DAG copy queue length metric, but searching for "queue", "DAG", or "copy" returns no results. The DataSource page search seems to look only at DS names, not at the datapoints defined w/in the DS. 

     

    There are many gaps like that -- you also cannot search by properties in the resource pane.  My workaround is to leverage our API-based endpoint backup script to find things with grep. Clunky, but helps.  Most often I am looking for code examples.
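For anyone without grep handy, the same idea can be sketched in Python. This assumes the backup is simply a directory tree of text files (JSON or similar); the layout and the `search_backups` name are hypothetical and this is not our actual backup script.

```python
import re
from pathlib import Path


def search_backups(root, pattern):
    """Yield (file path, line number, line) for every line under `root` matching `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Backups may contain odd bytes; replace them rather than abort the search.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield str(path), lineno, line.strip()
```

Calling something like `search_backups("lm-backup", r"copy.*queue")` (the directory name is a placeholder) would then list every backed-up module line mentioning a copy queue, datapoints included, which is exactly what the UI search does not cover.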

    • Like 1
  5. 10 minutes ago, Mike Moniz said:

    Please stop having the wizard add snmp and esxi and other properties to the root group when using the Add Device Wizard or respect RBAC permissions for users running the wizard.

    We try to use SNMP v3 when possible with all our customers, and that doesn't use the snmp.community property. But if someone uses the wizard for a completely different customer for v2c, it sets snmp.community on root and, via inheritance, on all other customers' devices, and it breaks them. We or our customers then get a bunch of false No Data alerts as LM switches over to using v2c, even with v3 creds provided or our attempts to force v3 with snmp.version. ESXi creds on root can also cause a problem because we sometimes use a domain account for vcenter access, so it looks like "customer/username" and then we end up leaking customer names and usernames to any customer who can look at any info page.

    Thanks!

     

    This is a specific case of the more general "RBAC and groups are not sufficient to support an MSP model", which I have been trying to get fixed for years.  There needs to be structural support for multiple clients, not bolted on as is currently done.

    I never use the wizard, so I didn't realize this was how it worked :).

  6. 5 minutes ago, mnagel said:

    If the box does respond to SNMP, then it will never discover the Linux_SSH property even if you define credentials because addCategory_Linux_SSH only applies if the system has no categories or only "collector" (which seems like an error). This may explain your AD problems if any category was added to those devices for any reason.

    Once that is detected, the various modules will work.  I am not sure if root is required for all of the modules, but I expect it is for at least a few and as you say, this is not documented anywhere explicitly.  Since it must be defined at the device level, you would need to bind your public key to the root account even if only one of the modules requires root (we generally use public key access only for Linux systems).  I would prefer to see sudo supported in all the modules so that a non-root account can be used with restricted command access controlled by sudo.  All that said, I have not tested these fully to see if we can get away with a regular user.  We have none of these yet in use in any of our portals, but I am curious now so will be trying a few things :).

    I just checked, and it looks like none of the commands currently require special privileges, but not all of them may be appropriate for every Linux flavor. A few spot checks show some will not work on EL6 (which, to be fair, is EOL later this year).

    Linux_SSH_BlockDevicePerformance:    def command = \"cat /proc/diskstats\";
    Linux_SSH_CPUCores:    def command = 'cat /proc/cpuinfo'
    Linux_SSH_CPUCores:    def command = 'cat /proc/stat'
    Linux_SSH_CPUMemory:    def command = 'vmstat -s -S K; echo -n \"Cores:\";nproc --all; echo -n \"load:\"; uptime'
    Linux_SSH_Filesystems:    def command = \"`which df` -P\"
    Linux_SSH_NetworkInterfaces:    def command = \"cat /proc/net/dev\"
    Linux_SSH_NetworkTCPUDP:    def command = '`which netstat` -s'
    Linux_SSH_ServiceStatus:    def command = 'systemctl list-units --all --type=service --plain'
    Linux_SSH_SystemClock:    def command = 'date +%s'
    Linux_SSH_TCPUDP:    def command = 'nstat -a -j'
    Linux_SSH_Uptime:    def command = 'echo -n \"Uptime:\"; cat /proc/uptime'

     

  7. 21 minutes ago, DanB said:

    Hi Mike, maybe I'm not explaining enough.

    The only thing discovered by LM after applying the ssh.user\pass properties

    [screenshot: image.png]

    is nothing but the very basic metrics
    [screenshot: image.png]

    There's no CPU/Disk/Memory, etc...

    I'm asking whether the user we created, 'lmsvc', has to be part of the root group, since after running "Active Discovery" again with the properties applied, LM still doesn't find anything from any DS. This is just a new local user on this box.

     

     

    If the box does respond to SNMP, then it will never discover the Linux_SSH property even if you define credentials because addCategory_Linux_SSH only applies if the system has no categories or only "collector" (which seems like an error). This may explain your AD problems if any category was added to those devices for any reason.

    Once that is detected, the various modules will work.  I am not sure if root is required for all of the modules, but I expect it is for at least a few and as you say, this is not documented anywhere explicitly.  Since it must be defined at the device level, you would need to bind your public key to the root account even if only one of the modules requires root (we generally use public key access only for Linux systems).  I would prefer to see sudo supported in all the modules so that a non-root account can be used with restricted command access controlled by sudo.  All that said, I have not tested these fully to see if we can get away with a regular user.  We have none of these yet in use in any of our portals, but I am curious now so will be trying a few things :).

  8. Have you tried the alert threshold report?  You cannot be very granular in which type of threshold you select, but if you export to CSV and open it with Excel or an equivalent, you could probably create a data filter on anything that has a critical threshold defined (based on how many words are in the threshold). Or just write a script to dump the CSV lines that match the critical threshold pattern.

    It may take a while to execute :).  I just tried it with HTML (by mistake) and the page started to render, then crashed the tab.
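A rough sketch of such a script follows. It rests on two assumptions I have not verified against a real export: that the CSV has a column holding the threshold expression (called `Threshold` here, a hypothetical name), and that an expression with three values after the operator, such as `> 80 90 95`, includes a critical level. The sample rows are made up.

```python
import csv
import io


def rows_with_critical(csv_text, threshold_column="Threshold"):
    """Return rows whose threshold expression carries three levels (warning, error, critical)."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        words = (row.get(threshold_column) or "").split()
        # Drop a leading comparison operator such as ">" or "<=" before counting levels.
        if words and not words[0][0].isdigit() and words[0][0] not in "-.":
            words = words[1:]
        if len(words) >= 3:
            matches.append(row)
    return matches


# Made-up export with one three-level threshold and one single-level threshold.
sample = """Resource,Datapoint,Threshold
sw1,CPUBusyPercent,> 80 90 95
sw2,MemoryUsedPercent,> 90
"""
critical = rows_with_critical(sample)
```

Here only the `sw1` row is kept. Adjust the column name and the word-count rule to whatever the real export looks like.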

  9. 1 minute ago, Dominique said:

    Hello,

    No, a specific report through Alert Report ...

    https://xxxx.logicmonitor.com/santaba/uiv3/report/download.jsp?1600885411693

    Thanks,

    Dom

    You should be able to add the property to a custom column, but in my case it ends up with no data.  In my case, I use the friendly name as the instance name if possible, otherwise the thumbprint, so my friendly names show in the report.  I can't show auto.windowscerts.dnsnames, though.  Report adds it as a column, but no values show. Seems like ILPs are not valid for reports, which feels like a bug to me, though I am sure I will be told it is a feature request :).

  10. 3 minutes ago, Stuart Weenig said:

    I'm just wishing for the moon now, but perhaps if there were some sort of background process that could run all the AppliesTo for all modules in the Exchange against current known properties? That way you could see what modules at least apply to your devices. Many DSs would not apply because they would require PropertySources to actually run, but it'd be a start.

    Ooh, shiny!  Yes, there should be an on/off button to limit to just applicable devices.  If you need to find something new, just slide to off.

    • Like 1
  11. 2 minutes ago, Dominique said:

    Hello,

    The friendly name is in the properties field in the alert.... How can I make it appear in the report...

    Thanks,

    Dom

    By report, I assume you mean "alert message"?  If so, the property name should work as a token.  For me, it was generally empty, so you would need to structure the message to account for that as best as possible, since there are no conditional output controls like those in template systems.

  12. I have something like this in Exchange.  I did not include the friendlyname in the alert message, but it is there as an ILP and could be used easily.  I just found fairly often it was not defined.  We have an update pending on this module since we found some certs are refreshed very often (e.g., daily) and need to add code to exclude those from discovery (or at least, from alerting).

    KPNWGW

  13. 29 minutes ago, Stuart Weenig said:

    I think someone got carried away building that module and forgot SOLID (particularly SRP):

    singleresponsibilityprinciple.jpg

    I long ago despaired of ever doing anything with LMConfig modules since without OOP and library support, each is provided as a 1000+ line blob.  I assume in the backend, developers have a portal-like harness to work in that does not involve editing in the UI as we must.  If I ever did try to fix anything, the changes would be wiped on the next update (I have and they are). Exchange makes it a bit more palatable, but with that much code, the safe import process could still be pretty painful.  I am much more used to the idea of core overridable features in a library with a profile for each device type (e.g., Oxidized) or like our own getconfig script I wrote long ago where I only had to override 3-4 methods for new device types (how to login, what the prompt looks like, how to disable paging, etc.).  I would reference even RANCID, but would not like to see that method used in LM :).  Still, RANCID pulls much more useful details for devices -- adding more detail here means editing the blob.

  14. I have been trying to figure out a way to auto-remove categories added by PropertySources when they are no longer applicable (e.g., when something stops being an IIS server, SQL server, DHCP server, etc.). I have found recently this is an annoying problem as PropertySources-added categories live forever until manually removed and the modules cannot subtract AFAIK. OTOH, the method of adding a category is undocumented -- I only know about it from review of modules provided by LM, so perhaps there is a method that interprets a negation operator (like !) to remove the listed category?

    I can think of other options, but they would require replicating what a PropertySource does and using the debugger (via API) to change system.categories.  I would prefer to avoid that, as it would be pretty complex, and using the debugger that way always makes me cringe when I consider the security implications :(.
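The subtraction step itself is the easy part. Below is a minimal sketch, assuming system.categories is a comma-separated string; `prune_categories` and the category names are hypothetical, and writing the value back to the device (the part that worries me) is deliberately left out.

```python
def prune_categories(current, stale):
    """Return a system.categories-style comma-separated string with `stale` entries removed."""
    stale_lower = {s.strip().lower() for s in stale}
    kept = [
        c.strip() for c in current.split(",")
        if c.strip() and c.strip().lower() not in stale_lower
    ]
    return ",".join(kept)


# Made-up example: the box stopped being an IIS server.
updated = prune_categories("snmp,ExampleIIS,ExampleSQL,collector", ["ExampleIIS"])
```

The hard part remains deciding which categories are stale, which means re-running the same checks the PropertySource used to add them.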

    • Upvote 1
  15. I hoped that since ArubaOS-CX is similar to HPE Procurve, I could just use the existing ConfigSource, but it times out in discovery.  I am generally willing to jump in and code solutions, but the current from-scratch monolithic coding methodology used for ConfigSources makes it effectively impossible for regular folks to do, so.... please add a new ArubaOS-CX module or extend HPE Procurve to support that flavor.  I have a pair of 8320s not yet in production that I can give developers access to.

  16. 5 minutes ago, Stuart Weenig said:

    Yeah, precipitation was something I couldn't get for free.

    FWIW, this is the json that my weather ds fetches:

    
    {
      "base": "stations",
      "clouds": {
        "all": 75
      },
      "cod": 200,
      "coord": {
        "lat": 30.58,
        "lon": -97.86
      },
      "dt": 1600371953,
      "id": 0,
      "main": {
        "feels_like": 301.38,
        "humidity": 58,
        "pressure": 1016,
        "temp": 301.86,
        "temp_max": 303.15,
        "temp_min": 300.37
      },
      "name": "Leander",
      "sys": {
        "country": "US",
        "id": 5739,
        "sunrise": 1600345037,
        "sunset": 1600389261,
        "type": 1
      },
      "timezone": -18000,
      "visibility": 10000,
      "weather": [
        {
          "description": "broken clouds",
          "icon": "04d",
          "id": 803,
          "main": "Clouds"
        }
      ],
      "wind": {
        "deg": 360,
        "gust": 8.2,
        "speed": 5.7
      }
    }

     

    weather is in there, which would be fine.  My recollection was that OWM was free for basic access with a query limit, and it should be possible to stay within that limit with the new caching feature, so I will definitely revisit it now. Found it:

    60 calls/minute
    1,000,000 calls/month

    It sounds like a lot, but without caching....  yeah :).
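To put numbers on that, here is some back-of-the-envelope arithmetic. The device count, zip code count, and polling intervals are made-up figures for illustration; only the 1,000,000 calls/month limit comes from the plan above.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200, assuming a 30-day month


def monthly_calls(pollers, interval_minutes):
    """Calls per 30-day month when each poller makes one API call per interval."""
    return pollers * (MINUTES_PER_MONTH // interval_minutes)


# Without caching: every monitored device calls the API on its own (500 devices is a made-up figure).
uncached = monthly_calls(pollers=500, interval_minutes=5)
# With caching: one call per distinct zip code per hour (20 zip codes is a made-up figure).
cached = monthly_calls(pollers=20, interval_minutes=60)
# Average call rate that exactly uses up the monthly limit.
sustained_per_minute = 1_000_000 / MINUTES_PER_MONTH
```

Under those assumptions the uncached case makes 4,320,000 calls a month, well over the limit, while the cached case makes 14,400. The monthly cap also works out to only about 23 calls per minute sustained, so it bites well before the 60 calls/minute limit does.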

  17. Exactly, I just stopped since I did not want to sign up for a much larger account than needed.  I think there are other attributes I wanted to track, like precipitation, but pretty much that was put off since I could not restrict calls to once per zip code for each hour or so.  We used to get that information (using caching) with Nagios from the Wunderground API and insert detail into alerts via our notification templating system. My philosophy has always been to include as much relevant information in alerts as possible to support Lazy Admin mode.  A bit harder to do here without conditional templating, but I rebuilt that system into our ticketing system via an inbound transform.  The main thing we lose is the ability to run callbacks to get stuff like top 5 process details, etc. Still, we only use that in some cases and most clients get alerts from LM, so still wishing for real templating one day :).

  18. Nice!  I was trying to build out a DS for weather API checks some time back, but I ran into the same issue I have with Cisco PSIRT API checks -- there is no way to cache results, short of local files, to avoid excessive usage due to duplicate calls.  I guess I will just have to see what I can do with local files.  I am not sure whether a SQLite binding is included with the provided Groovy libraries.  If not, adding one could help.
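For illustration, this is roughly the kind of cache I have in mind, sketched in Python rather than Groovy since I do not know what bindings the collector ships with. `SqliteCache` and `get_or_fetch` are hypothetical names, and `fetch` stands in for whatever function makes the real API call.

```python
import json
import sqlite3
import time


class SqliteCache:
    """Tiny key/value cache with per-entry expiry, stored in a SQLite file."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL, expires REAL NOT NULL)"
        )

    def get_or_fetch(self, key, fetch, ttl_seconds, now=None):
        """Return the cached value for `key`, calling `fetch()` only when it is missing or expired."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, expires FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row and row[1] > now:
            return json.loads(row[0])
        value = fetch()
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value, expires) VALUES (?, ?, ?)",
            (key, json.dumps(value), now + ttl_seconds),
        )
        self.db.commit()
        return value
```

Keying the cache on zip code (or advisory ID for PSIRT) with a TTL of an hour or so would let many instances share one upstream call. Concurrent access from several collector scripts would need more care than this sketch takes.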

  19. 16 minutes ago, jakemontgomery said:

    There are devices with no alerts beyond warning for "No Data" that are monitored primarily off of SNMP polling. I have discovered a device on which SNMP hasn't been functioning for months, and it has major implications for us.

    We really could use a "SNMP Troubleshooter" that functions much like the vendor specific troubleshooters that already exist. To name one specifically, the VMware_LM_Troubleshooter.

    I'm honestly a little worried about how many devices will end up triggering from this once it's enabled, but it would be a great tool to ensure we are monitoring what we are expecting to be monitored.

    We have standard rules for this that rely on the no-data alert for the uptime datapoint, which otherwise has no alert thresholds. Finding which datapoint is appropriate is harder than it ought to be and I agree a dedicated troubleshooter DS would greatly simplify things.  Right now we have several alert rules per client to capture the problem adequately.

  20. 18 minutes ago, jbibler said:

     

    Thank you for the info. I don't believe I would be comfortable adding the SSH info to our system, just too big of a security issue and the err-disabled status problems we see are few and far between. 

    We do have a syslog upgrade in place, so I'll investigate that option going forward. Appreciate the response.

    Josh B.

    It is possible via various methods to set up a limited access read-only account, which we generally try to do, but understood completely regardless. To the extent LM relies on SSH it has been problematic on our systems for various reasons (inadvertently enabling LMConfig is one, lack of public key support is another). 

  21. 5 minutes ago, Stuart Weenig said:

    AFAIK, the only way to do this is with child accounts (I think).

    With Exchange, child accounts should get somewhat better.  I recommended in a recent UI/UX discussion that it be possible to keep modules in sync via Exchange across multiple child accounts. Without that, they become very painful to deal with. It also seems that every year when we renew our agreement, features are disabled without notice on child accounts (I am currently devoting a bunch of coding time to detecting when that has happened, after silently losing LMConfig for the third year in a row -- that one in particular I can detect because /setting/configsources returns no results when it happens).
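The check itself is simple once the response is in hand. A minimal sketch, assuming the parsed JSON either has an `items` list at the top level or nested under a `data` object (that shape is my assumption here, not something confirmed above); authentication and the HTTP call are omitted, and `configsources_missing` is a hypothetical name.

```python
def configsources_missing(response):
    """Return True when a parsed /setting/configsources response lists no ConfigSources."""
    # Assumed shapes: {"items": [...]} or the same wrapped in a "data" object.
    body = response.get("data") or response
    items = body.get("items") or []
    return len(items) == 0
```

Run against each child account on a schedule, a True result would be the cue that LMConfig has been silently switched off again.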

    • Like 2