Antony Hawkins

LogicMonitor Staff

Everything posted by Antony Hawkins

  1. Ah, sorry, they got marked private during the LM Exchange roll-out. I've asked the team to review and release them, and have saved new versions as explicitly public. Once they're cleared I'll add them to a package so there should be a single import.
  2. Hi Muiz, Try now. Thanks for pointing this out! Antony
  3. I created this a while back for a customer who wanted to view and alert on license expiry within their PaloAlto firewalls. It uses the same API key and API connection methods as our core PaloAlto_FW_* LogicModules, and calls the '<request><license><info></info></license></request>' endpoint. It monitors days until expiry, and also reports whether the license is perpetual or has already expired. Alert thresholds in this version are as requested by the original customer: 60 and 30 days remaining, and "has already expired". v1.0.0: FGMCN9
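As a rough illustration of the "days until expiry" calculation described above, here is a minimal Groovy sketch. It is not the module's actual script: the helper name `daysUntilExpiry` is hypothetical, and the format of the expiry value ("Never" for a perpetual license, otherwise a date such as "January 31, 2024") is an assumption you should check against what your firewall actually returns.

```groovy
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Returns days remaining (negative if already expired), or null for a perpetual license.
// ASSUMPTION: 'expires' is either "Never" or a date such as "January 31, 2024".
def daysUntilExpiry(String expires, LocalDate today) {
    def value = expires.trim()
    if (value.equalsIgnoreCase('Never')) { return null }
    def fmt = DateTimeFormatter.ofPattern('MMMM d, yyyy', Locale.ENGLISH)
    return ChronoUnit.DAYS.between(today, LocalDate.parse(value, fmt))
}

// Fixed "today" so the example is repeatable; a real script would use LocalDate.now().
def today = LocalDate.of(2024, 1, 1)
println daysUntilExpiry('January 31, 2024', today)   // days remaining
println daysUntilExpiry('Never', today)              // null means perpetual
```

A negative result maps naturally onto the "has already expired" condition, while the 60 and 30 day thresholds can be applied to the positive values.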
  4. Updated to use API v2 and handle rate limiting: v1.1.0: HGZENK
  5. From a customer request: Two datasources that track slot occupancy in Dell DRAC blade devices. One lists each slot, one just gives a vacant/occupied count. Sadly, the OIDs exposed by the device give virtually nothing of use for alerting (e.g., there is no reflection of the health of whatever's in any slot), but the value case here is for capacity planning. These will give you instant insight as to the occupancy or otherwise of your devices across all or any parts of your estate, so you'll be able to see quickly whether you already have space to add those six new blades you've determined that you need, whether you can consolidate into fewer lumps of tin, etc. Dell_DRAC_SlotOccupancy: FAY4PX (v1.0.0) (Multi-instance, lists details per slot as instance properties, including occupant type, service tag, etc) Dell_DRAC_SlotOccupancyCounts: RJDT3Z (v1.0.0) (Single-instance, returns counts of total, occupied and vacant slots) Example dashboard view, highlighting low occupancy:
  6. A couple of DataSources for your vCenters, that will show you counts of VMs per unique OS, or per OS family. They will look a bit like this: ...and... With further tweaking you could easily have these specifically pull out and therefore alert on the presence of "out of support" OSs (Windows 2000 and 2003, as per above screenshot!). Interestingly, quite often VMware will list the family as 'null' while a full OS name is reported. I have no idea why, but that's what comes back from the API. Credentials are the same esx.user and esx.pass that you'll already have set for your vCenters, so just import the LogicModules and they'll apply and work. VMware_vCenter_GuestOSNameCounts: NPPDLT (v1.1.0) VMware_vCenter_GuestOSFamilyCounts: JP3KG6 (v1.1.0)
  7. This one came up when a customer pondered how they'd know if ConfigSources weren't finding instances on devices they should be (typically this would be due to an absence of valid credentials, for example). This PropertySource relies on API credentials (set as properties apiaccessid.key and apiaccesskey.key) and checks devices for any ConfigSource that applies to that device, but which has zero instances. As a PropertySource it *only* writes properties, which will look a bit like this: Obviously this doesn't trigger any alerting as written, but you can very very easily write a datasource that simply returns the auto.missing_configsources_count value and then alert on anything non-zero. v1.0.0: ZGEG67 Note: If you're tempted to try the same with DataSources, remember that they're nothing like as clear-cut - a resource may have all sorts of DataSources applied to it (e.g. IIS for a Windows server) that may quite correctly have zero instances discovered.
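The core check in the post above (ConfigSources that apply to a device but have zero instances) can be sketched in Groovy as follows. This is only an illustration under stated assumptions: the input list, the ConfigSource names, the helper `missingConfigSources` and the `missing_configsources` list property are all hypothetical, and the real PropertySource obtains its data from the LogicMonitor API rather than from a hard-coded list.

```groovy
// Hypothetical helper: names of applied ConfigSources with no discovered instances.
def missingConfigSources(List<Map> applied) {
    applied.findAll { it.instances == 0 }.collect { it.name }.sort()
}

// Made-up example data: ConfigSources applied to one device and their instance counts.
def applied = [
    [name: 'Example_CS_A', instances: 2],
    [name: 'Example_CS_C', instances: 0],
    [name: 'Example_CS_B', instances: 0],
]

def missing = missingConfigSources(applied)

// Emit key=value lines, one per property.
println "missing_configsources_count=${missing.size()}"
println "missing_configsources=${missing.join(',')}"
```

A DataSource that reads the count back and alerts on any non-zero value would then cover the alerting side mentioned in the post.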
  8. Problem: How do you know how many collection tasks are failing to return data on any given device? You could set "no data" alerting, but that's fraught with issues. An SNMP community can give access to some parts of the OID tree and not others, so unless you set "no data" alerts on every SNMP metric or DataSource (DO NOT DO THIS!!) you might not see an issue. If you do do this, be prepared for thousands of alerts when SNMP fails on one switch stack... Here is a suite of three LogicModules that cause the collector to run a '!tlist' (task list) debug command per monitored resource, which produces a summary output of task types being attempted on the resource, counts of those task types, and counts of how many have some or all metrics returning 'NaN' (no data). As the collector is running the scripts, no credentials are needed. Unusually, I've used a PropertySource to do the work of Active Discovery, because right now the Groovy import used isn't available in AD scripts and an API call (and therefore credentials) would have been necessary. Additionally, creating a property for instances gives further abilities to the DataSources in terms of comparing what the collection scripts find vs what they were expecting to find, meaning they can "fill in the blanks" and identify a need to re-run Active Discovery. There are then two DataSources, one returning counts and NaN counts per task type, and the other returning total counts and NaN counts, plus counts of task types not yet discovered by the PropertySource (i.e., Active Discovery is needed - don't worry, that'll sort itself out with the daily Auto Properties run). There are no alert thresholds set as presented here, and the reasons are various. Firstly there's no differentiation between tasks that have *some* NaN values and tasks with *all* NaN values. That would demand massively more (unfeasibly more) scripting. Therefore it's a bit fuzzier than just being able to say "Zero is fine, anything else is bad". 
Secondly, some DataSources sometimes have some NaN values without this indicating any sort of failure. Every environment is different so what we're looking for here is patterns, trends, step changes, that sort of thing - these metrics would be ideal presented in top-N graphs in a dashboard, at least until you get a feel for what's "normal" in your environment. This will help guide you to resources with high percentages of tasks returning no data without generating alert noise. Enjoy... PropertySource: "NoData_Tasks_Discovery": v1.3: NPEMD9 DataSources: "NoData_Tasks_By_Type": v1.3: N6PXZP "NoData_Tasks_Overall": v1.3: 3A4LAJ Substantial kudos goes to @Jake Cohen for enlightening me to the fact that the TlistTask import existed and these were therefore possible. Standing on the shoulders of giants, and all that. NB. Immediately after a collector restart, the NoData counts and percentages will likely drop to zero, because while the collector will know the tasks it's going to run, none of them have failed since the restart because they haven't been attempted yet. Therefore, don't set delta alerts. It might look a bit like this in a dashboard for total tasks per resource: Or for a specific task type on a resource: Yes, I have a lot of NaN on some of my resources, thanks to years of experimenting. I probably ought to tidy that up, and now I can see where I need to concentrate my efforts...
  9. @JSmith try now, should be good to go. 👍 Hope you find it useful.
  10. Currently, you cannot pull Website (web and ping check) data into Service Insights. These LogicModules permit you to do just that... STEP ONE: To add individual Website Checks (as opposed to a group of checks) you'll need to identify each check's ID number; as these are not currently visible in the UI, I've written this PropertySource to assist: Website_Display_ID, locator code: DK4CJR. You can apply this to any collector-monitored resource, or run its script externally. You'll guess from its AppliesTo field that I've applied it to an existing Service Insight in my account that has a collector assigned to it. This will find the ID of each Website check from the LogicMonitor API, and write it back as a property to each Website check, so you can easily see the IDs (this will not affect other existing properties, thanks to the use of the PATCH API method): This PropertySource demands resource properties apiaccessid.key and apiaccesskey.key be set on the applied resource; these must be the ID and Key of a LogicMonitor API token pair that has management rights to all Website checks (as the script will be making changes to these). Website check group IDs can be identified within the LogicMonitor URL if you 'focus' on a group - no need for any clever scripting: STEP TWO: Next, the datasources, of which there are four (because Web and Ping checks return different values, and overall and per-checkpoint metrics also differ within a check type): Websites_PingCheck_Overall: 2RZ6XF Websites_PingCheck_PerLocation: FLMNT2 Websites_Website_Overall: 364ZHF Websites_Website_PerLocation: TA4ZGJ All four of these call the LogicMonitor API for whichever Web and/or Ping checks you so choose, find the checkpoints in use (within Active Discovery) and pull the existing metrics back from our TSDB for each of these. 
These also can be used on any collector-monitored resource; as you can assign a collector (and therefore failover) to a Service Insight, I recommend doing that rather than picking a device (even a collector device) that may be transient, even if only in the longer term. All four of these have AppliesTo rules that require an API token ID property, lmapi.websiteread.key, and at least one of website.IDs.csv or website.GroupIDs.csv. The token ID property and lmapi.websiteread.key are properties to set on whichever resource you choose to attach these DataSources to; these should be the ID and Key of a LogicMonitor API token pair that has read access only to the web checks that are relevant to the Service Insight in question. If you use credentials that permit access to all Website checks, anyone with admin rights to the Service Insight (but not to the entire platform) could guess at Website check IDs other than those they are permitted to see. In some environments this will not be a concern; in an MSP environment it could be (if you permit your customers to edit their own Service Insights). You'll notice that in this case the ID is left in plain text (as a .id property not .key) to make identification of the token in use possible. You'll also need to set at least one of the other two properties - website.IDs.csv would be a comma-separated list of individual Website check IDs (as revealed by the PropertySource), whilst website.GroupIDs.csv would be a similar list of Website check groups, as seen in their URLs. Either or both can be set: These will cause the DataSources to apply; Active Discovery will then find the names and checkpoints for each Website check defined, both individually and within groups (any overlaps will result in one discovery only). The collection scripts will then bring in the most recent metrics for each Website check on the defined polling interval - NOTE that as built, the DataSources all poll on a five-minute interval, to match the default / most commonly-used Website check interval. 
If your Website checks are at a different interval, please tailor the DataSources to suit. If you have a range of intervals defined, you can clone/replicate these, or use the [DataSourceName].pollinginterval property, provided the interval is consistent across the checks being brought in to the Service Insight. If you have the DataSources polling at different intervals to your Website checks, you'll either miss data or have duplicated values. At this point, you have a replica of your selected Website checks data visible under these DataSources, under whichever resource you've set the properties on. Even if this is a Service Insight, this doesn't make the Service Insight consider them without one more step. STEP THREE: You can now configure the Service Insight to consider the metrics from these DataSources, either by adding the DataSources as Individual Instances, or within the broader Members field: Their metrics are then available to be added to the Service Insight for consideration towards the overall Service health: DISCLAIMER: As with everything within this area of the community forum, this is a customised extension of the LogicMonitor platform and not a core feature. It is therefore NOT SUPPORTED by LM Support. Scripts are not vetted to the "Gold Standard" of our core LogicModules. If you have questions or issues with this customisation, please comment here and I will do my best to respond in a timely manner.
  11. This comes up occasionally as a customer request - can we count zombie processes on Linux servers? Yes we can... KYNDEH This DataSource relies on SSH connectivity to the server, and therefore the Linux SSH PropertySource (in core), and once connected runs command: 'ps axo pid=,stat=' ...and then runs through the table counting all the occurrences of 'Z' in the stat column.
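The counting step described above can be sketched in Groovy like this. It assumes the output of 'ps axo pid=,stat=' has already been captured as a string (the SSH connection is left out), and the helper name `countZombies` and the sample output are hypothetical. It checks that the stat value starts with 'Z', since the process state is the first character of that column.

```groovy
// Hypothetical helper: count processes whose STAT field starts with 'Z' (zombie).
// Input is the raw text returned by: ps axo pid=,stat=
def countZombies(String psOutput) {
    psOutput.readLines()
        .collect { it.trim().split(/\s+/) }                  // [pid, stat] per line
        .findAll { it.size() >= 2 && it[1].startsWith('Z') } // keep zombie rows only
        .size()
}

// Made-up sample output, for illustration only.
def sample = '''    1 Ss
  412 S<
 1033 Z
 2077 Z+
 3001 R+
'''

def zombies = countZombies(sample)
println "zombies=${zombies}"
```

Blank or malformed lines are skipped rather than counted, so an empty command output simply yields zero.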
  12. Ah, fair point - although that error message strongly suggests a credentials (or lack of) issue, which is a fairly generic SQL troubleshooting issue. Are other SQL (JDBC) datasources applying and working for the server? If not then there's a credentials / access issue (standard support); if they are then it is indeed back to @Mike Suding to take a look at this script specifically.
  13. @krishna I suggest you contact Support directly, through the in-platform chat option. They'll be able to check credential properties, etc. are correctly set up.
  14. Hi @Misha Stamenkovic, please try again now.
  15. Hi Misha, I've asked someone to take a look into this, thanks for pointing it out.
  16. Update: It sent me a text message the other day to tell me Apple had released a new iMac... 😊 NB. The text widget in the middle of the dashboard is populated and updated using an adaptation of my ConfigSource that writes to a dashboard:
  17. A hack of the Microsoft_SQLServer_SystemJobs datasource that will alert you in the event that the available credentials cannot gather SQL System Jobs. In brief, it attempts the same SQL query that the original DataSource runs, but creates no instances on a success - in the event of a failure, it will generate one instance whose description will be the error message, and one datapoint will be applied that will trigger a warning alert after a couple of minutes. It'll look a bit like this where the SQL query fails: Note this *only* tries the query for System Jobs ('select * from msdb.dbo.sysjobs') and I created this only when I noticed we were getting SQL database data, but not system jobs, from some customer devices. v1.1.0: 33H94M
  18. @Joe Williams the alerts count paging works differently to other calls. I don't know why, but it does: Therefore, your recursion to fetch additional alerts should run if the 'total' is a negative number. Something a bit like this (NOT complete code):

          // Enclosure to GET alerts for a Group
          def GETGroupAlerts(groupWildvalue, filterString = '', offsetPassed = 0)
          {
              /*
              ... define url including size, fields, filters etc and make API call for alerts.
              Initial offset will be zero as per default passed parameter.
              Hardcode size to be 1000 as this is the maximum number of results the API
              will return from one call.
              */

              /* ... actually use the above to make the API call... */

              // Parse the API response and put the results into a map, something like:
              if (code == 200)
              {
                  // 200 response code (OK), meaning credentials are good. Slurp...
                  def allResponse = new JsonSlurper().parseText(responseBody);
                  // The 'total' field of the response; negative means more alerts remain.
                  def alertCount = allResponse.total;

                  // LOOP THROUGH RESULTS:
                  allResponse.items.each
                  { alert ->
                      alertsMap << [
                          ( /* map key elided */ ) : [
                              severity : alert.severity,
                              sdted    : alert.sdted,
                              acked    : alert.acked,
                          ],
                      ];
                  }

                  if (alertCount < 0)
                  {
                      /*
                      // DEBUG
                      println 'we ought to go get some more...';
                      println 'alertCount: ' + alertCount;
                      println 'size: ' + size;
                      println 'offset: ' + offset;
                      println 'size + offset: ' + (size + offset);
                      // END DEBUG
                      /**/
                      alertsMap << GETGroupAlerts(groupWildvalue, filterString, (size + offset));
                  }
              }
              return alertsMap;
          }
          //----------------------------------------------------------------------------------------

      Whenever you finally get a response with a positive 'total' number, you're at the end of the alerts list, the recursion will stop, and you'll have one alertsMap object with all the alerts in it, which you can then do whatever you like with. Note the above bits of code are from a script that uses the API v2 data structure. Note also, the hacked out chunks above are nothing like a complete script. Note graph values match Alerts tab values:
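To make the recursion pattern above easier to try out, here is a small self-contained Groovy sketch that replaces the real API call with a stub. Everything in it is illustrative: `fetchPage`, `fakeAlerts` and `getAllAlerts` are hypothetical names, the stub simply mimics the "negative total means more results remain" behaviour described in the post, and keying the map by alert id is an assumption.

```groovy
// Hypothetical stand-in for the real API call: returns one page of alerts plus a
// 'total' that is negative while more results remain (behaviour as described above).
def fakeAlerts = (1..25).collect {
    [id: "A${it}".toString(), severity: 2, sdted: false, acked: false]
}
def fetchPage = { int offset, int size ->
    def end = Math.min(offset + size, fakeAlerts.size())
    def items = offset < end ? fakeAlerts[offset..<end] : []
    def more = end < fakeAlerts.size()
    [items: items, total: more ? -end : fakeAlerts.size()]
}

// Recursively collect every page into one map. ASSUMPTION: keyed by alert id.
def getAllAlerts(Closure fetch, int size, int offset = 0) {
    def alertsMap = [:]
    def page = fetch(offset, size)
    page.items.each { alert ->
        alertsMap[alert.id] = [severity: alert.severity, sdted: alert.sdted, acked: alert.acked]
    }
    if (page.total < 0) {
        // Negative total: more alerts remain, so fetch the next page and merge it in.
        alertsMap << getAllAlerts(fetch, size, offset + size)
    }
    return alertsMap
}

def allAlerts = getAllAlerts(fetchPage, 10)
println "collected ${allAlerts.size()} alerts"
```

With a page size of 10 and 25 fake alerts, the stub is called three times, and the recursion stops on the first page that reports a non-negative total.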
  19. @Joe Williams I'll ask the DataSources team to clear it. However, no need to wait - you can recreate it yourself in-platform. Go to the PropertySources page and start a new PropertySource; make sure you have 'Embedded Groovy Script' selected and paste in those three lines of code (above). Apply it to 'isDevice()' and give it a suitable name, and save it.
  20. Great stuff, happy to have been a help. Feel free to call out our Support Engineer by name, so I can give them some kudos too!
  21. Hi, as you've found (and as commented in the original post), DataSource scripts are run by the collector so you'll always get the collector's (or its network's) external IP. For a per-device result you could create a remote Powershell script for Windows devices and an SSH script for Linux devices that connected to the device assigned, and ran a suitable http request on the device, and grabbed that output. At that point the result would be for the monitored device rather than the collector, although of course if all the devices were on one network that only had one public-facing IP, that IP will still be the value returned from a Google search.
  22. @Athique Ahmed you may also want to take a look at a ConfigSource I created that writes to dashboard text widgets: Clearly there'd need to be a different base script to do the SQL query as opposed to gathering a collector .conf file, but all the rest of the logic should "just work".
  23. A while back I published some very simple ConfigSources to monitor your collector .conf files. Here's an adaptation that writes each collected config output to a text widget on a dashboard.
      Notes:
      • THIS IS A PROOF OF CONCEPT. No warranty is given or implied (value of your investments may go down as well as up, check with your health professional before taking this medicine, etc). Please test before deploying!
      • As with all data within LogicMonitor (or any system), be aware of access rights of users - in this case to whatever Dashboard(s) the config data will be presented on. Be sure to configure your Roles and Users such that only users who have legitimate need to see this data can access whatever Dashboard(s) you send it to.
      • This uses the REST API v1 to verify the target dashboard exists or create it if it doesn't, and also to create / update the text widgets. It will therefore need an API token for an account with management permission for the relevant Dashboard(s), with ID and Key values set as device properties apiaccessid.key and apiaccesskey.key.
      • All of the API interaction is contained within a Groovy checkpoint, rather than within the config collection script, so this could very simply be copied into other ConfigSources.
      • The same logic could be used in other LogicModules, such as to write non-numeric outputs of SQL queries or any data collection methods to dashboards. While this provides no history retention as written, it will show current / most recent values.
      • Within the script you can define the desired Dashboard path, e.g. 'Collector Configs/Groovy Check' (default as presented here), Dashboard name (hostDisplayName is the default), widget name format (hostDisplayName: wildvalue) and other initial parameters such as widget colour scheme, description, etc.
      • This is written for REST API v1. One day I may get around to updating it for v2, for greater efficiency, but today is not that day. Tomorrow is not looking likely either.
      • Dashboard text widgets do have a maximum character limit (65,535 characters). I don't think I've seen a collector config near to or in excess of this, so I have no idea whether a larger config from another device would be truncated or whether the widget creation would fail.
      • Other widgets on the dashboard are unaffected by this script creating and updating widgets; likewise later manual changes to widget size, colours, etc should be respected; updates should be to the text content of the widgets only, so the target dashboard could contain other data from the device.
      For example, it might look a bit like this:
      Known issues:
      • On the first config collection for a multi-instance ConfigSource like this, and where the target dashboard does not already exist, only one widget will be created in the dashboard. This is because all instances collect more or less simultaneously, and each determines the dashboard is not initially present. Each, therefore, attempts to create the dashboard and as soon as the first instance does so, the others will fail as they cannot create a dashboard that (now) already exists. This could be coded around with a simple delay / re-check on failure, but I haven't had time, and the second config collection will create all expected widgets without issue. Additionally, if you create the dashboard first, this issue will not occur.
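The "simple delay / re-check on failure" idea mentioned under known issues could look something like the Groovy sketch below. It is not part of the published ConfigSource: the `ensureDashboard` helper, the closures and the in-memory `dashboards` map are hypothetical stand-ins for the real REST API calls, used only to show the control flow.

```groovy
// Hypothetical in-memory stand-in for the dashboard API, for illustration only.
def dashboards = [:]        // dashboard name -> id
def nextId = 1
def findDashboard = { String name -> dashboards[name] }
def createDashboard = { String name ->
    if (dashboards.containsKey(name)) { throw new IllegalStateException("exists: ${name}") }
    dashboards[name] = nextId++
    dashboards[name]
}

// Look the dashboard up; if creation fails (e.g. another instance won the race),
// wait briefly and look it up again instead of giving up.
def ensureDashboard(String name, Closure find, Closure create, int retries = 3, long delayMs = 50) {
    for (int attempt = 0; attempt <= retries; attempt++) {
        def id = find(name)
        if (id != null) { return id }
        try {
            return create(name)
        } catch (Exception ignored) {
            sleep(delayMs)   // give the winning instance time to finish, then re-check
        }
    }
    return find(name)
}

// Simulate losing the race: another instance creates the dashboard just before our
// own create call, so our call fails and the re-check finds the existing dashboard.
def racingCreate = { String name ->
    dashboards[name] = dashboards[name] ?: nextId++
    throw new IllegalStateException("exists: ${name}")
}
def raceId = ensureDashboard('Collector Configs/Groovy Check', findDashboard, racingCreate)
println "dashboard id after race: ${raceId}"
```

The same shape should work with real API calls in place of the closures, as long as a failed create is followed by a fresh lookup rather than being treated as fatal.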