Antony Hawkins

LogicMonitor Staff
Everything posted by Antony Hawkins

  1. Problem: How do you know how many collection tasks are failing to return data on any given device? You could set "no data" alerting, but that's fraught with issues. An SNMP community can give access to some parts of the OID tree and not others, so unless you set "no data" alerts on every SNMP metric or DataSource (DO NOT DO THIS!!) you might not see an issue. If you do do this, be prepared for thousands of alerts when SNMP fails on one switch stack...

     Here is a suite of three LogicModules that cause the collector to run a '!tlist' (task list) debug command per monitored resource, which produces a summary of the task types being attempted on the resource, counts of those task types, and counts of how many have some or all metrics returning 'NaN' (no data). As the collector itself runs the scripts, no credentials are needed.

     Unusually, I've used a PropertySource to do the work of Active Discovery, because right now the Groovy import used isn't available in AD scripts, so an API call (and therefore credentials) would otherwise have been necessary. Additionally, creating a property for instances gives the DataSources further abilities in terms of comparing what the collection scripts find vs what they were expecting to find, meaning they can "fill in the blanks" and identify a need to re-run Active Discovery.

     There are then two DataSources: one returning counts and NaN counts per task type, and the other returning total counts and NaN counts, plus counts of task types not yet discovered by the PropertySource (i.e., Active Discovery is needed - don't worry, that'll sort itself out with the daily Auto Properties run).

     There are no alert thresholds set as presented here, for various reasons. Firstly, there's no differentiation between tasks that have *some* NaN values and tasks with *all* NaN values; that would demand massively more (unfeasibly more) scripting, so it's fuzzier than being able to say "zero is fine, anything else is bad". Secondly, some DataSources sometimes have some NaN values without this indicating any sort of failure. Every environment is different, so what we're looking for here is patterns, trends, step changes, that sort of thing - these metrics would be ideal presented in top-N graphs in a dashboard, at least until you get a feel for what's "normal" in your environment. This will help guide you to resources with high percentages of tasks returning no data, without generating alert noise. Enjoy...

     PropertySource: "NoData_Tasks_Discovery": v1.3: NPEMD9
     DataSources: "NoData_Tasks_By_Type": v1.3: N6PXZP
     "NoData_Tasks_Overall": v1.3: 3A4LAJ

     Substantial kudos goes to @Jake Cohen for enlightening me to the fact that the TlistTask import existed and that these were therefore possible. Standing on the shoulders of giants, and all that.

     NB. Immediately after a collector restart, the NoData counts and percentages will likely drop to zero: the collector will know the tasks it's going to run, but none of them will have failed since the restart because they haven't yet been attempted. Therefore, don't set delta alerts.

     It might look a bit like this in a dashboard, for total tasks per resource or for a specific task type on a resource. Yes, I have a lot of NaN on some of my resources, thanks to years of experimenting. I probably ought to tidy that up - and now I can see where I need to concentrate my efforts... A sketch of the per-type counting logic follows.
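     A minimal sketch of the aggregation step only (not the shipped modules, which obtain task data via the collector's TlistTask import). Here a pre-parsed list of [type, nan] entries is assumed purely to illustrate the counting:

     ```groovy
     // Hypothetical, pre-parsed task summaries; the real script builds these from !tlist output
     def tasks = [
         [type: 'snmp',   nan: 0],
         [type: 'snmp',   nan: 3],
         [type: 'script', nan: 1],
         [type: 'wmi',    nan: 0],
     ]

     def byType = [:].withDefault { [count: 0, nanTasks: 0] }
     tasks.each { t ->
         byType[t.type].count += 1
         if (t.nan > 0) { byType[t.type].nanTasks += 1 }  // task has some or all metrics as NaN
     }

     // Emit datapoints in the wildvalue.datapoint=value form a batchscript collection uses
     byType.each { type, stats ->
         println "${type}.count=${stats.count}"
         println "${type}.nancount=${stats.nanTasks}"
     }

     def total = tasks.size()
     def totalNaN = tasks.count { it.nan > 0 }
     println "total=${total}"
     println "nanpercent=${total ? 100 * totalNaN / total : 0}"
     ```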
  2. @JSmith try now, should be good to go. 👍 Hope you find it useful.
  3. Currently, you cannot pull Website (web and ping check) data into Service Insights. These LogicModules permit you to do just that...

     STEP ONE: To add individual Website checks (as opposed to a group of checks) you'll need to identify each check's ID number; as these are not currently visible in the UI, I've written this PropertySource to assist: Website_Display_ID, locator code: DK4CJR

     You can apply this to any collector-monitored resource, or run its script externally. You'll guess from its AppliesTo field that I've applied it to an existing Service Insight in my account that has a collector assigned to it. It will find the ID of each Website check from the LogicMonitor API and write those back as a property, auto.website.id, on each Website check, so you can easily see the IDs. (This will not affect other existing properties, thanks to the use of the PATCH API method.)

     This PropertySource demands resource properties apiaccessid.key and apiaccesskey.key be set on the applied resource; these must be the ID and Key of a LogicMonitor API token pair that has management rights to all Website checks (as the script will be making changes to these). A sketch of the API authentication pattern these scripts rely on is included after this post.

     Website check group IDs can be identified within the LogicMonitor URL if you 'focus' on a group - no need for any clever scripting.

     STEP TWO: Next, the DataSources, of which there are four (because Web and Ping checks return different values, and overall and per-checkpoint metrics also differ within a check type):

     Websites_PingCheck_Overall: 2RZ6XF
     Websites_PingCheck_PerLocation: FLMNT2
     Websites_Website_Overall: 364ZHF
     Websites_Website_PerLocation: TA4ZGJ

     All four of these call the LogicMonitor API for whichever Web and/or Ping checks you choose, find the checkpoints in use (within Active Discovery), and pull the existing metrics back from our TSDB for each of these. These can also be used on any collector-monitored resource; as you can assign a collector (and therefore failover) to a Service Insight, I recommend doing that rather than picking a device (even a collector device) that may be transient, even if only in the longer term.

     All four have AppliesTo rules of:

     lmapi.websiteread.id && lmapi.websiteread.key && (website.IDs.csv || website.GroupIDs.csv)

     lmapi.websiteread.id and lmapi.websiteread.key are properties to set on whichever resource you choose to attach these DataSources to; these should be the ID and Key of a LogicMonitor API token pair that has read access only to the web checks relevant to the Service Insight in question. If you use credentials that permit access to all Website checks, anyone with admin rights to the Service Insight (but not to the entire platform) could guess at Website check IDs other than those they are permitted to see. In some environments this will not be a concern; for an MSP it could be (if you permit your customers to edit their own Service Insights). You'll notice that in this case the ID is left in plain text (as a .id property, not .key) to make identification of the token in use possible.

     You'll also need to set at least one of the other two properties - website.IDs.csv is a comma-separated list of individual Website check IDs (as revealed by the PropertySource), whilst website.GroupIDs.csv is a similar list of Website check group IDs, as seen in their URLs. Either or both can be set.

     These properties will cause the DataSources to apply; Active Discovery will then find the names and checkpoints for each Website check defined, both individually and within groups (any overlaps will result in one discovery only). The collection scripts will then bring in the most recent metrics for each Website check on the defined polling interval. NOTE that as built, the DataSources all poll on a five-minute interval, to match the default / most commonly-used Website check interval. If your Website checks are at a different interval, please tailor the DataSources to suit. If you have a range of intervals defined, you can clone/replicate these, or use the [DataSourceName].pollinginterval property (see: https://www.logicmonitor.com/support/devices/device-datasources-instances/can-i-customize-data-collection-intervals-for-a-device/) provided the interval is consistent across the checks being brought in to the Service Insight. If the DataSources poll at different intervals to your Website checks, you'll either miss data or have duplicated values.

     At this point, you have a replica of your selected Website checks' data visible under these DataSources, on whichever resource you've set the properties on. Even if this is a Service Insight, the Service Insight won't consider this data without one more step.

     STEP THREE: You can now configure the Service Insight to consider the metrics from these DataSources, either by adding the DataSources as Individual Instances or within the broader Members field. Their metrics are then available to be added to the Service Insight for consideration towards the overall Service health.

     DISCLAIMER: As with everything within this area of the community forum, this is a customised extension of the LogicMonitor platform and not a core feature. It is therefore NOT SUPPORTED by LM Support. Scripts are not vetted to the "Gold Standard" of our core LogicModules. If you have questions or issues with this customisation, please comment here and I will do my best to respond in a timely manner.
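     Referenced in STEP ONE above: a minimal sketch (not the actual PropertySource) of the LMv1-signed API call pattern these modules use. The account name and the /website/websites path are assumptions to adjust for your portal:

     ```groovy
     import javax.crypto.Mac
     import javax.crypto.spec.SecretKeySpec

     def account      = 'yourportal'                      // assumption: your LM account name
     def accessId     = hostProps.get('apiaccessid.key')  // token ID property, as per the post
     def accessKey    = hostProps.get('apiaccesskey.key') // token Key property
     def resourcePath = '/website/websites'               // list all Website checks
     def epoch        = System.currentTimeMillis().toString()
     def data         = ''                                // empty body for a GET

     // LMv1 signature: Base64 of the hex HMAC-SHA256 of verb + epoch + body + path
     def mac = Mac.getInstance('HmacSHA256')
     mac.init(new SecretKeySpec(accessKey.getBytes(), 'HmacSHA256'))
     def hex = mac.doFinal(('GET' + epoch + data + resourcePath).getBytes())
         .collect { String.format('%02x', it) }.join()
     def signature = hex.bytes.encodeBase64().toString()

     def conn = new URL("https://${account}.logicmonitor.com/santaba/rest${resourcePath}?size=1000")
         .openConnection()
     conn.setRequestProperty('Authorization', "LMv1 ${accessId}:${signature}:${epoch}")
     def json = new groovy.json.JsonSlurper().parse(conn.inputStream)
     json.data.items.each { println "${it.id}: ${it.name}" }  // the IDs the PropertySource writes back
     ```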
  4. This comes up occasionally as a customer request - can we count zombie processes on Linux servers? Yes we can... KYNDEH

     This DataSource relies on SSH connectivity to the server (and therefore on the core Linux SSH PropertySource). Once connected, it runs the command 'ps axo pid=,stat=' and then runs through the resulting table, counting the occurrences of 'Z' in the stat column. A sketch of that counting step follows.
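     A minimal sketch of the counting step, assuming we already hold the output of 'ps axo pid=,stat=' as a string (the real DataSource obtains it over SSH):

     ```groovy
     // Sample output of 'ps axo pid=,stat=' for illustration
     def psOutput = '''\
         1 Ss
       742 Z
       901 R+
      1203 Zs
     '''

     // A zombie's stat field begins with 'Z' (extra flag characters may follow)
     def zombies = psOutput.readLines().count { line ->
         def cols = line.trim().split(/\s+/)
         cols.size() >= 2 && cols[1].startsWith('Z')
     }
     println "ZombieCount=${zombies}"
     ```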
  5. Ah, fair point - although that error message strongly suggests a credentials issue (or a lack of credentials), which is fairly generic SQL troubleshooting. Are other SQL (JDBC) DataSources applying and working for the server? If not, there's a credentials / access issue (standard support territory); if they are, then it is indeed back to @Mike Suding to take a look at this script specifically.
  6. @krishna I suggest you contact Support directly, through the in-platform chat option. They'll be able to check credential properties, etc. are correctly set up.
  7. Hi @Misha Stamenkovic, please try again now.
  8. Hi Misha, I've asked someone to take a look into this, thanks for pointing it out.
  9. Update: It sent me a text message the other day to tell me Apple had released a new iMac... 😊 NB. The text widget in the middle of the dashboard is populated and updated using an adaptation of my ConfigSource that writes to a dashboard.
  10. A hack of the Microsoft_SQLServer_SystemJobs DataSource that will alert you in the event that the available credentials cannot gather SQL System Jobs.

      In brief, it attempts the same SQL query that the original DataSource runs, but creates no instances on a success; in the event of a failure, it will generate one instance whose description is the error message, with one datapoint applied that triggers a warning alert after a couple of minutes. It'll look a bit like this where the SQL query fails:

      Note this *only* tries the query for System Jobs ('select * from msdb.dbo.sysjobs'); I created this when I noticed we were getting SQL database data, but not system jobs, from some customer devices. A sketch of the try/catch pattern is below. v1.1.0: 33H94M
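      A minimal sketch of the pattern, not the shipped module: attempt the System Jobs query and emit an error instance only on failure. The JDBC URL, credentials and instance naming are assumptions; the real DataSource builds these from device properties.

      ```groovy
      import groovy.sql.Sql

      def url = 'jdbc:sqlserver://dbhost:1433;databaseName=msdb'  // assumed connection string
      def sql = null
      try {
          sql = Sql.newInstance(url, 'monitorUser', 'monitorPass',
                                'com.microsoft.sqlserver.jdbc.SQLServerDriver')
          sql.rows('select * from msdb.dbo.sysjobs')
          // Success: print nothing, so Active Discovery creates no instances
      } catch (Exception e) {
          // Failure: one instance whose description carries the error message,
          // in Active Discovery's wildvalue##wildalias##description format
          println "SystemJobsCheck##SystemJobsCheck##${e.message?.replaceAll('[\r\n]', ' ')}"
      } finally {
          sql?.close()
      }
      return 0
      ```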
  11. @Joe Williams the alerts count paging works differently to other calls. I don't know why, but it does: https://www.logicmonitor.com/support/rest-api-developers-guide/v1/alerts/get-alerts/

      Therefore, your recursion to fetch additional alerts should run if the 'total' is a negative number. Something a bit like this (NOT complete code):

      ```groovy
      // Enclosure to GET alerts for a Group
      // (size, offset, code, responseBody and alertsMap are defined in the elided sections)
      def GETGroupAlerts(groupWildvalue, filterString = '', offsetPassed = 0) {
          /* ... define url including size, fields, filters etc and make the API call for
             alerts. Initial offset will be zero as per the default passed parameter.
             Hardcode size to be 1000 as this is the maximum number of results the API
             will return from one call. */

          /* ... actually use the above to make the API call... */

          // Parse the API response and put the results into a map, something like:
          if (code == 200) {
              // 200 response code (OK), meaning credentials are good. Slurp...
              def allResponse = new JsonSlurper().parseText(responseBody);
              def alertCount = allResponse.total;

              // LOOP THROUGH RESULTS:
              allResponse.items.each { alert ->
                  alertsMap << [
                      (alert.id) : [
                          severity : alert.severity,
                          sdted    : alert.sdted,
                          acked    : alert.acked,
                      ],
                  ];
              }

              if (alertCount < 0) {
                  /*
                  // DEBUG
                  println 'we ought to go get some more...';
                  println 'alertCount: ' + alertCount;
                  println 'size: ' + size;
                  println 'offset: ' + offset;
                  println 'size + offset: ' + (size + offset);
                  // END DEBUG
                  /**/
                  // Negative total = more results remain; recurse with the offset advanced by one page
                  alertsMap << GETGroupAlerts(groupWildvalue, filterString, (size + offset));
              }
          }
          return alertsMap;
      }
      ```

      Whenever you finally get a response with a positive 'total' number, you're at the end of the alerts list, the recursion will stop, and you'll have one alertsMap object with all the alerts in it, which you can then do whatever you like with. Note the above bits of code are from a script that uses the API v2 data structure. Note also, the hacked-out chunks above are nothing like a complete script.

      Note graph values match Alerts tab values.
  12. @Joe Williams I'll ask the DataSources team to clear it. However, no need to wait - you can recreate it yourself in-platform. Go to the PropertySources page and start a new PropertySource; make sure you have 'Embedded Groovy Script' selected and paste in those three lines of code (above). Apply it to 'isDevice()' and give it a suitable name, and save it.
  13. Great stuff, happy to have been a help. Feel free to call out our Support Engineer by name, so I can give them some kudos too!
  14. Hi, as you've found (and as commented in the original post), DataSource scripts are run by the collector, so you'll always get the collector's (or its network's) external IP. For a per-device result you could create a remote PowerShell script for Windows devices, and an SSH script for Linux devices, that connects to the assigned device, runs a suitable HTTP request on the device itself, and grabs that output. At that point the result would be for the monitored device rather than the collector - although of course, if all the devices are on one network that only has one public-facing IP, that IP will still be the value returned from a Google search. A sketch of the SSH approach is below.
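      A minimal sketch of the SSH approach for Linux devices, using the collector's bundled Expect helper. The shell prompt, and the use of api.ipify.org as the "what's my IP" endpoint, are assumptions - substitute whatever suits your estate:

      ```groovy
      import com.santaba.agent.groovyapi.expect.Expect

      def host = hostProps.get('system.hostname')
      def user = hostProps.get('ssh.user')
      def pass = hostProps.get('ssh.pass')

      def cli = Expect.open(host, user, pass)
      cli.expect('$')                                // assumed shell prompt
      cli.send('curl -s https://api.ipify.org\n')    // HTTP request runs on the device, not the collector
      cli.expect('$')
      def output = cli.before()
      cli.close()

      // The captured buffer holds the echoed command followed by its output
      def ip = output.readLines().findAll { it.trim() }.drop(1).find()
      println "ExternalIP=${ip}"
      ```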
  15. @Athique Ahmed you may also want to take a look at a ConfigSource I created that writes to dashboard text widgets. Clearly there'd need to be a different base script to do the SQL query as opposed to gathering a collector .conf file, but all the rest of the logic should "just work".
  16. A while back I published some very simple ConfigSources to monitor your collector .conf files: https://communities.logicmonitor.com/topic/1345-collector-configsources/

      Here's an adaptation that writes the various collected configs to a dashboard, writing each of the config outputs to a text widget.

      Notes:

      THIS IS A PROOF OF CONCEPT. No warranty is given or implied (value of your investments may go down as well as up, check with your health professional before taking this medicine, etc). Please test before deploying!

      As with all data within LogicMonitor (or any system), be aware of users' access rights - in this case, to whatever Dashboard(s) the config data will be presented on. Be sure to configure your Roles and Users such that only users who have a legitimate need to see this data can access whatever Dashboard(s) you send it to.

      This uses the REST API v1 to verify the target dashboard exists (or create it if it doesn't), and also to create / update the text widgets. It therefore needs an API token for an account with management permission for the relevant Dashboard(s), with ID and Key values set as device properties apiaccessid.key and apiaccesskey.key. All of the API interaction is contained within a groovy checkpoint, rather than within the config collection script, so this could very simply be copied into other ConfigSources. The same logic could be used in other LogicModules, for example to write non-numeric outputs of SQL queries, or of any other data collection method, to dashboards. While this provides no history retention as written, it will show current / most recent values. A sketch of the create-or-update flow is below.

      Within the script you can define the desired Dashboard path, e.g. 'Collector Configs/Groovy Check' (the default as presented here), Dashboard name (hostDisplayName is the default), widget name format (hostDisplayName: wildvalue) and other initial parameters such as widget colour scheme, description, etc.

      This is written for REST API v1. One day I may get around to updating it for v2, for greater efficiency, but today is not that day. Tomorrow is not looking likely either.

      Dashboard text widgets do have a maximum character limit (65,535 characters). I don't think I've seen a collector config near to or in excess of this, so I have no idea whether a larger config from another device would be truncated or whether the widget creation would fail.

      Other widgets on the dashboard are unaffected by this script creating and updating widgets; likewise, later manual changes to widget size, colours, etc. should be respected. Updates are to the text content of the widgets only, so the target dashboard could contain other data from the device. For example, it might look a bit like this:

      Known issues: On the first config collection for a multi-instance ConfigSource like this, where the target dashboard does not already exist, only one widget will be created in the dashboard. This is because all instances collect more or less simultaneously, and each determines the dashboard is not initially present. Each therefore attempts to create the dashboard, and as soon as the first instance does so, the others will fail, as they cannot create a dashboard that (now) already exists. This could be coded around with a simple delay / re-check on failure, but I haven't had time, and the second config collection will create all expected widgets without issue. Additionally, if you create the dashboard first, this issue will not occur.
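      A condensed sketch of the create-or-update flow, not the shipped ConfigSource. Here lmApi() is a hypothetical stand-in for an LMv1-signed HTTP call (see the signing sketch in the Service Insights post above), and the widget field names are assumptions to verify against the v1 docs:

      ```groovy
      import groovy.json.JsonOutput

      def lmApi(String verb, String path, String body = '') {
          // Hypothetical helper: performs an LMv1-signed request and returns parsed JSON.
          // In the real module this wraps HttpURLConnection plus the HMAC signature.
          return [:]
      }

      def dashName   = 'CollectorA'              // hostDisplayName in the real script
      def widgetKey  = 'CollectorA: agent.conf'  // hostDisplayName: wildvalue
      def configText = '...collected config text...'

      // 1. Find the dashboard (create it if absent)
      def dashes = lmApi('GET', '/dashboard/dashboards')
      def dash = dashes?.items?.find { it.name == dashName } ?:
                 lmApi('POST', '/dashboard/dashboards', JsonOutput.toJson([name: dashName]))

      // 2. Find the matching text widget and update it, or create it
      def existing = lmApi('GET', '/dashboard/widgets')?.items?.find { it.name == widgetKey }
      def payload = JsonOutput.toJson([
          type        : 'text',                     // assumed v1 text-widget type value
          name        : widgetKey,
          dashboardId : dash.id,
          content     : "<pre>${configText}</pre>", // assumed content field; <pre> keeps layout
      ])
      def result = existing ?
          lmApi('PUT', "/dashboard/widgets/${existing.id}", payload) :
          lmApi('POST', '/dashboard/widgets', payload)
      ```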
  17. @pperreault I've asked the DataSources team to check and clear this and the associated PropertySource.
  18. Hi @Joe Tran, I've asked one of the review team to do this, it should be available shortly. I hope you find it useful!
  19. Very simple PropertySource that outputs the epoch (in milliseconds) and a human-readable timestamp showing when the PropertySource ran - i.e. the last time Auto Properties ran for any given collector-monitored resource. This holds true for regular collector-scheduled runs and for manual / API-triggered Active Discovery runs. v1.1.0: Y2P3HN

      Like any device properties, these could be used in dynamic group definitions and reporting. You could use the same approach to add instance-level properties to discovered instances of multi-instance DataSources (using scripted Active Discovery) to be able to see and report on (via a Device Inventory report) the most recent successful discovery update times - and therefore also identify instances where discovery hasn't run successfully, or where persistent instances have ceased to respond and therefore haven't been rediscovered.

      It's so simple, I'll drop the groovy script in here directly:

      ```groovy
      import groovy.time.TimeCategory;

      println 'auto.properties_discovered_human=' + new Date().format('yyyy-MM-dd HH:mm:ss');
      println 'auto.properties_discovered_epoch=' + System.currentTimeMillis();
      ```

      That's it. It'll look like this for a device:
  20. From a suggestion from @Mike Suding... This is a very basic PropertySource that finds the StartName for the two LogicMonitor Collector services running on any Windows Collector. The StartName is the account used to start the services, and therefore gives a quick visual check as to whether the services are set to run as local system or as a named service account - and if the latter, whether it's the intended service account. The same script could be put into a ConfigSource, which would then alert you if the services were reconfigured to use a different account. A sketch of the underlying WMI query is below. v1.0.0: 66RJND
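      A minimal sketch of the idea, assuming the collector's bundled WMI helper; the service name pattern, the helper signature and the upper-cased result keys are my assumptions here, not a copy of the published script:

      ```groovy
      import com.santaba.agent.groovyapi.win32.WMI

      def host = hostProps.get('system.hostname')
      def query = "SELECT Name, StartName FROM Win32_Service WHERE Name LIKE 'logicmonitor%'"

      // queryAll returns one map per matching service (keys assumed upper-cased)
      WMI.queryAll(host, query, 15).each { svc ->
          // e.g. auto.collector.service.logicmonitor-agent=LocalSystem
          println "auto.collector.service.${svc.NAME.toLowerCase()}=${svc.STARTNAME}"
      }
      ```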
  21. I wrote this DataSource for a customer with a specific requirement: they have a particular application that should spawn and maintain a specific number of processes on Windows machines.

      Operation: The DataSource finds all processes on the Windows machine and groups and counts them by name - e.g. if there are processes powershell, powershell#1 and powershell#2, then a powershell instance will be added showing a count of 3. A sketch of the grouping logic follows this post.

      Out of the box, this DataSource will create instances for *all* processes reported by the Win32_PerfRawData_PerfProc_Process WMI class, except the "Idle" process and the "_Total" metrics. This behaviour is unlikely to be of great benefit; the main use case will involve editing the filters (and cloning the DataSource as appropriate) such that it only brings back processes you care about (and not, for example, the dozens of svchost processes that will be present on every Windows machine).

      Also returned are thread count, file handle count, and working set metrics, each being the sum of the per-process metrics. This is possible because these are instantaneous values. Note that unlike the per-process DataSource, CPU metrics cannot be returned: WMI reports these as incremental counters, and the appearance and disappearance of individual processes between polls would render any sum meaningless. It is, however, possible to see combined CPU metrics for multiple processes via manipulation of the WinProcessStats- DataSource (clone and filter for the processes you need) and smart graphs with a sum aggregation.

      v1.0.0 Exchange Locator ID: XHT4MD

      Example of instances found:
      Overview graphs:
      Per-instance graphs:
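      A minimal sketch of the grouping step, assuming we already have the process names from Win32_PerfRawData_PerfProc_Process (WMI appends #1, #2... to duplicate names):

      ```groovy
      // Hypothetical sample of names as returned by the WMI class
      def names = ['powershell', 'powershell#1', 'powershell#2', 'svchost', 'Idle', '_Total']

      def counts = names
          .findAll { !(it in ['Idle', '_Total']) }       // drop the exclusions
          .countBy { it.replaceFirst(/#\d+$/, '') }      // strip the #n suffix, then count

      counts.each { name, count ->
          println "${name}.count=${count}"               // e.g. powershell.count=3
      }
      ```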
  22. Q: When's the best time to buy a new computer? A: Yesterday or tomorrow. If you bought it yesterday you'd have been benefiting from a shiny, new, faster computer and could have got more done already. Wait until tomorrow and there'll be a better, faster, cheaper* model out, so you'll get more for your money.

      *At the very least, two out of three ain't bad, as the famous philosopher Meat Loaf once wrote.

      If you're considering purchasing from Apple, however, there exists solid buying guidance courtesy of https://buyersguide.macrumors.com/ - an excellent source of data relating to each current product: how long since the current version was released, average time between releases, etc.

      Here's a DataSource that uses the JSoup parser (from Groovy) to pull all of those useful metrics out of the HTML and present them in graphs within LogicMonitor - and, of course, it offers the option to alert on, for example, advice to "buy now" vs waiting for the next refresh.

      As saved, it's applied to a couple of (my) Collector machines, explicitly by their display names; you will most likely need to change this. I suggest applying it to only one or two devices within your account; collector devices are ideal. The collection interval is set to 4 hours because the data doesn't change very often (once per day, plus a change on the release of each new product). Also, I don't want to annoy macrumors.com and find my collector IPs blocked from connecting to their site - and neither do you - so be sure to avoid applying it to loads of devices or setting an unnecessarily high-frequency collection interval.

      Active Discovery runs daily to detect new product lines (and redundant products will disappear, as you'd imagine). Data graphed is an interpretation of the Buy/Neutral/Caution/Wait statuses MacRumors lists, plus a 'days' graph showing the age of the current version, the average gap between releases, and the ages of the previous three (at most) versions of the product, for comparison. All releases listed by MacRumors are added as an Instance Level Property.

      Caveat: Obviously, if MacRumors restructure their buyers guide page, this will break.

      v1.2.0: 4X69F7

      Yes, it's a bit of fun (although you could use it to guide buying cycles... ish... I guess), but more usefully it's a pair of example scripts demonstrating how to parse data from a well-structured HTML page using selector paths (see the sketch below); feel free to take the concepts and apply them to other HTML outputs within your environments. As I always say about LogicMonitor, the only two limitations on what we can monitor are what data something exposes, and your own imagination. Enjoy!
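      A minimal sketch of the technique, not the shipped DataSource: fetch a page and pull values out via CSS selector paths. The selectors below are placeholders - inspect the target page to find the real ones (and expect to update them if the page changes):

      ```groovy
      import org.jsoup.Jsoup

      def doc = Jsoup.connect('https://buyersguide.macrumors.com/')
                     .userAgent('Mozilla/5.0')   // be a polite, identifiable client
                     .get()

      // Hypothetical selector: one element per product panel
      doc.select('div.product').each { panel ->
          def name   = panel.select('h2').first()?.text()
          def status = panel.select('.status').first()?.text()  // e.g. Buy / Neutral / Caution / Wait
          println "${name}: ${status}"
      }
      ```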
  23. No worries - it's this forum being "clever" with internal links!