Antony Hawkins

LogicMonitor Staff
  • Posts: 187

Everything posted by Antony Hawkins

  1. Thanks @Stefan W - I believe this *might* be down to collector resources. https://www.logicmonitor.com/support/collectors/collector-overview/collector-capacity 'In general, the SSE requires half of the amount of memory allotted to the JVM. The memory requirements are not shared, rather the SSE requirement is in addition to the JVM memory requirements. If the Collector does not have this memory available, the SSE will not start and you will see “Can’t find the SSE Collector Group” in the Collector Status dialog. The Collector will work without the SSE, but Groovy scripts will be executed from the Agent instead of the SSE.' My first guess is that those collectors responsible for the working metrics are starting without the SSE, for this reason.
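The capacity guidance quoted above comes down to simple arithmetic: the SSE needs another 50% of the JVM's allotted memory on top of the JVM itself. Here is a minimal Python sketch of that check. It assumes you know the JVM allotment and the memory actually free on the collector host, and the figures in the usage are made up for illustration.

```python
def required_memory_mb(jvm_mb: int) -> int:
    """Memory (MB) needed for the collector JVM plus an SSE at half the JVM size."""
    sse_mb = jvm_mb // 2  # the SSE's share is in addition to the JVM, not shared with it
    return jvm_mb + sse_mb


def sse_can_start(jvm_mb: int, free_mb: int) -> bool:
    """True if there is room for both the JVM and the SSE on this host."""
    return free_mb >= required_memory_mb(jvm_mb)


if __name__ == "__main__":
    # Illustrative only: a 2048 MB JVM on a host with 2560 MB available.
    print(required_memory_mb(2048))   # 3072
    print(sse_can_start(2048, 2560))  # False: Groovy would fall back to the Agent
```

A collector for which this comes out False is one that would start without the SSE and run its Groovy scripts from the Agent instead.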
  2. First things first, I didn't think of this as an idea; that was my colleague @Kerry DeVilbiss. However, he just said "can we have...", so I ran off with it. It's a DataSource that applies to each Collector Resource and checks the current list of available Collector versions (via the published /setting/collector/collectors/versions API endpoint) against the Collector's current version information. (Edit: There's now also a ConfigSource that applies to your portal Resource, if you've set that up alongside our core LogicMonitor_Portal_x DataSources.) It can therefore indicate whenever newer MGD, GD and EA versions are available, and how long those (MGD and GD) versions have been available. MGD versions are alerted on by default based on how long they've been available, as these will get auto-updated after 30 days and you'll probably prefer to set your own schedules for this. There is some intelligence built in around EA versions: for example, you could alert on the availability of newer EA versions, but only if the Collector is already on an EA version (EA Collectors are not recommended for production, so you don't want that noise). Anyway, plenty of flexibility. You could even use this to alert on the presence of an EA Collector version within your production environment. For the most part this uses the existing API script functions in our core LogicMonitor_Portal_x DataSources. It requires lmaccess.id and lmaccess.key properties for an API token with read access to settings. It polls only once per hour, so it presents minimal load to the API or your Collectors. The DataSource applies per-collector to highlight potential version issues for each: [screenshot] The ConfigSource applies singly to the portal resource, enabling a single alert on the release of each type of collector (EA, GD, MGD; you can remove the alerts for any of these).
This is a slightly different use case from the DataSource, so you may want either or both: DataSource: v1.1.0: 777Y4N ConfigSource: v1.0.0: WAY7HE
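To make the comparison logic concrete, here is a minimal Python sketch (the module itself is Groovy) of "which newer releases exist, and for how long". The release list, its field names and the 21-day warning threshold are all hypothetical; the real endpoint's response schema is not reproduced here.

```python
from datetime import date

# Hypothetical release list. The real /setting/collector/collectors/versions
# response has its own field names, which this sketch does not reproduce.
AVAILABLE = [
    {"version": "33.006", "type": "MGD", "released": date(2023, 3, 1)},
    {"version": "34.003", "type": "GD", "released": date(2023, 5, 10)},
    {"version": "35.100", "type": "EA", "released": date(2023, 6, 20)},
]

# Hypothetical threshold: warn ahead of the 30-day MGD auto-update.
MGD_WARN_DAYS = 21


def parse_version(v: str) -> tuple:
    """Turn '34.003' into (34, 3) so versions compare numerically."""
    major, minor = v.split(".")
    return int(major), int(minor)


def newer_versions(current: str, is_ea: bool, today: date) -> dict:
    """Map release type -> days available, for releases newer than `current`.

    EA releases are skipped unless the collector is itself on EA, so that
    production collectors don't generate EA noise.
    """
    found = {}
    for rel in AVAILABLE:
        if rel["type"] == "EA" and not is_ea:
            continue
        if parse_version(rel["version"]) > parse_version(current):
            found[rel["type"]] = (today - rel["released"]).days
    return found


def mgd_alert(found: dict) -> bool:
    """True when a newer MGD has been out for at least MGD_WARN_DAYS."""
    return found.get("MGD", -1) >= MGD_WARN_DAYS


if __name__ == "__main__":
    found = newer_versions("33.004", is_ea=False, today=date(2023, 7, 1))
    print(found)             # {'MGD': 122, 'GD': 52}
    print(mgd_alert(found))  # True
```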
  3. @Brandon Sellers um... no idea on the cause, in terms of a first guess at least. I'm more than happy to take a look, calendar gaps permitting (pretty hectic/full currently, but will see what I can do), if you can point me at a working and a non-working example in your portal (PM me the details).
  4. This is a PowerShell-based PropertySource that queries Windows web servers (using the Get-IISSite or Get-Website cmdlet as appropriate to the version of Windows) and provides a list of site names in a resource property, 'auto.iis.sites': [screenshot] Note this will not list the "Default Web Site", mostly because of the way Windows outputs the results and also because it doesn't mean much. Why? The main purpose of this PropertySource is to let you easily group IIS machines based on the sites they're running / supporting, to simplify building Service Insights, dashboards, etc. What else could you do with this? Well... you *could* then build an API script to create (or ensure the existence of) website monitors within LogicMonitor for each website that your various servers are running, rather than having to work out what those all are and construct them manually. You could also create a DataSource to add HTTP/S monitoring of each site on that server as instances, direct from the collector monitoring that host (similar to the HTTP_Page- DataSource). One caveat on that: when I did a quick and dirty test of this, if the collector machine is itself running IIS, the results you get are from the collector monitoring itself, which means you get a 200 code for the IIS instance regardless of whether the domains you've told IIS about exist or not; i.e. it tells you nothing about whether the rest of the world can reach the sites. This is therefore potentially of "limited value", but it's a pretty easy thing to create if your IIS machines are not also your collector machines... Note also (really important!) that my PowerShell skill level is not particularly amazing, so while this works in most cases, I did notice it didn't pick up the MS Exchange back-end instances of IIS on a test machine. Someone else might be able to tweak this for better effect.
This works both locally and remotely; remote connection requires WinRM to be enabled so that the collector can connect and run the PS script block on the target server. Version 1.3, published with lmLocator: XRNW4X
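The PowerShell collection isn't reproduced here. As a sketch of just the output step, in Python, this assumes the site names are already collected, that the PropertySource prints key=value lines (with LogicMonitor adding the 'auto.' prefix), and that sites are comma-separated. The separator is an assumption; the published module may differ.

```python
# Site the original module deliberately leaves out.
EXCLUDED = {"Default Web Site"}


def iis_sites_line(site_names):
    """Build the PropertySource output line for a list of IIS site names.

    Assumes key=value output with LogicMonitor adding the 'auto.' prefix, and a
    comma separator (the real module's separator may differ).
    """
    cleaned = (name.strip() for name in site_names)
    sites = sorted({name for name in cleaned if name and name not in EXCLUDED})
    return "iis.sites=" + ",".join(sites)


if __name__ == "__main__":
    names = ["Default Web Site", "shop.example.com", "intranet", "shop.example.com"]
    print(iis_sites_line(names))  # iis.sites=intranet,shop.example.com
```

For grouping, a dynamic group could then match on something along the lines of auto.iis.sites =~ "shop.example.com".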
  5. FYI, Collector GD30.000 (and EA29.1xx+) breaks these modules, as execution of PropertySource (and other) scripts moves to the SSE (Standalone Script Engine), which does not currently support the method used here. The workaround is to disable the SSE (not recommended; the SSE exists for a reason!). If you're using these modules and they're now producing a bunch of NaN (ironically...) and flatline zeros, please log a feature request / contact your CSM about it. When / if I get some free time, I'll revisit this and see whether I can do something similar via a different method.
  6. Further comments... I've just used almost all of this to integrate with https://<my-customer>.freshservice.com/api/v2/tickets/ (note: freshservice.com, not freshdesk.com). We also needed to add a "group_id" field with an integer value to the JSON payloads (this can be tokenised, as per the requesterID and _api.key values). We also needed to add a "category" field and value, as my customer's FreshService setup has this as a mandatory field with a defined list of acceptable values; you may therefore need to make the same or similar additions for any fields your setup defines as mandatory. Finally, the JSON path for the created ticket ID from FreshService is (certainly for my customer's version) ticket.id, not merely id. However, all of these things were very easy to learn from the Integration Logs within LogicMonitor and the results of test alert deliveries, so many thanks @Chris Seiber for providing 98% of the necessary work!
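The three FreshService-specific changes (an integer group_id, a mandatory category, and the ticket ID nested at ticket.id) can be sketched in Python outside the integration. All values are placeholders, and this builds a plain JSON body rather than LogicMonitor's tokenised payload.

```python
import json


def build_ticket_payload(subject, description, requester_id, group_id, category):
    """Assemble the ticket-creation JSON body with the extra fields mentioned above.

    All values are placeholders; add any other fields your setup marks mandatory.
    """
    return json.dumps({
        "subject": subject,
        "description": description,
        "requester_id": int(requester_id),
        "group_id": int(group_id),  # must be an integer, not a quoted string
        "category": category,       # must be one of your defined category values
    })


def json_path(document, path):
    """Follow a dotted path such as 'ticket.id' through parsed JSON."""
    node = document
    for key in path.split("."):
        node = node[key]
    return node


def created_ticket_id(response_text):
    """FreshService nests the created ticket, so the ID lives at ticket.id."""
    return json_path(json.loads(response_text), "ticket.id")
```

Reading plain id from that response would raise a KeyError, which matches the symptom of the ticket ID not being captured.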
  7. NB. Remember this calls the API to make any necessary changes. If you make 1,000 clones to change 1,000 properties on 1,000 devices once every minute, you will crash into API rate limiting and things won't work as expected. Be sensible with deployment and update time intervals. Your ship is not moving that far every minute...
  8. This is a DataSource that enables you to update any single custom property on any collector-monitored resource.
     Example use case / how this came to be: Let's imagine for a moment that you have a collection of monitored resources on a mobile piece of kit, for example a sea-going vessel. One of those resources is a GPS locator device, from which you can retrieve latitude and longitude values, and you'd like to use those to populate the 'location' property for the resource in LogicMonitor, so that your various vessel locations can be shown on a LogicMonitor dashboard. This DataSource will let you do exactly that.
     It's a framework into which you'll still have to put a bit of effort: namely, you need to add code to actually find the value you want to set/update on the resource. Obviously I have no idea what that might be in your case... However, all the rest of the work (actually updating the resource using the LogicMonitor API) is done for you. Yes, you could use this to grab data from another property, such as an auto.prop set by a simple PropertySource. Yes, you could put the same code into a PropertySource or ConfigSource if that suited your use case, give or take some tweaks to the outputs. Yes, you could use this to pull SNMP location data from a device, although "3rd floor back office cabinet, rack 2, shelf 6" won't translate well onto a Google Maps widget.
     You will need to:
       1. Create LM API token credentials (ID and key) for a user with rights to manage the resource(s) in question, and set these as apiaccessid.key and apiaccesskey.key properties on the resource(s);
       2. In the DataSource script, ensure line 27 defines the custom property name you want to update: customPropertyToBeUpdated = 'location'; note this must be a property you can access by managing the resource, which excludes any auto.xxx and system.xxx properties;
       3. Put whatever code is necessary into the try{} block of the getNewPropertyValue() function (lines 32-49 in the template), so that it returns the desired property value as a string. Depending on what that code is, you may need to add further imports to the top of the script;
       4. Set a polling interval that makes sense for the rate of change of the value;
       5. Change the AppliesTo rule from false() to whatever is appropriate.
     What it does:
       1. Runs your code to get a new property value;
       2. Checks whether this would be a change from the existing property value (or absence of the property);
       3. If it is a change, calls the LM API and updates (or creates) that custom property.
     It'll graph results on a success/failure basis, and alert if you're missing API credentials or if those credentials are inadequate for the resource. And, look, see, location data: [screenshot] ...granted, you can't tell I didn't just do that manually...
     Although designed and released as a location updater, bits of this code started life as an SNMP community string updater (testing multiple community strings from a list and setting the one that works: https://communities.logicmonitor.com/topic/1867-pick-one-from-multiple-snmp-community-strings/), so it could be used for that, or for updating any property.
     Caveats: This script updates exactly one property. If you want to update multiple properties, you could clone the DS for each, or adapt the script to loop over multiple properties. If you take the clone route, note that the clones will not all run at the same time, so if you're trying to update a username and a password, there will be a mismatch for a period. Also, we have a proper, in-platform credentials vault integration on the way anyway. v1.2: 3C2PMM
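Steps 2 and 3 of "What it does" (only call the API when the value actually changes) can be sketched in Python. The request shape here, a PATCH of customProperties with opType=replace, is an assumption about the LogicMonitor REST API rather than a copy of the DataSource's code; verify it against the API documentation before relying on it. No network call is made.

```python
import json

PROPERTY_NAME = "location"  # mirrors customPropertyToBeUpdated in the template


def needs_update(current, new):
    """True when the new value is usable and differs from the existing one."""
    if not new:
        return False  # lookup code returned nothing; leave the property alone
    return current != new


def build_update_request(device_id, name, value):
    """Return (verb, resource path, query string, JSON body) for the API call.

    Assumption: a PATCH of customProperties with opType=replace; check the exact
    path and parameters against the LogicMonitor REST API documentation.
    """
    path = f"/device/devices/{device_id}"
    query = "patchFields=customProperties&opType=replace"
    body = json.dumps({"customProperties": [{"name": name, "value": value}]})
    return "PATCH", path, query, body


if __name__ == "__main__":
    existing, fresh = "50.1,-1.4", "50.2,-1.3"  # made-up lat,long strings
    if needs_update(existing, fresh):
        print(build_update_request(42, PROPERTY_NAME, fresh))
```

Skipping unchanged values keeps API traffic proportional to real changes, not to the polling interval.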
  9. Proof-of-concept ConfigSource for CheckPoint firewalls, named as such to avoid confusion once a core version is available. From original work by @David Lee, developed and tweaked as more data became available. Requires SSH credentials that either log in directly to clish (">" prompt) *or*, when they land in expert mode / the bash shell, at least have the right to issue the 'clish' command without further credentials; the scripts detect and adapt based on the prompt character initially seen. Discovery finds Virtual Systems using 'show virtual-system all'; collection then runs 'show configuration' for each. v1.0.0: CGHCGH
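The prompt-based adaptation can be sketched in Python as pure command selection, with no SSH session. The assumptions are that a trailing '>' means the session is already in clish, and that 'set virtual-system <id>' is how the VS context is switched. The second assumption is not from the original post, and the published module may do this differently.

```python
def shell_prefix(initial_output):
    """Commands needed to reach clish, judged by the first prompt character seen.

    Assumption: a trailing '>' means we're already in clish; anything else
    (e.g. an expert-mode '#' prompt) means we must run 'clish' first.
    """
    prompt = initial_output.rstrip()
    return [] if prompt.endswith(">") else ["clish"]


def discovery_commands(initial_output):
    """Active Discovery: list the Virtual Systems."""
    return shell_prefix(initial_output) + ["show virtual-system all"]


def collection_commands(initial_output, vs_id):
    """Collection for one Virtual System.

    'set virtual-system <id>' is an assumed way of switching VS context in clish;
    the published module may do this differently.
    """
    return shell_prefix(initial_output) + [f"set virtual-system {vs_id}", "show configuration"]


if __name__ == "__main__":
    print(discovery_commands("[Expert@gw-01:0]# "))
    print(collection_commands("gw-01> ", 2))
```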
  10. Quick tip... EDIT: Now updated with an even quicker tip...
     When (if!) you're creating a scripted LogicModule that calls the LogicMonitor API, you need the account name (<accountName>.logicmonitor.com), the API token ID, and the API token key. You can set all of these as resource properties, of course, but it seems a bit annoying to have to tell resources in the LogicMonitor account their own account name, right? So you might end up with a section of script like this:
         def accessId = hostProps.get("lm.api.id")
         def accessKey = hostProps.get("lm.api.key")
         def account = hostProps.get("lm.api.account")
     Here's the thing: Collectors run these scripts. Collectors know which account they belong to; it's in their agent.conf file, in a line like:
         company=<accountName>
     Collectors can read their own agent.conf file, and it's possible to pull out collector settings within scripts, so this works:
         // You'll need this import to read collector settings...
         import com.santaba.agent.util.Settings

         // ...then take the account name from agent.conf's "company" setting
         def account = Settings.getSetting("company")
         // ...and the API token ID and key from resource properties, as before
         def accessId = hostProps.get("lm.api.id")
         def accessKey = hostProps.get("lm.api.key")
     Ta-dah! A property you no longer have to set...
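Once the script has the account name, token ID and token key, each request still has to be signed. This Python sketch follows LogicMonitor's LMv1 scheme as I understand the REST API documentation: an HMAC-SHA256 over verb + epoch + body + resource path, hex-encoded and then base64-encoded. The credentials and account name used here are placeholders.

```python
import base64
import hashlib
import hmac
import time


def lmv1_auth_header(access_id, access_key, verb, resource_path, data="", epoch_ms=None):
    """Build the Authorization header value for an LMv1-signed request.

    resource_path excludes the /santaba/rest prefix and any query string.
    """
    epoch = str(epoch_ms if epoch_ms is not None else int(time.time() * 1000))
    message = verb + epoch + data + resource_path
    digest = hmac.new(access_key.encode(), message.encode(), hashlib.sha256).hexdigest()
    signature = base64.b64encode(digest.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch}"


def api_url(account, resource_path):
    """The collector's company setting supplies the account part of the host name."""
    return f"https://{account}.logicmonitor.com/santaba/rest{resource_path}"


if __name__ == "__main__":
    # Placeholder credentials; in a LogicModule these come from hostProps / Settings.
    print(api_url("acme", "/device/devices"))
    print(lmv1_auth_header("ID", "KEY", "GET", "/device/devices"))
```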
  11. Update... I have now constructed a "version 2" of this that produces the same output without making any API calls, meaning no need for API credentials and a more efficient collection. It can do this because the collector running the PropertySource for a resource already knows the tasks it has to run for that resource. v2.1.0: EWZY2K
  12. Thanks for the feedback Vitor, much appreciated.
  13. NB. These are both now marked public and accessible. I've also added them to a package in Exchange, named "VMware_vCenter_GuestOS Counts".
  14. NB. Dynamic Thresholds v2 is the next major step in this "puzzle", as it now gives the ability to learn how many tasks and how many NaN tasks normally exist on a resource; this means you no longer need to set a threshold on e.g. X% NaN tasks, nor alert immediately on a change. You can (with DTv2) learn normal for both and therefore alert on changes that indicate a credential or protocol connectivity change, a resource becoming unresponsive, or an Active Discovery run adding or removing large numbers of instances.
  15. Ah, sorry, they got marked private during the LM Exchange roll-out. I've asked the team to review and release them, and have saved new versions as explicitly public. Once they're cleared I'll add them to a package so there should be a single import.
  16. Hi Muiz, try now. Thanks for pointing this out! Antony
  17. I created this a while back for a customer who wanted to view and alert on license expiry within their PaloAlto firewalls. It uses the same API key and API connection methods as our core PaloAlto_FW_* LogicModules, and calls the '<request><license><info></info></license></request>' operational command. It monitors days until expiry (or whether the license is perpetual, or has already expired). Alert thresholds in this version are as requested by the original customer: 60 and 30 days remaining, and "has already expired". v1.0.0: FGMCN9
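Here is a Python sketch of the days-until-expiry logic and the three alert conditions. The XML below is an assumed shape for the license-info response, not a captured one. The 'Never' marker for perpetual licenses, the date format, and the mapping onto warning/error/critical severities are all assumptions to check against a real firewall and the module's thresholds.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

# Assumed response shape for the license-info op command; field names and the
# date format are illustrative, so check them against a real firewall's output.
SAMPLE = """<response status="success"><result><licenses>
<entry><feature>Threat Prevention</feature><expires>August 30, 2024</expires></entry>
<entry><feature>PAN-DB URL Filtering</feature><expires>Never</expires></entry>
</licenses></result></response>"""

WARN_DAYS, ERROR_DAYS = 60, 30  # the original customer's thresholds


def days_until_expiry(xml_text, today):
    """Map feature -> days left; None marks a perpetual ('Never') license."""
    result = {}
    for entry in ET.fromstring(xml_text).iter("entry"):
        expires = (entry.findtext("expires") or "").strip()
        if expires.lower() == "never":
            result[entry.findtext("feature")] = None
        else:
            expiry = datetime.strptime(expires, "%B %d, %Y").date()
            result[entry.findtext("feature")] = (expiry - today).days
    return result


def severity(days):
    """Hypothetical mapping of the three alert conditions onto severities."""
    if days is None:
        return "ok"  # perpetual
    if days < 0:
        return "critical"  # already expired
    if days <= ERROR_DAYS:
        return "error"
    if days <= WARN_DAYS:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for feature, days in days_until_expiry(SAMPLE, date(2024, 7, 15)).items():
        print(feature, days, severity(days))
```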
  18. Updated to use API v2 and handle rate limiting: v1.1.0: HGZENK
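As an illustration of what "use API v2 and handle rate limiting" involves, here is a Python retry sketch. It assumes that v2 is selected with an X-Version: 2 header, that throttled calls come back as HTTP 429, and that an X-Rate-Limit-Window header gives the window in seconds. The send callable is hypothetical and stands in for the module's real HTTP code.

```python
import time

# Sent with every request to select version 2 of the REST API.
V2_HEADERS = {"X-Version": "2", "Content-Type": "application/json"}


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it isn't rate limited (HTTP 429) or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Assumption: wait out the advertised rate-limit window (seconds),
        # falling back to exponential backoff if the header is missing.
        wait = int(headers.get("X-Rate-Limit-Window", 2 ** attempt))
        if attempt < max_attempts:
            sleep(wait)
    return status, body
```

Passing sleep in as a parameter keeps the retry policy testable without actually waiting.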
  19. From a customer request: two DataSources that track slot occupancy in Dell DRAC blade devices. One lists each slot; the other just gives vacant/occupied counts. Sadly, the OIDs exposed by the device give virtually nothing of use for alerting (e.g. there is no reflection of the health of whatever's in any slot), but the value here is in capacity planning. These give you instant insight into the occupancy (or otherwise) of your devices across all or any parts of your estate, so you'll quickly see whether you already have space to add those six new blades you've decided you need, whether you can consolidate into fewer lumps of tin, etc. Dell_DRAC_SlotOccupancy: FAY4PX (v1.0.0) (multi-instance; lists details per slot as instance properties, including occupant type, service tag, etc.) Dell_DRAC_SlotOccupancyCounts: RJDT3Z (v1.0.0) (single-instance; returns counts of total, occupied and vacant slots) Example dashboard view, highlighting low occupancy: [screenshot]
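The counts DataSource reduces a per-slot listing to three numbers. This Python sketch shows that reduction over a hypothetical slot map (the real modules read the data over SNMP). The occupancy percentage is an extra, not something the published module is said to return.

```python
def slot_counts(slots):
    """Summarise slot occupancy for one chassis.

    `slots` maps slot number -> occupant type, or None when the slot is vacant
    (a hypothetical shape; the real modules read this over SNMP).
    """
    total = len(slots)
    occupied = sum(1 for occupant in slots.values() if occupant)
    return {
        "total": total,
        "occupied": occupied,
        "vacant": total - occupied,
        "occupancy_pct": round(100.0 * occupied / total, 1) if total else 0.0,
    }


if __name__ == "__main__":
    chassis = {1: "Server", 2: None, 3: "Server", 4: None, 5: None, 6: None, 7: None, 8: None}
    print(slot_counts(chassis))
```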
  20. A couple of DataSources for your vCenters that will show you counts of VMs per unique OS, or per OS family. They will look a bit like this: [screenshot] ...and... [screenshot] With further tweaking you could easily have these pull out specifically (and therefore alert on) the presence of "out of support" OSs (Windows 2000 and 2003, as per the screenshot above!). Interestingly, VMware quite often lists the family as 'null' while a full OS name is reported. I have no idea why, but that's what comes back from the API. Credentials are the same esx.user and esx.pass that you'll already have set for your vCenters, so just import the LogicModules and they'll apply and work. VMware_vCenter_GuestOSNameCounts: NPPDLT (v1.1.0) VMware_vCenter_GuestOSFamilyCounts: JP3KG6 (v1.1.0)
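The counting itself, including the 'null' family case and the suggested "out of support" tweak, can be sketched in Python over a hypothetical VM list. The OS and family strings in the test are illustrative, not captured from vCenter.

```python
from collections import Counter

# Substrings treated as "out of support" (illustrative, not exhaustive).
OUT_OF_SUPPORT = ("Windows 2000", "Windows Server 2003")


def os_counts(vms, key):
    """Count VMs per value of `key` ('name' or 'family'); missing values become 'null'."""
    return Counter(vm.get(key) or "null" for vm in vms)


def out_of_support_count(vms):
    """Number of VMs whose full OS name matches an out-of-support marker."""
    return sum(
        1 for vm in vms
        if any(marker in (vm.get("name") or "") for marker in OUT_OF_SUPPORT)
    )
```

Matching on the full OS name rather than the family is deliberate here, since the family is the field that often comes back null.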
  21. This one came up when a customer pondered how they'd know if ConfigSources weren't finding instances on devices where they should be (typically this would be due to an absence of valid credentials, for example). This PropertySource relies on API credentials (set as properties apiaccessid.key and apiaccesskey.key) and checks devices for any ConfigSource that applies to the device but has zero instances. As a PropertySource it *only* writes properties, which will look a bit like this: [screenshot] Obviously this doesn't trigger any alerting as written, but you can very easily write a DataSource that simply returns the auto.missing_configsources_count value and then alert on anything non-zero. v1.0.0: ZGEG67 Note: If you're tempted to try the same with DataSources, remember that they're nowhere near as clear-cut: a resource may have all sorts of DataSources applied to it (e.g. IIS for a Windows server) that may quite correctly have zero instances discovered.
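Here is a Python sketch of the core check: given the ConfigSources that apply to a device and their instance counts (a hypothetical input shape; the module gathers this via the API), emit the count as a key=value property. The second property listing the names is a hypothetical extra, and the ConfigSource names are made up.

```python
def missing_configsources(instance_counts):
    """Properties for ConfigSources that apply to a device but found no instances.

    `instance_counts` maps ConfigSource name -> discovered instance count
    (hypothetical shape; the real module builds this from API calls).
    """
    missing = sorted(name for name, count in instance_counts.items() if count == 0)
    return {
        "missing_configsources_count": len(missing),
        "missing_configsources": ",".join(missing),  # hypothetical extra property
    }


def property_output(props):
    """Render properties as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    counts = {"Cisco_IOS": 0, "Cisco_NXOS": 3, "SSH_Banner": 0}  # made-up names
    print(property_output(missing_configsources(counts)))
```

The companion DataSource would then only need to return auto.missing_configsources_count with a threshold along the lines of > 0.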
  22. Problem: How do you know how many collection tasks are failing to return data on any given device? You could set "no data" alerting, but that's fraught with issues. An SNMP community can give access to some parts of the OID tree and not others, so unless you set "no data" alerts on every SNMP metric or DataSource (DO NOT DO THIS!!) you might not see an issue. If you do do this, be prepared for thousands of alerts when SNMP fails on one switch stack... Here is a suite of three LogicModules that cause the collector to run a '!tlist' (task list) debug command per monitored resource, which produces a summary of the task types being attempted on the resource, counts of those task types, and counts of how many have some or all metrics returning 'NaN' (no data). As the collector is running the scripts, no credentials are needed. Unusually, I've used a PropertySource to do the work of Active Discovery, because right now the Groovy import used isn't available in AD scripts, and an API call (and therefore credentials) would otherwise have been necessary. Additionally, creating a property for instances gives the DataSources further abilities in terms of comparing what the collection scripts find against what they expected to find, meaning they can "fill in the blanks" and identify a need to re-run Active Discovery. There are then two DataSources: one returning counts and NaN counts per task type, and the other returning total counts and NaN counts, plus counts of task types not yet discovered by the PropertySource (i.e. Active Discovery is needed; don't worry, that'll sort itself out with the daily Auto Properties run). There are no alert thresholds set as presented here, for various reasons. Firstly, there's no differentiation between tasks that have *some* NaN values and tasks with *all* NaN values; that would demand massively (unfeasibly) more scripting. Therefore it's a bit fuzzier than just being able to say "Zero is fine, anything else is bad".
Secondly, some DataSources sometimes have some NaN values without this indicating any sort of failure. Every environment is different, so what we're looking for here is patterns, trends, step changes, that sort of thing; these metrics would be ideal presented in top-N graphs on a dashboard, at least until you get a feel for what's "normal" in your environment. This will help guide you to resources with high percentages of tasks returning no data without generating alert noise. Enjoy... PropertySource: "NoData_Tasks_Discovery": v1.3: NPEMD9 DataSources: "NoData_Tasks_By_Type": v1.3: N6PXZP "NoData_Tasks_Overall": v1.3: 3A4LAJ Substantial kudos goes to @Jake Cohen for enlightening me to the fact that the TlistTask import existed, and that these were therefore possible. Standing on the shoulders of giants, and all that. NB. Immediately after a collector restart, the NoData counts and percentages will likely drop to zero: while the collector knows the tasks it's going to run, none of them have failed since the restart because they haven't been attempted yet. Therefore, don't set delta alerts. It might look a bit like this on a dashboard for total tasks per resource: [screenshot] Or for a specific task type on a resource: [screenshot] Yes, I have a lot of NaN on some of my resources, thanks to years of experimenting. I probably ought to tidy that up, and now I can see where I need to concentrate my efforts...
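Parsing '!tlist' output isn't shown here because its exact format isn't reproduced in the post. This Python sketch starts from already-parsed counts (a hypothetical shape with made-up task type names) and derives what the two DataSources report. It covers per-type and overall NaN percentages, plus the task types missing from the discovery property, which is the "fill in the blanks" check.

```python
def nan_summary(by_type, discovered_types):
    """Summarise NaN tasks for one resource.

    `by_type` maps task type -> (total tasks, tasks with some or all NaN), a
    hypothetical shape; `discovered_types` is what the PropertySource recorded.
    """
    per_type = {
        task_type: round(100.0 * nan / total, 1) if total else 0.0
        for task_type, (total, nan) in by_type.items()
    }
    total = sum(t for t, _ in by_type.values())
    nan = sum(n for _, n in by_type.values())
    return {
        "per_type_nan_pct": per_type,
        "total": total,
        "nan": nan,
        "nan_pct": round(100.0 * nan / total, 1) if total else 0.0,
        # Types seen now but not yet discovered: Active Discovery needs to re-run.
        "undiscovered_types": sorted(set(by_type) - set(discovered_types)),
    }
```

As the post notes, these percentages mix "some NaN" and "all NaN" tasks, so they suit trend graphs better than fixed thresholds.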
  23. @JSmith try now, should be good to go. 👍 Hope you find it useful.