Example script for automated alert actions via External Alerting


Kevin Ford


  • LogicMonitor Staff

Below is a PowerShell script that's a handy starting point if you want to trigger actions based on specific alert types. In a nutshell, it takes a number of parameters from each alert and has a section of if/elseif statements where you can specify what to do based on the alert. It leverages LogicMonitor's External Alerting feature, so the script runs locally on whichever Collector(s) you configure it for.

I included a couple of example actions for pinging a device and for restarting a service. It also includes some handy (optional) functions for logging as well as attaching a note to the alert in LogicMonitor.
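To see how the template decides what to do, here is a minimal sketch of its if/elseif dispatch pulled out into a side-effect-free function. Get-AlertAction and the labels it returns ('verify-ping', 'restart-service', 'send-syslog') are hypothetical stand-ins for the script's real actions; the DataSource and datapoint names are the ones the script matches on. This lets you check which branch a given alert would hit before wiring up anything that actually restarts services.

```powershell
# Hypothetical helper: report which branch of the template an alert would hit.
# Mirrors the script's if/elseif chain, but returns a label instead of acting.
function Get-AlertAction {
    param(
        [string]$AlertStatus,
        [string]$DsName,
        [string]$Datapoint
    )

    # The template only acts on new or re-opened alerts; "clear" events fall through.
    if ($AlertStatus -ne 'active') { return 'none' }

    if ($DsName -eq 'Ping' -and $Datapoint -eq 'PingLossPercent') {
        return 'verify-ping'
    } elseif ($DsName -eq 'WinService-' -and $Datapoint -eq 'State') {
        return 'restart-service'
    } elseif ($DsName -eq 'HTTPS-' -and $Datapoint -eq 'CantConnect') {
        return 'send-syslog'
    }

    # Anything else is ignored by the template.
    return 'none'
}

Get-AlertAction -AlertStatus 'active' -DsName 'Ping' -Datapoint 'PingLossPercent'
```

Keep in mind that PowerShell's -eq is case-insensitive for strings, so a DataSource named 'ping' would match the 'Ping' branch as well.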

NOTE: this script is provided as-is and you will need to customize it to suit your needs. Automated actions are something that must be approached with careful planning and caution!! LogicMonitor cannot be responsible for inadvertent consequences of using this script.

If you want to try it out, here's how to get started:

  1. Update the variables in the appropriate section near the top of the script with optional API credentials and/or log settings. Also change any of the if/elseif statements (in the section marked "CUSTOMIZE THE FOLLOWING AS NEEDED...") to suit your needs.
  2. Save the script onto your Collector server. I named the file "alert_central.ps1" but feel free to call it something else. NOTE: Store the script somewhere outside the Collector’s directory structure to avoid the possibility of it being overwritten during Collector updates.
  3. In your LogicMonitor portal go to Settings, then External Alerting.
  4. Click the Add button.
  5. Set the 'Groups' field as needed to limit the actions to alerts from any appropriate group of resources. (Be sure the group's devices would be reachable from the Collector running the script!)
  6. Choose the appropriate Collector in the 'Collector' field.
  7. Set 'Delivery Mechanism' to "Script".
  8. Enter the path to where you saved the script (in step #2) in the 'Script' field (ex. "c:\scripts\alert_central.ps1").
  9. Paste the following into the 'Script Command Line' field (NOTE: if you add other parameters here, be sure to also add them to the 'Param' line at the top of the script, in the same position):
     "##ALERTID##" "##ALERTSTATUS##" "##LEVEL##" "##HOSTNAME##" "##SYSTEM.SYSNAME##" "##DSNAME##" "##INSTANCE##" "##DATAPOINT##" "##VALUE##" "##ALERTDETAILURL##" "##DPDESCRIPTION##"
  10. Click Save.
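Since the command line passes bare quoted values rather than named parameters, PowerShell binds them to the script's Param line by position. A quick way to convince yourself of the mapping is this sketch, which copies the script's Param line into a scriptblock and invokes it with hypothetical values in place of the substituted ##TOKENS##:

```powershell
# Copy of the script's Param line inside a scriptblock; values bind by position.
$alertScript = {
    Param ($alertID = "", $alertStatus = "", $severity = "", $hostName = "", $sysName = "",
           $dsName = "", $instance = "", $datapoint = "", $metricValue = "", $alertURL = "",
           $dpDescription = "")

    # Echo a few of the bound values back so the mapping is visible
    [pscustomobject]@{
        AlertID       = $alertID
        Status        = $alertStatus
        DataSource    = $dsName
        Instance      = $instance
        Datapoint     = $datapoint
        DpDescription = $dpDescription
    }
}

# Hypothetical values standing in for ##ALERTID## through ##DATAPOINT## (later tokens omitted)
$result = & $alertScript 'LMD12345' 'active' 'warn' 'server01' 'server01.example.com' 'WinService-' 'Print Spooler' 'State'
$result
```

Parameters you leave off the end simply keep their "" default, which is why a new token needs to go in the same position in both the command line and the Param line.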

This uses LogicMonitor's External Alerting feature so there are some things to be aware of:

  • The Collector(s) oversee the running of the script, so be mindful of any additional overhead the script's actions may cause.
  • It could take up to 60 seconds for the script to trigger from the time the alert comes in.
  • This example is a PowerShell script, so it's best suited for Windows-based Collectors, but it could certainly be rewritten as a shell script for Linux-based Collectors.

Here's a screenshot of a cleared alert where the script auto-restarted a Windows service and attached a note based on its actions.

[Screenshot: a cleared alert showing the note the script attached after auto-restarting a Windows service]
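The note itself is sent as a small JSON body with an ackComment field. AddNoteToAlert builds it by concatenating '{"ackComment":' with the ConvertTo-Json output of the note; a minimal sketch of the same idea using a hashtable is below (New-AlertNoteBody is a hypothetical name). Either way, letting ConvertTo-Json do the escaping is what keeps multi-line output, such as the ping results, from breaking the request body.

```powershell
# Hypothetical helper: build the {"ackComment": "..."} body used when adding an alert note.
function New-AlertNoteBody {
    param([string]$Note)

    # ConvertTo-Json escapes quotes, backslashes and line breaks in the note text
    return (@{ ackComment = $Note } | ConvertTo-Json -Compress)
}

# Example: multi-line output like the ping results, with quotes and a backslash thrown in
$sampleNote = "Automation script output:`nReply from 10.0.0.5: ""ok"" via C:\scripts"
New-AlertNoteBody -Note $sampleNote
```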

Below is the PowerShell script:

# ----
# This PowerShell script can be used as a starting template for enabling
# automated remediation for alerts coming from LogicMonitor.
# In LogicMonitor, you can use the External Alerting feature to pass all alerts
# (or for a specific group of resources) to this script.
# ----
# To use this script:
#    1. Update the variables in the appropriate section below with optional API and log settings.
#    2. Save this script on your Collector server, outside the Collector's directory structure so Collector updates can't overwrite it.
#    3. In your LogicMonitor portal go to Settings, then click External Alerting.
#    4. Click the Add button.
#    5. Set the 'Groups' field as needed to limit the actions to a specific group of resources.
#    6. Choose the appropriate Collector in the 'Collector' field.
#    7. Set 'Delivery Mechanism' to "Script"
#    8. Enter the full path to this script in the 'Script' field (ex. "c:\scripts\alert_central.ps1").
#    9. Paste the following into the 'Script Command Line' field:
#       "##ALERTID##" "##ALERTSTATUS##" "##LEVEL##" "##HOSTNAME##" "##SYSTEM.SYSNAME##" "##DSNAME##" "##INSTANCE##" "##DATAPOINT##" "##VALUE##" "##ALERTDETAILURL##" "##DPDESCRIPTION##"
#    10. Click Save.

# The following line captures alert information passed from LogicMonitor (defined in step #9 above)...
Param ($alertID = "", $alertStatus = "", $severity = "", $hostName = "", $sysName = "", $dsName = "", $instance = "", $datapoint = "", $metricValue = "", $alertURL = "", $dpDescription = "")


###--- SET THE FOLLOWING VARIABLES AS APPROPRIATE ---###
# OPTIONAL: LogicMonitor API info for updating alert notes (the API user will need "Acknowledge" permissions)...
$accessId = ''
$accessKey = ''
$company = ''
# OPTIONAL: Set a filename in the following variable if you want specific alerts logged. (example: "C:\lm_alert_central.log")...
$logFile = ''
# OPTIONAL: Destination for syslog alerts...
$syslogServer = ''

###############################################################
## HELPER FUNCTIONS (you likely won't need to change these)  ##

# Function for logging the alert to a local text file if one was specified in the $logFile variable above...
Function LogWrite ($logstring = "")
{
	if ($logFile -ne "") {
		$tmpDate = Get-Date -Format "dddd MM/dd/yyyy HH:mm:ss"

		# Using a mutex to handle file locking if multiple instances of this script trigger at once...
		$LogMutex = New-Object System.Threading.Mutex($false, "LogMutex")
		$LogMutex.WaitOne()|out-null
		"$tmpDate, $logstring" | out-file -FilePath $logFile -Append
		$LogMutex.ReleaseMutex()|out-null
	}
}

# Function for attaching a note to the alert...
function AddNoteToAlert ($alertID = "", $note = "")
{
	# Only execute this if the appropriate API information has been set above...
	if ($accessId -ne '' -and $accessKey -ne '' -and $company -ne '') {
		# Encode the note...
		$encodedNote = $note | ConvertTo-Json

		# API and URL request details...
		$httpVerb = 'POST'
		$resourcePath = '/alert/alerts/' + $alertID + '/note'
		$url = 'https://' + $company + '.logicmonitor.com/santaba/rest' + $resourcePath
		$data = '{"ackComment":' + $encodedNote + '}'

		# Get current time in milliseconds...
		$epoch = [Math]::Round((New-TimeSpan -start (Get-Date -Date "1/1/1970") -end (Get-Date).ToUniversalTime()).TotalMilliseconds)

		# Concatenate general request details...
		$requestVars_00 = $httpVerb + $epoch + $data + $resourcePath

		# Construct signature...
		$hmac = New-Object System.Security.Cryptography.HMACSHA256
		$hmac.Key = [Text.Encoding]::UTF8.GetBytes($accessKey)
		$signatureBytes = $hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($requestVars_00))
		$signatureHex = [System.BitConverter]::ToString($signatureBytes) -replace '-'
		$signature = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($signatureHex.ToLower()))

		# Construct headers...
		$auth = 'LMv1 ' + $accessId + ':' + $signature + ':' + $epoch
		$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
		$headers.Add("Authorization",$auth)
		$headers.Add("Content-Type",'application/json')

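		# Windows PowerShell 5.1 may not offer TLS 1.2 by default; add it so the HTTPS call to the portal can negotiate it...
		[Net.ServicePointManager]::SecurityProtocol = [Net.ServicePointManager]::SecurityProtocol -bor [Net.SecurityProtocolType]::Tls12

		# Make request to add note. Invoke-RestMethod throws on HTTP and connection errors, so catch and log them...
		try {
			$response = Invoke-RestMethod -Uri $url -Method $httpVerb -Body $data -Headers $headers
			# Uncomment the following if you also want to log successful API responses...
			# LogWrite "API call response: $response"
		} catch {
			# Only written if a filename is set in the $logFile variable above...
			LogWrite "API call to add a note to alert $alertID failed: $($_.Exception.Message)"
		}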
	}
}

function SendTo-SysLog ($IP = "", $Facility = "local7", $Severity = "notice", $Content = "Your payload...", $SourceHostname = $env:computername, $Tag = "LogicMonitor", $Port = 514)
{
	switch -regex ($Facility) {
		'kern' {$Facility = 0 * 8 ; break } 
		'user' {$Facility = 1 * 8 ; break }
		'mail' {$Facility = 2 * 8 ; break }
		'system' {$Facility = 3 * 8 ; break }
		'auth' {$Facility = 4 * 8 ; break }
		'syslog' {$Facility = 5 * 8 ; break }
		'lpr' {$Facility = 6 * 8 ; break }
		'news' {$Facility = 7 * 8 ; break }
		'uucp' {$Facility = 8 * 8 ; break }
		'cron' {$Facility = 9 * 8 ; break }
		'authpriv' {$Facility = 10 * 8 ; break }
		'ftp' {$Facility = 11 * 8 ; break }
		'ntp' {$Facility = 12 * 8 ; break }
		'logaudit' {$Facility = 13 * 8 ; break }
		'logalert' {$Facility = 14 * 8 ; break }
		'clock' {$Facility = 15 * 8 ; break }
		'local0' {$Facility = 16 * 8 ; break }
		'local1' {$Facility = 17 * 8 ; break }
		'local2' {$Facility = 18 * 8 ; break } 
		'local3' {$Facility = 19 * 8 ; break }
		'local4' {$Facility = 20 * 8 ; break }
		'local5' {$Facility = 21 * 8 ; break }
		'local6' {$Facility = 22 * 8 ; break }
		'local7' {$Facility = 23 * 8 ; break }
		default {$Facility = 23 * 8 } #Default is local7
	}

	switch -regex ($Severity) { 
		'^(ac|up)' {$Severity = 1 ; break } # LogicMonitor "active", "ack" or "update"
		'^em' {$Severity = 0 ; break } #Emergency 
		'^a' {$Severity = 1 ; break } #Alert
		'^c' {$Severity = 2 ; break } #Critical
		'^er' {$Severity = 3 ; break } #Error
		'^w' {$Severity = 4 ; break } #Warning
		'^n' {$Severity = 5 ; break } #Notice
		'^i' {$Severity = 6 ; break } #Informational
		'^d' {$Severity = 7 ; break } #Debug
		default {$Severity = 5 } #Default is Notice
	}

	$pri = "<" + ($Facility + $Severity) + ">"

	# Note that the timestamp is local time on the originating computer, not UTC.
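	# RFC 3164 pads single-digit days with a space (e.g. "Mar  5") and uses English month names, hence InvariantCulture.
	$now = Get-Date
	$culture = [System.Globalization.CultureInfo]::InvariantCulture
	if ($now.Day -lt 10) { $timestamp = $now.ToString("MMM  d HH:mm:ss", $culture) } else { $timestamp = $now.ToString("MMM dd HH:mm:ss", $culture) }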

	# Hostname does not have to be in lowercase, and it shouldn't have spaces anyway, but lowercase is more traditional.
	# The name should be the simple hostname, not a fully-qualified domain name, but the script doesn't enforce this.
	$header = $timestamp + " " + $sourcehostname.tolower().replace(" ","").trim() + " "

	#Cannot have non-alphanumerics in the TAG field or have it be longer than 32 characters. 
	if ($tag -match '[^a-z0-9]') { $tag = $tag -replace '[^a-z0-9]','' } #Simply delete the non-alphanumerics
	if ($tag.length -gt 32) { $tag = $tag.substring(0,32) } #and truncate at 32 characters.

	$msg = $pri + $header + $tag + ": " + $content

	# Convert message to array of ASCII bytes.
	$bytearray = $([System.Text.Encoding]::ASCII).getbytes($msg)

	# RFC3164 Section 4.1: "The total length of the packet MUST be 1024 bytes or less."
	# "Packet" is not "PRI + HEADER + MSG", and IP header = 20, UDP header = 8, hence:
	if ($bytearray.count -gt 996) { $bytearray = $bytearray[0..995] }

	# Send the message, then release the socket...
	$UdpClient = New-Object System.Net.Sockets.UdpClient
	$UdpClient.Connect($IP,$Port)
	$UdpClient.Send($ByteArray, $ByteArray.length) | out-null
	$UdpClient.Close()
}


# Empty placeholder for capturing any note we might want to attach back to the alert...
$alertNote = ""
# Placeholder for whether we want to capture an alert in our log. Set to true if you want to log everything.
$logThis = $false


###############################################################
## CUSTOMIZE THE FOLLOWING AS NEEDED TO HANDLE SPECIFIC ALERTS FROM LOGICMONITOR...

# Actions to take if the alert is new or re-opened (note: status will be "active" or "clear")...
if ($alertStatus -eq 'active') {
	
	# Perform actions based on the type of alert...

	# Ping alerts...
	if ($dsName -eq 'Ping' -and $datapoint -eq 'PingLossPercent') {
		# Insert action to take if a device becomes unpingable. In this example we'll do a verification ping & capture the output...
		$job = ping -n 4 $sysName

		# Restore line feeds to the output...
		$job = [string]::join("`n", $job)

		# Add ping results as a note on the alert...
		$alertNote = "Automation script output: $job"
		# Log the alert...
		$logThis = $true


	# Restart specific Windows services...
	} elseif ($dsName -eq 'WinService-' -and $datapoint -eq 'State') {
		# List of Windows Services to match against. Only if one of the following are alerting will we try to restart it...
		$serviceList = @("Print Spooler","Service 2")

		# Note: The PowerShell "-Contains" operator only matches whole list entries (case-insensitive). For looser matching, test each entry with "-like" or "-match" instead.
		if ($serviceList -Contains $instance) {

			# Get an object reference to the Windows service...
			$tmpService = Get-Service -DisplayName "$instance" -ComputerName $sysName

			# Only trigger if the service is still stopped...
			if ($tmpService.Status -eq "Stopped") {
				# Start the service...
				$tmpService | Set-Service -Status Running

				# Refresh the object; otherwise it still reports the status captured by Get-Service above...
				$tmpService.Refresh()

				# Capture the current state of the service as a note on the alert...
				$alertNote = "Attempted to auto-restart the service. Its new status is " + $tmpService.Status + "."
			}

			# Log the alert...
			$logThis = $true
		}
	# Actions to take if a website stops responding...
	} elseif ($dsName -eq 'HTTPS-' -and $datapoint -eq 'CantConnect') {
		# Insert action here to take if there's a website error...

		# Example of sending a syslog message to an external server (skipped if $syslogServer is blank)...
		if ($syslogServer -ne '') {
			$syslogMessage = "AlertID:$alertID,Host:$sysName,AlertStatus:$alertStatus,LogicModule:$dsName,Instance:$instance,Datapoint:$datapoint,Value:$metricValue,AlertDescription:$dpDescription"
			# Named parameters keep SendTo-SysLog's defaults for Facility (local7), Tag ("LogicMonitor") and Port (514)...
			SendTo-SysLog -IP $syslogServer -Severity $severity -Content $syslogMessage -SourceHostname $hostName

			# Attach a note to the LogicMonitor alert...
			$alertNote = "Sent syslog message to " + $syslogServer
		}

		# Log the alert...
		$logThis = $true
	}

}


###############################################################
## Final functions for backfilling notes and/or logging as needed
## (you likely won't need to change these)

# Section that updates the LogicMonitor alert if 'alertNote' is not empty...
if ($alertNote -ne "") {
	AddNoteToAlert $alertID $alertNote
}

if ($logThis) {
	# Log the alert (only triggers if a filename is given in the $logFile variable near the top of this script)...
	LogWrite "$alertID,$alertStatus,$severity,$hostName,$sysName,$dsName,$instance,$datapoint,$metricValue,$alertURL,$dpDescription"
}
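If alert notes never show up, it can help to test the API signing outside of External Alerting. The sketch below factors the LMv1 signing steps that AddNoteToAlert performs inline into a standalone function; New-LMv1Header and its parameter names are hypothetical, and [DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds() is a .NET shortcut standing in for the script's New-TimeSpan arithmetic. As in the script, the signed string uses the resource path without the /santaba/rest prefix, and the body must be exactly the one you send.

```powershell
# Hypothetical helper reproducing the script's LMv1 signing steps:
# base64( lowercase-hex( HMAC-SHA256(accessKey, verb + epoch + body + resourcePath) ) )
function New-LMv1Header {
    param(
        [string]$AccessId,
        [string]$AccessKey,
        [string]$HttpVerb,
        [string]$ResourcePath,
        [string]$Data,
        [long]$Epoch
    )

    # Same concatenation order as $requestVars_00 in AddNoteToAlert
    $requestVars = $HttpVerb + $Epoch + $Data + $ResourcePath

    $hmac = New-Object System.Security.Cryptography.HMACSHA256
    $hmac.Key = [Text.Encoding]::UTF8.GetBytes($AccessKey)
    $signatureBytes = $hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($requestVars))
    $hmac.Dispose()

    # Hex-encode (lowercase), then base64-encode the hex string
    $signatureHex = ([System.BitConverter]::ToString($signatureBytes) -replace '-').ToLower()
    $signature = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($signatureHex))

    return 'LMv1 ' + $AccessId + ':' + $signature + ':' + $Epoch
}

# Example: header for the same kind of POST AddNoteToAlert makes (placeholder credentials and alert ID)
$epoch = [DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()
New-LMv1Header -AccessId 'YOUR_ACCESS_ID' -AccessKey 'YOUR_ACCESS_KEY' -HttpVerb 'POST' `
    -ResourcePath '/alert/alerts/LMD12345/note' -Data '{"ackComment":"test note"}' -Epoch $epoch
```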

 

Edited by Kevin Ford
Updated mention of where to save the script

  • 3 months later...
  • LogicMonitor Staff

I've updated the script to reflect changes made since the original post. It now includes more examples for handling different alerts (including restarting a Windows service) and a helper function for forwarding syslogs if needed.
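For reference, the message the syslog helper builds follows the RFC 3164 layout: a PRI value of facility × 8 + severity, a timestamp, the hostname, the tag, then the content. This is a sketch of just that formatting step, assuming the same facility default (local7) and the same mapping of LogicMonitor levels to syslog severities as SendTo-SysLog. It swaps the regex switch for hashtable lookups covering only the local0 to local7 facilities, and Format-SyslogMessage is a hypothetical name. Taking the timestamp as a parameter makes the output repeatable for testing.

```powershell
# Hypothetical helper: format an RFC 3164-style syslog line (no network I/O).
function Format-SyslogMessage {
    param(
        [string]$Facility = 'local7',
        [string]$Level = 'warn',
        [string]$SourceHostname,
        [string]$Tag = 'LogicMonitor',
        [string]$Content,
        [datetime]$Timestamp = (Get-Date)
    )

    # Facility codes (subset) and LogicMonitor level -> syslog severity, as in SendTo-SysLog
    $facilities = @{ local0 = 16; local1 = 17; local2 = 18; local3 = 19; local4 = 20; local5 = 21; local6 = 22; local7 = 23 }
    $severities = @{ critical = 2; error = 3; warn = 4 }
    $f = if ($facilities.ContainsKey($Facility)) { $facilities[$Facility] } else { 23 }  # default local7
    $s = if ($severities.ContainsKey($Level)) { $severities[$Level] } else { 5 }         # default notice
    $pri = '<' + ($f * 8 + $s) + '>'

    # RFC 3164 TIMESTAMP: "Mmm dd hh:mm:ss", single-digit days padded with a space
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $stamp = $Timestamp.ToString('MMM', $culture) + ' ' + $Timestamp.Day.ToString().PadLeft(2) + ' ' + $Timestamp.ToString('HH:mm:ss', $culture)

    return $pri + $stamp + ' ' + $SourceHostname.ToLower() + ' ' + $Tag + ': ' + $Content
}

Format-SyslogMessage -SourceHostname 'Collector01' -Content 'hello' -Timestamp ([datetime]'2024-03-05 14:07:09')
```

With the defaults, a 'warn' alert gets PRI 23 × 8 + 4 = 188, so the example above produces `<188>Mar  5 14:07:09 collector01 LogicMonitor: hello`.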

Heaven forbid you ever get such an alert storm, but I tested triggering 500 alerts/minute to see if the script could handle them all and it successfully processed all of them within a second of the collector performing its regular 60-second External Alerting check. For reference, the test was simply logging each alert to a file using the script's LogWrite function. More complex response actions will likely require additional overhead at that scale so please be sure to tune your alerts & actions appropriately.
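If you want to try a similar logging test yourself, here is a sketch of a single-process version. Write-AlertLog is a hypothetical variant of the script's LogWrite: it uses the same named mutex ('LogMutex') and line format, but releases the mutex in a finally block so a failed write can't leave other instances waiting. A sequential loop in one process won't reproduce the Collector launching many script instances at once, so treat the timing only as a rough indication of per-alert logging overhead.

```powershell
# Variant of the script's LogWrite: same named mutex and line format, but the
# mutex is released in a finally block even if the write throws.
function Write-AlertLog {
    param([string]$LogFile, [string]$Message)
    if ([string]::IsNullOrEmpty($LogFile)) { return }

    $stamp = Get-Date -Format 'dddd MM/dd/yyyy HH:mm:ss'
    $mutex = [System.Threading.Mutex]::new($false, 'LogMutex')
    [void]$mutex.WaitOne()
    try {
        "$stamp, $Message" | Out-File -FilePath $LogFile -Append
    } finally {
        $mutex.ReleaseMutex()
        $mutex.Dispose()
    }
}

# Single-process timing run: write 500 alert-style lines to a temp file (hypothetical path).
$count = 500
$demoLog = Join-Path ([System.IO.Path]::GetTempPath()) 'lm_alert_central_loadtest.log'
Remove-Item -Path $demoLog -ErrorAction SilentlyContinue
$timer = [System.Diagnostics.Stopwatch]::StartNew()
1..$count | ForEach-Object { Write-AlertLog -LogFile $demoLog -Message "LMD$_,active,warn,demo-host" }
$timer.Stop()
"Wrote $count lines in $($timer.ElapsedMilliseconds) ms"
```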


  • 3 months later...

I tested this quite extensively yesterday and wanted to share my findings. 

  • Doing a basic service restart with the Print Spooler service works fine; however, under no circumstances can I get the default alert note ($alertNote = "Attempted to auto-restart the service. Its new status is " + $tmpService.Status + ".") to appear on the alert in my portal.
  • When the script is in the lib directory, it works. Moving it anywhere outside that directory and specifying the script path (with or without quotes) doesn't work for me. For testing purposes, the service account running the Collector is a domain account and a member of the local Administrators group. Permissions appear identical in both the lib directory and the external test location.

What do you guys think?
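One way to separate "the script never ran" from "the script ran but the note call failed" is to have the script write a trace line the moment it starts. The sketch below is a hypothetical helper (Write-StartupTrace and the trace path are placeholders); in the real script you would define it right after the Param line and call it with $PSCommandPath and $PSBoundParameters. The trace file needs to be somewhere the Collector's service account can write.

```powershell
# Hypothetical diagnostic helper: record that the script started, from which path,
# and with which bound parameters.
function Write-StartupTrace {
    param(
        [string]$TraceFile,
        [string]$ScriptPath,
        [System.Collections.IDictionary]$BoundParameters
    )

    # "name=value" pairs in a stable (sorted) order
    $pairs = foreach ($key in ($BoundParameters.Keys | Sort-Object)) {
        "$key=$($BoundParameters[$key])"
    }

    $line = (Get-Date -Format 'yyyy-MM-dd HH:mm:ss') + " started $ScriptPath " + ($pairs -join '; ')
    Add-Content -Path $TraceFile -Value $line
    return $line
}

# In the real script, right after the Param line (hypothetical trace path):
# Write-StartupTrace -TraceFile 'C:\lm_alert_trace.log' -ScriptPath $PSCommandPath -BoundParameters $PSBoundParameters
```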


5 minutes ago, Stuart Weenig said:

As for posting the note back to LM, did you put in the API stuff here?

$accessId = ''
$accessKey = ''
$company = ''

Without it, your script has no ability to post back to LM.

All 3 of those values are populated with an API key tied to a user with the administrator role. Oddly, though, when I check this key in my portal, its "date last used" field is blank, when I'd expect it to show at least yesterday's date.
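Since AddNoteToAlert quietly does nothing when any of the three values is blank, and builds its URL as 'https://' + $company + '.logicmonitor.com/santaba/rest', one thing worth ruling out is a $company value that is a full hostname or URL rather than just the portal name. That would send the request to the wrong host, which would be consistent with the key never registering a use. Here is a minimal sketch of such a check; Test-LMApiSettings and the portal name 'acme' are hypothetical.

```powershell
# Hypothetical sanity check for the three API settings used by AddNoteToAlert.
# Returns a list of problems; an empty result means nothing obvious was found.
function Test-LMApiSettings {
    param([string]$AccessId, [string]$AccessKey, [string]$Company)

    $problems = @()
    if ([string]::IsNullOrWhiteSpace($AccessId)) { $problems += 'accessId is empty' }
    if ([string]::IsNullOrWhiteSpace($AccessKey)) { $problems += 'accessKey is empty' }

    if ([string]::IsNullOrWhiteSpace($Company)) {
        $problems += 'company is empty'
    } elseif ($Company -match '[./:]') {
        # The script appends '.logicmonitor.com' itself, so only the portal name belongs here
        $problems += "company should be the portal name only (e.g. 'acme'), not a URL or hostname"
    }

    # Stray spaces from copy/paste would change the signature
    if ($AccessId -ne $AccessId.Trim() -or $AccessKey -ne $AccessKey.Trim()) {
        $problems += 'accessId or accessKey has leading/trailing whitespace'
    }
    return $problems
}

Test-LMApiSettings -AccessId 'YOUR_ACCESS_ID' -AccessKey 'YOUR_ACCESS_KEY' -Company 'acme.logicmonitor.com'
```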
