
External Alerting - Script - Medium for self-heal/actions?


Question

I will admit that I had completely forgotten External Alerting was a thing. When we first started with LogicMonitor (3+ years ago), someone at LogicMonitor mentioned External Alerting as a potential solution for a random use case, and I dismissed it out of hand, favoring a Custom HTTP Delivery integration instead.

Fast forward to now: this post, plus all the recent talk about self-heal and actions, sparked an idea.

Some internal partners are building automation tools to resolve issues and are pretty comfortable with some DIY. Originally, I figured I would have to set them up with an AWS API Gateway + Lambda function that receives triggered alerts and then kicks off a cascade of custom code, in the correct AWS VPC, to self-heal. But why bother when we have External Alerting, right? The client environments that I monitor, and that these internal partners manage, each have a dedicated collector. Just assign each client's collector to that client's resource groups, then add a broker-like script that takes in the necessary resource metadata and the datasource and executes the matching remediation script. Alerts for datasources not supported by our self-healing project are simply ignored.
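A minimal sketch of that broker idea, written in Python purely for illustration (the same logic ports to PowerShell on a Windows collector). It assumes the External Alerting entry hands the script the host name and datasource name as command-line arguments; the argument order, the datasource names, and the handler functions are all hypothetical placeholders, not LogicMonitor APIs.

```python
import sys
from typing import Callable, Dict

# Hypothetical remediation handlers; real ones would contain the self-heal logic.
def restart_service(host: str) -> str:
    return f"would restart the service on {host}"

def clean_disk(host: str) -> str:
    return f"would clean up disk space on {host}"

# Hypothetical datasource names -> handler. Only these count as "supported".
HANDLERS: Dict[str, Callable[[str], str]] = {
    "ExampleServiceDS": restart_service,
    "ExampleDiskDS": clean_disk,
}

def dispatch(host: str, datasource: str) -> str:
    """Run the handler for a supported datasource; ignore everything else."""
    handler = HANDLERS.get(datasource)
    if handler is None:
        return f"ignored: {datasource} is not in the self-heal project"
    return handler(host)

if __name__ == "__main__":
    # Assumed argument order: <host> <datasource>
    if len(sys.argv) == 3:
        print(dispatch(sys.argv[1], sys.argv[2]))
    else:
        print("usage: broker.py <host> <datasource>")
```

Keeping the supported datasources in one lookup table means "disregard everything else" is the default path, and adding a new self-heal action is a one-line change.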

This assumes that I'm interpreting External Alerting correctly. The key requirement for my use case is that External Alerting AND our normal Alert Rules can both apply to the same resources/alerts. The Alert Rules would still be responsible for delivering the alert to our ticketing system. It would also help to know when Alert Rules trigger relative to when External Alerting triggers: the support center page makes it sound like the collector polls the resource group at regular, but unspecified, intervals. The Alert Rules would populate the alert with ##externalticketid##, and it would be neat if External Alerting could also take that in as a parameter so the script can update that ticket.
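If External Alerting can pass the ticket ID at all (unverified; that is exactly the open question), the script should still guard against the token arriving empty or unsubstituted, since the ticket may not exist yet when the script fires. A sketch, assuming a hypothetical optional third positional argument carrying the ticket ID:

```python
import argparse
import re
from typing import List, Optional

# Matches a token that arrived literally, e.g. "##EXTERNALTICKETID##".
UNSUBSTITUTED = re.compile(r"^##[A-Z0-9_.]+##$", re.IGNORECASE)

def ticket_id_or_none(raw: str) -> Optional[str]:
    """Return the ticket ID, or None if it is empty or was never substituted."""
    value = raw.strip()
    if not value or UNSUBSTITUTED.match(value):
        return None
    return value

def parse(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="self-heal broker (sketch)")
    parser.add_argument("host")
    parser.add_argument("datasource")
    # Assumed optional third argument carrying the external ticket ID.
    parser.add_argument("ticket", nargs="?", default="")
    args = parser.parse_args(argv)
    args.ticket = ticket_id_or_none(args.ticket)
    return args

if __name__ == "__main__":
    # In real use this would be parse(sys.argv[1:]); the values here are illustrative.
    args = parse(["srv01", "ExampleServiceDS", "##EXTERNALTICKETID##"])
    if args.ticket is None:
        print("no ticket ID available; remediate without updating a ticket")
    else:
        print(f"remediate, then update ticket {args.ticket}")
```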

I would also need to know whether the script executed this way is subject to timeouts, concurrency limits, etc., or whether there is a limit on the number of External Alerting configs.
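Until the platform's limits are known, one defensive option is for the broker to enforce its own timeout on each remediation step, so a hung fix can't tie up the collector. A sketch using Python's `subprocess.run` timeout; the command and the 60-second value are placeholders, not documented LogicMonitor limits.

```python
import subprocess
import sys
from typing import List

def run_with_timeout(cmd: List[str], timeout_s: float) -> str:
    """Run a remediation command; subprocess.run kills it if it exceeds timeout_s."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else f"failed ({result.returncode})"

if __name__ == "__main__":
    # Placeholder command: a Python one-liner standing in for a real remediation script.
    print(run_with_timeout([sys.executable, "-c", "print('remediated')"], timeout_s=60))
```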

Am I way off base? 


5 answers to this question



I'm in a multi-tenant IaaS Windows environment, and I'll need to test my second-hop authentication to make sure I can get from the Collector to the servers we're monitoring. The External Alerting documentation specifically says the script runs on the Collector. That being the case, I would imagine the script can do anything a script could normally do on that Collector.

Caveat from the documentation: "You can only have one External Alert entry per Collector." If you had multiple scripts you wanted to be able to run, they'd need to be tokenized and then pushed through a communication script that starts the pertinent script on the Collector, which in turn performs the corrective action on the client. With that in mind, I'd like to advocate for an External Alerting method that lets the Collector assigned to a device be passed as a token, so a common script could be told which Collector to run from in order to reach the client VM. I would also like External Alerting to be usable as a stage in an escalation chain, potentially using output from the script to inform the escalation somehow. At minimum, a true/false "continue escalation" return flag.
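Given the one-entry-per-Collector limit, the "communication script" can be sketched as a single entry point that maps a token to the script to run and turns the result into a true/false value that could serve as the requested "continue escalation" flag. Nothing in this thread says LogicMonitor consumes such a flag today; the tokens, script paths, the `-ComputerName` parameter, and the exit-code convention (0 = fixed, stop escalating) are all hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Dict, List, Optional

# Hypothetical token -> script registry kept on the Collector.
SCRIPTS: Dict[str, PureWindowsPath] = {
    "RESTART_AOS": PureWindowsPath(r"C:\selfheal\restart_aos.ps1"),
    "CLEAR_SPOOL": PureWindowsPath(r"C:\selfheal\clear_spool.ps1"),
}

def build_command(token: str, target: str) -> Optional[List[str]]:
    """Return the command line for a token, or None if the token is unknown."""
    script = SCRIPTS.get(token.upper())
    if script is None:
        return None
    # -ComputerName is a parameter the hypothetical scripts are assumed to accept.
    return ["powershell.exe", "-NoProfile", "-File", str(script),
            "-ComputerName", target]

def continue_escalation(exit_code: int) -> bool:
    """Assumed convention: exit code 0 means the fix worked, so stop escalating."""
    return exit_code != 0

if __name__ == "__main__":
    print(build_command("restart_aos", "aos01.domain.local"))
    print(continue_escalation(0))  # False: remediation succeeded
```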


This is posted with no warranty.  Use at your own risk.  Don't test it in a production environment:

# Requires "Credential Manager 2.0"
Import-Module credentialmanager

function test-credential {
    param (
        $credential
    )
    # Validate against the domain and return a real [bool]; returning the
    # string "False" would still be truthy in an if statement
    Add-Type -AssemblyName System.DirectoryServices.AccountManagement
    $DS = New-Object       System.DirectoryServices.AccountManagement.PrincipalContext('domain')
    write-output           $DS.ValidateCredentials($credential.UserName,$credential.GetNetworkCredential().Password)
}
function get-creds       {
    param (
        $credName = "$env:USERDOMAIN\$env:USERNAME"
    )
    if ( $credName -eq "list" ) {
        # Select stored user from a list
        $credentials = Get-StoredCredential | ogv -PassThru
    } else {
        # Check for Stored Credentials
        $credentials = Get-StoredCredential | ? username -eq $credName
    }

    if ( $credentials.count -eq 1 ) {

        # Test to make sure creds work before moving forward
        if ( test-credential $credentials[0] ) {

            # They work, return the credentials
            write-output $credentials[0]

        } else {

            # They don't work, have the user update the password
            $cred = Get-Credential `
                -UserName $credName `
                -Message "Please update your password"
                 
            # Test to verify the new creds work
            if ( test-credential -credential $cred ) {

                # Update stored cred and return the credentials
                Remove-StoredCredential `
                    -Target   $credName

                # -Persist LocalMachine keeps the cred beyond this logon session
                New-StoredCredential `
                    -Target   $credName `
                    -UserName $cred.UserName `
                    -Password $cred.GetNetworkCredential().Password `
                    -Persist  LocalMachine

                write-output  $cred

            } else {

                # Updated creds failed, return FALSE
                write-output $false

            }
        }
    } else {

        # Need fresh creds to store for future use
        $cred = Get-Credential `
            -UserName $credName `
            -Message "Enter account password"

        # Test to verify the new creds work
        if ( test-credential -credential $cred ) {

            # Store cred (persisted past this logon session) and return the credentials
            New-StoredCredential `
                -Target   $credName `
                -UserName $cred.UserName `
                -Password $cred.GetNetworkCredential().Password `
                -Persist  LocalMachine

            write-output  $cred

        } else {

            # creds failed, return FALSE
            write-output $false

        }
    }
}

$Computer      = "<1st-Hop Computer>"
$TestComputer  = "<2nd-Hop Computer>"
 
# This was tested with a 'Domain.local' type of domain
$domain        = "<Domain>"
$FQDN          = "$Computer.$domain"

# These credentials need to have access to both computers
# (assumed, not tested - based on my understanding of the CredSSP token auth process)
$cred          = get-creds

#region Enable-CredSSPDelegation

# write-host   ""
$sessionRemote = New-PSSession $Computer

# write-host "get wsmancredssp from remote/delegate"
$remoteSetting = Invoke-Command `
    -Session      $sessionRemote `
    -ScriptBlock  {
        Get-WSManCredSSP
    }

# write-host "set wsmancredssp on remote"
Invoke-Command `
    -Session      $sessionRemote `
    -ScriptBlock  {
        Enable-WSManCredSSP -Role Server -Force
    }

# write-host "set wsmancredssp on local with remote as delegate"
# -Wait so the client setting is applied before the session is created
Start-Process     powershell.exe `
    -Verb         runas      `
    -Wait                    `
    -ArgumentList "-command & {Enable-WSManCredSSP -Role Client -DelegateComputer $FQDN -Force}"

# write-host "Connecting WSMan to remote"
Connect-WSMan     $FQDN
Set-Item          WSMan:\$FQDN\Service\Auth\CredSSP -Value $True

Start-Sleep -Seconds 15

# write-host "Starting CredSSP session"
$workingSession = New-PSSession `
    -ComputerName   $FQDN       `
    -Authentication Credssp     `
    -Credential     $cred

#endregion

write-host "--- Starting Processing ---"

# This just looks for the print spooler service on the 2nd-Hop Computer
# It runs the invoke-command on the 1st-Hop Computer to do so
# This Scriptblock is where the payload goes
Invoke-Command                      `
    -session        $workingSession `
    -ScriptBlock    {
        # $using: passes the local $TestComputer into the remote session
        get-service *spool* -ComputerName $using:TestComputer
    }

write-host "--- Finished Processing ---"

#region Disable-CredSSPDelegation

# write-host "Close Session"
Remove-PSSession -Session $workingSession

# write-host "Disconnecting WSMan from remote"
Set-Item         WSMan:\$FQDN\Service\Auth\CredSSP -Value $False
Disconnect-WSMan $FQDN

# write-host "set wsmancredssp from local back to initial settings"
Start-Process powershell.exe `
    -Verb runas `
    -Wait `
    -ArgumentList "-command & {Disable-WSManCredSSP -Role Client}"

# write-host "set wsmancredssp from remote/delegate back to initial settings"

# Match only the server-role line, not the client-delegation line
if ( $remoteSetting -like "*not configured to receive credentials*" ) {
    Invoke-Command `
        -Session $sessionRemote `
        -ScriptBlock  {
            Disable-WSManCredSSP -Role Server
        }
}

Remove-PSSession -Session $sessionRemote

#endregion

No, I'm not OCD, what makes you think that?  I like neat vertical lines I can follow for reference in my code.  This all collapses very nicely in Powershell ISE.


For timing between steps in an escalation, we're using blank steps to create time gaps. For instance, AOS services (Dynamics AX) often come up slowly. We'd like our team to be notified if the service hasn't come back within 5 minutes; if it's been half an hour, we need to either notify the customer or generate a ticket. We set a 5-minute interval between steps, then set the steps up as follows:

  1. (blank)
  2. email internal support
  3. (blank)
  4. (blank)
  5. (blank)
  6. email customer
  7. generate a ticket
  8. (blank)

We leave the last step blank after generating a ticket to keep the system from sending more tickets into our ticketing system (ours is currently email-driven), because the escalation chain repeats the last step for as long as the alert remains unresolved and unacknowledged.
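The resulting timeline can be checked with a little arithmetic, assuming step 1 fires when the alert is raised and each later step fires one 5-minute interval after the previous one (a simplification of how the escalation interval is applied). A small Python sketch:

```python
from typing import Dict, List, Optional

def step_times(steps: List[Optional[str]], interval_min: int) -> Dict[int, str]:
    """Map minutes-after-alert to the action of each non-blank step.

    Assumes step 1 fires at t=0 and step n fires at (n - 1) * interval_min.
    """
    return {
        (n - 1) * interval_min: action
        for n, action in enumerate(steps, start=1)
        if action
    }

# The chain described above; None marks a deliberately blank step.
CHAIN = [None, "email internal support", None, None, None,
         "email customer", "generate a ticket", None]

if __name__ == "__main__":
    for minute, action in step_times(CHAIN, 5).items():
        print(f"t+{minute:>2} min: {action}")
```

This prints internal support at t+5, the customer email at t+25, and the ticket at t+30; step 8 (blank) lands at t+35 and is what repeats afterwards, so nothing further is sent.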

