Joe Tran

Members
  • Content Count

    127
  • Days Won

    14

Posts posted by Joe Tran

  1. What's your appetite for increasing your CloudWatch costs and some DIY scripting? You would just need to configure the CloudWatch agent to submit system metrics and DIY something to submit anything it doesn't do out-of-box as a custom metric.  With LM Cloud, an add-on module, the cloud collector can fetch any CloudWatch metric, custom or otherwise, as long as the Datapoint metric path can be constructed without much logic. 

    As for RDS MS SQL Server instances, with LM Cloud, you do have access to the CloudWatch metrics. You can install a collector on an existing app server (one that already talks to the database) to perform custom queries/assign custom datasources. 

    The last option is to deploy the collector like an agent (e.g. nano or small sized). I wouldn't recommend it though. This was a management nightmare for us, but granted, this was for an on-premises customer and they were very strict as to what we were allowed to do. YMMV. 
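
    As a rough illustration of the DIY custom-metric route, here is a minimal Python sketch. It assumes boto3 is installed and has credentials; the function names, namespace, and metric name are hypothetical and not something LogicMonitor or the CloudWatch agent provides.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="None", dimensions=None, timestamp=None):
    """Build one entry for the MetricData list accepted by CloudWatch PutMetricData."""
    datum = {
        "MetricName": name,
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return datum


def publish(namespace, data, client=None):
    """Send the datums to CloudWatch; boto3 is imported lazily so the builder is testable."""
    if client is None:
        import boto3  # third-party; assumed to be installed and configured

        client = boto3.client("cloudwatch")
    return client.put_metric_data(Namespace=namespace, MetricData=data)


# Hypothetical usage (namespace and metric name are made up):
# publish("Custom/MyApp", [build_metric_datum("ActiveSessions", 42, unit="Count",
#                                             dimensions={"InstanceId": "i-0123"})])
```

    Keeping the namespace, metric name, and dimensions predictable is what lets the cloud collector construct the Datapoint metric path without extra logic.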

  2. 🤔 ... LogicMonitor's StatusPage allows for webhook integrations. Something could be designed to consume (or scrape) those events. This would require LM to post their planned maintenance there, which they do not do.

    @LogicMonitor --- is there a reason why planned maintenances and portal upgrades are not announced on StatusPage?

  3. Looking at the innards of the service-detector.jar, which is where I think the canonical LMRequest class is defined, you won't be able to do this with an internal web check (scripted or out-of-box) as documented.

    You would be able to do this with a scripted Datasource though--something similar to this: https://stackoverflow.com/questions/21223084/how-do-i-use-an-ssl-client-certificate-with-apache-httpclient. The libraries listed in the SO thread solution (except for junit, which isn't necessary) are available to the current GA collector. 

    I have not attempted to use the apache httpclient libraries in a scripted internal web check... yet. So if you feel adventurous 😉...

  4. I had DIY'd a scripted Groovy datasource ages ago. Not sure if this was exactly what PortMulti- did, but the results should be similar. 

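    // ##WILDVALUE## is expected to look like "host:port"
    (host, port) = "##WILDVALUE##".split(':')
    try {
        // Open and immediately close a TCP connection; 0 means the port accepted it
        s = new Socket(host, port.toInteger())
        s.close()
        return 0
    } catch (ConnectException ce) {
        // Typically "connection refused": nothing is listening on the port
        println ce.toString()
        return 1
    } catch (BindException be) {
        // The local side could not bind a socket
        println be.toString()
        return 4
    } catch (NoRouteToHostException re) {
        // Host unreachable, e.g. blocked by a firewall or a routing problem
        println re.toString()
        return 2
    } catch (PortUnreachableException pe) {
        println pe.toString()
        return 3
    } catch (Exception e) {
        // Anything else (unknown host, malformed wildvalue, ...)
        println e.toString()
        return 5
    }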

    You would then just set up a datapoint to capture the exitCode and alert if it is > 0. 

  5. Assuming you leverage and consume custom alert messaging, you can define the KB article at the datasource template level. Taking your CPU utilization example, go to your CPU datasource LogicModule, and add the URL to the custom alert messaging for the desired datapoint triggering alerts. 

    If the KB differs between subsets of resources, then the alert messaging should reference a custom property that is assigned to or inherited by the resource. For example, use ##CPU_KB_URL## in the message, then assign the cpu_kb_url property to (or let it be inherited by) your different subsets of resources. 

    This does mean you will have to maintain these properties in LM. 
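
    To make the token idea concrete, here is a small Python sketch of how a ##CPU_KB_URL## token could resolve against a resource's properties. It only illustrates the lookup and is not LogicMonitor's implementation; the function name, the sample URL, and the case-insensitive matching are assumptions.

```python
import re

# Matches tokens of the form ##NAME##.
TOKEN = re.compile(r"##([A-Za-z0-9_.]+)##")


def render_alert_message(template, properties):
    """Replace ##TOKEN## placeholders with property values (case-insensitive lookup)."""
    lookup = {key.lower(): value for key, value in properties.items()}

    def substitute(match):
        # Leave unknown tokens untouched so gaps are easy to spot in the output.
        return lookup.get(match.group(1).lower(), match.group(0))

    return TOKEN.sub(substitute, template)


message = render_alert_message(
    "CPU is high on ##HOST##. Runbook: ##CPU_KB_URL##",
    {"host": "app01", "cpu_kb_url": "https://kb.example.com/cpu"},
)
```

    A resource group for one team would carry a different cpu_kb_url value, and the same alert template would pick it up.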

  6. It looks like someone in my org had enabled "Enhanced Monitoring" for several AWS RDS instances--a surprise, to be sure, but a welcome one.

    I would love a Cloud Collector method that can consume this data and display it alongside all the other metrics we are collecting in LogicMonitor.

    Implementation should be relatively simple.

    In the discovery, presumably using describe-db-instances, we would just need a system.aws* property for the "dbiresourceid", which can then be used with get-log-events.
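
    A minimal Python sketch of that flow, assuming boto3-style RDS and CloudWatch Logs clients are passed in. The function names are hypothetical; the sketch relies on Enhanced Monitoring writing JSON samples to the RDSOSMetrics log group, with one log stream per DbiResourceId.

```python
import json

LOG_GROUP = "RDSOSMetrics"  # log group that Enhanced Monitoring writes to


def resource_ids(rds_client):
    """Map DBInstanceIdentifier -> DbiResourceId for every instance in the region."""
    ids = {}
    for page in rds_client.get_paginator("describe_db_instances").paginate():
        for instance in page["DBInstances"]:
            ids[instance["DBInstanceIdentifier"]] = instance["DbiResourceId"]
    return ids


def latest_os_metrics(logs_client, dbi_resource_id):
    """Return the newest Enhanced Monitoring sample as a dict, or None if there is none."""
    response = logs_client.get_log_events(
        logGroupName=LOG_GROUP,
        logStreamName=dbi_resource_id,  # the stream is named after the DbiResourceId
        limit=1,
        startFromHead=False,  # read from the end of the stream, i.e. the newest event
    )
    events = response.get("events", [])
    return json.loads(events[-1]["message"]) if events else None


# Hypothetical usage with real clients (requires boto3 and credentials):
# import boto3
# ids = resource_ids(boto3.client("rds"))
# sample = latest_os_metrics(boto3.client("logs"), ids["my-db-instance"])
```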

     

  7. We have website-Overall trigger critical alerts and individual test locations trigger errors. I have toyed with the idea of a script that scrapes our ticketing system's API (ServiceNow) for website alerts, queries the LM REST API for the alert message from the one/all error level alerts for that monitor object, and adds that "context" to the ticket. 

    We're pretty heavy in AWS so this would be all done via one or more Lambda Functions. Biggest downside--this is asynchronous. 

  8. On 8/5/2019 at 2:23 PM, Joe Williams said:

    For the record I tried to be cheeky and used Developer Tools to see what was being done to clone a dashboard and I can't even use that to get it working. Comes up with 'HTTP Status 415 – Unsupported Media Type'.

    I was recently able to use the v2 REST API to clone dashboards. 

    1. HTTP GET your "template" dashboard via /dashboard/dashboards/{id}
    2. Modify the "template" to suit the new clone and delete the following keys: id, template, userPermission, shareable (and maybe groupName and groupFullPath; I was cloning private dashboards, so these keys don't apply to my immediate use case)
    3. HTTP POST your modified "template" to /dashboard/dashboards/{id}/clone

    *insert disclaimer about undocumented REST API features* 
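
    The "modify the template" step can be sketched in Python as below. The helper name is hypothetical, and setting a "name" key on the payload is an assumption about the dashboard object; the keys removed are the ones listed in the steps above. The GET and POST calls themselves are left out.

```python
import copy

# Keys the steps above say to delete before cloning; the "maybe" keys are opt-in.
READ_ONLY_KEYS = ("id", "template", "userPermission", "shareable")
GROUP_KEYS = ("groupName", "groupFullPath")


def prepare_clone_payload(template, new_name, drop_group_keys=False):
    """Return a cleaned copy of a dashboard fetched via GET, ready to POST to .../clone."""
    payload = copy.deepcopy(template)  # never mutate the fetched template
    for key in READ_ONLY_KEYS + (GROUP_KEYS if drop_group_keys else ()):
        payload.pop(key, None)
    payload["name"] = new_name  # assumed: the dashboard's display name lives under "name"
    return payload
```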

  9. Assuming this is for the website-Overall instance--I'll just leave these other posts about improving and providing better context for Website (formerly Services) alerting:

     

     

    I am guessing the issue is one of scoping. Both website-Overall and the individual test locations are treated as separate instances. LogicMonitor doesn't have a good mechanism for instances to share data and metadata. I have contemplated creating a job/script/function that crawls our ticketing system for Website alert tickets, polls the LM API for the message from the individual test location (which we populated with the ##WEBSITERESPONSE## token), and inserts that into the ticket. 🤷‍♂️

  10. As I am sitting here, trying to explain to one of our internal partners, for what seems like the umpteenth time, how to read an alert threshold expression from a ##THRESHOLD## token--it would be great if there were individual message tokens for each of the thresholds. 

    Something like ##WARNINGTHRESHOLD##, ##ERRORTHRESHOLD##, and ##CRITICALTHRESHOLD## that should render the comparison operator and that respective threshold value, example---

    Quote

    >= 20971520

     

    This way, I can be clearer about what this string of numbers actually means, like so:
     

    Quote

    Warning Threshold: ##DATAPOINT## ##WARNINGTHRESHOLD##

    Error Threshold: ##DATAPOINT## ##ERRORTHRESHOLD##

    Critical Threshold: ##DATAPOINT## ##CRITICALTHRESHOLD##
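
    Until such tokens exist, the split could be done outside LM. Here is a minimal Python sketch, assuming the threshold string has the positional form "<operator> <warning> <error> <critical>"; the function name and the sample values other than 20971520 are hypothetical.

```python
LEVELS = ("Warning", "Error", "Critical")


def split_threshold(expression):
    """Split an '<operator> <warn> <error> <critical>' string into per-level parts."""
    operator, *values = expression.split()
    if not values or len(values) > len(LEVELS):
        raise ValueError(f"unexpected threshold expression: {expression!r}")
    # Values are positional: first is warning, second is error, third is critical.
    return {level: f"{operator} {value}" for level, value in zip(LEVELS, values)}


parts = split_threshold(">= 20971520 31457280 41943040")
```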

     

  11. I have a version of the "Oracle_DB_BlockedSessions" datasource template deployed and set an alert threshold on a complex datapoint that accounts for WAIT_TIME and SECONDS_IN_WAIT. Here is the complex datapoint expression for those curious---

    if( eq(if(un(WAIT_TIME),0,WAIT_TIME), 0), if(un(SECONDS_IN_WAIT_RAW),0,SECONDS_IN_WAIT_RAW), 0)

    If the complex datapoint has a value over 300 seconds, an alert triggers with all the enriched instance-level autoProps from the Active Discovery script. All other aspects of this template mirror the gold-standard version--including enabling the "Automatically Delete Instance" option. 
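
    For anyone who finds the nested if() hard to read, here is the same logic as a Python sketch. The function names are hypothetical, and "no data" is modelled as None or NaN, which is an assumption about how un() behaves.

```python
import math


def un(value):
    """Mirror un(): true when the datapoint has no data (modelled here as None or NaN)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def blocked_seconds(wait_time, seconds_in_wait_raw):
    """Seconds the session has been waiting, or 0 when it is not currently waiting."""
    wait_time = 0 if un(wait_time) else wait_time
    seconds = 0 if un(seconds_in_wait_raw) else seconds_in_wait_raw
    # WAIT_TIME == 0 means the session is currently waiting, so SECONDS_IN_WAIT applies.
    return seconds if wait_time == 0 else 0
```

    With the 300 second threshold, blocked_seconds(0, 450) would alert while blocked_seconds(5, 450) would not.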

    Enter Client X, and they are comfortable with a threshold of 900 seconds. How can I set this custom threshold at a resource group for Client X when they don't currently have any blocking sessions? If I do manage to catch and set this Alert Tuning customization when Client X has a blocking session, will this alert tuning get wiped out when the DSIs are removed automatically?

    I suppose the Active Discovery script could be modified to always output a dummy instance... but that leaves an unpleasant taste in my mouth. 

    Aside from cloning the datasource just for Client X, are there any other alternatives?

    And no, I do not want to alert off of the "Oracle_DB_BlockedSessionOverview" template because it doesn't do a good job of discerning between one really long blocking session versus sequential and short-lived sessions that happen to exist at the time of the poll. 

     

  12. I don't typically use the /device/devices/{id}/properties/{name} endpoint but I would give the following a try:
     

    # Construct URL
    $resourcePath = "/device/devices/2332/properties/Failover.Cluster.ParentGUID"
    $url          = $URLRoot + $resourcePath

    # Request body; $ClusterID is expanded inside the here-string
    $data = @"
    {
        "type"  : "custom",
        "name"  : "Failover.Cluster.ParentGUID",
        "value" : "$ClusterID"
    }
    "@

    $response = Send-Request        `
        -accesskey    $accessKey    `
        -accessid     $accessId     `
        -URL          $url          `
        -data         $data         `
        -httpVerb     "PUT"

    The type key might not be needed. The SwaggerDoc on this endpoint is weird. It literally says that the POST method is supported but all the keys in the model are readOnly?
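
    The Send-Request helper isn't shown in this thread. As a hedged sketch of what an equivalent might do, the Python below builds the same JSON body and an LMv1 Authorization header value; the function names are hypothetical, and the sketch assumes the portal uses LMv1 token authentication.

```python
import base64
import hashlib
import hmac
import json
import time


def property_body(name, value):
    """JSON body for the property update, mirroring the keys used above."""
    return json.dumps({"type": "custom", "name": name, "value": value})


def lmv1_auth_header(access_id, access_key, http_verb, resource_path, data="", epoch_ms=None):
    """Build the Authorization header value for the LMv1 token scheme."""
    epoch = str(epoch_ms if epoch_ms is not None else int(time.time() * 1000))
    # Signed string: verb + epoch (ms) + request body + resource path, in that order.
    message = http_verb + epoch + data + resource_path
    digest = hmac.new(access_key.encode(), message.encode(), hashlib.sha256).hexdigest()
    signature = base64.b64encode(digest.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch}"
```

    The body passed to lmv1_auth_header must be byte-for-byte the same string that is sent in the PUT, otherwise the signature will not match.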

     

     

  13. For /device/devices, each inheritedProperties object should include a key-value pair that identifies the hostGroupId the property is inherited from.

    Example

    {
        "status": 200,
        "errmsg": "OK",
        "data": {
            "total": 1,
            "items": [
                {
                    "name": "hostname",
                    "inheritedProperties": [
                        {
                            "name": "keyname1",
                            "value": "value1",
                            "inheritedFromHostGroupId": 2
                        }
                    ]
                }
            ]  
        }
    }

    I'm not super tied to the name of the proposed key 😉

     

    Thanks!

  14. Yup, customProperties object only has directly assigned props. 

    What you want are the inheritedProperties for the /device/devices API endpoint--

     $queryParams = '?fields=inheritedProperties,name,id&size=1000&filter=inheritedProperties.name:servicenow.companyid,inheritedProperties.value:DefaultCODE';

     

    P.S. Thanks for bringing this up. I've been trying to find where the "aws.accountid" is grouped under and, lo and behold, it's an inheritedProperty!

  15. Resurrecting this feature request to make any such helper class available to PropertySources and any other LogicModule that supports embedded Groovy scripts. 

    I was just asked by a fellow engineer if we could track when specific proprietary software is updated and correlate that with system metrics. A PropertySource would be ideal for this type of version tracking--we would just need to be able to create OpsNotes from within the embedded Groovy. 

  16. What type of report are you running? 

    If this is a custom report using data from the REST API, the location property wouldn't be a customProperties property in this particular scenario, but an inheritedProperties property. 

    IIRC, the Device Inventory report type in the Reports UI should be able to give you inherited properties without issue.