Cole McDonald

Everything posted by Cole McDonald

  1. If you have a website that accepts a URL structure that can be built from device/instance property values, you can have the alert generate the URL from the properties of the instance that triggered it and send you to the appropriate page. You may need to build a redirection page within your site that receives and interprets those URLs. ##DATASOURCE## might be the right token to use for building that decision/redirection tree.
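A redirection page of that kind could look roughly like the sketch below. It is written in Python purely for illustration; the query parameter names (`datasource`, `host`), the route table, and the target paths are all hypothetical, and the alert would have to be set up to fill those parameters from its tokens.

```python
from urllib.parse import urlsplit, parse_qs, quote

# Hypothetical routing table: datasource name -> page template on your own site.
ROUTES = {
    "WinCPU": "/runbooks/cpu/{host}",
    "Ping": "/runbooks/network/{host}",
}
DEFAULT_ROUTE = "/runbooks/general/{host}"


def resolve_redirect(alert_url: str) -> str:
    """Map an alert-generated URL onto the internal page it should open."""
    query = parse_qs(urlsplit(alert_url).query)
    datasource = query.get("datasource", [""])[0]
    host = query.get("host", ["unknown"])[0]
    template = ROUTES.get(datasource, DEFAULT_ROUTE)
    # Re-encode the host so it is safe to place in a path segment.
    return template.format(host=quote(host, safe=""))
```

Keeping the decision tree in one table means a new datasource only needs a new entry, not a new alert template.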
  2. An ACK should be removable if it turns out a user applied it by mistake.
  3. That's what I've been doing so far. Any change to it, though, means you still have to make the change and then distribute it manually. So it's pretty much set in stone once you've produced it at scale. Templating would be a way to make changes without having to take the process to that extreme.
  4. You would bring in the ping as a "do not display" value, then make a virtual datapoint that uses an if() to test whether the value is > 100 and returns NaN for the false condition.
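The logic of that virtual datapoint, sketched in Python rather than in the platform's own expression syntax. The function name and the `threshold` parameter are illustrative only.

```python
import math


def ping_over_threshold(ping_ms: float, threshold: float = 100.0) -> float:
    """Return the ping value when it exceeds the threshold, otherwise NaN."""
    return ping_ms if ping_ms > threshold else math.nan
```

Returning NaN rather than 0 for the false branch means the graph shows a gap instead of a misleading zero.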
  5. The alerts take you to a specific instance, so it should be possible. The URL structure is this: https://<companyName><deviceInstanceNumber> You should be able to derive the instance # from the REST API... but you state that you're a bit of a novice. The token might be ##system.instanceid## or just ##instanceID##
  6. My ability to edit the previous post timed out (annoying)... here's the final thing I was going for:

         # Get-Date -Format HH returns the hour as a zero-padded string, e.g. "10"
         if ( (Get-Date -Format HH) -eq 10 ) {
             if ( Test-Path "\\servername\C$\Path\To\File.txt" ) {
                 Write-Output "1"
             } else {
                 Write-Output "0"
             }
         }

     It just checks the hour on the 24-hour clock to see if it's 10. Have it fire once an hour.
  7. I don't have a ready-made one, but you can make a DS whose "AppliesTo" is system.hostname=="oneOfYourCollectors" (make sure that collector is in the same domain as the resource with the UNC path you want to check). Then something like this as a PowerShell script should do the trick:

         if ( Test-Path "\\servername\C$\Path\To\File.txt" ) {
             Write-Output "1"
         } else {
             Write-Output "0"
         }
  8. With the issues we've been having with collector resource exhaustion, I've been thinking about ways to reduce the number of dataSources that run on the collectors at any given time. It occurs to me that if a host is down, all of the dataSources are still trying to run against it and have to wait for a timeout before releasing their resources on the collector. I'd like to propose that a host down status should issue a partial SDT for the device that would prevent all but the host status dataSources from running against that device. The host status change could then remove the SDT once it has cleared. This would prevent devices in environments that spin them up / down based on load rather than a schedule from taking up too many collector system resources during their down time. It would also help alleviate strain during larger outages while still providing just the actionable information necessary to address the situation.
  9. or for sites that place dashboards up on big NOC displays at the front of a call center.
  10. No such luck... could you verify that this is the correct URL structure for this, @Forrest Evans - LM? Even being really pedantic about it, it still isn't taking it:,preferredCollectorId Same error:

         Invoke-RestMethod : {"errorMessage":"custom property name cannot be predef.externalResourceID\ncustom property name cannot be predef.externalResourceType\n","errorCode":1404,"errorDetail":null}
         At C:\Scripts\Balance-LMABCG.ps1:51 char:24
         + $response = Invoke-RestMethod `
         +             ~~~~~~~~~~~~~~~~~~~
             + CategoryInfo : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-RestMethod], WebException
             + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
  11. Thank you, Forrest. As always, a font of great information. I assume from the reading that the default is "refresh". From the link above (to save folks some clicking): "opType=replace indicates that the properties included in the request payload will be added if they don't already exist, or updated if they do already exist, but all other existing properties will remain the same" I'll be putting this in place today and I'll report back... hopefully with a new $data block for the script and calling it done!
  12. No updates from LM on this last point. It's the last piece I've got on this. I'm going to put it in place anyway, as it works on the other ABCGs in our environment and doesn't do anything to the others when it fails because the Azure resources are REST API incompatible due to naming convention issues with their properties. I suspect it's just a miss and will magically start working at some point. (Again, this is where I point out that we're not liable if you implement this and it doesn't work in your environment.) customProperties : {@{name=predef.externalResourceID; ... hopefully they get this sorted out soon.
  13. My thought for passing data from frame to frame was to write to temp properties of the dashboard and use them in the other widgets. I was going to use something like this to make a slicer so you could have a list of devices in a graph and select which ones you want to have show up.
  14. So... I'm on Windows; your response came in as I was getting all of that code in there. It'll be functionally similar for Linux. Not sure if it will work, but you may be able to install PowerShell Core (free from Microsoft) on your Linux collector to see if that works there. (I don't know Groovy Script yet.)
  15. The script:

         #!!! Requires Credential Manager 2.0 from the repository !!!#
         Import-Module CredentialManager

         function Send-Request {
             param (
                 $cred,
                 $accessid = $null,
                 $accesskey = $null,
                 $URL,
                 $data = $null,
                 $version = '2',
                 $httpVerb = "GET"
             )

             if ( $accessId -eq $null ) {
                 $accessId = $cred.UserName
                 $accessKey = $cred.GetNetworkCredential().Password
             }

             <# Use TLS 1.2 #>
             [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

             <# Get current time in milliseconds #>
             $epoch = [Math]::Round( ( New-TimeSpan `
                 -start (Get-Date -Date "1/1/1970") `
                 -end (Get-Date).ToUniversalTime()).TotalMilliseconds )

             <# Concatenate Request Details #>
             # $resourcePath is not a parameter; it is read from the caller's scope
             $requestVars = $httpVerb + $epoch + $data + $resourcePath

             <# Construct Signature #>
             $hmac = New-Object System.Security.Cryptography.HMACSHA256
             $hmac.Key = [Text.Encoding]::UTF8.GetBytes( $accessKey )
             $signatureBytes = $hmac.ComputeHash( [Text.Encoding]::UTF8.GetBytes( $requestVars ) )
             $signatureHex = [System.BitConverter]::ToString( $signatureBytes ) -replace '-'
             $signature = [System.Convert]::ToBase64String( [System.Text.Encoding]::UTF8.GetBytes( $signatureHex.ToLower() ) )

             <# Construct Headers #>
             $auth = 'LMv1 ' + $accessId + ':' + $signature + ':' + $epoch
             $headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
             $headers.Add( "Authorization", $auth )
             $headers.Add( "Content-Type" , 'application/json' )
             # uses version 2 of the API
             $headers.Add( "X-version" , $version )

             <# Make Request #>
             $response = Invoke-RestMethod `
                 -Uri $URL `
                 -Method $httpVerb `
                 -Body $data `
                 -Headers $headers

             $result = $response
             Return $result
         }

         function Get-LMRestAPIObjectListing {
             param (
                 $URLBase,
                 $resourcePathRoot, # "/device/devices"
                 $size = 1000,
                 $accessKey,
                 $accessId
             )

             $output = @()
             $looping = $true
             $counter = 0

             while ($looping) {
                 # re-calc offset based on iteration
                 $offset = ($counter * $size) + 1
                 $resourcePath = $resourcePathRoot
                 $queryParam = "?size=$size&offset=$offset"
                 $url = $URLBase + $resourcePath + $queryParam

                 # Make Request
                 $response = Send-Request `
                     -accesskey $accessKey `
                     -accessid $accessId `
                     -URL $url

                 if ( $response.items.count -eq $size ) {
                     # Return set is full, more items to retrieve
                     $output += $response.items
                     $counter++
                 } elseif ( $response.items.count -gt 0 ) {
                     # Return set is not full, store data, end loop
                     $output += $response.items
                     $looping = $false
                 } else {
                     # Return set is empty, no data to store, end loop
                     $looping = $false
                 }
             }

             write-output $output
         }

         #!!! Change to your company name !!!#
         $company = "yourCompanyHere"
         $URLBase = "https://$"

         # This will resolve to proper values if it's being run from inside LM
         $accessID = "##Logicmonitor.AccessID.key##"
         $accessKey = "##Logicmonitor.AccessKey.key##"

         if ( $accessID -like "##*" ) {
             # Not being run from inside LM - populate manually for testing
             Import-Module CredentialManager
             $Cred = Get-StoredCredential -Target LogicMonitor
             $accessID = $cred.UserName
             $accessKey = $Cred.GetNetworkCredential().Password
         }

         #!!! Populate the pertinent ID numbers from the "Info" section of the LM objects
         $deviceNumber = 123
         $dataSourceNumber = 456
         $instanceNumber = 789

         #region Get instance data
         $resourcePath = "/device/devices/$deviceNumber/devicedatasources/$datasourceNumber/instances/$instanceNumber/data"
         $response = Get-LMRestAPIObjectListing `
             -resourcePathRoot $resourcePath `
             -accessKey $accessKey `
             -accessId $accessID `
             -URLBase $URLBase
         #endregion

     To find the ID numbers, you can build out the $resourcePath a piece at a time, for the last few bits, until you come across the pieces you need. I develop in PowerShell ISE because it lets me explore the data more easily once I've populated it.
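For anyone not on PowerShell, the two core ideas in the script (the LMv1 signature and the paging loop) can be sketched in Python as below. The signature follows the same steps as the script: HMAC-SHA256 over verb + epoch + body + path, hex digest in lowercase, then Base64 of that hex string. The `fetch_page` callable is hypothetical, and the paging helper assumes a zero-based offset, which differs from the `+ 1` in the script and should be checked against the API documentation.

```python
import base64
import hashlib
import hmac
import time


def lmv1_auth_header(access_id, access_key, http_verb, resource_path,
                     data="", epoch_ms=None):
    """Build the 'LMv1 id:signature:epoch' Authorization value."""
    if epoch_ms is None:
        epoch_ms = int(time.time() * 1000)
    # Same concatenation order as the script: verb + epoch + body + path.
    request_vars = f"{http_verb}{epoch_ms}{data}{resource_path}"
    digest_hex = hmac.new(access_key.encode("utf-8"),
                          request_vars.encode("utf-8"),
                          hashlib.sha256).hexdigest()  # already lowercase
    signature = base64.b64encode(digest_hex.encode("utf-8")).decode("ascii")
    return f"LMv1 {access_id}:{signature}:{epoch_ms}"


def fetch_all(fetch_page, size=1000):
    """Collect every item from a paged listing.

    fetch_page(offset, size) is a hypothetical callable returning one page
    as a list; offsets are ASSUMED to be zero-based here.
    """
    items, offset = [], 0
    while True:
        page = fetch_page(offset, size)
        items.extend(page)
        if len(page) < size:  # short or empty page: nothing left to fetch
            return items
        offset += size
```

Passing `epoch_ms` explicitly makes the header reproducible, which helps when comparing against the value the PowerShell version produces for the same inputs.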
  16. You can grab historical data for a dataSource instance using the REST API. Once you've got the time range you want to evaluate, finding the max should be relatively simple. Let me fish up a thread with how to grab those counters for you... I found my thread for tokenizing the return using PowerShell, but apparently I didn't include the data grab portion of the code in the thread.
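Once the data is back, finding the maximum might look like the Python sketch below. It assumes the response has been parsed into a dict with a `dataPoints` list of names and a `values` list of rows (one value per datapoint per timestamp), and that gaps appear as the string "No Data" or as null. That shape is an assumption and should be checked against a real response.

```python
import math


def max_per_datapoint(payload):
    """Return {datapoint name: max numeric value}, skipping gaps."""
    names = payload["dataPoints"]
    best = {}
    for row in payload["values"]:
        for name, raw in zip(names, row):
            try:
                value = float(raw)
            except (TypeError, ValueError):
                continue  # "No Data", None, or anything else non-numeric
            if math.isnan(value):
                continue
            if name not in best or value > best[name]:
                best[name] = value
    return best
```

A datapoint with no numeric samples in the range is simply absent from the result, so the caller can tell "no data" apart from a real maximum of zero.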
  17. We are in a Microsoft clustered environment: The cluster hosts have multiple vNICs and therefore multiple IPs. Each of those IPs has a DNS entry associated with it. As such, LM sees them as separate entities and creates devices for each of them. Every WMI counter that wants to check on the host is also firing for each of the vNICs. The DNS entries associated with the secondary vNICs on the host each relate to a customer environment. We'd like to be able to present each customer's metrics to them (ultimately server CPU, MEM, NET) for their part of that resource in their dashboards. Can anyone think of a way to get LM to not duplicate WMI effort on those servers? Is it just a matter of waiting for the dependency mapping to come out and start alleviating those issues? Do I need to write something to prevent them from collecting that data? If I do so, will that prevent the customer's dashboard from presenting the data there? I sound like the end of an old-timey episodic radio show. Stay tuned next week...
  18. I've found an issue that I'm working with LM to address. If the device is an auto-discovered Azure resource, some of the custom properties applied during that process seem to not allow the REST API to alter the device at all. So if you have Azure-based resources in a collector group, they won't move. The specific property names mentioned in the error message start with "predef." (predef.externalResourceID, predef.externalResourceType).

         Invoke-RestMethod : {"errorMessage":"custom property name cannot be predef.externalResourceID\ncustom property name cannot be predef.externalResourceType\n","errorCode":1404,"errorDetail":null}
         At C:\Scripts\Balance-LMABCG.ps1:49 char:24
         + $response = Invoke-RestMethod `
         +             ~~~~~~~~~~~~~~~~~~~
             + CategoryInfo : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-RestMethod], WebException
             + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand

     So currently, my code won't work for Azure resources. Works like a champ for us on Hyper-V hosted VMs and physicals.
  19. Here's the load drop after changing the frequency of them.
  20. Based on this, we should be able to support 800 devices / collector (large) for WMI queries and we have two of them in the group... so 1600 devices: We're nowhere near that. Reducing the frequency on the Cluster and hyperv counters to 5 minutes seems to have taken off quite a bit of load. My active ports are now down to reasonable counts on the collector again.
  21. I've also lowered the timeout of the WMI calls to allow them to fail faster. They're bottlenecking pretty hard, entirely based on available ports and the close_wait timeout required by the TCP/IP spec. I've even gone so far as to triple the number of available ephemeral ports, since the collectors we've got are used for nothing but LM. The problem is that LM doesn't seem to care whether any ports are available; it'll queue the requests no matter the state of the collector itself. Once they start timing out, they become a log jam that builds upon itself and brings the collection of metrics to a screeching halt.

     Since we use our data collection to prove our SLA to our customers, I have to alleviate that ... and balance it against the spend on extra VMs to increase that capacity. We have no collector groups with > 250 devices, which is well under the capacity we were told they should be able to support. Since that doesn't seem to be the case, we're having to re-provision our VM spend for the system and address our messaging to our customers. We'd initially told them this change would reduce our need for extra VMs in their environments, as we'd needed with SCOM (we can get into trust and security model discussions about open access vs. gateway'd single point of trust later). But those 200+ servers are choking out our two (really well balanced) collectors, so we're going to be forced to re-examine our deployment if I can't tune LM to account for the WMI load (heavily focused around Hyper-V and clustered resources).

     I've reduced the frequency on nearly every WMI-based 'Source we have, as well as the script-based ones that use WMI to gather their data. I've easily cut the load in half from when I started and it's still hitting the roof on the TCP ports. I've got it flowing pretty well and not dropping too much at this point, but I'll have to add more customers soon and that's going to add to the load on the collectors. (Not having to add the local resources is a sales point for us.) Ultimately, I'm just looking for other places I can trim the fat on this system to keep it within initial spec.
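The port log jam can be reasoned about with some rough arithmetic: if every request opens a connection whose port then sits in a wait state for a fixed time after use, the sustainable connection rate is roughly the port pool size divided by that wait time. The Python sketch below only illustrates that relationship. The pool size of 16384 is the Windows default dynamic port range; the 120-second wait is a placeholder assumption, not a measurement from this environment.

```python
def sustainable_rate(port_pool: int, wait_seconds: float) -> float:
    """Connections per second a port pool can sustain before it runs dry."""
    return port_pool / wait_seconds


def ports_in_use(connections_per_second: float, wait_seconds: float) -> float:
    """Approximate ports tied up at steady state (rate multiplied by wait)."""
    return connections_per_second * wait_seconds


DEFAULT_POOL = 16384             # Windows default dynamic port range size
TRIPLED_POOL = DEFAULT_POOL * 3  # the "triple the ephemeral ports" change
ASSUMED_WAIT = 120.0             # ASSUMED seconds a port stays unavailable

default_limit = sustainable_rate(DEFAULT_POOL, ASSUMED_WAIT)
tripled_limit = sustainable_rate(TRIPLED_POOL, ASSUMED_WAIT)
```

Under these assumptions, tripling the pool only triples the ceiling; halving the wait time or the request rate has the same proportional effect, which is why lowering poll frequency helps as much as adding ports.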
  22. Looks as though the Windows Cluster 'Sources are all WMI as well and are quite WMI aggressive in our clustered Hyper-V environment
  23. We have far more WMI requests than I'd like to see on our collectors. Does anyone know if a batchscript 'Source uses fewer TCP sockets/ephemeral ports to perform data gathering than a WMI-based 'Source? The Hyper-V metrics are fairly aggressive in our environment.
  24. I've added the JSON to our integration with teams. Looks good, thank you. The Facts sections seem to look great on my iPhone, but are very narrow under windows. I'm going to see if there's a way to nudge them wider.
  25. From a current-methods perspective, having a catch-all rule for alerts of priority X with a pass-through check box would handle that as well. That pass-through (or conversely a "Stop Processing") option in a rule would be fairly easy to implement from a backend code perspective and wouldn't take any grand restructuring on the part of the dev team at LM. It's just a UI item that skips/includes the if/then statement to stop processing the alert. Depending on how they've implemented it at the back end, it could be 3-4 lines of code and included in the next release.
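A minimal Python sketch of what such a flag could look like in a rule chain. This is not LogicMonitor's actual implementation; the `Rule` class, its fields, and `matching_rules` are hypothetical names used only to show how small the change might be.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]
    stop_processing: bool = True  # default: the first matching rule wins


def matching_rules(alert: dict, rules: List[Rule]) -> List[str]:
    """Return the names of every rule that handles the alert, in order."""
    handled = []
    for rule in rules:
        if rule.matches(alert):
            handled.append(rule.name)
            if rule.stop_processing:  # the proposed check box toggles this
                break
    return handled
```

With `stop_processing=False` on a catch-all rule, the alert is routed by that rule and still falls through to the more specific rules below it.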