Cole McDonald

Community Reputation

7 Neutral

About Cole McDonald

  • Rank
    Community All Star

  1. Cole McDonald

    API params

    Ultimately, the goal is to get a count of all of the instances per device per collector group so I can sort them and have a script manually rebalance the "auto-balancing" collector groups (ABCG) in our environment. I intend to run this either once a day or once a week, then let the normal ABCG algorithms handle the load in between. As it sits, all of our VM hosts have migrated to one collector, and all of our VMs to the other in the group, leading to massive CPU and port availability problems when batchscripts are run against WMI counters. I have to figure out how to get those per-device instance counts next.
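A minimal sketch of the counting and sorting step described in this post, assuming the instance data has already been pulled into one object per instance. The `$instances` sample rows and the `CollectorGroup`, `Device`, and `Instance` property names are hypothetical placeholders, not fields confirmed by the LogicMonitor API.

```powershell
# Hypothetical sample data: one row per monitored instance.
# In practice these rows would come from the LogicMonitor API.
$instances = @(
    [pscustomobject]@{ CollectorGroup = 'ABCG-1'; Device = 'hv-host-1'; Instance = 'cpu0' }
    [pscustomobject]@{ CollectorGroup = 'ABCG-1'; Device = 'hv-host-1'; Instance = 'cpu1' }
    [pscustomobject]@{ CollectorGroup = 'ABCG-1'; Device = 'hv-host-1'; Instance = 'disk-c' }
    [pscustomobject]@{ CollectorGroup = 'ABCG-1'; Device = 'vm-1';      Instance = 'cpu0' }
    [pscustomobject]@{ CollectorGroup = 'ABCG-2'; Device = 'vm-2';      Instance = 'cpu0' }
    [pscustomobject]@{ CollectorGroup = 'ABCG-2'; Device = 'vm-2';      Instance = 'disk-c' }
)

# Count instances per device within each collector group,
# then list the heaviest devices first inside each group.
$counts = @(
    $instances |
        Group-Object -Property CollectorGroup, Device |
        ForEach-Object {
            [pscustomobject]@{
                CollectorGroup = $_.Group[0].CollectorGroup
                Device         = $_.Group[0].Device
                InstanceCount  = $_.Count
            }
        } |
        Sort-Object -Property CollectorGroup, @{ Expression = 'InstanceCount'; Descending = $true }
)

$counts | Format-Table -AutoSize
```

The sorted list is the kind of input a rebalancing script could work from, since it shows which devices carry the most instances in each group.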
  2. Cole McDonald

    API params

    I'm trying to get a subset of devices, but my filter isn't filtering... here's my path and query, anything obvious that I'm doing wrong here?

    $resourcePath = "/device/devices"
    $queryParam = "?size=1000&filter=CollectorId:'$($'"

    $collector is previously gathered and the '$($' resolves the way I expect it to ... '12' in one case. But it's getting every device (first 1000) rather than pre-filtering to members of a specific collector.
  3. Cole McDonald

    Smoothing Datapoints

    My first thread here was an effort to get my head around the API to get the kind of functionality I was used to coming from the SCOM world: I see dataSources as a timed script event and propertySources as a run-once script event (I normally target one of the collectors with it to minimize its resource impact, allowing it to run only once per day or so). This allows me to leverage the API and bend it to my will, not only to gather realtime data but also to make historical comparisons that the interface alone doesn't allow for. One of the things I am trying to figure out is how to leverage an Azure script to grab performance data from the last hour on key metrics and add it to the alert emails that get sent to our support staff, so they can have an at-a-glance view of how the server has been running leading up to an alerting event without having to log into another system to look at it. I also have reports, which I used to run on the metrics collected by SCOM and now run on those collected by LM, that will let me perform "right-sizing" of VMs in Azure and Hyper-V for our customers to get them maximum performance at minimal cost.
  4. Cole McDonald

    Monitoring Logoff/Logon Events for Anomalies

    Happy to help. Can I ask specifically what you're using for your criteria? If it's useful, I'd love to see how you end up performing your metrics. I have other thoughts as well, such as showing average session duration, average number of processes started during a given session, and normal logon/logoff times per username/guid. These can be instances linked to servers that the user has access to. If you have server-specific access defined in AD, you can even use get-adcomputer and get-aduser to show if anyone trying to log in is part of the AD but doesn't have access to that particular server. Basically, I'm looking for anything anomalous that could be used to alert for possible security issues. Stay paranoid, stay safe.
  5. Cole McDonald

    Monitoring Logoff/Logon Events for Anomalies

    Sure... I'm using this as a datasource targeting isWindows(), called "Active Directory Failed Login Count":

    try {
        # Pull the last 5 minutes of failed logons (event ID 4625) from the Security log
        $events = Get-WinEvent `
            -ComputerName ##system.sysname## `
            -ErrorAction SilentlyContinue `
            -FilterHashtable @{
                LogName   = "Security"
                Id        = 4625
                StartTime = (get-date).AddMinutes(-5)
            } `
        | where Message -Match "0xC000006D"
    }
    catch {
        $events = @()
    }

    "$($events.count)"

    No warranty for the code, use at your own risk. Please note the use of backtick line continuation for readability.
  6. Cole McDonald

    !!! Collector Debug Console Security !!!

    Yikes! Sounds like there are a few holes that need to be plugged. It's a big product, so there are bound to be some. Hopefully, these types of issues get pushed ahead of functionality, since they're an attack vector into a customer's environment.
  7. Cole McDonald

    Simple Check for SSL Cert Expiration Monitoring

    I've lightened the load slightly on the winCertCheck (which is technically no longer the same DS, as I've replaced the entirety of the scripts with simplified .NET-based powershell scripts to avoid using invoke-command, which tends to lead to some resource constraint issues). This should help, though, and will keep the same instances alive from the old code, as the output is identical to the previous version by @Jonathan Arnold:

    ##--------------- Discovery ------------------##
    $readOnly = [System.Security.Cryptography.X509Certificates.OpenFlags]"ReadOnly"
    $localMachine = [System.Security.Cryptography.X509Certificates.StoreLocation]"LocalMachine"
    $store = new-object System.Security.Cryptography.X509Certificates.X509Store( "\\##SYSTEM.SYSNAME##\root", $localMachine )
    $store.Open( $readOnly )
    $store.Certificates `
    | Select-Object {$_.Thumbprint + "##" + $_.Thumbprint + "##" + $_.Subject + $_.FriendlyName} `
    | Format-Table -HideTableHeaders
    ##--------------------------------------------##

    ##-------------- Counters --------------------##
    $readOnly = [System.Security.Cryptography.X509Certificates.OpenFlags]"ReadOnly"
    $localMachine = [System.Security.Cryptography.X509Certificates.StoreLocation]"LocalMachine"
    $store = new-object System.Security.Cryptography.X509Certificates.X509Store( "\\##SYSTEM.SYSNAME##\root", $localMachine )
    $store.Open( $readOnly )
    $store.Certificates `
    | Where-Object {($_.Thumbprint -like "##WILDVALUE##")} `
    | Select-Object @{
        Name       = "DaysUntilExpire"
        Expression = {((Get-Date -Date $_.NotAfter) - (Get-Date)).Days}
    } `
    | Format-List
    ##--------------------------------------------##

    (Please note the line continuations to help readability of the code.) As always, neither I nor Beyond Impact warranty this code. It's working in our environment; I can't guarantee it'll work in yours. This doesn't account for anything that needs credentials other than what the collector uses.
  8. Cole McDonald

    Monitoring Logoff/Logon Events for Anomalies

    I'm doing something like this for failed logon attempts with a simple threshold (event ID 4625 in Windows). For that, I'm gathering 5 minutes of security log and counting the number of bad-password 4625 events. For your purposes, if you had a DS that ran every 5 minutes, it could gather 1 hour of log data, count an event across that hour (whatever the first one for a successful logon is in your environment), then count the same event in the last 5 minutes and create a ratio. You can then threshold that ratio for alerting. It's basically just the share of that event now vs. the past hour... you could even multiply by 100 to make it an actual percentage. If you wanted to get really fancy, you could check every 5-minute segment over that hour except the last one, average them, then compare the last segment to that average... even fancier, that data set gives you enough to generate a standard deviation, which you can then use as a threshold for the alert. I'm using powershell for my scripts as I know it better than groovy... and it has quite a few more Windows-specific commands for gathering data. This would allow you to use $events = get-winevent to gather logs from Windows, then filter for the event ID and for various content to eliminate even more of the unnecessary events, before wrapping it in ($events).count
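The segment comparison described in this post could be sketched roughly as follows. This is an illustration only: `Test-EventAnomaly`, its parameters, and the sample counts are hypothetical, the counts are assumed to have been produced already (for example by bucketing Get-WinEvent results into 5-minute segments), and the choice of a population standard deviation with a 3-sigma cutoff is an assumption rather than something the post specifies.

```powershell
function Test-EventAnomaly {
    param(
        # Event counts per 5-minute segment, oldest first; the last entry is the current segment.
        [Parameter(Mandatory)] [double[]] $BucketCounts,
        # How many standard deviations above the average counts as anomalous (assumed default).
        [double] $Sigma = 3
    )

    if ($BucketCounts.Count -lt 2) { throw 'Need at least one historical segment plus the current one.' }

    $history = $BucketCounts[0..($BucketCounts.Count - 2)]
    $current = $BucketCounts[-1]

    # Average of every segment except the last one.
    $mean = ($history | Measure-Object -Average).Average

    # Population standard deviation of the historical segments.
    $variance = ($history |
        ForEach-Object { [math]::Pow($_ - $mean, 2) } |
        Measure-Object -Average).Average
    $stdDev = [math]::Sqrt($variance)

    $total = ($BucketCounts | Measure-Object -Sum).Sum

    [pscustomobject]@{
        Current       = $current
        Mean          = $mean
        StdDev        = $stdDev
        Threshold     = $mean + $Sigma * $stdDev
        # Share of the hour's events that fell in the current segment, as a percentage.
        PercentOfHour = if ($total -gt 0) { 100 * $current / $total } else { 0 }
        IsAnomalous   = $current -gt ($mean + $Sigma * $stdDev)
    }
}

# Eleven quiet segments followed by a spike in the current one.
$result = Test-EventAnomaly -BucketCounts 4, 5, 6, 5, 4, 6, 5, 5, 4, 6, 5, 20
$result
```

A datasource could emit either `PercentOfHour` or the distance from `Threshold` as its datapoint, depending on which is easier to alert on.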
  9. Cole McDonald

    Multiple Collectors

    I've also set up a dashboard graph widget to monitor the balance of each of my groups:
  10. Cole McDonald

    Multiple Collectors

    Here's that thread:
  11. Cole McDonald

    Multiple Collectors

    We've been working with the ABCGs and have found some of their foibles. Specifically, when they rebalance, they only consider the instance count, so the device counts may be heavily skewed. I'm working on a rebalancer that does a better job splitting the load. dataSources that use batchscripts to collect are also VERY heavy-handed on the collector they run from. Keeping an eye on the number of batchscripts specifically can help show you if your environment is truly balanced. Ours ended up filtering all of the Hyper-V hosts to one collector and all of the VMs to the other through rebalancing... so when a DS fires on one VM, it fires on all of them, and since they're all on a single collector, there is no load balancing done. Keep an eye on which resources end up on which collector... you may end up having to increase the balance threshold to prevent it from balancing, then manually move a set of them before lowering the threshold back down to maintain. I go through and force a rebalance about once a week. Ideally, the ABCG would sort devices by number of instances reporting to each, then take each next device and hand it to the next collector in the group in a round-robin fashion to get true balance of the load. This is the basis of the rebalancer I'm working on. ABCG has taken up the majority of my time over the past few weeks, trying to prevent resource exhaustion on the collectors. We're a very heavy monitoring shop and we're finding the limits of LM. I have another thread I posted on that has a calculator for getting your threshold to actually balance your instances. I'll have to dig that up.
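The round-robin idea in this post (sort devices by instance count, then hand each next device to the next collector in the group) might look something like the sketch below. It only computes a proposed assignment and does not call LogicMonitor; `Get-RoundRobinAssignment`, the device names, the collector names, and the instance counts are all hypothetical.

```powershell
function Get-RoundRobinAssignment {
    param(
        # Objects with Name and InstanceCount properties (hypothetical shape).
        [Parameter(Mandatory)] [object[]] $Devices,
        # Names of the collectors in one collector group.
        [Parameter(Mandatory)] [string[]] $Collectors
    )

    # Heaviest devices first, so the biggest loads are spread before the small ones.
    $sorted = @($Devices | Sort-Object -Property InstanceCount -Descending)

    for ($i = 0; $i -lt $sorted.Count; $i++) {
        [pscustomobject]@{
            Device        = $sorted[$i].Name
            InstanceCount = $sorted[$i].InstanceCount
            # Deal devices out to collectors in turn.
            Collector     = $Collectors[$i % $Collectors.Count]
        }
    }
}

# Made-up example: two heavy Hyper-V hosts and two lighter VMs.
$devices = @(
    [pscustomobject]@{ Name = 'vm-1';      InstanceCount = 60 }
    [pscustomobject]@{ Name = 'hv-host-1'; InstanceCount = 400 }
    [pscustomobject]@{ Name = 'vm-2';      InstanceCount = 50 }
    [pscustomobject]@{ Name = 'hv-host-2'; InstanceCount = 380 }
)

$plan = @(Get-RoundRobinAssignment -Devices $devices -Collectors 'collector-A', 'collector-B')

# Total instances each collector would carry under the proposed plan.
$load = @(
    $plan | Group-Object -Property Collector | ForEach-Object {
        [pscustomobject]@{
            Collector = $_.Name
            Instances = ($_.Group | Measure-Object -Property InstanceCount -Sum).Sum
        }
    }
)

$plan | Format-Table -AutoSize
$load | Format-Table -AutoSize
```

With this sample data each collector ends up with one host and one VM, rather than all hosts on one collector and all VMs on the other.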
  12. Cole McDonald

    Simple Check for SSL Cert Expiration Monitoring

    If you're monitoring a Windows environment, you may have another DS in your deployment, "WinCertCheck", that is turned off by default. I have another thread on this forum with the changes that need to be made to make it work correctly; then just set it to isWindows() instead of false() in your appliesTo and you'll get a bit more data that allows for better diagnostics. It's MUCH more aggressive about finding certs on your systems than the SSL DS is... so bear that in mind: you'll need to take a while turning off certs that you don't need to monitor. At scale, this can be cumbersome. I'm working on a script that will help alleviate that part... (i.e. stop alerting on all instances with a given thumbprint).
  13. Cole McDonald

    Complex Boolean operators on alert conditions

    Here's what I used... working like a champ: if( gt( DaysUntilExpire,(0-60) ), DaysUntilExpire, unkn() ) with <= 30 14 7 for the alerting. It drastically reduced the number of non-actionable alerts. Thank you, Mike!
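To make the effect of that expression easier to see, here is a rough PowerShell rendering of the same logic. It assumes the three threshold values map to warning, error, and critical in that order, and that an unknown value (`unkn()`) produces no alert; `Get-CertAlertLevel` and its return strings are hypothetical names used only for illustration.

```powershell
function Get-CertAlertLevel {
    param([Parameter(Mandatory)] [int] $DaysUntilExpire)

    # Mirrors if( gt( DaysUntilExpire,(0-60) ), DaysUntilExpire, unkn() ):
    # anything expired 60 or more days ago becomes "no data" and never alerts.
    if ($DaysUntilExpire -le -60) { return 'NoData' }

    # Mirrors the "<= 30 14 7" thresholds, most severe first (assumed ordering).
    if ($DaysUntilExpire -le 7)  { return 'Critical' }
    if ($DaysUntilExpire -le 14) { return 'Error' }
    if ($DaysUntilExpire -le 30) { return 'Warning' }

    return 'None'
}

# Walk a few sample values through the logic.
foreach ($days in 45, 20, 10, -10, -200) {
    '{0,5} days -> {1}' -f $days, (Get-CertAlertLevel -DaysUntilExpire $days)
}
```

Long-expired certificates keep their instances but fall into the no-data branch, which is what removes them from alerting.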
  14. Cole McDonald

    Complex Boolean operators on alert conditions

    I'd still like to know that there's an instance and how long ago it had expired, just not alert on it. I'll definitely play with the virtual datapoints as a possible workaround though. That's got some promise.
  15. I have a case (certificates) that could use a pair of < , > conditions to handle alerting. There are some certs that need to be in place and expired for the system to work properly and fail certs made against those CAs... most of them have really long expirations on them. I'd like to raise an alert if the daystoexpire is <30, but >-100. Right now, I'm having to disable alerting manually on thousands of certificates in our environment to enable useful alerting on them. I'll also accept anyone with a good hacky workaround for it... I hate clicking