All Activity

  1. Today
  2. Eric Egolf

    Smoothing Datapoints

    If I understand this approach correctly, it may also solve another item I have been looking at for years: comparing the average CPU usage for a period of time... let's say comparing the average for an hour on Monday with the average for the same hour the previous Monday, and alerting if it exceeds that baseline by, say, 30%. The same holds true for bandwidth on my 300-some-odd customer firewalls, where we always seem to find out after a customer calls saying the internet is slow... then a quick check in LogicMonitor shows they are using far more bandwidth than usual. I would much prefer to simply have a datasource called "Internet Usage Increase" that calls these things out.
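
    For what it's worth, the alerting arithmetic itself is tiny once the two windows of raw data are in hand. A hedged sketch, where $thisHour and $lastWeekSameHour are assumed to have been pulled already (for example via the API approach discussed in this thread), and the 30% threshold is illustrative:

        # Hedged sketch: week-over-week comparison of hourly averages.
        # $thisHour and $lastWeekSameHour are assumed to be arrays of raw
        # datapoint values already retrieved; names and threshold are illustrative.
        $thisHourAvg = ($thisHour         | Measure-Object -Average).Average
        $lastWeekAvg = ($lastWeekSameHour | Measure-Object -Average).Average

        # Emit 1 if usage is more than 30% over the same hour last week, else 0
        if ($lastWeekAvg -gt 0 -and $thisHourAvg -gt ($lastWeekAvg * 1.3)) { 1 } else { 0 }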
  3. Eric Egolf

    Smoothing Datapoints

    Cole, are you suggesting that I could create a datasource, say "Smoothed CPU"? That datasource would be a PowerShell script datasource. Then, in the manner described in your post, the script would pull the last 10 datapoints from another datasource, say CPU Processor Queue, via the API, and do the averaging/smoothing?
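
    If it helps, a minimal sketch of what that script datasource's collection could look like, assuming the LMv1 request signing shown further down this page in the "API params" thread; the device/datasource/instance IDs are placeholders, and the response shape should be verified against the API docs:

        # Hedged sketch: fetch recent raw values for one instance and emit the
        # average of the last 10 samples. $headers is a signed LMv1 header set
        # as sketched in the "API params" thread; the IDs below are placeholders.
        $resourcePath = "/device/devices/123/devicedatasources/456/instances/789/data"
        $url  = "https://yourportal.logicmonitor.com/santaba/rest" + $resourcePath
        $data = Invoke-RestMethod -Uri $url -Method Get -Headers $headers

        # values is assumed to be rows of samples (newest first), one column
        # per datapoint; average the first datapoint's last 10 samples
        $recent   = $data.values | Select-Object -First 10 | ForEach-Object { $_[0] }
        $smoothed = ($recent | Measure-Object -Average).Average
        Write-Host $smoothed   # becomes the "Smoothed CPU" datapoint value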
  4. Yesterday
  5. Mike Moniz

    API params

    Try:

        $resourcePath = "/device/devices"
        $queryParam   = "?size=1000&filter=preferredCollectorId:`"$($collector.id)`""

    I don't think CollectorId is a valid property for devices, and the APIv2 docs say to use double quotes with filters.
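
    For reference, a minimal sketch of how that path and query would typically be wrapped in an LMv1-signed request; the portal name and API credentials are placeholders, and note that the query string is not part of the signature:

        # Hedged sketch of an LMv1-signed GET; $accessId, $accessKey, and
        # $company are placeholders for your own portal and API token.
        $accessId  = "REPLACE_ME"
        $accessKey = "REPLACE_ME"
        $company   = "yourportal"

        $epoch = [Math]::Round((New-TimeSpan -Start (Get-Date -Date "1/1/1970") -End (Get-Date).ToUniversalTime()).TotalMilliseconds)

        # Signature covers verb + epoch + body + resource path (no query
        # string); the body is empty for a GET
        $requestVars = "GET" + $epoch + $resourcePath
        $hmac      = New-Object System.Security.Cryptography.HMACSHA256
        $hmac.Key  = [Text.Encoding]::UTF8.GetBytes($accessKey)
        $signHex   = ([BitConverter]::ToString($hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($requestVars))) -replace '-').ToLower()
        $signature = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($signHex))

        $headers = @{
            "Authorization" = "LMv1 " + $accessId + ":" + $signature + ":" + $epoch
            "Content-Type"  = "application/json"
            "X-Version"     = "2"   # request APIv2 semantics
        }
        $url = "https://$company.logicmonitor.com/santaba/rest" + $resourcePath + $queryParam
        $devices = (Invoke-RestMethod -Uri $url -Method Get -Headers $headers).items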
  6. Stefan W

    Improve Import DataSources from Repository

    There's also some improvement that could be done on handling name collisions from LM Exchange. Right now there appear to be no guardrails to prevent clobbering an existing LogicModule with a different one that happens to have the same name.

    Specific example: I was looking at this post https://communities.logicmonitor.com/topic/2129-logicmonitor-portal-metrics/ and imported the first DataSource mentioned, J7RGZY, named "LogicMonitor_PortalMetrics". At the end of the thread there's another one mentioned, GJNN46, and I imported it to see how it differed. Nope!! The latter is also named "LogicMonitor_PortalMetrics", and importing it silently obliterated the first one. In this case they are different versions of the same DS, but it could be a big problem if they were for completely different purposes but had the same name.

    Our portal is v123, and apparently there are some improvements in v124 for importing, but I feel that even showing a diff (like when updating from the Repository) wouldn't be enough protection. If the authors and/or datapoint names differ, I think the Import confirmation screen should have some big red warning message. Even better would be an option to import under a different name, to allow trying out both. (Even just changing the name is enough to dissociate a DataSource from the Exchange and make it an "Unpublished DataSource". That's annoying!)
  7. Cole McDonald

    API params

    Ultimately, the goal is to get a count of all of the instances per device per collector group so I can sort them and have a script manually re-balance the "auto-balancing" collector groups (ABCG) in our environment. I intend to run this either once a day or once a week, then let the normal ABCG algorithms handle the load in between. As it sits, all of our VM hosts have migrated to one collector and all of our VMs to the other in the group, leading to massive CPU and port availability problems when batchscripts are run against WMI counters. I have to figure out how to get those per-device instance counts next.
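
    For the instance-count step, a hedged sketch of one approach; it reuses the signed $headers from the "API params" post above, and the /instances endpoint and paging details should be verified against the REST API docs:

        # Hedged sketch: count monitored instances per device. Assumes $headers
        # and $company from an LMv1-signed session as sketched earlier; the
        # endpoint and field names are from memory - verify against the docs.
        $devices = (Invoke-RestMethod -Headers $headers -Method Get `
            -Uri "https://$company.logicmonitor.com/santaba/rest/device/devices?size=1000&fields=id,displayName,preferredCollectorId").items

        $counts = foreach ($d in $devices) {
            # size=1 keeps the payload small; we only want the 'total' field
            $inst = Invoke-RestMethod -Headers $headers -Method Get `
                -Uri "https://$company.logicmonitor.com/santaba/rest/device/devices/$($d.id)/instances?size=1"
            [pscustomobject]@{
                Device        = $d.displayName
                CollectorId   = $d.preferredCollectorId
                InstanceCount = $inst.total
            }
        }
        $counts | Sort-Object InstanceCount -Descending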
  8. Cole McDonald

    API params

    I'm trying to get a subset of devices, but my filter isn't filtering... here's my path and query; anything obvious that I'm doing wrong here?

        $resourcePath = "/device/devices"
        $queryParam   = "?size=1000&filter=CollectorId:'$($collector.id)'"

    $collector is gathered earlier, and '$($collector.id)' resolves the way I expect it to... '12' in one case. But it's getting every device (the first 1000) rather than pre-filtering to members of a specific collector.
  9. Cole McDonald

    Smoothing Datapoints

    My first thread here was an effort to get my head around the API, to get the kind of functionality I was used to coming from the SCOM world. I see dataSources as a timed script event and propertySources as a run-once script event (I normally target one of the collectors with it to minimize its resource impact and allow it to run only once per day or so). This lets me leverage the API and bend it to my will: not only gathering realtime data, but also making historical comparisons that the interface alone doesn't allow for.

    One of the things I'm trying to figure out is how to leverage an Azure script to grab the last hour of performance data on key metrics and add it to the alert emails that go to our support staff, so they can have an at-a-glance view of how the server was running leading up to an alerting event without having to log into another system. I also have reports, formerly driven by metrics collected by SCOM and now by LM, that let me "right-size" VMs in Azure and Hyper-V for our customers to get them maximum performance at minimal cost.
  10. Cole McDonald

    Monitoring Logoff/Logon Events for Anomalies

    Happy to help. Can I ask specifically what you're using for your criteria? If it's useful, I'd love to see how you end up performing your metrics. I have other thoughts as well: showing average session duration, average number of processes started during a given session, and normal logon/logoff times per username/GUID. These can be instances linked to servers that the user has access to. If you have server-specific access defined in AD, you can even use Get-ADComputer and Get-ADUser to show whether anyone trying to log in is part of the AD but doesn't have access to that particular server. Basically, looking for anything anomalous that could be used to alert on possible security issues. Stay paranoid, stay safe
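
    On the AD idea, a hedged sketch of that cross-check; the per-server access group naming convention here is purely illustrative, and $failedUser is assumed to have already been parsed out of the event message:

        # Hedged sketch of the AD cross-check described above. Assumes the RSAT
        # ActiveDirectory module and a per-server access group whose name is
        # derived from the computer name - that convention is illustrative.
        Import-Module ActiveDirectory

        $serverName  = "##SYSTEM.SYSNAME##"
        $accessGroup = "ACL_$($serverName)_RDP"   # hypothetical naming convention

        $member = Get-ADGroupMember -Identity $accessGroup -Recursive |
            Where-Object SamAccountName -eq $failedUser   # $failedUser from the event

        if (-not $member) {
            # Valid domain account probing a server it has no rights to -
            # worth alerting on
            Write-Host 1
        } else {
            Write-Host 0
        }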
  11. Eric Egolf

    Smoothing Datapoints

    We have datapoints that are very spiky by nature. In order to see the signal through the noise, so to speak, we need to average about 10 datapoints together... effectively smoothing the data. For example, if we took 1-minute polls of CPU Processor Queue or CPU Ready, we would want to plot the average of the past 10 datapoints. If anyone has suggestions on how to do this, or on how they approach datasets that are inherently too noisy for threshold-based alerting, I would love to hear about it.
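
    For comparison, once the raw polls are in hand the smoothing itself is only a few lines; a hedged sketch where $values is assumed to already hold the polled datapoints, oldest first:

        # Hedged sketch: simple trailing moving average. $values is assumed to
        # be an array of raw poll results, oldest first; $window is the span.
        $window   = 10
        $tail     = $values | Select-Object -Last $window
        $smoothed = ($tail | Measure-Object -Average).Average
        Write-Host $smoothed   # emit as the smoothed datapoint value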
  12. Eric Egolf

    Monitoring Logoff/Logon Events for Anomalies

    Perfect, thanks Cole. This worked very well for me. The only comment is that I had to find the location of the Applications and Services Logs. I found this article that helped.
  13. Saw this article from Ars Technica pop up in my news feed 😶
  14. Cole McDonald

    Monitoring Logoff/Logon Events for Anomalies

    Sure... I'm using this as a datasource targeting isWindows(), called "Active Directory Failed Login Count":

        try {
            $events = Get-WinEvent `
                -ComputerName ##system.sysname## `
                -ErrorAction SilentlyContinue `
                -FilterHashtable @{
                    LogName   = "Security"
                    Id        = 4625
                    StartTime = (Get-Date).AddMinutes(-5)
                } `
                | Where-Object Message -Match "0xC000006D"
        } catch {
            $events = @()
        }
        "$($events.count)"

    No warranty for the code; use at your own risk. Please note the use of backtick line continuation for readability.
  15. Cole McDonald

    !!! Collector Debug Console Security !!!

    Yikes! Sounds like there are a few holes that need to be plugged. Big product, there are bound to be some. Hopefully these types of issues get pushed ahead of functionality, since it's an attack vector into a customer's environment.
  16. Last week
  17. Eric Egolf

    Monitoring Logoff/Logon Events for Anomalies

    Thanks Cole... great approach... any chance you can share your PowerShell code or DataSource?
  18. mnagel

    !!! Collector Debug Console Security !!!

    Oh, it gets better :). We had an issue a while back (still do) that could only be resolved via an internal debug command (updating the system.ips property) normally run in the collector debug context. This is entirely doable via the API: no MFA required, no IP restriction possible. Chew on that one for a bit...
  19. Cole McDonald

    Simple Check for SSL Cert Expiration Monitoring

    I've lightened the load slightly on the winCertCheck (which is technically no longer the same DS, as I've replaced the entirety of the scripts with simplified .NET-based PowerShell to avoid using Invoke-Command, which tends to lead to some resource constraint issues). This should help, though, and it will keep the same instances alive from the old code, as the output is identical to the previous version by @Jonathan Arnold:

        ##--------------- Discovery ------------------##
        $readOnly     = [System.Security.Cryptography.X509Certificates.OpenFlags]"ReadOnly"
        $localMachine = [System.Security.Cryptography.X509Certificates.StoreLocation]"LocalMachine"
        $store        = New-Object System.Security.Cryptography.X509Certificates.X509Store(
            "\\##SYSTEM.SYSNAME##\root",
            $localMachine
        )
        $store.Open( $readOnly )
        $store.Certificates `
            | Select-Object {$_.Thumbprint + "##" + $_.Thumbprint + "##" + $_.Subject + $_.FriendlyName} `
            | Format-Table -HideTableHeaders
        ##--------------------------------------------##

        ##-------------- Counters --------------------##
        $readOnly     = [System.Security.Cryptography.X509Certificates.OpenFlags]"ReadOnly"
        $localMachine = [System.Security.Cryptography.X509Certificates.StoreLocation]"LocalMachine"
        $store        = New-Object System.Security.Cryptography.X509Certificates.X509Store(
            "\\##SYSTEM.SYSNAME##\root",
            $localMachine
        )
        $store.Open( $readOnly )
        $store.Certificates `
            | Where-Object {($_.Thumbprint -like "##WILDVALUE##")} `
            | Select-Object @{
                Name       = "DaysUntilExpire"
                Expression = {((Get-Date -Date $_.NotAfter) - (Get-Date)).Days}
            } `
            | Format-List
        ##--------------------------------------------##

    (Please note the line continuations to help readability of the code.) As always, neither I nor Beyond Impact warrants this code. It's working in our environment; I can't guarantee it'll work in yours. This doesn't account for anything that needs credentials other than what the collector uses.
  20. Cole McDonald

    Monitoring Logoff/Logon Events for Anomalies

    I'm doing something like this for failed logon attempts with a simple threshold (event ID 4625 in Windows). For that, I'm gathering 5 minutes of the Security log and counting the number of bad-password 4625 events.

    For your purposes, if you had a DS that ran every 5 minutes, it could gather 1 hour of log data, count an event (whatever the first event for a successful logon is in your environment), then count those in the last 5 minutes and create a ratio. You can then threshold that ratio for alerting. It's basically just getting the current percentage of that event now vs. the past hour... you could even multiply by 100 to make it an actual percentage.

    If you wanted to get really fancy, you could check every 5-minute segment over that hour except the last one, average them, then compare the last segment to that average... and even fancier, given that data set you'll have enough to generate a standard deviation, which you can then use as a threshold for the alert (see the sketch below).

    I'm using PowerShell for my scripts, as I know it better than Groovy... and it has quite a few more Windows-specific commands for gathering data. This would let you use $events = Get-WinEvent to gather logs from Windows, then filter for the event ID and for various content to eliminate even more of the unnecessary events before wrapping it in ($events).count.
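
    A hedged sketch of that bucketed-baseline idea; event ID 4624 (successful logon) is used for illustration, and emitting a z-score is just one way to hand LogicMonitor a thresholdable number:

        # Hedged sketch: compare the last 5-minute bucket of logon events
        # (ID 4624) against the mean and standard deviation of the previous
        # eleven buckets. Runs locally; could target ##SYSTEM.SYSNAME## like
        # the failed-logon example instead.
        $now    = Get-Date
        $events = Get-WinEvent -ErrorAction SilentlyContinue -FilterHashtable @{
            LogName   = "Security"
            Id        = 4624
            StartTime = $now.AddHours(-1)
        }

        # Count events per 5-minute bucket, oldest first (12 buckets per hour)
        $buckets = 0..11 | ForEach-Object {
            $start = $now.AddMinutes(-60 + ($_ * 5))
            $end   = $start.AddMinutes(5)
            @($events | Where-Object { $_.TimeCreated -ge $start -and $_.TimeCreated -lt $end }).Count
        }

        $baseline = $buckets[0..10]   # everything except the last bucket
        $mean     = ($baseline | Measure-Object -Average).Average
        $stdDev   = [Math]::Sqrt(($baseline | ForEach-Object { [Math]::Pow($_ - $mean, 2) } |
                                  Measure-Object -Average).Average)

        $current = $buckets[11]
        # Emit a z-score to threshold on (guard against a flat baseline)
        if ($stdDev -gt 0) { ($current - $mean) / $stdDev } else { 0 }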
    Background: we have a fairly large Citrix environment (70 customers, 1,200 users). Each customer has 1 or more XenApp servers, depending on how many users. The environment is set up in a manner where often the first step in troubleshooting is having the users log off and log on (which obviously creates an event ID). We would like to plot the number of logons/logoffs (via event IDs) per 10-minute period and look for anomalies (periods of high logons/logoffs relative to normal, or relative to the number of users in the environment). The first step for us is simply plotting the data. Any ideas on the best way to approach this problem? My initial thought is simply to write a PowerShell script to search for the event IDs over the 10 minutes and return the number... then apply this to each XenApp server in LogicMonitor, but maybe there is a better approach? I also don't know the best approach to aggregate by customer, or even factor in the number of users... assuming we would need to export to Excel to handle some of that. Ideas welcomed.
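
    A hedged sketch of that first plotting step as a script datasource body; 4624/4634 are the generic Windows logon/logoff event IDs, and a Citrix environment may also warrant filtering on logon type:

        # Hedged sketch: count logon (4624) and logoff (4634) events in the
        # last 10 minutes, run per XenApp server; emit both counts for plotting.
        $since  = (Get-Date).AddMinutes(-10)
        $logons = @(Get-WinEvent -ErrorAction SilentlyContinue -FilterHashtable @{
            LogName = "Security"; Id = 4624; StartTime = $since }).Count
        $logoffs = @(Get-WinEvent -ErrorAction SilentlyContinue -FilterHashtable @{
            LogName = "Security"; Id = 4634; StartTime = $since }).Count

        # Key=value output style for a multi-datapoint script datasource
        Write-Host "logons=$logons"
        Write-Host "logoffs=$logoffs"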
  22. mnagel

    Conditional EventSources

    I agree and raise you -- there should be a general correlation facility. I would be excessively happy right now even to be able to reference the value of a different datapoint in the same datasource in an alert string. The right solution would be to define correlation rules similar to Zabbix (https://www.zabbix.com/documentation/4.2/manual/config/event_correlation), where you would suppress alerts depending on a complex evaluation of any LogicModule result. For events specifically, they need to be bucketed with a "correlation key" and counters, with alerts tied to more than just an ephemeral point in time (see SEC, https://simple-evcorr.github.io/, for a great simple-ish tool that does this for event streams).
  23. Cole McDonald

    Multiple Collectors

    I've also set up a dashboard graph widget to monitor the balance of each of my groups:
  24. Cole McDonald

    Multiple Collectors

    Here's that thread:
  25. Cole McDonald

    Multiple Collectors

    We've been working with the ABCGs and have found some of their foibles. Specifically, when they rebalance, they only consider the instance count, so the device counts may be heavily skewed. I'm working on a rebalancer that does a better job splitting the load.

    DataSources that use batchscripts to collect are also VERY heavy-handed on the collector they run from. Keeping an eye on the # of batchscripts specifically can help show you whether your environment is truly balanced. Ours ended up filtering all of the Hyper-V hosts to one collector and all of the VMs to the other through rebalancing... so when a DS fires on one VM, it fires on all of them, and since they're all on a single collector, there is no load balancing done.

    Keep an eye on which resources end up on which collector... you may end up having to increase the balance threshold to prevent it from balancing, then manually move a set of them before lowering the threshold back down to maintain. I go through and force a rebalance about once a week.

    Ideally, the ABCG would sort devices by number of instances, then take each next device and hand it to the next collector in the group in round-robin fashion to get a true balance of the load (see the sketch below). This is the basis of the rebalancer I'm working on. ABCG has been the majority of my time over the past few weeks, trying to prevent resource exhaustion on the collectors. We're a very heavy monitoring shop, and we're finding the limits of LM. I have another thread I posted in that has a calculator for getting your threshold to actually balance your instances. I'll have to dig that up.
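
    A hedged sketch of the sort-and-deal idea in isolation; $deviceCounts is assumed to be the per-device instance counts gathered via the API, the collector IDs are placeholders, and actually moving devices would still be separate API calls:

        # Hedged sketch: plan a balanced assignment. $deviceCounts is assumed
        # to be an array of objects with Device and InstanceCount properties;
        # $collectorIds is the list of collector IDs in the ABCG (placeholders).
        $collectorIds = @(12, 14)
        $sorted = $deviceCounts | Sort-Object InstanceCount -Descending

        $plan = @{}
        $load = @{}
        $collectorIds | ForEach-Object { $plan[$_] = @(); $load[$_] = 0 }

        foreach ($dev in $sorted) {
            # Greedy variant of round-robin: give the next-largest device to
            # the collector currently carrying the fewest instances
            $target = ($load.GetEnumerator() | Sort-Object Value | Select-Object -First 1).Key
            $plan[$target] += $dev.Device
            $load[$target] += $dev.InstanceCount
        }

        $load   # per-collector planned instance totals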
  26. Mike Moniz

    Multiple Collectors

    Yeah, it depends far more on what you are monitoring per device than on the number of devices you have. Monitoring all the shares on a Windows file server via WMI will put more load on a collector than doing SNMP on a switch. I personally don't have a lot of experience balancing collectors myself, and Auto-Balanced Collector Groups (ABCG) are very new; I haven't played with them. I would likely set up 1 (or 2, for failover) in an auto-balanced group per site and then grow with more collectors as needed. But as I haven't used ABCGs, perhaps others on the forums can make better suggestions, or you can open a chat with LM to look at your particular environment, especially about choosing small/med/large.
  27. starboy9

    Multiple Collectors

    @Mike Moniz Awesome, thank you for that explanation! I have a few follow-ups...

    1. As far as "load" goes: looking at the sizing chart, I am not seeing anything that says if you have "X" number of machines you need this many collectors. Is there anything you do when looking at a new environment to determine the appropriate size/number of collectors?

    2. When splitting across sites or segmented networks: would you just create a collector group and then add the discovered devices specific to that site/segment to be monitored by that particular collector?
  28. Mike Moniz

    Multiple Collectors

    There are several reasons why you would want multiple collectors in an environment:

    - High Availability: if a collector system fails, you can have another one take over (LM supports active-active). Also useful for collector upgrades without downtime.
    - Load: depending on how many items (not just devices) you're monitoring, you may need many collectors to handle the load.
    - Site: some site-to-site VPNs are not stable enough to monitor over remotely, and there can be a major side effect, mentioned below.
    - Network Segmentation: the collector and its failover collectors need to be able to communicate directly with each resource being monitored, so network segmentation may require multiple collectors. (I guess that also depends on your definition of "environment".)

    I don't think it really matters whether you set up a virtual machine or use physical boxes; I think that would depend on your infrastructure. Most of ours are virtual, without problems.

    A big possible problem to keep in mind about remote monitoring is LM's current lack of full dependencies. If a site that is monitored by a remote collector goes down, it will cause an alert for every resource at that site. Since the collector itself does not go down, it will not trigger a Collector Down condition and will not suppress those alerts. So I personally avoid remote monitoring (on the Resource tab) whenever possible.