Brandon


Posts posted by Brandon

  1. Hi @Stuart Weenig!

    Thanks so much for taking the time to look into this.  I did discover some issues with my output, and it's now properly formatted.  But, unfortunately, it's still not changing the appearance of the icons.  I suspect this is related to the combination of category and type.

    For example - in order for the AWS EC2 icon to appear on an EC2 instance, the output would need to look something like this:

    {"rawERIs":[{"category":"cloud.aws","priority":2,"type":"AWS EC2","value":"i-a1b2c3d4e5f6g7"}]}

    Of course, this is only a theory.  I have yet to find any documentation around categories and how they come into play - but it's very likely that I'm not fully grasping how topologysources work.

    In any case, I have gotten a number of maps to display correctly - it's just that the icons are very similar, which makes it less obvious what type of devices/instances we're looking at.

    I've been playing around with topology mapping with some success.  However, the icons displayed after the map is built are the standard IP and instance icons.  I've modified my custom ERI propertysources to try to get them to display something other than those two icons, but the icons never seem to change.  According to the LM documentation, I should be able to choose any of the icons shown in the attached image: Topology-Vertex-Type-Icons.png


    Has anyone successfully gotten the map to reflect the icons above?  For example, "predef.externalResourceType=AWS EC2" does not appear to do anything.  Nor does "LoadBalanceCollector" or "Load Balance Collector", etc.  I suspect there is a specific, case-sensitive value associated with each of the icons.  However, out of all of the values listed in the image, none have worked for me.
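    For reference, propertysources set values by printing key=value pairs to standard output, so what I've been testing amounts to a single output line like this (sketch):

    # PropertySources set properties by writing key=value lines to stdout
    Write-Host "predef.externalResourceType=AWS EC2"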

  3. Hi @Nicklas Karlsson and @Jonathan Hill,

    So you guys are running into an issue where the datasource isn't finding any instances.  That's no good!

    The datasource uses remote PowerShell commands from the Windows collector to the target server.  I would probably start by opening an RDP session to a collector and running the script from an elevated PS session.  Before you run it, you'll need to modify the script to use whatever creds you have set up for that target server.

    For a quick test, just check whether these two WMI calls return anything from the remote system:

    # Placeholder - set to the target server's hostname or FQDN
    $hostName = '<target server>'

    # Use the creds you have set up for the target server
    $cred = Get-Credential

    gwmi -Namespace "root\MicrosoftDFS" -ComputerName $hostName -Credential $cred -Query 'SELECT * FROM DfsrReplicatedFolderInfo'
    gwmi -Namespace "root\MicrosoftDFS" -ComputerName $hostName -Credential $cred -Query 'SELECT * FROM DfsrConnectionConfig WHERE Inbound = "False"'


    If these two work, DM me and we can try to figure out what the issue is.  I'm guessing we're running into a character limit and I need to account for that.


    Thanks!

  4. Hi @John Biniewski,

    The alert thresholds are largely going to depend on how large your shares are.  You can set the alert threshold to something quite low, like 10, and wait for an alert, but unfortunately this is one of those times when you'll need to revisit your thresholds to determine what's normal for your environment(s).  Wish I could be more help here.

  5. We have begun implementing a tagging standard in our cloud accounts to better control discovered resources and route alerts accordingly.  I would like to be able to route alerts by default based on the value of a tag.  I'm aware that I can already set up specific users and then achieve exactly what I'm requesting, but I would much prefer to have a blanket rule that uses the tag's value as the recipient email address(es) directly.  Some examples below:

    system.aws.tag.MonitorAlertEmail=ThisIsAnExample@PleaseOfferThisFeature.com

    system.aws.tag.SendAlertsHere=AnotherExcellentExample@WeWillPayExtraForThisFunctionality.gov AnotherEmailAddress@OkayNotReally.net

    See the screenshots below for a visual example of how I'd like to structure this automation.

    [attached screenshots]

    I have built a generic StatusPage.io datasource to allow for monitoring the status of various services we use.  Since so many companies use StatusPage.io, I figured it's a good idea to have a heads-up in the event there's an outage with one of our many service providers.  This has worked well as an early-warning system, letting our service desk guys know about issues before they start getting calls from end users.  LogicMonitor actually uses StatusPage, but of course there are many, many others. Attached is a screenshot of the Box.com StatusPage data that we've collected from https://status.box.com.

    [attached screenshot: Box.com StatusPage data]

    This datasource should be universal to any statuspage.io site.  So far it has worked with every site I've tested.
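    Under the hood it's just polling the public Statuspage v2 API.  This isn't the exact collection script, but the core of it looks something like this (assuming the standard /api/v2/status.json endpoint that Statuspage sites expose):

    # Poll the public Statuspage v2 endpoint for the overall status
    $site   = 'https://status.box.com'   # any Statuspage-backed site should work
    $status = Invoke-RestMethod -Uri "$site/api/v2/status.json"

    # Map the indicator to a number LogicMonitor can graph and alert on
    $map = @{ none = 0; minor = 1; major = 2; critical = 3 }
    Write-Host "StatusValue=$($map[$status.status.indicator])"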

    Locator code: NYJG6J

    I have been using a custom datasource to collect the metrics for each resource and method (excluding OPTIONS) behind an API Gateway stage.  It has been extremely useful in our production environments.  I would share the datasource via the Exchange, but the discovery method I'm using is not universal, so I think it would be best if that discovery worked natively.  If possible, could we please have a discovery method for AWS API Gateway Resources by Stage?

    *Something to note - this has the potential to discover quite a few resources and thus create a substantial number of CloudWatch calls, which might hit customer billing.  For this reason, I added a custom property ##APIGW.stages## so that I could plug in the specific stages I wish to monitor instead of having each one automatically discovered.  The Applies To looks like this:

    system.cloud.category == "AWS/APIGateway" && apigw.stages

    Autodiscovery is currently written in PowerShell (which is why not everyone can take advantage of it):

    # Device- and instance-level properties supplied via LogicMonitor tokens
    $apigwID = '##system.aws.resourceid##'
    $region  = '##system.aws.region##'
    $stages  = '##APIGW.stages##'   # space-separated list of stages to monitor

    # Requires the AWS Tools for PowerShell on the collector
    $resources = Get-AGResourceList -RestApiId $apigwID -Region $region

    $stages.Split(' ') | %{
        $stage = $_
        $resources | %{
            if($_.ResourceMethods) {
                $path = $_.Path
                # Emit one instance per non-OPTIONS method on each resource
                $_.ResourceMethods.Keys | where{$_ -notmatch 'OPTIONS'} | %{
                    $wildvalue = "Stage=$stage>Resource=$path>Method=$_"
                    Write-Host "$wildvalue##${stage}: $_ $path######auto.stage=$stage"
                }
            }
        }
    }


    We're experimenting with NetFlow now, and we're also struggling with these very real limitations.  It would be great if we could get a response as to whether or not enhancements to NetFlow are going to be prioritized.  Currently we're finding that we have no choice but to rely on multiple tools to gather this data.

    It would be immensely helpful if I could see and test alert routing from the Cluster Alerts page at the device group level, similar to the existing Alert Routing button on the Alert Tuning tab.  As we begin to utilize this functionality more heavily, it's critical that we can verify that alerts are routed correctly wherever they're configured.

    Wow @mnagel!  Thanks so much for this!  I'm going to look into running this so I can get that list put together.  I don't have any plans to systematically delete the datasources - I just want to compile a list so I can review them.  I'll feed the obvious ones into a script as a one-time purge, and once I've done that, I can take a closer look at the ones that should be working but aren't for whatever reason.

    Hi @Sarah Terry,

    Thanks for the tip!  I also saw that as a workaround, but unfortunately it wouldn't really help with what I'm attempting to accomplish.  What I'm trying to do is find datasources with applies-to functions like "isWindows()" or "isLinux()" that aren't actually discovering any instances.  It's almost always because the datasource is built to monitor a product or service that we don't use and likely never will.  I'd like our datasource list to only contain datasources that are actually in use and applicable to our systems/services.  Similarly, there are datasources that apply to specific hardware that we don't own.  I'm currently going through manually and removing them so we don't have to scroll past them when browsing our list of datasources.  If and when we ever deploy new hardware/software, I'll go and re-import those (updated) datasources from the LM Repository.

    I hope this makes sense.  Thanks again for your response!

    I'm trying to clean up datasources in our account that don't have any instances associated with them and likely never will.  Currently I have to do this manually by inspecting each datasource in the GUI.  It would be really great if the datasource instance count were returned as a property.  Even better would be if the instances and associated device IDs were returned as well, but for now I'd be happy with just the device/instance counts.
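    In the meantime, I've been approximating this via the REST API.  A rough sketch (assumes a bearer-token API key and the dataSourceName/instanceNumber field names I recall from the v3 docs - you'd also want real paging for larger portals):

    $portal  = 'https://ACCOUNT.logicmonitor.com/santaba/rest'
    $headers = @{ Authorization = 'Bearer <api-token>'; 'X-Version' = '3' }

    # Tally instance counts per datasource across all devices
    $devices = (Invoke-RestMethod -Uri "$portal/device/devices?size=1000" -Headers $headers).items
    $counts  = @{}

    foreach ($device in $devices) {
        $uri = "$portal/device/devices/$($device.id)/devicedatasources?size=1000"
        foreach ($ds in (Invoke-RestMethod -Uri $uri -Headers $headers).items) {
            $counts[$ds.dataSourceName] += $ds.instanceNumber
        }
    }

    # Datasources that never accumulated an instance are removal candidates
    $counts.GetEnumerator() | Where-Object { $_.Value -eq 0 } | Sort-Object Name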

    Have you tried enabling rate-limiting?  At least until it all gets sorted out, I'd consider setting up a duplicate escalation chain and an alert rule specifically for some of your syslog eventsources and enabling rate-limiting on them.  Before my LogicMonitor days, I had this happen a few times, and it sucks dealing with a crippled Exchange server while also trying to work out a firewall issue.  Syslog is unpredictable sometimes.

    We have several clustered devices where metrics are gathered on each node; however, the instances across the nodes are identical.  This means that to graph the data, I need to add a new datapoint for each instance and use a glob pattern to select the devices from which to pull those instances.  A lot of time can go into creating these graphs if there are several instances to monitor. Examples:

    • Three Solr nodes - each servicing search requests for the same 10 collections.  In order to see the total number of GET requests for each of those collections, I would need to create a graph that has 10 individual datapoints.  Instead, I would like to add one datapoint and have the graph intelligently aggregate all instances that have the same name, regardless of the node.
    • Several device groups exist under a parent group.  If I want to see the average CPU utilization across each of these groups on a single graph, I would need to add a separate datapoint for each group.

    A potential solution could be to support regex (with capture groups) instead of glob patterns.  Otherwise, simple checkboxes for "aggregate instances by device group" and "aggregate instances by instance name" when selecting aggregated graph types would be extremely useful and time-saving.

    DO NOT comment out the Applies To field on the datasource!  This will remove all historical data - which I can only imagine most of us want to keep.  You can disable the datasource by creating a device group (if you don't have one already) and populating it with all of the ESX hosts. Then, at the group level, select the Alert Tuning tab and uncheck the box next to the datasource.  This disables polling and alerting but lets you keep historical data.

    I have a device property that I would like to update every 15 minutes or so, because I have groups with auto-include rules that look for that property and I need devices to move in and out of those groups on the fly.  It would be great if we could set individual custom propertysources to update on a more frequent basis.  Currently I'm achieving this with the LogicMonitor REST API, which I have baked right into a datasource as a workaround - but I think this solution is messy.
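    For anyone curious, the workaround boils down to a scripted REST call along these lines (a sketch - the property name, device ID, and bearer-token auth are placeholders, and the real version needs error handling):

    $portal  = 'https://ACCOUNT.logicmonitor.com/santaba/rest'
    $headers = @{ Authorization = 'Bearer <api-token>'; 'X-Version' = '3' }
    $body    = @{ customProperties = @(@{ name = 'my.dynamic.prop'; value = 'newValue' }) } | ConvertTo-Json -Depth 4

    # opType=replace updates the named property and leaves the others alone
    Invoke-RestMethod -Method Patch -Uri "$portal/device/devices/<deviceId>?opType=replace" -Headers $headers -Body $body -ContentType 'application/json'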

    Thanks!

  17. Hey Joe,

    I'm not sure that this exactly meets your needs, but I think it's a good start.  Basically, you can call the hostProps.toProperties() method, which returns a java.util.Properties map that you can iterate and filter with a regex.  Something like this:

    def allProps = hostProps.toProperties()
    allProps.each{ key, value ->
        // match property names ending in ".databases"
        if(key ==~ /.*\.databases/){
            println "$key=$value"
        }
    }

    Let me know if this doesn't address what you're trying to accomplish.

    @George Bica - Last one.  This is an eventsource, so no pretty graphs here.  It makes an API call to pull all of the events in the Solr log and then alerts on Error and Severe events only.  It doesn't apply to servers by default because it can be quite noisy if you don't have it tuned properly.  Once you've applied it and are sure it's not going to blow up, go ahead and change the Applies To rules and you should be good to go.  Hope these help!


    Locator code: W9PN3Y

    I thought I had already posted this one, but regardless - here it is.  This does not apply to any servers by default, as it can be extremely noisy if you don't have it tuned.  It makes an API call to Solr to pull Error and Severe logs and then formats them so that LogicMonitor can understand them.  Before applying this, it's not a bad idea to review those logs manually to make sure something isn't repeatedly triggering (as is common with Solr).  Still, it's helped us detect and diagnose a range of issues that would otherwise have been difficult to see.

  20. @Michael Rodrigues - Thanks!  I've got more on the way.

    @George Bica - Here's another one.  It might not work for your version of Solr, but otherwise it has the exact same requirements.  Add solr.port and solr.append to your Solr servers, and this datasource will provide lots of useful JVM metrics without the need to enable JMX.

    I'm still going through my datasources, so I might still have one or two to post.  I'll keep replying to this thread with whatever I've got.