Sam Gendler

  1. SNMP-based monitoring of an EC2 instance that is a member of an ECS cluster picks up filesystems at /var/lib/docker/containers/<container_id>, which go away once the Docker container exits. This causes LogicMonitor to alert on the 'StorageNotAccessible' data point of the 'Filesystem Capacity' data source. Does anyone have a good fix for this that doesn't just turn off monitoring of those filesystems? And do I care, anyway? They seem to be some kind of overlay filesystem whose content lives on a normal filesystem that is also being monitored, so I can probably just exempt /var/lib/docker/containers/* from monitoring altogether, though I'd love to have that happen automatically just for hosts that are AWS/EC2 instances. I'll eventually figure out how to do that from the documentation, but it may be faster to ask here. Incidentally, there seem to be two kinds of mounts with that path pattern: one is a shared memory segment and the other is an overlay. It looks like LogicMonitor ignores shm filesystems. Perhaps it could ignore overlay by default, too?
  2. SNMP credentials do exist - at the top level, so there's no way to avoid them - and the collector does correctly populate some SNMP data sources, so I know the access is working. I can run arbitrary snmpwalk commands from the collector host against the instances in question, so I know the network ACLs and security groups are correct (also, I spent far longer than should have been necessary getting them to work in the first place, so I KNOW they are correct). But it does not seem to be picking up auto properties correctly. I have no system.sysinfo property and I'm unsure how that gets populated. It could be that everything is failing for lack of a sysinfo value - the isLinux() function seems likely to depend on it, since that's the only property that contains the word linux on the instances that were added manually and that seem to function correctly. When I manually add an instance via the wizard, it fails to recognize the type of the server automatically and instead asks me to select the OS. I do that, it then scans the host, and everything works correctly. (Actually, by default, it seems to fail to find the account properties specifying the SNMP credentials, so it first tries a scan, fails, and asks me to provide credentials. I give it the same credentials I have in the account properties and then everything works correctly - I get a system.sysinfo property and everything else looks about right.) If I add it via the expert page instead of the wizard, it picks up the account properties for SNMP, but it doesn't seem to get the sysinfo stuff - I guess because it doesn't know it is a Linux host. So how do I manually tell it that it is a Linux host? What does the body of that function look like? I still don't even know for sure what properties/functionality it is lacking, since I don't know what LogicMonitor is looking for.
  3. Are ssh keys supported yet? Better yet would be the ability to fetch an ssh key, by name, from a key management service such as those offered by AWS and others. I don't want to have to encode passwords into the logicmonitor interface, and I certainly don't want to have to leave a private key with access to all my production resources on my collector, but my collector runs on an ec2 instance that has an instance role assigned to it, so it would be trivial to give that role a policy which allows it to fetch a key by name. If the basic ssh functionality included in scripts were to include the ability to look up an ssh key, or could receive the output of a different data source (I could write a script to fetch the key easily enough), that would be ideal, since modifying every data source to use a different script would be a drag. This would give me the ability to use key-based authentication for ssh while still allowing me to do key management without having to update filesystems and logicmonitor properties. I could simply rotate in a fresh key in the key management service and everything would continue to function normally.
  4. I recently enabled collector-based monitoring for resources discovered via my AWS Cloud Account. My intention was to enable snmp based monitoring of instances, since cloudwatch doesn't give visibility into a lot of metrics that are otherwise useful - disk utilization, most notably. However, while the ec2 instances did pick up basic collector monitoring - ping, Host Status, etc - none of the extended data sources that are automatically applied if I add the instance to the UI manually were enabled. Looking at data source definitions, many snmp data sources use functions like Servers() and isLinux() in their applies_to expressions, in addition to looking for various snmp-based strings in system.categories. The definition of Servers() is visible in the user appliesTo functions, and it can be easily modified to include ' == "AWS/ECS"' as one of the clauses. However, the definition of isLinux() isn't visible and it doesn't appear that the linux-ness of the ec2 instance is visible via the standard properties, even though manually adding the instance via the logicmonitor UI would cause it to be correctly identified as both an SNMP-enabled resource and as a resource running linux. Instead, I had to build groups based on and other variables in order to apply the following values to system.categories: snmpTCPUDP,Netsnmp,snmpHR,snmpUptime,snmp. The only categories it picked up, by default, were AWS/EC2 and collectorDataSources. To some extent, it feels like a bug that an instance added via the UI gets treated differently than an instance added automatically via cloud account detection - it seems that there is no value for system.sysinfo when the instance is automatically added via cloud account detection. But in the absence of a fix for that bug, is there some other way to make isLinux() return true for a linux-based EC2 instance? What about other built-in functions that don't seem to have modifiable, or even visible, definitions?
  5. Thanks. I hadn't seen that, but that's pretty much what I ended up implementing - except that I want info about collector id and collector description to be available to the rest of my provisioning system, so I do all the parts other than downloading and running the installer within terraform templates/modules via provisioners and data sources which run local python scripts, and then the id is passed to the ec2 instance, which then runs a script to download the installer for that collector id and runs it. I'll eventually post a blog entry about it, but I've got too much on my plate to document it in full just yet.
  6. I'm looking for a cloud-init configuration that will get a collector installer URL using my API key, download the installer, and run it. I'll happily take any non-cloud-init solution and convert it. The impetus behind this is using terraform to manage AWS infrastructure and not wanting to have to roll a custom AMI just to get a collector up and running. Here's a cut and paste of the question I posted at Stack Overflow, which explains what I'm looking for in a bit more detail. [quoted question below]

     Logicmonitor collector installers must be fetched with a valid token, and the installer expires after a period of time. So there's no simple way to pull a collector installer binary onto a new EC2 instance and then run it. Instead, it is necessary to use a script which calls Logicmonitor's REST API to generate a new collector installer URL, then fetches that URL and runs the installer immediately. I'm guessing that, since there is a logicmonitor provider in terraform, at least one other person has gone through this process and already has a working script. Frankly, Logicmonitor's docs provide the bulk of it, so it isn't that hard to write for myself, but if someone out there already has a nice template or module which adds the necessary pieces to an instance's userdata, you'll save me a couple of hours of copypasta and trial-and-error work. Something that uses cloud-init would be particularly useful, but I can convert.

     Basic example:

         runcmd:
           - export COLLECTOR_URL=` ${api_key} ${other_var} ${yet_another}`
           - curl -o LogicmonitorCollector.bin $COLLECTOR_URL
           - chmod +x LogicmonitorCollector.bin
           - ./LogicmonitorCollector.bin

     I can pull the script out of GitHub or an S3 bucket in an earlier statement easily enough. This is the kind of thing I'd love to eventually build into a resource in the logicmonitor provider in terraform. I am new to terraform and don't know what is involved in adding a new resource, but this seems like a common need for anyone planning to bring up all of their infrastructure via terraform. I don't want to have to manually install collectors on the instances terraform will be launching in my mgmt VPC for handling things like bastion duty and monitoring collectors, especially if those instances actually get launched by Amazon as part of an auto-scaling group. A self-configuring launch configuration for new instances is very desirable, since there's no telling how long it will be after the launch configuration is updated with a new userdata script before that userdata is actually executed.

     Note - Logicmonitor claims in the docs that the installer binary itself will expire after 2 hours. I've been taking them at their word on that. If what they actually mean is that the token in the installer URL will expire after 2 hours, I could just download the installer once and stick it in an AMI or S3 bucket so that I need never download it again - just change the config to update the collector id. But I'm guessing there are reasons their docs don't provide instructions for just pulling the binary once. An alternative solution would be an AMI based on Amazon Linux with the collector already installed, if anyone has such a thing publicly accessible.
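
For anyone landing on the last post with the same problem, here is a minimal Python sketch of the "generate a fresh installer URL, then fetch it" step that the runcmd example leaves blank. It is not the script from the original question. It assumes LogicMonitor's LMv1 token scheme as I understand it from their REST API docs (HMAC-SHA256 over verb + epoch + body + resource path, hex digest, then base64), and it assumes the installer endpoint is /setting/collectors/{id}/installers/{osAndArch} with "Linux64" as the platform string. The function names, the "acme" account name, and the parameter names are all hypothetical; check the signing details and the path against the current docs before relying on it.

```python
import base64
import hashlib
import hmac
import os
import shutil
import time
import urllib.request


def lmv1_auth_header(access_id, access_key, http_verb, resource_path,
                     data="", epoch_ms=None):
    """Build an LMv1 Authorization header value (assumed scheme, see above)."""
    if epoch_ms is None:
        epoch_ms = int(time.time() * 1000)
    epoch = str(epoch_ms)
    # Assumed signing string: verb + epoch (ms) + request body + resource path.
    request_vars = http_verb + epoch + data + resource_path
    digest_hex = hmac.new(access_key.encode(), request_vars.encode(),
                          hashlib.sha256).hexdigest()
    # The hex digest (not the raw bytes) is what gets base64-encoded.
    signature = base64.b64encode(digest_hex.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch}"


def installer_request(company, collector_id, access_id, access_key,
                      os_and_arch="Linux64", epoch_ms=None):
    """Return (url, headers) for downloading a collector installer.

    The endpoint path and the "Linux64" value are assumptions.
    """
    resource_path = f"/setting/collectors/{collector_id}/installers/{os_and_arch}"
    url = f"https://{company}.logicmonitor.com/santaba/rest{resource_path}"
    headers = {
        "Authorization": lmv1_auth_header(access_id, access_key, "GET",
                                          resource_path, epoch_ms=epoch_ms)
    }
    return url, headers


def download_installer(url, headers, dest="LogicmonitorCollector.bin"):
    """Stream the installer to disk and mark it executable."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    os.chmod(dest, 0o755)
    return dest
```

In a cloud-init setup, a script like this could be pulled from GitHub or S3 in an earlier runcmd step and called with the collector id and API credentials passed in from terraform, followed by running the downloaded binary. Signing at boot time rather than baking a URL into the userdata keeps the request fresh however long an auto-scaling group waits before launching the instance.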