Stuart Weenig

Everything posted by Stuart Weenig

  1. Of course, you can always create instance groups yourself and sort the items into them manually.
  2. Cloning is one option, but a cleaner option may be to group the instances under that DataSource. It looks like there are several properties that should get discovered by default for every VM instance:

```groovy
'auto.config.alternate_guest_os_Name' : vmConfig?.alternateGuestName,
'auto.config.annotation'              : vmConfig?.annotation,
'auto.config.firmware'                : vmConfig?.firmware,
'auto.config.guest_os_full_Name'      : vmConfig?.guestFullName,
'auto.config.guest_os_id'             : vmConfig?.guestId,
'auto.config.managed_by'              : vmConfig?.managedBy?.type ?: "false",
'auto.config.modified'                : vmConfig?.modified?.getTime(),
'auto.config.template'                : vmConfig?.template,
'auto.guest.guest_os_family'          : vmGuest?.guestFamily,
'auto.guest.guest_os_full_name'       : vmGuest?.guestFullName,
'auto.guest.guest_os_id'              : vmGuest?.guestId,
'auto.guest.hostname'                 : vmGuest?.hostName,
'auto.guest.tools_version'            : vmGuest?.toolsVersion,
'auto.guest.tools_version_status'     : vmGuest?.toolsVersionStatus2,
'auto.resource_pool'                  : vm?.resourcePool?.name,
'auto.resource_pool_full_path'        : resource_pool_array.reverse().join(' -> '),
'auto.snapshot_count'                 : vm?.layoutEx?.snapshot?.size(),
'auto.cluster'                        : esxhost?.parent?.name,
'auto.cluster_full_path'              : cluster_path_array.reverse().join(' -> '),
''                                    : esxhost?.name
```

You could group by any of these by simply setting the "Group method" to "Instance Level Property" and then choosing which property to group by. Alternatively, you could choose "Regular Expression" as the Group method and define each group with its own regular expression. For example, if your dev VMs started with "D" and prod VMs started with "P", you would do this:

Development="D.*"
Production="P.*"
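To illustrate how the regex grouping method behaves, here's a minimal sketch (Python for illustration only; this is not LM's actual implementation, and the group definitions mirror the hypothetical Development/Production example above):

```python
import re

# Hypothetical group definitions: group name -> regex that instance
# names must fully match (mirrors Development="D.*", Production="P.*").
GROUPS = {
    "Development": r"D.*",
    "Production": r"P.*",
}

def assign_group(instance_name, groups=GROUPS):
    """Return the first group whose regex matches the whole instance name."""
    for group, pattern in groups.items():
        if re.fullmatch(pattern, instance_name):
            return group
    return None  # unmatched instances stay ungrouped

print(assign_group("DWEB01"))  # -> Development
print(assign_group("PSQL03"))  # -> Production
```

Instances whose names match neither pattern simply stay at the top level of the DataSource, which is also how unmatched instances behave in the product.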
  3. I'm not too familiar with it, but I would imagine it's the version number of the API that's available given your current version of SolidFire.
  4. As far as LM is concerned, the identity of the server is tied to the IP address that the Collector uses to monitor it. You can change a lot of things about the device, but as long as the device's IP address stays the same, LM will consider it the same device. Discovery will eventually run on the device after you make the change, potentially causing some instances/DataSources to disappear since they no longer apply, and new ones may show up; it depends on the change you make to the device.

There are only two things to consider here: reachability and authentication. Unless you're changing the IP address or the network in some way, reachability is unaffected. Authentication is the only thing to keep in mind. Your Collector defaults to using integrated authentication, meaning that the credentials used to run the Collector service are used to communicate with remote devices. In this case, if some target servers are moved to another domain, you'll need to make sure that the credentials used to run the Collector service still have the requisite permissions on those systems in the new domain. This can be accomplished, like @Mike Moniz mentioned, using a trust between the two domains.

The other way authentication can happen is if you specify the Windows credentials as properties on the devices in LM. If you do this, the Collector uses those credentials instead of the integrated credentials for access to remote systems. Take a look at LM's support documents on both of these topics.
  5. Yeah, I've run into this with our fork of the OTel Collector: the config gets overwritten when the installer runs. If a config is provided through a volume mount, I think it errors out. Definitely worth making it an option to pop in your own config files.
  6. Like have another script configured in the DataSource that runs whenever you click that button? So for configs, a "push config" button. For the ping DS, you could have a "trace route now" option. For CPU, a "show top processes right now" button. Like that?
  7. The better way to do this would be to build an integration between LM and Ansible Tower. For that matter, even a scripted DataSource could call an Ansible playbook if you wanted, as long as Ansible was installed on the Collector.
  8. Wait, I may be misunderstanding, but: you've built your own container for this? Why not use our image? I've run Collectors in my lab almost exclusively in containers, and I run into problems with the EA version sometimes. But our image is what is used to monitor the insides of K8s, so while the EA may have issues, the GA version is usually rock solid.
  9. Yes, there is some work that needs to be done here. Likely the UIv4 transition will completely change things. "pow" is expected because you're only taking the current value and raising it to a power. Similar to other simpler arithmetic functions like a + b. Yes, percent(0) and percent(100) are the easiest, although not most intuitive ways of getting the max and min on the graph as flat lines.
  10. There is consolidation and aggregation, and then there is domain aggregation.

Consolidation is where data is combined along the x-axis. For example, if your data is obtained every 1 minute, displaying that graph over 3 months would result in roughly 130,000 points along the time axis (x-axis) per instance. The consolidation function of a graph's datapoint determines how that large number of datapoints is simplified to make the graph displayable and readable. Choosing average as the consolidation means that the graphing engine will combine roughly 5 hours' worth of data into a single point and graph that one point, representing the entire 5-hour timeframe.

Aggregation is where multiple instances are combined together vertically according to a function. You would do this if you wanted to know the total bytes of storage free across all instances, for example: you would enable aggregation and choose "sum" as the aggregation method. If you wanted to know the average CPU across all CPUs included in the graph (according to your filters), you would enable aggregation and choose the "average" aggregation method.

Domain aggregation is where you use a statistical formula to describe a single trend line with a single descriptor. You can create a virtual datapoint that uses domain aggregation functions. These functions take into consideration all data currently displayed, so they're calculated at runtime. While the linked page talks about complex datapoints, the percentile functions can only be used in virtual datapoints (not complex datapoints).
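The difference between consolidation and aggregation can be sketched numerically. This is a toy model in Python, not LM's graphing engine; the series and bucket size are made up:

```python
from statistics import mean

def consolidate(series, bucket_size, func=mean):
    """Consolidation: collapse every `bucket_size` consecutive points
    along the time axis into one point using `func`."""
    return [func(series[i:i + bucket_size])
            for i in range(0, len(series), bucket_size)]

def aggregate(instances, func=sum):
    """Aggregation: combine multiple instances point-by-point along
    the value axis (e.g. total storage free across all instances)."""
    return [func(points) for points in zip(*instances)]

cpu_a = [10, 20, 30, 40]   # one point per minute, instance A
cpu_b = [50, 60, 70, 80]   # one point per minute, instance B

print(consolidate(cpu_a, 2))                 # average of each pair of points
print(aggregate([cpu_a, cpu_b], func=mean))  # average across instances per minute
```

In the real product the bucket size is chosen for you based on the graph's time range and pixel width; the toy `bucket_size` just makes the mechanics visible.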
  11. Q: There are a lot of alerts configured out of the box. If my company doesn't care about them, is there a good way to turn down alerts to avoid excess noise in the platform?
A: One best practice involves creating a dynamic group containing all devices. At that group level, you can tune thresholds globally. You will have to do it datapoint by datapoint. The Alert Frequency report will show you which datapoints will give you the most noise reduction.
Follow-up: Ok, basically what I've been doing. Wasn't sure if there was a quicker/best-practice way of doing this. Thanks!!

Q: Will there ever be an all-in-one, single-pane-of-glass readout for mapping alerts to alert chains to recipient groups?
A: That's a good question. Our product team needs to hear that feedback, and your CSM is the best way to get it to them. I agree, we need a simpler way to show it.

Q: Can we create a customer user to receive SMS and calls as well as only see his environment upon logging in?
A: Yes, you can create one or many user accounts to allow your customers to log into LM. The contact information you specify on that user account is what is used by Escalation Chains. You can also limit that user to only see a certain branch of the resource group tree.
Follow-up: Excellent. Thank you!

Q: Can we set alert dependency?
A: Unfortunately, the Zoom cut off before we could get to this question, but it requires clarification. It's a deep topic depending on which way it goes.
  12. Technically, you can create a LogicModule that uses SSH to connect to the device and execute any commands you want. For DataSources, you'd just need to return some number that you can track, maybe a status of the execution of the command. There's nothing out of the box that allows you to just push configs back to a device. Can't speak to roadmap though.
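To sketch that scripted-LogicModule idea (Python here for illustration; a real LM DataSource script would typically be Groovy, and the hostname, command, and ssh invocation below are all assumptions, not product specifics), the script runs a command over SSH and prints a number LM can track:

```python
import subprocess

def collect(host, command, runner=subprocess.run):
    """Execute `command` on `host` via the system ssh client and return
    its exit status -- the trackable number mentioned above."""
    result = runner(["ssh", host, command], capture_output=True, text=True)
    return result.returncode

def emit_datapoints(points):
    """Format datapoints as key=value lines on stdout, a common output
    form for script-based collection."""
    return "\n".join(f"{k}={v}" for k, v in points.items())

# Typical use (requires ssh access to the device; host/command are made up):
#   status = collect("router1.example.com", "copy running-config startup-config")
#   print(emit_datapoints({"exit_status": status}))
```

Alerting on a nonzero `exit_status` then gives you visibility into whether the pushed command succeeded, even though LM itself has no native "push config" action.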
  13. Are you doing SSH or SNMP? If SSH what command output(s) are you inspecting? If SNMP, which OIDs are you looking at?
  14. As for posting the note back to LM, did you put in the API stuff here?

$accessId = ''
$accessKey = ''
$company = ''

Without it, your script has no ability to post back to LM.
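For reference, those three values feed into LogicMonitor's LMv1 request signature. Here's a sketch of that scheme in Python (the access ID/key below are placeholders, and the resource path is just an example):

```python
import base64
import hashlib
import hmac
import time

def lmv1_auth(access_id, access_key, http_verb, resource_path, data="", epoch=None):
    """Build the Authorization header value for LM's LMv1 scheme:
    HMAC-SHA256 over verb + epoch + body + resource path, taken as a
    hex digest, then base64-encoded."""
    epoch = epoch or str(int(time.time() * 1000))
    message = http_verb + epoch + data + resource_path
    digest = hmac.new(access_key.encode(), message.encode(),
                      hashlib.sha256).hexdigest()
    signature = base64.b64encode(digest.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch}"

# Placeholder credentials -- fill in from your LM API token.
header = lmv1_auth("ACCESS_ID", "ACCESS_KEY", "POST", "/device/devices")
```

The PowerShell version does the same thing; if the note never appears in LM, an empty `$accessId`/`$accessKey` (or a signature built over the wrong verb/path) is the usual culprit.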
  15. You'd need to figure out which tasks are running that are attempting to log into the SQL instance. This can be done through the collector debug console looking at the output of !tlist and !adlist (and maybe !aplist). You can filter by host on all these commands so that you can see which tasks are hitting the server. Then note the LogicModule name. Go to the LogicModule and adjust the AppliesTo so that it doesn't evaluate to true. The nuclear option is to add " && false()" to the AppliesTo, which turns off the LogicModule entirely. Alternatively, you can put a property into the expression and require that property exist on any devices you do want to monitor, effectively shutting down the LogicModule.
  16. You can create a virtual datapoint on a graph that does domain aggregation (sum/min/max/average). The summary is over the entire time range of the graph, so if you want to look at 30 days and see the max for each day, that's not what this does.
  17. "Needs no improvement" means leave it as it is; "Not needed" means remove it.
  18. We're coming up on some opportunities to improve the Community and we'd like your feedback. Please consider filling out this 6-question survey (5 minutes tops).
  19. We are working on this and should have some news to share next quarter. Hint hint, the communities are finally getting official attention/sponsorship from LM.
  20. Agreed, a DataSource would fix the multiple alerts that result from an EventSource running. Although I think the larger question of alert correlation (multiple alerts being statically or dynamically grouped into incidents) is something you should be requesting from your CSM. Even something like occurrence counts on alerts would be good. The same problem happens with SNMP traps: traps can come in every minute and be about the same thing still in an unwanted state. Each one should just increment a counter on the alert. Counter thresholds should be something we can add to alert rules. Even regular datapoints could benefit from this, counting the number of poll cycles/minutes that a particular metric has been over threshold.
  21. I second this. It would also allow you to trend the metric over time and add dynamic thresholds.
  22. Yep, another nice one is that when latency is detected, kick off a tracert to figure out which hop might have the latency issue. These are normally referred to as incident responses or actions. I think there is something like this in the roadmap. Everyone should definitely reach out to their CSM and jump on the bandwagon. LM prioritizes based on how many people are on a particular bandwagon (how many people submit the same feature request). So, the more people ask for it, the sooner it'll happen. I'm still waiting for people to get on my idea of toaster monitoring...
  23. When using "script" as the method, you provide the PowerShell script and make sure that your script outputs to stdout (through "Write-Output" or any of a couple other methods). So, your PS script will run on the collector, connect to whatever resource where you would normally run it, run the script, gather the results, format it as JSON, and output it to the "screen" (the stdout pipe). LM watches the stdout stream and any properly formatted JSON will result in event(s) being created. I'm not a PS guy, but I believe there are native cmdlets (ConvertTo-Json, for example) that let you take data and convert it to JSON. Notice that in the JSON, the "events" key is followed directly by a [. This means that the JSON can contain a list of events. So, if your script would normally pick up on 4 things that need to be turned into alerts, it might look like this:

```json
{
  "events": [
    {
      "happenedOn": "Thu Jan 21 14:25:00 PST 2016",
      "message": "This is the message of the event",
      "severity": "Warn",
      "source": "Custom"
    },
    {
      "happenedOn": "Thu Jan 21 14:26:00 PST 2016",
      "message": "This is the message of the 2nd event",
      "severity": "Warn",
      "source": "Custom"
    },
    {
      "happenedOn": "Thu Jan 21 14:27:00 PST 2016",
      "message": "This is the message of the 3rd event",
      "severity": "Warn",
      "source": "Custom"
    },
    {
      "happenedOn": "Thu Jan 21 14:28:00 PST 2016",
      "message": "This is the message of the 4th event",
      "severity": "Warn",
      "source": "Custom"
    }
  ]
}
```
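That same output can be produced from any language the Collector can execute. A minimal Python sketch (the two "problems" below are made up; your script would substitute whatever its checks actually find):

```python
import json
from datetime import datetime, timezone

# Build the list of events your checks found, then print the whole
# structure to stdout -- the collector parses well-formed JSON from
# the stream and raises one event per list entry.
events = []
for problem in ["disk full on /var", "service foo stopped"]:
    events.append({
        "happenedOn": datetime.now(timezone.utc).strftime("%a %b %d %H:%M:%S %Z %Y"),
        "message": problem,
        "severity": "Warn",
        "source": "Custom",
    })

print(json.dumps({"events": events}, indent=2))
```

The key point is the same as in the JSON above: `"events"` holds a list, so one script run can surface any number of alerts.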
  24. Welcome to LM! You're on the right track with the EventSource, but you'll want to make it a scripted EventSource. Normally, this would be done in Groovy, but you can do it in any language, provided the Collector can execute it from the command line. More information here and here. Essentially, you will need to modify your script to match the output format that LogicMonitor expects. That part should be pretty trivial. When building the EventSource, you'll need to choose the "Script Event" Type and select "Upload Script File". You'll upload your script to LM, which will cause it to be pushed to any collector executing this task. The "Windows Script" field will point to powershell.exe (use the full path). Your uploaded script will be found in the lib subdirectory of the LogicMonitor program directory, so in the "Parameters" field, you'll point to your script (e.g. "C:\Program Files\logicmonitor\lib\adlockout.ps1"). When the task is run, the two are concatenated onto the command line: the command calls powershell.exe, which runs the PS1 script. As long as your output syntax is correct, it should generate one alert for each entry in your output.