Mike Aracic

LogicMonitor Staff

  1. We covered the basics of alerts and alert routing https://docs.google.com/forms/d/e/1FAIpQLScPWW5DzNxe2W5ieh6PjamLYWcP5AhDbUl1E3U7ZKryEgwEoA/viewform?usp=pp_url&entry.2118543627=2021-07-14&entry.2116906043=US/Americas
  2. Today's questions were about embedding dashboards or widgets and feature requests Participant survey here: https://docs.google.com/forms/d/e/1FAIpQLScPWW5DzNxe2W5ieh6PjamLYWcP5AhDbUl1E3U7ZKryEgwEoA/viewform?usp=pp_url&entry.2118543627=2021-06-30&entry.2116906043=EMEA
  3. 6:30 Q: Can an alert be un-acked using the API or an integration? A: No, an alert cannot be un-acked. In the LogicMonitor platform, an ack is interpreted as taking ownership of an alert. It may be possible to use an SDT instead of an ack, but I do not believe this is possible using a bi-directional integration. 12:30 Q: I would like to create a widget that includes storage metrics, but the percentage does not provide enough context. A: Let me explain where the data comes from and demonstrate the table widget and some of its capabilities. 28:45 Q: We have a topology issue where certain VMware entities are not showing up properly in topology maps. A: An explanation of how topology works (ERIs, TopologySources, etc.) and a description of some basic troubleshooting techniques. 47:15 Q: How might I audit the ability of a role to access a particular resource? A: Either use the role report through a manual process, or use the API to match users, roles, groups, and devices. API docs for the relevant objects are here: https://www.logicmonitor.com/swagger-ui-master/dist/ Please fill out our webinar survey: https://docs.google.com/forms/d/e/1FAIpQLScPWW5DzNxe2W5ieh6PjamLYWcP5AhDbUl1E3U7ZKryEgwEoA/viewform?usp=pp_url&entry.2118543627=2021-05-26&entry.2116906043=US/Americas
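For the role-audit question at 47:15, the matching step might look roughly like the sketch below. It is only an illustration: the three dictionaries are hypothetical, hand-built stand-ins for data exported from the portal or fetched through the API, and the function names are mine, not LogicMonitor's. It also ignores nested groups and the difference between view and manage privileges.

```python
# Hypothetical sample data (assumption): user -> roles, role -> resource groups,
# resource group -> devices. Real data would come from the role report or the API.
USER_ROLES = {
    "alice": ["network-admins"],
    "bob": ["readonly-servers"],
    "carol": ["network-admins", "readonly-servers"],
}
ROLE_GROUPS = {
    "network-admins": ["Network"],
    "readonly-servers": ["Servers"],
}
GROUP_DEVICES = {
    "Network": ["core-sw-01", "edge-rtr-01"],
    "Servers": ["db-01"],
}


def roles_with_access(device, role_groups, group_devices):
    """Return the sorted role names whose groups contain the device."""
    return sorted(
        role
        for role, groups in role_groups.items()
        if any(device in group_devices.get(group, []) for group in groups)
    )


def users_with_access(device, user_roles, role_groups, group_devices):
    """Return the sorted user names holding at least one role that reaches the device."""
    granting = set(roles_with_access(device, role_groups, group_devices))
    return sorted(
        user for user, roles in user_roles.items() if granting.intersection(roles)
    )


if __name__ == "__main__":
    print(users_with_access("db-01", USER_ROLES, ROLE_GROUPS, GROUP_DEVICES))
```

Running the audit per device keeps the output easy to compare against the role report when spot-checking.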
  4. Participant Survey: https://docs.google.com/forms/d/e/1FAIpQLScPWW5DzNxe2W5ieh6PjamLYWcP5AhDbUl1E3U7ZKryEgwEoA/viewform?usp=pp_url&entry.2118543627=2021-05-12&entry.2116906043=US/Americas Webinar Q & A 0:30 Question: How do I handle Syslog and Traps in conjunction with Auto Balanced Collector Groups? A: We generally try to make it so that traps are unnecessary by polling for most of the data you'd need. If traps or syslog are necessary, it's likely that statically assigned collectors are necessary. It's possible if you're using LM Logs to capture these flows on dedicated collectors. 22:00 Question: Please demonstrate Safe LogicModule Merge in the exchange Answer: We showed a few things; note that the LM exchange is about to be changed 32:15 Question: Is it possible to add scripted text to a dashboard Answer: There are several potential ways to make dashboard text dynamic. Text widgets are actually in HTML and can be worked with programmatically. It's also possible to use the HTML widgets to embed content from other web services. 39:30 Question: How do I write efficient groovy scripts for data collection? Answer: We give a quick rundown of some good coding practices, including using BatchScript for collection.
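On the BatchScript point at 39:30: the idea is that one script run returns values for every instance on a resource, instead of one run per instance. The webinar examples were in Groovy; the sketch below uses Python only to show the shape of the output. It assumes the `wildvalue.datapoint=value` line format, which is my understanding of the BatchScript convention, so please confirm it against the current documentation. The metric names are made up.

```python
def format_batch_output(metrics):
    """Render {wildvalue: {datapoint: value}} as one 'wildvalue.datapoint=value' line each.

    Assumption: this line format matches what a BatchScript collection
    script is expected to print; check the docs before relying on it.
    """
    lines = []
    for wildvalue, datapoints in metrics.items():
        for datapoint, value in datapoints.items():
            lines.append(f"{wildvalue}.{datapoint}={value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values gathered in a single pass over the device.
    collected = {
        "disk1": {"used": 120, "free": 380},
        "disk2": {"used": 45, "free": 455},
    }
    print(format_batch_output(collected))
```

The efficiency gain comes from querying the device once and reusing the result for all instances, not from the formatting itself.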
  5. There was a request to demonstrate a LogicModule merge. I didn't have an example ready, but I believe Stuart covered it in this webinar
  6. 3:30 Auto Balanced Collector Group: planning for large distributed environment 10:00 Collector analytics 15:00 Monitoring feeds using scripted EventSources 21:00 Auto Balanced Collector Group Continued 40:30 Website checks 51:00 SLA feature request and feedback 58:30 AWS filters using tags Please fill out a feedback survey
  7. This is from a lab collector that is monitoring a handful of devices, so what you're seeing might be totally normal. I think those are single threads having to do with the collector's internal task management (but I am not certain of this, and will be interested to hear what the support team has to say). This datasource has the counters for the collection tasks. Probably the most important of these for thread availability is the unavailable thread counter (visible on the instances).
  8. Thanks to all who joined us today. Please help us by filling out this brief survey: Survey A recording of the webinar:
  9. Thanks for attending the US Office Hours. Please fill out our feedback survey: here Q: How to monitor memory used by a specific Windows process and alert on threshold. [beginning] A: I demonstrate how to add a Windows process instance. Our participants then pointed out that there is a datapoint that returns memory usage. Unfortunately, there are no datapoints in that datasource to serve as a good denominator, so adding a threshold would be an alert tuning exercise. What I didn't point out at the time is that this would be a great candidate for a dynamic threshold. A dynamic threshold is well suited to this because it's a single, important metric with no obvious normal threshold. A dynamic threshold would alert you to rapid or unusual changes, which is what I think the end user would likely be looking for. https://docs.microsoft.com/en-us/previous-versions//aa394323(v=vs.85)?redirectedfrom=MSDN Q: With the new log monitoring, does it accept Windows logs natively? [~10:00] A: LM Logs is a new product under intense development, and I do not see anything specific about ingesting Windows logs natively. It is extensible, so this is something I would recommend asking the support and product teams about. Q: I have an API that is giving me delta in data over time rather than the full results when requested. Is there a method for handling this via a data source? [27:30] A: LogicMonitor's derive and counter datapoint types do the opposite operation, but I do not believe that there is the capability to track these things in a cumulative manner this way. I advised the user to contact support and ask them about creating LogicModules for the devices in question. Q: We have an interesting challenge where we have to retire some old Linux VMs running Collectors, and vacate the VLAN currently being used. We have new VMs with Collectors installed and running but on a new/different VLAN. 
We suspect that the Resources and Websites being monitored by the legacy Collectors might not have all their ACLs and firewall rules open to the new Collector VLAN. The question is: can you suggest a method of testing reachability of monitored Resources & Websites from a new Collector, while production monitoring is still continuing from the old Collectors? We don't want to cut over to the new Collectors unless / until we know the new ones can reach all the same targets as the old ones! [14:30] A: If you're unwilling to simply move production servers over to a new collector for observation (understandable), my advice is select a small, representative subsample of the devices and websites, and temporarily double-monitor them by adding them in as new devices on the new collector(s) with distinct (and unique) display names. Watch them and make any changes to the environment that you need until you're satisfied things will work, and then, once proven, delete the duplicates and move the devices to the new collectors. Q: (follow up) [are there any example API scripts that automate the duplication of devices]? A: No, I would do this by hand, using expert mode, on a reasonably-sized, but representative population. You'll be paying close attention to them in any case. Q: I read in the API docs that the Basic Auth is deprecated and going to be retired soon (in favor of API Tokens only). Any update on the time-frame for when Basic Auth will be removed for good? [21:00] A: I do not know what event on the roadmap will cause Basic Auth to be unusable. If you have use cases (in this case, a third party alert management system) which demand Basic Auth, please let our product team know about them Q: If our primary collector is in US N/E and we setup a secondary collector in HK or China; what sort of latency will we see in the LM portal? [24:30] A: A couple of things: there are definitely exceptions, but we recommend that collectors be as close to monitored resources as possible. 
We also recommend generally that failover collectors be in the same location as the primary. This model can break down when there are large numbers of devices spread out geographically in groups not big enough to warrant their own collectors, so feel free to ask support when planning collector deployment. The training team also offers some content about this in some of our courses. Second: there are several factors which drive latency, including the latency between the collector and LogicMonitor and the collector and monitored resources, but since the polling cycle is, at shortest, one minute, practically speaking, even worldwide latency should not affect how quickly the data appears by much. Q: I have been teaching myself Groovy for doing custom DataSource / PropertySource / etc. and it's going well. 🙂 I've been reading through DataSources in Core which import libraries which are incredibly useful, but don't seem to be listed (or barely mentioned at all) on the LM website. Example: JSoup which only comes up in community postings and a LM blog post, even though it's bundled with the Collectors. Is there a comprehensive list of all the Groovy libraries that are bundled with the Collector, and optimally which libraries come with which Collector versions? [30:00] A: In the video I go into this at some length. This is a fairly advanced topic in a few ways, but what it boils down to is that the configuration files for the collector, specifically the wrapper, include entries for additions to the Java ClassPath. All the java classes and packages available in the .jar files referenced can theoretically be imported in to the groovy scripts for use. (Later, during the part about ConfigSources [39:00], I show an example) Unfortunately, the names of the .jar files do not correspond to the names of the useful Java objects, so some sleuthing is necessary. 
I recommended identifying the jar and its source and then looking into the corresponding javadoc online to see what classes might be available and how to use them. (The person with the question had been using LogicMonitor-provided datasources, which are also a good source of examples.) Q: I heard we might have Sandbox access since we are enterprise level. Can you show some stuff about adding and using the Sandbox? [my answer here is better than the one on the video] A: Sandbox accounts, when available, are fully independent LogicMonitor accounts. They are provisioned by our customer success (and technical operations) teams and their features and availability are subject to change over time. Some of our users reported their sandboxes being on an earlier upgrade schedule and some reported some other features, but this is subject to change according to the details of your service agreement. Q: [follow up] Is there a way to clone from prod to sandbox? A: The specific features available to sandbox environments are subject to change and to your user agreement; please contact customer success with questions about this. Q: [About my ConfigSource example ~42:00] [Are wildvalues sanitized to prevent malicious code injection] A: I do not believe that they are; it's the responsibility of the author of the LogicModule to handle that and to make sure that the LogicModules and Collectors do not run arbitrary code. Exchange LogicModules are given a security check as part of the publishing process. Q: So it will probably help if I see this, what is the best way to identify devices for custom data sources? For example: I have a data source for a specific device, when LM does its discovery how do I get it to identify that device automatically in a way that my data source knows to run. [53:00] A: There are several approaches that can be taken here and it depends a lot on the specifics. 
The way LogicMonitor does this is very generalized so that it will work in any environment right "out of the box". We use a combination of an auto-properties system, system categorization, and PropertySources to attach metadata to the devices so that they can be classified properly by the LogicModules' applies-to scripts. For custom, customer-written monitoring, this can be emulated, although simpler approaches to matching applies-to scripts can also be used using something as simple as a device group and the automatic output of the applies-to wizard. A fair number of our older LogicModules use a fairly general applies-to and then a more specific Active Discovery to ensure that instances are there Amusing footnote: for some part of this the presenter, due to a few glitches, ended up sharing the wrong screen. Imagine!
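Going back to the wildvalue sanitization question in this session: since the module author is responsible for this, one common defensive approach is to accept only values that match a strict allowlist before they are used in a command, path, or query. The sketch below is a generic illustration in Python rather than Groovy. The permitted character set and the 128-character limit are my assumptions, not a LogicMonitor rule, and should be tightened to whatever your instances actually look like.

```python
import re

# Assumption: instance identifiers only need letters, digits, and a few
# separators. Anything else (spaces, quotes, ';', '$', backticks) is rejected.
SAFE_WILDVALUE = re.compile(r"[A-Za-z0-9_.:-]{1,128}")


def is_safe_wildvalue(value):
    """Return True only if the whole value matches the allowlist pattern."""
    return isinstance(value, str) and SAFE_WILDVALUE.fullmatch(value) is not None


def require_safe_wildvalue(value):
    """Return the value unchanged, or raise ValueError instead of passing it on."""
    if not is_safe_wildvalue(value):
        raise ValueError(f"rejected wildvalue: {value!r}")
    return value


if __name__ == "__main__":
    for candidate in ["eth0", "GigabitEthernet1.100", "eth0; rm -rf /", "$(reboot)"]:
        print(candidate, is_safe_wildvalue(candidate))
```

Rejecting unexpected input outright is usually safer than trying to escape it, because the script never has to reason about how a shell or interpreter would treat the value.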
  10. Q: My question is sorta newbie, but it seems like every time a data source updates dashboards break, is there an easy way to test for that ahead of time, or know what is changing in an updated data source? [1:11 and later] A: Dashboards are built up of widgets that refer to Resources, Instances, and Datapoints by name. The Monitoring Engineering group generally updates LogicModules in a way that will not affect dashboards. In the odd case that an update does break a dashboard, you should be able to see what the widgets refer to and work out what changed. I would reach out to support if something like this happens. We also noted that there are now generalized dashboards built into the product and a whole set at https://logicmonitor.com/sales/dashboards Q: What network diagram view or creation capability does LogicMonitor have? [8:25] A: Topology https://www.logicmonitor.com/support/forecasting/topology-mapping/topology-mapping-overview Q: Does the diagram only show the devices being monitored that I'm paying for, or will it show all devices on the network including workstations not being monitored? A: Only devices under monitoring will appear in topology maps. Q: Will you please explain dynamic thresholds and the best practices for using them? [11:15 and later] "When we enabled the critical level for VM Disk fullness, it was alerting to large changes even if the disk fullness is under 50%. This triggered critical alerts (text message to the client) for a non-full disk... not ideal." "Can you provide more detail on the bands / polls settings?" "Agreed... we went back and removed critical from the dynamic level but left warning and error. This alerts us to a large change, but doesn't allow it to go critical." Q: Are there limitations? Such as enabling it on too many devices? A: Yes, there are limits. You can see them in the usage section in the settings page in the account page. 
Q: We're looking into options for monitoring Windows outstanding patches, and found the community / staff contribution (LM Locator N7R7YZ) which looks promising. However, with very limited staff resources for maintaining custom LogicModules we try to stick with Core modules as much as possible. Is there anything in the pipeline for a Core module for Windows patching like this? ("No" and "I don't know" are totally valid answers for this one!!) A: Check the Community for things like this. Q: How and where can I make use of relations between objects? A: Topological links can be used to navigate through maps and are also used by LogicMonitor's Root Cause Analysis. Q: Does the diagram piece only show the devices being monitored that I'm paying for, or will it show all devices on the network including workstations not being monitored? A: It won’t typically display those objects because LM has to know about them in some way. Most “undiscovered vertices” that can be displayed are things like switches and other network gear. Q: Is there an option to save and roll back device configurations? Switches, routers, etc? A: Configs collected through LM Config are stored in LM. You have the option to download the config and restore manually. I’ve heard of some customers who use the API to fetch the config automatically and feed it back to something like Ansible which can push the config back out. You typically will want to include some human element there anyway, so we do not currently automate this. Q: How do I generate a device availability SLA report/dashboard? i.e. if a device (or group of devices) became unreachable from LM collector for certain duration breaching their 99% availability SLA, is it possible to report on that metric? [29:30] A: We offer two SLA reports in the reports page: one uses alert status to determine availability, and the other uses the satisfaction of given metrics. There is also an SLA widget for dashboards. 
Q: In the doc about Collector config editing (https://www.logicmonitor.com/support/collectors/collector-configurations/editing-the-collector-config-files) it states: "...it is highly recommended that you use the interface (and not manual modification of the local Collector configuration files) to make any required updates as there are safeguards in place in the UI to prevent errors. Manually modifying the local Collector configuration files should be done at your own risk." There are Patch and Update methods in the API/SDK for changing Collector configs. Are those API/SDK methods going to give the same error-checking as the UI, or is the API/SDK more akin to a manual filesystem file change without error checking? A: We do not think that the API has the same safeguards as the UI. If you intend to hand-modify (or machine-modify) your collector config files (other than changing their size), we recommend you work with our support team in order to make sure what you're doing is sound. Q: I'd like to be able to compare different config versions side by side when something changes so I can decide whether to roll back. [39:25] A: Yes, there is a diff view when looking at the actual config. There's a switch and then you select the two versions you'd like. Q: If we wanted to split a company out of our LM instance into a child LM instance under our enterprise plan do you have an easy process for this? [41:50] A: Speak with your account manager about how this works. Q: Please talk a little bit about Services, especially adding websites in. [50:00] A: Website monitors cannot currently be added to services, although this is a good idea. Please bring it up with our product team using the feedback link in the support area. Q: We have run into issues with high CPU coming from collectors with large clients. A lot of people internally in my company attribute this to being a Java-based application. Are there any design plans on moving away from Java for collectors? 
A: No plans that we are aware of. If resource usage is an issue it may be able to be tuned with collector configuration or by managing monitoring workload. Q: Is there a place to show the data sources that have dynamic threshold enabled?[56:00] A: In the Alert Thresholds report, with only custom thresholds selected.
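For the 99% availability question at 29:30, the underlying arithmetic can be sketched as below. This is only a worked example of the percentage calculation, not a description of how LogicMonitor's SLA reports or widget compute their figures; the function names and the 30-day window are assumptions for illustration.

```python
def availability_percent(total_minutes, unreachable_minutes):
    """Return availability over the window as a percentage."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= unreachable_minutes <= total_minutes:
        raise ValueError("unreachable_minutes must be between 0 and total_minutes")
    return 100.0 * (total_minutes - unreachable_minutes) / total_minutes


def meets_sla(total_minutes, unreachable_minutes, target_percent=99.0):
    """Return True if availability is at or above the target."""
    return availability_percent(total_minutes, unreachable_minutes) >= target_percent


if __name__ == "__main__":
    window = 30 * 24 * 60  # a 30-day window, in minutes
    for downtime in (0, 432, 433):
        print(downtime, availability_percent(window, downtime), meets_sla(window, downtime))
```

Over a 30-day window, a 99% target leaves a budget of 432 minutes (7.2 hours) of unreachability, so a single long outage can use it up.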
  11. US attendees: Thanks for a great session! The session feedback survey can be accessed here. The recording is here: I'll be posting the question list and times as soon as I process them.
  12. Q&A Transcript: Q: Is there a LogicMonitor app for iOS/Android to clear/acknowledge alerts? A: Yes, here is the link for the app in the Google Play store: https://play.google.com/store/apps/details?id=com.logicmonitor.mb&hl=en_US Q: Any plans for a set of Groovy Scripting Training Sessions, specifically catered for LM usage rather than just generic stuff? A: We have it on our roadmap for both a Live Training Webinar as well as an e-learning option through the “Training” link in your portal. Q: Any reasons we might not want to keep collector monitoring (via resource tab) to not use 127.0.0.1? A: Technically it shouldn’t matter. There are nuances where it makes a difference, but if you wanted to change it, you could without any real impact. Q: What if a website is reachable from few site monitors but fails on the other? A: You can configure the alert criteria. You can have it alert when all, half, more than one, or any one of the locations indicates a failure. You do that in the Alert Triggering section. Q: Is it possible to push application level logging into logicmonitor from an application? A: Yes, theoretically, that’s certainly possible. We’d need more details to provide better guidance, but it would involve configuring the logs to be sent to the Collector, and also configuring an EventSource in LogicMonitor to tell the Collector to accept and alert on the logs. I’d suggest posting to our community forum with the details of what you’d like to do and we can provide more guidance.
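The website alert criteria mentioned above (all, half, more than one, or any one location failing) can be pictured as a small decision function. This is a sketch of the logic as described in the answer, not the product's implementation. The criterion names are hypothetical labels, and treating "half" as "at least half" is my assumption.

```python
def should_alert(failed_locations, total_locations, criterion):
    """Decide whether a website check should alert for a given trigger criterion.

    criterion is one of the hypothetical labels:
    'all', 'half', 'more_than_one', 'any'.
    """
    if total_locations <= 0:
        raise ValueError("total_locations must be positive")
    if not 0 <= failed_locations <= total_locations:
        raise ValueError("failed_locations must be between 0 and total_locations")

    if criterion == "all":
        return failed_locations == total_locations
    if criterion == "half":
        # Assumption: 'half' means at least half of the locations fail.
        return failed_locations * 2 >= total_locations
    if criterion == "more_than_one":
        return failed_locations > 1
    if criterion == "any":
        return failed_locations >= 1
    raise ValueError(f"unknown criterion: {criterion!r}")


if __name__ == "__main__":
    # Two of five locations report a failure.
    for name in ("all", "half", "more_than_one", "any"):
        print(name, should_alert(2, 5, name))
```

A stricter criterion such as "all" filters out failures that are local to one test location, at the cost of missing regional outages.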
  13. The alert trigger intervals work for no data alerts, too. For a few reasons, it's not good practice to have a no data alert set on a datapoint that also has a value-based threshold. (For one, the message should definitely be different) Is that what you're talking about?
  14. I had an interesting case today: A user reported that a collector had completely stopped collecting data for all its hosts. He'd verified that the collector's service account was good and had the necessary permissions. I checked one of his collection tasks, and it reported WMITask failed. (msg=winproxy return status=400 errmsg=IWbemLocator::ConnectServer:Error: 800706ba:The RPC server is unavailable.) I ran a WMI query against the host manually and got the same result (that the RPC server was unavailable). As mentioned in our helpful article on WMI Troubleshooting, generally when the error is RPC server unavailable, the issue is one of basic connectivity, usually caused by a network or firewall issue blocking the WMI traffic from reaching its destination. In this case, however, the user had the service account set up correctly, but the hosts had wmi.user and wmi.pass properties overriding those settings. This worked until the password for the user specified in those properties expired, at which point all the affected hosts ceased responding to WMI queries from the collector. Once the host properties were removed, collection resumed immediately.
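A quick way to catch this situation before a password expires is to list which hosts carry their own wmi.user or wmi.pass properties. The sketch below works on a plain dictionary of host properties, which is a hypothetical stand-in for data exported from the portal or fetched through the API. Only the property names wmi.user and wmi.pass come from the case above; the host names and values are made up.

```python
WMI_OVERRIDE_KEYS = ("wmi.user", "wmi.pass")


def hosts_with_wmi_overrides(host_properties):
    """Return sorted host names that set wmi.user or wmi.pass themselves."""
    return sorted(
        host
        for host, props in host_properties.items()
        if any(key in props for key in WMI_OVERRIDE_KEYS)
    )


def credential_source(props):
    """Describe which credentials a host would use, per the behaviour seen in this case."""
    if any(key in props for key in WMI_OVERRIDE_KEYS):
        return "host properties"
    return "collector service account"


if __name__ == "__main__":
    # Hypothetical inventory (assumption): host name -> custom properties.
    inventory = {
        "win-app-01": {"wmi.user": "CORP\\svc-old", "wmi.pass": "********"},
        "win-app-02": {},
        "win-db-01": {"location": "dc1"},
    }
    for host in hosts_with_wmi_overrides(inventory):
        print(host, credential_source(inventory[host]))
```

Any host this reports depends on an account other than the collector's service account, so its password expiry is worth tracking separately.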