Leaderboard


Popular Content

Showing content with the highest reputation since 12/07/2018 in all areas

  1. 3 points
    I'm currently working on building out common dashboards for a large number of subgroups of server resources in our environment (we're a multi-tenant managed hosting company, so one for each of our customers' environments). It's a simple dashboard with 5 widgets. I have each of them driving device selection through a token with our internal unique customer ID number. The problem will come if I ever need to change anything in the dashboard... I built a single dashboard initially, then cloned it 50 times, so making any incremental change 50 times sounds like I'm guaranteed a day of annoyance in the future. I'd like to see a way to make a template dashboard that child dashboards can be linked to, so you can change a single master dash and have an entire set of child dashboards change. In our example, the server status dashboards for each of our clients.
  2. 3 points
    I would like to be able to create dashboards with the option to create sub-dashboards that I can navigate to. In some of our dashboards, the number of charts or widgets can be high, which causes clutter and confusion. Sub-dashboards would allow us to better organize charts. Of course, provide the ability to navigate back to the parent dashboard(s) too. Also consider allowing alert/color indicators in child dashboards to propagate up to the parent dashboard. If I use a dashboard to depict the health of a business application, I can then see which child dashboard(s) contributed to the alert and then drill down to see the offending resource(s) that this business application depends on.
  3. 3 points
    Rather than have Websites as a separate section in the product with a separate hierarchy to manage, how about making all of the Websites stuff part of the Device Tree and renaming the Devices section to something that covers both? Then if I want to add a website or service check, I simply do it against the "group". This way I wouldn't have to maintain two hierarchies of business services. What do other folks think of this?
  4. 2 points
    Right now, ACK and SDT work, but miss important functionality. Please consider addressing all of these:
    * ACK should be able to expire, so critical issues are not lost forever and a maximum expected recovery time period can be set (not possible with SDT).
    * ACK should be able to clear if a worse condition occurs (in Nagios, this is a non-sticky ACK).
    * ACK and SDT notices should be shipped to custom email integrations (this one is a bug as far as I am concerned).
  5. 2 points
    ACK should be removable if a user determines it was applied incorrectly.
  6. 2 points
    In my opinion the new UI alert colours are terrible; please allow us to adjust them or create a few other themes to choose from (a dark theme, or the old grey theme, worked great).
  7. 2 points
    Also, please preserve the subject and body of customised notifications for a datasource's datapoints. Though I would much prefer to be able to define custom notification templates separately and reference the template in the datapoint config. This would minimize the maintenance effort for customised notifications.
  8. 1 point
  9. 1 point
    Assuming you leverage and consume custom alert messaging, you can define the KB article at the datasource template level. Taking your CPU utilization example, go to your CPU datasource LogicModule and add the URL to the custom alert messaging for the desired datapoint triggering alerts. If the KB is different for different subsets of resources, then the alert messaging should be updated to reference a custom property that would be assigned to or inherited by the resource, for example ##CPU_KB_URL##; you would then assign the cpu_kb_url property to (or let it be inherited by) your different subsets of resources. This does mean you will have to maintain these properties in LM.
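    A hedged sketch of that pairing (the property name comes from the example above; the KB URL and message wording are made up):

        Property on the resource/group:  cpu_kb_url = https://kb.example.com/high-cpu
        Custom alert message:            CPU on ##HOST## is at ##VALUE##%. Runbook: ##CPU_KB_URL##

    ##HOST## and ##VALUE## are standard LM alert tokens; the ##CPU_KB_URL## token resolves from the cpu_kb_url property set on, or inherited by, the alerting resource.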
  10. 1 point
    Hi, can we please get 'Threshold History' and a 'Last Modified' date as optional columns in the Alert Thresholds report? This would allow a huge time saving in reviewing modified thresholds. Currently we have to run the report, which results in hundreds of entries, each of which then requires someone to go to the corresponding device, locate the alert, and go in to edit the thresholds in order to view the history and the time the change was made. This method is not practical now, when we've just started using LogicMonitor, let alone when we add hundreds more devices.
  11. 1 point
    Have you looked at the data APIs? I haven't used them myself, but they seem to fit the request. https://www.logicmonitor.com/support/rest-api-developers-guide/v1/data/get-graph-data/#Get-widget-data https://www.logicmonitor.com/swagger-ui-master/dist/#/Data/
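    For what it's worth, a hedged Python sketch of pulling widget data through the first endpoint linked above (the portal name, API token, and widget ID are placeholders):

        import base64, hashlib, hmac, time
        import requests  # third-party: pip install requests

        ACCOUNT = "yourcompany"        # placeholder portal name
        ACCESS_ID = "API_ACCESS_ID"    # placeholder API token ID
        ACCESS_KEY = "API_ACCESS_KEY"  # placeholder API token key

        def lm_get(resource_path):
            """GET a REST v1 resource using LMv1 token authentication."""
            epoch = str(int(time.time() * 1000))
            # LMv1 signature: HMAC-SHA256 over verb + epoch + body (empty for GET) + path
            digest = hmac.new(ACCESS_KEY.encode(),
                              ("GET" + epoch + resource_path).encode(),
                              hashlib.sha256).hexdigest()
            signature = base64.b64encode(digest.encode()).decode()
            headers = {"Authorization": "LMv1 %s:%s:%s" % (ACCESS_ID, signature, epoch)}
            url = "https://%s.logicmonitor.com/santaba/rest%s" % (ACCOUNT, resource_path)
            resp = requests.get(url, headers=headers)
            resp.raise_for_status()
            return resp.json()

        # Fetch the data behind dashboard widget 42 (placeholder ID).
        print(lm_get("/dashboard/widgets/42/data"))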
  12. 1 point
    A dark theme would be my preference. Much easier on the eyes, especially for night shifts.
  13. 1 point
    One day we might get a dark theme... 😀
  14. 1 point
    If you are not aware, you should be able to edit the SOAP XML values to fill in whatever fields you need. I don't use Autotask myself, but it looks very similar to other integrations. When you set up an integration, you fill in some values on the top half and then click a Generate button. This auto-populates the HTTP Delivery section based on your values. You can edit the various parts of the HTTP Delivery section and modify the default SOAP XML request to add any other field you want. You can use various LM tokens or hard-coded values.
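    As a hedged illustration only (the element names below are placeholders, not Autotask's actual schema; the ##...## tokens are standard LM alert tokens), the edited SOAP body might carry fields like:

        <TicketTitle>##LEVEL## - ##HOST## ##DATASOURCE##</TicketTitle>
        <TicketDescription>##MESSAGE##</TicketDescription>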
  15. 1 point
    We continue to do battle with LM when alerts trigger due to dependent resource outages. I know the topology mapping team is working on alert suppression, but I am not convinced that will solve all problems, regardless of how well they succeed. We really need a way to set up dependencies within LogicModules, and it should not need dozens of lines of API code each time (most of which should be made available as a library function, IMO). One fresh, specific example: a site with multiple firewalls in a VPN mesh running BGP. One firewall goes down, then all the other firewalls report BGP is down. We care about BGP down, so we have alerts trigger escalation chains. It should be possible to define a dependency in the datapoint that suppresses the alert if the remote peer IP is in a down state. There is no way to express this in LM right now, and that leads to many alerts in batches, which leads to numb customers who ignore ALL alerts.
  16. 1 point
    Allow devices to be dependent on one another. If a router goes down, the switch behind it will most likely go down or have an error as well.
  17. 1 point
    Hey Cole, I work with David. We have been working on tackling the per-customer alert rules as well. If you're interested, we should talk via DM and exchange some ideas.
  18. 1 point
    I'd like a line graph to show alerts over time. In order of priority, I would want to easily see alerts by specified groups of devices, then by device, and then by instance. This would greatly assist in identifying trends. This post hints at a cumbersome workaround, but the ability to see the number of alerts over time is a basic necessity and should be easy to accomplish. https://communities.logicmonitor.com/topic/732-number-of-alerts-on-dashboard/ Ideally this would just be an eventsource or a datasource that could be easily applied to any group, whether it be a website, resource, or device.
  19. 1 point
    Sure, then I would remove the threshold on daystoexpire and keep that datapoint for information/graphing use, and create a complex datapoint with the expression "daystoexpire" that carries the valid range and thresholds for alerting.
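    A hedged sketch of that arrangement (the complex datapoint name, valid range, and threshold are illustrative):

        daystoexpire    normal datapoint, no threshold (kept for graphs/info)
        daysalert       complex datapoint, expression: daystoexpire
                        valid value range e.g. 0 to 36500, alert threshold e.g. < 30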
  20. 1 point
    I am not sure exactly how to describe this other than by example. We created an API-based method a while back to control alerting on interfaces based on the interface description. This arose because LM discovered interfaces that would come and go (e.g., laptop ports), and then would alarm about the port being down. With our change, those ports are labeled with a string that we examine to enable or disable alerting. The fly in the ointment is that if an up and monitored port went down due to some change, our clients think they should be able to change the description to influence behavior. Which they should. Unfortunately, because LM will not update the instance description due to the AD filter, the down condition is stuck until either the description is manually changed in LM or until the instance is manually removed in LM. Manual either way, very annoying. My proposal is that there should be a way to update the instance description even if the AD filter triggers. Or a second AD filter for updates to existing instances. I am sure there are gotchas here and perhaps a better way exists. I considered using a propertysource, but I don't think that applies here. The only other option is a fake DS using the API to refresh the descriptions, but then you have to replicate the behavior of many different datasources for interfaces.
  21. 1 point
    Please make the SQL Statement field for JDBC datasources a text area so that editing a long query statement is easier. It should be a resizable field like the script fields.
  22. 1 point
    Try looking at https://www.logicmonitor.com/support/datasources/active-discovery/datasource-discovery-filters. If I understand the issue correctly (too many DataSource instances, some of which you want to ignore), you can fix that within the system, without any REST API work.
  23. 1 point
    Calling the REST API with the below should get you a list of all the devices assigned to that collector: /device/devices?size=1000&fields=id,name&filter=preferredCollectorId:COLLECTOR_ID_HERE Replace COLLECTOR_ID_HERE with the collector ID number, which you can get from the Collector Settings page and/or via the API. If there are more than 1000 devices on a collector, then you need to call this multiple times with an offset, as the API maxes out at 1000 devices per call. I believe there are examples in the API docs.
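    A hedged Python sketch of that paging loop (portal name, token, and collector ID are placeholders; this assumes a Bearer-type API token, but LMv1 signing works with the same endpoint):

        import requests  # third-party: pip install requests

        ACCOUNT = "yourcompany"     # placeholder portal name
        TOKEN = "API_BEARER_TOKEN"  # placeholder bearer API token
        COLLECTOR_ID = 12           # placeholder collector ID

        url = "https://%s.logicmonitor.com/santaba/rest/device/devices" % ACCOUNT
        headers = {"Authorization": "Bearer " + TOKEN}
        devices, offset, size = [], 0, 1000

        while True:
            params = {"size": size, "offset": offset, "fields": "id,name",
                      "filter": "preferredCollectorId:%d" % COLLECTOR_ID}
            resp = requests.get(url, headers=headers, params=params)
            resp.raise_for_status()
            items = resp.json()["data"]["items"]  # v1 responses nest results under "data"
            devices.extend(items)
            if len(items) < size:  # a short page means we've fetched everything
                break
            offset += size

        for d in devices:
            print(d["id"], d["name"])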
  24. 1 point
    If you don't want to change the event timing itself, you can add a blank stage to the escalation chain; it will use the escalation interval on that blank step, which adds time. We use this for services that take a long time to restart: we need to know that they've restarted, but also need to know if they don't finish restarting. So we have an escalation chain just for the service alerts that alerts our team, then waits 20 more minutes before alerting us again. If you add a blank stage to the end of the escalation chain, you can stop repeated messaging as well. This works especially well if you are using a ticketing system that only accepts email as an incoming connector.
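    A hedged sketch of such a chain for the service-restart case described above (the stage contents are illustrative):

        Escalation interval: 20 minutes
        Stage 1: service-team email   (sent when the alert first escalates)
        Stage 2: (blank)              (no message; just waits one interval)
        Stage 3: service-team email   ("still not restarted after 40 minutes")
        Stage 4: (blank)              (alert parks here; repeats are suppressed)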
  25. 1 point
    In Auto Updates for Collectors, when you set a schedule there is no way to go back and edit the Collectors in that schedule without deleting the schedule and re-adding it with the new Collectors. Please give us the ability to add or remove Collectors from an Automatic Collector Upgrade Group within the Selected Collectors area. This would make administration much easier than having to delete and recreate the schedule every time I need to add or remove a Collector from an auto upgrade.
  26. 1 point
    I'm currently trying to make a customer-facing dashboard that inventories their VM status in a NOC view; when one is clicked on, rather than going to the resource page (they don't need that much detail), it takes them to a dashboard for that resource with specific info on it (CPU, MEM, DISK, etc.). This would be a very helpful feature to add. Perhaps with the ability to define data that gets passed from one dashboard to another, to be accessed with a ##variable## in the sub-dashboard. This would allow the click to pass ##system.displayname##, for instance, so the sub-dashboard shows data for whatever you're looking at (a resource or resource group, a specific datapoint, etc.).
  27. 1 point
    What is the retention period for data in TSDB? How can I fetch data from TSDB if I want to observe a trend over a particular period of time?
  28. 1 point
    The ability to have an SDT "scheduler" like the one below would be helpful.
  29. 1 point
    In Nagios, there is a concept of an event handler that can run to try to fix problems (e.g., restart a service, remove old files, etc.). I see no similar capability in LM and it is of course something customers want to see happen. For example, I just deployed a custom DS for someone to check for too many files in a share, indicating a service problem. Once I implemented that, the next question was "Can you restart the service when that goes into warning?" I see no facility for this in LM, but perhaps a custom alert could be used to trigger the behavior. If I used that approach, I would insert a custom HTTP alert into the escalation chain earlier on to give the problem a chance to be corrected, then I will have to create a secure REST API server to accept those and trigger the correct behavior. So in theory it could be done (if I am not missing something), but it feels like using a screwdriver to hammer in a nail. Thanks, Mark
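    For what it's worth, a hedged, minimal sketch of that "secure REST API server" idea: a tiny HTTP listener that receives a custom HTTP alert delivery and restarts a service. The datasource name, service name, and payload fields are placeholders, and a real deployment would add TLS and authentication.

        import json
        import subprocess
        from http.server import BaseHTTPRequestHandler, HTTPServer

        class AlertHandler(BaseHTTPRequestHandler):
            def do_POST(self):
                body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
                alert = json.loads(body or "{}")
                # Only act on the warning for our hypothetical file-count datasource.
                if alert.get("datasource") == "ShareFileCount" and alert.get("level") == "warn":
                    subprocess.run(["systemctl", "restart", "file-mover.service"], check=False)
                self.send_response(200)
                self.end_headers()

        if __name__ == "__main__":
            HTTPServer(("0.0.0.0", 8443), AlertHandler).serve_forever()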
  30. 1 point
    Please forgive me if this has been requested in the past. I didn't see it in my brief searching. Background/use-case: Our users and administrators often bounce between several different combinations of filters in LogicMonitor's Alerts tab throughout the day. There are some common combinations of filters used more frequently than others by individuals, by specific teams, and by all users. While we can, and have, created dashboards for some of these common sets of alerts, the dashboard Alerts widget doesn't offer the same functionality as LogicMonitor's main Alerts tab - particularly the inline graphs and history (which are wonderfully useful). The Ask: Allow saving combinations of filters on the Alerts tab as named filter-sets (i.e. favorites), along with a basic interface for managing these filter-sets (editing, deleting, re-ordering, etc). It would also be wonderful to have the ability to designate shared/public filter-sets (controllable by role). I see this as a valuable enhancement for customers not doing event management through external systems. As an additional nice-to-have, it would be very useful to have the option to save column settings per filter-set as well.
  31. 1 point
    Can I also make a feature request to retain the custom thresholds/attributes while updating LogicModules (user optional, probably by means of a toggle button to choose between overwrite or leave as-is)? I did notice related requests from the past, and it seems this has not yet been released.
  32. 1 point
    It would be nice if we could search in Alert Tuning like we can in other areas. Also, in Alert Tuning it would be nice to see the real DataSource name, instead of guessing which of the 4 CPUs I see is the right one.
  33. 1 point
    Hi! Do you plan to implement full NetFlow v9 support with templates anytime soon? We need this to present application usage information for our customers with Palo Alto firewalls. Specifically, we are looking for support for the Palo Alto App-ID that is supported in the IPV4-ENTERPRISE template. https://www.paloaltonetworks.com/documentation/70/pan-os/pan-os/monitoring/netflow-monitoring.html
  34. 1 point
    @Sarah Terry Please address urgently. These new verbose error dialogs expose the WMI password. Ideally I'd like a Settings option to disable such verbose error messages, or restrict them by role. (Also, can these dialogs be more responsive? On a 1920x1080 screen they appear as narrow panels in the middle.)
  35. 1 point
    P.S. If I have not gone crazy in my line of thinking, please update the REST API Documentation to include descriptions/methods/models etc for /setting/alert/internalalerts resources. 😀👋
  36. 1 point
    I get asked about this a lot as well. Some devices report uptime via this widget, some don't. Most don't. It would be nice to be able to hide/remove it entirely.
  37. 1 point
    There are two main types of SNMP checks: SNMP Get/Walk and SNMP Traps, and they work very differently. With SNMP Get/Walk, LogicMonitor directly queries your device for state/performance data; this is what most of LogicMonitor wants to use, is the best option, and is what !snmpwalk does. With SNMP Traps, you set up the device to send alerts out to the monitoring system. The setup for each of these is completely different. Many devices support both, but some devices only support SNMP Traps (looking at you, EMC). If the device supports SNMP Get/Walk, there is likely a section for this in the device config, separate from the SNMP Trap section. Also, you may need to whitelist the IP address of the collector on the device. If the device only supports SNMP Traps, you can still set it up in LogicMonitor, but it's far more limited: https://www.logicmonitor.com/support/eventsources/types-of-events/snmp-trap-monitoring/
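    To verify Get/Walk access from the collector host itself, a standard net-snmp walk of the system subtree works (the IP and community string are placeholders, assuming SNMP v2c):

        snmpwalk -v 2c -c public 10.0.0.5 1.3.6.1.2.1.1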
  38. 1 point
    @Steve Francis Thank you for this, Steve! Can website ping checks be used as primary devices/values for 'depends_on'?
  39. 1 point
    GX2WXT A single Lambda function might have several versions. The default Lambda datasource monitors and alerts on the aggregate performance of each Lambda function. Using the Alias functionality in AWS, this datasource returns CloudWatch metrics specifically for the versions to which you have assigned aliases, allowing you to customize alert thresholds or compare performance across different versions of the same function. This datasource does not automatically discover aliases and begin monitoring them (as this could very quickly translate into several aliases being monitored and drive up your CloudWatch API bill). Instead, add only the aliases you want monitored by adding the device property "lambda.aliases" either to individual Lambda functions or at the group level if you're using the same alias across several Lambda functions. To add more than one, simply list them separated by a single space, e.g. "Prod QA01 QA02". If an alias does not exist, no data will be returned. This datasource is otherwise a clone of the existing AWS_Lambda datasource with the default alert thresholds.
  40. 1 point
    It is currently impossible to detect certain conditions without being bombarded by noise alerts, which I am told is against the philosophy of LogicMonitor. Consider a few cases:
    * An interface flaps a few times versus more frequently: how do you tell the difference? Right now, you have no choice other than perhaps to construct an API script (not tested). A better solution in this example would be to count the number of flaps over a period of time and use that as your alert trigger. As it stands, there is not even a method to select the top 10 most unstable interfaces, since flap state is literally a yes-or-no value and top 10 makes no sense.
    * Resource utilization (bandwidth, CPU, etc.) is sometimes much better checked over a period of time than over a single interval. The answer I have received on that is "require N checks to fail", and this works if the resource is pegged, but not if it is spiky. As it stands, the longer the period you want to simulate via "N checks", the higher the chance one check will reset the alert, even though the overall result is clearly bad on inspection.
    Please note this problem was solved long ago by other tools, like Zabbix (https://www.zabbix.com/documentation/3.4/manual/config/triggers/expression), so hopefully this can be added to LM in the near future as well.
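    To make the first case concrete, a hedged Python sketch of the "count flaps over a window" logic (the sample data and threshold are illustrative; LM offers no such facility today, which is the point of the post):

        WINDOW_SECONDS = 3600
        FLAP_THRESHOLD = 5

        def count_recent_flaps(samples, now):
            """samples: iterable of (epoch_seconds, status) in time order, 1=up 0=down."""
            recent = [(t, s) for t, s in samples if now - t <= WINDOW_SECONDS]
            # A flap is any change of status between consecutive samples.
            return sum(1 for (_, a), (_, b) in zip(recent, recent[1:]) if a != b)

        samples = [(0, 1), (600, 0), (660, 1), (1200, 0), (1260, 1), (1800, 0), (1860, 1)]
        flaps = count_recent_flaps(samples, now=2000)
        if flaps >= FLAP_THRESHOLD:
            print("ALERT: %d flaps in the last hour" % flaps)
        else:
            print("OK: %d flaps in the last hour" % flaps)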
  41. 1 point
    We run a horizontally distributed architecture. As such, we really don't care (too much) if we lose one of N hosts, provided that a minimum number of hosts/processes/etc. are up and healthy. LogicMonitor makes it easy to make a graph of computed datapoints that span hosts, but doesn't let us configure alerts on the same computed data. Tangible example: one application, when running, publishes capacity data to LM. This capacity data is aggregated and graphed, giving us great insight for planning purposes. However, the only alert configuration that LM supports requires us to alert on every single host, sometimes causing unnecessary wake-ups in the middle of the night. Operationally, we'd be fine having one host be down, as long as we maintain adequate reserve capacity. System-wide reserve capacity can only be determined by aggregating data across the set of hosts (just like the graphs do). We've been told to write custom scripts to do the collection and aggregation, and perhaps some rainy day we will. However, it seems like 1) LM does so much of the necessary bits already and 2) this would be a really useful capability for anyone who runs a horizontally distributed architecture. This isn't a "holy cow, gotta have this now!" type of feature request, but it certainly would be a great value-add.
  42. 1 point
    Would it be possible to provide an API call or calls that provide a 'hit count' (historical and current) against alert rules and escalation chains? Ideally it would allow a filter to be assigned for alert levels of interest. This would help in providing metrics around how many alerts are being generated, and to what areas of responsibility, and help drive additional questions around configuration and maintenance. I know there is a report to extract thresholds and their destinations, but these metrics are not available currently, it seems. Many Thanks ~Nick
  43. 1 point
    Matthew, please let me know when this is out. None of our Cisco equipment works. In the meantime, we have started using a different syslog system that works fine.
  44. 1 point
    I'd forgotten about auto properties tasks - these were the actual culprit causing some of our UPSes to trigger these "unauthorized access" email notifications. Unfortunately, there doesn't appear to be a way to disable them. I would really like a switch, check box, radio buttons - whatever - to easily choose which tasks (auto properties, active discovery, data collection) run against a given device.
  45. 1 point
    Because of the way Alert Rules are processed, we need a way to export all Alert Rule definitions so that we can store a copy in our Configuration Management System, and also so that we can take a backup before implementing major changes to Alert Rules; if a change needs to be reversed, we can then refer back to the previous configuration. Yes, of course, we can take screenshots, but that would be very time-consuming if we had to update lots of alert rules. A generic Export/Import All Alert Rules option would be ideal.
  46. 1 point
    I've been looking at how best to represent the status of a service / application using Dashboards within LogicMonitor. It would be really useful if we could nest/cascade dashboards, so that you could represent items at a very high-level (e.g. overall status of your datacentre infrastructure) and then be able to drill down through underlying dashboards, etc. I've found that a similar functionality request was submitted some time ago by another member, but that was back in 2014.
  47. 1 point
    ConnectWise (and most modern PSA) APIs allow for two-way communication. It would be very helpful if ACKing or clearing alerts triggered actions in ConnectWise tickets. For example, an ACK could update the status to In Progress and/or even assign the ticket to a user, and comments could be added to the ticket. Also, multiple alerts for the same item could become child tickets or additional comments on the ticket. Lastly, it would be very helpful to set custom statuses on emails based on certain conditions. I was really hoping for a richer integration similar to how our RMM works.
  48. 1 point
    I have brought this up before and was shot down with "works as designed". We 100% agree with this statement: "Second, when an alert crosses a threshold the second time a week after the original acknowledgement (as we saw in my first post) I think it is safe to assume that should be considered a new 'alert session'." We have cases with the following sequence of events:
    1. Alert triggers on warning threshold.
    2. NOC acks with "monitoring".
    3. Alert crosses error threshold.
    4. NOC escalates to SME.
    5. NOC acks with "escalating to SME".
    6. Alert crosses critical threshold.
    7. NOC acks with "incident created, management informed".
    8. SME remediates just enough to move the alert down to warning.
    9. SME informs NOC the issue is fixed.
    10. NOC closes the incident and resumes watching the alert page.
    11. Alert crosses error threshold.
    12. No notification.
    13. Alert crosses critical threshold.
    14. No notification.
    15. Server crashes.
    16. People ask why there was no alert....
    As a monitoring service, over-communication is 100x more acceptable than a server crashing.
  49. 1 point
  50. 1 point
    I would like to propose that LogicMonitor needs a better way for external systems to input data into the LogicMonitor system, similar to Zabbix Sender https://www.zabbix.com/documentation/2.2/manpages/zabbix_sender. Use case: suppose alongside LM, a company runs an APM like New Relic, a log monitoring tool like Elasticsearch/Splunk, and a custom data warehouse for analytics. As the NPM, LM should be the one source of alarming and trending. I believe the best way to integrate is to allow a direct API to send data, or the ability to interface with the collector to send data. This way any application, whether custom or a common public application, can input data into LogicMonitor. For example:
    - If Elasticsearch/Splunk finds a critical error in its munching of logs, it opens a connection to LogicMonitor and sends data that this error log occurred 5 times in the last 2 minutes. LogicMonitor is configured to alert if > 1, so there is an alert to our NOC.
    - If the APM finds that a website has an immense increase in traffic from one location causing performance issues, it opens a connection to LogicMonitor and sends data that this has occurred.
    - If the mining of our data warehouse finds that customer interest in, or purchases of, one of our products has dipped 20% in the last month, it opens a connection to LogicMonitor and sends this data.
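    A hedged Python sketch of the "Zabbix Sender"-style push being requested. The endpoint and payload here are entirely hypothetical (no such ingest API existed when this was posted), so treat it as the shape of the request, not a real call:

        import requests  # third-party: pip install requests

        def push_metric(account, token, resource, datasource, datapoint, value):
            url = "https://%s.logicmonitor.com/ingest/metric" % account  # hypothetical endpoint
            payload = {
                "resource": resource,      # which monitored host this value belongs to
                "dataSource": datasource,  # hypothetical payload fields
                "dataPoint": datapoint,
                "value": value,
            }
            requests.post(url, json=payload, headers={"Authorization": "Bearer " + token})

        # e.g. Splunk found 5 critical log errors in the last 2 minutes:
        push_metric("yourcompany", "API_TOKEN", "app-server-01", "LogErrors", "critical_count", 5)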