Steve Francis

LogicMonitor Staff
Everything posted by Steve Francis

  1. That is something we are considering, but in the interim - why so many collectors for so few hosts? One collector should easily handle 100 devices (unless they are all large NetApp arrays or VMware servers with hundreds of virtual machines). The approach we are taking for now is to make a collector able to handle hundreds of hosts, mostly negating the need for collector groups.
  2. In the new UI: http://help.logicmonitor.com/getting-started/i-just-signed-up-for-logicmonitor-now-what/7-tuning-alert-thresholds/ or old UI: http://help.logicmonitor.com/using/i-got-an-alert-now-what/how-do-i-adjust-thresholds/
  3. If you pick the Country time option under Account Settings (e.g. Europe/Belfast, Dublin, Guernsey, London), as opposed to the Timezone option (UTC Dublin, Lisbon, London, Monrovia, Reykjavik; or GMT), then it does adjust for daylight saving time. Yes, that is not a reasonable thing to expect people to realise. We have a new design for this coming out....
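     Purely to illustrate the distinction (this is my own sketch, not LogicMonitor code): a region-based zone ID like Europe/London shifts its offset with DST, while a fixed offset like GMT/UTC never does. A minimal Groovy example, assuming a Java 8+ runtime:

        import java.time.*

        def london = ZoneId.of("Europe/London")          // region ID: DST-aware
        def winter = LocalDateTime.of(2015, 1, 15, 12, 0)
        def summer = LocalDateTime.of(2015, 7, 15, 12, 0)

        println winter.atZone(london).offset             // Z      (GMT in winter)
        println summer.atZone(london).offset             // +01:00 (BST in summer)
        println summer.atZone(ZoneOffset.UTC).offset     // Z, regardless of season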
  4. For Overview graphs, as the graph is not about a specific instance, but a possibly changing set of the top 10 instances, the title can only use the ##HOST## token. The other tokens are available for the legend of an overview graph, however.
  5. Hi - you can (I think) do this now. You can use the device NOC widget (http://help.logicmonitor.com/the-new-ui/dashboards/widgets/device-noc-widget/) to create a green/yellow/red table cell for each VPN tunnel, reflecting the status of individual datapoints on individual tunnels. (You can also aggregate up to router or group level.) So you could base this off the IP SLA responder. You can use wildcards to show all VPN tunnels that have SLA responders automatically, etc. You can label the cell whatever you want. You can make a dashboard page that is just this widget, or share that, or embed that widget on some other system's page. Does that do what you want, or are we missing something?
  6. Can you clarify a bit? All instances of datasources, scripted or not, are associated with a host in LogicMonitor...
  7. The new UI does this better. Its sorting works correctly, so sorting by Roles (or other columns) in the users table does what you'd expect. The users table also has a filter, so you can limit it to people in certain roles or statuses. There is not a direct export to Excel - but you could make one from the getAccounts RPC call and a little scripting. What's the use case for that?
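     For example, a rough Groovy sketch of such an export (the c/u/p authentication parameters are per the RPC developers guide; the field names in the response here are from memory, so check them against your own output):

        import groovy.json.JsonSlurper

        // Hypothetical credentials - substitute your own account details.
        def resp = new JsonSlurper().parse(new URL(
            "https://ACCOUNT.logicmonitor.com/santaba/rpc/getAccounts?c=ACCOUNT&u=apiuser&p=secret"))

        // Write a CSV that Excel opens directly; adjust the fields to taste.
        def csv = new File("users.csv")
        csv.text = "username,roles,status\n"
        resp.data.each { u ->
            csv << "${u.username},\"${u.roles}\",${u.status}\n"
        }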
  8. It's there now: call account.logicmonitor.com/santaba/uiv3/dashboard/index.jsp#widget=X You can see the ID (the X value) from the configuration screen of the widgets....
  9. This is not currently possible - we were just discussing internally today that we need to do this by having WMI helper classes that can be called in scripts, the same way we do for SNMP and JMX. So... you can't for now, but stay tuned....
  10. If your servers are Windows servers - yes, you can do it. Good timing - we just wrote a blog about Windows Time monitoring today. See http://www.logicmonitor.com/blog/2015/07/10/windows-its-about-time/ To solve this specific problem, I'd say get the Win_TimeOffset datasource, and then add another complex datapoint, called TimeZoneOffset. Again, use a Groovy script to calculate the value, but make the Groovy just this:

        rawDate = output["LOCALDATETIME"]
        TZ = rawDate[21..24].toInteger()
        return TZ

      This datapoint will now reflect the system's timezone offset from UTC, and you can set an alert if it does not equal zero. If you mean Linux servers, you could do a very similar thing by parsing the output of 1.3.6.1.2.1.25.6.3.1.5.1
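      For context (my own illustration, not part of the datasource itself): WMI returns LocalDateTime in CIM_DATETIME form, yyyymmddHHMMSS.mmmmmm±UUU, where ±UUU is the offset from UTC in minutes - so characters 21..24 are exactly that signed offset:

        // Example value only: a host on US Pacific time during DST
        def rawDate = "20150710093000.000000-420"
        assert rawDate[21..24] == "-420"
        assert rawDate[21..24].toInteger() == -420   // minutes from UTC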
  11. And done officially. Import Win_TimeOffset from core. Blog about it at http://www.logicmonitor.com/blog/2015/07/10/windows-its-about-time/
  12. Whoops. Yes, we do. I'll get that in there....
  13. OK, this was a good idea. Our standard answer to this has been "Windows will log events when NTP sync fails - use event logging." But as event logs tend to be noisy, that's hard. I had thought there should be a better way to do it than comparing to the collector time. Using w32tm to report on the difference from the configured time source was what I hoped to do - but not all hosts being monitored will be running the time service; and if they are, and are on a large domain, w32tm /monitor can take literally minutes to return, and there are remote execution issues.... So, I did just what you suggest. I'll email you a datasource that does this - it would be great if you could import it and let me know your feedback before I put it in our core repository. Thanks
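      For the curious, the core of that comparison can be sketched in Groovy like this (illustrative only - the real datasource handles the WMI query itself; the raw value follows the CIM_DATETIME format shown in the example above):

        import java.text.SimpleDateFormat

        def raw = "20150710093000.000000-420"        // hypothetical WMI LocalDateTime
        def sdf = new SimpleDateFormat("yyyyMMddHHmmss")
        sdf.timeZone = TimeZone.getTimeZone("UTC")   // parse the wall time as naive
        def hostUtc = sdf.parse(raw[0..13]).time - raw[21..24].toInteger() * 60000L
        def driftSeconds = (System.currentTimeMillis() - hostUtc) / 1000
        println driftSeconds   // datapoint value; alert when it exceeds your drift tolerance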
  14. Good idea. We'll add it to the UI queue.
  15. A tree-view type thing is being prototyped now for the devices page... Please hold. :-)
  16. In this specific case - the easiest way is to go to the Settings..Datasources page, and load the snmp64_If- datasource. Clear the 'persistent' checkbox, then go to the device in question and run Active Discovery. That will remove all the down interfaces. Then re-enable the 'persistent' checkbox in the datasource.
  17. Ah - you've got a customized Alert Subject line that is used for all alerts. You could override that for this specific alert: in the datapoint ServerStatus in the A10AX_VirtualServers datasource, set the Alert Subject Template to be: ##LEVEL## : ##GROUP## | ##HOSTNAME## | ##INSTANCE## | DOWN Given that this alert will trigger only when the virtual server is not up, hard coding in the Down value will work. (Although it will also say that when the alert clears.) There is not a way to generally look up a datapoint value and convert it to a text string at the moment, unfortunately...
  18. If you always admin down ports that are not supposed to be in use - this is easily done with a change to the datasource discovery. The reason we don't set this up by default is we only discover interfaces that are operationally 'up', as most people leave all interfaces admin up, so whether operationally up or not is the thing that matters for discovery. We don't undiscover interfaces that are operationally down, as that is when you want to alert on them. If, in your case, all admin up interfaces should be discovered, and alerted on whether operationally up or not, it's easy to set the filters that way. (Although it would result in alerts for admin up interfaces that have never been operationally up.) Is this regarding Optical interfaces, or the regular datasource?
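      To make the logic concrete (standard IF-MIB semantics; this is an illustration, not actual datasource filter syntax - the real change is made in the datasource's Active Discovery filters):

        // Default discovery keeps only interfaces that are operationally up;
        // the change described keeps all admin-up interfaces, whatever their
        // operational state.
        def defaultFilter = { iface -> iface.ifOperStatus == 1 }    // up(1)
        def adminUpFilter = { iface -> iface.ifAdminStatus == 1 }   // up(1)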
  19. Hi Raz - what context is this in? Off the top of my head, I can't think of any alert subjects that have just a numerical status....
  20. Hi Bastian. To your points:
      - Yes, we are SaaS only. This does mean that an outage of our service can mean monitoring is out. Obviously we try to prevent this. :-) It is in fact a very rare occurrence for our service to be down in a way that impacts monitoring and alerting. We did have one such 12-minute outage this year, caused by an error in a script that updates DNS records; it affected about 10% of our customers, for about 10 minutes. Prior to that, I don't believe there had been such an outage for over a year.
      - The more common outage we post is about our external website checking (where we test the reachability/performance of websites from various places). As we have test nodes spread around the internet, the test nodes are more subject to various internet issues.
      - Our main service is less subject to internet issues, for a few reasons: it has multiple top-level ISPs, and it is designed to route around BGP failures. (Data being reported back from customer sites will try to use regular BGP transit paths, but if they fail, it can reroute via other application forwarding nodes we run around the internet - so if a direct path from the customer to their main site is not working, it may route via Singapore, or the north west USA.)
      - Plus, of course, we have a 24 x 7 NOC staff on call, focussed on nothing but the performance and availability of our applications. Something that can not usually be said of premise-based systems.

      Custom monitors are certainly possible. Data can be collected using a variety of collection protocols (SNMP, WMI, JMX, HTTP content interpreted by JSON/XML/regular expression, various APIs (NetApp, VMware, etc.), and so on). While LogicMonitor does have datasources for virtually all hardware and software found in a datacenter, it is fairly easy to write a datasource to collect whatever you want, provided it is exposed. So we have customers pulling data out of databases via SQL queries to plot the $/minute flowing through ad networks, for example. You can collect custom JMX mbeans, perfmon counters, or whatever you wish. This data can then be graphed, alerted on, escalated, and aggregated, like any other data we collect.

      You can certainly collect data via XML. Our web page collector system has built-in XML interpretation. If you find there is custom collection that is too complex to be done in our standard collectors (say, for example, you wish to collect data from three different web pages, and create a compound metric that is derived from content on all three), you can also easily extend collection using embedded Groovy scripts, which can call whatever Groovy libraries you wish (although again, we expose a lot of Groovy methods from our collector, to make this easier) - a sketch follows this post. PowerShell is also possible: any scripting language supported by the collector platform (which can be Linux or Windows) is possible. We generally recommend Groovy, as it's supported on both platforms, and has some other advantages, but some of our own Windows-specific datasources use PowerShell.

      Feature requests are brought to us in several ways: support requests, posts in the forum, or direct customer discussions. Sometimes it's a clearly good idea, and we implement it as soon as possible. Sometimes we solve the problem in a different way. Sometimes we get a request that is only appreciated by a single customer, which will not be acted on - but oftentimes we'll acquire further customers with the same needs, and do so.

      The metrics we look at depend on whether we are adding functionality, or easing workflows. Requests that add functionality generally require higher standards, as we do not want too complex a product.

      There are a variety of differentiators; whether it is worth it depends on your situation. We tend to cover a much wider range of devices (network, servers, storage, power, virtualization) and software. We automate a lot more of the device management: if you add a device, say a Citrix NetScaler, we will discover not just all the VIPs and content switching VIPs, and set up monitoring appropriately - we will also keep the monitoring up to date. So if you later add more VIPs, or enable compression, or global server load balancing - the monitoring will notice, and start monitoring those new features. Our view is that monitoring is too complex not to be automated. You need a very good engineer to determine what to monitor, and to maintain the monitoring, for each specific class of device - and those engineers would be better deployed adding more strategic value to your company.

      We are SaaS based, which certainly tends to work better for companies with many datacenters or sites (avoiding VPNs, etc.), but also ensures that you are not subject to the "I didn't know my datacenter was down because that was the one with my monitoring in it" issue. We don't limit on interfaces or anything else. And our support is both very good (being staffed with people with operational experience, in the US and England) and very accessible (with embedded real-time chat in the application). Our company culture is that we are here to help our customers deliver excellent service to their customers (those who use the IT infrastructure).

      In summary, what most of our customers have found is that we deliver better monitoring with much less staff resource required, compared to their prior monitoring. Let me know if you have any more questions. Best Regards
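      As the sketch promised above - a purely hypothetical scripted datasource in Groovy (the URL and field names are invented; script datasources report datapoint values as key=value lines on stdout):

        import groovy.json.JsonSlurper

        // Hypothetical endpoint and fields - replace with your own.
        def stats = new JsonSlurper().parse(new URL("http://example.com/stats.json"))

        // Each key=value line becomes a datapoint the collector can pick up.
        println "dollarsPerMinute=${stats.dollarsPerMinute}"
        println "adRequests=${stats.adRequests}"
        return 0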
  21. You can reference a graph without adding it to a dashboard by using the getGraphImage API call: http://help.logicmonitor.com/developers-guide/download-data/#image This lets you reference the graph by the combination of hostname, datasource-instance name, and graph name. e.g. to access a device shown as "Cisco 3560", for the CiscoCPU_ datasource, instance 0, to show the CPU graph: https://ACCOUNT.logicmonitor.com/santaba/rpc/getGraphImage?hostDisplayedAs=Cisco%203560&dataSourceInstanceName=CiscoCPU-0&graphName=CPU This provides a static image - not the zoomable/searchable versions. That's a good idea - I'll see if we can get that exposed, but hopefully this helps in the short term.
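      If you want to fetch that image programmatically, something like this Groovy sketch should work (I'm assuming the same c/u/p query-string authentication the other RPC calls use - check the developers guide; credentials here are placeholders):

        // URL-encode any values that contain spaces, as in the example above.
        def url = "https://ACCOUNT.logicmonitor.com/santaba/rpc/getGraphImage" +
                  "?c=ACCOUNT&u=apiuser&p=secret" +
                  "&hostDisplayedAs=Cisco%203560&dataSourceInstanceName=CiscoCPU-0&graphName=CPU"
        new File("cpu.png").bytes = new URL(url).bytes   // save the static PNG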
  22. Interesting idea - one approach you could take is to create a rule that catches all alerts (priority 1) but doesn't send the alerts anywhere. You can see them in the LogicMonitor application, but they won't get escalated. Then when you've checked/resolved everything, delete that rule to enable normal alert handling.
  23. Both these features are coming soon. There will be templates for the host level data view (plus also other views, like most frequently visited graphs on the host, etc.) There will also be the ability to expand an entire host, plus to easily compare multiple graphs within the tree. Great feedback!
  24. Quite correct on the InputDiscard alert - it should have been including InDiscards. That is corrected in core now, thanks. Re output discards - my question here is: if you are seeing output discards on an interface that is doing less than 10 pps, isn't that a cause for concern/investigation? Input packets can be discarded at low volumes, as sometimes that is how systems treat packets not addressed to their MAC addresses (if a switch is flooding out an unknown destination, say). But why would you ever have output discards coupled with a low packet rate? I'd love to see some examples, and learn why... Thanks for the feedback!