• 0

Collector Health Dashboard


Gusty

Question

Hello, I am looking for recommendations or prebuilt dashboards to show collector health. Does anybody have any widget ideas to share that I can build on?

One idea I have not built out yet is a number widget showing the total task count across all collectors, paired with a number widget showing the failed task count across all collectors.

TIA!

~Gusty 


5 answers to this question


  • 1

Mine's more about the configuration than the performance. I've built a few custom datasources that check things like:

- whether or not the collector description matches the host's display name

- whether or not the escalation chain on the collector is set to the right one

- whether or not the resend interval is set to our desired value

- what the current version is compared to the minimum version
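Roughly, the checks above boil down to something like this. It is only a sketch in plain Python, not my actual datasources: the field names, the tuple used for the version, and the expected values passed in are all made up for illustration and are not LogicMonitor API names.

```python
from dataclasses import dataclass


@dataclass
class CollectorConfig:
    # Every field name here is hypothetical, chosen only for this sketch.
    description: str
    host_display_name: str
    escalation_chain: str
    resend_interval_min: int
    version: tuple  # e.g. (33, 4); tuples compare element by element


def config_problems(collector, expected_chain, expected_resend_min, minimum_version):
    """Return a list of human-readable configuration problems (empty if none)."""
    problems = []
    if collector.description != collector.host_display_name:
        problems.append("description does not match host display name")
    if collector.escalation_chain != expected_chain:
        problems.append("wrong escalation chain")
    if collector.resend_interval_min != expected_resend_min:
        problems.append("resend interval differs from desired value")
    if collector.version < minimum_version:
        problems.append("version below minimum")
    return problems
```

Each problem could then be surfaced as its own datapoint (0 for fine, 1 for a mismatch) so a dashboard table lights up per collector.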

As far as collector performance goes, there are a ton of collector datasources and I intentionally do not modify them. Not because I'm scared I'll screw something up, but so that whenever we have a failure that can be pinned on a collector performance issue that none of the many collector datasources picked up, I can hold LM's feet to the fire to get them to fix their own datasources to alert when that problem happens.


  • 1

Yeah, use the OOTB Collectors dashboard as a start. As far as which graphs to pay attention to and how to know if something is OK or not, I'm still waiting on guidance from LM. The collectors collect and provide tons of data about themselves, which is great when support is trying to troubleshoot something. But there's no list of the top 3 or top 5 metrics to look at to know that your collectors are performing well.

That said, I look at the following:

Collector Active Discovery Tasks > TasksCountInQueue (how many discovery tasks are pending execution at any given time. May jump up but should stay within a certain band)

Collector Data Collecting Tasks > TasksCountInQueue (how many collection tasks are pending execution at any given time. Should be pretty normal except when datasources change, like updates.)

Collector JVM Memory Pools > MemoryPool_Utilization (this one looks like a continuous memory leak, but that's just how Java works)

Collector Thread: CPU > CPU_Utilization (high utilization isn't bad as long as it's not pegged. Will likely follow TasksCountInQueue)

Collector Thread: CPU > ThreadUtilization (little fuzzy on exactly what this one is, but I think it's how many of the available threads are being used. If you're using all of them, tasks will wait in the queue until one is available)
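To make "stay within a certain band" and "not pegged" a bit more concrete, here is a rough Python sketch of how I think about it. The 3-sigma band, the 95% threshold, and the five-sample window are assumptions picked for illustration, not values from LM, and the inputs are just plain lists of datapoint values you would have exported yourself.

```python
from statistics import mean, pstdev


def outside_band(history, latest, k=3.0):
    """True if latest is more than k standard deviations from the historical mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: anything different from the constant value is out of band.
        return latest != mu
    return abs(latest - mu) > k * sigma


def cpu_pegged(samples, threshold=95.0, min_consecutive=5):
    """True if the most recent min_consecutive samples are all at or above threshold."""
    if len(samples) < min_consecutive:
        return False
    return all(s >= threshold for s in samples[-min_consecutive:])
```

So a TasksCountInQueue spike only counts as a problem when it leaves its usual band, and CPU_Utilization only counts as pegged when it stays high for several samples in a row rather than on a single busy poll.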

Also, I run the NoDataMonitoring DSs


  • 0

Thanks guys! We have about 25 collectors globally and I am just trying to put some high-level widgets together that show all of them at a glance. So far I have two tables configured (Host Status, CPU Status) that list every collector and turn orange or red when those datapoints reach an alert threshold. I'm trying to think of other cool ideas we can put to work along those lines, so thanks for your recommendations. Let me know if you have any additional ideas for us :)

 


  • 0

@Stuart Weenig Yeah, I hit the ground running with that prebuilt dashboard and it is looking great. The TasksCountInQueue graphs are especially useful to us at the moment, and NoDataMonitoring is equally useful: we have two auto-balanced 2XL collectors with pegged CPU usage, and I've been running the debug manually for about a week during my investigation/troubleshooting. Thanks again Stuart!
