mnagel

Members
  • Content Count

    406
  • Days Won

    70

Community Reputation

108 Excellent

7 Followers

About mnagel

  • Rank
    Myth
  • Birthday July 17


  1. I was not actually thinking that, but sure, I really would like one! It seems like a way to do some of the cross-device correlation I have been wishing I could do. I just would not dream of touching it without docs.
  2. So you know what I am going to ask next, right? What is CollectorDb and where is that documented? Feels a bit unfair to us poor mortal developers :).
  3. Are you sure about the batchscript limitation? Because the new SNMP_Network_Interfaces DS supports subrate ifSpeed designation as ILPs and is batchscript (we know this because it blew out several collectors due to default batchscript thread counts). I have not looked at the guts yet, so perhaps I am off track there.
  4. I found only one instance of WMI.open() in the datasource set we have loaded -- in Citrix_XenApp_UserExperience:

        def session = WMI.open(hostname);
        def active_apps = session.queryAll(namespace, "SELECT * from Citrix_Euem_ClientStartup", 15);

     The implication is it would have to handle wmi.user/wmi.pass behind the scenes. All other references are for WMI.queryAll and WMI.queryFirst with the same implication. If none of the modules properly handle those properties, then there are a lot of broken modules:

        [mnagel@colby datasources]$ egrep -l 'WMI\.(query|open)' *
        Citrix_XenApp_UserExperience
        LogicMonitor_Collector_TotalCPUMemory
        Microsoft_Exchange_ActiveDirectoryDomainControllers_2016+
        Microsoft_Exchange_EdgeTransportDatabaseInstances_2016+
        Microsoft_Exchange_EdgeTransportDatabases_2016+
        Microsoft_Exchange_MailboxDatabaseInstances_2016+
        Microsoft_Exchange_MailboxDatabases_2016+
        Microsoft_Exchange_MailboxOverview_2016+
        Microsoft_Exchange_Replication_2016+
        Microsoft_Exchange_TransportQueueOverview_2016+
        Microsoft_Exchange_UnifiedMessaging_2016+
        Microsoft_LyncMediationServer_Stats
        Microsoft_LyncServer_AccessEdgeServerStats
        Microsoft_LyncServer_Authentication
        Microsoft_LyncServer_BackupCentralManagementModule
        Microsoft_LyncServer_ClusterManager
        Microsoft_LyncServer_ConferencingAttendant
        Microsoft_LyncServer_Datastores
        Microsoft_LyncServer_EmergencyCallRouting
        Microsoft_LyncServer_InstantMessaging
        Microsoft_LyncServer_MCU
        Microsoft_LyncServer_Messages
        Microsoft_LyncServer_Networking
        Microsoft_LyncServer_Protocol
        Microsoft_LyncServer_Routing
        Microsoft_LyncServer_RoutingApps
        Microsoft_LyncServer_StorageService
        Microsoft_LyncServer_WebServices
        Microsoft_LyncServer_XMPPProxy
        WinCPU
        Win_WMI_Access_Denied_ErrorCodes
        Win_WMI_UACTroubleshooter
  5. I would say that if the instance can be uniquely identified with data on hand (as described above), then the datasource should use that as the instance wildvalue, not some arbitrary other value that could create excess instances due to customer action or anything similar. As for data retention, I have found that decisions are often made that lead to loss of data, and it is distressing. I just had a case where I pointed out that a datapoint label had a typo. It was fixed, but the fix kills all old data for that datapoint. Why must the label be the index rather than a label tied to a persistent index? I see similar problems for DS replacements. I suggested in a F/R long ago that it be possible at DS load time to upgrade from the previous DS version. I fully appreciate that new datasources with alternate structures should be created, but if there were a migration function you could select the datapoint mapping to avoid losing data (currently the best option is to run both in parallel until you get enough new data to not look foolish to your clients). Preferably this would be built into the new datasource, so it would happen automatically or at least could provide guidance. That sort of mechanism could also handle my typo'ed datapoint issue. Nuts and bolts stuff like that is hard to market, though :(.
  6. To add to this -- there is in fact a pair of datapoints in the new SNMP_Network_Interfaces datasource that could be used to detect aggregate speed problems, a proxy for channel member loss (inInterfaceSpeed and outInterfaceSpeed, set to the reported speed unless overridden by ILPs). To use this, you would either have to set a threshold on the specific datapoints needed and deal with generic alerts, or add a virtual datapoint with a proper alert message, which makes future module updates painful. I really wish that the F/R I posted years ago to allow for LogicModule inheritance had been implemented... If you use SNMP_Network_Interfaces in any real-world environment, be sure your collectors have been tuned to allow more batchscript threads!
  7. Yes, I also pull data from ConfigSources and check it into Git repos with post-commit triggers that send email. That is how we can find out what has actually changed (and we have had to add a decent chunk of post-processing in some cases, as the ConfigSources often suck in ephemeral updates, thrashing the changelog). In this case, I really don't want to pull so much of something that should be handled within LM into external script processing. My method works fine now, but unfortunately you cannot pass ILP tokens into PowerShell collection scripts, so the list of groups must be hardcoded.
  8. Yeah, we ended up having to pay extra for SumoLogic, but it could be anything. It still would be nice to have the barest level of correlation so you could effectively ACK events.
  9. There is no Cisco MIB for EtherChannel that I have found (in general -- there may be something for some specific device types). The same goes for other aggregates, like PPP multilink. In our experience, the only safe way to detect aggregate issues is to monitor the reported speed of the bundle, as the underlying link does NOT necessarily need to be down for an aggregate to lose members (we have seen it happen when carriers do testing and fail to put the member back into the bundle). Unfortunately, despite previous requests, the interface datasource still does not report ifSpeed, so you cannot set a threshold on that datapoint to detect out-of-spec speeds.
  10. Oh yes, now I recall why I did not consider LMConfig. The changes themselves cannot be sent (unless you use an external API script), so you just get the red light alert ("something changed"). My current alert message includes the current and expected lists. The alert communication channel in general is fairly small and I hate to constrain it further.
  11. Writing to a file is more or less how we used to do it with Nagios (technically, it was a File::Cache object), but that was with a central Nagios server + gearman, and those checks always ran on the central server. With LM, this would be suboptimal in the face of collector pools (coming, one day) or collector failover. IMO, the right solution for that is a distributed key/value store. As for LMConfig -- it is a premium feature, and by default we don't push our clients to add it to their cost structure to achieve things that should be possible without it. In this case, you could get away with one per domain, so the cost increase is defensible. Will see what I can do...
  12. In our previous life, we had written a Nagios plugin to check whether a sensitive Windows group had changed (e.g., Domain Admins). I created a replacement for this within LM, but since we can't really keep track of deltas without a key/value store, we use a property for each group that specifies the expected members, which should be updated when membership changes intentionally. We also use a property to list the groups for AD so we can store useful ILPs, but since those ILPs are not passed to the collection script (they could be, they just are not currently passed for PowerShell), the list of groups that can be checked is restricted to what is built into the collection script. For one or more AD controllers, then, you would specify (for example):

        windows.groupcheck.list: Domain Admins
        windows.groupcheck.spec.Domain_Admins: administrator,alice,bob

      If the list diverges, the datapoint for that group will alert. There is also a total count of members that is tracked and can be used to set an alert if needed (e.g., some groups like Schema Admins should normally be empty, but that can be handled by the spec).

      2Y9FM6
  13. Posted a new version with datapoint messages and revised out-of-the-box thresholds. I did not change the AD or collection code, but it still shows pending review. PE9KPD
  14. Thanks! I need to make one more pass on it to enable custom alert messages. I added two different virtual datapoints so messages can say "expired XX ago" versus "will expire in XX". Looking forward to one day being able to just check stuff when alert messages are actually handled by template processors :).
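
The group membership check described in item 12 compares an expected member list held in a property against the members actually found, and also tracks a total member count. A minimal sketch of that comparison follows, in Python purely for illustration; the posted module itself uses a PowerShell collection script that is not shown here. The function names, the comma separator for the group list, the case-insensitive matching, and the sample membership data are all assumptions, not part of the posted module.

```python
def parse_members(spec):
    """Split a comma-separated member list into a normalized set."""
    return {name.strip().lower() for name in spec.split(",") if name.strip()}


def check_group(expected_spec, actual_members):
    """Compare expected members (property value) with actual members.

    Returns values shaped like datapoints: a 0/1 divergence flag and a
    total member count, plus the differences for an alert message.
    """
    expected = parse_members(expected_spec)
    # Assumption: account names are compared case-insensitively.
    actual = {m.strip().lower() for m in actual_members if m.strip()}
    missing = sorted(expected - actual)
    unexpected = sorted(actual - expected)
    return {
        "diverged": int(bool(missing or unexpected)),
        "member_count": len(actual),
        "missing": missing,
        "unexpected": unexpected,
    }


# Properties shaped like the example in item 12.
properties = {
    "windows.groupcheck.list": "Domain Admins",
    "windows.groupcheck.spec.Domain_Admins": "administrator,alice,bob",
}

# Hypothetical current membership, standing in for a directory query.
current = {"Domain Admins": ["Administrator", "alice", "mallory"]}

# Assumption: multiple groups in the list property are comma-separated.
for group in properties["windows.groupcheck.list"].split(","):
    group = group.strip()
    # Spaces map to underscores in the spec property name, as in the example.
    key = "windows.groupcheck.spec." + group.replace(" ", "_")
    result = check_group(properties.get(key, ""), current.get(group, []))
    print(group, result)
```

An empty spec with an empty group yields no divergence, which matches the note that groups expected to be empty (such as Schema Admins) can be handled by the spec alone.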