Monitoring HAProxy?


Skeer

I've got a trio of Load Balancers running HAProxy that I need to monitor, and I found the HAProxy module and installed it. I verified using the 'Test Applies To' and it found all 3 servers, so I assume that means it's been associated right?  It's been a few days and the resources are not displaying any HAP related info nor do I have a dropdown (IDK the correct term) under the resource itself like there is for CPU, Disks, etc.

Second question: reading the description here: https://www.logicmonitor.com/integrations/ha-proxy, am I correct in assuming that the only stat this module reports is sessions? If so, that's missing a ton of important stats.

 

Thanks!!


  • Administrators

Yes, the only datapoint that DataSource tracks is sessions. It would seem the active discovery is not returning anything in your case. Navigate to the DataSource (the same place you tested the AppliesTo) and click the "Test Active Discovery" button. If you see no results there, that's your problem. 

This DS uses the HTTP discovery method, meaning that discovery pulls up a web page and scrapes it for the instances (that's the word you were looking for). In this case, it requests https://[hostname/ip]/haproxy?stats and scrapes it for anything matching the RegEx: <th colspan=2 class=.pxname.>(.*?)</th>. I would start by hitting one of your stats-enabled frontends on one of the haproxies to see if the page loads. If it does not, you probably need to add that frontend to your haproxy config.
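To see what that pattern captures, here is a minimal Python sketch that applies the quoted RegEx to a stats-page fragment. The HTML below is made up for illustration and may not match what your HAProxy version actually emits; `discover_instances` is a hypothetical helper name, not part of the DS.

```python
import re

# Pattern quoted in the post above; each "." stands in for a quote character.
PXNAME_RE = re.compile(r"<th colspan=2 class=.pxname.>(.*?)</th>")


def discover_instances(html: str) -> list[str]:
    """Return every proxy name the discovery pattern captures."""
    return PXNAME_RE.findall(html)


# Hypothetical fragment of a stats page, invented for this example.
SAMPLE_HTML = (
    '<tr><th colspan=2 class="pxname">mysite</th></tr>'
    '<tr><th colspan=2 class="pxname">hissite</th></tr>'
    '<tr><th class="other">ignored</th></tr>'
)

print(discover_instances(SAMPLE_HTML))
```

If this returns an empty list against your real page source, the page markup differs from what the pattern expects, which would explain an empty Active Discovery result.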

It's possible that this used to be enabled out of the box for older versions of haproxy and the newest version of haproxy requires you to explicitly configure it.

Once you get the page loading in your browser, you might need to make some changes to the DS to get the discovery to pull the page correctly. Once that's working, it looks like it shouldn't be hard to get the other stats from the table on that page. You'll just have to get real familiar with RegEx.

I just got haproxy up and running in Docker and i'll take a look today during any free time i have to see what can be done to pull some of the other stats. Did you manually add the haproxy category to your servers or was it discovered? I'm not aware of a propertysource that auto-discovers haproxy installed on devices, but it wouldn't be the first time there's a propertysource i'm unaware of.


  • Administrators

Ok, i think i have something for you. Using this haproxy.cfg file:

frontend stats
    bind :8404
    mode            http
    log             global
    maxconn 10

    timeout client  100s
    timeout server  100s
    timeout connect 100s
    timeout queue   100s

    stats enable
    stats hide-version
    stats refresh 30s
    stats show-node
    stats uri  /haproxy?stats

frontend mysite
frontend hissite
frontend theothersite
frontend google.com

 

I was able to write a DS to pull in 56 different datapoints for each frontend. Your mileage may vary. My /haproxy?stats is running on port 80, not 8404 (it runs inside a container where the container runtime remaps 80:8404). Either way, you can add a property to the host called "haproxy.port" to specify a port other than 80 that your stats page is running on. I'll be publishing this to the Exchange shortly where it will need to undergo code review, but here it is in the meantime: https://github.com/sweenig/lmcommunity/tree/master/haproxy_2_4
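The port override described above amounts to a default with a fallback. A rough Python sketch of that logic follows; this is not the DS code. `stats_url` and the host name `lb1` are hypothetical, and the `http` scheme is an assumption.

```python
def stats_url(host: str, properties: dict[str, str], scheme: str = "http") -> str:
    """Build the stats page URL, honouring an optional haproxy.port property."""
    # Fall back to port 80 when the host has no haproxy.port property set.
    port = properties.get("haproxy.port", "80")
    return f"{scheme}://{host}:{port}/haproxy?stats"


# "lb1" is a made-up host name for illustration.
print(stats_url("lb1", {}))
print(stats_url("lb1", {"haproxy.port": "8404"}))
```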

FYI, instead of scraping the HTML like the old version did, I dove into the JSON version of the data. I don't know if this just wasn't available in previous versions of HAProxy, or if someone thought it was easier to scrape the HTML. Either way, it necessitates a new DS since the collection method changes from WEBPAGE to BATCHSCRIPT. You should be able to import it into your portal without changing the existing HAProxy DS. Once you get it working, you can delete the existing HAProxy DS.
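For anyone curious what working with the JSON data can look like, here is a small Python sketch. It is not the DS code. It assumes the JSON export is a list with one entry per proxy or server, where each entry is a list of field objects carrying `field.name` and `value.value`; check that against the output of your own HAProxy version. `flatten_stats` and the sample payload are invented for illustration.

```python
import json


def flatten_stats(payload: str) -> dict[tuple[str, str], dict]:
    """Index each entry's fields by (pxname, svname)."""
    result = {}
    for entry in json.loads(payload):
        # Assumed shape: every item names a field and wraps its value.
        fields = {item["field"]["name"]: item["value"]["value"] for item in entry}
        result[(fields["pxname"], fields["svname"])] = fields
    return result


# Made-up payload in the assumed shape, with only three fields.
SAMPLE = json.dumps([[
    {"field": {"name": "pxname"}, "value": {"type": "str", "value": "mysite"}},
    {"field": {"name": "svname"}, "value": {"type": "str", "value": "FRONTEND"}},
    {"field": {"name": "scur"}, "value": {"type": "u32", "value": 5}},
]])

print(flatten_stats(SAMPLE)[("mysite", "FRONTEND")]["scur"])
```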


  • 1 month later...

Since I run multiple HAProxy processes, each has its own stats page. With the JSON format I was getting weird values, as the stats would be pulled from different processes, so my graph would bounce around. I finally figured out that there is a Lua script that can pull stats from all the processes and aggregate them to CSV.

My main HAProxy page shows the output below, and I want to parse it for the metrics I care about.

<from my Lua stats page>

http://myserver:8888

pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agg,
stats-aggregate,FRONTEND,,,0,4,800000,7,4505,44273,0,0,0,,,,,OPEN/OPEN/OPEN/OPEN,,,,,,,,,4/2/2/2,8/8/8/8,,,,,0,0,0,4,,,,0,13,0,0,0,0,,0,8,13,,,0,0,0,0,,,,,,,,statistics-cpu-4/statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
BILLY_front,FRONTEND,,,5,21,800000,379,821472,7749620993,0,0,0,,,,,OPEN/OPEN/OPEN/OPEN,,,,,,,,,4/2/2/2,2/2/2/2,,,,,0,0,0,21,,,,0,0,0,0,0,0,,0,0,0,,,0,0,0,0,,,,,,,,statistics-cpu-4/statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
stats-2,FRONTEND,,,1,6,600000,1350,80940,4664196,0,0,0,,,,,OPEN/OPEN/OPEN,,,,,,,,,2/2/2,5/5/5,,,,,0,3,0,9,,,,0,1349,0,0,0,0,,3,9,1350,,,0,0,0,0,,,,,,,,statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
stats-4,FRONTEND,,,1,2,200000,1103,66120,3879247,0,0,0,,,,,OPEN,,,,,,,,,4,7,,,,,0,1,0,4,,,,0,1102,0,0,0,0,,1,4,1103,,,0,0,0,0,,,,,,,,statistics-cpu-4,
stats-aggregate,BACKEND,0,0,0,0,80000,0,4505,44273,0,0,,0,0,0,0,UP/UP/UP/UP,0,0,0,,0,3417.0,0,,4/2/2/2,8/8/8/8,,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,725.0,,,0.0,0.0,0.0,4.25,statistics-cpu-4/statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
BILLY_back,BACKEND,0,0,5,21,80000,379,821472,7749620993,0,0,,146,0,48,0,UP/UP/UP/UP,16,16,0,,4,1706.75,48.25,,4/2/2/2,3/3/3/3,,,249,,1,0,,21,,,,0,0,0,0,0,0,,,,,4,0,0,0,0,0,1321.25,,,0.0,0.0,0.0,131623.25,statistics-cpu-4/statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
BILLY_back,FOX1,0,0,1,6,0,53,198554,2433438274,,0,,0,0,0,0,UP/UP/UP/UP,4,4,0,12,4,1700.0,56.0,0,4/2/2/2,3/3/3/3,2/2/2/2,0,53,0,2,0,,4,L4OK/L4OK/L4OK/L4OK,,0/0/0/0,0,0,0,0,0,0,,,,,0,0,,,,,2569.5,,,0.0,0.0,0.0,1252990.75,statistics-cpu-4/statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
BILLY_back,FOX0,0,0,2,6,0,39,97291,355396905,,0,,6,0,18,0,UP/UP/UP/UP,4,4,0,12,4,1706.75,48.5,0,4/2/2/2,3/3/3/3,1/1/1/1,0,21,0,2,0,,5,L4OK/L4OK/L4OK/L4OK,,0/0/0/0,0,0,0,0,0,0,,,,,0,0,,,,,1740.5,,,0.0,0.0,0.0,461095.0,statistics-cpu-4/statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
BILLY_back,FOX3,0,0,1,6,0,123,196964,2033573869,,0,,7,0,21,0,UP/UP/UP/UP,4,4,0,12,4,1691.0,64.0,0,4/2/2/2,3/3/3/3,4/4/4/4,0,102,0,2,0,,4,L4OK/L4OK/L4OK/L4OK,,0/0/0/0,0,0,0,0,0,0,,,,,1,0,,,,,1319.0,,,0.0,0.0,0.0,4631.0,statistics-cpu-4/statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
BILLY_back,FOX2,0,0,1,6,0,82,328663,2927211945,,0,,3,0,9,0,UP/UP/UP/UP,4,4,0,12,4,1695.75,60.0,0,4/2/2/2,3/3/3/3,3/3/3/3,0,73,0,2,0,,4,L4OK/L4OK/L4OK/L4OK,,0/0/0/0,0,0,0,0,0,0,,,,,3,0,,,,,1327.75,,,0.0,0.0,0.0,76763.25,statistics-cpu-4/statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
stats-2,BACKEND,0,0,0,0,60000,0,80940,4664196,0,0,,0,0,0,0,UP/UP/UP,0,0,0,,0,3417.0,0,,2/2/2,5/5/5,,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,0.0,,,0.0,0.0,1.0,1.0,statistics-cpu-1/statistics-cpu-3/statistics-cpu-2,
stats-4,BACKEND,0,0,0,0,20000,0,66120,3879247,0,0,,0,0,0,0,UP,0,0,0,,0,3417.0,0,,4,7,,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,0.0,,,0.0,0.0,0.0,1.0,statistics-cpu-4,



i.e., I want to pull the scur value for "BILLY_back, FOX1".
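One way to pull a single value like that is to look it up by column name rather than by position. A minimal Python sketch, using a trimmed copy of the CSV above (first five columns only); `lookup` is a hypothetical helper name.

```python
import csv
import io

# Trimmed copy of the output above: only the first five columns are kept.
SAMPLE_CSV = """pxname,svname,qcur,qmax,scur
BILLY_back,BACKEND,0,0,5
BILLY_back,FOX1,0,0,1
BILLY_back,FOX0,0,0,2
"""


def lookup(csv_text, pxname, svname, field):
    """Return one field for the given proxy/server pair, or None if absent."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["pxname"] == pxname and row["svname"] == svname:
            return row[field]
    return None


print(lookup(SAMPLE_CSV, "BILLY_back", "FOX1", "scur"))
```

Looking up by header name keeps the code readable and survives columns being added or reordered, as long as the header line is present.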

Edited by danp

  • Administrators
15 hours ago, danp said:

so since i run multiple ha processes, each have their own stats page

Is each page at its own address (port)? If so, we should be able to easily modify the discovery and collection scripts to pull from each one.

14 hours ago, danp said:

I may just do this with python and snmp, seems much more of a simple approach, yet requires code on servers

If that's an option, you can try it. If it's pure SNMP, you might try the no-code option of building an SNMP DataSource in LM. 


Yes, each process requires its own stats page.

What we found is that when we ran multiple processes, the master haproxy stats page would pull randomly from one of the running processes. Thus our stats would read 32 current_sessions, then a second later 15 current_sessions, so the graph was skewed, when what we really needed was 32+15 for total sessions.
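The arithmetic in that example, written out as a tiny Python sketch. The process numbers and the `total_sessions` name are made up; only the 32 and 15 come from the post.

```python
def total_sessions(per_process: dict[int, int]) -> int:
    """Sum current sessions across all HAProxy processes."""
    return sum(per_process.values())


# Process number -> current_sessions, using the figures from the example.
samples = {1: 32, 2: 15}

# A single poll of the shared page sees only one of these values,
# while the real load is their sum.
print(total_sessions(samples))
```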

We used Lua to aggregate the stats as shown here: https://discourse.haproxy.org/t/lua-solution-for-stats-aggregation-and-centralization/27

That creates a master aggregate page (ours is on port 8880) which dumps the CSV.

I ended up writing a simple Python script to pull the stats back as key-value pairs and extending SNMP to pull them (it was the easy solution):

 

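import csv
import io

import requests

# Fetch the aggregated CSV from the Lua stats page (port 8880 on this host).
r = requests.get('http://127.0.0.1:8880/')
reader = csv.reader(io.StringIO(r.text), delimiter=',')
for row in reader:
    # Skip blank lines and keep only the BILLY_back proxy's rows.
    if row and row[0] == 'BILLY_back':
        # row[1] = svname, row[4] = scur (current sessions).
        # row[22] is the chkdown column in the CSV header shown earlier;
        # the status column itself is index 17.
        print(f"{row[1]}_SessCur={row[4]}\n{row[1]}_Status={row[22]}")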


This returns clean K-V pairs, which can easily be used as an SNMP extension, with the metrics pulled into a very simple DataSource:
 

BACKEND_SessCur=60
BACKEND_Status=4
FOX1_SessCur=15
FOX1_Status=4
FOX0_SessCur=15
FOX0_Status=4
FOX3_SessCur=15
FOX3_Status=4
FOX2_SessCur=15
FOX2_Status=4
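On the consuming side, output in this shape is straightforward to turn back into a lookup table. A minimal Python sketch, assuming one `key=value` pair per line as shown above; `parse_kv` is a hypothetical helper, not something from the script or from LM.

```python
def parse_kv(output: str) -> dict[str, str]:
    """Parse key=value lines into a dict, ignoring blank or malformed lines."""
    pairs = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            pairs[key] = value
    return pairs


# Two of the lines from the output above, plus a blank line.
sample = "FOX1_SessCur=15\nFOX1_Status=4\n\n"
print(parse_kv(sample))
```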

 


  • Administrators

Cool that it's working. I think it would be pretty easy to modify the existing DS to pull from separate pages, then LM can aggregate if you need it but also show individual stats as well. Is there a programmatic way to discover the addresses of all the pages?


I know all the addresses: they would be 8881-4, with the master aggregated page on 8880.

I can see how we would be able to use the JSON slurper to pull the individual pages, but that just seems like a waste of processes when we can just parse the aggregate page with some sort of web CSV slurper.
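For reference, the port layout described here can be written out in a few lines of Python. This is only a sketch: `stats_pages` is a hypothetical helper, the `/` path is assumed from the script earlier in the thread, and the host name is a placeholder.

```python
def stats_pages(host: str, first: int = 8881, last: int = 8884, aggregate: int = 8880) -> dict:
    """List the aggregate page and the per-process pages for one host."""
    return {
        "aggregate": f"http://{host}:{aggregate}/",
        # One page per process on consecutive ports, last port included.
        "per_process": [f"http://{host}:{port}/" for port in range(first, last + 1)],
    }


pages = stats_pages("myserver")
print(pages["aggregate"])
print(pages["per_process"])
```

Either route works from the same list: poll only the aggregate URL, or poll each per-process URL and sum the values yourself.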
