
How to stagger datasource instance collection?


Question

I'm working with custom Meraki API datasources and have an issue where the collector can get into a state where all the instances attempt to collect simultaneously, or at least close enough together that it triggers Meraki's rate limiting, and even my backoff/retry isn't doing the job.

I would think that if my collector script for my custom datasource is set to sleep for a rand() number of seconds, I should be able to avoid this. Any thoughts about what I might be doing wrong, or is there a "LogicMonitor way" I should be handling this?
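
For illustration, what I have in mind is just sleeping for a random few seconds at the top of the collection script before the first API call. A minimal sketch (the 15-second window is an arbitrary number I picked, not anything Meraki- or LM-specific):

// Sketch: random start-up jitter at the top of the collection script.
// The 0-15 second window is arbitrary; it just needs to stay well under the collection timeout.
def jitterMs = new Random().nextInt(15000)
sleep(jitterMs)
// ...then make the Meraki API calls for this instance as usual.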

Below is what I'm doing to try to account for 429 rate-limit responses. (Not saying this is 'correct' in any way, but I thought the retry logic should've worked.)

// JsonSlurper needs an import; getHTTPResponse() is a helper defined elsewhere
// in the script that issues the GET request and returns the connection.
import groovy.json.JsonSlurper

def getAPIQueryOutput(String api_uri)
{
  def url = 'https://api.meraki.com' + api_uri
  def req = getHTTPResponse(url)

  // Got it on the first try.
  if(req.responseCode == 200){
    return new JsonSlurper().parseText(req.inputStream.getText('UTF-8'))
  } else if(req.responseCode == 204) {
    // Entry exists but did not have data.
    return null
  } else if(req.responseCode == 400) {
    // Bad request.
    return null
  } else if(req.responseCode == 404) {
    // Whatever we tried to find didn't exist. Return null.
    return null
  }

  // 429 received due to rate limiting; back off and try again (up to 3 retries).
  def count = 1
  while(req.responseCode == 429 && count < 4){
    def backoff = getBackOffMs(count)
    // Wait for an increasing amount of time before retrying.
    sleep(backoff)
    req = getHTTPResponse(url)
    count++
  }

  // Return whatever we ended up with (note: this will still fail if the last retry was also a 429).
  return new JsonSlurper().parseText(req.inputStream.getText('UTF-8'))
}

def getBackOffMs(count){
  // Random int between 1 and 3 inclusive, scaled by the retry count.
  def backoff_seed = new Random().nextInt(3) + 1
  return backoff_seed * 1000 * count
}
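
One variation I've been thinking about (not tested) is honoring the Retry-After header that Meraki sends back with a 429, and only falling back to the random backoff when it's missing. Rough sketch below; it assumes req is an HttpURLConnection, which the responseCode/inputStream usage above implies, and getRetryDelayMs is just a hypothetical helper name:

// Sketch: prefer Meraki's Retry-After header (in seconds) over the random backoff.
def getRetryDelayMs(req, count){
  def retryAfter = req.getHeaderField('Retry-After')
  if(retryAfter != null && retryAfter.isInteger()){
    return retryAfter.toInteger() * 1000
  }
  // Header missing or unparsable; fall back to the random backoff.
  return getBackOffMs(count)
}

The while loop above would then call getRetryDelayMs(req, count) instead of getBackOffMs(count).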

 


2 answers to this question

  • Administrators

There is a more LogicalMonitor way to do it. (See what I did there?)

I believe we're working on getting information ready to present, but we're shifting the way LM monitors Meraki to use the API instead of SNMP (like you have done). I think it breaks things up so each network is represented as a separate device in LM, splitting the monitoring across parallel tasks. That should skirt the 429 problem.

How urgent is this for you?


Ah, already ahead of you there; I came to the same conclusion after attempting a 'single device' model, which just couldn't do it. (If I weren't trying to do API switchport monitoring, it probably would've been fine.)

I use the MX at each location as the 'anchor' for all of the Meraki monitoring at the location:

[Screenshot: meraki_api_2.PNG]

 

But they somehow still manage to sync up. I had the AMP and IPS checks at 4-hour intervals, and after restarting the collector, I started getting buckets of 429s:

[Screenshot: meraki_api_usage.PNG showing API usage]

 

And now what's really got me confused... I set both of those to 5-minute collection intervals about 30 minutes ago and now everything's fine!
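
For what it's worth, one thing I may try to keep the instances from syncing back up after a collector restart is a deterministic per-device offset derived from the hostname, instead of a purely random sleep. Rough sketch only; hostProps.get('system.hostname') should give the device name inside an embedded Groovy script, and the 30-second spread is an arbitrary choice:

// Sketch: stable per-device jitter so each MX lands in its own slot
// within a 30-second window, even across collector restarts.
def hostname = hostProps.get('system.hostname')
def offsetMs = Math.abs(hostname.hashCode()) % 30000
sleep(offsetMs)
// ...then make the Meraki API calls as usual.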

 

