# API Rate Limits
AgentSync APIs implement rate limiting to ensure fair usage and maintain service stability. This guide explains our rate limiting policies and how to handle them.
## Rate Limit Overview
| Endpoint Type | Limit | Window |
|---|---|---|
| API Endpoints | 300 requests | Per minute |
| Token Endpoints | 200 requests | Per minute |
| Bulk Operations | 50 requests | Per minute |
## Rate Limit Headers

Every API response includes rate limit headers:

```
X-RateLimit-Limit: 300
X-RateLimit-Remaining: 295
X-RateLimit-Reset: 1705320000
```

| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed per window |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
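As an illustration, the three headers can be folded into a single status tuple. This helper (the function name and the injectable `now` argument are ours, not part of the AgentSync API) works on any mapping of header names to values:

```python
import time


def rate_limit_status(headers, now=None):
    """Return (limit, remaining, seconds_until_reset) from response headers."""
    now = time.time() if now is None else now
    limit = int(headers.get("X-RateLimit-Limit", 0))
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    reset = int(headers.get("X-RateLimit-Reset", 0))
    return limit, remaining, max(0.0, reset - now)
```

Pass `response.headers` directly, since `requests` exposes headers as a dict-like object.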
## Rate Limit Response

When you exceed the rate limit, you'll receive a `429 Too Many Requests` response:

```json
{
  "error": "rate_limit_exceeded",
  "error_description": "You have exceeded the rate limit. Please retry after 45 seconds.",
  "retry_after": 45
}
```

The response also includes a `Retry-After` header indicating how many seconds to wait before retrying.
## Handling Rate Limits

### Exponential Backoff

Implement exponential backoff when you hit rate limits:

```python
import time

import requests


def make_request_with_retry(url, headers, max_retries=3):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 60))
            wait_time = retry_after * (2 ** attempt)  # Exponential backoff
            print(f"Rate limited. Waiting {wait_time} seconds...")
            time.sleep(wait_time)
            continue
        response.raise_for_status()
        return response
    raise RuntimeError("Max retries exceeded")
```
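One refinement worth considering: scale each backoff by random jitter, so that many clients rate-limited at the same moment don't all retry in lockstep. A minimal sketch, where the function name, jitter range, and cap are our own choices:

```python
import random


def backoff_with_jitter(attempt, base=1.0, cap=60.0):
    """Exponential backoff capped at `cap` seconds, scaled by 50-100% jitter."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)
```

Use the returned value in place of the plain `retry_after * (2 ** attempt)` wait.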
### Proactive Rate Limiting

Check remaining requests and slow down before hitting limits:

```python
import time


def check_rate_limit(response):
    remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
    reset_time = int(response.headers.get("X-RateLimit-Reset", 0))
    if remaining < 10:
        wait_time = max(0, reset_time - time.time())
        print(f"Approaching rate limit. Waiting {wait_time:.0f} seconds...")
        time.sleep(wait_time)
```
## Best Practices

### Do
- Monitor rate limit headers in every response
- Implement backoff when rate limited
- Cache responses when possible
- Batch requests to reduce API calls
- Use pagination efficiently
### Don't
- Ignore 429 responses - always implement retry logic
- Hammer the API after receiving a rate limit error
- Make unnecessary requests - cache what you can
- Rely on timing - use the headers provided
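The caching advice above can be sketched as a tiny in-memory TTL cache. The class name and the 60-second default are illustrative, not an AgentSync library:

```python
import time


class TTLCache:
    """Cache API responses for `ttl` seconds to avoid repeat requests."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # key -> (expiry_time, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # expired or never set
            return None
        return entry[1]

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
```

Wrap your GET calls with it: check the cache first, and store the parsed body on a miss.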
## Rate Limits by Environment
| Environment | API Limit | Token Limit |
|---|---|---|
| Development | 300/min | 200/min |
| Test | 300/min | 200/min |
| Sandbox | 300/min | 200/min |
| Production | 300/min | 200/min |
## Increasing Rate Limits

If you need higher rate limits for your application:

1. Document your use case - Explain why you need higher limits
2. Contact DevOps - Submit a request in #devops-support
3. Provide metrics - Show current usage patterns
4. Review alternatives - Consider caching or batching
## Monitoring Your Usage

Track your API usage to avoid hitting rate limits:

```python
import time


class RateLimitMonitor:
    def __init__(self):
        self.requests_made = 0
        self.window_start = time.time()

    def track_request(self, response):
        self.requests_made += 1
        remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
        limit = int(response.headers.get("X-RateLimit-Limit", 300))
        usage_percent = ((limit - remaining) / limit) * 100
        print(f"Rate limit usage: {usage_percent:.1f}%")
        if usage_percent > 80:
            print("Warning: Approaching rate limit!")
```
## Common Issues

### "I'm hitting rate limits with normal usage"
- Review your code for duplicate requests
- Implement caching for frequently accessed data
- Check for retry loops without backoff
"My batch job is rate limited"
- Spread requests over time
- Use bulk endpoints where available
- Consider running during off-peak hours
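"Spread requests over time" can be done with a small client-side throttle that enforces a minimum interval between calls. The class below is a sketch; the injectable `clock` and `sleep` hooks are our own addition for testability:

```python
import time


class Throttle:
    """Enforce a minimum interval between requests (e.g. 50/min for bulk)."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least `interval` seconds since the previous call."""
        now = self._clock()
        if self._last is not None:
            delay = self.interval - (now - self._last)
            if delay > 0:
                self._sleep(delay)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each request in the batch job; the first call returns at once, and later calls pause only as long as needed.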
"Different services have different limits"
- Some internal services have custom limits
- Check service-specific documentation
- Contact DevOps for clarification
Need higher rate limits? Contact #devops-support with your use case.