
B2B Leads API Pagination: How to Pull Large Lead Lists Safely

Fetching thousands of B2B leads via API sounds straightforward until you hit timeouts, duplicate records, or rate limit errors. This guide walks through pagination strategies tailored to B2B lead data workflows—cursor-based vs offset approaches, handling rate limits gracefully, implementing retry logic, and structuring loops that survive production environments. Includes code snippets, common pitfalls checklist, and links to related API workflow articles.

May 28, 2026 · 10 min read · Dievio Team · Growth Systems

Introduction

If you've ever tried to pull 50,000 B2B leads through an API in one go, you already know the pain. Timeouts. Duplicate records halfway through. A 429 rate limit error right when you're at page 42, and suddenly you have no idea where you left off. It's the kind of problem that turns a straightforward lead generation API integration into a debugging nightmare.

Pagination is the mechanism APIs use to deliver large datasets in manageable chunks. Without it, every request would either time out or crash your application's memory. But not all pagination strategies are created equal, especially when you're dealing with dynamic B2B lead data—datasets that shift as companies change size, people change jobs, and filters return different results over time.

This guide is written for developers and RevOps teams who need to fetch thousands (or hundreds of thousands) of B2B leads from an API reliably. We'll cover the two dominant pagination methods—cursor-based and offset—and walk through production-ready patterns for handling rate limits, recovering from failures, and building loops that actually survive in the wild. By the end, you'll have a clear playbook for pulling large lead lists safely, without losing data or wasting credits.

What Is API Pagination?

API pagination is the practice of splitting a large response into multiple smaller responses (pages). Instead of returning 10,000 leads in one JSON blob, the API returns the first 100 (or 500, or whatever batch size the provider allows) along with a token or pointer to fetch the next chunk. This protects both the server and the client from overwhelming loads.

For B2B lead data, pagination is especially important because queries can be complex: filtering by industry, company size, job role, location, and technology stack. Those filters can return tens of thousands of results, and the API needs to balance request speed with data freshness. As HubSpot notes in their prospecting content, lead datasets are rarely static—companies update their statuses, contacts change roles, and your filters may return slightly different results from one call to the next. Pagination strategies must account for that volatility.

Offset vs Cursor-Based Pagination

These are the two most common pagination methods in B2B leads APIs. Let's compare them head-to-head.

Factor | Offset Pagination | Cursor-Based Pagination
--- | --- | ---
How it works | Uses page and limit or offset parameters. Page 1 returns items 1–100, page 2 returns 101–200, etc. | Returns a cursor (usually an opaque string or token) in the response. You pass that cursor in the next request to get the next page.
Performance on deep pages | Degrades. The database must skip all prior rows on each request (e.g., offset 10000 means scanning 10,000+ rows). | Roughly constant per page, regardless of depth, because the cursor points directly to the starting record.
Consistency with dynamic data | Fragile. If a new lead is added or a record is removed between page requests, you can skip or duplicate items. | Stable. The cursor often encodes a position in time or a unique ID, so new or moved records don't shift the results under you.
Idempotency | Difficult. Page numbers are sensitive to changes in result ordering. | Good. A cursor returns a deterministic set of records regardless of concurrent changes (assuming the API implements it correctly).
Implementation complexity | Low. Simple loop over incrementing page numbers. | Moderate. You need to manage cursor state, handle cursor expiry, and possibly handle null cursors.
Best use case | Small, static datasets (e.g., fewer than 1,000 records) where data doesn't change mid-extraction. | Large, dynamic datasets, which describes most B2B lead lists.

Recommendation: For pulling B2B leads at scale, prefer cursor-based pagination whenever the API offers it. Offset works for quick exports of small static lists, but on datasets of 10,000+ leads it tends to slow down with depth and can skip or duplicate records, especially if the underlying data is refreshed while you page through it.
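To make the consistency difference concrete, here is a small in-memory simulation. It is a sketch under stated assumptions, not how any particular provider works: the cursor is modeled as "the last id seen" (a keyset-style cursor), and the function names are made up for illustration. One lead is deleted between the first and second page request.

```python
def offset_page(rows, offset, limit):
    """Offset pagination: skip `offset` rows of the current ordering."""
    return rows[offset:offset + limit]


def cursor_page(rows, after_id, limit):
    """Keyset-style cursor: return rows whose id is greater than the last id seen."""
    remaining = [r for r in rows if after_id is None or r["id"] > after_id]
    page = remaining[:limit]
    next_cursor = page[-1]["id"] if len(remaining) > limit else None
    return page, next_cursor


def make_rows():
    return [{"id": i} for i in range(1, 7)]  # ids 1..6, sorted by id


# Offset: fetch page 1, then lead 2 is deleted before page 2 is requested.
rows = make_rows()
offset_seen = [r["id"] for r in offset_page(rows, 0, 3)]
rows = [r for r in rows if r["id"] != 2]
offset_seen += [r["id"] for r in offset_page(rows, 3, 3)]

# Cursor: same deletion, but the next page starts after the last id seen.
rows = make_rows()
page, cursor = cursor_page(rows, None, 3)
cursor_seen = [r["id"] for r in page]
rows = [r for r in rows if r["id"] != 2]
page, cursor = cursor_page(rows, cursor, 3)
cursor_seen += [r["id"] for r in page]

print(offset_seen)  # [1, 2, 3, 5, 6] -> lead 4 was silently skipped
print(cursor_seen)  # [1, 2, 3, 4, 5, 6]
```

The offset run never sees lead 4 because the deletion shifted every later row up by one position. The cursor run is unaffected because it resumes from an id, not from a position.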

Pagination Parameters Reference

Before we write code, let's decode the typical parameters you'll encounter in B2B leads APIs.

  • page or offset: The page number (starting at 1) or row offset (starting at 0). Used in offset-based pagination.
  • limit or per_page: How many records to return per page. Common defaults are 50 or 100; max might be 500 or 1000 depending on the provider.
  • cursor or next_token: An opaque string provided in the response. Pass it in the next request to get the next page. Often available in response headers or body.
  • has_more or next_page: A boolean or URL indicating whether more pages exist. Some APIs return a next field with the full URL for the next request.
  • sort and order: Some APIs require a sort field to make cursors stable. Always check the documentation.

Always read the API's responses for metadata. For example, the response body might include:

<code>{
  "data": [...],
  "pagination": {
    "next_cursor": "abc123",
    "has_more": true
  }
}</code>

Or the cursor might be in a response header like X-Next-Page-Token.
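Because the cursor can live in either place, it can help to isolate the lookup in one function. The sketch below assumes a body shaped like the sample response above (it accepts either `next_cursor` or `cursor` as the field name) and a header called X-Next-Page-Token. Those names are illustrative, so check your provider's documentation for the real ones.

```python
def extract_next_cursor(body, headers):
    """Return the next cursor from the body or a header, or None when no pages remain."""
    pagination = body.get("pagination") or {}
    if pagination.get("has_more") is False:
        return None  # the API explicitly says we are done
    cursor = pagination.get("next_cursor") or pagination.get("cursor")
    if cursor:
        return cursor
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("x-next-page-token") or None
```

Returning None for every "no more pages" case gives the calling loop a single stop condition to check.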

Building a Safe Pagination Loop

The core pattern for safe pagination is straightforward: initialize a cursor, fetch a page, process the records, check for more, and advance the cursor. But the devil is in the details—especially error handling and idempotency.

Step-by-Step Workflow

  1. Initialize state. Set cursor to null (or use the provided starting cursor from the API).
  2. Make the request. Include cursor in the query params or body. Also pass any filters, sort, and limit.
  3. Check response. If HTTP 200, extract the list of leads and the next cursor. If not, handle errors (see next section).
  4. Process data. Validate and store the leads. Deduplicate if necessary (e.g., using a set of lead IDs).
  5. Check for more pages. If the API signals has_more: false or next cursor is null, stop. Otherwise, set cursor to the new value and loop.
  6. Respect rate limits. Throttle requests based on headers or predefined limits (more on this below).

Here's a pseudocode version:

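<code>cursor = None  # None requests the first page
while True:
    response = api.get_leads(cursor=cursor, limit=100, filters=...)
    if response.status == 429:
        wait_and_retry(response)  # back off, then retry the same cursor
        continue
    if response.status != 200:
        raise RuntimeError(f"Unexpected status {response.status}")
    body = response.json()
    process(body['data'])
    cursor = body['pagination']['next_cursor']
    if not cursor:
        break  # no more pages
    throttle_if_needed()</code>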

The only state this loop carries is the cursor, so you can resume from any saved cursor as long as the API hasn't expired it. That's key for production robustness.
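For a version you can run without an API, here is the same loop written as a Python generator against an in-memory stand-in. It is a sketch: `fetch_page` is a hypothetical callable that returns `(leads, next_cursor)`, and the repeated-cursor guard is an extra safety check that the steps above don't mention.

```python
def iter_leads(fetch_page, start_cursor=None):
    """Yield leads one at a time, following cursors until no next cursor is returned.

    `fetch_page(cursor)` is a hypothetical callable returning (leads, next_cursor).
    """
    cursor = start_cursor
    seen_cursors = set()
    while True:
        leads, next_cursor = fetch_page(cursor)
        yield from leads
        if not next_cursor:
            return
        if next_cursor in seen_cursors:
            raise RuntimeError(f"Cursor {next_cursor!r} repeated; stopping to avoid an infinite loop")
        seen_cursors.add(next_cursor)
        cursor = next_cursor


# In-memory stand-in for an API, used only to demonstrate the loop.
FAKE_PAGES = {
    None: ([{"id": 1}, {"id": 2}], "c1"),
    "c1": ([{"id": 3}], "c2"),
    "c2": ([{"id": 4}], None),
}

all_ids = [lead["id"] for lead in iter_leads(FAKE_PAGES.get)]
resumed_ids = [lead["id"] for lead in iter_leads(FAKE_PAGES.get, start_cursor="c1")]
print(all_ids)      # [1, 2, 3, 4]
print(resumed_ids)  # [3, 4]
```

Passing `start_cursor="c1"` picks up at the second page, which is how resuming from a saved cursor works.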

Handling Rate Limits and Throttling

Nearly every API enforces rate limits. If you ignore them, you'll get 429 Too Many Requests errors, waste credits on failed calls, and risk having your key suspended. For B2B leads APIs, rate limits are usually expressed in requests per second (RPS) or per minute (RPM), and often include a burst limit.

Identify the Limits

  • Check response headers like X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (often a Unix timestamp, though some providers send seconds until reset).
  • Look at the API documentation for the exact limits. Some providers also return a Retry-After header on 429 responses.
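Those headers also let you slow down before a 429 ever happens. The helper below is a minimal sketch that assumes X-RateLimit-Remaining is a request count and X-RateLimit-Reset is a Unix timestamp in seconds. If your provider uses different names or units, adjust it accordingly. The `min_remaining` threshold is an illustrative parameter, not something an API defines.

```python
import time


def seconds_to_wait(headers, now=None, min_remaining=1):
    """Return how long to pause before the next request, based on rate limit headers."""
    now = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        remaining = int(lowered["x-ratelimit-remaining"])
        reset_at = float(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return 0.0  # headers missing or unparseable: nothing to act on
    if remaining > min_remaining:
        return 0.0  # plenty of quota left in this window
    return max(0.0, reset_at - now)  # wait until the window resets
```

A loop would call `time.sleep(seconds_to_wait(response.headers))` after each page, where `response.headers` is whatever header mapping your HTTP client exposes.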

Implement Exponential Backoff with Jitter

When you hit a 429, don't just retry immediately. Use exponential backoff:

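<code>import time, random

def wait_with_backoff(attempt, base=1, cap=60):
    # Double the delay on each attempt (1s, 2s, 4s, ...), capped at `cap` seconds.
    wait = min(base * (2 ** attempt), cap)
    # Add up to 50% random jitter so clients don't retry in lockstep.
    jitter = random.uniform(0, wait * 0.5)
    time.sleep(wait + jitter)</code>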

The random jitter keeps many clients from retrying at the same moment. If the server sends a Retry-After header, honor it in preference to your computed delay.
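One way to combine the two rules is to compute the delay in a single function: use Retry-After when it is present and parseable, and fall back to exponential backoff with jitter otherwise. This is a sketch. The function names are illustrative, and it handles the two Retry-After formats allowed by the HTTP specification (a number of seconds or an HTTP date).

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Convert a Retry-After header (seconds or HTTP date) to seconds, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # not a valid HTTP date either
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    server_hint = parse_retry_after(retry_after)
    if server_hint is not None:
        return server_hint  # the server's instruction wins
    wait = min(base * (2 ** attempt), cap)
    return wait + random.uniform(0, wait * 0.5)
```

Returning the delay instead of sleeping inside the function makes the logic easy to unit test; the caller decides whether to use `time.sleep` or an async sleep.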

Checkpoint Pattern for Resume Capability

If your fetch is interrupted (e.g., the script crashes), you want to resume from the last successful cursor, not start over. Store the cursor after each successful page fetch—ideally in a persistent store like a file or database.

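<code>def fetch_all(start_cursor=None):
    cursor = start_cursor
    while True:
        data, next_cursor = request_page(cursor)
        process(data)  # store the page first, so the checkpoint never runs ahead of the data
        save_last_cursor(next_cursor)  # persist checkpoint: the next page to fetch (None = finished)
        if not next_cursor:
            break
        cursor = next_cursor</code>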

On restart, read the last saved cursor and pass it as start_cursor. This is especially valuable when you're pulling many thousands of leads overnight.
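If a local file is enough for your checkpoint store, the two helpers could look like the sketch below. It is one possible implementation under stated assumptions: the JSON layout is invented for illustration, and unlike the one-argument call in the snippet above, this `save_last_cursor` also takes the file path. The write goes to a temporary file first and is then renamed, so a crash mid-write cannot leave a corrupted checkpoint behind.

```python
import json
import os
import tempfile


def save_last_cursor(cursor, path):
    """Atomically write the checkpoint so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"cursor": cursor}, handle)
    os.replace(tmp_path, path)  # atomic rename within the same directory


def load_last_cursor(path):
    """Return the saved cursor, or None when no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)["cursor"]
    except FileNotFoundError:
        return None
```

On startup you would call `load_last_cursor("leads.checkpoint.json")` (the filename is just an example) and hand the result to the fetch loop as its starting cursor.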

Error Recovery and Idempotency

Errors happen: network blips, server timeouts, malformed responses. Your pagination loop must distinguish between transient and permanent errors.

Checklist for Robust Error Recovery

  • Detect partial failures: If you receive a successful page but only partially process it (e.g., your database connection breaks midway), don't advance the checkpoint. Re-fetch that page from the last saved cursor and rely on deduplication or upserts so that records you already stored aren't inserted twice.
  • Deduplicate records: Use a set of lead IDs to ensure you never insert the same lead twice, even if your loop restarts a page.
  • Store last successful cursor: As mentioned, persist it after each successful page. If your process crashes and restarts, it can pick up from the exact point of failure.
  • Graceful degradation: If an error appears permanent (e.g., 401 unauthorized, 400 bad request), halt the loop and log the error for human review. Don't retry indefinitely.
  • Distinguish transient vs permanent errors: Retry 429, 503 (Service Unavailable), and connection timeouts. Do not retry 400, 401, 403, or 404—those require fixing the request.
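The transient-versus-permanent rule from the checklist can be captured in a small classifier. This is a sketch: besides the 429 and 503 named above, it also treats 500, 502, and 504 as retryable, which is an assumption you may want to tighten, and it halts on anything it doesn't recognize.

```python
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def classify_failure(status=None, exception=None):
    """Return 'retry' for transient failures and 'halt' for ones that need a human."""
    if isinstance(exception, (TimeoutError, ConnectionError)):
        return "retry"  # network blip or timeout: no response was received
    if status in RETRYABLE_STATUSES:
        return "retry"
    return "halt"  # 400, 401, 403, 404 and anything unexpected


print(classify_failure(status=429))                 # retry
print(classify_failure(status=401))                 # halt
print(classify_failure(exception=TimeoutError()))   # retry
```

Defaulting to "halt" is the conservative choice: an unknown failure stops the loop and gets logged instead of burning credits on retries that cannot succeed.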

According to the Salesforce Lead Management implementation guide, handling partial data and implementing idempotent inserts is crucial when syncing lead data to a CRM—you don't want to create duplicate records in your sales system because your pagination loop hiccupped.
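As an illustration of an idempotent insert, here is a sketch that uses SQLite as a stand-in for the destination system. The table layout and field names are invented for the example. Keying the write on the lead ID means that replaying a page after a restart overwrites rows instead of duplicating them.

```python
import sqlite3


def upsert_leads(conn, leads):
    """Insert or overwrite leads keyed by id, so replaying a page never creates duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leads (id TEXT PRIMARY KEY, email TEXT, company TEXT)"
    )
    with conn:  # commit on success, roll back if any row fails
        conn.executemany(
            "INSERT OR REPLACE INTO leads (id, email, company) VALUES (:id, :email, :company)",
            leads,
        )


conn = sqlite3.connect(":memory:")
page = [
    {"id": "a1", "email": "ana@example.com", "company": "Acme"},
    {"id": "b2", "email": "bo@example.com", "company": "Birch"},
]
upsert_leads(conn, page)
upsert_leads(conn, page)  # simulate re-processing the same page after a restart
row_count = conn.execute("SELECT COUNT(*) FROM leads").fetchone()[0]
print(row_count)  # 2
```

Most CRMs offer an equivalent, usually an upsert keyed on an external ID field, so the same idea carries over even though the call itself will differ.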

Code Example: Node.js Pagination Pattern

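Here's a Node.js sketch using async/await, exponential backoff, and cursor tracking. The endpoint URL and response shape are placeholders, and storeCheckpoint stands in for whatever persistence you use. You can adapt this pattern to Python, Go, or any language.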

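<code>const axios = require('axios');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Placeholder: persist the cursor wherever suits your stack (file, database, etc.).
async function storeCheckpoint(cursor) {}

async function fetchAllLeads(apiKey, filters, options = {}) {
  const { maxPages = Infinity, initialCursor = null, maxAttempts = 5 } = options;
  let cursor = initialCursor; // null requests the first page
  let pageCount = 0;
  const allLeads = [];
  const seenIds = new Set();

  while (pageCount < maxPages) {
    let attempts = 0;
    let succeeded = false;
    while (attempts < maxAttempts) {
      try {
        const params = { limit: 100, ...filters };
        if (cursor) params.cursor = cursor; // omit the cursor on the first page
        const response = await axios.get('https://api.example.com/v1/leads', {
          params,
          headers: { Authorization: `Bearer ${apiKey}` },
        });
        // Deduplicate by lead ID
        for (const lead of response.data.data) {
          if (!seenIds.has(lead.id)) {
            seenIds.add(lead.id);
            allLeads.push(lead);
          }
        }
        // Treat a missing next_cursor the same as null: no more pages
        cursor = response.data.pagination?.next_cursor ?? null;
        succeeded = true;
        break; // success, exit retry loop
      } catch (err) {
        if (!axios.isAxiosError(err)) throw err; // bug in our own code, not a request failure
        const status = err.response?.status;
        if (status === 429) {
          // Retry-After is expressed in seconds; setTimeout expects milliseconds
          const retryAfter = Number.parseInt(err.response.headers['retry-after'], 10);
          const wait = Number.isNaN(retryAfter) ? 2 ** attempts * 1000 : retryAfter * 1000;
          await sleep(wait + Math.random() * 1000);
        } else if (status === undefined || status >= 500) {
          // No response (network error or timeout) or a server error: back off and retry
          await sleep(2 ** attempts * 1000 + Math.random() * 1000);
        } else {
          throw err; // permanent error (400, 401, 403, 404, ...)
        }
        attempts++;
      }
    }
    if (!succeeded) {
      console.error('Failed after retries, last cursor:', cursor);
      break;
    }
    pageCount++;
    await storeCheckpoint(cursor); // checkpoint = next page to fetch
    if (cursor === null) break; // last page reached
  }
  return allLeads;
}</code>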

This pattern respects rate limits, deduplicates, and persists checkpoints, which covers the main requirements for pulling large B2B lead lists safely.

Performance Optimization Tips

Fetching leads faster isn't just about raw speed—it's about using credits efficiently and avoiding unnecessary load.

  • Tune batch size per query complexity. A simple filter (e.g., title = "VP Sales") can handle larger batches (500) without timeout. A complex query with multiple filters and sorting may need smaller batches (100).
  • Parallelize independent requests cautiously. If you're pulling leads for multiple different filters, you can run them in parallel (one per filter). But respect the overall rate limit per API key. Use a semaphore to cap concurrency.
  • Prefer field filtering to reduce payload. If you only need email and company name, specify that in the API's fields parameter. Smaller payloads mean faster responses and less processing.
  • Cache stable reference data. Company industry codes, location hierarchies, or job title normalizations rarely change. Cache them locally so you don't re-fetch the same metadata on every pagination loop.
  • Stream results to disk. If you're pulling 100,000 leads, don't accumulate everything in memory. Write each page to a file or database as you go.
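The semaphore tip from the list above can be sketched with asyncio. This is a minimal illustration, not a full client: `fake_pull` stands in for "run a complete pagination loop for one filter", and the segment names and the concurrency cap of 2 are arbitrary.

```python
import asyncio


async def run_with_limit(jobs, max_concurrency=3):
    """Run coroutine factories concurrently, never more than `max_concurrency` at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(job):
        async with semaphore:  # blocks here while the cap is reached
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


# Counters used only to show that the cap holds.
stats = {"active": 0, "peak": 0}


async def fake_pull(segment):
    """Stand-in for pulling every page of leads for one filter."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # simulated network time
    stats["active"] -= 1
    return f"{segment}: done"


segments = ["saas", "fintech", "health", "retail", "logistics"]
results = asyncio.run(
    run_with_limit([lambda s=s: fake_pull(s) for s in segments], max_concurrency=2)
)
print(results)
print("peak concurrency:", stats["peak"])
```

The cap limits how many loops run at once, not how fast each one sends requests, so each loop still needs its own throttling against the shared per-key rate limit.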

Common Pagination Pitfalls to Avoid

Even experienced developers make these mistakes. Here's a checklist of what to watch for:

  • Ignoring rate limit headers. You keep sending requests after the first 429 and end up throttled or blocked. Always check your remaining request quota and slow down proactively.
  • Using offset on dynamic datasets. Leads are added or removed between page 1 and page 50. You end up missing some or seeing duplicates.
  • Missing cursor in error recovery. You store the page number but not the cursor. On restart, you try page X, but the dataset has shifted. Use the API's cursor, not a page index.
  • Processing duplicates without deduplication. If your loop restarts from an earlier page, you re-process the same leads. Use a dedup set or upsert operation.
  • Over-fetching unused fields. Every extra field increases response size and parsing time. Request only what you need.
  • Assuming all rows are returned. Some APIs have a hard limit on total results (e.g., 10,000 rows max). Check the documentation—if you need more, you may need to refine your filters.

Integrating Paginated Lead Data Into Your Stack

Once you have a reliable pagination loop, the next step is feeding those leads into your CRM, marketing automation platform, or custom outreach tool. The same principles apply: deduplication, idempotency, and batch processing.

For teams building recurring list workflows—like white-label lead search workflows—pagination is the foundation. You'll reuse the same cursor-based loop to pull new leads weekly, resuming from the last checkpoint to avoid re-fetching stale data.

If you're also enriching leads with additional data (phone numbers, LinkedIn URLs, technographics), consider using a message queue or a pipeline pattern. That way, pagination becomes one stage, and enrichment another. For a deeper dive, see the guide on data coverage and accuracy validation; it'll help you evaluate the quality of the leads you're paginating through.
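A two-stage pipeline can be sketched with the standard library alone. In the example below, a bounded in-process queue plays the role of the message queue, the pages are hard-coded in place of a real pagination loop, and the "enrichment" simply derives a domain from the email address. All of that is illustrative; a production setup would more likely use a dedicated queueing service.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def paginate(pages, out_queue):
    """Stage 1: push each fetched page onto the queue, then signal completion."""
    for page in pages:
        out_queue.put(page)
    out_queue.put(SENTINEL)


def enrich(in_queue, results):
    """Stage 2: consume pages and attach extra fields to each lead."""
    while True:
        page = in_queue.get()
        if page is SENTINEL:
            break
        for lead in page:
            # Placeholder enrichment; a real stage would call an enrichment service.
            results.append({**lead, "domain": lead["email"].split("@")[1]})


pages = [
    [{"id": 1, "email": "ana@acme.example"}],
    [{"id": 2, "email": "bo@birch.example"}],
]
work = queue.Queue(maxsize=10)  # bounded, so a slow enricher applies backpressure
enriched = []
consumer = threading.Thread(target=enrich, args=(work, enriched))
consumer.start()
paginate(pages, work)
consumer.join()
print(enriched)
```

Keeping the stages separate means a slow or failing enrichment call never blocks or corrupts the pagination checkpoint, and each stage can be retried on its own.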

Summary and Next Steps

Pagination is the unsung hero of reliable B2B lead extraction. Use cursor-based pagination unless you're absolutely sure your dataset is tiny and static. Implement exponential backoff with jitter, store checkpoints for resume, and always deduplicate your results.

Ready to put this into practice? Our B2B Leads API supports cursor-based pagination with flexible batch sizes, making it easy to build robust integrations. Start pulling your first list with confidence.

And if you want to go further, read about building a white-label lead search workflow or validating your data coverage before purchase—both will help you get the most out of your lead data pipeline.

Build Your First Outbound List to validate the segment before you commit to full outreach.
