Polling with Pagination and Offsets

Overview

Polling with Pagination and Offsets is a fundamental data retrieval strategy employed in UnifyApps Data Pipelines when working with API-based data sources that organize results into discrete pages. This approach enables systematic processing of large datasets that cannot be returned in a single API response.

What is Polling with Pagination and Offsets?

Pagination is a technique where API results are divided into sequential "pages" of data, with each page containing a limited number of records. Offset-based pagination uses numeric position indicators to navigate through these pages of data. When combined with polling, this creates a reliable mechanism for systematically processing large datasets.

Key Concepts

Pagination Parameters

Most offset-paginated APIs use two primary parameters:

  • Limit: The number of records to return per page (often called "page size" or "count")

  • Offset: The position in the dataset where retrieval should begin (may be called "start" or "skip")
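As a minimal sketch of how these two parameters usually appear in a request, the helper below appends them to a URL as query parameters. The endpoint `https://api.example.com/customers` and the parameter names `limit` and `offset` are assumptions for illustration; a real API may name them `count`, `start`, or `skip`.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_page_url(base_url: str, limit: int, offset: int) -> str:
    """Return base_url with limit/offset query parameters appended."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if offset < 0:
        raise ValueError("offset must not be negative")
    parts = urlsplit(base_url)
    query = urlencode({"limit": limit, "offset": offset})
    # Preserve any query string already present on the base URL.
    if parts.query:
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# Hypothetical endpoint, used only to show the resulting URL shape.
print(build_page_url("https://api.example.com/customers", limit=5, offset=10))
# https://api.example.com/customers?limit=5&offset=10
```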

Offset Calculation

With page numbers starting at 1, the offset for a given page is typically calculated as:

offset = (page_number - 1) * limit
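The formula can be checked with a few lines of code. This sketch assumes page numbers start at 1; the function name `offset_for_page` is illustrative only.

```python
def offset_for_page(page_number: int, limit: int) -> int:
    """Offset of the first record on a 1-based page."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    return (page_number - 1) * limit


# With limit=5, pages 1 through 4 start at offsets 0, 5, 10, and 15.
for page in range(1, 5):
    print(page, offset_for_page(page, limit=5))
```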

Implementation Process

  1. Request the first page with an initial offset of 0

  2. Process the returned records

  3. Increment the offset by the page size

  4. Request the next page

  5. Continue until receiving fewer records than requested or an empty result set
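The steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the UnifyApps implementation: `fetch_page` is a hypothetical callable standing in for the real API request, and `fake_fetch` serves records from an in-memory list so the example runs on its own.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, int]
FetchPage = Callable[[int, int], List[Record]]  # (limit, offset) -> records


def iter_records(fetch_page: FetchPage, limit: int) -> Iterator[Record]:
    """Yield every record, requesting one page at a time."""
    offset = 0  # first page
    while True:
        page = fetch_page(limit, offset)
        yield from page  # process the returned records
        if len(page) < limit:  # short or empty page: end of dataset
            break
        offset += limit  # advance by the page size


# Stand-in data source with 17 records (hypothetical).
DATA: List[Record] = [{"id": i} for i in range(1, 18)]


def fake_fetch(limit: int, offset: int) -> List[Record]:
    return DATA[offset:offset + limit]


records = list(iter_records(fake_fetch, limit=5))
print(len(records))  # 17
```

Writing the loop as a generator keeps only one page in memory at a time, which matters when the dataset is too large to hold at once.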

Example Data Retrieval Process

Initial Request (Page 1)

Parameters: limit=5, offset=0

| Record # | Customer ID | Customer Name | Email | Created Date |
|---|---|---|---|---|
| 1 | CUST-001 | Acme Corporation | contact@acmecorp.com | 2025-01-15 |
| 2 | CUST-002 | TechSolutions Inc | info@techsolutions.com | 2025-01-16 |
| 3 | CUST-003 | Global Enterprises | sales@globalent.com | 2025-01-17 |
| 4 | CUST-004 | Pacific Distributors | orders@pacificdist.com | 2025-01-18 |
| 5 | CUST-005 | Sunrise Industries | info@sunriseind.com | 2025-01-19 |

Calculation for next page: offset = 0 + 5 = 5

Second Request (Page 2)

Parameters: limit=5, offset=5

| Record # | Customer ID | Customer Name | Email | Created Date |
|---|---|---|---|---|
| 6 | CUST-006 | Quantum Innovations | support@quantuminv.com | 2025-01-20 |
| 7 | CUST-007 | Highland Services | info@highlandserv.com | 2025-01-21 |
| 8 | CUST-008 | Coastal Solutions | help@coastalsol.com | 2025-01-22 |
| 9 | CUST-009 | Metro Logistics | sales@metrolog.com | 2025-01-23 |
| 10 | CUST-010 | Atlas Technologies | contact@atlastech.com | 2025-01-24 |

Calculation for next page: offset = 5 + 5 = 10

Third Request (Page 3)

Parameters: limit=5, offset=10

| Record # | Customer ID | Customer Name | Email | Created Date |
|---|---|---|---|---|
| 11 | CUST-011 | Pinnacle Systems | info@pinnaclesys.com | 2025-01-25 |
| 12 | CUST-012 | Horizon Enterprises | sales@horizonent.com | 2025-01-26 |
| 13 | CUST-013 | Silverline Partners | contact@silverlinepr.com | 2025-01-27 |
| 14 | CUST-014 | Northern Solutions | support@northernsol.com | 2025-01-28 |
| 15 | CUST-015 | Evergreen Industries | orders@evergreenind.com | 2025-01-29 |

Calculation for next page: offset = 10 + 5 = 15

Final Request (Page 4)

Parameters: limit=5, offset=15

| Record # | Customer ID | Customer Name | Email | Created Date |
|---|---|---|---|---|
| 16 | CUST-016 | Sapphire Analytics | info@sapphireana.com | 2025-01-30 |
| 17 | CUST-017 | Redwood Partners | contact@redwoodp.com | 2025-01-31 |

Result: Only 2 records were returned (fewer than the requested limit of 5), indicating that the end of the dataset has been reached.
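The number of requests in a walkthrough like this follows from the stopping rule. The sketch below assumes polling stops at the first short or empty page; `requests_needed` is an illustrative name. Note that when the total is an exact multiple of the limit, one extra request is needed to observe the empty page.

```python
def requests_needed(total_records: int, limit: int) -> int:
    """Requests made when polling stops at the first short or empty page."""
    if total_records < 0 or limit <= 0:
        raise ValueError("total_records must be >= 0 and limit must be > 0")
    return total_records // limit + 1


print(requests_needed(17, 5))  # 4: three full pages, then a page of 2
print(requests_needed(15, 5))  # 4: three full pages, then an empty page
```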

Challenges and Considerations

Performance Degradation

Offset-based pagination can experience performance issues with very large datasets, because the database typically still has to read and discard every record before the offset point. For example, retrieving records 10,001 through 10,100 (limit=100, offset=10000) requires the database to step past the first 10,000 records before returning any results.
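A rough cost model makes the growth visible. The sketch below assumes, as a simplification, that each request touches `offset + returned` rows; real databases and indexes vary, so treat the numbers as an illustration of the trend rather than a measurement.

```python
def rows_scanned(total_records: int, limit: int) -> int:
    """Rows touched across a full poll if each request scans offset + returned rows."""
    scanned = 0
    offset = 0
    while True:
        returned = max(0, min(limit, total_records - offset))
        scanned += offset + returned  # skipped rows plus rows actually returned
        if returned < limit:  # short or empty page ends the poll
            break
        offset += limit
    return scanned


print(rows_scanned(17, 5))  # 47 rows touched to deliver 17 records
# For large datasets the total grows roughly with the square of the record count.
print(rows_scanned(100_000, 100))
```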

Consistency Issues

If data is added or removed during the polling process, offset-based pagination can lead to:

  • Duplicate records (if items are added before the current position, later records shift forward and one is returned again on the next page)

  • Missed records (if items are removed before the current position, later records shift back and one is skipped)
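The effect of concurrent changes can be reproduced with a small simulation. In this sketch an in-memory list stands in for the data source, and a hypothetical callback mutates it after the first page has been read. An insertion ahead of the current position shifts later records forward, so one is read twice; a deletion shifts them back, so one is skipped.

```python
from typing import Callable, List, Optional


def poll_all(source: List[int], limit: int,
             change_after_first_page: Optional[Callable[[List[int]], object]] = None) -> List[int]:
    """Read source page by page, optionally mutating it after the first page."""
    seen: List[int] = []
    offset = 0
    while True:
        page = source[offset:offset + limit]
        seen.extend(page)
        if offset == 0 and change_after_first_page is not None:
            change_after_first_page(source)  # simulates a concurrent write
        if len(page) < limit:
            break
        offset += limit
    return seen


# A record inserted at the front shifts everything forward: 5 is read twice.
with_insert = poll_all(list(range(1, 11)), 5, lambda rows: rows.insert(0, 0))
print(with_insert)  # [1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10]

# A record deleted from the front shifts everything back: 6 is never read.
with_delete = poll_all(list(range(1, 11)), 5, lambda rows: rows.pop(0))
print(with_delete)  # [1, 2, 3, 4, 5, 7, 8, 9, 10]
```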

API Limitations

Many APIs impose:

  • Maximum offset values

  • Maximum page size values

  • Rate limits on the number of requests
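One way to respect such limits is to check each request against them before sending it. The sketch below is illustrative only: the `ApiLimits` class and all of its values are hypothetical, and real limits come from the documentation of the API being polled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ApiLimits:
    """Hypothetical limits; real values come from the API's documentation."""
    max_page_size: int
    max_offset: int
    requests_per_minute: int


def plan_request(requested_limit: int, offset: int,
                 limits: ApiLimits) -> Optional[Tuple[int, int]]:
    """Return the (limit, offset) to send, or None if the offset is out of range."""
    if offset > limits.max_offset:
        return None
    return min(requested_limit, limits.max_page_size), offset


def min_seconds_between_requests(limits: ApiLimits) -> float:
    """Spacing that keeps a steady stream of requests within the rate limit."""
    return 60.0 / limits.requests_per_minute


limits = ApiLimits(max_page_size=100, max_offset=10_000, requests_per_minute=120)
print(plan_request(500, 0, limits))          # (100, 0)
print(plan_request(100, 10_100, limits))     # None
print(min_seconds_between_requests(limits))  # 0.5
```

If the page size is clamped, the polling loop should advance the offset by the clamped value rather than the originally requested one, otherwise records are skipped.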

Best Practices

  • Use page sizes that balance the number of API calls against the cost of processing each response

  • Implement retry logic with exponential backoff for failed requests

  • Store pagination state to resume interrupted processes

  • Track progress metrics to identify performance issues
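Two of these practices, retrying with exponential backoff and storing pagination state, can be sketched as follows. All names here (`with_retries`, `save_offset`, `load_offset`) and the JSON checkpoint format are assumptions for illustration, not part of any UnifyApps API. The `sleep` parameter is injectable so the demonstration runs without actually waiting.

```python
import json
import time
from pathlib import Path
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")


def save_offset(path: Path, offset: int) -> None:
    """Persist the next offset so an interrupted poll can resume."""
    path.write_text(json.dumps({"offset": offset}))


def load_offset(path: Path) -> int:
    """Return the saved offset, or 0 when no checkpoint exists."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["offset"]


# Demonstration: a call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays: List[float] = []
result = with_retries(flaky, base_delay=0.5, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```

In a polling loop, `save_offset` would typically be called after each page has been processed successfully, so that a restart repeats at most one page.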