Fetching Data with Polling Triggers and Loops

Written by Product Team
Updated today

In Pipefy Integrations, there are two primary ways to receive data from external systems:

  1. Webhooks (Push): The external system sends data to Pipefy exactly when an event occurs.

  2. Polling (Pull): Pipefy actively "asks" the external system if there is any new data at regular intervals.

While webhooks are ideal for real-time events, many external APIs do not support them. In these cases, you must use Polling Triggers.

This article explains how polling works, how to prevent duplicate data, and how to process lists of records.


How Polling Triggers Work

A Polling Trigger acts as an automated schedule. By default, the integration engine "wakes up" every 5 minutes, connects to the external API, and asks: "Are there any new records?"

If there are new records, the external system typically responds with an Array (a list of data items). The integration flow will then process this list. If there is no new data, the flow simply goes back to sleep until the next 5-minute check.


Deduplication Strategies: Avoiding Duplicate Data

Because a polling trigger asks the external API for data every 5 minutes, you must ensure that it only fetches new data. Otherwise, your flow will process the same records over and over again.

To achieve this, you need a Deduplication Strategy. You can manage this state using the native Storage utility. The two most common strategies are:

1. Time-Based Strategy

This strategy relies on timestamps.

  • You save the timestamp of the last successful run in the Storage.

  • On the next 5-minute interval, your API request includes a filter like: ?created_after=[timestamp].

  • The external API only returns items created after that exact moment.

  • Finally, you update the Storage with the new current timestamp.
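
The four steps above can be sketched in Python to make the order of operations concrete. This is an illustration of the logic only, not Pipefy's implementation: the storage dictionary stands in for the Storage utility, and RECORDS, fetch_created_after, and poll_once are hypothetical names invented for the example.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the Storage utility: a plain dictionary.
storage = {}

# Hypothetical records held by the external system.
RECORDS = [
    {"id": 1, "created_at": "2024-01-01T10:00:00+00:00"},
    {"id": 2, "created_at": "2024-01-01T10:07:00+00:00"},
]


def fetch_created_after(timestamp):
    """Simulate a request like GET /records?created_after=[timestamp]."""
    if timestamp is None:
        # Nothing saved yet: the first run sees every record.
        return list(RECORDS)
    cutoff = datetime.fromisoformat(timestamp)
    return [
        record
        for record in RECORDS
        if datetime.fromisoformat(record["created_at"]) > cutoff
    ]


def poll_once(now):
    """One polling run: read the saved timestamp, fetch, save the new one."""
    last_run = storage.get("last_run")
    new_records = fetch_created_after(last_run)
    storage["last_run"] = now.isoformat()
    return new_records


# Example run: pretend the previous poll happened at 10:05 UTC.
storage["last_run"] = "2024-01-01T10:05:00+00:00"
batch = poll_once(datetime(2024, 1, 1, 10, 10, tzinfo=timezone.utc))
print([record["id"] for record in batch])
```

Timestamps are kept in UTC and in ISO 8601 format here so that the comparison does not depend on the server's local time zone; that choice is an assumption of the sketch, and the external API's own timestamp format takes precedence.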

2. Last ID Strategy

This strategy relies on sequential IDs (e.g., Ticket #100, Ticket #101).

  • You save the highest ID you processed during the last run in the Storage.

  • On the next run, your API request includes a filter like: ?since_id=[Last ID].

  • The API returns only records with an ID greater than the one you saved.

  • You then update the Storage with the newest highest ID.
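
A minimal Python sketch of the same idea, again as an illustration rather than Pipefy's actual implementation. The storage dictionary stands in for the Storage utility, and TICKETS, fetch_since_id, and poll_once are hypothetical names used only in this example.

```python
# Hypothetical stand-in for the Storage utility: a plain dictionary.
storage = {"last_id": 100}

# Hypothetical tickets held by the external system.
TICKETS = [{"id": 100}, {"id": 101}, {"id": 102}]


def fetch_since_id(last_id):
    """Simulate a request like GET /tickets?since_id=[Last ID]."""
    return [ticket for ticket in TICKETS if ticket["id"] > last_id]


def poll_once():
    """One polling run: fetch tickets newer than the saved ID, then save the new highest ID."""
    last_id = storage.get("last_id", 0)
    new_tickets = fetch_since_id(last_id)
    if new_tickets:
        # Only move the marker forward when something new actually arrived.
        storage["last_id"] = max(ticket["id"] for ticket in new_tickets)
    return new_tickets
```

Calling poll_once() twice in a row shows the effect: the first call returns tickets 101 and 102 and stores 102, and the second call returns an empty list because nothing has an ID above 102.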


Flow Control: Processing Lists with "Loop on Items"

When a polling trigger fetches new data, the API usually returns an Array (multiple records grouped together). However, integration actions (like "Create a Card in Pipefy" or "Send an Email") usually expect a single item at a time.

To solve this, you must use the Loop on Items piece.

This flow control tool takes the array of data from your polling step and iterates over it. On each pass it exposes a single item from the list, so the actions inside the loop process one record at a time.

How to configure the Loop:

  1. Add the Loop on Items utility right after your polling/HTTP step.

  2. In the "Items" field, map the array variable returned by your API request.

  3. Add the next actions (e.g., "Create Card") inside the loop block.

  4. When mapping data into these nested actions, make sure to select the variables from the Loop on Items step (which represents the current single item), not the original HTTP step.
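
In code terms, the loop behaves roughly like a for loop over the array. The Python sketch below is only an analogy for the configuration steps above: http_step_output, create_card, and loop_on_items are hypothetical names, and the record fields are made up for the example.

```python
# Hypothetical output of the polling/HTTP step: the array lives under "body".
http_step_output = {
    "body": [
        {"id": 101, "title": "Printer offline"},
        {"id": 102, "title": "VPN access"},
    ]
}


def create_card(title):
    """Hypothetical single-item action, standing in for "Create Card"."""
    return {"card_title": title}


def loop_on_items(items):
    """Run the nested action once per item, like the loop block does."""
    created = []
    for item in items:
        # Inside the loop, fields come from the current item,
        # not from the original HTTP step output.
        created.append(create_card(item["title"]))
    return created


cards = loop_on_items(http_step_output["body"])
```

Mapping http_step_output["body"] into the nested action instead of item would hand the action the whole array on every pass, which is the mistake step 4 warns about.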


Common Use Cases

When should you use Polling and Loops instead of standard triggers? Here are the most common scenarios:

  • Legacy Systems and ERPs: Many older databases or on-premise ERPs do not support Webhooks (they cannot push data out). Polling allows Pipefy to securely pull new records from them on a regular schedule.

  • Daily or Hourly Batch Processing: Instead of triggering a flow for every single update, you can set a polling trigger to run once a day (e.g., at 6:00 PM) to fetch an array of all tasks marked as "Done" and create a daily report.

  • Recovering from Unreliable Webhooks: If a third-party app has a history of failing to send webhooks, a polling flow using the "Last ID Strategy" lets Pipefy pick up records that a missed webhook would have skipped, as long as the external API still returns them.


Best Practices for Polling and Looping

To ensure your polling flows are efficient and do not fail unexpectedly, follow these architectural best practices:

1. Always Handle Empty Arrays

If the polling trigger runs but there is no new data, the external API will likely return an empty array (e.g., []). If you pass an empty array directly into a Loop, the flow might result in an error or waste processing power.

  • Recommendation: Add a Branch piece immediately after the HTTP request to check if the array is empty. If it has data, proceed to the Loop. If it is empty, stop the flow.
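
The branch check amounts to a simple guard before the loop. A minimal Python sketch of that logic, where run_flow and process_item are hypothetical names and the return strings exist only to make the two paths visible:

```python
def run_flow(api_response, process_item):
    """Stop early on an empty array; otherwise loop over the items."""
    if not api_response:
        # Branch: nothing new, so the loop is never entered.
        return "stopped: no new data"
    for item in api_response:
        process_item(item)
    return f"processed {len(api_response)} item(s)"
```

The same guard also covers a response of None, which some APIs may send instead of an empty array; whether a given API does so is something to confirm against its documentation.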

2. Update Storage After Processing

If you are using a Deduplication Strategy (like saving the Last ID in the Storage piece), do not update the Storage before the loop finishes.

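  • Recommendation: Place the "Put Storage" action at the very end of your flow (outside and after the loop). If the flow fails halfway through processing the list, the saved value stays unchanged, so the next run fetches the same records again and nothing is lost. Keep in mind that items processed before the failure are fetched again as well, so the actions inside the loop should tolerate seeing a record twice.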
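
The effect of this ordering can be shown with a small Python sketch. It is an illustration under assumptions, not Pipefy's implementation: the storage dictionary stands in for the Storage piece, and run_flow and process_item are hypothetical names.

```python
# Hypothetical stand-in for the Storage piece.
storage = {"last_id": 100}


def run_flow(records, process_item):
    """Process every record first; update Storage only after the loop."""
    for record in records:
        # If this raises, the function exits before Storage is touched.
        process_item(record)
    if records:
        # Reached only when every item was processed successfully.
        storage["last_id"] = max(record["id"] for record in records)
```

If process_item raises an exception on the second of three records, storage["last_id"] is still 100 afterwards, so the next run requests all three records again.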

3. Respect API Rate Limits

When a Loop processes an array of 50 items, it executes the nested actions (like sending an HTTP request for each item) in rapid succession. Some external APIs enforce rate limits and will reject requests that arrive too quickly, typically with the HTTP status 429 Too Many Requests.

  • Recommendation: If the destination API has strict rate limits, you can add a small Delay piece (e.g., 1 or 2 seconds) inside the loop to pace the requests.
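
As a rough Python analogy for a Delay placed inside the loop, the sketch below pauses between consecutive requests. The names process_with_delay and send_request are hypothetical, and the one-second default is only an example value; the right pause depends on the destination API's documented limits.

```python
import time


def process_with_delay(items, send_request, delay_seconds=1.0, sleep=time.sleep):
    """Send one request per item, pausing between consecutive requests."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first request, only between requests.
            sleep(delay_seconds)
        results.append(send_request(item))
    return results
```

The sleep parameter is injectable so the pacing can be checked without actually waiting. With 50 items and a one-second delay, the loop takes at least 49 seconds, which is worth weighing against the polling interval.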
