A continuous-pool alternative to cr_parallel()'s synchronous batches. The
streaming engine keeps requests in flight at all times (via async promises,
httr2::req_perform_promise()): the moment one request finishes, its handler
runs and the next request is pulled from the queue to refill the slot. Under
heterogeneous response latency this avoids the "wait for the slowest in the
batch" stall and improves throughput.
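To see where the throughput gain comes from, the base-R sketch below simulates both schedules on made-up latencies. batch_makespan() and pool_makespan() are hypothetical helpers, not part of the package. They perform no HTTP requests and only compute when the last request would finish.

```r
# Hypothetical helpers: they simulate scheduling only and perform no HTTP.

# Synchronous batches of size k: each batch waits for its slowest request.
batch_makespan <- function(latency, k) {
  batches <- split(latency, ceiling(seq_along(latency) / k))
  sum(vapply(batches, max, numeric(1)))
}

# Continuous pool of k slots: as soon as a slot frees up, the next queued
# request starts in it.
pool_makespan <- function(latency, k) {
  slot_free_at <- numeric(k)           # time at which each slot becomes free
  for (lat in latency) {
    i <- which.min(slot_free_at)       # the earliest-free slot takes the next request
    slot_free_at[i] <- slot_free_at[i] + lat
  }
  max(slot_free_at)
}

latency <- c(5, 1, 1, 1, 5, 1, 1, 1)   # seconds: a mix of slow and fast responses
batch_makespan(latency, k = 4)         # 10
pool_makespan(latency, k = 4)          # 6
```

With one slow response in each group of four, every batch costs 5 s, while the pool keeps refilling freed slots with fast requests and finishes at 6 s.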
Arguments
- crawler
A Crawler.
- concurrency
Number of requests to keep in flight (the fixed target; when adaptive = TRUE, the target starts at min instead).
- adaptive
If TRUE, adapt the in-flight target within [min, max].
- min, max
Bounds for the adaptive target. max defaults to concurrency.
Details
With adaptive = TRUE, the in-flight target is adjusted at run time within [min, max] using AIMD (additive increase, multiplicative decrease) in response to back-pressure, as in cr_autoscale().
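One AIMD step can be sketched as follows. aimd_step() is a hypothetical helper; the increase of 1 and the halving on back-pressure are assumed values, and this page does not say which signals count as back-pressure (HTTP 429 or 503 responses would be a plausible example). The lower and upper arguments play the role of min and max.

```r
# Hypothetical helper: one AIMD update of the in-flight target.
# `increase` and `decrease` are assumed values, not the package's.
aimd_step <- function(target, backpressure, lower, upper,
                      increase = 1, decrease = 0.5) {
  proposed <- if (backpressure) {
    floor(target * decrease)   # multiplicative decrease on back-pressure
  } else {
    target + increase          # additive increase otherwise
  }
  min(max(proposed, lower), upper)  # clamp to [lower, upper], i.e. [min, max]
}

# Adaptive mode starts at `min` (here 2); `max` (here 8) would default to `concurrency`.
target <- 2
targets <- numeric(0)
for (bp in c(FALSE, FALSE, FALSE, TRUE, FALSE)) {
  target <- aimd_step(target, bp, lower = 2, upper = 8)
  targets <- c(targets, target)
}
targets  # 3 4 5 2 3
```

The target climbs by one per quiet step, is halved (and floored) on back-pressure, and never leaves [min, max].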
Launches are paced per host: a host is not requested again until the delay (or its robots.txt Crawl-delay) has elapsed, while requests to different hosts proceed in parallel. With delay = 0 and no Crawl-delay, pacing has no effect.
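Per-host pacing can be pictured with a simulated clock. launch_times() is a hypothetical helper; it assumes the gap for a host is the larger of delay and that host's Crawl-delay, measured from the previous launch to the same host, and it ignores the in-flight limit so that only the pacing rule is visible.

```r
# Hypothetical helper: earliest launch time of each queued request on a
# simulated clock, pacing launches per host. Assumes the gap for a host is
# max(delay, its Crawl-delay), measured from that host's previous launch,
# and ignores the in-flight limit.
launch_times <- function(hosts, delay, crawl_delay = numeric(0)) {
  next_ok <- list()                    # host -> earliest time it may be hit again
  out <- numeric(length(hosts))
  for (i in seq_along(hosts)) {
    h <- hosts[[i]]
    cd <- if (h %in% names(crawl_delay)) crawl_delay[[h]] else 0
    start <- if (is.null(next_ok[[h]])) 0 else next_ok[[h]]
    out[i] <- start
    next_ok[[h]] <- start + max(delay, cd)
  }
  out
}

hosts <- c("a.example", "a.example", "b.example", "a.example", "b.example")
launch_times(hosts, delay = 1, crawl_delay = c(b.example = 3))  # 0 1 0 2 3
launch_times(hosts, delay = 0)                                  # 0 0 0 0 0
```

Requests to b.example start at time 0 alongside a.example, while repeat hits on each host are spaced out; with delay = 0 and no Crawl-delay, every launch time is 0, matching the no-op case.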
Requires the promises and later packages, and the HTTP backend.