A deduplicating, FIFO-with-priority request queue, the in-memory engine
behind every crawler(). Requests are keyed by a normalised unique_key
(see cr_normalize_url()) so the same URL is never enqueued twice. The
queue also tracks which requests have been handled, which makes a crawl
resumable: when given a path, its state (pending requests, seen keys,
handled count) can be saved to and restored from disk; see cr_persist().
This class is exported mainly for advanced use and introspection; most users
interact with it indirectly through the cr_* verbs.
Methods
RequestQueue$new()
Create a new, empty request queue.
Arguments
path
Optional path to an .rds file backing the queue state.
RequestQueue$add()
Add a request to the queue.
Usage
RequestQueue$add(
url,
label = NULL,
depth = 0L,
user_data = list(),
method = "GET",
force_unique = FALSE
)
Arguments
url
Character scalar URL.
label
Optional handler label routing this request.
depth
Integer crawl depth (distance from a start URL).
user_data
Optional named list carried with the request.
method
HTTP method, defaults to "GET".
force_unique
If TRUE, skip deduplication and enqueue the request even if its
unique_key has already been seen.
Returns
Invisibly, TRUE if added, FALSE if a duplicate.
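The deduplication contract described above can be illustrated with a small base-R sketch. It is not the package's implementation: new_queue(), queue_add() and normalize_key() are hypothetical stand-ins, and the normalisation shown (dropping the fragment and trailing slashes) is a simplifying assumption rather than what cr_normalize_url() actually does.

```r
# Hypothetical stand-in for URL normalisation (assumption: the real
# cr_normalize_url() is likely more thorough than this).
normalize_key <- function(url) {
  key <- sub("#.*$", "", url)   # drop any fragment
  sub("/+$", "", key)           # drop trailing slashes
}

# A minimal queue: an environment holding pending requests and seen keys.
new_queue <- function() {
  q <- new.env()
  q$pending <- list()
  q$seen <- character()
  q
}

# Mirrors the documented return value: invisibly TRUE if added,
# FALSE if the key was already seen (unless force_unique = TRUE).
queue_add <- function(q, url, force_unique = FALSE) {
  key <- normalize_key(url)
  if (!force_unique && key %in% q$seen) {
    return(invisible(FALSE))
  }
  q$seen <- union(q$seen, key)
  q$pending[[length(q$pending) + 1L]] <- list(url = url, unique_key = key)
  invisible(TRUE)
}

q <- new_queue()
first  <- queue_add(q, "https://example.com/a")        # new key, added
dup    <- queue_add(q, "https://example.com/a/#top")   # same key, rejected
forced <- queue_add(q, "https://example.com/a", force_unique = TRUE)
```

Because the result is returned invisibly, it only prints when assigned or wrapped in parentheses, which keeps enqueue calls quiet in interactive use.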
RequestQueue$pop()
Pop the next request from the front of the queue.
Returns
A request list, or NULL when the queue is empty.
RequestQueue$reschedule()
Re-queue a request for another attempt, incrementing its
retry counter.
Usage
RequestQueue$reschedule(request)
Arguments
request
A request list previously obtained from pop().
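The pop, reschedule and mark-handled cycle can be sketched with plain lists. This is an illustration under stated assumptions, not the class's code: the retries field name, the max_retries cap and the handle() function are all hypothetical, and the documentation does not say whether the queue itself enforces a retry limit.

```r
# Hypothetical pending list with one request; field names are illustrative.
pending <- list(list(url = "https://example.com/flaky", retries = 0L))
max_retries <- 3L   # assumed cap, chosen for this example only
handled <- 0L
attempts <- 0L

# Stand-in handler: fails on the first two attempts, then succeeds.
handle <- function(request) {
  if (request$retries < 2L) stop("transient failure")
  TRUE
}

while (length(pending) > 0L) {
  # Take the request at the front of the queue.
  request <- pending[[1L]]
  pending <- pending[-1L]
  attempts <- attempts + 1L

  ok <- tryCatch(handle(request), error = function(e) FALSE)
  if (ok) {
    # Analogous to mark_handled(): count the success.
    handled <- handled + 1L
  } else if (request$retries < max_retries) {
    # Analogous to reschedule(): bump the retry counter and re-queue.
    request$retries <- request$retries + 1L
    pending[[length(pending) + 1L]] <- request
  }
}
```

In this sketch a failed request goes to the back of the pending list; where the real queue places a rescheduled request is not stated in this page.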
RequestQueue$mark_handled()
Mark a request as successfully handled.
Usage
RequestQueue$mark_handled()
RequestQueue$pending_count()
Number of requests waiting to be processed.
Usage
RequestQueue$pending_count()
RequestQueue$handled()
Number of requests handled so far.
RequestQueue$is_empty()
Whether the queue has no pending requests.
RequestQueue$set_path()
Set (or clear) the persistence path.
Usage
RequestQueue$set_path(path)
Arguments
path
Path to an .rds file, or NULL.
RequestQueue$has_saved_state()
Whether a persisted state file exists at the queue's path.
Usage
RequestQueue$has_saved_state()
RequestQueue$save()
Persist the queue state to its path (a no-op without one).
RequestQueue$restore()
Replace the in-memory state with the one persisted at path.
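The save and restore round trip can be sketched with base R's saveRDS() and readRDS(). The state layout below is an assumption based on the fields named in the description (pending requests, seen keys, handled count); the actual on-disk structure of the queue is not documented here.

```r
# Hypothetical state layout; the real queue's internal fields may differ.
state <- list(
  pending = list(
    list(url = "https://example.com/b", unique_key = "https://example.com/b")
  ),
  seen = c("https://example.com/a", "https://example.com/b"),
  handled = 1L
)

path <- tempfile(fileext = ".rds")

# Like save(): do nothing without a path, otherwise serialise the state.
if (!is.null(path)) saveRDS(state, path)

# Like has_saved_state(): check that a file exists at the path.
has_state <- !is.null(path) && file.exists(path)

# Like restore(): replace the in-memory state with the persisted copy.
restored <- if (has_state) readRDS(path) else state

unlink(path)   # clean up the temporary file
```

Keeping the pending list, the seen keys and the handled count in one object means a single file is enough to resume a crawl without re-enqueuing URLs that were already processed.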
RequestQueue$clone()
The objects of this class are cloneable with this method.
Usage
RequestQueue$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.