Use the headless-browser fetch backend — cr_use

Switches the crawler to render pages in headless Chrome/Chromium via the chromote package. Use it for JavaScript-heavy sites where the plain HTTP backend would see only an empty shell. Handlers work exactly as with cr_use_http() (ctx$page, enqueue_links(), ...) and additionally gain ctx$screenshot().

Usage

cr_use_browser(crawler, wait = 0, wait_selector = NULL)

Arguments

crawler: A Crawler.
wait: Seconds to wait after page load before capturing the DOM (useful for late-rendering content).
wait_selector: Optional CSS selector to wait for before capturing.

Value

The crawler, invisibly.

Details

Requires chromote and a Chrome/Chromium installation. PDF extraction still requires the HTTP backend.
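Because the backend depends on both an R package and an external browser, it can help to check for them before switching. The following is a minimal sketch rather than part of this package: browser_backend_available() is a hypothetical helper name, and it relies only on base R's requireNamespace() and chromote's find_chrome().

```r
# Hypothetical helper (not part of this package): returns TRUE only when
# chromote is installed and it can locate a Chrome/Chromium binary.
browser_backend_available <- function() {
  if (!requireNamespace("chromote", quietly = TRUE)) {
    return(FALSE)
  }
  # find_chrome() returns the path to the browser, or NULL if none is found.
  chrome_path <- unname(chromote::find_chrome())
  !is.null(chrome_path) && isTRUE(nzchar(chrome_path))
}

if (browser_backend_available()) {
  message("chromote and a browser were found; cr_use_browser() should be usable.")
} else {
  message("chromote or Chrome/Chromium is missing; consider the HTTP backend.")
}
```

A check like this could guard the call to cr_use_browser(), falling back to cr_use_http() when it returns FALSE.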

Examples

if (FALSE) { # \dontrun{
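crawler("https://example.com") |>
  # Capture the DOM only once an element matching ".results" is present
  cr_use_browser(wait_selector = ".results") |>
  cr_on_html(\(ctx) {
    # Record the URL of the rendered page
    ctx$push_data(list(url = ctx$request$url))
    # Available only with the browser backend
    ctx$screenshot()
  }) |>
  cr_run()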
} # }