Registers a handler invoked for responses classified as PDF, either by
Content-Type (application/pdf) or by a .pdf URL. The handler context adds
PDF-specific helpers on top of the usual ones.
Arguments
- crawler
A Crawler.
- handler
A function of one argument (the context). See Context.
- label
Optional handler label; NULL registers the default PDF handler.
Details
Requests carrying an explicit label are always routed to the handler
registered for that label (regardless of content kind); label = NULL
registers the default PDF handler.
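
The classification and routing rules above can be sketched in a few lines of base R. This is an illustration only, not the package's implementation: is_pdf_response(), pick_handler(), and the shape of the handlers list are hypothetical names made up for this sketch, and stripping Content-Type parameters and query strings before matching is an assumption.

```r
# Hypothetical helper: classify a response as PDF by Content-Type or URL.
is_pdf_response <- function(content_type, url) {
  # Assumption: drop parameters such as "; charset=binary" before comparing.
  media_type <- tolower(trimws(sub(";.*$", "", content_type)))
  # Assumption: ignore any query string or fragment when checking the extension.
  path <- sub("[?#].*$", "", url)
  identical(media_type, "application/pdf") ||
    grepl("\\.pdf$", path, ignore.case = TRUE)
}

# Hypothetical dispatcher: an explicit label wins over the content kind;
# without a label, the default handler for the content kind is used.
pick_handler <- function(handlers, label, kind) {
  if (!is.null(label)) {
    return(handlers$by_label[[label]])
  }
  handlers$default[[kind]]
}

# Illustrative registry with one labelled handler and two defaults.
handlers <- list(
  by_label = list(invoice = function(ctx) "invoice handler"),
  default  = list(
    pdf  = function(ctx) "default PDF handler",
    html = function(ctx) "default HTML handler"
  )
)

kind <- if (is_pdf_response("application/pdf", "https://example.org/doc")) "pdf" else "html"
labelled   <- pick_handler(handlers, label = "invoice", kind = kind)
unlabelled <- pick_handler(handlers, label = NULL, kind = kind)
```

In this sketch, the request labelled "invoice" reaches the invoice handler even though the response is a PDF, while the unlabelled request falls through to the default PDF handler.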
Context
In addition to the elements documented in cr_on_html(), a PDF handler's
context provides:
- kind
"pdf".
- pdf_text()
Extract text per page (requires the pdftools package), returning a character vector.
- body_raw()
The raw PDF bytes.
- save_body(key, ext)
Persist the PDF to the KeyValueStore.
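
A handler that uses these helpers might look like the sketch below. It runs against a mock context built from a plain list so that it works without the package or pdftools installed. Accessing the helpers with `$`, the key "report", the ext value "pdf", and the names summarise_pdf and mock_ctx are all assumptions made for illustration; the real context may differ.

```r
# A handler is a function of one argument, the context.
summarise_pdf <- function(ctx) {
  pages <- ctx$pdf_text()   # character vector, one element per page
  bytes <- ctx$body_raw()   # the raw PDF bytes
  # Assumption: key and ext are plain strings; the exact format is not documented here.
  ctx$save_body(key = "report", ext = "pdf")
  list(
    kind    = ctx$kind,
    n_pages = length(pages),
    n_bytes = length(bytes)
  )
}

# Mock context standing in for the one the crawler would supply.
saved <- character(0)
mock_ctx <- list(
  kind      = "pdf",
  pdf_text  = function() c("First page text", "Second page text"),
  body_raw  = function() charToRaw("%PDF-1.4 stub"),
  save_body = function(key, ext) {
    # Record what would have been persisted instead of touching a store.
    saved <<- c(saved, paste0(key, ".", ext))
    invisible(NULL)
  }
)

result <- summarise_pdf(mock_ctx)
```

Testing a handler against a mock like this keeps the parsing logic separate from crawling, so it can be checked before the handler is registered on a Crawler.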