Package index
- crawler() - Create a crawler
- cr_options() - Set crawler options
- cr_use_http() - Use the HTTP fetch backend
- cr_use_browser() - Use the headless-browser fetch backend
- cr_parallel() - Enable parallel (concurrent) fetching
- cr_autoscale() - Enable autoscaled parallel fetching
- cr_stream() - Enable the streaming scheduler
- cr_run() - Run a crawl
- cr_collect() - Collect crawl results
- cr_from_sitemap() - Discover URLs from a sitemap
- cr_from_rss() - Discover URLs from an RSS or Atom feed
- cr_on_html() - Register an HTML handler
- cr_on_pdf() - Register a PDF handler
- cr_chunk() - Chunk text for retrieval-augmented generation
- cr_embed() - Attach embeddings to chunks
- cr_export() - Export chunks (and embeddings) for retrieval
- cr_persist() - Persist a crawl to a run directory (and resume it)
- cr_dataset() - Configure the dataset backend
- cr_close() - Release a crawler's resources
- Crawler-class Crawler - Crawler
- RequestQueue - Request queue
- Dataset - Dataset
- cr_store() - Configure the key-value store for binary content
- KeyValueStore - Key-value store
- cr_normalize_url() - Normalise a URL into a canonical form
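
The index only names cr_normalize_url(); it does not say which rules the package applies. As a rough illustration of what "canonical form" commonly means for crawl deduplication, here is a minimal Python sketch. The function `normalize_url` and the `DEFAULT_PORTS` table are hypothetical and are not part of this package. The rules shown (lower-casing scheme and host, dropping default ports and fragments, defaulting an empty path to `/`, sorting query parameters) are assumptions, not the package's documented behaviour.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical table of default ports, used to drop redundant ":80" / ":443".
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return one possible canonical form of an absolute URL (illustrative only)."""
    parts = urlsplit(url.strip())

    # Scheme and host are case-insensitive, so lower-case both.
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # urlsplit strips the brackets from IPv6 literals; put them back.
    if ":" in host:
        host = f"[{host}]"

    # Keep the port only when it differs from the scheme's default.
    # Note: any user:password component is deliberately dropped here.
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"

    # Treat an empty path as the root path.
    path = parts.path or "/"

    # Sort query parameters so that equivalent URLs compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))

    # The fragment is never sent to the server, so discard it.
    return urlunsplit((scheme, netloc, path, query, ""))


if __name__ == "__main__":
    print(normalize_url("HTTP://Example.COM:80/a?b=2&a=1#frag"))
```

Sorting query parameters is a more aggressive choice than most standards require and can change meaning on sites where parameter order matters, so a real crawler would likely make that step optional.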