Convert web pages and docs sites to a clean PDF/DOCX
Start with depth=0 to test the start page quickly.
0 = only the start page. 1 = also the pages linked from it. 2 = also the pages linked from those, and so on.
Upper limit for how many pages to include in the bundle.
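As a rough illustration of how the depth setting and the page limit interact, here is a minimal sketch of a breadth-first crawl plan. The names `plan_crawl` and `get_links` are hypothetical and chosen for this example only; the tool's actual crawler may order or count pages differently.

```python
from collections import deque


def plan_crawl(start, get_links, max_depth, max_pages):
    """Return pages in breadth-first order, limited by link depth and a page cap.

    get_links is a hypothetical callable: URL -> list of linked URLs.
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


# A tiny in-memory link graph standing in for real pages.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}


def links_of(url):
    return GRAPH.get(url, [])
```

With this graph, depth 0 yields only the start page, depth 1 adds the pages it links to, and the page limit cuts the list short regardless of depth.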
PDF output is rendered with WeasyPrint. DOCX output uses Pandoc if it is installed; otherwise it falls back to HTML.
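One plausible way to decide on the DOCX fallback is to check whether a `pandoc` executable is on the PATH. This is a sketch under that assumption; the function name `pick_docx_format` and its `which` parameter are hypothetical, and the tool may detect Pandoc differently.

```python
import shutil


def pick_docx_format(which=shutil.which):
    """Return "docx" when a pandoc executable is found, else "html".

    The which parameter is injectable so the decision can be checked
    without depending on what is installed on the current machine.
    """
    return "docx" if which("pandoc") else "html"
```

Calling `pick_docx_format()` with no arguments checks the real PATH through `shutil.which`.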
Keep the crawl on the same registered domain. Usually keep this on.
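To show what "same registered domain" means in practice, here is a deliberately naive sketch that compares the last two labels of each hostname. This is an assumption made for illustration: it gives wrong answers for suffixes such as `co.uk`, and a careful implementation would consult the Public Suffix List instead. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def naive_registered_domain(url):
    """Approximate the registered domain as the last two hostname labels.

    Naive on purpose: multi-part suffixes (for example co.uk) need the
    Public Suffix List to be handled correctly.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def same_site(url, start_url):
    """True when both URLs share the same (approximate) registered domain."""
    return naive_registered_domain(url) == naive_registered_domain(start_url)
```

Under this heuristic, `docs.example.com` and `example.com` count as the same site, while `other.org` does not.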
Best-effort. For personal use, you may disable this. For a public service, leave it on.
If set, we extract only from this element (useful for docs main column). Otherwise we auto-detect.
If set, we only follow links found inside this element (e.g., a left docs menu). Great for staying on-topic.
Only URLs matching these patterns are crawled. Leave blank to let the preset pick.
URLs matching any pattern here are skipped (e.g., search pages, images).
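The include and exclude rules can be sketched as a single filter. This example assumes shell-style glob patterns, which may not be the pattern syntax the tool actually uses, and it treats a blank include list as "allow everything" instead of modelling a preset. The name `should_crawl` is hypothetical.

```python
from fnmatch import fnmatchcase


def should_crawl(url, include=(), exclude=()):
    """Apply exclude patterns first, then include patterns (glob syntax assumed)."""
    if any(fnmatchcase(url, pat) for pat in exclude):
        return False  # an exclude match always wins
    if include:
        return any(fnmatchcase(url, pat) for pat in include)
    return True  # no include patterns: nothing further to restrict
```

For example, `should_crawl(url, include=["*/docs/*"], exclude=["*.png"])` keeps documentation pages but skips images under the same path.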
We'll save as your-title.pdf / .docx / .html.
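A filename such as your-title.pdf could be derived from the page title roughly as follows. This is a sketch, not the tool's actual naming rule: the function name `output_filename` and the fallback name `document` are assumptions made for this example.

```python
import re


def output_filename(title, ext):
    """Turn a title into a lowercase, hyphen-separated filename with an extension."""
    # Collapse every run of characters other than a-z and 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Fall back to a generic name when nothing usable remains.
    return f"{slug or 'document'}.{ext}"
```

The same title produces matching names for each format, so the .pdf, .docx, and .html files differ only by extension.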
Your file has been successfully processed.