crawlkit

Versions

latest
stable

Description

A crawler based on PhantomJS. Allows discovery of dynamic content and supports custom scrapers. For all your ajaxy crawling & scraping needs.

* Parallel crawling/scraping via Phantom pooling.
* Custom-defined link discovery.
* Custom-defined runners (scrape, test, validate, etc.)
* Can follow redirects (and because it's based on PhantomJS, JavaScript redirects will be followed as well as <meta> redirects.)
* Streaming
* Resilient to PhantomJS crashes
* Ignores page errors
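
To make the ideas of pooled crawling, custom link discovery and custom runners a bit more concrete, here is a minimal, self-contained sketch in plain JavaScript. It does not use crawlkit or PhantomJS and should not be read as crawlkit's API: `crawl`, `fetchPage`, `SITE`, and the `finder`, `runners` and `concurrency` options are all hypothetical names chosen for illustration, and an in-memory object stands in for real pages.

```javascript
// Illustrative sketch only. This is NOT crawlkit's API; every name below
// (SITE, fetchPage, crawl, finder, runners, concurrency) is hypothetical.

// A tiny in-memory "site" standing in for pages rendered by a browser.
const SITE = {
  '/': { title: 'Home', links: ['/a', '/b'] },
  '/a': { title: 'A', links: ['/b', '/c'] },
  '/b': { title: 'B', links: ['/'] },
  '/c': { title: 'C', links: ['/missing'] },
};

// Stand-in for loading a page in a pooled browser instance.
async function fetchPage(url) {
  const page = SITE[url];
  if (!page) throw new Error(`not found: ${url}`);
  return page;
}

// Crawl from startUrl with at most `concurrency` pages in flight.
// `finder` returns the links to follow; each runner produces one result
// per page (scrape, test, validate, ...).
function crawl(startUrl, { finder, runners, concurrency = 2 }) {
  return new Promise((resolve) => {
    const results = {};
    const seen = new Set([startUrl]);
    const queue = [startUrl];
    let active = 0;

    const visit = async (url) => {
      const entry = { runners: {} };
      results[url] = entry;
      try {
        const page = await fetchPage(url);
        for (const [name, run] of Object.entries(runners)) {
          entry.runners[name] = await run(page, url);
        }
        for (const link of await finder(page, url)) {
          if (!seen.has(link)) {
            seen.add(link);
            queue.push(link);
          }
        }
      } catch (err) {
        // Record the failure for this page and keep crawling the rest.
        entry.error = err.message;
      }
    };

    const schedule = () => {
      // Fill free slots in the pool from the queue.
      while (active < concurrency && queue.length > 0) {
        const url = queue.shift();
        active += 1;
        visit(url).then(() => {
          active -= 1;
          schedule();
        });
      }
      // Nothing queued and nothing in flight: the crawl is finished.
      if (active === 0 && queue.length === 0) resolve(results);
    };

    schedule();
  });
}

// Usage: follow every link and collect each page's title.
crawl('/', {
  finder: (page) => page.links,
  runners: { title: (page) => page.title },
}).then((results) => console.log(JSON.stringify(results, null, 2)));
```

In this sketch the finder and the runners are ordinary functions passed in by the caller, which is one simple way to keep link discovery and per-page work pluggable. The broken `/missing` link shows a failed page being recorded without stopping the crawl.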