scrapeshell

During development it is often useful to open an interactive shell and tinker with a page. The HTML a scraper receives is frequently slightly different from what the browser shows, and these differences are hard to spot without starting an interactive Python shell and inspecting what urlopen() returns.

If scrapelib is installed on your path, it provides scrapeshell, an entry point that opens an IPython shell. The shell gives you a ResultStr instance holding the contents of the scraped page and, if lxml is installed, an lxml.html.HtmlElement instance as well.
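To make the "string always, parsed tree only if lxml is available" behaviour concrete, here is a minimal sketch of how such a session namespace could be assembled. It is not scrapeshell's actual code: the helper build_namespace and the names html and doc are hypothetical, and a plain str stands in for ResultStr.

```python
def build_namespace(html):
    """Return the objects an interactive session could start with.

    `html` is the page source as a string. A parsed tree is added under
    the (hypothetical) name "doc" only when lxml can be imported.
    """
    namespace = {"html": html}
    try:
        import lxml.html  # optional dependency
    except ImportError:
        # Without lxml, only the raw page text is available.
        return namespace
    # lxml.html.fromstring returns an lxml.html.HtmlElement.
    namespace["doc"] = lxml.html.fromstring(html)
    return namespace


if __name__ == "__main__":
    page = "<html><body><h1>Example</h1></body></html>"
    session = build_namespace(page)
    # Lists "html", plus "doc" when lxml is installed.
    print(sorted(session))
```

With lxml present, the parsed element can then be queried interactively, for example with session["doc"].xpath("//h1").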

scrapeshell arguments

url

scrapeshell requires a URL, which is retrieved with a urlopen() call.

--ua user_agent

Set a custom user agent (useful for seeing if a site is returning different results based on UA).

--robots

Obey robots.txt (the default is to ignore it).

--noredirect

Don’t follow redirects.
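The options above map naturally onto a small command-line parser. The following sketch uses the standard argparse module to show one way they could be declared. It mirrors only the documented flags and is not taken from scrapelib's source: build_parser, the attribute name user_agent, and the example URL are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the documented scrapeshell arguments (sketch only)."""
    parser = argparse.ArgumentParser(prog="scrapeshell")
    # Required positional argument: the page to retrieve.
    parser.add_argument("url", help="URL to retrieve")
    # Optional custom user agent string.
    parser.add_argument("--ua", dest="user_agent", default=None,
                        help="set a custom user agent")
    # robots.txt is ignored unless this flag is given.
    parser.add_argument("--robots", action="store_true",
                        help="obey robots.txt (default is to ignore)")
    # Redirects are followed unless this flag is given.
    parser.add_argument("--noredirect", action="store_true",
                        help="don't follow redirects")
    return parser


if __name__ == "__main__":
    # Hypothetical invocation; example.com is a placeholder URL.
    args = build_parser().parse_args(
        ["http://example.com", "--ua", "Mozilla/5.0", "--noredirect"]
    )
    print(args.url, args.user_agent, args.robots, args.noredirect)
```

Declaring --robots and --noredirect as store_true flags keeps the documented defaults: robots.txt is ignored and redirects are followed unless the user asks otherwise.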
