Importing a page with `process_url`¶

`process_url`¶

vintage.archiveurl.process_url()¶

Parameters:	url (str) – Required The URL to retrieve kwargs (dict) – Arguments for BeautifulSoup indicating which part of the document to return. If not otherwise specified, the `<body>` contents are returned.
Return type:	`list` plus attribute `metadata`
Returns:	A `list` of `BeautifulSoup` tags plus a `dict` of metadata. The metadata includes the contents of the `<title>` tag and `<meta>` tags. The keys for the `<meta>` data is the contents of its `name` or `http-equiv` attribute.

All kwargs are passed to BeautifulSoup’s findAll method. Valid keyword args:

name

Restricts the set of tags by name. examples:

name='b'
name=re.compile('^b')
name=['title', 'p']
name={'title': True, 'p': True}
name=True  # this returns all tags. Useful when limiting by attrs
name=lambda tag: len(tag.attrs) == 2

attrs

A dictionary that acts just like the BeautifulSoup keyword arguments, but works for situations where there are already function arguments with the same name (like name), or are python keywords (like class).

recursive

Doesn’t really have any effect as you are starting from the top

text

Lets you search for NavigableString objects instead of Tags. Its value can be a string, a regular expression, a list or dictionary, True or None, or a callable that takes a NavigableString object as its argument

limit

Stop after finding this many elements

Any other keyword arguments impose restrictions on Tag attributes.

The list returned also includes a metadata dictionary. This attribute aggregates all the metadata information from the page <head>. The <title> is included under the key title and any <meta> tags it finds. The metadata dictionary uses either the name attribute or http-equiv attribute of the <meta> tag for the key and the content attribute for the value.

Importing a page with `process_url`¶

`process_url`¶

Project Versions

Table Of Contents

Previous topic

Navigation

Importing a page with process_url¶

process_url¶

Project Versions

RTD Search

Table Of Contents

Previous topic

Quick search

Navigation

Importing a page with `process_url`¶

`process_url`¶