Installation is easy using pip.
pip install django-vintage
Then add 'vintage' to INSTALLED_APPS in your project’s settings.py
Create an entry in your urls.py for all your archived pages:
urlpatterns = patterns('', # ... (r'^archive/', include('vintage.urls')), # ... )
Create a directory called vintage in your template directory
Create a new template called default.html in that directory
The template receives an object context variable that contains the ArchivedPage instance. Here’s a basic example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | {% extends "base.html" %}
{% block head_title %}
{{ object.title }}
{% endblock head_title %}
{% block head_metadata %}
<meta name="description" content="{{ object.metadata.description }}" />
<meta name="keywords" content="{{ object.metadata.keywords }}" />
<meta name="author" content="{{ object.metadata.author }}">
<!--Facebook Metadata /-->
<meta property="fb:page_id" content="{{ object.metadata.page_id }}" />
<meta property="og:image" content="{{ object.metadata.image }}" />
<meta property="og:description" content="{{ object.metadata.description }}"/>
<meta property="og:title" content="{{ object.metadata.title }}"/>
<!--Google+ Metadata /-->
<meta itemprop="name" content="{{ object.metadata.title }}">
<meta itemprop="description" content="{{ object.metadata.description }}">
<meta itemprop="image" content="{{ object.metadata.image }}">
{% endblock head_metadata %}
{% block content %}
{{ object.content }}
{% endblock content %}
|
Django Vintage provides a method of retreiving a URL and saving it and the files to which it links. The example code below reads URLs from a file and extracts a specific part of the document. Then it runs it through a function to strip out additional parts of the document it doesn’t want, such as copyright information that is already in the template, and forms and scripts. See process_url for more information.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | from urlparse import urlparse
from BeautifulSoup import BeautifulSoup
from vintage.archiveurl import process_url
from vintage.models import ArchivedPage, ArchivedFile
URLS = open(os.path.join(path, 'miscurls.txt')).readlines()
def post_process(main_content):
for tag in main_content.contents:
if hasattr(tag, 'name'):
if tag.name == 'form':
tag.extract()
if tag.name == 'div' and tag.get('class', '') == 'emailthis':
tag.extract()
if tag.name == 'div' and tag.get('class', '') == 'copyright':
tag.extract()
if tag.name == 'script':
tag.extract()
return "".join([str(i) for i in main_content.contents])
for url in URLS:
print "Archiving %s" % url
initial_html = process_url(url, name="td", width="531", limit=1)
metadata = initial_html.metadata.copy()
html = post_process(initial_html[0])
urlpath = urlparse(url).path.strip()
ap = ArchivedPage(
title=metadata.get('title', urlpath) or '',
url=urlpath,
original_url=url,
content=html,
metadata=metadata)
ap.save()
|