Getting Started

Installation

Installation is easy using pip.

pip install django-vintage

Then add 'vintage' to INSTALLED_APPS in your project’s settings.py

Setting it up

  1. Create an entry in your urls.py for all your archived pages:

    urlpatterns = patterns('',
        # ...
        (r'^archive/', include('vintage.urls')),
        # ...
    )
    
  2. Create a directory called vintage in your template directory

  3. Create a new template called default.html in that directory

  4. The template receives an object context variable that contains the ArchivedPage instance. Here’s a basic example:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    {% extends "base.html" %}
    {% block head_title %}
        {{ object.title }}
    {% endblock head_title %}
    {% block head_metadata %}
        <meta name="description" content="{{ object.metadata.description }}" />
        <meta name="keywords" content="{{ object.metadata.keywords }}" />
        <meta name="author" content="{{ object.metadata.author }}">
    
        <!--Facebook Metadata /-->
        <meta property="fb:page_id" content="{{ object.metadata.page_id }}" />
        <meta property="og:image" content="{{ object.metadata.image }}" />
        <meta property="og:description" content="{{ object.metadata.description }}"/>
        <meta property="og:title" content="{{ object.metadata.title }}"/>
    
        <!--Google+ Metadata /-->
        <meta itemprop="name" content="{{ object.metadata.title }}">
        <meta itemprop="description" content="{{ object.metadata.description }}">
        <meta itemprop="image" content="{{ object.metadata.image }}">
    {% endblock head_metadata %}
    {% block content %}
        {{ object.content }}
    {% endblock content %}
    

Creating a page manually

  1. Click on the new page button in the admin.
  2. Enter the URL path of the old page. Should be everything after the http://domainname.com. For Example: /articles/2010/may/01/bigfoot-sighted/.
  3. Enter the page’s title.
  4. Enter the original URL of the page. The original URL is used in updating links and images.
  5. View the source of the page you are archiving.
  6. Select the appropriate HTML content (such as everything between the <body> tags).
  7. Paste it into the content field.
  8. Select a template (leave blank for default) see How Django Vintage selects the template
  9. Enter in any of the page’s metadata fields: page_id, image, description, keywords, and author.

Importing a page programatically

Django Vintage provides a method of retreiving a URL and saving it and the files to which it links. The example code below reads URLs from a file and extracts a specific part of the document. Then it runs it through a function to strip out additional parts of the document it doesn’t want, such as copyright information that is already in the template, and forms and scripts. See process_url for more information.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
from urlparse import urlparse
from BeautifulSoup import BeautifulSoup
from vintage.archiveurl import process_url
from vintage.models import ArchivedPage, ArchivedFile

URLS = open(os.path.join(path, 'miscurls.txt')).readlines()

def post_process(main_content):
    for tag in main_content.contents:
        if hasattr(tag, 'name'):
            if tag.name == 'form':
                tag.extract()
            if tag.name == 'div' and tag.get('class', '') == 'emailthis':
                tag.extract()
            if tag.name == 'div' and tag.get('class', '') == 'copyright':
                tag.extract()
            if tag.name == 'script':
                tag.extract()
    return "".join([str(i) for i in main_content.contents])


for url in URLS:
    print "Archiving %s" % url
    initial_html = process_url(url, name="td", width="531", limit=1)
    metadata = initial_html.metadata.copy()
    html = post_process(initial_html[0])
    urlpath = urlparse(url).path.strip()
    ap = ArchivedPage(
        title=metadata.get('title', urlpath) or '',
        url=urlpath,
        original_url=url,
        content=html,
        metadata=metadata)
    ap.save()

Project Versions

Table Of Contents

Previous topic

django-vintage v 0.1

Next topic

Using templates