django-feedmapper
django-feedmapper is a library for synchronizing data from feeds with Django models. The process of synchronizing the data requires the use of three pieces: a parser, a mapping, and a schedule.
Installation
Install from PyPI:
pip install django-feedmapper
Add feedmapper to your settings.py file:
INSTALLED_APPS = (
    ...
    'feedmapper',
    ...
)
If you are using South, run the migrations:
./manage.py migrate feedmapper
Otherwise, run syncdb:
./manage.py syncdb
Parsers
A parser defines methods for validating and parsing data from incoming feeds. There are two built-in parsers, XMLParser and AtomParser. You can write your own parser by subclassing the base Parser class.
Mapping
A mapping is written in JSON and describes how and when data from an incoming feed should be mapped to Django models. You can perform the following types of mappings:
- One field in a model to one field from a feed
- One field in a model to multiple fields from a feed
- One field in a model to a transformer method on the model
You can also set the following properties on a mapping through the Django admin:
- Data source
- Synchronization schedule
- Purge existing data
An example: users
Let’s get into an example. Suppose we have the following incoming XML data and we want to map each <user> to Django’s User model:
<?xml version="1.0" ?>
<auth>
    <users>
        <user>
            <username>vader</username>
            <first_name>Anakin</first_name>
            <last_name>Skywalker</last_name>
            <email>vader@sith.org</email>
            <date_joined>2050-01-31T20:00-4:00</date_joined>
        </user>
        <user>
            <username>kenobi</username>
            <first_name>Obi-Wan</first_name>
            <last_name>Kenobi</last_name>
            <email>kenobi@jedi.org</email>
            <date_joined>2000-01-31T20:00-4:00</date_joined>
        </user>
    </users>
</auth>
We need to specify a JSON map:
{
    "models": {
        "myapp.Thing": {
            "nodePath": "users.user",
            "identifier": "username",
            "fields": {
                "username": "username",
                "email": "email",
                "name": ["first_name", "last_name"],
                "date_joined": {
                    "transformer": "convert_date",
                    "fields": ["date_joined"]
                }
            }
        }
    }
}
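Because the mapping must be valid JSON, a stray trailing comma is enough to make it unparseable. As a quick sanity check before saving a mapping, it can be loaded with Python's standard json module. This is only a sketch: the variable names are illustrative, and nothing here is part of django-feedmapper itself.

```python
import json

# The example mapping from above, held as a string for illustration.
MAPPING = """
{
    "models": {
        "myapp.Thing": {
            "nodePath": "users.user",
            "identifier": "username",
            "fields": {
                "username": "username",
                "email": "email",
                "name": ["first_name", "last_name"],
                "date_joined": {
                    "transformer": "convert_date",
                    "fields": ["date_joined"]
                }
            }
        }
    }
}
"""

# json.loads raises json.JSONDecodeError if the text is not valid JSON.
mapping = json.loads(MAPPING)

# List each model label with the model fields it maps.
for model_label, spec in mapping["models"].items():
    print(model_label, sorted(spec["fields"]))
```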
Let’s break this down a bit. First, we can specify one or more models to map:
"models": {
    "myapp.Thing": {
We need to tell the parser the path to all of the <user> elements:
"nodePath": "users.user",
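To picture what the dotted path selects, the sketch below uses the standard library's xml.etree.ElementTree rather than django-feedmapper's own parser. Treating each dot as one child step below the root element is an assumption made for illustration, and select_nodes is a hypothetical helper, not part of the library.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the example feed.
FEED = """<?xml version="1.0" ?>
<auth>
  <users>
    <user><username>vader</username></user>
    <user><username>kenobi</username></user>
  </users>
</auth>"""


def select_nodes(xml_text, node_path):
    """Return every element reached by a dotted path below the root (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Assumption: "users.user" means <users> under the root, then each <user> under it.
    return root.findall(node_path.replace(".", "/"))


nodes = select_nodes(FEED, "users.user")
usernames = [node.findtext("username") for node in nodes]
print(usernames)
```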
If the mapping has purging turned off, we need to supply a unique identifier for Django ORM get calls. In this case our resulting ORM call would be User.objects.get(username=username):
"identifier": "username",
Now the fun part, mapping the fields:
"fields": {
    "username": "username",
    "email": "email",
    "name": ["first_name", "last_name"],
    "date_joined": {
        "transformer": "convert_date",
        "fields": ["date_joined"]
    }
}
We’ve got examples of all three types of field mappings here. username and email are one-to-one mappings:
"username": "username",
"email": "email",
name is mapped to multiple fields. The parser will concatenate these fields, putting a space between them:
"name": ["first_name", "last_name"],
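The one-to-one and multiple-field cases can be sketched with the standard library as follows. This is not the library's code: resolve_field is a hypothetical helper that only mirrors the behavior described above, where a string names a single feed field and a list of names is joined with spaces.

```python
import xml.etree.ElementTree as ET

# One <user> node from the example feed.
USER = ET.fromstring(
    "<user>"
    "<username>kenobi</username>"
    "<first_name>Obi-Wan</first_name>"
    "<last_name>Kenobi</last_name>"
    "</user>"
)


def resolve_field(node, source):
    """Return the value for one model field from a feed node (hypothetical helper)."""
    if isinstance(source, str):
        # One-to-one mapping: take the text of the single named feed field.
        return node.findtext(source)
    # Multiple-field mapping: join the text of each named field with a space.
    return " ".join(node.findtext(name, default="") for name in source)


username = resolve_field(USER, "username")
name = resolve_field(USER, ["first_name", "last_name"])
print(username, "|", name)
```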
date_joined uses a transformer, which is simply a method defined on your model that manipulates the incoming data before it is inserted into a field. Here we tell the parser that the date_joined field should map to the date_joined field in the XML but use the convert_date method to transform the incoming data:
"date_joined": {
    "transformer": "convert_date",
    "fields": ["date_joined"]
}
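The body of convert_date is not shown here. The sketch below is one possible implementation, written as a plain function so it runs on its own. It assumes the transformer receives the raw feed string and returns a timezone-aware datetime; on a real model it would be a method, and the exact signature django-feedmapper expects is not confirmed by this document. It accepts the one-digit hour offset used in the sample feed (-4:00).

```python
import re
from datetime import datetime, timedelta, timezone

# Matches timestamps such as "2050-01-31T20:00-4:00" (one- or two-digit hour offset).
_TIMESTAMP = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})([+-])(\d{1,2}):(\d{2})$"
)


def convert_date(value):
    """Parse a feed timestamp into a timezone-aware datetime (illustrative sketch)."""
    match = _TIMESTAMP.match(value.strip())
    if match is None:
        raise ValueError("unrecognized timestamp: %r" % value)
    year, month, day, hour, minute, sign, off_hours, off_minutes = match.groups()
    # Build the UTC offset, negating it for timestamps west of UTC.
    offset = timedelta(hours=int(off_hours), minutes=int(off_minutes))
    if sign == "-":
        offset = -offset
    return datetime(
        int(year), int(month), int(day), int(hour), int(minute),
        tzinfo=timezone(offset),
    )


joined = convert_date("2050-01-31T20:00-4:00")
print(joined.isoformat())
```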
Scheduling
There are two ways to schedule the synchronization of mappings.
Using django-celery
The first and preferred scheduling method is to use django-celery. To take advantage of it, take the following steps:
- Install django-celery. If you’ve never done this before, it can be a little complicated. You’ll want to read through the official docs. An example of some basic settings is in example/settings_celery.py:
from .settings import *

import djcelery
djcelery.setup_loader()

INSTALLED_APPS += ('djcelery',)
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
BROKER_VHOST = "/"
- Make sure you enable the Django database scheduler of django-celery by adding the following to your settings.py file:
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
- Now every time you save a mapping, it will either create or update a matching django-celery PeriodicTask in the database. By default the periodic task will run once an hour. If you want to change this, visit the PeriodicTask in the Django admin (/admin/djcelery/periodictask/ by default) and modify the interval or crontab settings.
Using feedmapper_sync
Of course, not everyone has the resources or the need for a message queue solution. The second scheduling method is to set up a cron job that runs the feedmapper_sync management command. Make sure you have the DJANGO_SETTINGS_MODULE environment variable set and add the following to your crontab:
* * * * * /full/path/to/bin/django-admin.py feedmapper_sync
If you only want to sync a subset of the mappings you can supply one or more mapping IDs to the management command:
* * * * * /full/path/to/bin/django-admin.py feedmapper_sync 3 8 22
Contributing
To contribute to django-feedmapper, create a fork on GitHub. Clone your fork, make some changes, and submit a pull request.
Issues
Use the GitHub issue tracker for django-feedmapper to submit bugs, issues, and feature requests.