scrapy-mongoengine-item


scrapy-mongoengine-item is an extension that allows you to define Scrapy items using existing MongoEngine documents.

This utility provides a new class, MongoEngineItem, that you can use as a regular Scrapy item and link to a MongoEngine document through its mongoengine_document attribute. Start using it right away by importing it from this package:

from scrapy_mongoengine_item import MongoEngineItem

Installation

Both Python 2.7 and Python 3.5/3.6 are supported. For Python 3 you need Scrapy v1.1 or above.

The latest tested MongoEngine version is 0.17.0.

Install from PyPI using:

pip install scrapy-mongoengine-item

Introduction

MongoEngineItem is an item class that gets its field definitions from a MongoEngine document: you simply create a MongoEngineItem and specify which MongoEngine document it relates to.

Besides getting the document fields defined on your item, MongoEngineItem provides a method to create and populate a MongoEngine document instance with the item data.
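To make the two ideas above concrete, here is a dependency-free sketch: an item whose allowed keys are derived from the fields declared on a document class, and a method that builds a document instance from the item data. Field, Person, DocumentItem, document_field_names and to_document are hypothetical stand-ins written for illustration only; the real package works with MongoEngine documents and Scrapy items, and its internals may differ.

```python
# Dependency-free sketch: these classes are hypothetical stand-ins, not
# MongoEngine or Scrapy, and not the package's actual implementation.


class Field(object):
    """Stand-in for a MongoEngine field such as StringField or IntField."""


class Person(object):
    """Stand-in for a MongoEngine Document subclass."""

    name = Field()
    age = Field()

    def __init__(self, **values):
        for key, value in values.items():
            setattr(self, key, value)


def document_field_names(document_cls):
    """Return the names of the Field attributes declared on document_cls."""
    return {
        attr for attr, value in vars(document_cls).items()
        if isinstance(value, Field)
    }


class DocumentItem(dict):
    """Dict-based item whose allowed keys come from the linked document."""

    document_cls = Person

    def __setitem__(self, key, value):
        # Reject keys the document does not declare, like a Scrapy item does.
        if key not in document_field_names(self.document_cls):
            raise KeyError("{} does not support field: {}".format(
                type(self).__name__, key))
        super(DocumentItem, self).__setitem__(key, value)

    def to_document(self):
        """Create a document instance populated with the item data."""
        return self.document_cls(**self)
```

Setting item['name'] = 'John' and calling item.to_document() yields a Person whose name is 'John', while item['sex'] = 'M' raises KeyError because Person declares no sex field.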

Usage

MongoEngineItem works as follows: you create a subclass and set its mongoengine_document attribute to a valid MongoEngine document class. This gives you an item with a field for each field of the document.

In addition, you can define fields that aren’t present in the document, and even override fields that are present in the document by redefining them in the item.

Let’s see some examples:

Creating a MongoEngine document for the examples:

from mongoengine import fields, document

class Person(document.Document):

    name = fields.StringField(max_length=255)
    age = fields.IntField()

Defining a basic MongoEngineItem:

from scrapy_mongoengine_item import MongoEngineItem

class PersonItem(MongoEngineItem):

    mongoengine_document = Person

MongoEngineItem works just like Scrapy items:

p = PersonItem()
p['name'] = 'John'
p['age'] = 22

To obtain the MongoEngine document from the item, call the item’s extra method, MongoEngineItem.save():

person = p.save()
person.name
# 'John'
person.age
# 22
person.id
# an ObjectId, assigned when the document is saved

By default, MongoEngineItem.save() also saves the document to the database. Call it with commit=False to obtain an unsaved document instead:

person = p.save(commit=False)
person.name
# 'John'
person.age
# 22
person.id
# None
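The commit flag follows a common pattern: always build the instance, and write it to the database only when commit is true. The sketch below illustrates that pattern with a hypothetical FakeDocument in place of a MongoEngine document, so it runs without MongoDB; save_item is an illustrative helper, not part of the package's API.

```python
class FakeDocument(object):
    """Hypothetical stand-in for a MongoEngine Document."""

    def __init__(self, **values):
        self.id = None  # unsaved documents have no id yet
        for key, value in values.items():
            setattr(self, key, value)

    def save(self):
        # A real MongoEngine save() writes to MongoDB and sets an ObjectId;
        # here we only simulate the id being assigned.
        self.id = "generated-on-save"
        return self


def save_item(item_data, document_cls, commit=True):
    """Build a document from item data, saving it only when commit is True."""
    instance = document_cls(**item_data)
    if commit:
        instance.save()
    return instance
```

With commit=False you can still adjust the returned instance before calling its save() method yourself.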

As said before, we can add other fields to the item:

import scrapy
from scrapy_mongoengine_item import MongoEngineItem

class PersonItem(MongoEngineItem):

    mongoengine_document = Person
    sex = scrapy.Field()

p = PersonItem()
p['name'] = 'John'
p['age'] = 22
p['sex'] = 'M'

We can also override document fields with our own definitions:

class PersonItem(MongoEngineItem):

    mongoengine_document = Person
    name = scrapy.Field(default='No Name')

This is useful for attaching properties to a field, such as a default value or any other metadata your project uses. Fields that exist only on the item (such as sex above) are not taken into account by MongoEngineItem.save().
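Scrapy does not act on field metadata such as default by itself (a scrapy.Field is just a dict of metadata), so a default only takes effect if some project code, for example an item pipeline, reads it. The sketch below models that with plain dicts: ITEM_FIELDS mirrors the field metadata of the PersonItem examples above, DOCUMENT_FIELDS is a hypothetical set of the fields declared on Person, and apply_defaults and document_values are illustrative helpers, not part of the package.

```python
# Field metadata as in the PersonItem examples: each scrapy.Field is a dict.
ITEM_FIELDS = {
    "name": {"default": "No Name"},  # overridden document field
    "age": {},                       # document field
    "sex": {},                       # item-only field
}
# Hypothetical: the fields declared on the Person document.
DOCUMENT_FIELDS = {"name", "age"}


def apply_defaults(item_data, item_fields):
    """Return a copy of item_data with missing values filled from 'default'."""
    filled = dict(item_data)
    for name, meta in item_fields.items():
        if name not in filled and "default" in meta:
            filled[name] = meta["default"]
    return filled


def document_values(item_data, document_fields):
    """Keep only the values the document declares; item-only ones are dropped."""
    return {k: v for k, v in item_data.items() if k in document_fields}
```

In a real project the metadata would typically come from PersonItem.fields, the class attribute in which Scrapy items keep their Field objects.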

Development

Testing

To run the tests in your working environment, type:

./runtests.py

To test with all supported Python versions, type:

tox

Running MongoDB

The easiest way is to run it via Docker:

docker pull mongo:latest
docker run -p 27017:27017 mongo:latest
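The container may take a moment to start accepting connections. Before running the tests against it, a small polling helper can wait for the port to open. wait_for_port is a hypothetical helper using only the standard library; it checks that the TCP port accepts connections, not that MongoDB has finished initialising.

```python
import socket
import time


def wait_for_port(host="localhost", port=27017, timeout=30.0, interval=0.5):
    """Poll host:port until it accepts a TCP connection or timeout expires.

    Returns True once a connection succeeds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port('localhost', 27017) returns True once the container started by the docker run command above is listening.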

Writing documentation

Keep the following heading hierarchy:

=====
title
=====

header
======

sub-header
----------

sub-sub-header
~~~~~~~~~~~~~~

sub-sub-sub-header
^^^^^^^^^^^^^^^^^^

sub-sub-sub-sub-header
++++++++++++++++++++++

sub-sub-sub-sub-sub-header
**************************
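When headings are generated by a script, a small helper keeps them consistent with this hierarchy. rst_heading is a hypothetical helper, not part of the project; it repeats the adornment character to the length of the heading text, since reStructuredText expects the underline to be at least as long as the title.

```python
# Adornments from the hierarchy above: the title (level 0) uses "=" as both
# overline and underline; each deeper level uses an underline only.
UNDERLINES = ["=", "-", "~", "^", "+", "*"]


def rst_heading(text, level):
    """Return a reStructuredText heading for text at the given level.

    Level 0 is the title, 1 the header, 2 the sub-header, and so on.
    """
    if level == 0:
        bar = "=" * len(text)
        return "\n".join([bar, text, bar])
    if not 1 <= level <= len(UNDERLINES):
        raise ValueError("unsupported heading level: {}".format(level))
    return "\n".join([text, UNDERLINES[level - 1] * len(text)])
```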

License

GPL 2.0/LGPL 2.1

Support

For any issues, contact me at the e-mail address given in the Author section.

Author

Artur Barseghyan <artur.barseghyan@gmail.com>

Documentation

Contents:

Release history and notes

Sequence-based identifiers are used for versioning (schema below):

major.minor[.revision]
  • It’s always safe to upgrade within the same minor version (for example, from 0.3 to 0.3.4).
  • Minor version changes might be backwards incompatible. Read the release notes carefully before upgrading (for example, when upgrading from 0.3.4 to 0.4).
  • All backwards incompatible changes are mentioned in this document.
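The upgrade rule can be expressed as a small check. is_safe_upgrade is a hypothetical helper, not part of the package; it treats a missing revision as 0 and considers an upgrade safe only when the major and minor numbers stay the same and the target is not older than the current version.

```python
def parse_version(version):
    """Parse 'major.minor[.revision]' into a (major, minor, revision) tuple."""
    parts = [int(part) for part in version.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError("expected major.minor[.revision]: {!r}".format(version))
    return tuple(parts + [0] * (3 - len(parts)))


def is_safe_upgrade(current, target):
    """True when target is in the same major.minor series and not older."""
    cur, tgt = parse_version(current), parse_version(target)
    return cur[:2] == tgt[:2] and tgt >= cur
```

For this package, is_safe_upgrade('0.1.2', '0.1.4') returns True, while moving to a new minor version returns False and calls for reading the release notes first.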

0.1.4

2019-03-16

  • Clean up. Add proper docs.

0.1.3

2019-03-14

  • Beta release.

0.1.2

2019-03-14

  • Initial alpha release.

scrapy_mongoengine_item package

Module contents

Scrapy extension to write scraped items using mongoengine documents

class scrapy_mongoengine_item.MongoEngineItem(*args, **kwargs)

Bases: scrapy.item.Item

errors
fields = {}
instance
is_valid()
mongoengine_document = None
save(commit=True)
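A typical place to call save() is a Scrapy item pipeline. The sketch below is an illustration under assumptions: MongoEngineSavePipeline is a hypothetical class, and it assumes is_valid() returns a truthy value when the underlying document passes validation and that errors describes the failures otherwise, since the listing above does not document their exact return values.

```python
import logging

logger = logging.getLogger(__name__)


class MongoEngineSavePipeline(object):
    """Hypothetical pipeline: save valid MongoEngineItems, log invalid ones."""

    def process_item(self, item, spider):
        # Leave items that do not expose the MongoEngineItem API untouched.
        if not hasattr(item, "is_valid"):
            return item
        if item.is_valid():
            item.save()  # commit=True by default, so the document is stored
        else:
            logger.warning("Invalid item %r: %r", dict(item), item.errors)
        return item
```

To enable it, add its import path to the ITEM_PIPELINES setting, for example {'myproject.pipelines.MongoEngineSavePipeline': 300}, where myproject.pipelines is a placeholder for your own module.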
