django-zombodb documentation¶
Easy Django integration with Elasticsearch through the ZomboDB Postgres extension. Thanks to ZomboDB, your Django models are synced with Elasticsearch at transaction time! Searching is also very simple: you can make Elasticsearch queries by calling one of the search methods on your querysets. Couldn’t be easier!
Installation and Configuration¶
Example¶
You can check a fully configured Django project with django-zombodb at https://github.com/vintasoftware/django-zombodb/tree/master/example
Requirements¶
Python: 3.5, 3.6, 3.7
Django: 2.0, 2.1
Settings¶
Set ZOMBODB_ELASTICSEARCH_URL in your settings.py. This is the URL of the Elasticsearch cluster used by ZomboDB.
ZOMBODB_ELASTICSEARCH_URL = 'http://localhost:9200/'
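Most deployments use a different Elasticsearch URL per environment. One possible way to handle that, sketched below, is to read the value from an environment variable in settings.py. The variable name ELASTICSEARCH_URL is an assumption made for this example; django-zombodb itself only looks at the ZOMBODB_ELASTICSEARCH_URL setting.

```python
import os

# Hypothetical environment variable name, chosen for this sketch.
# django-zombodb only reads the ZOMBODB_ELASTICSEARCH_URL setting.
ZOMBODB_ELASTICSEARCH_URL = os.environ.get(
    'ELASTICSEARCH_URL',
    'http://localhost:9200/',  # fallback for local development
)
```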
Move forward to learn how to integrate your models with Elasticsearch.
Integrating with Elasticsearch¶
ZomboDB integrates Postgres with Elasticsearch through Postgres indexes. If you don’t know much about ZomboDB, please read its tutorial before proceeding.
Installing ZomboDB extension¶
Since ZomboDB is a Postgres extension, you must install and activate it. Follow the official ZomboDB installation instructions.
Activating ZomboDB extension¶
django-zombodb provides a Django migration operation to activate the ZomboDB extension on your database. Creating an extension requires elevated privileges, so before running it, make sure your database user is a superuser:
psql -d your_database -c "ALTER USER your_database_user SUPERUSER"
Then create an empty migration on your “main” app (usually called “core” or “common”):
python manage.py makemigrations core --empty
Add the django_zombodb.operations.ZomboDBExtension operation to the migration you’ve just created:
from django.db import migrations

import django_zombodb.operations


class Migration(migrations.Migration):
    dependencies = [
        ('restaurants', '0001_initial'),
    ]
    operations = [
        django_zombodb.operations.ZomboDBExtension(),
        ...
    ]
Alternatively, you can activate the extension manually with a command. But you should avoid this because you’ll need to remember to run this on production, on tests, and on the machines of all your co-workers:
psql -d django_zombodb -c "CREATE EXTENSION zombodb"
Creating an index¶
Imagine you have the following model:
class Restaurant(models.Model):
    name = models.TextField()
    street = models.TextField()
To integrate it with Elasticsearch, we need to add a ZomboDBIndex to it:
from django.db import models

from django_zombodb.indexes import ZomboDBIndex


class Restaurant(models.Model):
    name = models.TextField()
    street = models.TextField()

    class Meta:
        indexes = [
            ZomboDBIndex(fields=[
                'name',
                'street',
            ]),
        ]
After that, create and run the migrations:
python manage.py makemigrations
python manage.py migrate
Warning
During the migration, ZomboDBIndex reads the value of settings.ZOMBODB_ELASTICSEARCH_URL. That means if settings.ZOMBODB_ELASTICSEARCH_URL changes after the ZomboDBIndex migration, the internal index stored in Postgres will still point to the old URL. To change the URL of an existing ZomboDBIndex, update settings.ZOMBODB_ELASTICSEARCH_URL and also issue an ALTER INDEX index_name SET (url='http://some.new.url'); (preferably inside a migrations.RunSQL in a new migration).
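The ALTER INDEX statement from the warning above can be built with a small helper, which keeps the quoting in one place. This is only a sketch: the function name, the index name, and the URL are hypothetical, and the real index name must be looked up in your database or in the migration that created the index.

```python
def alter_index_url_sql(index_name, new_url):
    """Build the ALTER INDEX statement that repoints a ZomboDB index."""
    # Double embedded quotes so the identifier and the literal stay valid SQL.
    quoted_name = '"' + index_name.replace('"', '""') + '"'
    quoted_url = "'" + new_url.replace("'", "''") + "'"
    return 'ALTER INDEX {} SET (url={});'.format(quoted_name, quoted_url)


# Hypothetical index name and URL, for illustration only.
sql = alter_index_url_sql('restaurants_restaurant_zdb_idx', 'http://some.new.url')
print(sql)
```

The resulting string is what you would pass to migrations.RunSQL inside the operations list of a new migration.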
Now the Restaurant model will support Elasticsearch queries for both name and street fields. But to perform those searches, we need it to use the custom queryset SearchQuerySet:
from django.db import models

from django_zombodb.indexes import ZomboDBIndex
from django_zombodb.querysets import SearchQuerySet


class Restaurant(models.Model):
    name = models.TextField()
    street = models.TextField()

    objects = models.Manager.from_queryset(SearchQuerySet)()

    class Meta:
        indexes = [
            ZomboDBIndex(fields=[
                'name',
                'street',
            ]),
        ]
Note
If you already have a custom queryset on your model, make it inherit from SearchQuerySetMixin.
Field mapping¶
From Elasticsearch documentation:
“Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. For instance, use mappings to define:
which string fields should be treated as full text fields.
which fields contain numbers, dates, or geolocations.
whether the values of all fields in the document should be indexed into the catch-all _all field.
the format of date values.
custom rules to control the mapping for dynamically added fields.”
If you don’t specify a mapping for your ZomboDBIndex, django-zombodb uses ZomboDB’s default mappings, which are based on the Postgres type of your model fields.
To customize the mapping, pass a field_mapping parameter to your ZomboDBIndex like below:
from django.db import models

from django_zombodb.indexes import ZomboDBIndex
from django_zombodb.querysets import SearchQuerySet


class Restaurant(models.Model):
    name = models.TextField()
    street = models.TextField()

    objects = models.Manager.from_queryset(SearchQuerySet)()

    class Meta:
        indexes = [
            ZomboDBIndex(
                fields=[
                    'name',
                    'street',
                ],
                field_mapping={
                    'name': {"type": "text",
                             "copy_to": "zdb_all",
                             "analyzer": "fulltext_with_shingles",
                             "search_analyzer": "fulltext_with_shingles_search"},
                    'street': {"type": "text",
                               "copy_to": "zdb_all",
                               "analyzer": "brazilian"},
                }
            )
        ]
Note
You probably wish to have "copy_to": "zdb_all" on your textual fields to match ZomboDB default behavior. From ZomboDB docs: “zdb_all is ZomboDB’s version of Elasticsearch’s “_all” field, except zdb_all is enabled for all versions of Elasticsearch. It is also configured as the default search field for every ZomboDB index”. For more info, read what the Elasticsearch docs say about the “_all” field.
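If you have many text fields, adding "copy_to": "zdb_all" by hand is easy to forget. The helper below is a minimal sketch of one way to apply it consistently before passing the dict as field_mapping. The function name with_zdb_all and the rating field are hypothetical and not part of django-zombodb.

```python
import copy


def with_zdb_all(field_mapping):
    """Return a copy of field_mapping where every text field copies to zdb_all."""
    result = copy.deepcopy(field_mapping)
    for options in result.values():
        if options.get('type') == 'text':
            # Keep an explicit copy_to if the caller already set one.
            options.setdefault('copy_to', 'zdb_all')
    return result


mapping = with_zdb_all({
    'name': {'type': 'text', 'analyzer': 'fulltext_with_shingles'},
    'street': {'type': 'text', 'copy_to': 'zdb_all', 'analyzer': 'brazilian'},
    'rating': {'type': 'integer'},  # hypothetical non-text field, left untouched
})
print(mapping['name'])
```

The input dict is not modified, so the same base mapping can be reused elsewhere.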
Move forward to learn how to perform Elasticsearch queries through your model.
Searching¶
On models with ZomboDBIndex, use methods from SearchQuerySet/SearchQuerySetMixin to perform various kinds of Elasticsearch queries:
query_string_search¶
The query_string_search() method implements the simplest type of Elasticsearch query: one written in the query string syntax. To use it, pass a string that follows the query string syntax as an argument.
Restaurant.objects.query_string_search("brasil~ AND steak*")
dsl_search¶
The query string syntax is user-friendly, but it’s limited. To support all kinds of Elasticsearch queries, the recommended way is to use the dsl_search() method. It accepts elasticsearch-dsl-py Query objects as arguments. Those objects have the same expressive power as the Elasticsearch JSON Query DSL. You can do “match”, “term”, and even compound queries like “bool”.
Here we’re using the elasticsearch-dsl-py Q shortcut to create Query objects:
from elasticsearch_dsl import Q as ElasticsearchQ

query = ElasticsearchQ(
    'bool',
    must=[
        ElasticsearchQ('match', name='pizza'),
        ElasticsearchQ('match', street='school')
    ]
)
Restaurant.objects.dsl_search(query)
dict_search¶
If you already have an Elasticsearch JSON query built as a dict, use the dict_search() method. The dict will be serialized using the JSONSerializer of elasticsearch-py, the official Python Elasticsearch client. This means dict values of date, datetime, Decimal, and UUID types will be correctly serialized.
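As an illustration, the “bool” query from the dsl_search section can be written as a plain dict. This sketch assumes dict_search() takes the query clause itself, without an outer "query" key, mirroring what a Query object represents. The final call is left as a comment because it needs a configured Django project with the Restaurant model.

```python
import json

# The same "bool" query used in the dsl_search example, written as a plain dict.
query = {
    'bool': {
        'must': [
            {'match': {'name': 'pizza'}},
            {'match': {'street': 'school'}},
        ]
    }
}

# Printing the JSON form makes it easy to compare with the Elasticsearch docs.
print(json.dumps(query, indent=2, sort_keys=True))

# In a project with the Restaurant model above, the call would be:
# Restaurant.objects.dict_search(query)
```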
Validation¶
If you’re receiving queries from the end-user, particularly query string queries, you should call the search methods with validate=True. This will perform Elasticsearch-side validation through the Validate API. When doing that, InvalidElasticsearchQuery may be raised.
from django.contrib import messages

from django_zombodb.exceptions import InvalidElasticsearchQuery

# Inside a view, where `request` is the current HttpRequest:
queryset = Restaurant.objects.all()
try:
    queryset = queryset.query_string_search("AND steak*", validate=True)
except InvalidElasticsearchQuery:
    messages.error(request, "Invalid search query. Not filtering by search.")
Sorting by score¶
By default, the resulting queryset from the search methods is unordered. You can get results ordered by Elasticsearch’s score by passing sort=True.
Restaurant.objects.query_string_search("brasil~ AND steak*", sort=True)
Alternatively, if you want to combine the score with your own order_by, you can use the method annotate_score():
Restaurant.objects.query_string_search(
    "brazil* AND steak*"
).annotate_score(
    attr='zombodb_score'
).order_by('-zombodb_score', 'name', 'pk')
Limiting¶
It’s good practice to set a hard limit on the number of search results. For most search use cases, you shouldn’t need more than a certain number of results, either because users will only consume some of the high-scoring results, or because documents with lower scores aren’t relevant to your process. To limit the results, use the limit parameter on search methods:
Restaurant.objects.query_string_search("brasil~ AND steak*", limit=1000)
Lazy and Chainable¶
The search methods are like the traditional filter method: they return a regular Django QuerySet that supports all operations, and that’s lazy and chainable. Therefore, you can do things like:
Restaurant.objects.filter(
    name__startswith='Pizza'
).query_string_search(
    'name:Hut'
).filter(
    street__contains='Road'
)
Warning
It’s fine to call filter/exclude/etc. before and after search. If possible, the best option is to use only an Elasticsearch query. However, it’s definitely slow to call search methods multiple times on the same queryset! Please avoid this:
Restaurant.objects.query_string_search(
    'name:Pizza'
).query_string_search(
    'name:Hut'
)
While that may work as expected, it’s extremely inefficient. Instead, use compound queries like “bool”. They’ll be much faster. Note that “bool” queries might be quite confusing to implement. Check tutorials about them, like this one.
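To make the alternative concrete, the two chained searches above can be folded into a single “bool” query whose “must” clause holds both conditions. The sketch below builds that query as a dict. The helper name must_all is hypothetical, and the final call is left as a comment because it needs a configured Django project.

```python
def must_all(*queries):
    """Combine several query dicts into one "bool" query (logical AND)."""
    return {'bool': {'must': list(queries)}}


combined = must_all(
    {'query_string': {'query': 'name:Pizza'}},
    {'query_string': {'query': 'name:Hut'}},
)
print(combined)

# In a project with the Restaurant model above, a single search call suffices:
# Restaurant.objects.dict_search(combined)
```

For this particular case a single query string such as 'name:Pizza AND name:Hut' would also express the same condition; the “bool” form is more useful when the clauses are of different query types.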
Missing features¶
Currently django-zombodb doesn’t support ZomboDB’s offset and sort functions that work on the Elasticsearch side. Regular SQL LIMIT/OFFSET/ORDER BY works fine, so traditional QuerySet operations work, but they aren’t as performant as doing the same on the Elasticsearch side.
django_zombodb package¶
Submodules¶
django_zombodb.admin_mixins module¶
django_zombodb.apps module¶
django_zombodb.base_indexes module¶
django_zombodb.exceptions module¶
django_zombodb.helpers module¶
django_zombodb.indexes module¶
class django_zombodb.indexes.ZomboDBIndex(*, shards=None, replicas=None, alias=None, refresh_interval=None, type_name=None, bulk_concurrency=None, batch_size=None, compression_level=None, llapi=None, field_mapping=None, **kwargs)
Bases: django.contrib.postgres.indexes.PostgresIndex
suffix = 'zombodb'
django_zombodb.operations module¶
django_zombodb.querysets module¶
class django_zombodb.querysets.SearchQuerySet(model=None, query=None, using=None, hints=None)
Bases: django_zombodb.querysets.SearchQuerySetMixin, django.db.models.query.QuerySet
django_zombodb.serializers module¶
Module contents¶
Change Log¶
0.3.0 (2019-07-18)¶
Support for custom Elasticsearch mappings through the field_mapping parameter on ZomboDBIndex.
Support for the limit parameter on search methods.
0.2.1 (2019-06-13)¶
Dropped support for Python 3.4.
Added missing imports to docs.
0.2.0 (2019-03-01)¶
Removed parameter url from ZomboDBIndex. This simplifies the support of multiple deployment environments (local, staging, production), because the Elasticsearch URL isn’t copied into the migrations code (see Issue #17).
0.1.0 (2019-02-01)¶
First release on PyPI.
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Types of Contributions¶
Report Bugs¶
Report bugs at https://github.com/vintasoftware/django-zombodb/issues. Please fill in the fields of the issue template.
Fix Bugs¶
Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to implement it.
Implement Features¶
Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.
Write Documentation¶
django-zombodb could always use more documentation, whether as part of the official django-zombodb docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback¶
The best way to send feedback is to file an issue at https://github.com/vintasoftware/django-zombodb/issues.
If you are proposing a feature:
Explain in detail how it would work.
Keep the scope as narrow as possible, to make it easier to implement.
Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!¶
Ready to contribute? Here’s how to set up django-zombodb for local development.
Fork the django-zombodb repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/django-zombodb.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv django-zombodb
$ cd django-zombodb/
$ pip install -e .
$ make install_requirements
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass the linters and the tests, including testing other Python versions with tox:
$ make lint
$ make test
$ make test-all
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines¶
Before you submit a pull request, check that it meets these guidelines:
The pull request should include tests.
If the pull request adds functionality, the docs should be updated.
The pull request should pass CI. Check https://travis-ci.org/vintasoftware/django-zombodb/pull_requests and make sure that the tests pass for all supported Python versions.
Credits¶
Development Lead¶
Flávio Juvenal <flavio@vinta.com.br>
Contributors¶
None yet. Why not be the first?