Welcome to Data Hub API’s documentation!¶
Data Hub API Overview¶
API for the UKTI Data Hub.
Official docs on Read the Docs
Dependencies¶
- Virtualenv
- Most recent version of pip
- Python 3.5 (can be installed using brew)
- Postgres 9.5+
Installation¶
Clone the repository:
git clone git@github.com:UKTradeInvestment/data-hub-api.git
Next, create the environment and start it up:
cd data-hub-api
virtualenv env --python=python3.5
source env/bin/activate
Update pip to the latest version:
pip install -U pip
Install python dependencies:
pip install -r requirements/local.txt
Create a database called data-hub-api in Postgres.
On OS X, update the PATH and DYLD_LIBRARY_PATH environment variables if necessary:
export PATH="/Applications/Postgres.app/Contents/MacOS/bin/:$PATH"
export DYLD_LIBRARY_PATH="/Applications/Postgres.app/Contents/MacOS/lib/:$DYLD_LIBRARY_PATH"
Create a local.py settings file from the example file and set the CDMS settings/credentials:
cp data-hub-api/settings/local.example.py data-hub-api/settings/local.py
Sync and migrate the database:
./manage.py migrate
Start the server:
./manage.py runserver 8000
CDMS Sync¶
The problem, the options and our approach¶
The problem¶
We are migrating away from Microsoft Dynamics 2011 (CDMS) and decided to build a new CRM system (Data Hub) using a gradual incremental approach.
During a period of several months, the following constraints apply:
- data between CDMS and Data Hub needs to be kept in sync
- Data Hub needs to allow re-modelling by adding/removing types/properties
- some users would continue to use CDMS whilst we transition from one system to the other
The options¶
We considered different approaches including:
- use CDMS as data store and access it directly. This has many disadvantages including hosting CDMS, not being able to easily change the schemas, architecture complexity etc.
- use two data stores with some sort of low-level synchronisation (via database or processes). This also has many disadvantages, including integrating with old technologies (Dynamics 2011), two separate layers (code and sync logic) that depend tightly on each other and are hard to manage, synchronisation conflicts etc.
- use two data stores with code-managed synchronisation. This is the chosen architecture; it also has some disadvantages, which are described under Limitations.
The chosen approach¶
Two data stores, with reads and writes to CDMS happening as usual and synchronisation triggered by I/O actions in Data Hub.
- Writes to Data Hub will:
  - get the object from CDMS (if it exists)
  - apply the changes and write to CDMS
  - apply the changes in Data Hub
- Reads from Data Hub will:
  - get the object from the Data Hub data store
  - get the related object from CDMS
  - check if CDMS was updated after the last synchronisation
    - if so, update the Data Hub object
  - return the local results
Read and write operations are performed as a single transaction so that changes are rolled back in case of exceptions with CDMS.
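The write path and its rollback behaviour can be sketched in plain Python. This is only an illustration under stated assumptions: FakeCDMS, CDMSError, local_transaction and write_object are hypothetical names, the stores are in-memory dictionaries, and the real code relies on database transactions rather than on a copied snapshot.

```python
import copy
from contextlib import contextmanager


class CDMSError(Exception):
    """Stands in for any failure while talking to CDMS."""


class FakeCDMS:
    """In-memory stand-in for the CDMS API (hypothetical, illustration only)."""

    def __init__(self, fail_on_write=False):
        self.objects = {}
        self.fail_on_write = fail_on_write

    def get(self, key):
        return self.objects.get(key)

    def write(self, key, data):
        if self.fail_on_write:
            raise CDMSError("CDMS rejected the write")
        self.objects[key] = dict(data)


@contextmanager
def local_transaction(store):
    """Restore `store` to its previous content if the block raises."""
    snapshot = copy.deepcopy(store)
    try:
        yield
    except Exception:
        store.clear()
        store.update(snapshot)
        raise


def write_object(local_store, cdms, key, changes):
    """Apply `changes` to CDMS and to the local store, or to neither."""
    with local_transaction(local_store):
        # 1. get the object from CDMS (if it exists)
        remote_obj = dict(cdms.get(key) or {})
        # 2. apply the changes and write to CDMS
        remote_obj.update(changes)
        cdms.write(key, remote_obj)
        # 3. apply the changes in the local (Data Hub) store
        local_obj = dict(local_store.get(key, {}))
        local_obj.update(changes)
        local_store[key] = local_obj
        return local_obj
```

If the CDMS write raises, the exception propagates to the caller and the local store is left exactly as it was before the call.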
The same object on both systems is considered in sync if the value of the modified field is the same. If the modified value of the CDMS version is more recent, the Data Hub object has to be updated from the CDMS one. If the modified value of the Data Hub version is more recent, an exception is raised, as this should never happen: writes on the Data Hub always generate writes in CDMS, but not vice versa.
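The comparison rule above can be written as a small function. This is a minimal sketch, assuming each object is a dictionary with a modified datetime; refresh_local and OutOfSyncError are hypothetical names, not part of the project.

```python
from datetime import datetime


class OutOfSyncError(Exception):
    """The local object is newer than the CDMS one, which should never happen."""


def refresh_local(local_obj, cdms_obj):
    """Return the object a read should hand back, following the `modified` rule."""
    if local_obj["modified"] == cdms_obj["modified"]:
        return local_obj  # same value: the two objects are in sync
    if cdms_obj["modified"] > local_obj["modified"]:
        return dict(cdms_obj)  # CDMS changed since the last sync: take its values
    raise OutOfSyncError("local object is newer than the CDMS one")
```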
The possibility of conflicts is low as:
- objects on the two systems are kept in sync via the modified field updated after each CDMS get
- concurrent operations on a single object are rare or non-existent
In case two updates happen at approximately the same time, the last one wins. This should not be a problem as the system keeps a history of the changes.
Limitations¶
There are some limitations in using this approach:
- Number of requests. This has not been measured yet but could (and should) be partially addressed with a caching strategy
- The synchronisation happens using one common CDMS user
- Some Django ORM API calls cannot be easily implemented, e.g. Model.objects.count() or Model.objects.filter(field1__field2='something'). This is mainly because of the old CDMS technologies
- It might not be easy to change the Django schema in many cases as the sync layer prefers a one-to-one mapping.
Integration instructions¶
How it works¶
A custom Django manager/queryset intercepts reads and writes and takes care of all the CDMS operations. This means that developers can ignore this extra complexity and use the Django ORM API as usual.
That being said, only a subset of the Django ORM API has been implemented, and some calls cannot be supported at all. Check Django ORM integration for the full list of supported ORM calls.
Project setup¶
The cdms_api app contains the CDMS API library whilst the migrator app defines all the code needed for the synchronisation.
Note
It’s really important that you keep all the logic related to the CDMS sync in one place and to a minimum so that it’s easy to get rid of it when it’s time to shut down CDMS and delete the sync layer altogether.
For this reason, we decided to keep this logic in the migrator app and in one single file per Django app (conventionally called cdms_migrator.py).
Your app and your Django model¶
1. Set up Django app/model
   Create a new Django app or simply a new Django Model as needed.
2. CDMSMigrator
   In a module called <your-app>/cdms_migrator.py, subclass migrator.cdms_migrator.BaseCDMSMigrator and define the mapping fields and the CDMS service. Add the CDMSMigrator to the Django Model as per step 3.
3. Configure your model
   Change your model so that it looks like the one below:
   Note
   For foreign key fields, you should use core.fields.UKTIForeignKey instead of the Django one.

   from reversion import revisions as reversion
   from django.db import models
   from core.models import CRMBaseModel
   from core.managers import CRMManager
   from .cdms_migrator import MyModelMigrator

   @reversion.register()
   class MyModel(CRMBaseModel):
       ....
       objects = CRMManager()
       cdms_migrator = MyModelMigrator()

4. Create a migration for your model as usual
   ./manage.py makemigrations
   ./manage.py migrate
CDMSMigrator¶
The mapping between your model and the CDMS one is defined in your model’s CDMSMigrator class, which should be in <your-app>/cdms_migrator.py.
Extend the migrator.cdms_migrator.BaseCDMSMigrator class and define the fields and service attributes.
For example:
from cdms_api import fields as cdms_fields
from migrator.cdms_migrator import BaseCDMSMigrator

class OrganisationMigrator(BaseCDMSMigrator):
    fields = {
        'name': cdms_fields.StringField('Name'),
        'uk_organisation': cdms_fields.BooleanField('optevia_ukorganisation'),
        ...
    }
    service = 'Account'  # this is the Dynamics resource name
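To make the role of the fields mapping more concrete, here is a self-contained sketch of how such a mapping could translate data in both directions. The classes below are hypothetical stand-ins written for this example only; the real cdms_api.fields classes and BaseCDMSMigrator may behave differently.

```python
class StringField:
    """Hypothetical stand-in: maps a local value to a CDMS string property."""

    def __init__(self, cdms_name):
        self.cdms_name = cdms_name

    def to_cdms(self, value):
        return None if value is None else str(value)

    def from_cdms(self, value):
        return value


class BooleanField(StringField):
    """Hypothetical stand-in: maps a local value to a CDMS boolean property."""

    def to_cdms(self, value):
        return None if value is None else bool(value)


class SketchMigrator:
    """Translates between local field names and CDMS property names."""

    fields = {}
    service = None

    def to_cdms(self, local_data):
        return {
            field.cdms_name: field.to_cdms(local_data[name])
            for name, field in self.fields.items()
            if name in local_data
        }

    def from_cdms(self, cdms_data):
        return {
            name: field.from_cdms(cdms_data[field.cdms_name])
            for name, field in self.fields.items()
            if field.cdms_name in cdms_data
        }


class OrganisationSketch(SketchMigrator):
    fields = {
        "name": StringField("Name"),
        "uk_organisation": BooleanField("optevia_ukorganisation"),
    }
    service = "Account"
```

Keeping the mapping declarative, as the fields dictionary does, means the translation logic lives in one place and can be deleted along with the sync layer.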
Django ORM integration¶
Operations that cause synchronisation¶
.filter(...) operations make a CDMS API call to get the CDMS objects with the same translated filtering, refresh the local objects by updating or creating them and then return the standard Django results.
.get(...) operations get the object locally and in CDMS, compare the two, update the local one if needed and then return the standard Django result.
.create(...) or .save() operations create the object locally and in CDMS. In case of exceptions with CDMS the local changes are rolled back.
.save() operations update the object locally and in CDMS. In case of exceptions with CDMS the local changes are rolled back.
.delete() operations delete the object locally and in CDMS. In case of exceptions with CDMS the local changes are rolled back.
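As an illustration of what "translated filtering" could look like, the sketch below turns Django-style lookups into an OData-style filter expression. It rests on assumptions: that the CDMS API accepts OData $filter syntax, and that a mapping from local field names to CDMS property names is available. translate_lookup, translate_filter and the Employees property are hypothetical and are not taken from the project.

```python
COMPARISONS = {"exact": "eq", "gt": "gt", "gte": "ge", "lt": "lt", "lte": "le"}
FUNCTIONS = {
    "contains": "substringof({value}, {field})",
    "startswith": "startswith({field}, {value})",
    "endswith": "endswith({field}, {value})",
}


def _literal(value):
    """Render a Python value as an OData literal (strings, numbers, booleans)."""
    if isinstance(value, bool):  # checked first because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def translate_lookup(lookup, value, field_map):
    """Translate one `field__lookup=value` pair into a filter fragment."""
    name, _, operator = lookup.partition("__")
    operator = operator or "exact"
    cdms_field = field_map[name]
    literal = _literal(value)
    if operator in COMPARISONS:
        return "{} {} {}".format(cdms_field, COMPARISONS[operator], literal)
    if operator in FUNCTIONS:
        return FUNCTIONS[operator].format(field=cdms_field, value=literal)
    raise NotImplementedError("lookup '{}' is not supported".format(operator))


def translate_filter(field_map, **lookups):
    """Join all lookups with `and`; sorted so the output is deterministic."""
    return " and ".join(
        translate_lookup(lookup, value, field_map)
        for lookup, value in sorted(lookups.items())
    )
```

Lookups without an equivalent on the remote side raise an error here, which mirrors the way unsupported calls are marked with ✘ in the tables that follow.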
✔ Supported ✘ Not supported
API | Description |
---|---|
✔ Klass.objects.filter(field__exact=...) | |
✔ Klass.objects.filter(field__iexact=...) | |
✔ Klass.objects.filter(field__contains=...) | |
✔ Klass.objects.filter(field__icontains=...) | |
✘ Klass.objects.filter(field__in=...) | |
✔ Klass.objects.filter(field__gt=...) | |
✔ Klass.objects.filter(field__gte=...) | |
✔ Klass.objects.filter(field__lt=...) | |
✔ Klass.objects.filter(field__lte=...) | |
✔ Klass.objects.filter(field__startswith=...) | |
✔ Klass.objects.filter(field__istartswith=...) | |
✔ Klass.objects.filter(field__endswith=...) | |
✔ Klass.objects.filter(field__iendswith=...) | |
✘ Klass.objects.filter(field__range=...) | |
✔ Klass.objects.filter(field__year=...) | |
✔ Klass.objects.filter(field__day=...) | |
✘ Klass.objects.filter(field__week_day=...) | |
✔ Klass.objects.filter(field__hour=...) | |
✔ Klass.objects.filter(field__minute=...) | |
✔ Klass.objects.filter(field__second=...) | |
✘ Klass.objects.filter(field__isnull=...) | Not yet implemented but we should really support it. |
✘ Klass.objects.filter(field__search=...) | |
✘ Klass.objects.filter(field__regex=...) | |
✘ Klass.objects.filter(field__iregex=...) |
API | Description |
---|---|
✔ Klass.objects.all() | It only syncs the top 50 objects from CDMS as it would be infeasible to sync all of them. |
✔ Klass.objects.filter(field=...) | |
✔ Klass.objects.filter(Q(field=...)) | |
✔ Klass.objects.filter(field1=..., field2=...) | |
✔ Klass.objects.filter(Q(field1=...) & Q(field2=...)) | |
✔ Klass.objects.filter(Q(field1=...) | Q(field2=...)) | |
✔ Klass.objects.filter(field1=...).filter(field2=...) | |
✔ Klass.objects.filter(Q(Q(field1=...) & Q(field2=...)) & Q(field3=...)) | |
✔ Klass.objects.exclude(field=...) | |
✔ Klass.objects.exclude(field1=..., field2=...) | |
✔ Klass.objects.exclude(field1=...).exclude(field2=...) | |
✔ Klass.objects.exclude(Q(field1=...) | Q(field2=...)) | |
✔ Klass.objects.exclude(Q(field1=...) & Q(field2=...)) | |
✔ Klass.objects.filter(field1=...).exclude(field2=...) | |
✔ Klass.objects.filter(Q(field1=...) | Q(field2=...)).exclude(Q(field3=...) & Q(field4=...)) |
API | Description |
---|---|
✔ Klass.objects.all().order_by('field') | |
✔ Klass.objects.all().order_by('-field') | |
✔ Klass.objects.all().order_by('field1', '-field2') | |
✘ Klass.objects.all().order_by('?') | |
API | Description |
---|---|
✔ Klass.objects.get(pk=...) | Gets the local object and the CDMS one, compares the two and, if necessary, updates the local object before returning it |
✔ Klass.objects.get(cdms_pk=...) | Gets the object locally or, if it doesn’t exist locally, from CDMS; updates or creates the local object if necessary before returning it |
✔ Klass.objects.get(field=...) | Like .get(pk=...) |
API | Description |
---|---|
✔ obj = Klass(field=...); obj.save() | |
✔ Klass.objects.create(field=...) | |
✘ Klass.objects.bulk_create(...) |
API | Description |
---|---|
✔ obj.save() | |
✘ Klass.objects.filter(field=...).update(...) | |
✘ Klass.objects.select_for_update(...) |
API | Description |
---|---|
✔ obj.delete() | |
✘ Klass.objects.filter(field=...).delete() |
API | Description |
---|---|
✘ Klass.objects.annotate(...) | |
✘ Klass.objects.reverse(...) | |
✘ Klass.objects.distinct(...) | |
✘ Klass.objects.values(...) | |
✘ Klass.objects.values_list(...) | |
✘ Klass.objects.dates(...) | |
✘ Klass.objects.datetimes(...) | |
✔ Klass.objects.none() | |
✘ Klass.objects.select_related(...) | |
✘ Klass.objects.prefetch_related(...) | |
✘ Klass.objects.extra(...) | |
✘ Klass.objects.defer(...) | |
✘ Klass.objects.only(...) | |
✘ Klass.objects.raw(...) | |
✘ Klass.objects.get_or_create(...) | |
✘ Klass.objects.update_or_create(...) | |
✘ Klass.objects.count(...) | |
✘ Klass.objects.in_bulk(...) | |
✘ Klass.objects.latest(...) | |
✘ Klass.objects.earliest(...) | |
✘ Klass.objects.first(...) | |
✘ Klass.objects.last(...) | |
✘ Klass.objects.aggregate(...) | |
✘ Klass.objects.exists(...) |
Operations that skip synchronisation¶
Most of the time, you can skip CDMS operations by using the skip_cdms() method on the manager or the skip_cdms param on the save/delete methods.
Note
Do not skip the CDMS operations when writing, as the objects would then become out of sync. If this is really required, we may need to rename the modified field to something like cdms_modified and have a separate one for modified.
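The effect of skip_cdms() on the manager can be pictured with a toy class. This is not the project's CRMManager: SketchManager is a hypothetical name, and it only records where a CDMS call would happen so that the difference between the two code paths is visible.

```python
class SketchManager:
    """Toy manager that records CDMS calls instead of performing them."""

    def __init__(self, local_rows, cdms_calls, skip=False):
        self._rows = local_rows
        self._calls = cdms_calls
        self._skip = skip

    def skip_cdms(self):
        # Return a copy of the manager that never touches CDMS.
        return SketchManager(self._rows, self._calls, skip=True)

    def filter(self, **conditions):
        if not self._skip:
            # This is where the real sync with CDMS would take place.
            self._calls.append(("filter", conditions))
        return [
            row for row in self._rows
            if all(row.get(key) == value for key, value in conditions.items())
        ]
```

Returning a new manager from skip_cdms(), rather than flipping a flag on the shared one, keeps the default behaviour (always sync) intact for every other caller.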
✔ Supported ✘ Not supported
API | Description |
---|---|
✔ Klass.objects.skip_cdms().all() | |
✔ Klass.objects.skip_cdms().filter(...) | |
✔ Klass.objects.skip_cdms().exclude(...) | |
✔ Klass.objects.skip_cdms().all().order_by(...) |
API | Description |
---|---|
✔ Klass.objects.skip_cdms().get() |
API | Description |
---|---|
✔ obj = Klass(field=...); obj.save(skip_cdms=True) | |
✔ Klass.objects.skip_cdms().create(field=...) | |
✔ Klass.objects.skip_cdms().bulk_create(...) | |
API | Description |
---|---|
✔ obj.save(skip_cdms=True) | |
✔ Klass.objects.skip_cdms().filter(field=...).update(...) | |
✔ Klass.objects.skip_cdms().select_for_update(...) |
API | Description |
---|---|
✔ obj.delete(skip_cdms=True) | |
✔ Klass.objects.skip_cdms().filter(field=...).delete() |
API | Description |
---|---|
✔ Klass.objects.skip_cdms().annotate(...) | |
✔ Klass.objects.skip_cdms().reverse(...) | |
✔ Klass.objects.skip_cdms().distinct(...) | |
✔ Klass.objects.skip_cdms().values(...) | |
✔ Klass.objects.skip_cdms().values_list(...) | |
✔ Klass.objects.skip_cdms().dates(...) | |
✔ Klass.objects.skip_cdms().datetimes(...) | |
✔ Klass.objects.skip_cdms().none() | |
✔ Klass.objects.skip_cdms().select_related(...) | |
✘ Klass.objects.skip_cdms().prefetch_related(...) | |
✔ Klass.objects.skip_cdms().extra(...) | |
✔ Klass.objects.skip_cdms().defer(...) | |
✔ Klass.objects.skip_cdms().only(...) | |
✔ Klass.objects.skip_cdms().raw(...) | |
✔ Klass.objects.skip_cdms().get_or_create(...) | |
✔ Klass.objects.skip_cdms().update_or_create(...) | |
✔ Klass.objects.skip_cdms().count(...) | |
✔ Klass.objects.skip_cdms().in_bulk(...) | |
✔ Klass.objects.skip_cdms().latest(...) | |
✔ Klass.objects.skip_cdms().earliest(...) | |
✔ Klass.objects.skip_cdms().first(...) | |
✔ Klass.objects.skip_cdms().last(...) | |
✔ Klass.objects.skip_cdms().aggregate(...) | |
✔ Klass.objects.skip_cdms().exists(...) |
Revisions¶
We use django-reversion for creating revisions and versions.
How django-reversion works¶
django-reversion uses revisions and versions.
Revisions are blocks of code where some changes happen. One or more objects could potentially change in the same block.
Versions are changes to an object in a given revision. Versions always have a foreign key to the related revision.
Revisions can have the following metadata:
- user: who made the changes
- comment: optional text
Metadata has to be set manually for obvious reasons.
django-reversion is usually integrated in one of these ways:
- via the admin integration so that every time a user uses the admin, changes are saved automatically
- via an explicit context manager with the possibility to set metadata programmatically
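The relationship between revisions, versions and metadata can be modelled in a few lines of plain Python. This is a toy model for illustration only, not django-reversion's implementation or API; History, Revision, Version and record are hypothetical names, and create_revision merely echoes the idea of an explicit context manager.

```python
from contextlib import contextmanager


class Revision:
    """A block of changes, with optional metadata."""

    def __init__(self, user=None, comment=""):
        self.user = user
        self.comment = comment
        self.versions = []


class Version:
    """The state of one object within a given revision."""

    def __init__(self, revision, object_id, data):
        self.revision = revision  # plays the role of the foreign key
        self.object_id = object_id
        self.data = dict(data)


class History:
    def __init__(self):
        self.revisions = []
        self._current = None

    @contextmanager
    def create_revision(self, user=None, comment=""):
        revision = Revision(user, comment)
        self._current = revision
        try:
            yield revision
        finally:
            self._current = None
        self.revisions.append(revision)  # reached only if the block did not raise

    def record(self, object_id, data):
        if self._current is None:
            raise RuntimeError("record() must be called inside create_revision()")
        version = Version(self._current, object_id, data)
        self._current.versions.append(version)
        return version
```

Several objects changed inside the same block end up as separate versions that all point to one revision, which is where the user and comment metadata live.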
How django-reversion is used¶
As we wanted to create revisions/versions automatically and not lose any changes, we implemented django-reversion at a lower level.
In our system we have 2 types of changes:
- CDMS refresh changes: where we refresh a local object (update or create) from CDMS. This happens automatically by creating a version of the object with the comment CDMS refresh.
- local changes: where we make a change to the objects of our system. This happens automatically every time the .save() method is called.
Note
As we can’t access the user automatically, we are currently not setting the related metadata on the revision. We need to look into this; it might just be a matter of using the context manager in API views.
Shutting down CDMS¶
If you are reading this, it is probably time to shut down CDMS and get rid of the whole sync layer. Congratulations and well done!
Hopefully, the past developers made your life easier and removing all dependencies means that you only need to:
- change core.models.CRMBaseModel so that it extends core.lib_models.TimeStampedModel instead of migrator.models.CDMSModel
- change core.models.managers.CDMSManager so that it extends the django default manager instead of migrator.managers.CDMSManager
- delete the migrator and the cdms_api apps
- delete the cdms_migrator file in every django app
- clean up the settings with all unused values
- run makemigrations and migrate