Architecture
The trickiest thing to get right in Chronograph is properly managing the state
of a Job, i.e. reliably determining whether a job is running or whether it has
been killed or terminated prematurely. In the first version of Chronograph this
issue was “solved” by keeping track of the PID of each running job and using
the ps command to have the operating system tell us if the job was still
running. However, this route was less than ideal for a few reasons, most
importantly because it wasn’t cross-platform. Additionally, using a series of
subprocess.Popen calls was leading to path-related issues for some users, even
on “supported” platforms.
Newer versions of Chronograph have attempted to solve this problem in the
following way:
- Get a list of Jobs that are “due”
- For each Job, launch a multiprocessing.Process instance, which internally calls django.core.management.call_command
- When the Job is run, we spawn a threading.Thread instance whose sole purpose is to keep track of a lock file. This thread exists only while the Job is running and updates the file every second. We store the path to this temporary file (an instance of tempfile.NamedTemporaryFile) on the Job model (which is then stored in the database).

When we want to check if a Job is running we do the following:
- If is_running equals True, and lock_file points to a file, then:
  - If the lock file actually exists and has been updated within the last CHRONOGRAPH_LOCK_TIMEOUT seconds, then we can assume that the Job is still running
  - Else we assume the Job is not running and update the database accordingly
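
The check above can be sketched roughly as follows. This is a minimal illustration, not Chronograph's actual code: the function name job_is_running, its parameters, and the LOCK_TIMEOUT default of 5 seconds are all assumptions made for the example. Only the is_running and lock_file fields and the CHRONOGRAPH_LOCK_TIMEOUT setting come from the description above.

```python
import os
import time

# Hypothetical stand-in for the CHRONOGRAPH_LOCK_TIMEOUT setting (in seconds).
LOCK_TIMEOUT = 5


def job_is_running(is_running, lock_file, timeout=LOCK_TIMEOUT, now=None):
    """Return True only if the job's lock file was updated recently.

    is_running and lock_file mirror the fields stored on the Job model.
    """
    if not (is_running and lock_file):
        return False
    try:
        last_update = os.path.getmtime(lock_file)
    except OSError:
        # The lock file no longer exists, so the job cannot still be running.
        return False
    now = time.time() if now is None else now
    # A fresh modification time means the lock file thread is still alive.
    return (now - last_update) < timeout
```

When this returns False for a Job whose is_running flag is still True, the caller would reset the flag and save the model, which is the "update the database accordingly" step.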
This new method should work much more reliably across all platforms that support the threading and multiprocessing libraries.
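
For illustration, the lock file thread described earlier might look something like the sketch below. The class name LockFileHeartbeat, its stop method, and the run_job helper are hypothetical names introduced for this example, and the real implementation may differ. The sketch assumes that "updating" the file means refreshing its modification time once per interval.

```python
import os
import tempfile
import threading


class LockFileHeartbeat(threading.Thread):
    """Refresh a temporary lock file's modification time until stopped."""

    def __init__(self, interval=1.0):
        super().__init__(daemon=True)
        self.interval = interval
        self._stop_event = threading.Event()
        # delete=False keeps the file on disk after our handle is closed.
        handle = tempfile.NamedTemporaryFile(delete=False)
        handle.close()
        self.path = handle.name

    def run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop_event.wait(self.interval):
            os.utime(self.path, None)  # set the mtime to the current time

    def stop(self):
        """Stop the thread (which must have been started) and remove the file."""
        self._stop_event.set()
        self.join()
        os.remove(self.path)


def run_job(job_callable):
    """Run job_callable while a heartbeat thread keeps its lock file fresh."""
    heartbeat = LockFileHeartbeat()
    heartbeat.start()
    try:
        # Real code would also store heartbeat.path in the Job's lock_file field.
        return job_callable()
    finally:
        heartbeat.stop()
```

Because the thread is a daemon and only runs while the job does, a process that is killed outright simply stops refreshing the file, and the staleness check then reports the job as not running.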