Welcome to Lobster’s documentation!¶
Contents:
Installation¶
Warning
These steps should always be performed in a valid CMSSW environment (after executing cmsenv)! Lobster needs Python 2.7 or greater (but not Python 3). CMSSW version 8_0_15 or above is recommended.
Prerequisites¶
Check your Python version after running cmsenv:
$ python -V
Python 2.7.11
Dependencies¶
CClab tools
Download the most recent version of the cctools from the Notre Dame Cooperative Computing Lab and install them by unpacking the tarball and adding the bin directory to your path:
export cctools=lobster-160-679ce223-cvmfs-70dfa0d6
wget -O - http://ccl.cse.nd.edu/software/files/${cctools}-x86_64-redhat6.tar.gz | tar xzf -
export PATH=$PWD/${cctools}-x86_64-redhat6/bin:$PATH
export PYTHONPATH=$PWD/${cctools}-x86_64-redhat6/lib/python2.6/site-packages:$PYTHONPATH
Note
At Notre Dame, a development version can be accessed via:
export cctools=lobster-160-679ce223-cvmfs-70dfa0d6
export PYTHONPATH=$PYTHONPATH:/afs/crc.nd.edu/group/ccl/software/x86_64/redhat6/cctools/$cctools/lib/python2.6/site-packages
export PATH=/afs/crc.nd.edu/group/ccl/software/x86_64/redhat6/cctools/$cctools/bin:$PATH
For tcsh users these lines have to be adapted [1]. You might want to add these lines to the shell startup file.
Setuptools
Install the python package manager pip, if not already present (it may also be installed as pip2.7), with:
wget -O - https://bootstrap.pypa.io/get-pip.py | python - --user
This installs pip in your ~/.local directory. In order to access these executables, add them to your path with:
export PATH=$HOME/.local/bin:$PATH
Setup¶
It is recommended to run Lobster in a virtualenv, which keeps all of Lobster's dependencies within one directory and prevents interference with other python packages and their dependencies (e.g. CRAB3):
pip install --user virtualenv
virtualenv --system-site-packages ~/.lobster
Then activate the virtualenv. This step has to be done every time Lobster is run, to set the right paths for its dependencies:
. ~/.lobster/bin/activate
To exit the virtualenv, use:
deactivate
Installation as package¶
Install Lobster with:
wget -O - https://raw.githubusercontent.com/NDCMS/lobster/master/install_dependencies.sh|sh -
pip install https://github.com/NDCMS/lobster/tarball/master
Installation from source¶
Lobster can also be installed from a local checkout, which will allow for easy modification of the source:
git clone git@github.com:NDCMS/lobster.git
cd lobster
./install_dependencies.sh
pip install --upgrade .
Footnotes
[1]
setenv cctools lobster-160-679ce223-cvmfs-70dfa0d6
setenv PYTHONPATH ${PYTHONPATH}:/afs/crc.nd.edu/group/ccl/software/x86_64/redhat6/cctools/$cctools/lib/python2.6/site-packages
setenv PATH /afs/crc.nd.edu/group/ccl/software/x86_64/redhat6/cctools/$cctools/bin:${PATH}
Configuration¶
Example configurations¶
All configurations in this section are also available in the examples directory of the Lobster source.
Simple dataset processing¶
The following is an example of a simple python configuration used to process a single dataset:
import datetime
from lobster import cmssw
from lobster.core import AdvancedOptions, Category, Config, StorageConfiguration, Workflow
version = datetime.datetime.now().strftime('%Y%m%d_%H%M')
storage = StorageConfiguration(
    output=[
        "hdfs://eddie.crc.nd.edu:19000/store/user/$USER/lobster_test_" + version,
        "file:///hadoop/store/user/$USER/lobster_test_" + version,
        # ND is not in the XrootD redirector, thus hardcode server.
        # Note the double-slash after the hostname!
        "root://deepthought.crc.nd.edu//store/user/$USER/lobster_test_" + version,
        "chirp://eddie.crc.nd.edu:9094/store/user/$USER/lobster_test_" + version,
        "gsiftp://T3_US_NotreDame/store/user/$USER/lobster_test_" + version,
        "srm://T3_US_NotreDame/store/user/$USER/lobster_test_" + version
    ]
)

processing = Category(
    name='processing',
    cores=1,
    runtime=900,
    memory=1000
)

workflows = []

ttH = Workflow(
    label='ttH',
    dataset=cmssw.Dataset(
        dataset='/ttHToNonbb_M125_13TeV_powheg_pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1/MINIAODSIM',
        events_per_task=50000
    ),
    category=processing,
    command='cmsRun simple_pset.py',
    publish_label='test',
    merge_size='3.5G',
    outputs=['output.root']
)

workflows.append(ttH)

config = Config(
    workdir='/tmpscratch/users/$USER/lobster_test_' + version,
    plotdir='~/www/lobster/test_' + version,
    storage=storage,
    workflows=workflows,
    advanced=AdvancedOptions(
        bad_exit_codes=[127, 160],
        log_level=1
    )
)
And the following uses a custom ROOT macro over the same dataset:
import datetime
from lobster import cmssw
from lobster.core import AdvancedOptions, Category, Config, StorageConfiguration, Workflow
version = datetime.datetime.now().strftime('%Y%m%d_%H%M')
storage = StorageConfiguration(
    output=[
        "hdfs://eddie.crc.nd.edu:19000/store/user/$USER/lobster_test_" + version,
        "file:///hadoop/store/user/$USER/lobster_test_" + version,
        # ND is not in the XrootD redirector, thus hardcode server.
        # Note the double-slash after the hostname!
        "root://deepthought.crc.nd.edu//store/user/$USER/lobster_test_" + version,
        "chirp://eddie.crc.nd.edu:9094/store/user/$USER/lobster_test_" + version,
        "gsiftp://T3_US_NotreDame/store/user/$USER/lobster_test_" + version,
        "srm://T3_US_NotreDame/store/user/$USER/lobster_test_" + version
    ]
)

processing = Category(
    name='processing',
    cores=1,
    runtime=900,
    memory=1000
)

workflows = []

ttH = Workflow(
    label='ttH',
    dataset=cmssw.Dataset(
        dataset='/ttHToNonbb_M125_13TeV_powheg_pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1/MINIAODSIM',
        lumis_per_task=20,
        file_based=True
    ),
    category=processing,
    command='root -b -q -l script_macro.C @outputfiles @inputfiles',
    extra_inputs=['script_macro.C'],
    publish_label='test',
    merge_command='hadd @outputfiles @inputfiles',
    merge_size='3.5G',
    outputs=['output.root']
)

workflows.append(ttH)

config = Config(
    workdir='/tmpscratch/users/$USER/lobster_test_' + version,
    plotdir='~/www/lobster/test_' + version,
    storage=storage,
    workflows=workflows,
    advanced=AdvancedOptions(
        bad_exit_codes=[127, 160],
        log_level=1
    )
)
MC generation à la production¶
Lobster can reproduce the production workflows used in CMS for Monte-Carlo production. As a first step, the setup for each step of the workflow has to be downloaded and the release areas prepared:
#!/bin/sh
set -e
mkdir -p mc_gen
cd mc_gen
source /cvmfs/cms.cern.ch/cmsset_default.sh
curl -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/HIG-RunIIWinter15wmLHE-00196 > setup01_lhe.sh
curl -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/HIG-RunIISummer15GS-00177 > setup02_gs.sh
curl -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/HIG-RunIIFall15DR76-00243 > setup03_dr.sh
curl -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/HIG-RunIIFall15MiniAODv2-00224 > setup04_v4.sh
sed -i 's@/afs/.*@/cvmfs/cms.cern.ch/cmsset_default.sh@g' setup*.sh
sed -i 's@export X509.*@@' setup*.sh
for f in *.sh; do
sh $f;
done
cat <<EOF >> HIG-RunIIWinter15wmLHE-00196_1_cfg.py
process.maxEvents.input = cms.untracked.int32(1)
process.externalLHEProducer.nEvents = cms.untracked.uint32(1)
EOF
cat <<EOF >> HIG-RunIISummer15GS-00177_1_cfg.py
process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring('file:HIG-RunIIWinter15wmLHE-00196.root'))
EOF
cat <<EOF >> HIG-RunIIFall15DR76-00243_1_cfg.py
process.source.fileNames = cms.untracked.vstring('file:HIG-RunIISummer15GS-00177.root')
EOF
cat <<EOF >> HIG-RunIIFall15MiniAODv2-00224_1_cfg.py
process.source.fileNames = cms.untracked.vstring('file:HIG-RunIIFall15DR76-00243.root')
EOF
if [ -n "$RUN_MC" ]; then
cd CMSSW_7_1_16_patch1/; cmsenv; cd -
cmsRun -n 4 HIG-RunIIWinter15wmLHE-00196_1_cfg.py
cd CMSSW_7_1_18/; cmsenv; cd -
cmsRun -n 4 HIG-RunIISummer15GS-00177_1_cfg.py
cd CMSSW_7_6_1/; cmsenv; cd -
cmsRun -n 4 HIG-RunIIFall15DR76-00243_1_cfg.py
cmsRun -n 4 HIG-RunIIFall15DR76-00243_2_cfg.py
cd CMSSW_7_6_3/; cmsenv; cd -
cmsRun -n 4 HIG-RunIIFall15MiniAODv2-00224_1_cfg.py
fi
The setup created by this script can be run by the following Lobster configuration, which sports a workflow for each step of the official MC production chain:
import datetime
from lobster import cmssw
from lobster.core import AdvancedOptions, Category, Config, StorageConfiguration, Workflow
from lobster.core import ParentDataset, ProductionDataset
version = datetime.datetime.now().strftime('%Y%m%d')
storage = StorageConfiguration(
    output=[
        "hdfs://eddie.crc.nd.edu:19000/store/user/$USER/lobster_mc_" + version,
        "file:///hadoop/store/user/$USER/lobster_mc_" + version,
        "root://deepthought.crc.nd.edu//store/user/$USER/lobster_mc_" + version,
        "chirp://eddie.crc.nd.edu:9094/store/user/$USER/lobster_test_" + version,
        "gsiftp://T3_US_NotreDame/store/user/$USER/lobster_mc_" + version,
        "srm://T3_US_NotreDame/store/user/$USER/lobster_mc_" + version,
    ]
)

workflows = []

lhe = Workflow(
    label='lhe_step',
    pset='mc_gen/HIG-RunIIWinter15wmLHE-00196_1_cfg.py',
    sandbox=cmssw.Sandbox(release='mc_gen/CMSSW_7_1_16_patch1'),
    merge_size='125M',
    dataset=ProductionDataset(
        total_events=25000,
        events_per_lumi=25,
        lumis_per_task=10
    ),
    category=Category(
        name='lhe',
        cores=1,
        memory=2000
    )
)

gs = Workflow(
    label='gs_step',
    pset='mc_gen/HIG-RunIISummer15GS-00177_1_cfg.py',
    sandbox=cmssw.Sandbox(release='mc_gen/CMSSW_7_1_18'),
    merge_size='500M',
    dataset=ParentDataset(
        parent=lhe,
        units_per_task=1
    ),
    category=Category(
        name='gs',
        cores=1,
        memory=2000,
        runtime=45 * 60
    )
)

digi = Workflow(
    label='digi_step',
    pset='mc_gen/HIG-RunIIFall15DR76-00243_1_cfg.py',
    sandbox=cmssw.Sandbox(release='mc_gen/CMSSW_7_6_1'),
    merge_size='1G',
    dataset=ParentDataset(
        parent=gs,
        units_per_task=10
    ),
    category=Category(
        name='digi',
        cores=1,
        memory=2600,
        runtime=45 * 60,
        tasks_max=100
    )
)

reco = Workflow(
    label='reco_step',
    pset='mc_gen/HIG-RunIIFall15DR76-00243_2_cfg.py',
    sandbox=cmssw.Sandbox(release='mc_gen/CMSSW_7_6_1'),
    # Explicitly specify outputs, since the dependency processing only
    # works for workflows with one output file, but the configuration
    # includes two.
    outputs=['HIG-RunIIFall15DR76-00243.root'],
    merge_size='1G',
    dataset=ParentDataset(
        parent=digi,
        units_per_task=6
    ),
    category=Category(
        name='reco',
        cores=4,
        memory=2800,
        runtime=45 * 60,
        tasks_min=10
    )
)

maod = Workflow(
    label='mAOD_step',
    pset='mc_gen/HIG-RunIIFall15MiniAODv2-00224_1_cfg.py',
    sandbox=cmssw.Sandbox(release='mc_gen/CMSSW_7_6_3'),
    merge_size='500M',
    dataset=ParentDataset(
        parent=reco,
        units_per_task=60
    ),
    category=Category(
        name='mAOD',
        cores=2,
        memory=2000,
        runtime=30 * 60
    )
)

config = Config(
    workdir='/tmpscratch/users/$USER/lobster_mc_' + version,
    plotdir='~/www/lobster/mc_' + version,
    storage=storage,
    workflows=[lhe, gs, digi, reco, maod],
    advanced=AdvancedOptions(log_level=1)
)
The configuration components¶
General Options¶
class lobster.core.config.Config(workdir, storage, workflows, label=None, advanced=None, plotdir=None, foremen_logs=None, base_directory=None, base_configuration=None, startup_directory=None, elk=None)
Top-level Lobster configuration object.
This configuration object will fully specify a Lobster project, including several Workflow instances and a StorageConfiguration.
Parameters:
- workdir (str) – The working directory to be used for the project. Note that this should be on a local filesystem to avoid problems with the database.
- storage (StorageConfiguration) – The configuration for the storage element for output and input files.
- workflows (list) – A list of Workflow to process.
- label (str) – A string to identify this project by. This will be used in the CMS dashboard, where it appears as lobster_<user>_<label>_<hash>, and in conjunction with WorkQueue, where the project will be referred to as lobster_<user>_<label>. The default is the date in the format YYYYmmdd.
- advanced (AdvancedOptions) – More options for advanced users.
- plotdir (str) – A directory to store monitoring pages in.
- foremen_logs (list) – A list of str pointing to the WorkQueue foremen logs.
class lobster.core.config.AdvancedOptions(abort_threshold=10, abort_multiplier=4, bad_exit_codes=None, dashboard=None, dump_core=False, email=None, full_monitoring=False, log_level=2, osg_version=None, payload=10, proxy=None, threshold_for_failure=30, threshold_for_skipping=30, wq_max_retries=10, wq_port=-1, xrootd_servers=None)
Advanced options for tuning Lobster.
Attributes modifiable at runtime:
- payload
- threshold_for_failure
- threshold_for_skipping
Parameters:
- abort_threshold (int) – After how many successful tasks outliers in runtime should be killed.
- abort_multiplier (int) – How many standard deviations a task is allowed to go over the average task runtime.
- bad_exit_codes (list) – A list of exit codes that are considered to come from bad workers. As soon as a task returns with an exit code from this list, the worker it ran on will be blacklisted and no more tasks sent to it.
- dashboard (Dashboard) – Use the CMS dashboard to report task status. Set to False to disable.
- dump_core (bool) – Produce core dumps. Useful to debug WorkQueue.
- email (str) – The email address at which you want to receive emails from Lobster.
- full_monitoring (bool) – Produce full monitoring output. Useful to debug WorkQueue.
- log_level (int) – How much logging output to show. Goes from 1 to 5, where 1 is the most verbose (including a lot of debug output) and 5 is practically quiet.
- osg_version (str) – The version of OSG you want Lobster to run on.
- payload (int) – How many tasks to keep in the queue (minimum). Note that the payload will increase with the number of cores available to Lobster. This is just the minimum with no workers connected.
- proxy (Proxy) – An authentication mechanism to access data. Set to False to disable.
- threshold_for_failure (int) – How often a single unit may fail to be processed before Lobster will not attempt to process it any longer.
- threshold_for_skipping (int) – How often a single file may fail to be accessed before Lobster will not attempt to process it any longer.
- wq_max_retries (int) – How often WorkQueue will attempt to process a task before handing it back to Lobster. WorkQueue will only reprocess evicted tasks automatically.
- wq_port (int) – WorkQueue master port number. Defaults to -1 to look for an available port.
- xrootd_servers (list) – A list of XrootD servers to use to access remote data. Defaults to cmsxrootd.fnal.gov.
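For illustration, a minimal AdvancedOptions instance (values are only examples, not recommendations) might look like the following; it would be passed to Config via the advanced argument, as in the examples above:
from lobster.core import AdvancedOptions

# Illustrative values only: blacklist workers that return exit code 127,
# keep at least 20 tasks queued, and name an explicit XrootD server.
advanced = AdvancedOptions(
    bad_exit_codes=[127],
    payload=20,
    log_level=2,
    xrootd_servers=['cmsxrootd.fnal.gov']
)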
class lobster.se.StorageConfiguration(output, input=None, use_work_queue_for_inputs=False, use_work_queue_for_outputs=False, shuffle_inputs=False, shuffle_outputs=False, disable_input_streaming=False, disable_stage_in_acceleration=False)
Container for storage element configuration.
Uses URLs of the form {protocol}://{server}/{path} to specify input and output locations, where the server is omitted for file and hadoop access. All output URLs should point to the same physical storage, to ensure consistent data handling. Storage elements within CMS, e.g. T2_CH_CERN, will be expanded for the srm and root protocols. Protocols supported:
- file
- gsiftp
- hadoop
- chirp
- srm
- root
The chirp protocol requires an instance of a Chirp server.
Attributes modifiable at runtime:
- input
- output
Parameters:
- output (list) – A list of URLs to access output storage.
- input (list) – A list of URLs to access input data.
- use_work_queue_for_inputs (bool) – Use WorkQueue to transfer input data. Will incur a severe running penalty when used.
- use_work_queue_for_outputs (bool) – Use WorkQueue to transfer output data. Will incur a severe running penalty when used.
- shuffle_inputs (bool) – Shuffle the list of input URLs when passing them to the task. Will provide basic load distribution.
- shuffle_outputs (bool) – Shuffle the list of output URLs when passing them to the task. Will provide basic load distribution.
- disable_input_streaming (bool) – Turn off streaming input data via, e.g., XrootD.
- disable_stage_in_acceleration (bool) – By default, tasks with many input files will test input URLs for the first successful one, which will then be used to access the remaining input files. With this setting, all input URLs will be attempted for all input files.
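As a sketch (paths and usernames are hypothetical), a storage configuration that also specifies input URLs for reading locally stored data might look like:
from lobster.core import StorageConfiguration

# Hypothetical paths: read inputs from one location, write outputs to
# several URLs that all point to the same physical storage.
storage = StorageConfiguration(
    input=[
        "root://deepthought.crc.nd.edu//store/user/someuser/my_inputs"
    ],
    output=[
        "hdfs://eddie.crc.nd.edu:19000/store/user/someuser/my_outputs",
        "file:///hadoop/store/user/someuser/my_outputs",
        "srm://T3_US_NotreDame/store/user/someuser/my_outputs"
    ]
)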
Workflow specification¶
class lobster.core.workflow.Category(name, mode='max_throughput', cores=None, memory=None, disk=None, runtime=None, tasks_max=None, tasks_min=None)
Resource specification for one or more Workflow.
This information will be passed on to WorkQueue, which will forcibly terminate tasks of Workflow in the group that exceed the specified resources.
Attributes modifiable at runtime:
- tasks_min
- tasks_max
- runtime
Parameters:
- name (str) – The name of the resource group.
- mode (str) – Dictates how WorkQueue handles exhausted resources. Possible values are: fixed (task fails), max (the maximum allowed resource consumption is set by the maximum seen in tasks of that category; tasks are automatically adjusted and retried), min_waste (same as max, but allocations prioritize minimizing waste), or max_throughput (same as max, but allocations prioritize maximizing throughput).
- cores (int) – The maximum number of cores required (fixed mode), or the first guess for WorkQueue to determine the number of cores required (all other modes).
- memory (int) – How much memory a task is allowed to use, in megabytes (fixed mode), or the starting guess for WorkQueue to determine how much memory a task requires (all other modes).
- disk (int) – How much disk a task is allowed to use, in megabytes (fixed mode), or the starting guess for WorkQueue to determine how much disk a task requires (all other modes).
- runtime (int) – The runtime of the task in seconds. Lobster will add a grace period to this time, and try to adjust the task size such that this runtime is achieved.
- tasks_max (int) – How many tasks should be in the queue (running or waiting) at the same time.
- tasks_min (int) – The minimum of how many tasks should be in the queue (waiting) at the same time.
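A brief sketch of a fixed-mode category (the numbers are illustrative only): here tasks are terminated if they exceed the stated resources, instead of being measured and retried as in the adaptive modes:
from lobster.core import Category

# Illustrative fixed-mode category: resource limits are hard limits.
short_tasks = Category(
    name='short_tasks',
    mode='fixed',
    cores=1,
    memory=1500,       # MB
    disk=2000,         # MB
    runtime=30 * 60,   # seconds; Lobster adds a grace period
    tasks_max=500
)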
class lobster.core.workflow.Workflow(label, dataset, command, category=Category(...), publish_label=None, cleanup_input=False, merge_size=-1, sandbox=None, unique_arguments=None, extra_inputs=None, outputs=None, output_format='{base}_{id}.{ext}', local=False, globaltag=None, merge_command='cmsRun')
A specification for processing a dataset.
Parameters:
- label (str) – The shorthand name of the workflow. This is used as a reference throughout Lobster.
- dataset (Dataset) – The specification of data to be processed. Can be any of the dataset related classes.
- category (Category) – The category of resource specification this workflow belongs to.
- publish_label (str) – The label to be used for the publication database.
- cleanup_input (bool) – Delete input files after processing.
- merge_size (str) – Activates output file merging when set. Accepts the suffixes k, m, g for kilobyte, megabyte, …
- sandbox (Sandbox or list of Sandbox) – The sandbox(es) to use. Currently can be a Sandbox. When multiple sandboxes are used, one sandbox per computing architecture to be run on is expected, containing the same release; a ValueError will be raised otherwise.
- command (str) – The command to run when executing the workflow. The command string may contain @args, @outputfiles, and @inputfiles, which will be replaced by unique arguments and output as well as input files, respectively. For running CMSSW workflows, it is sufficient to use:
cmsRun pset.py
where the file pset.py will be automatically added to the sandbox and the input source of the parameter set will be modified to use the correct input files. Note that otherwise, any used files will have to be included in extra_inputs.
- extra_inputs (list) – Additional inputs outside the sandbox needed to process the workflow.
- unique_arguments (list) – A list of arguments. Each element of the dataset is processed once for each argument in this list. The unique argument is also passed to the executable.
- outputs (list) – A list of strings which specifies the files produced by the workflow. If outputs=[], no output files will be returned. If outputs=None, outputs will be automatically determined for CMSSW workflows.
- output_format (str) – How the output files should be renamed on the storage element. This is a new-style format string, allowing for the fields base, id, and ext, for the basename of the output file, the ID of the task, and the extension of the output file.
- local (bool) – If set to True, Lobster will assume this workflow’s input is present on the output storage element.
- globaltag (str) – Which GlobalTag this workflow uses. Needed for publication of CMSSW workflows, and can be automatically determined for these.
- merge_command (str) – Accepts cmsRun (the default) or a custom command. Tells Lobster what command to use for merging. If outputs are autodetermined (outputs=None), cmsRun will be used for EDM output and hadd will be used otherwise. When merging plain ROOT files the following should be used:
merge_command="hadd @outputfiles @inputfiles"
See the specification for the command parameter about passing input and output file values.
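As a hedged sketch of the parameters not shown in the earlier examples (the script, directory, and argument names are made up, and Dataset is assumed to be importable from lobster.core like the other classes; see the Dataset specification below): unique_arguments runs every dataset element once per argument, and output_format controls how outputs are renamed on the storage element:
from lobster.core import Category, Dataset, Workflow

# Hypothetical non-CMSSW workflow: each group of input files is processed
# once per unique argument, and outputs are renamed as <base>_<task id>.<ext>.
scan = Workflow(
    label='parameter_scan',
    dataset=Dataset(files='/my/local/inputs', files_per_task=5),
    command='python scan.py @inputfiles @outputfiles @args',
    extra_inputs=['scan.py'],
    unique_arguments=['mass=120', 'mass=125', 'mass=130'],
    outputs=['scan.root'],
    output_format='{base}_{id}.{ext}',
    category=Category(name='scan', cores=1, memory=1000)
)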
Dataset specification¶
class lobster.core.dataset.Dataset(files, files_per_task=1, patterns=None)
A simple dataset specification.
Runs over files found in a list of directories or specified directly.
Parameters:
- files (list) – A list of files or directories to process. May also be a str pointing to a single file or directory.
- files_per_task (int) – How many files to process in one task. Defaults to 1.
- patterns (list) – A list of shell-style file patterns to match filenames against. Defaults to None, in which case all files found are used.
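A short sketch (the directory and pattern are hypothetical, and the documented module path is used for the import):
from lobster.core.dataset import Dataset

# Process all ROOT files found under a local directory, ten per task.
dataset = Dataset(
    files='/hadoop/store/user/someuser/my_ntuples',
    files_per_task=10,
    patterns=['*.root']
)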
class lobster.core.dataset.EmptyDataset(number_of_tasks=1)
Dataset specification for non-cmsRun workflows with no input files.
Parameters:
- number_of_tasks (int) – How many tasks to run.
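A sketch of a non-cmsRun workflow built on EmptyDataset (the script and label names are hypothetical, and the lobster.core imports are assumed to work as in the examples above):
from lobster.core import Category, EmptyDataset, Workflow

# Ten tasks, each running the command once with no input files.
toys = Workflow(
    label='toy_generation',
    dataset=EmptyDataset(number_of_tasks=10),
    command='python generate_toys.py @outputfiles',
    extra_inputs=['generate_toys.py'],
    outputs=['toys.root'],
    category=Category(name='toys', cores=1, memory=1000)
)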
class lobster.core.dataset.ParentDataset(parent, units_per_task=1)
Process the output of another workflow.
Parameters:
- parent (Workflow) – The workflow whose output will be processed.
- units_per_task (int) – How many units of the parent dataset to process in one task.
class lobster.core.dataset.ProductionDataset(total_events, events_per_lumi=500, lumis_per_task=1, randomize_seeds=True)
Dataset specification for Monte-Carlo event generation.
Parameters:
- total_events (int) – How many events to generate.
- events_per_lumi (int) – How many events to generate in one luminosity section.
- lumis_per_task (int) – How many lumis to produce per task. Can be changed by Lobster to match the user-specified task runtime.
- randomize_seeds (bool) – Use random seeds every time a task is run.
class lobster.core.dataset.MultiProductionDataset(gridpacks, events_per_gridpack, events_per_lumi=500, lumis_per_task=1, randomize_seeds=True)
Dataset specification for Monte-Carlo event generation from a set of gridpacks.
Parameters:
- gridpacks (list) – A list of gridpack files or directories to process. May also be a str pointing to a single gridpack file or directory.
- events_per_gridpack (int) – How many events to generate per gridpack.
- events_per_lumi (int) – How many events to generate in one luminosity section.
- lumis_per_task (int) – How many lumis to produce per task. Can be changed by Lobster to match the user-specified task runtime.
- randomize_seeds (bool) – Use random seeds every time a task is run.
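A sketch of gridpack-based generation (the gridpack directory and event counts are made up; the documented module path is used for the import):
from lobster.core.dataset import MultiProductionDataset

# Generate 100k events for each gridpack found in the given directory.
dataset = MultiProductionDataset(
    gridpacks='gridpacks/',
    events_per_gridpack=100000,
    events_per_lumi=500,
    lumis_per_task=10
)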
Monitoring¶
class lobster.monitor.elk.ElkInterface(es_host, es_port, kib_host, kib_port, project, dashboards=None, refresh_interval=300)
Enables ELK stack monitoring for the current Lobster run using an existing Elasticsearch cluster.
Attributes modifiable at runtime:
- es_host
- es_port
- kib_host
- kib_port
- dashboards
- refresh_interval
Parameters:
- es_host (str) – Host running the Elasticsearch cluster.
- es_port (int) – Port number of the Elasticsearch HTTP service.
- kib_host (str) – Host running the Kibana instance connected to the Elasticsearch cluster.
- kib_port (int) – Port number of the Kibana HTTP service.
- user (str) – User ID to label Elasticsearch indices and Kibana objects.
- project (str) – Project ID to label Elasticsearch indices and Kibana objects.
- dashboards (list) – List of dashboards to include from the Kibana templates. Defaults to including only the core dashboard. Available dashboards: Core, Advanced, Tasks.
- refresh_interval (int) – Refresh interval for Kibana dashboards, in seconds. Defaults to 300 seconds = 5 minutes.
Note
At Notre Dame, Elasticsearch is accessible at elk.crc.nd.edu:9200 and Kibana at elk.crc.nd.edu:5601.
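A sketch using the Notre Dame hosts from the note above (the project label is a placeholder); the resulting object would be passed to Config via its elk argument, as shown in the Config signature under General Options:
from lobster.monitor.elk import ElkInterface

# ELK monitoring against the Notre Dame Elasticsearch and Kibana instances.
elk = ElkInterface(
    es_host='elk.crc.nd.edu',
    es_port=9200,
    kib_host='elk.crc.nd.edu',
    kib_port=5601,
    project='my_project',
    dashboards=['Core', 'Tasks']
)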
Running¶
Note
These steps should always be performed in a valid CMSSW environment (after executing cmsenv), and with the virtualenv activated!
After configuring Lobster, and ensuring that both the CMSSW environment and the virtualenv are loaded, the project can be processed.
Basic procedure¶
This is the most basic usage of Lobster. For more information about the commands, use lobster -h and lobster <cmd> -h.
Obtain a valid proxy:
voms-proxy-init -voms cms -valid 192:00
Start Lobster:
lobster process config.py
Submit workers, see also Submitting workers:
label=shuffle
cores=4
condor_submit_workers -N lobster_$label --cores $cores \
    --memory $(($cores * 1100)) --disk $(($cores * 4500)) 10
Follow the log output in the working directory of the Config. Log output can be found in process.log, and the error output in process.err:
tail -f /my/working/directory/process.{err,log}
Additional useful commands¶
Print the configuration of a working directory:
lobster configuration /my/working/directory
And to change the configuration:
lobster configure /my/working/directory
See also Changing configuration options.
Print the current status of processing on the terminal with:
lobster status /my/working/directory
Validate the output directory to catch any stray or missing files:
lobster validate --dry-run /my/working/directory
and, after verifying the printout from the above, run it again without the --dry-run argument.
Stop a Lobster run cleanly:
lobster terminate /my/working/directory
This will give Lobster a chance to gracefully exit, but may take a few minutes to take effect (at least one iteration of sending out and receiving tasks).
Note
To immediately stop Lobster from running, use:
kill -9 `cat /my/working/directory/lobster.pid`
This may result in database corruption, though, and should only be used when the running Lobster project is not to be continued.
Submitting workers¶
Workers can be either submitted directly, as shown above, or by using the WorkQueue factory, which allows for dynamic scaling of the processing. The factory uses a config written in JSON, like:
{
    "max-workers": 100,
    "min-workers": 0,
    "cores": 4,
    "memory": 4400,
    "disk": 12000,
    "workers-per-cycle": 50,
    "condor-requirements": "OpSysAndVer == \"RedHat6\""
}
This configuration sets up 4-core workers providing 4.4 GB of RAM and 12 GB of disk space each. It is included in the examples directory in the Lobster source distribution. The factory can thus be started from within the Lobster source directory with:
nohup work_queue_factory -T condor -M "lobster_$USER.*" -dall -o /tmp/${USER}_factory.debug -C examples/factory.json > /tmp/${USER}_factory.log &
If the log of the factory grows too large, removing -dall will disable debug output and considerably lessen disk usage.
Note
At Notre Dame, the following login nodes are connected to the opportunistic resource: crcfe01, crcfe02, and condorfe. The resources have monitoring pages for condor and the external bandwidth, and the squid server.
The opportunistic resources have now largely migrated to RHEL7. When using CMSSW releases that aren’t available for RHEL7, you need to ask Work Queue to run your jobs using Singularity. This requires a change to the factory invocation as follows:
nohup work_queue_factory -T condor -M "lobster_$USER.*" -d all -o /tmp/${USER}_lobster_factory.debug -C examples/factory_singularity.json --wrapper "python /afs/crc.nd.edu/group/ccl/software/runos/runos.py rhel6" --extra-options="--workdir=/disk" --worker-binary=/afs/crc.nd.edu/group/ccl/software/x86_64/redhat6/cctools/$cctools/bin/work_queue_worker >&! /tmp/${USER}_lobster_factory.log &
Using a Chirp server¶
Using Chirp for stage-in and stage-out can be helpful when standard CMS tools for file handling, i.e., XrootD and SRM, are not available.
At Notre Dame, a Chirp server is running under eddie.crc.nd.edu:9094.
Make sure that your output directory can be reached with globus authentication by looking for a line starting with globus: that should match your personal information in:
cat /hadoop/store/user/$USER/.__acl
If this line is not present or the file does not exist, create it with:
echo "globus:`voms-proxy-info -identity|sed 's/ /_/g'` rwlda" >> /hadoop/store/user/$USER/.__acl
Then verify that you can write to your output directory:
chirp eddie.crc.nd.edu:9094 mkdir /store/user/$USER/test
chirp eddie.crc.nd.edu:9094 rm /store/user/$USER/test/.__acl
chirp eddie.crc.nd.edu:9094 rmdir /store/user/$USER/test
After this, the Chirp server can be added to the input and output settings of the configuration, as done in the examples.
Running a separate Chirp server as user¶
Create a file called acl with default access permissions in, e.g., your home directory via (you will need a valid proxy for this!):
echo "globus:`voms-proxy-info -identity|sed 's/ /_/g'` rwlda" > ~/acl
Then do something akin to the following command:
chirp_server --root=<some_directory> -A ~/acl -p <your_port>
where the default port is 9094, which may already be occupied; in that case it is best to increment the port number until you find a free one. The root directory given to the Chirp server can be the desired stage-out location, or any directory above it.
Note
If you are using Chirp to stage out to a server that cannot handle a high load, limit the connections by adding -M 50 to the arguments.
You should test Chirp from another machine:
voms-proxy-init -voms cms -valid 192:00
chirp_put <some_file> <your_server>:<your_port> spam
If this command fails with a permission issue, make sure you do not have any .__acl files lingering around in your stageout directory:
find <your_stageout_directory> -name .__acl -exec rm \{} \;
and try again. Then add the following line to either the input or output argument of the StorageConfiguration:
"chirp://<your_server>:<your_port>/<your_stageout_directory_minus_chirp_root>"
Using a Hadoop backend¶
Running Chirp with a direct connection to a Hadoop storage element may increase performance. Setting it up can be quite complex; at Notre Dame it would look akin to the following:
cd /var/tmp/
cp -r /usr/lib/hadoop/ .
cp /usr/lib64/libhdfs* hadoop/lib/
env JAVA_HOME=/etc/alternatives/java_sdk/ HADOOP_HOME=$PWD/hadoop chirp_server \
--root=hdfs://eddie.crc.nd.edu:19000/<your_stageout_directory_wo_leading_hadoop> \
-A ~/acl -p <your_port>
It may be necessary to adjust the memory settings of the Java VM with, e.g., the option LIBHDFS_OPTS=-Xmx100m.
Changing configuration options¶
Lobster copies the initial configuration to its working directory as config.py. This configuration can be changed to modify the settings of a running Lobster instance. After the file is saved, the changes are propagated when the configuration is re-read by the Lobster main loop; it may take a few minutes for them to take effect. Lobster shows logging messages about the changes in both the main log and configure.log in the working directory.
Only attributes mentioned as modifiable in the documentation of each class can be changed.
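For instance, taking the processing Category from the first example configuration (a sketch; the rest of the copied config.py, including its imports, stays as it was), raising the target runtime of that category is a change to a runtime-modifiable attribute and will be picked up after the file is saved:
# In /my/working/directory/config.py -- only this block is edited.
processing = Category(
    name='processing',
    cores=1,
    runtime=2 * 3600,   # raised from 900; Category.runtime is modifiable at runtime
    memory=1000
)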
Lobster also provides a configure convenience command, which will launch an editor to edit the current configuration:
lobster configure /my/working/directory
Note
If one of the services used in the original configuration has changed or become unavailable, executing lobster configure with said configuration may fail. It is recommended to pass the working directory on the command line.
Note
The configure command uses the environment variable EDITOR to determine which editor to use, and defaults to vi.
Monitoring¶
Lobster produces monitoring plots, which are saved into a directory either as specified in the Lobster configuration, or by issuing the following command:
lobster plot --outdir <monitoring directory> <configuration>
The monitoring is split into a Lobster overview page and per-category pages displaying progress and task status.
ELK Commands¶
Lobster has a few commands to help manage ELK monitoring:
lobster elkdownload <configuration>
lobster elkupdate <configuration>
lobster elkcleanup <configuration>
elkdownload downloads templates of all dashboards listed in the configuration with the user/project prefix specified in the configuration and all visualizations on those dashboards, as well as all index patterns matching the user/run prefix.
elkupdate generates dashboards, visualizations, and index patterns from the saved templates according to the dashboards specified in the configuration.
elkcleanup deletes all Kibana objects and Elasticsearch indices that match the user/run prefix in the configuration.
Task Exit Codes¶
Lobster uses the following error codes, which are referred to in the Failed Tasks section of the category monitoring pages:
Code | Reason |
---|---|
169 | Unable to run parrot |
170 | Sandbox unpacking failure |
171 | Failed to determine base release |
172 | Failed to find old releasetop |
173 | Failed to create new release area |
174 | cmsenv failure |
175 | Failed to source the environment (may be parrot related) |
179 | Stagein failure |
180 | Prologue failure |
185 | Failed to run command |
190 | Failed to parse report.xml |
191 | Failed to parse wrapper timing information |
199 | Epilogue failure |
200 | Generic parrot failure |
210 | Stageout failure during transfer |
211 | Stageout failure cross-checking transfer |
500 | Publish failure |
10001 | Generic task failure reported by WorkQueue |
10010 | Task timed out |
10020 | Task exceeded maximum number of retries |
10030 | Task exceeded maximum runtime |
10040 | Task exceeded maximum memory |
10050 | Task exceeded maximum disk |
Error codes lower than 170 may indicate a cmsRun problem; codes O(1k) may hint at a CMS configuration or runtime problem. Codes O(10k) are internal Work Queue error codes and may be bitmasked together, e.g., 100514 is a combination of errors 100512 and 100002.
Troubleshooting¶
Common Problems¶
Installation¶
Python crashes due to library mismatches¶
Make sure that all commands are issued after executing cmsenv. Also attempt to clear out $HOME/.local and re-install.
Matplotlib fails to install¶
When using older CMSSW releases, matplotlib may fail to install. This can be solved by installing numpy first with pip install numpy.
Configuration¶
Finding the right settings for resources¶
When running Lobster, the resources should get adjusted automatically. To avoid WorkQueue retrying tasks too often to find optimal settings, they can be gathered from an old project via:
find /my/working/directory -name "*.summary"|grep successful > mysummaries
resource_monitor_histograms -L mysummaries -j 32 /my/web/output/directory my_project_name
Where -j 32 should be adjusted to match the cores of the machine. The output directory /my/web/output/directory then contains a file stats.json with the resources that WorkQueue deems optimal, and index.html has a tabular representation.
Mixing DBS and local datasets¶
Mixing datasets with inputs accessed via AAA and custom input configuration is currently not supported.
Example Hacks¶
Configuration¶
Changing an immutable attribute in the pickled configuration¶
You should never need to do this. You should only change mutable configuration attributes using the configure command via lobster configure config.py. The point of the PartiallyMutable metaclass is to restrict sensitive attributes from changing unless they have been declared mutable and a callback function has been defined indicating how Lobster should deal with the change. Unexpected things can happen otherwise. First ask yourself why you’re doing this before asking how to do it. If you’re still determined, here’s an example which changes the label attribute:
import datetime
import os
import shutil

from lobster.core import config
from lobster import util

wdir = "/path/to/working/directory"

shutil.copy(
    os.path.join(wdir, "config.pkl"),
    os.path.join(wdir, "config.pkl.{:%Y-%m-%d_%H%M%S}".format(datetime.datetime.now())))

cfg = config.Config.load("/path/to/working/directory")
with util.PartiallyMutable.unlock():
    cfg.label = 'foo'
cfg.save()