Welcome to the OpenFlightHPC Build Knowledgebase!¶
This site contains the documentation for the OpenFlightHPC project. It contains tips and tools for streamlining the building and management of HPC research environments. While the documentation mainly focuses on the OpenFlightHPC Research Environment workflow, there are notes and tips on customising the process for varying workflows.
Documentation Goal¶
The purpose of this documentation is to provide clear guidance on delivering the OpenFlightHPC Research Environment. This includes deploying resources on a given platform and installing the relevant software.
Environment delivery is the installation of the software that provides the user experience on the research environment. This usually involves some sort of resource management/queuing system and application installation.
Note
It is recommended to read through all of the documentation before starting to design the HPC platform to understand the scope and considerations.
Acknowledgements¶
We recognise and respect the trademarks of all third-party providers referenced in this documentation. Please see the respective EULAs for software packages used in configuring your own environment based on this knowledgebase.
License¶
This documentation is released under the Creative-Commons: Attribution-ShareAlike 4.0 International license.
Table of Contents¶
Workflow: AWS¶
Overview¶
This workflow describes deploying cloud resources, consisting of:
- A ‘domain’ containing a network and security group
- A gateway node with internet access
- 2 to 8 compute nodes with internet access
Prerequisites¶
This document presumes the following situation:
- The appropriate cloud tool is installed and configured with access details
- There is enough availability in the upstream cloud region of your account to deploy these resources
- A suitable CentOS 7/8 source image is available in the cloud provider for basing nodes off of
- This source image should be at least 16GB in size to allow enough space for applications and data
Note
OpenFlight provides various cloud-ready images; these can be found at https://openflighthpc.org/images/
Template Parameters¶
There are multiple parameters in the template; some are required and others are optional. They are outlined below:
- sourceimage - The AMI ID to use for both the gateway and compute nodes
- clustername - The name of the cluster, to be used as part of the FQDN for nodes
- customdata - A base64 encoded cloud-init string for both the gateway and compute nodes
- computeNodesCount - The number of compute nodes to deploy, a value between 2 and 8 (default: 2)
- gatewayinstancetype - The instance type to be used for the gateway node (default: t2.small)
- computeinstancetype - The instance type to be used for the compute nodes (default: t2.small)
Deploy Resources¶
Download the OpenFlight AWS template:
$ curl -o cluster.yaml https://raw.githubusercontent.com/openflighthpc/openflight-compute-cluster-builder/master/templates/aws/cluster.yaml
Generate a base64 customdata string to set the default username to flight and add an SSH public key for authentication:
$ DATA=$(cat << EOF
#cloud-config
system_info:
  default_user:
    name: flight
runcmd:
  - echo "ssh-rsa MySSHpublicKeyHere user@host" >> /home/flight/.ssh/authorized_keys
EOF
)
$ echo "$DATA" |base64 -w 0
I2Nsb3VkLWNvbmZpZwpzeXN0ZW1faW5mbwogIGRlZmF1bHRfdXNlcjoKICAgIG5hbWU6IGZsaWdodApydW5jbWQ6CiAgLSBlY2hvICJzc2gtcnNhIE15U1NIcHVibGljS2V5SGVyZSB1c2VyQGhvc3QiID4+IC9ob21lL2ZsaWdodC8uc3NoL2F1dGhvcml6ZWRfa2V5cwo=
Note
The base64 value will differ from the above depending on the SSH key specified. It does not need to match the example output above.
Deploy a cluster (with a gateway and 2 nodes):
$ aws cloudformation deploy --template-file cluster.yaml --stack-name mycluster \
    --parameter-overrides sourceimage="AMI_ID_HERE" \
    clustername="mycluster" \
    customdata="I2Nsb3VkLWNvbmZpZwpzeXN0ZW1faW5mbwogIGRlZmF1bHRfdXNlcjoKICAgIG5hbWU6IGZsaWdodApydW5jbWQ6CiAgLSBlY2hvICJzc2gtcnNhIE15U1NIcHVibGljS2V5SGVyZSB1c2VyQGhvc3QiID4+IC9ob21lL2ZsaWdodC8uc3NoL2F1dGhvcml6ZWRfa2V5cwo="
Note
The above command can be modified to override the other parameters described at the beginning of this page. The 3 parameters used in the above command are the minimum required to bring a cluster up.
Configure /etc/hosts on the gateway and nodes:
$ cat << EOF >> /etc/hosts
10.10.3.67 gateway1
10.10.8.10 node01
10.10.4.54 node02
EOF
Note
The IP addresses of your nodes may differ. Use the CLI tools or GUI to determine the internal IP addresses to be used in the hosts file.
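For example, one possible way to list the private IP addresses with the AWS CLI (this assumes the instances were created by the mycluster stack deployed above; the tag filter and query are illustrative):
$ aws ec2 describe-instances \
    --filters "Name=tag:aws:cloudformation:stack-name,Values=mycluster" \
    --query "Reservations[].Instances[].[Tags[?Key=='Name']|[0].Value,PrivateIpAddress]" \
    --output table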
- Setup passwordless root SSH to all compute nodes from the gateway. This can be done by generating a public key with ssh-keygen and adding it to /root/.ssh/authorized_keys on the gateway and all other nodes; a minimal sketch of this is shown below.
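The sketch below shows one way of doing this from the gateway; it assumes the node names from the /etc/hosts example above and that you already have some means of authenticating to the nodes as root:
$ ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''                  # generate a key pair for root on the gateway
$ cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys       # trust it on the gateway itself
$ for node in node01 node02; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@$node; done   # copy it to each node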
Workflow: Azure¶
Overview¶
This workflow describes deploying cloud resources, consisting of:
- A ‘domain’ containing a network and security group
- A gateway node with internet access
- 2 to 8 compute nodes with internet access
Prerequisites¶
This document presumes the following situation:
- The appropriate cloud tool is installed and configured with access details
- There is enough availability in the upstream cloud region of your account to deploy these resources
- A suitable CentOS 7/8 source image is available in the cloud provider for basing nodes off of
- This source image should be at least 16GB in size to allow enough space for applications and data
Note
OpenFlight provides various cloud-ready images; these can be found at https://openflighthpc.org/images/
Template Parameters¶
There are multiple parameters in the template; some are required and others are optional. They are outlined below:
- sourceimage - The source image to use for both the gateway and compute nodes
- clustername - The name of the cluster, to be used as part of the FQDN for nodes
- customdata - A base64 encoded cloud-init string for both the gateway and compute nodes
- computeNodesCount - The number of compute nodes to deploy, a value between 2 and 8 (default: 2)
- gatewayinstancetype - The instance type to be used for the gateway node (default: Standard DS1 v2)
- computeinstancetype - The instance type to be used for the compute nodes (default: Standard DS1 v2)
Deploy Resources¶
Download the OpenFlight Azure template:
$ curl -o cluster.json https://raw.githubusercontent.com/openflighthpc/openflight-compute-cluster-builder/master/templates/azure/cluster.json
Generate a base64 customdata string to set the default username to flight and add an SSH public key for authentication:
$ DATA=$(cat << EOF
#cloud-config
system_info:
  default_user:
    name: flight
runcmd:
  - echo "ssh-rsa MySSHpublicKeyHere user@host" >> /home/flight/.ssh/authorized_keys
EOF
)
$ echo "$DATA" |base64 -w 0
I2Nsb3VkLWNvbmZpZwpzeXN0ZW1faW5mbwogIGRlZmF1bHRfdXNlcjoKICAgIG5hbWU6IGZsaWdodApydW5jbWQ6CiAgLSBlY2hvICJzc2gtcnNhIE15U1NIcHVibGljS2V5SGVyZSB1c2VyQGhvc3QiID4+IC9ob21lL2ZsaWdodC8uc3NoL2F1dGhvcml6ZWRfa2V5cwo=
Note
The base64 value will differ from the above depending on the SSH key specified. It does not need to match the example output above.
Create resource group for the cluster:
$ az group create --name mycluster --location "UK South"
Deploy a cluster (with a gateway and 2 nodes):
$ az group deployment create --name mycluster --resource-group mycluster \
    --template-file cluster.json \
    --parameters sourceimage="SOURCE_IMAGE_PATH_HERE" \
    clustername="mycluster" \
    customdata="I2Nsb3VkLWNvbmZpZwpzeXN0ZW1faW5mbwogIGRlZmF1bHRfdXNlcjoKICAgIG5hbWU6IGZsaWdodApydW5jbWQ6CiAgLSBlY2hvICJzc2gtcnNhIE15U1NIcHVibGljS2V5SGVyZSB1c2VyQGhvc3QiID4+IC9ob21lL2ZsaWdodC8uc3NoL2F1dGhvcml6ZWRfa2V5cwo="
Note
The above command can be modified to override the other parameters described at the beginning of this page. The 3 parameters used in the above command are the minimum required to bring a cluster up.
- Setup passwordless root SSH to all compute nodes from the gateway. This can be done by generating a public key with ssh-keygen and adding it to /root/.ssh/authorized_keys on the gateway and all other nodes.
Workflow: Ansible¶
Overview¶
This workflow describes configuring a simple HPC environment, consisting of:
- Shared NFS directories for users, data and applications
- SLURM queuing system for workload processing and management
- Flight Env for managing configuration and applications available in the environment
Prerequisites¶
This document presumes the following situation:
- The cluster has a gateway node (for running various servers)
- The cluster has multiple compute nodes (for executing jobs)
- DNS is correctly configured to allow hostname connections between the nodes
- Firewall connections between the gateway and compute nodes are open to allow various services to communicate (e.g. queuing system, nfs, etc)
- SSH keys are correctly configured to allow the gateway to login to nodes (as root)
- There is sufficient storage space on the gateway and compute nodes (for applications and data, recommended 16GB+)
Configure Environment¶
Install ansible (>v2.8.0):
$ yum install -y epel-release
$ yum install -y ansible
Create hosts file:
$ cat << EOF > /etc/ansible/hosts
[gateway]
gateway1
[compute]
node01
node02
EOF
Setup playbook:
$ yum install -y git
$ git clone https://github.com/openflighthpc/openflight-ansible-playbook
Warning
It is highly recommended to inspect all roles and edit them to your requirement or, alternatively, write your own roles. These roles are provided “as is” and no guarantee is made that the roles will function properly in environments different to that of the example environment used in this documentation.
Run playbook:
$ cd openflight-ansible-playbook
$ ansible-playbook openflight.yml
Note
The playbook may hang trying to verify the SSH fingerprints of the hosts if none of them have been logged into before from the ansible host. It is recommended to have already established a trusted SSH connection to all systems first.
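One possible way to pre-establish trust from the Ansible host is to populate the known_hosts file with ssh-keyscan (the hostnames assume the inventory created above):
$ ssh-keyscan gateway1 node01 node02 >> ~/.ssh/known_hosts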
Workflow: Manual¶
Overview¶
This workflow describes configuring a simple HPC environment, consisting of:
- Shared NFS directories for users, data and applications
- SLURM queuing system for workload processing and management
- Flight Env for managing configuration and applications available in the environment
Prerequisites¶
This document presumes the following situation:
- The cluster has a gateway node (for running various servers)
- The cluster has multiple compute nodes (for executing jobs)
- DNS is correctly configured to allow hostname connections between the nodes
- Firewall connections between the gateway and compute nodes are open to allow various services to communicate (e.g. queuing system, nfs, etc)
- SSH keys are correctly configured to allow the gateway to login to nodes (as root)
- There is sufficient storage space on the gateway and compute nodes (for applications and data, recommended 16GB+)
SLURM¶
Information on installing SLURM is available in the official SLURM documentation.
Flight User Suite¶
Information on manually installing the Flight User Suite is available in the installation documentation.
OpenFlight Action¶
OpenFlight Action is a platform-agnostic management tool built with a server/client architecture. The server provides a simple, yet flexible, method for defining nodes and commands. The client provides a consistent, straightforward interface for performing actions on nodes of various platforms.
Utilisation of a server/client model allows for centralised configuration and distributed control of the nodes in a cluster, no matter what platform said nodes are running on. This can allow, for example, users to be able to perform superuser-level commands on many nodes from the client without needing sudo, giving the admin a greater level of control without needing to configure sudo on multiple systems.
Server¶
Overview¶
Project URL: https://github.com/openflighthpc/flight-action-api
The server provides a secure, centralised location for configuring platform-specific power commands and tracking the nodes in the cluster.
Configuration¶
Nodes¶
The nodes file is used to define the nodes in the system. An example of the content for this file is in /opt/flight/opt/action-api/config/nodes.example.yaml; this is a useful reference for understanding the different ways that nodes can be configured.
Some example node entries are below:
node01:
ranks: [metal]
ip: 192.168.1.1
cnode01:
ranks: [aws]
ec2_id: i-1234567890
aws_region: eu-west-2
Besides ranks, the key/value pairs for node entries are arbitrary and can be customised for whatever platform, use or metadata fits your use case. The ranks key is used during command lookup to run rank-specific variants of the command (if available).
Note
If no ranks are present then the default version of a command will be run
Commands¶
Commands are stored within /opt/flight/opt/action-api/libexec/ and are shell scripts. A command exists in a subdirectory of the aforementioned path. For example, a command called usage would be a directory at /opt/flight/opt/action-api/libexec/usage and would contain, at least, the following files:
- metadata.yaml - Containing command descriptions and definition
- default.sh - The default script to run for the command
Additionally, this directory can contain scripts for various arbitrary rank keys, such as:
- aws.sh - A version of the default script with amendments made to support AWS
Note
The script files need to be executable or they will not run
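For example, the scripts for the usage command described above could be marked executable with:
chmod +x /opt/flight/opt/action-api/libexec/usage/*.sh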
The metadata.yaml
file contains general command information and looks like the following:
help:
summary: "Get system usage info"
description: >
This command gets basic system usage information and reports it back
to the user
The default.sh
file runs a simple command and reports back on the load average of the system:
#!/bin/bash
LOAD="$(ssh $name 'uptime')"
echo "$name load and uptime: $LOAD"
The aws.sh
file includes the AWS instance ID in the script:
#!/bin/bash
LOAD="$(ssh $name 'uptime')"
echo "$name ($ec2_id) load and uptime: $LOAD"
Note
All node metadata is passed through as bash variables of the same name and case
Authentication¶
The server utilises JWT tokens to prevent unauthenticated clients from interacting with nodes. To generate a token that's valid for 30 days, simply run:
flight action-api generate-token
This will print some information and then generate a token which can be set in the configuration of authorised clients.
Service¶
OpenFlight provides a service handler that hooks non-intrusively into the system beside systemd. In order for action requests from clients to be properly handled by the server, the action server needs to be started:
flight service start action-api
To ensure that the action server is running after reboot, enable it:
flight service enable action-api
Note
In order for the action server to be queryable, a webserver of some kind is needed. This could be a manual Apache/Nginx setup or the OpenFlight WWW service (flight service start www). It's also worth noting that only HTTPS is supported by the OpenFlight WWW service, so ensure that it is suitably certified. The OpenFlight WWW service can assist with certificate generation; see flight www cert-gen for more information.
Helpers¶
While the action server provides a generic framework for securely executing commands on nodes, it's a fairly blank slate to begin with. To address some of the common usages of Flight Action, there are various helper packages that can be installed to provide some commands that work out-of-the-box on various cloud & metal platforms.
Power¶
The OpenFlight package flight-action-api-power
provides power management commands for multiple platforms (IPMI, AWS & Azure). The specific commands it provides are:
- power-off - Power off a node
- power-on - Power on a node
- power-cycle - Cycle the power, rebooting the node
- power-status - Print the power status of the node
Estate¶
The OpenFlight package flight-action-api-estate
provides estate management commands for multiple platforms (AWS & Azure) for setting the instance size of cloud nodes. The specific commands it provides are:
- estate-change - Change the machine type of a node
- estate-show - Show the machine type of a node
Client¶
Overview¶
Project URL: https://github.com/openflighthpc/flight-action
The Action client provides an integrated tool for communicating effectively with the API server.
Configuration¶
Before the client can be used it needs to be configured to look for the right server with the correct authentication token. An example configuration file can be found at /opt/flight/opt/action/etc/config.yaml.reference
. A simple configuration stored at /opt/flight/opt/action/etc/config.yaml
would be something like:
base_url: https://gateway1/action
jwt_token: 1a2b3c4d5e6f7g8h9i0j
Where base_url
is the hostname or IP address of the OpenFlight Action Server and jwt_token
is a valid token generated on the server. If using a self-signed SSL certificate the client will fail to run unless verify_ssl: false
is added to the configuration file.
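For example, the option could be appended to the client configuration file like so (path as above):
echo 'verify_ssl: false' >> /opt/flight/opt/action/etc/config.yaml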
Command Line¶
The command line provides a generalised client for accessing whatever commands have been created on the server, therefore there are only a couple of consistent subcommands for the client:
- help - The help page will show all available commands defined in the action server
- estate-list - This command lists all the nodes defined in the action server
When running an action from the command line, a node name is needed to direct the server to run the command on the correct system. To run for multiple nodes at once, use the -g argument with a comma-separated list of nodes.
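For illustration, assuming the usage command defined in the server section, invocations might look like the following (the exact command names depend on what has been defined on your server):
flight action usage node01
flight action usage -g node01,node02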
Helpers¶
To improve accessibility and ease-of-use for the client command line, there are helpers that provide shorter entrypoints for the additional content provided by the server helpers:
- flight-power - Provides the entrypoint power for managing node power state (flight power off node01)
- flight-estate - Provides the entrypoint estate for managing node types (flight estate change node01)
Workflow: AWS¶
This workflow is based on setting up the action server and client for an AWS cluster created as explained in the Cloud AWS Workflow.
Server Setup¶
Install Action Server¶
To install the action server on CentOS 7 x86_64, simply install the package with yum:
yum install flight-action-api
Note
This assumes that the OpenFlight Package Repositories are configured on the node
Install Power Scripts¶
To install the power scripts to jumpstart research environment management, install with yum:
yum install flight-action-api-power
Install AWS CLI¶
For managing the power state of the nodes, the AWS CLI will need to be installed with:
$ yum install awscli
Once installed, authenticate with AWS details:
$ aws configure
Add Nodes¶
Nodes are added to the config file at /opt/flight/opt/action-api/config/nodes.yaml. Below is an example of the node configuration for a simple 3 node cluster consisting of a gateway and two compute nodes:
gateway1:
ranks: [aws]
ec2_id: i-1234567890
aws_region: eu-west-2
node01:
ranks: [aws]
ec2_id: i-2345678901
aws_region: eu-west-2
node02:
ranks: [aws]
ec2_id: i-3456789012
aws_region: eu-west-2
Start Service¶
The OpenFlight Service utility is used to start the service:
$ flight service start action-api
Note
The general OpenFlight webserver service also needs to be running; this can be launched with flight service start www
Enable Service¶
The OpenFlight Service utility is used to enable the service:
$ flight service enable action-api
Note
In order to integrate the service enabling with systemd (to ensure the service starts at boot), the flight-plugin-system-systemd-service
package will need to be installed
Generate Token¶
To generate a 30-day authentication token for the client, run:
$ flight action-api generate-token
eyJhbGciOiJIUzI1NiJ9.eyJhZG1pbiI6bnVsbCwiZXhwIjoxNTgxODcyNjA2fQ.WXm-F07bAl78UTZHAsFKLXzYokBwVfgMabzIMgNfc5Y
This will return a token that can be entered into the config of a power client to authenticate against this server.
Client Setup¶
Install Action Client¶
To install the action client on CentOS 7 x86_64, simply install the package with yum:
yum install flight-action
Note
This assumes that the OpenFlight Package Repositories are configured on the node
Add Security Token¶
Add security token to /opt/flight/opt/action/etc/config.yaml
so the file looks similar to the following:
base_url: https://gateway1/action
jwt_token: eyJhbGciOiJIUzI1NiJ9.eyJhZG1pbiI6bnVsbCwiZXhwIjoxNTgxODcyNjA2fQ.WXm-F07bAl78UTZHAsFKLXzYokBwVfgMabzIMgNfc5Y
Note
If using a self-signed certificate then add verify_ssl: false
to the config
List Available Nodes¶
A list of the configured nodes can be seen from the client like so:
$ flight action estate-list
gateway1
node01
node02
Note
The flight command can be executed with its full path (/opt/flight/bin/flight) in the case that flight-starter has not been installed to add the command to the environment.
Check Power Status of Nodes¶
To check the power status of all nodes, simply:
$ flight power status -g gateway1,node0[1-2]
gateway1: ON
node01: ON
node02: ON
Power Off a Node¶
A node can be powered off as follows:
$ flight power off node02
Power off node02, confirm to proceed [y/n]?
y
OK
Restart a Node¶
To power cycle a node, simply:
$ flight power cycle node02
Power cycle node02, confirm to proceed [y/n]?
y
OK
Workflow: Azure¶
This workflow is based on setting up the power server and client for an Azure cluster created as explained in the Cloud Azure Workflow.
Server Setup¶
Install Action Server¶
To install the action server on CentOS 7 x86_64, simply install the package with yum:
yum install flight-action-api
Note
This assumes that the OpenFlight Package Repositories are configured on the node
Install Power Scripts¶
To install the power scripts to jumpstart research environment management, install with yum:
yum install flight-action-api-power
Install Azure CLI¶
Refer to the Azure CLI documentation for help with installing the repository and tool.
Once installed, authenticate the tool with your Azure account:
$ az login
Add Nodes¶
Nodes are added to the config file at /opt/flight/opt/action-api/config/nodes.yaml. Below is an example of the node configuration for a simple 3 node cluster consisting of a gateway and two compute nodes:
gateway1:
ranks: [azure]
azure_resource_group: mycluster-abcd123
azure_name: flightcloudclustergateway1
node01:
ranks: [azure]
azure_resource_group: mycluster-abcd123
azure_name: flightcloudclusternode01
node02:
ranks: [azure]
azure_resource_group: mycluster-abcd123
azure_name: flightcloudclusternode02
Start Service¶
The OpenFlight Service utility is used to start the service:
$ flight service start action-api
Note
The general OpenFlight webserver service also needs to be running; this can be launched with flight service start www
Enable Service¶
The OpenFlight Service utility is used to enable the service:
$ flight service enable action-api
Note
In order to integrate the service enabling with systemd (to ensure the service starts at boot), the flight-plugin-system-systemd-service
package will need to be installed
Generate Token¶
To generate a 30-day authentication token for the client, run:
$ flight action-api generate-token
eyJhbGciOiJIUzI1NiJ9.eyJhZG1pbiI6bnVsbCwiZXhwIjoxNTgxODcyNjA2fQ.WXm-F07bAl78UTZHAsFKLXzYokBwVfgMabzIMgNfc5Y
This will return a token that can be entered into the config of a power client to authenticate against this server.
Client Setup¶
Install Action Client¶
To install the action client on CentOS 7 x86_64, simply install the package with yum:
yum install flight-action
Note
This assumes that the OpenFlight Package Repositories are configured on the node
Add Security Token¶
Add security token to /opt/flight/opt/action/etc/config.yaml
so the file looks similar to the following:
base_url: https://gateway1/action
jwt_token: eyJhbGciOiJIUzI1NiJ9.eyJhZG1pbiI6bnVsbCwiZXhwIjoxNTgxODcyNjA2fQ.WXm-F07bAl78UTZHAsFKLXzYokBwVfgMabzIMgNfc5Y
Note
If using a self-signed certificate then add verify_ssl: false
to the config
List Available Nodes¶
A list of the configured nodes can be seen from the client like so:
$ flight action estate-list
gateway1
node01
node02
Note
The flight command can be executed with its full path (/opt/flight/bin/flight) in the case that flight-starter has not been installed to add the command to the environment.
Check Power Status of Nodes¶
To check the power status of all nodes, simply:
$ flight power status -g gateway1,node0[1-2]
gateway1: ON
node01: ON
node02: ON
Power Off a Node¶
A node can be powered off as follows:
$ flight power off node02
Power off node02, confirm to proceed [y/n]?
y
OK
Restart a Node¶
To power cycle a node, simply:
$ flight power cycle node02
Power cycle node02, confirm to proceed [y/n]?
y
OK
Basic Research Environment Operation¶
Logging in¶
You can access the login node for your private Flight Compute research environment using SSH to connect to the externally facing IP address of the login node. You will need to use the SSH keypair configured for the research environment in order to access it.
When you login to the research environment via SSH, you are automatically placed in your home-directory. This area is shared across all compute nodes in the research environment, and is mounted in the same place on every compute. Data copied to the research environment or created in your home-directory on the login node is also accessible from all compute nodes.
Linux/Mac¶
To access the research environment login node from a Linux or Mac client, use the following command:
ssh -i mypublickey.pem flight@52.50.141.144
- Where:
  - mypublickey.pem is the name of your private key associated with the public SSH key you set when launching the research environment
  - flight is the username of the user on the research environment
  - 52.50.141.144 is the Access-IP address for the gateway node of the research environment
Windows¶
If you are accessing from a Windows client using the Putty utility, the private key associated with the account will need to be converted to ppk format from pem to be compatible with Putty. This can be done as follows:
- Open PuTTYgen (this will already be installed on your system if Putty was installed using .msi and not launched from the .exe - if you do not think you have this, download putty-installer from here http://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html)
- Select Conversions -> Import Key
- Locate .pem file and click open
- Click Save Private Key
- Answer Yes to saving without a passphrase
- Input the name for the newly generated ppk to be saved as
To load the key in Putty, select Connection -> SSH -> Auth, click Browse and select the ppk that was generated from the above steps.

Next, enter the username and IP address of the research environment login node in the “Host Name” box provided (in the Session section):

The first time you connect to your research environment, you will be prompted to accept a new server SSH hostkey. This happens because you’ve never logged in to your research environment before - it should only happen the first time you login; click OK to accept the warning. Once connected to the research environment, you should be logged in to the research environment login node as your user.

Becoming the root user¶
Most research environment operations, including starting applications and running jobs, should be performed as the user created when the Flight Compute research environment was launched from the launch template. However - for some privileged operations, users may need to change to being the root user. Users can prefix any command they want to run as the root user with the sudo
command; e.g.
sudo yum install screen
For security reasons, SSH login as the root user is not permitted to a Flight Compute environment. To get a Linux shell with root privileges, please login as your standard user then execute the command sudo -s
.
Warning
Users must exercise caution when running commands as root, as they have the potential to disrupt research environment operations.
Moving between login and compute nodes¶
OpenFlight Compute research environments automatically configure a trust relationship between login and compute nodes in the same research environment to allow users to login between nodes via SSH without a password. This configuration allows moving quickly and easily between nodes, and simplifies running large-scale jobs that involve multiple nodes. From the command line, a user can simply use the ssh <node-name>
command to login to one of the compute nodes from the login node. For example, to login to a compute node named node01
from the login node, use the command:
ssh node01
Use the logout
command (or press CTRL+D) to exit the compute node and return to the login node.
Working with Data and Files¶
Organising Data on Your Research Environment¶
Your home directory¶
The shared filesystem includes the home-directory area for the flight
user which is created when your research environment is launched. Linux automatically places users in their home-directory when they login to a node. By default, Flight Compute will create your home-directory under the /users/
directory, named flight
(/users/flight
).
The Linux command line will accept the ~
(tilde) symbol as a substitute for the currently logged-in users’ home-directory. The environment variable $HOME
is also set to this value by default. Hence, the following three commands are all equivalent when logged in as the user flight:
ls /users/flight
ls ~
ls $HOME
The root user in Linux has special meaning as a privileged user, and does not have a shared home-directory across the research environment. The root account on all nodes has a home-directory in /root
, which is separate for every node. For security reasons, users are not permitted to login to a node as the root user directly - please login as a standard user and use the sudo
command to get privileged access.
Local scratch storage¶
Your compute nodes have an amount of disk space available to store temporary data under the /tmp
mount-point. This area is intended for temporary data created during compute jobs, and shouldn’t be used for long-term data storage. Compute nodes are configured to clear up temporary space automatically, removing orphan data left behind by jobs.
Users must make sure that they copy data they want to keep back to the shared filesystem after compute jobs have been completed.
Copying data between nodes¶
Note
This isn’t applicable to a single node research environment
Flight Compute research environment login and compute nodes all mount the shared filesystem, so it is not normally necessary to copy data directly between nodes in the research environment. Users simply need to place the data to be shared in their home-directory on the login node, and it will be available on all compute nodes in the same location.
If necessary, users can use the scp
command to copy files from the compute nodes to the login node; for example:
scp node01:/tmp/myfile.txt .
Alternatively, users could login to the compute node (e.g. ssh node01
) and copy the data back to the shared filesystem on the node:
ssh node01
cp /tmp/myfile ~/myfile
Copying data files to the research environment¶
Many compute workloads involve processing data on the research environment - users often need to copy data files to the research environment for processing, and retrieve processed data and results afterwards. This documentation describes a number of methods of working with data on your research environment, depending on how users prefer to transfer it.
Using command-line tools to copy data¶
The research environment login node is accessible via SSH, allowing use of the scp
and sftp
commands to transfer data from your local client machine.
Linux/Mac
Linux and Mac users can use in-built SSH support to copy files. To copy file mydata.zip to your research environment on IP address 52.48.62.34, use the command:
scp -i mykeyfile.pem mydata.zip flight@52.48.62.34:.
- replace mykeyfile.pem with the name of your SSH private key file
- replace flight with your username on the research environment
Windows
Windows users can download and install the pscp command to perform the same operation (for this you will need your .pem key in .ppk format, see connecting from Windows with Putty):
pscp -i mykeyfile.ppk mydata.zip flight@52.48.62.34:/users/flight/.
SCP/PSCP
Both the scp
and the pscp
commands take the parameter -r
to recursively copy entire directories of files to the research environment.
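For example, to recursively copy a local directory (here called mydata, a placeholder name) to the research environment used in the examples above:
scp -i mykeyfile.pem -r mydata flight@52.48.62.34:.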
To retrieve files from the research environment, simply specify the location of the remote file first in the scp
command, followed by the location on the local system to put the file; e.g.
To copy file myresults.zip from your research environment on IP address 52.48.62.34 to your local Linux or Mac client:
scp -i mykeyfile.pem flight@52.48.62.34:/users/flight/myresults.zip .
Using a graphical client to copy data¶
There are also a number of graphical file-management interfaces available that support the SSH/SCP/SFTP protocols. A graphical interface can make it easier for new users to manage their data, as they provide a simple drag-and-drop interface that helps to visualise where data is being stored. The example below shows how to configure the WinSCP utility on a Windows client to allow data to be moved to and from a research environment.
- On a Windows client, download and install WinSCP
- Start WinSCP; in the login configuration box, enter the IP address of your Flight Compute research environment login node in the Host name box
- Enter the username you configured for your research environment in the User name box (the default user is flight)
- Click on the Advanced box and navigate to the SSH sub-menu, and the Authentication item
- In the Private key file box, select your research environment access private key, and click the OK box.

- Optionally click the Save button and give this session a name
- Click the Login button to connect to your research environment
- Accept the warning about adding a new server key to your cache; this message is displayed only once when you first connect to a new research environment
- WinSCP will login to your research environment; the window shows your local client machine on the left, and the research environment on the right
- To copy files to the research environment from your client, click and drag them from the left-hand window and drop them on the right-hand window
- To copy files from the research environment to your client, click and drag them from the right-hand window and drop them on the left-hand window

The amount of time taken to copy data to and from your research environment will depend on a number of factors, including:
- The size of the data being copied
- The speed of your Internet link to the research environment; if you are copying large amounts of data, try to connect using a wired connection rather than wireless
- The type and location of your research environment login node instance
Object storage for archiving data¶
As an alternative to copying data back to your client machine, users may prefer to upload their data to a cloud-based object storage service instead. Cloud storage solutions such as AWS S3, Dropbox and SWIFT have command-line tools which can be used to connect existing cloud storage to your research environment. Benefits of using an object-based storage service include:
- Data is kept safe and does not have to be independently backed-up
- Storage is easily scalable, with the ability for data to grow to practically any size
- You only pay for what you use; you do not need to buy expansion room in advance
- Storage service providers often have multiple tiers available, helping to reduce the cost of storing data
- Data storage and retrieval times may be improved, as storage service providers typically have more bandwidth than individual sites
- Your company, institution or facility may receive some storage capacity for free which you could use
Object storage is particularly useful for archiving data, as it typically provides a convenient, accessible method of storing data which may need to be shared with a wide group of individuals.
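As an illustrative sketch, archiving a results directory to an AWS S3 bucket with the AWS CLI might look like the following (the bucket name is a placeholder and the awscli tool must first be installed and configured with credentials):
aws s3 sync ~/results s3://my-research-bucket/results    # upload results to the bucket
aws s3 sync s3://my-research-bucket/results ~/results    # restore them to a later research environment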
Saving data before terminating your research environment¶
When you’ve finished working with your OpenFlight Flight Compute research environment, you can select to terminate it in the console for your Cloud service. This will stop any running instances and wipe the shared storage area before returning the block storage volumes back to the provider. Before you shutdown your research environment, users must ensure that they store their data safely in a persistent service, using one of the methods described in this documentation. When you next launch a Flight Compute research environment, you can restore your data from the storage service to begin processing again.
Genders and PDSH¶
Note
Genders & PDSH functionality is not available or useful on a single node research environment; this page only applies to multi-node research environments.
A combination of genders and pdsh can allow for management and monitoring of multiple nodes at a time. OpenFlight provides a build of PDSH that integrates with the rest of the User Suite.
Installing nodeattr and pdsh¶
Nodeattr and pdsh can be installed using the yum package manager, simply:
sudo yum -y install flight-pdsh
Once installed, the pdsh command will be available to active OpenFlight User Suite sessions (use flight start
to activate the session). The priority of the OpenFlight pdsh command in the path can be configured with:
flight config set pdsh.priority X
Where X
is one of:
- system - Prefer system PDSH
- embedded - Prefer OpenFlight PDSH
- disabled - Do not include OpenFlight PDSH in PATH
Note
OpenFlight PDSH defaults to system priority. To ensure that the OpenFlight User Environment uses the OpenFlight PDSH, set the priority to embedded after installation (flight config set pdsh.priority embedded).
Finding the names of your compute nodes¶
An OpenFlight Compute research environment may contain any number of compute nodes depending on your research environment size. The hostnames of compute nodes usually follow a sequential order (e.g. node01, node02, node03… node10). OpenFlight Compute automatically creates a list of compute node names and uses them to populate a genders group called nodes. This genders file can be found at /opt/flight/etc/genders
.
Users can find the names of their compute nodes by using the nodeattr
command; e.g.
nodeattr -s nodes
- shows a space-separated list of current compute node hostnames
nodeattr -c nodes
- shows a comma-separated list of current compute node hostnames
nodeattr -n nodes
- shows a new-line-separated list of current compute node hostnames
The login node hostname for Flight Compute research environments launched using default templates is always gateway1
.
Using PDSH¶
Users can run a command across all compute nodes at once using the pdsh
command. This can be useful if users want to make a change to all nodes in the research environment - for example, installing a new software package. The pdsh
command can take a number of parameters that control how commands are processed; for example:
pdsh -g all uptime
- executes the
uptime
command on all available compute and login nodes in the research environment
pdsh -g nodes 'sudo yum -y install screen'
- use yum to install the screen package as the root user on all compute nodes
pdsh -g nodes -f 1 df -h /tmp
- executes the command
df -h /tmp
on all compute nodes of the research environment, one at a time (fanout=1)
pdsh -w node01,node03 which ldconfig
- runs the
which ldconfig
command on two named nodes only
Local Jobs¶
What is a Job?¶
A job can be loosely defined as an automated research task, for example, a bash script that runs various stages in an OpenFoam simulation on a model.
Jobs vary in size, resource usage and run time. A job could utilise multiple cores through parallel libraries or simply run on a single core.
Why Run a Local Job?¶
When using a personal research environment there isn't a need to monitor resource usage as closely as on multi-user systems, where miscommunication and overloaded resources can negatively impact research progress. With all the resources available to one user it is quicker and easier to run jobs directly from a terminal than through a queue system.
Local jobs also have the benefit over scheduled jobs of launching immediately and providing all output through a single terminal.
Running a Local Job¶
Local job scripts can be written in any language supported within the research environment. In most cases this is likely to be bash as it's a flexible and functional shell scripting language which enables users to intuitively navigate the filesystem, launch applications and manage data output.
In the event that a job script is executable, contains a shebang specifying the launch language and is in the current directory, it’s as simple to run as:
[flight@gateway1 ~]$ ./myjob.sh
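For example, the myjob.sh referenced above could be a script along the following lines (illustrative only):
#!/bin/bash
# Simple local job: report where it is running, do some work, report completion
echo "Job started on $HOSTNAME at $(date)"
sleep 60    # placeholder for the real workload
echo "Job finished at $(date)"
Remember to make the script executable (chmod +x myjob.sh) before launching it as shown above.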
What is a Scheduler?¶
What is a batch job scheduler?¶
Most existing High-performance Compute Research Environments are managed by a job scheduler; also known as the batch scheduler, workload manager, queuing system or load-balancer. The scheduler allows multiple users to fairly share compute nodes, allowing system administrators to control how resources are made available to different groups of users. All schedulers are designed to perform the following functions:
- Allow users to submit new jobs to the research environment
- Allow users to monitor the state of their queued and running jobs
- Allow users and system administrators to control running jobs
- Monitor the status of managed resources including system load, memory available, etc.
When a new job is submitted by a user, the research environment scheduler software assigns compute cores and memory to satisfy the job requirements. If suitable resources are not available to run the job, the scheduler adds the job to a queue until enough resources are available for the job to run. You can configure the scheduler to control how jobs are selected from the queue and executed on research environment nodes, including automatically preparing nodes to run parallel MPI jobs. Once a job has finished running, the scheduler returns the resources used by the job to the pool of free resources, ready to run another user job.
Types of compute job¶
Users can run a number of different types of job via the research environment scheduler, including:
- Batch jobs; single-threaded applications that run only on one compute core
- Array jobs; two or more similar batch jobs which are submitted together for convenience
- SMP or multi-threaded jobs; multi-threaded applications that run on two or more compute cores on the same compute node
- Parallel jobs; multi-threaded applications making use of an MPI library to run on multiple cores spread over one or more compute nodes
The research environment job-scheduler is responsible for finding compute nodes in your research environment to run all these different types of jobs on. It keeps track of the available resources and allocates jobs to individual groups of nodes, making sure not to over-commit CPU and memory. The example below shows how a job-scheduler might allocate jobs of different types to a group of 8-CPU-core compute nodes:

Interactive and batch jobs¶
Users typically interact with compute research environments by running either interactive or batch (also known as non-interactive) jobs.
- An interactive job is one that the user directly controls, either via a graphical interface or by typing at the command-prompt.
- A batch job is run by writing a list of instructions that are passed to compute nodes to run at some point in the future.
Both methods of running jobs can be equally as efficient, particularly on a personal, ephemeral research environment. Both classes of job can be of any type - for example, it’s possible to run interactive parallel jobs and batch multi-threaded jobs across your research environment. The choice of which class of job-type you want to use will depend on the application you’re running, and which method is more convenient for you to use.
Why use a job-scheduler on a personal research environment?¶
Good question. On shared multi-user research environments, a job-scheduler is often used as a control mechanism to make sure that users don’t unfairly monopolise the valuable compute resources. In extreme cases, the scheduler may be wielded by system administrators to force “good behaviour” in a shared environment, and can feel like an imposition to research environment users.
With your own personal research environment, you have the ability to directly control the resources available for your job - you don’t need a job-scheduler to limit your usage.
However - there are a number of reasons why your own job-scheduler can still be a useful tool in your research environment:
- It can help you organise multi-stage work flows, with batch jobs launching subsequent jobs in a defined process.
- It can automate launching of MPI jobs, finding available nodes to run applications on.
- It can help prevent accidentally over-allocating CPUs or memory, which could lead to nodes failing.
- It can help bring discipline to the environment, providing a consistent method to replicate the running of jobs in different environments.
- Jobs queued in the scheduler can be used to trigger scaling-up the size of your research environment, with compute nodes released from the research environment when there are no jobs to run, saving you money.
Your OpenFlight Flight Compute research environment comes with a job-scheduler pre-installed, ready for you to start using. The scheduler uses very few resources when idle, so you can choose to use it if you find it useful, or run jobs manually across your research environment if you prefer.
Slurm Scheduler¶
The Slurm research environment job-scheduler is an open-source project used by many high performance computing systems around the world - including many of the TOP 500 supercomputers.
Running an interactive job¶
Note
If using a single node research environment with a scheduler then interactive jobs are unnecessary as the only available resources will be local
You can start a new interactive job on your Flight Compute research environment by using the srun
command; the scheduler will search for an available compute node, and provide you with an interactive login shell on the node if one is available.
[centos@gateway1 (scooby) ~]$ srun --pty /bin/bash
[centos@node01 (scooby) ~]$
[centos@node01 (scooby) ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3 all bash centos R 0:39 1 node01
In the above example, the srun
command is used together with two options: --pty
and /bin/bash
. The --pty
option executes the task in pseudo terminal mode, allowing the session to act like a standard terminal session. The /bin/bash
option is the command that you wish to run - here the default Linux shell, BASH.
Alternatively, the srun
command can also be executed from an interactive desktop session; the job-scheduler will automatically find an available compute node to launch the job on. Applications launched from within the srun
session are executed on the assigned research environment compute node.

Note
The Slurm scheduler does not automatically set up your session to allow you to run graphical applications inside an interactive session. Once your interactive session has started, you must run the following command before running a graphical application: export DISPLAY=gateway1$DISPLAY
Warning
Running X applications from a compute node may not work due to missing X libraries on the compute node; these can be installed from an SSH session into a compute node with sudo yum groupinstall "X Window System"
When you’ve finished running your application in your interactive session, simply type logout
, or press Ctrl+D to exit the interactive job.
If the job-scheduler could not satisfy the resource you’ve requested for your interactive job (e.g. all your available compute nodes are busy running other jobs), it will report back after a few seconds with an error:
[centos@gateway1 (scooby) ~]$ srun --pty /bin/bash
srun: job 20 queued and waiting for resources
Submitting a batch job¶
Batch (or non-interactive) jobs allow users to leverage one of the main benefits of having a research environment scheduler; jobs can be queued up with instructions on how to run them and then executed across the research environment while the user does something else. Users submit jobs as scripts, which include instructions on how to run the job - the output of the job (stdout and stderr in Linux terminology) is written to a file on disk for review later on. You can write a batch job that does anything that can be typed on the command-line.
We’ll start with a basic example - the following script is written in bash (the default Linux command-line interpreter). You can create the script yourself using the Nano command-line editor - use the command nano simplejobscript.sh
to create a new file, then type in the contents below. The script does nothing more than print some messages to the screen (the echo lines), and sleeps for 120 seconds. We’ve saved the script to a file called simplejobscript.sh
- the .sh
extension helps to remind us that this is a shell script, but adding a filename extension isn’t strictly necessary for Linux.
#!/bin/bash -l
echo "Starting running on host $HOSTNAME"
sleep 120
echo "Finished running - goodbye from $HOSTNAME"
Note
We use the -l
option to bash on the first line of the script to request a login session. This ensures that environment modules can be loaded as required as part of your script.
We can execute that script directly on the login node by using the command bash simplejobscript.sh
- after a couple of minutes, we get the following output:
Starting running on host gateway1
Finished running - goodbye from gateway1
To submit your job script to the research environment job scheduler, use the command sbatch simplejobscript.sh
. The job scheduler should immediately report the job-ID for your job; your job-ID is unique for your current OpenFlight Flight Compute research environment - it will never be repeated once used.
[centos@gateway1 (scooby) ~]$ sbatch simplejobscript.sh
Submitted batch job 21
[centos@gateway1 (scooby) ~]$ ls
simplejobscript.sh slurm-21.out
[centos@gateway1 (scooby) ~]$ cat slurm-21.out
Starting running on host node01
Finished running - goodbye from node01
Viewing and controlling queued jobs¶
Once your job has been submitted, use the squeue
command to view the status of the job queue. If you have available compute nodes, your job should be shown in the R
(running) state; if your compute nodes are busy, or you’ve launched an auto-scaling research environment and currently have no running nodes, your job may be shown in the PD
(pending) state until compute nodes are available to run it. If a job is in PD
state - the reason for being unable to run will be displayed in the NODELIST(REASON)
column of the squeue
output.
[centos@gateway1 (scooby) ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
41 all simplejo centos R 0:03 1 node01
42 all simplejo centos R 0:00 1 node01
You can keep running the squeue
command until your job finishes running and disappears from the queue. The output of your batch job will be stored in a file for you to look at. The default location to store the output file is your home directory. You can use the Linux more
command to view your output file:
[centos@gateway1 (scooby) ~]$ more slurm-42.out
Starting running on host node01
Finished running - goodbye from node01
Your job runs on whatever node the scheduler can find which is available for use - you can try submitting a bunch of jobs at the same time, and using the squeue
command to see where they run. The scheduler is likely to spread them around over different nodes (if you have multiple nodes). The login node is not included in your research environment for scheduling purposes - jobs submitted to the scheduler will only be run on your research environment compute nodes. You can use the scancel <job-ID>
command to delete a job you’ve submitted, whether it’s running or still in the queued state.
[centos@gateway1 (scooby) ~]$ sbatch simplejobscript.sh
Submitted batch job 46
[centos@gateway1 (scooby) ~]$ sbatch simplejobscript.sh
Submitted batch job 47
[centos@gateway1 (scooby) ~]$ sbatch simplejobscript.sh
Submitted batch job 48
[centos@gateway1 (scooby) ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
43 all simplejo centos R 0:04 1 node01
44 all simplejo centos R 0:04 1 node01
45 all simplejo centos R 0:04 1 node02
46 all simplejo centos R 0:04 1 node02
47 all simplejo centos R 0:04 1 node03
48 all simplejo centos R 0:04 1 node03
[centos@gateway1 (scooby) ~]$ scancel 47
[centos@gateway1 (scooby) ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
43 all simplejo centos R 0:11 1 node01
44 all simplejo centos R 0:11 1 node01
45 all simplejo centos R 0:11 1 node02
46 all simplejo centos R 0:11 1 node02
48 all simplejo centos R 0:11 1 node03
Viewing compute host status¶
Users can use the sinfo -Nl
command to view the status of compute node hosts in your Flight Compute research environment.
[centos@gateway1 (scooby) ~]$ sinfo -Nl
Fri Aug 26 14:46:34 2016
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
node01 1 all* idle 2 2:1:1 3602 20462 1 (null) none
node02 1 all* idle 2 2:1:1 3602 20462 1 (null) none
node03 1 all* idle 2 2:1:1 3602 20462 1 (null) none
node04 1 all* idle 2 2:1:1 3602 20462 1 (null) none
node05 1 all* idle 2 2:1:1 3602 20462 1 (null) none
node06 1 all* idle 2 2:1:1 3602 20462 1 (null) none
node07 1 all* idle 2 2:1:1 3602 20462 1 (null) none
The sinfo
output will show (from left-to-right):
- The hostname of your compute nodes
- The number of nodes in the list
- The node partition the node belongs to
- Current usage of the node - if no jobs are running, the state will be listed as idle. If a job is running, the state will be listed as allocated
- The detected number of CPUs (including hyper-threaded cores)
- The number of sockets, cores and threads per node
- The amount of memory in MB per node
- The amount of disk space in MB available to the /tmp partition per node
- The scheduler weighting
Default resources¶
In order to promote efficient usage of your research environment, the job-scheduler automatically sets a number of default resources for your jobs when you submit them. These defaults can be overridden by users to help the scheduler understand how you want it to run your job - if we don't include any instructions to the scheduler, then our job will take the defaults shown below:
- Number of CPU cores for your job: 1
- Number of nodes for your job: the default behavior is to allocate enough nodes to satisfy the requirements of the number of CPUs requested
You can view all default resource limits by running the following command:
[root@gateway1(slurm) ~]# scontrol show config | grep Def
CpuFreqDef = Unknown
DefMemPerNode = UNLIMITED
MpiDefault = none
SallocDefaultCommand = (null)
This documentation will explain how to change these limits to suit the jobs that you want to run. You can also disable these limits if you prefer to control resource allocation manually by yourself.
Controlling resources¶
In order to promote efficient usage of the research environment - the job-scheduler is automatically configured with default run-time limits for jobs. These defaults can be overridden by users to help the scheduler understand how you want it to run your job. If we don’t include any instructions to the scheduler then the default limits are applied to a job.
Job instructions can be provided in two ways; they are:
- On the command line, as parameters to your sbatch or srun command. For example, you can set the name of your job using the --job-name=[name] | -J [name] option:
[centos@gateway1 (scooby) ~]$ sbatch --job-name=mytestjob simplejobscript.sh
Submitted batch job 51
[centos@gateway1 (scooby) ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
51 all mytestjo centos R 0:02 1 node01
- In your job script, by including scheduler directives at the top of your job script - you can achieve the same effect as providing options with the sbatch or srun commands. Create an example job script or modify your existing script to include a scheduler directive to use a specified job name:
#!/bin/bash -l
#SBATCH --job-name=mytestjob
echo "Starting running on host $HOSTNAME"
sleep 120
echo "Finished running - goodbye from $HOSTNAME"
Including job scheduler instructions in your job-scripts is often the most convenient method of working for batch jobs - follow the guidelines below for the best experience (an example sketch follows this list):
- Lines in your script that include job-scheduler directives must start with #SBATCH at the beginning of the line
- You can have multiple lines starting with #SBATCH in your job-script; note that the scheduler stops reading directives at the first normal (non-comment) script line, so keep them grouped together at the top of the script
- You can put multiple instructions separated by a space on a single line starting with #SBATCH
- The scheduler will parse the script from top to bottom and set instructions in order; if you set the same parameter twice, the second value will be used
- Instructions are parsed at job submission time, before the job itself has actually run. This means you can’t, for example, tell the scheduler to put your job output in a directory that you create in the job-script itself - the directory will not exist when the job starts running, and your job will fail with an error
- You can use dynamic variables in your instructions (see below)
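As a sketch, the following illustrative job script follows these guidelines - all directives are grouped at the top, and the final directive line carries two instructions (the job name, output filename and runtime shown are examples only):
#!/bin/bash -l
# All scheduler directives are grouped at the top of the script
#SBATCH --job-name=guidelines-demo
#SBATCH --output=output.%j --time=0-1:00
echo "Starting running on host $HOSTNAME"
sleep 120
echo "Finished running - goodbye from $HOSTNAME"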
Dynamic scheduler variables¶
Your research environment job scheduler automatically creates a number of pseudo environment variables which are available to your job-scripts when they are running on research environment compute nodes, along with standard Linux variables. Useful values include the following:
$HOME
- The location of your home-directory
$USER
- The Linux username of the submitting user
$HOSTNAME
- The Linux hostname of the compute node running the job
%a / $SLURM_ARRAY_TASK_ID
- Job array ID (index) number. The %a substitution should only be used in your job scheduler directives
%A / $SLURM_ARRAY_JOB_ID
- Job allocation number for an array job. The %A substitution should only be used in your job scheduler directives
%j / $SLURM_JOBID
- Job allocation number. The %j substitution should only be used in your job scheduler directives
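As an illustrative sketch, the following job script (the job name and output filename are examples only) prints a few of these variables at runtime:
#!/bin/bash -l
#SBATCH --job-name=vars-demo
#SBATCH --output=output.%j
# Report some of the scheduler and Linux variables available to the running job
echo "Job $SLURM_JOBID submitted by $USER"
echo "Running on host $HOSTNAME with home directory $HOME"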
Simple scheduler instruction examples¶
Here are some commonly used scheduler instructions, along with some examples of their usage:
Setting output file location¶
To set the output file location for your job, use the -o [file_name] | --output=[file_name] option - both standard-out and standard-error from your job-script, including any output generated by applications launched by your job-script, will be saved in the filename you specify.
By default, the scheduler stores data relative to your home-directory - but to avoid confusion, we recommend specifying a full path to the filename to be used. Although Linux can support several jobs writing to the same output file, the result is likely to be garbled - it’s common practice to include something unique about the job (e.g. its job-ID) in the output filename to make sure your job’s output is clear and easy to read.
Note
The directory used to store your job output file must exist and be writable by your user before you submit your job to the scheduler. Your job may fail to run if the scheduler cannot create the output file in the directory requested.
The following example uses the --output=[file_name] instruction to set the output file location:
#!/bin/bash -l
#SBATCH --job-name=myjob --output=output.%j
echo "Starting running on host $HOSTNAME"
sleep 120
echo "Finished running - goodbye from $HOSTNAME"
In the above example, assuming the job was submitted as the centos user and was given the job-ID number 24, the scheduler will save the output data from the job in the filename /home/centos/output.24.
Setting working directory for your job¶
By default, jobs are executed from your home-directory on the research environment (i.e. /home/<your-user-name>, $HOME or ~). You can include cd commands in your job-script to change to different directories; alternatively, you can provide an instruction to the scheduler to change to a different directory to run your job. The available options are:
-D | --workdir=[dir_name]
- instruct the job scheduler to move into the directory specified before starting to run the job on a compute node
Note
The directory specified must exist and be accessible by the compute node in order for the job you submitted to run.
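For example, a minimal sketch of a job script using the --workdir instruction - the directory /home/centos/myworkdir is purely illustrative and would need to exist before the job is submitted:
#!/bin/bash -l
#SBATCH --job-name=workdir-demo
#SBATCH --workdir=/home/centos/myworkdir
#SBATCH --output=output.%j
# The job starts in the directory requested above rather than the home-directory
echo "Running in $(pwd) on host $HOSTNAME"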
Waiting for a previous job before running¶
You can instruct the scheduler to wait for an existing job to finish before starting to run the job you are submitting with the -d [state:job_id] | --dependency=[state:job_id] option. For example, to wait until the job with ID 75 has finished before starting the job, you could use the following syntax:
[centos@gateway1 (scooby) ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
75 all myjob centos R 0:01 1 node01
[centos@gateway1 (scooby) ~]$ sbatch --dependency=afterok:75 mytestjob.sh
Submitted batch job 76
[centos@gateway1 (scooby) ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
76 all myjob centos PD 0:00 1 (Dependency)
75 all myjob centos R 0:15 1 node01
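The same dependency can also be expressed as a scheduler directive inside the job script itself - a minimal sketch, where the job ID 75 is illustrative and would be replaced with the ID of the job you want to wait for:
#!/bin/bash -l
#SBATCH --job-name=myjob
#SBATCH --dependency=afterok:75
#SBATCH --output=output.%j
# This job only starts once job 75 has completed successfully
echo "Job 75 finished - starting work on $HOSTNAME"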
Running task array jobs¶
A common workload is having a large number of jobs to run which basically do the same thing, aside perhaps from having different input data. You could generate a job-script for each of them and submit it, but that’s not very convenient - especially if you have many hundreds or thousands of tasks to complete. Such jobs are known as task arrays - an embarrassingly parallel job will often fit into this category.
A convenient way to run such jobs on a research environment is to use a task array, using the -a [array_spec] | --array=[array_spec] directive. Your job-script can then use the pseudo environment variables created by the scheduler to refer to data used by each task in the job. The following job-script uses the $SLURM_ARRAY_TASK_ID/%a variable to echo its current task ID to an output file:
#!/bin/bash -l
#SBATCH --job-name=array
#SBATCH -D $HOME/
#SBATCH --output=output.array.%A.%a
#SBATCH --array=1-1000
echo "I am $SLURM_ARRAY_TASK_ID from job $SLURM_ARRAY_JOB_ID"
[centos@gateway1 (scooby) ~]$ sbatch arrayjob.sh
Submitted batch job 77
[centos@gateway1 (scooby) ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
77_[85-1000] all array centos PD 0:00 1 (Resources)
77_71 all array centos R 0:00 1 node03
77_72 all array centos R 0:00 1 node06
77_73 all array centos R 0:00 1 node03
77_74 all array centos R 0:00 1 node06
77_75 all array centos R 0:00 1 node07
77_76 all array centos R 0:00 1 node07
77_77 all array centos R 0:00 1 node05
77_78 all array centos R 0:00 1 node05
77_79 all array centos R 0:00 1 node02
77_80 all array centos R 0:00 1 node04
77_81 all array centos R 0:00 1 node01
77_82 all array centos R 0:00 1 node01
77_83 all array centos R 0:00 1 node02
77_84 all array centos R 0:00 1 node04
All tasks in an array job are given a job ID with the format [job_ID]_[task_number], e.g. 77_81 would be job number 77, array task 81.
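A common pattern is to use the task ID to select the input data for each task - a minimal sketch, where the numbered input files under $HOME/data are hypothetical:
#!/bin/bash -l
#SBATCH --job-name=array-data
#SBATCH --output=output.array.%A.%a
#SBATCH --array=1-10
# Each task processes its own numbered input file, e.g. input.1 ... input.10
INPUT_FILE="$HOME/data/input.${SLURM_ARRAY_TASK_ID}"
echo "Task $SLURM_ARRAY_TASK_ID processing $INPUT_FILE"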
Array jobs can easily be cancelled using the scancel command - the following examples show various levels of control over an array job:
scancel 77
- Cancels all array tasks under the job ID 77
scancel 77_[100-200]
- Cancels array tasks 100-200 under the job ID 77
scancel 77_5
- Cancels array task 5 under the job ID 77
Requesting more resources¶
By default, jobs are constrained to the default set of resources - users can use scheduler instructions to request more resources for their jobs. The following documentation shows how these requests can be made.
Running multi-threaded jobs¶
If users want to use multiple cores on a compute node to run a multi-threaded application, they need to inform the scheduler - this allows jobs to use multiple cores on a single node without needing to rely on any interconnect. Using multiple CPU cores is achieved by specifying the -n, --ntasks=<number> option in either your submission command or the scheduler directives in your job script. The --ntasks option informs the scheduler of the number of cores you wish to reserve for use; if the parameter is omitted, the default --ntasks=1 is assumed. For example, you could specify -n 4 to request 4 CPU cores for your job. Besides the number of tasks, you will need to add --nodes=1 to your scheduler command, or #SBATCH --nodes=1 at the top of your job script; this sets the maximum number of nodes to be used to 1 and prevents the job selecting cores from multiple nodes.
Note
If you request more cores than are available on a node in your research environment, the job will not run until a node capable of fulfilling your request becomes available; the scheduler will display the reason in the output of the squeue command.
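As an illustrative sketch, the following job script reserves 4 cores on a single compute node for a multi-threaded application - the application name my_threaded_app and its --threads flag are placeholders:
#!/bin/bash -l
#SBATCH --job-name=multithread
#SBATCH --output=output.%j
#SBATCH --nodes=1
#SBATCH --ntasks=4
# Placeholder command for a multi-threaded application using the 4 reserved cores
./my_threaded_app --threads 4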
Running Parallel (MPI) jobs¶
If users want to run parallel jobs via a message passing interface (MPI), they need to inform the scheduler - this allows jobs to be efficiently spread over compute nodes to get the best possible performance. Using multiple CPU cores across multiple nodes is achieved by specifying the -N, --nodes=<minnodes[-maxnodes]> option - which requests a minimum (and optional maximum) number of nodes to allocate to the submitted job. If only the minnodes count is specified, then this is used for both the minimum and maximum node count for the job.
You can request multiple cores over multiple nodes using a combination of scheduler directives, either in your job submission command or within your job script. The following examples demonstrate how you can obtain cores across different resources:
--nodes=2 --ntasks=16
- Requests 16 cores across 2 compute nodes
--nodes=2
- Requests all available cores of 2 compute nodes
--ntasks=16
- Requests 16 cores across any available compute nodes
For example, to use 64 CPU cores on the research environment for a single application, the instruction --ntasks=64 can be used. The following example shows launching the Intel Message-passing MPI benchmark across 64 cores on your research environment. This application is launched via the OpenMPI mpirun command - the number of threads and list of hosts are automatically assembled by the scheduler and passed to the MPI at runtime. This jobscript loads the apps/imb module before launching the application, which automatically loads the module for OpenMPI.
#!/bin/bash -l
#SBATCH -n 64
#SBATCH --job-name=imb
#SBATCH -D $HOME/
#SBATCH --output=imb.out.%j
module load apps/imb
mpirun --prefix $MPI_HOME \
IMB-MPI1
We can then submit the IMB job script to the scheduler, which will automatically determine which nodes to use:
[centos@gateway1 (scooby) ~]$ sbatch imb.sh
Submitted batch job 1162
[centos@gateway1 (scooby) ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1162 all imb centos R 0:01 8 ip-10-75-1-[42,45,62,67,105,178,233,250]
[centos@gateway1 (scooby) ~]$ cat imb.out.1162
#------------------------------------------------------------
# Intel (R) MPI Benchmarks 4.0, MPI-1 part
#------------------------------------------------------------
# Date : Tue Aug 30 10:34:08 2016
# Machine : x86_64
# System : Linux
# Release : 3.10.0-327.28.3.el7.x86_64
# Version : #1 SMP Thu Aug 18 19:05:49 UTC 2016
# MPI Version : 3.0
# MPI Thread Environment:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 62 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 3.17 0.00
1 1000 3.20 0.30
2 1000 3.18 0.60
4 1000 3.19 1.19
8 1000 3.26 2.34
16 1000 3.22 4.74
32 1000 3.22 9.47
64 1000 3.21 19.04
128 1000 3.22 37.92
256 1000 3.30 73.90
512 1000 3.41 143.15
1024 1000 3.55 275.36
2048 1000 3.75 521.04
4096 1000 10.09 387.14
8192 1000 11.12 702.51
16384 1000 12.06 1296.04
32768 1000 14.65 2133.32
65536 640 19.30 3238.72
131072 320 29.50 4236.83
262144 160 48.17 5189.77
524288 80 84.36 5926.88
1048576 40 157.40 6353.32
2097152 20 305.00 6557.31
4194304 10 675.20 5924.16
Note
If you request more CPU cores than your research environment can accommodate, your job will wait in the queue. If you are using the Flight Compute auto-scaling feature, your job will start to run once enough new nodes have been launched.
Requesting more memory¶
In order to promote best use of the research environment scheduler - particularly in a shared environment, it is recommended to inform the scheduler the maximum required memory per submitted job. This helps the scheduler appropriately place jobs on the available nodes in the research environment.
You can specify the maximum amount of memory required per submitted job with the --mem=<MB> option. This informs the scheduler of the memory required for the submitted job. Optionally - you can also request an amount of memory per CPU core rather than a total amount of memory required per job. To specify an amount of memory to allocate per core, use the --mem-per-cpu=<MB> option.
Note
When running a job across multiple compute hosts, the --mem=<MB> option informs the scheduler of the required memory per node
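For example, a sketch of a job script requesting 4GB of memory for the job - the application name my_memory_hungry_app is a placeholder:
#!/bin/bash -l
#SBATCH --job-name=bigmem
#SBATCH --output=output.%j
#SBATCH --mem=4096
# Placeholder command for an application requiring up to 4GB of memory
./my_memory_hungry_app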
Requesting a longer runtime¶
In order to promote best use of the research environment scheduler, particularly in a shared environment, it is recommended to inform the scheduler of the amount of time the submitted job is expected to take. You can inform the research environment scheduler of the expected runtime using the -t, --time=<time> option. For example - to submit a job that runs for 2 hours, the following example job script could be used:
#!/bin/bash -l
#SBATCH --job-name=sleep
#SBATCH -D $HOME/
#SBATCH --time=0-2:00
sleep 7200
You can then see any time limits assigned to running jobs using the command squeue --long:
[centos@gateway1 (scooby) ~]$ squeue --long
Tue Aug 30 10:55:55 2016
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
1163 all sleep centos RUNNING 0:07 2:00:00 1 ip-10-75-1-42
Further documentation¶
This guide is a quick overview of some of the many available options of the SLURM research environment scheduler. For more information on the available options, you may wish to reference some of the following documentation for the demonstrated SLURM commands:
- Use the man squeue command to see a full list of scheduler queue instructions
- Use the man sbatch/srun command to see a full list of scheduler submission instructions
- Online documentation for the SLURM scheduler is available from the SLURM project website