DIMS Administrator Guide v 0.1.18

This document (version 0.1.18) covers issues related to system administration of DIMS components from an administrator’s perspective.

Introduction

This chapter introduces the system administration policies, methodology for configuration file management, automated installation and configuration of DIMS components using Ansible, and use of continuous integration mechanisms used for deployment and testing of DIMS components.

This document is closely related to the DIMS Developer Guide v 1.0.0, which covers a number of related tasks and steps that will not be repeated here (rather, they will be cross-referenced using intersphinx links).

  • All documentation for the DIMS project is written using reStructuredText (reST) and Sphinx. Section Documenting DIMS Components of the DIMS Developer Guide v 1.0.0 covers how to use these tools for producing professional-looking and cross-referenced on-line (HTML) and off-line (PDF) documentation.
  • DIMS software – including Ansible playbooks for installation and configuration of DIMS system components, as well as Packer, Vagrant, and Docker subsystem creation scripts – is maintained under version control using Git and the HubFlow methodology and tool set. Section Source Code Management with Git of the DIMS Developer Guide v 1.0.0 covers how these tools are used for source code, documentation, and system configuration files.
  • Changes to source code that are pushed to Git repositories trigger build processes using the Jenkins continuous integration environment. These triggered jobs build and/or deploy software to specified locations, run tests, and/or configure service components. In most cases, Ansible is used as part of the process driven by Jenkins. Section Continuous Integration of the DIMS Developer Guide v 1.0.0 provides an overview of how this works and how to use it in developing and testing DIMS components.
  • System software installation and configuration of DIMS components are managed using Ansible playbooks that are in turn maintained in Git repositories. Only a bare minimum of manual steps are required to bootstrap a DIMS deployment. After that, configuration changes are made to Git repositories and those changes trigger continuous integration processes to get these changes into the running system. Section Deployment and Configuration of the DIMS Developer Guide v 1.0.0 covers how to use this framework for adding or managing the open source components that are used in a DIMS deployment.

Overview

This document focuses on the system administration tasks involved in adding open source software components to the DIMS framework, converting installation instructions into Ansible playbooks or Dockerfile instructions that can be used to instantiate a service or microservice, and installing, configuring, debugging, tuning, and maintaining a complete DIMS instance (i.e., a complementary set of service and microservice components that function together as a coherent system) over time.

Referenced documents

  1. DIMS Developer Guide v 1.0.0
  2. ansibleinventory:ansibleinventory
  3. ansibleplaybooks:ansibleplaybooks
  4. dimsdockerfiles:usingdockerindims
  5. dimsdockerfiles:dockerincoreos
  6. dimspacker:dimspacker
  7. dimsciutils:dimsciutilities
  8. dimssr:dimssystemrequirements
  9. DIMS Architecture Design v 2.10.0
  10. dittrich:homepage (home page)

Onboarding Developers

This chapter covers the process for onboarding new developers to provide them access to DevOps components necessary to work on elements of a DIMS deployment. In short, developers (and system administrators) will need the following:

  • An account in the Trident portal system for access to email lists, etc.
  • A GPG/PGP key pair. The public key will be loaded into the Trident portal so others can access the key and so it can be used for encrypted email.
  • A Google account for OpenID Connect authentication used for single sign-on access to internal resources, along with an LDAP database entry that links to this Google account.
  • SSH public/private key pairs allowing access to Git repositories, Ansible control host, DIMS system components, etc.
  • Initial copies of Git repositories used to develop and build a DIMS deployment instance.

Once all of these resources have been procured, developers or system administrators are ready to work on a DIMS instance.

Initial Account Setup

The first step in adding a new DIMS developer is getting them set up with an account on our internal ops-trust portal instance.

Note

We will transition to using Trident, rather than the old Ops-Trust portal code base initially set up for this project, as soon as we are able. Trident has an internal wiki, so the FosWiki server mentioned here will also be retired.

Our FosWiki server has a page that was dedicated to the steps necessary for Provisioning New DIMS Users.

Caution

The FosWiki page Provisioning New DIMS Users looks like it may be out of date, or include steps that may not be necessary for just adding a new user. It has a huge number of steps that should be streamlined or folded into the DIMS web app to simplify the process of adding and removing DIMS users in concert with the ops-trust portal at the center of DIMS.

Once the user has been given their password to the ops-trust portal, they need to change their MemberID to match the account name that should be used within DIMS. (E.g., Dave Dittrich may be given the MemberID of davedittrich2475 by the portal, but the desired account name within DIMS subsystems should be dittrich.)

GPG Encryption Keys for Email, etc.

Each ops-trust portal account holder needs a GPG key to be able to send/receive encrypted emails. In normal operation, one’s ops-trust portal account is not fully enabled until the user has uploaded their GPG key.

One of the easiest ways to process GPG-encrypted email is using Enigmail with The GNU Privacy Guard from the Thunderbird email client. Follow the Enigmail Quick Start Guide to install, configure, and generate a GPG key for use with Thunderbird (which is supported on Mac, Linux, and Windows, and is installed by default on the DIMS Ubuntu developer laptops).
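
If you prefer to generate and export a key from the command line instead of through Enigmail, a minimal GnuPG sketch looks like the following (the email address and output file name are placeholders):

$ gpg --gen-key
$ gpg --list-keys
$ gpg --armor --export you@example.com > yourname-pubkey.asc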

After you have set up The GNU Privacy Guard and uploaded your key, log in to the ops-trust portal and select PGP Keys from the menu on the left of the screen to download all GPG keys for other portal users and all email lists to which you subscribe.

Note

This step will only download keys that are in the system at the time you press the link, which means they will get out-of-date with respect to new users, regenerated keys, and/or new email lists that may be created over time. Get in the habit of updating your GPG key ring regularly, or at least remember that failure to encrypt/decrypt an email may be due to your keyring being out of date and needing a refresh.
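
A keyring refresh can also be done from the command line with GnuPG, assuming a key server is configured in your GnuPG setup:

$ gpg --refresh-keys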

Creating accounts

After a new user has successfully set up their ops-trust portal account and modified their MemberID to align with their desired DIMS account name, they must be added to the dims_users array in the $GIT/ansible-playbooks/group_vars/all file. Once added, the Ansible playbook roles that generate DIMS user accounts (e.g., dims-users-create) can be played to create accounts as needed.
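
A minimal sketch of the relevant portion of group_vars/all is shown below; the surrounding variables and the existing user names are illustrative only:

dims_users:
  - dittrich
  - mboggess
  - newusername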

Installing initial SSH key(s)

Before someone can clone Git repositories, or use SSH to log in to DIMS systems for interactive shell access, they must (a) have a DIMS SSH key, and (b) have the public key and authorized_keys file(s) on target systems set up properly.

  1. Create the user’s DIMS SSH key pair (a key-generation sketch appears after this list)...

  2. Generate accounts using Ansible playbook ($whatever), which creates the accounts and installs their public key.

  3. Copy their key pair into the account on the system where they will be doing their development (i.e., a DIMS developer laptop, Vagrant virtual machine, or bare-metal workstation.) Also make sure their key is included in the authorized_keys file in the git account on git.devops.develop in order for them to be able to read/write source code using Git.

  4. Trigger a Jenkins build job for public-keys-configure to push the new user’s key to all DIMS-DevOps and DIMS-OPS systems.

  5. Set the password on the account they are supposed to use so they can log in to it, and/or securely transfer their public SSH key to them so they can use it to access the account without needing a password.

    Note

    They will need a password on the account for sudo on commands like dims-ci-utils.install.user that ask for the sudo password in order to pass it to Ansible.

Use command passwd <username>.

[dimsenv] mboggess@b52:~ () $ passwd mboggess
Changing password for mboggess.
(current) UNIX password:
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
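
For step 1, a key pair following the DIMS naming convention can be generated with ssh-keygen, mirroring the helper Makefile shown later in this guide (the account name dittrich here is illustrative):

$ ssh-keygen -t rsa -C "DIMS key for dittrich" -f dims_dittrich_rsa
$ ssh-keygen -l -f dims_dittrich_rsa.pub > dims_dittrich_rsa.sig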

Remote Account Setup

This section details how to set up a new account for a current developer on a remote machine, once logged in to that machine.

Change password

Use command passwd <username>.

[dimsenv] mboggess@b52:~ () $ passwd mboggess
Changing password for mboggess.
(current) UNIX password:
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully

Transfer SSH Keys to Remote Machine

  • Once logged in to remote machine, check ~/.ssh/authorized_keys file for public key:

    [dimsenv] mboggess@b52:~ () $ cd .ssh
    [dimsenv] mboggess@b52:~/.ssh () $ ls
    authorized_keys  config  known_hosts
    [dimsenv] mboggess@b52:~/.ssh () $ vim authorized_keys
    
  • Securely transfer DIMS RSA keys from local machine to remote machine

    Keys are located in ~/.ssh/ and should be named:

    • dims_${dimsusername}_rsa for private key
    • dims_${dimsusername}_rsa.pub for public key
    • dims_${dimsusername}_rsa.sig for signature

    Copy all three files from the local machine holding the DIMS RSA keys:

    [dimsenv] mboggess@dimsdev2:~ () $ cd .ssh
    [dimsenv] mboggess@dimsdev2:~/.ssh () $ scp dims_mboggess_rsa* mboggess@b52.tacoma.uw.edu:/home/mboggess/.ssh/
    dims_mboggess_rsa                                     100% 1675     1.6KB/s   00:00
    dims_mboggess_rsa.pub                                 100%  403     0.4KB/s   00:00
    dims_mboggess_rsa.sig                                 100%   82     0.1KB/s   00:00
    

    Check on remote machine:

    [dimsenv] mboggess@b52:~/.ssh () $ ls
    authorized_keys  dims_mboggess_rsa      dims_mboggess_rsa.sig
    config           dims_mboggess_rsa.pub  known_hosts
    

Note

This solves the “second hop issue”: a user can access machines one hop away because the necessary keys are available on their local machine, but when trying to go one hop further, keys are not available. For example, I could log in to b52 just fine, but when I tried to run dims.git.syncrepos, which requires access to git.devops.develop, I ran into trouble because my keys were not on b52.

Sync Repos on Remote Machine

There probably will not be a .mrconfig file on the remote machine, so you must create an empty file with that name before you sync repos or the command will fail.

Failure when running dims.git.syncrepos because no .mrconfig:

<snip>

[+++] Adding Repo[49] umich-botnets to /home/mboggess/dims/.mrconfig and checking it out.
cp: cannot stat ‘/home/mboggess/dims/.mrconfig’: No such file or directory

[+++] Updated 49 of 49 available repos.
[+++] Summary of actions for repos that were updated:
- Any changes to branches at origin have been downloaded to your local repository
- Any branches that have been deleted at origin have also been deleted from your local repository
- Any changes from origin/master have been merged into branch 'master'
- Any changes from origin/develop have been merged into branch 'develop'
- Any resolved merge conflicts have been pushed back to origin
[+++] Added 49 new repos: ansible-inventory ansible-playbooks cif-client cif-java configs dims dims-ad dims-adminguide dims-asbuilt dims-ci-utils dims-dashboard dims-db-recovery dims-devguide dims-dockerfiles dims-dsdd dims-jds dims-keys dims-ocd dims-packer dims-parselogs dims-sample-data dims-sr dims-supervisor dims-svd dimssysconfig dims-test-repo dims-tp dims-tr dims-vagrant ELK fuse4j ipgrep java-native-loader java-stix-v1.1.1 mal4s MozDef ops-trust-openid ops-trust-portal poster-deck-2014-noflow prisem prisem-replacement pygraph rwfind sphinx-autobuild stix-java ticketing-redis tsk4j tupelo umich-botnets
[+++] Updating repos took 00:00:00

Looking in ~/dims/ for .mrconfig:

[dimsenv] mboggess@b52:~ () $ cd dims
[dimsenv] mboggess@b52:~/dims () $ ls -a
.  ..  git
  • Create .mrconfig

    [dimsenv] mboggess@b52:~/dims () $ touch .mrconfig
    [dimsenv] mboggess@b52:~/dims () $ ls -a
    .  ..  git  .mrconfig
    
  • Run dims.git.syncrepos

    [dimsenv] mboggess@b52:~/dims () $ cd ..
    [dimsenv] mboggess@b52:~ () $ dims.git.syncrepos
    [+++] Found 49 available repos at git@git.devops.develop
    [+++] Adding Repo[1] ansible-inventory to /home/mboggess/dims/.mrconfig and checking it out.
    mr checkout: /home/mboggess/dims/git/ansible-inventory
    Cloning into 'ansible-inventory'...
    remote: Counting objects: 481, done.
    remote: Compressing objects: 100% (387/387), done.
    remote: Total 481 (delta 237), reused 122 (delta 65)
    Receiving objects: 100% (481/481), 62.36 KiB | 0 bytes/s, done.
    Resolving deltas: 100% (237/237), done.
    Checking connectivity... done.
    Using default branch names.
    
    Which branch should be used for tracking production releases?
       - master
    Branch name for production releases: [master]
    Branch name for "next release" development: [develop]
    
    How to name your supporting branch prefixes?
    Feature branches? [feature/]
    Release branches? [release/]
    Hotfix branches? [hotfix/]
    Support branches? [support/]
    Version tag prefix? []
    
    mr checkout: finished (1 ok)
    
    <snip>
    
    [+++] Updated 49 of 49 available repos.
    [+++] Summary of actions for repos that were updated:
    - Any changes to branches at origin have been downloaded to your local repository
    - Any branches that have been deleted at origin have also been deleted from your local repository
    - Any changes from origin/master have been merged into branch 'master'
    - Any changes from origin/develop have been merged into branch 'develop'
    - Any resolved merge conflicts have been pushed back to origin
    [+++] Added 49 new repos: ansible-inventory ansible-playbooks cif-client cif-java configs dims dims-ad dims-adminguide dims-asbuilt dims-ci-utils dims-dashboard dims-db-recovery dims-devguide dims-dockerfiles dims-dsdd dims-jds dims-keys dims-ocd dims-packer dims-parselogs dims-sample-data dims-sr dims-supervisor dims-svd dimssysconfig dims-test-repo dims-tp dims-tr dims-vagrant ELK fuse4j ipgrep java-native-loader java-stix-v1.1.1 mal4s MozDef ops-trust-openid ops-trust-portal poster-deck-2014-noflow prisem prisem-replacement pygraph rwfind sphinx-autobuild stix-java ticketing-redis tsk4j tupelo umich-botnets
    [+++] Updating repos took 00:07:19
    

Build Python Virtual Environment on Remote Machine

  • When logged in to remote machine, change directories to location of virtual environment build scripts:

    [dimsenv] mboggess@b52:~ () $ cd $GIT/ansible-playbooks
    
  • Run the DIMS command to build the system virtualenv for access to system DIMS commands:

    [dimsenv] mboggess@b52:~/dims/git/ansible-playbooks (develop) $ ./dimsenv.install.system
    
  • Run exec bash to refresh:

    [dimsenv] mboggess@b52:~/dims/git/ansible-playbooks (develop) $ exec bash
    [+++] DIMS shell initialization [ansible-playbooks v1.2.107]
    [+++] Sourcing /opt/dims/etc/bashrc.dims.d/bashrc.dims.network ...
    [+++] OpenVPN status:
     * VPN '01_uwapl_daveb52' is running
     * VPN '02_prsm_dave-prisem-2' is running
    [+++] Sourcing /opt/dims/etc/bashrc.dims.d/bashrc.dims.virtualenv ...
    [+++] Activating virtual environment (/home/mboggess/dims/envs/dimsenv) [ansible-playbooks v1.2.107]
    [+++] (Create file /home/mboggess/.DIMS_NO_DIMSENV_ACTIVATE to disable)
    [+++] Virtual environment 'dimsenv' activated [ansible-playbooks v1.2.107]
    [+++] Installed /home/mboggess/dims/envs/dimsenv/bin/dimsenv.install.user
    [+++] Installed /home/mboggess/dims/envs/dimsenv/bin/dimsenv.install.system
    [+++] Sourcing /opt/dims/etc/bashrc.dims.d/git-prompt.sh ...
    [+++] Sourcing /opt/dims/etc/bashrc.dims.d/hub.bash_completion.sh ...
    

    The “Activating virtual environment” line should show the path to dimsenv/ under $HOME/dims.

  • Run DIMS command to build user virtualenv:

    [dimsenv] mboggess@b52:~/dims/git/ansible-playbooks (develop) $ ./dimsenv.install.user
    
  • Run exec bash to refresh again.

  • Check $HOME/dims/envs/ for dimsenv/ and activation scripts:

    [dimsenv] mboggess@b52:~/dims/git/ansible-playbooks (develop) $ ls $HOME/dims/envs
    dimsenv          initialize    postdeactivate  postmkvirtualenv  preactivate    premkproject     prermvirtualenv
    get_env_details  postactivate  postmkproject   postrmvirtualenv  predeactivate  premkvirtualenv
    

Transfer Config Files

  • Your account personalization files need to be transferred to the remote machine as well, including .gitconfig, .vimrc, and .bash_aliases.

    From the local machine:

    [dimsenv] mboggess@dimsdev2:~ () $ scp .bash_aliases mboggess@b52.tacoma.uw.edu:/home/mboggess/
    .bash_aliases                                 100%  510     0.5KB/s   00:00
    [dimsenv] mboggess@dimsdev2:~ () $ scp .gitconfig mboggess@b52.tacoma.uw.edu:/home/mboggess/
    .gitconfig                                    100%  847     0.8KB/s   00:00
    [dimsenv] mboggess@dimsdev2:~ () $ scp .vimrc mboggess@b52.tacoma.uw.edu:/home/mboggess/
    .vimrc                                        100%  314     0.3KB/s   00:00
    

    On the remote machine, check for files and refresh bash:

    [dimsenv] mboggess@b52:~ () $ ls -a
    .   .ansible       .bash_history  .bashrc  dims              .gitconfig  .profile      .ssh      .vimrc
    ..  .bash_aliases  .bash_logout   .cache   examples.desktop  .mrtrust    .python-eggs  .viminfo
    [dimsenv] mboggess@b52:~ () $ exec bash
    

JIRA Onboarding

Adding LDAP Entries for Users

We have an OpenLDAP server which serves as an authorization backend for our LemonLDAP SSO. Authentication is provided by OpenID Connect. It also serves as the user directory for JIRA.

Note

You will need an application to be able to edit/add directory information. Apache Directory Studio is cross-platform and recommended. Ideally, the Trident portal would directly feed these records, rather than requiring someone to follow the lengthy steps outlined below using a more laborious graphical user interface.

An Ansible role apache-directory-studio is used to install this application. Once this role has been applied, you can start the GUI with the following command:

$ apache-directory-studio &

The first time the program is run, a connection must be configured for the project LDAP server. Follow the instructions in Add New Connection to Apache Directory Studio to create the initial connection.

Attention

When starting Apache Directory Studio from the command line, you must add the & to run the program in the background. Since this is not a terminal program that takes input at the command line, failing to background the process will result in the shell not returning to a command prompt until after you quit the application, which novice Linux users unfamiliar with command shells and background processes will interpret as the terminal window being “hung” or “frozen”.

After Apache Directory Studio has been installed and configured, start the application. You should see the initial connection in the list:

_images/apache-directory-studio-connectionlist.png

Initial LDAP Browser Connection list

  1. Click on the connection in the Connections list. (If you followed the instructions in Add New Connection to Apache Directory Studio, the connection you want is labelled ldap.devops.develop.)

  2. Click to open DIT in the tree.

    _images/apache-directory-studio-browser.png

    DIT for connection ldap.devops.develop

  3. Click to open dc=prisem,dc=washington,dc=edu in the tree.

  4. Click to open ou=Users in the tree. The current users will display.

  5. Right-click ou=Users to open context menu and click New -> New Entry.

  6. Select Use existing entry as template. Click Browse button to open the ou and select a member.

  7. Click Next.

  8. In the Object Classes dialog, do not add any more object classes. Just click Next.

    _images/apache-directory-studio-objectclasses.png

    Object Classes (skip)

  9. In the Distinguished Name dialog, replace the template user’s name you selected with the new user’s name. The DN preview should then look like cn=new_user_name,ou=Users,dc=prisem,dc=washington,dc=edu.

    _images/apache-directory-studio-dn.png

    Distinguished Name dialog

  10. Click Next.

  11. In the Attribute Description dialog (center panel), replace the template values with the values for your new user. Double-click each Value field to edit it.

    _images/apache-directory-studio-attributes.png

    Attribute Description dialog

    Note

    Tab to the next field or the value you entered might not be saved.

    • sn - Enter the user’s Last name
    • displayName - Enter the user’s First and Last name
    • mail - Enter the user’s Gmail address used for authenticating with OpenID Connect.
    • ssoRoles - These are used for testing right now (you can leave them as is).
    • uid - Enter the uid in the form firstname.lastname
    • userPassword - Enter a password. It will be hashed.
  12. Click Finish.

  13. Click on the new member and verify the fields. Edit any that were not entered correctly.
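
The new entry can also be checked from the command line with ldapsearch, assuming the server is reachable as ldap.devops.develop and permits anonymous simple binds (otherwise add -D and -W with an appropriate bind DN):

$ ldapsearch -x -H ldap://ldap.devops.develop \
    -b "ou=Users,dc=prisem,dc=washington,dc=edu" \
    "(cn=new_user_name)" cn mail uid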

Exit the application when you are done and have the user test authentication by going to http://jira.prisem.washington.edu/ and selecting Google in the OpenID Login dialog:

_images/jira-login.png

JIRA Dashboard Login screen

Note

Google OpenID requires that the domain name of the system requesting authentication have a valid public DNS name. Even though you can connect to the system from within the VPN/VLAN via a non-public DNS name lookup, the authentication will not work. For this reason, the name jira.prisem.washington.edu is mapped in the split-horizon DNS mappings.

If the user has not recently authenticated to Google, they will be prompted for their password and/or second-factor authentication information. Once authenticated, the JIRA Dashboard will pop up.

Adding Users to JIRA Groups

After adding the user to LDAP, JIRA will show them as a valid user, but they will have no access once logged in.

To enable the access to JIRA necessary to add and modify tickets, an administrator needs to grant access. Figure adminpanel shows the Administration panel where these changes will be made.

_images/jira-adminpanel-1.png

JIRA Administration Panel

To grant a user “read-only” access, they need to be a member of the jira-users group. To grant “read/write” access, they need to also be a member of the jira-developers group. Only users with jira-administrators access can make these changes.

To change access, select Groups under the Operations column of the user table. The Edit User Groups dialog will pop up as shown in Figure adminpanel. Type into the search box to find options, then select the group from the list to add that group to the user’s permission.

_images/jira-adminpanel-2.png

JIRA Edit User Groups dialog

Installation of DIMS Components on “Bare-metal”

This section describes installation of core Virtual Machine hypervisor servers, developer workstations, or collector devices on physical hardware. Installation of DIMS component systems in Virtual Machines is covered in Section Installation of DIMS Components Using Virtual Machines.

The initial operating system installation is handled using operating system installation media along with Kickstart auto-installation, followed by a second-stage pre-configuration step, and lastly by installation of required packages and configuration using Ansible.

A similar process is used to create Virtual Machines, though using Packer instead of stock OS installation ISO media plus Kickstart. This is covered in the dimspacker:lifecycle section of the dimspacker:dimspacker document.

Control and Target Prerequisites

For the control machine, the following must be true:

  1. Must be able to run DIMS Ansible playbooks (i.e. be an existing developer workstation).
  2. Must have the latest dims-ci-utils installed. That is, the latest dims.remote.setupworkstation script should be in /opt/dims/bin.
  3. Must have the required DIMS VPN enabled (so it can retrieve DIMS Git repos and artifacts on Jenkins requested by playbooks.)

Note

We are assuming the control machine is an existing workstation that has been successfully used to run DIMS playbooks and has at a minimum followed the original instructions for setting environment variables and installing dims-ci-utils.

For the target machine, the following must be true:

  1. The base operating system is installed.
  2. An ansible account must be present, configured with sudo access for performing administrative tasks, and with the matching public key allowing SSH access via the private key on the control machine (a setup sketch follows this list).
  3. Firewall rules must allow SSH access from the control machine.
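
A minimal sketch of satisfying prerequisites 2 and 3 on an Ubuntu target follows; the passwordless sudo policy, the key file name, and the control host address are assumptions to adapt to your deployment:

# Create the ansible account with sudo access
$ sudo adduser --disabled-password --gecos "Ansible" ansible
$ echo "ansible ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ansible
$ sudo chmod 0440 /etc/sudoers.d/ansible
# Install the matching public key
$ sudo install -d -m 700 -o ansible -g ansible ~ansible/.ssh
$ cat dims_ansible_rsa.pub | sudo tee -a ~ansible/.ssh/authorized_keys
$ sudo chown ansible:ansible ~ansible/.ssh/authorized_keys
# Allow SSH from the control machine
$ sudo ufw allow proto tcp from <control_host_ip> to any port 22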

Setting up a DIMS Developer Laptop

This section describes how to provision a new developer laptop using a custom bootable USB installation drive. Some of the steps are still manual ones, and these instructions will be updated as a more script-driven process is created. For now, this can serve to help guide the creation of the final process.

To achieve a repeatable and consistent process for installing a common base operating system (in this case, Ubuntu 14.04 LTS) that is ready to immediately be provisioned remotely from an Ansible control node, a customizable Ubuntu installation USB drive is used with all of the files necessary to go from a fresh computer system to a fully-functional networked host.

All of the steps for preparing an initial installation USB are given below, in the order they need to be performed. Once completed, you will have a bootable USB drive and a bit-copy of that drive that can be re-used.

Note

If you already have a bit-copy of one of these installation USB drives, skip to the Cloning an installation USB section.

If you already have a fresh (uncustomized) installation USB disk, skip forward to the Customizing an installation USB section.

Note

The DIMS project purchased a number of Dell Precision M4800 laptops for use for development and demonstration purposes. These laptops require the use of proprietary drivers for the Broadcom Wireless NIC and NVIDIA graphics controller. The specific models can be identified using lspci:

$ lspci -knn | grep -i Broadcom
03:00.0 Network controller [0280]: Broadcom Corporation BCM4352 802.11ac Wireless Network Adapter [14e4:43b1] (rev 03)
$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GK107GLM [Quadro K1100M] (rev a1)

These drivers can be installed manually using the Ubuntu Additional Drivers app as seen in Figure Additional Drivers from working laptop.

_images/additional-drivers.png

Additional Drivers from working laptop

There is prototype code in the Ubuntu post-install script designed to automate this task based on information from How can I install Broadcom Wireless Adapter BCM4352 802.11ac PCID [14e4:43b1] (rev 03) on fresh install of Ubuntu 14.10 (Utopic Unicorn)?, which is essentially:

$ sudo apt-get update
$ sudo apt-get install bcmwl-kernel-source
$ sudo modprobe wl

Preparation of Ubuntu installation USB drive

This section describes the manual steps used to create a two-partition 8GB Ubuntu installation USB drive. The following section describes the use of the program dims.install.createusb to bit-image copy this drive, store it for shared use by DIMS team members, and use this image copy to clone the original USB drive and then populate it with custom information to be used when auto-installing Ubuntu 14.04 on a development laptop using this customized USB drive.

Note

Start out by studying the --help output of dims.install.createusb to understand the defaults it uses (shown by the highlighted lines in the following code block). These defaults are hard-coded into the program and should be updated when new Ubuntu install ISO images are used. Some of the command examples below make use of these defaults (rather than explicitly including all options on the command line.)

 Usage: dims.install.createusb [options] [args]

 Use "dims.install.createusb --help" to see help on command line options.

 Options:
   -h, --help            show this help message and exit
   -d, --debug           Enable debugging.
   -D DEVICE, --device=DEVICE
                         Device file for mounting USB. [default: sdb]
   -H HOSTNAME, --hostname=HOSTNAME
                         Hostname of system to install. [default dimsdev3]
   -l USBLABEL, --usblabel=USBLABEL
                         USB device label. [default: DIMSINSTALL]
   --ubuntu-base=UBUNTUBASE
                         Ubuntu base version. [default: 14.04]
   --ubuntu-minor=UBUNTUMINOR
                         Ubuntu minor version. [default: 4]
   --base-configs-dir=BASE_CONFIGS_DIR
                         Base directory for configuration files. [default:
                         /opt/dims/nas/scd]
   -u, --usage           Print usage information.
   -v, --verbose         Be verbose (on stdout) about what is happening.

   Development Options:
     Caution: use these options at your own risk.

     --find-device       Attempt to find USB device actively mounted and exit.
     --empty-casper      Empty out all contents (except lost+found) from
                         casper-rw and exit.
     --ls-casper         Just list contents of casper-rw file system.
     --label-casper      Put --usblabel into casper-rw and exit.
     --mount-casper      Mount casper-rw in cwd and exit.
     --umount-casper     Unmount casper-rw and exit.
     --mount-usb         Mount DIMS install USB and exit. [default: sdb]
     --unmount-usb       Unmount DIMS install USB and exit. [default: sdb]
     --read-usb-into     Read USB drive into file. [default: False]
     --write-usb-from    Write USB drive from file. [default: False]
     -f IMAGEFILE, --imagefile=IMAGEFILE
                         File name to use for storing compressed USB image.
                         [default: ubuntu-14.04.4-install.dd.bz2]
     --block-size=BLOCK_SIZE
                         Block size to use for 'dd' read/write. [default: 512]

Partition USB drive

If you are starting out with a blank USB drive, you must first partition the drive and label it so it is recognizable by DIMS scripts. An easy program to use for this purpose on Ubuntu is the Gnome Partition Editor (a.k.a., GParted).

Figure GParted formatting and labeling shows an 8GB USB drive partitioned using GParted. Create two partitions with the primary partition (shown here as /dev/sdb1) marked as bootable, with a FAT32 file system, and labeled DIMSINSTALL. Make the second partition an ext3 file system and label it DIMSBACKUP.

_images/GParted.png

GParted formatting and labeling
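
The same layout can be created from the command line; the sketch below assumes the drive appears as /dev/sdb and may safely be overwritten:

$ sudo parted --script /dev/sdb mklabel msdos
$ sudo parted --script /dev/sdb mkpart primary fat32 1MiB 2GiB
$ sudo parted --script /dev/sdb mkpart primary ext3 2GiB 100%
$ sudo parted --script /dev/sdb set 1 boot on
$ sudo mkfs.vfat -F 32 -n DIMSINSTALL /dev/sdb1
$ sudo mkfs.ext3 -L DIMSBACKUP /dev/sdb2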

The partitions can also be shown using fdisk -l (here assuming the drive appears as /dev/sdb).

[dittrich@dimsdev2 git]$ sudo fdisk -l /dev/sdb

Disk /dev/sdb: 8009 MB, 8009023488 bytes
247 heads, 62 sectors/track, 1021 cylinders, total 15642624 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000cc03e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *        2048     4196351     2097152    b  W95 FAT32
/dev/sdb2         4196352    15640575     5722112   83  Linux

Note

The dims.install.createusb script looks for a partition with the label DIMSINSTALL and will not manipulate drives that do not contain a partition with this label.

Note

The second partition can be used for backing up a user’s directory contents prior to re-installation of the operating system on a system. Since the kickstart process automatically partitions the hard drive, existing contents would be lost.

Create Ubuntu installation USB

Installation of Ubuntu on a developer system is performed using the Server installation image (e.g., ubuntu-14.04.4-server-amd64.iso).

The program to use for this purpose is the Ubuntu Startup Disk Creator. Run it with root privileges (as they are needed to write the Master Boot Record on the USB drive).

$ sudo usb-creator-gtk &

After downloading the Ubuntu Server installation ISO and verifying its integrity using the signed SHA256 hash files, write the installation ISO to the partitioned USB.
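
Verifying the ISO follows the standard Ubuntu procedure of checking the SHA256SUMS file and its detached signature; a sketch is shown below (the Ubuntu image signing keys are assumed to already be in your GnuPG keyring, and the file name corresponds to the 14.04.4 release used here):

$ gpg --verify SHA256SUMS.gpg SHA256SUMS
$ grep ubuntu-14.04.4-server-amd64.iso SHA256SUMS | sha256sum -c -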

The primary partition (i.e., /dev/sdb1) is where the Ubuntu installation ISO image (and casper-rw file system storage file, where DIMS customization files will be stored) will be written. Make sure that the option is checked to store files across boots, which will create a casper-rw partition image within the startup disk image.

Note

The second partition does not show up because it is not marked as bootable, though it may be mounted and visible using the File viewer.

Figure Ubuntu Make Startup Disk shows what the Ubuntu Startup Disk Creator GTK application will look like at this step.

_images/usb-creator-make.png

Ubuntu Make Startup Disk

Note

If you have to re-create the DIMSINSTALL partition with the Startup Disk Creator, it will erase the entire partition (which removes the label). To manually change the label, use the GNOME Partition Editor (GParted) as described in the Ubuntu RenameUSBDrive page.

First verify the device name (so you don’t accidentally harm another auto-mounted device), then use mlabel as seen here:

 $ mount | grep '^/dev/sd'
 /dev/sda1 on /boot type ext3 (rw)
 /dev/sdb1 on /media/dittrich/917D-FA28 type vfat (rw,nosuid,nodev,uid=1004,gid=1004,shortname=mixed,dmask=0077,utf8=1,showexec,flush,uhelper=udisks2)
 /dev/sdb2 on /media/dittrich/DIMSBACKUP type ext3 (rw,nosuid,nodev,uhelper=udisks2)
 $ sudo mlabel -i /dev/sdb1 ::DIMSINSTALL

Now unmount and re-mount the device, and verify that the label did in fact get changed.

 $ dims.install.createusb --unmount-usb
 $ dims.install.createusb --mount-usb
 $ mount | grep '^/dev/sd'
 /dev/sda1 on /boot type ext3 (rw)
 /dev/sdb1 on /media/dittrich/DIMSINSTALL type vfat (rw,nosuid,nodev,uid=1004,gid=1004,shortname=mixed,dmask=0077,utf8=1,showexec,flush,uhelper=udisks2)
 /dev/sdb2 on /media/dittrich/DIMSBACKUP type ext3 (rw,nosuid,nodev,uhelper=udisks2)

Bit-copy installation USB for cloning

After creating a bootable Ubuntu installation USB (which has not yet been customized for a specific host installation), a copy of the boot disk should be made. This allows for the vanilla installation USB to be cloned to as many USB drives as are needed, each then being uniquely customized. This customization includes host name, SSH keys, SSH authorized_keys and known_hosts files, OpenVPN certificates, and any other files used in the installation and setup process necessary to result in a remotely Ansible configurable host.

$ dims.install.createusb --verbose --read-usb-into
[+++] dims.install.createusb
[+++] Reading USB drive on sdb into ubuntu-14.04.4-install.dd.bz2
15642624+0 records in
15642624+0 records out
8009023488 bytes (8.0 GB) copied, 1171.45 s, 6.8 MB/s
2498225+1 records in
2498225+1 records out
1279091271 bytes (1.3 GB) copied, 1171.51 s, 1.1 MB/s
[+++] Finished writing ubuntu-14.04.4-install.dd.bz2 in 0:19:31.506338 seconds
$ ls -l *.bz2
-rw-r--r-- 1 dittrich dittrich  837948365 Jan 18 18:57 ubuntu-14.04.2-install.dd.bz2
-rw-rw-r-- 1 dittrich dittrich 1279091271 Mar 25 21:49 ubuntu-14.04.4-install.dd.bz2

Cloning an installation USB

The previous section walked through the process of creating a skeleton Ubuntu auto-installation USB drive and bit-copying it to a compressed image file. This section describes how to take that compressed bit-copy and clone it to USB drives that are then customized for installing Ubuntu on specific bare-metal hosts for subsequent Ansible configuration.

We will assume that the previous steps were followed, producing a clone of the Ubuntu 14.04.4 install ISO in a file named ubuntu-14.04.4-install.dd.bz2, and that the USB drive we will be cloning to is available as /dev/sdb.

Caution

Be sure that you confirm this is correct, since this script does direct writes using dd, which can destroy the file system if applied to the wrong drive! There was not enough time to make this script more robust against use by someone who is unfamiliar with bit copy operations in Unix/Linux.

$ dims.install.createusb --write-usb-from --verbose
[+++] dims.install.createusb
[+++] Partition /dev/sdb12 is not mounted
[+++] Partition /dev/sdb11 is not mounted
[+++] Writing ubuntu-14.04.4-install.dd.bz2 to USB drive on sdb
dd: error writing ‘/dev/sdb’: No space left on device
15632385+0 records in
15632384+0 records out
8003780608 bytes (8.0 GB) copied, 2511.1 s, 3.2 MB/s

bzip2: I/O or other error, bailing out.  Possible reason follows.
bzip2: Broken pipe
        Input file = ubuntu-14.04.4-install.dd.bz2, output file = (stdout)
[+++] Wrote sdb to USB drive on ubuntu-14.04.4-install.dd.bz2 in 0:41:51.110440 seconds

Note

The dd error “No space left on device” and the bzip2 error “Broken pipe” are normal. This happens because the exact number of blocks read from the disk in the copy operation precisely matches the number of blocks coming from the compressed file, which triggers a “disk full” condition. A direct read/write operation on the device, rather than shelling out to dd, would be more robust (but would also consume more time in coding that was not available.)

Customizing an installation USB

The installation ISO is customized with SSH keys, OpenVPN certificates, etc., by inserting files from a common file share into the installation USB.

Danger

The files inserted into the USB are not encrypted, and neither are the installation USB’s file systems, so physical control of the USB drive is required. These files should either be encrypted with something like Ansible Vault, or the file system should be encrypted such that it is decrypted as part of the Ubuntu install process.

In order to make the necessary files available to any of the DIMS developers, an NFS file share is used. Alternative remote file sharing protocols include SSHFS and SMB.

An environment variable CFG points to the path to the files used to customize the installation ISO. At present, these are in directories with the short name of the host to be installed (e.g., dimsdev3).

[dimsenv] dittrich@dimsdev3:/opt/dims/nas () $ echo $CFG
/opt/dims/nas/scd
[dimsenv] dittrich@dimsdev3:/opt/dims/nas () $ tree $CFG/dimsdev3
/opt/dims/nas/scd/dimsdev3
├── IP
├── openvpn-cert
│   ├── 01_uwapl_dimsdev3.conf
│   └── 02_prsm_dimsdev3.conf
├── PRIVKEY
├── REMOTEUSER
├── ssh-host-keys
│   ├── key_fingerprints.txt
│   ├── known_hosts.add
│   ├── ssh_host_dsa_key
│   ├── ssh_host_dsa_key.pub
│   ├── ssh_host_ecdsa_key
│   ├── ssh_host_ecdsa_key.pub
│   ├── ssh_host_ed25519_key
│   ├── ssh_host_ed25519_key.pub
│   ├── ssh_host_rsa_key
│   └── ssh_host_rsa_key.pub
└── ssh-user-keys
    ├── ubuntu_install_rsa
    └── ubuntu_install_rsa.pub

3 directories, 17 files

Note

The OpenVPN certificates are created by hand. Two separate VPNs were originally used as hardware was split between two separate server rooms on two separate subnets, each with non-routable (RFC 1918) VLANs behind the VPNs. Hardware was moved into one data center and this will be reduced to one VPN as soon as VM consolidation and cabling changes can be made to use a single VLAN.

Note

The IP, PRIVKEY, and REMOTEUSER files hold the values used by some DIMS scripts for setting variables used for remotely provisioning the host using Ansible. We are migrating to using group_vars and/or host_vars files for holding these values so they can be shared by other scripts and used in Jinja templates.

New SSH host key sets can be generated using keys.host.create.

[dimsenv] dittrich@dimsdemo1:/opt/dims/nas () $ keys.host.create -d $CFG/dimsdev3/ssh-host-keys/ -v -p dimsdev3
[+++] Storing files in /opt/dims/nas/scd/dimsdev3/ssh-host-keys/
[+++] Removing any previous keys and related files
[+++] Generating 1024 bit dimsdev3 ssh DSA key
[+++] Generating 2048 bit dimsdev3 ssh RSA key
[+++] Generating 521 bit dimsdev3 ssh ECDSA key
[+++] Generating 1024 bit dimsdev3 ssh ED25519 key
[+++] Key fingerprints
1024 70:0e:ee:8b:23:34:cf:34:aa:3b:a0:ca:fd:50:58:a9  'dimsdev3 ssh DSA host key' (DSA)
2048 7f:89:da:e7:4d:92:fd:c1:3f:96:4f:05:f5:72:63:65  'dimsdev3 ssh RSA host key' (RSA)
521 0a:af:c7:c4:a8:35:47:48:22:b3:7e:5b:bf:39:76:69  'dimsdev3 ssh ECDSA host key' (ECDSA)
256 b2:dd:be:36:4d:03:a4:57:17:fb:a9:a9:97:e5:58:51  'dimsdev3 ssh ED25519 host key' (ED25519)
[dimsenv] dittrich@dimsdemo1:/opt/dims/nas () $ ls -l $CFG/dimsdev3/ssh-host-keys
total 18
-rw-rw-r-- 1 nobody nogroup  362 Apr  4 11:24 key_fingerprints.txt
-rw-rw-r-- 1 nobody nogroup 1304 Apr  4 11:24 known_hosts.add
-rw------- 1 nobody nogroup  668 Apr  4 11:24 ssh_host_dsa_key
-rw-r--r-- 1 nobody nogroup  617 Apr  4 11:24 ssh_host_dsa_key.pub
-rw------- 1 nobody nogroup  361 Apr  4 11:24 ssh_host_ecdsa_key
-rw-r--r-- 1 nobody nogroup  283 Apr  4 11:24 ssh_host_ecdsa_key.pub
-rw------- 1 nobody nogroup  432 Apr  4 11:24 ssh_host_ed25519_key
-rw-r--r-- 1 nobody nogroup  113 Apr  4 11:24 ssh_host_ed25519_key.pub
-rw------- 1 nobody nogroup 1679 Apr  4 11:24 ssh_host_rsa_key
-rw-r--r-- 1 nobody nogroup  409 Apr  4 11:24 ssh_host_rsa_key.pub

Note

The equivalent script to generate SSH user keys has not yet been written, but an early helper Makefile is available to perform these steps in a consistent manner. The highest level of security is achieved by having unique SSH keys for each account; however, this would significantly complicate use of Ansible, which is designed to control a large number of hosts in a single run. Each DIMS instance being controlled by Ansible will thus have a shared key for the Ansible account that, at most, is unique to a deployment and/or category.

[dimsenv] dittrich@dimsdemo1:~/dims/git/dims-keys/ssh-pub (develop*) $ DIMSUSER=ansible make genkey
ssh-keygen -t rsa \
                -C "DIMS key for ansible" \
                -f dims_ansible_rsa
Generating public/private rsa key pair.
dims_ansible_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in dims_ansible_rsa.
Your public key has been saved in dims_ansible_rsa.pub.
The key fingerprint is:
06:52:35:82:93:73:8b:e8:0f:7a:15:f4:44:29:a2:b8 DIMS key for ansible
The key's randomart image is:
+--[ RSA 2048]----+
|     ++oo        |
|  . B.+. .       |
| . -.O.          |
| o. o.o.         |
| o   .  S        |
| Eo .  .         |
| . +             |
| . . .           |
|  .              |
+-----------------+
ssh-keygen -l \
        -f dims_ansible_rsa.pub > dims_ansible_rsa.sig
[dimsenv] dittrich@dimsdemo1:~/dims/git/dims-keys/ssh-pub (develop*) $ ls -lat | head
total 128
-rw-rw-r--  1 dittrich dittrich   81 Nov 15 14:58 dims_ansible_rsa.sig
-rw-------  1 dittrich dittrich 1675 Nov 15 14:58 dims_ansible_rsa
-rw-rw-r--  1 dittrich dittrich  402 Nov 15 14:58 dims_ansible_rsa.pub
  . . .
[dimsenv] dittrich@dimsdemo1:~/dims/git/dims-keys/ssh-pub (develop*) $ mv dims_ansible_rsa* $CFG/zion/ssh-user-keys/

After all keys, certificates, etc., are installed in the new host’s directory in $CFG, you can write the contents to the installation USB disk partition.

[dimsenv] dittrich@dimsdemo1:/git/dims-ci-utils/usb-install (develop*) $ dims.install.createusb --help
Usage: ./dims.install.createusb [options] [args]

Use "./dims.install.createusb --help" to see help on command line options.


Options:
  -h, --help            show this help message and exit
  -d, --debug           Enable debugging.
  -D DEVICE, --device=DEVICE
                        Device file for mounting USB. [default: sdb]
  -H HOSTNAME, --hostname=HOSTNAME
                        Hostname of system to install. [default dimsdemo1]
  -l USBLABEL, --usblabel=USBLABEL
                        USB device label. [default: DIMSINSTALL]
  --distro-version=DISTROVERSION
                        Distribution version. [default: 14.04.5]
  --base-configs-dir=BASE_CONFIGS_DIR
                        Base directory for configuration files. [default:
                        /opt/dims/nas/scd]
  -u, --usage           Print usage information.
  -v, --verbose         Be verbose (on stdout) about what is happening.
  -V, --version         Print version and exit.

  Development Options:
    Caution: use these options at your own risk.

    --find-device       Attempt to find USB device actively mounted and exit.
    --empty-casper      Empty out all contents (except lost+found) from
                        casper-rw and exit.
    --ls-casper         Just list contents of casper-rw file system.
    --label-casper      Put --usblabel into casper-rw and exit.
    --mount-casper      Mount casper-rw in cwd and exit.
    --unmount-casper    Unmount casper-rw and exit.
    --mount-usb         Mount DIMS install USB (sdb) and exit. [default:
                        False]
    --unmount-usb       Unmount DIMS install USB (sdb) and exit. [default:
                        False]
    --read-usb-into     Read USB drive into file. [default: False]
    --write-usb-from    Write USB drive from file. [default: False]
    -f IMAGEFILE, --imagefile=IMAGEFILE
                        File name to use for storing compressed USB image.
                        [default: ubuntu-14.04.5-install.dd.bz2]
    --block-size=BLOCK_SIZE
                        Block size to use for 'dd' read/write. [default: 512]
[dimsenv] dittrich@dimsdemo1:/git/dims-ci-utils/usb-install (develop*) $ dims.install.createusb --hostname zion

After installing the operating system using the Kickstart customized USB drive, the system should be able to access the network. Test using ping 8.8.8.8 to verify network connectivity and a default route.
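
A quick check of connectivity and the default route looks like this:

$ ping -c 4 8.8.8.8
$ ip route | grep default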

Install an initial clouds.yml file to configure dimscli:

[dimsenv] ansible@zion:~ () $ cat ~/.config/openstack/clouds.yml
clouds:
  ectf:
    profile: ectf
    prefer_ipv6: False
    force_ipv4: True
    consul_peers: ['node01.ops.ectf','node02.ops.ectf','node03.ops.ectf']
    region_name: ectf
    debug: True

Installation of DIMS Components Using Virtual Machines

This section describes installation of servers, developer workstations, or collector devices using virtual machines. Installation of DIMS component systems on “bare-metal” is covered in Section Installation of DIMS Components on “Bare-metal”.

DIMS on Virtual Machines

A local deployment of the DIMS system installed on virtual machines includes the following systems:

  • red.devops.local (Ubuntu Trusty)
  • yellow.devops.local (Debian Jessie)
  • blue16.devops.local (Ubuntu Xenial)
  • core-01.devops.local (CoreOS 1164.1.0)
  • core-02.devops.local (CoreOS 1164.1.0)
  • core-03.devops.local (CoreOS 1164.1.0)

This list will be updated as the group changes.

The following services and configurations are currently installed on some or all of the machines:

  • Basic DIMS configurations (environment variables, directories, etc)
  • Basic DIMS utilities
  • A DIMS-specific python virtual environment
  • DNS
  • Postfix
  • Docker
  • Consul
  • Swarm
  • Postgres
  • Nginx
  • Trident
  • Vagrant
  • Pycharm
  • Byobu

This list will be updated as more services and configurations are added.

Prerequisites for Instantiating Virtual Machines

You must have a centralized place to organize all the VMs. Scripts used in the build process depend on this place being rooted at /vm. The easiest way to set up this structure, and the way least likely to cause trouble with the build scripts, is to run the Vagrant role against the machine on which you will be instantiating the VMs (a sketch for creating the layout manually appears after the listing below).

Once you’ve done that, you should end up with a structure that looks like the following:

[dimsenv] mboggess@dimsdev2:ims/nas/private/files/vagrants () $ tree -L 2 /vm
/vm
├── box
│   ├── coreos
│   └── red
├── cache
│   ├── apt
│   ├── coreos_production_vagrant.box
│   ├── debian-7.11.0-amd64-netinst.iso
│   ├── debian-8.5.0-amd64-netinst.iso
│   ├── sources
│   ├── ubuntu-14.04.4-desktop-amd64.iso
│   ├── ubuntu-14.04.4-server-amd64.iso
│   └── ubuntu-16.04.1-server-amd64.iso
├── ovf
│   └── red
├── run
│   ├── core-01
│   ├── core-02
│   ├── core-03
│   └── red
├── sources
└── vbox
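
If you need to create the skeleton by hand (the Vagrant role normally creates it for you), the layout above can be reproduced with something like the following sketch:

$ sudo mkdir -p /vm/{box,cache/{apt,sources},ovf,run,sources,vbox}
$ sudo chown -R $USER:$USER /vm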

As artifacts are made for the VMs (.box files, .ovf files, etc.), they get placed into the appropriate folder. Some other files, though, you need to make sure you have before starting the build workflow. This includes the ISO files for building the beefier Debian and Ubuntu OSes and the CoreOS box file. We have gathered the ISOs on the $NAS, so you need access to it in order to retrieve these files.

  • Ubuntu 14.04.4 server iso download: $NAS/share/isos/ubuntu-14.04.4-server-amd64.iso
  • Ubuntu 14.04.4 desktop iso download: $NAS/share/isos/ubuntu-14.04.4-desktop-amd64.iso
  • Ubuntu 16.04.1 server iso download: $NAS/share/isos/ubuntu-16.04.1-server-amd64.iso
  • Debian Jessie 8.5.0 iso download: $NAS/share/isos/debian-8.5.0-amd64-netinst.iso
  • CoreOS 1164.1.0 box file download: $NAS/share/boxes/coreos_production_vagrant.box

You can download most of these files from the web, but we did make some changes to the Ubuntu 16.04.1 server iso itself, so you really need the iso from the NAS.
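
With the $NAS mounted, copying the files into the cache directory is then just a matter of, for example:

$ cp $NAS/share/isos/ubuntu-16.04.1-server-amd64.iso /vm/cache/
$ cp $NAS/share/boxes/coreos_production_vagrant.box /vm/cache/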

Then you need to set up your /vm/cache/sources directory. Since this is for a local deployment, the /vm/cache/sources directory acts as the central artifacts server location.

These are the files you need:

[dimsenv] mboggess@dimsdev2:/vm/cache/sources () $ tree
.
├── dims-ci-utils-develop.tgz
├── prisem-rpc-0.5.10.tar.gz
├── Python-2.7.12.tgz
├── python-dimscli-0.8.0.tar.gz
├── trident-cli_1.3.8_amd64.deb
└── trident-server_1.3.8_amd64.deb

0 directories, 11 files

To get these files you must download them from the artifacts server at jenkins.devops.develop in the /data/src directory. You can run wget or curl or scp to retrieve those files. Ensure they are stored at /vm/cache/sources.
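
For example, pulling one of the artifacts with scp (file names will vary with the current artifact versions):

$ scp jenkins.devops.develop:/data/src/python-dimscli-0.8.0.tar.gz /vm/cache/sources/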

Finally, you need access to the $NAS for the SSH keys used to log in to the VMs. Just make sure the $NAS is mounted before starting the process (run dims.nas.mount).

VM Build Workflow

Once all of the prerequisite structure and artifacts are in place, you can begin to build the VMs. You need to have access to the dims-packer and ansible-playbooks repos.

Note

Soon there should be a way to build these things using the develop branch on both of those repos. Currently, however, the major updates to the build workflow have been made on the dims-packer branch called feature/dims-760. Once that branch is merged, only specific feature updates will be on any branch; stable code for building the VMs will be available on the develop branch.

These instructions do not specify branches, since work should be done from the develop branch and soon will be possible entirely from the develop branch.

Follow these steps to build the three CoreOS VMs and the three Debian/Ubuntu VMs (red, yellow, and blue16).

  1. If you have the byobu program, get a new window (F2) and change directories to $GIT/dims-packer.

  2. Make sure you have an updated repo (git hf update && git hf pull).

  3. Build the artifacts for the VMs by running

    for node in core-01 core-02 core-03 red yellow blue16; do
        test.vagrant.factory build $node.devops.local
    done
    

    This will build the CoreOS nodes first, which is nice because they build really fast, so you can move on to getting those machines booted and provisioned, while you’re waiting for the beefier VM artifacts to build.

  4. Once you’ve made it through the CoreOS VM builds, but are still waiting on red, yellow, and blue16, you can start to provision the CoreOS nodes. Get a new byobu window and split it into thirds, vertically (Ctrl-Shift-F2).

  5. In each of the splits, you’ll change directories to one of the CoreOS VM’s run directories. So cd /vm/run/core-01 in the left split, cd /vm/run/core-02 in the middle split, cd /vm/run/core-03 in the right split. You should have something that looks like this:

    _images/coreossplits.png

    Byobu window with 3 splits for working in CoreOS VM run directories

  6. Now, you can use byobu’s “spray” functionality to send the same commands to all three splits. First, hit Alt-F9 to turn the spray functionality on. Then, we want to “boot” the machines and provision them, so we will run make up && make provision. This will run vagrant up, trigger some post-up configurations, and then use Ansible to provision the machines.

    At the end, once everything has provisioned, you should get output from tests that are run. The more successes, the better. The current test output looks like the following:

    _images/coreosprovisionedtests.png

    CoreOS VMs provisioned and test output

  7. When the red, yellow, and blue16 artifacts have all been built, you can do the same thing to boot and provision those machines. Get a new byobu window, make three vertical splits, and change directories to the appropriate run directories (/vm/run/red, /vm/run/yellow, /vm/run/blue16). You should have something that looks like the following:

    _images/noncoreossplits.png

    Byobu window with 3 splits for working in non-CoreOS VM run directories

    Turn on the byobu spray functionality and run make up && make provision.

    Again, at the end, you should get output from the tests that are run. The very end of the current test output looks like the following:

    _images/noncoreosprovisionedtests.png

    Non-CoreOS VMs provisioned and test output

Run Directory Helper Makefile Targets

Beyond the steps outlined in the section above, there are many other make helpers in the VM run directory.

[dimsenv] mboggess@dimsdev2:/vm/run/red () $ make help
/vm/run/red
[Using Makefile.dims.global v1.7.1 rev ]
---------------------------------------------------------------------------
Usage: make [something]

Where "something" is one of the targets listed in the sections below.


----- Targets from Makefile -----

show - show all variables used with this Makefile
NOTE: all of the following are done with timing and with
      output saved to a file named 'make-DATESTRING.txt'

up - Do 'vagrant up --no-provision'
reboot - Do 'vagrant halt && vagrant up --no-provision'
halt - halt vagrant cluster
update-box - update the CoreOS Vagrant box file
provision - Time and record 'vagrant provision'
reprovision-remote - Update ansible-playbooks from remote (w/current checked out branch)
reprovision-local - Reprovision host via locally rsync-ed ansible-playbooks
sync-playbooks - Update ansible-playbooks by rsync from current checked out working directory
rebuild - use test.vagrant.factory from packer repo to do 'destroy' and 'build' in one step
destroy - Do 'vagrant destroy'
clean - Remove unecessary files
spotless - Remove all temporary files for this VM.
listvms - lists all configured virtual machines (using 'vboxmanage')
list - list all running VMs
vminfo - See some info about VMs
test - Run 'test.sh' with bash -x and redirect output to 'test.out'
       This is a helper that can be run from the /vagrant
       directory in the VM. Have it write output to a file
       that you follow with "tail -F" and you can observe
       results from the host
run-tests: Run test.runner for system level tests
                         This will be like at the end of running
                         the Ansible provisioner, but at will.
 @echo
----- Targets from /opt/dims/etc/Makefile.dims.global -----

help - Show this help information (usually the default rule)

dimsdefaults - show default variables included from Makefile.dims.global
print-SOMETHING - prints the value of variable "SOMETHING"
version - show the Git revision for this repo
envcheck - perform checks of requirements for DIMS development

---------------------------------------------------------------------------

Installation of a Complete DIMS Instance

The Distributed Incident Management System (DIMS) is a system comprised of many sub-systems. That is to say, there are many inter-related and inter-dependent services that work together to provide a coherent whole which is called a DIMS instance. These subsystems may be provided by daemons running in a normal Linux system running on bare-metal (i.e., an operating system installed onto standard server hardware), in a virtual machine running on a bare-metal host, or in Docker containers. Conceptually, it does not matter what underlying operating system is used, whether it is physical or virtual, or whether it is a Docker container: DIMS is comprised of micro-services that communicate using standard TCP/IP connections, regardless of where those services are running.

This chapter covers the steps necessary to install and configure a DIMS instance using (a) a single server running a cluster comprised of three virtual machines, and (b) a three-node bare-metal cluster.

Cluster Foundation Setup

To bootstrap a DIMS instance, it is necessary to first install the required base operating system, pre-requisite packages, and software components that serve as the foundation for running the DIMS micro-services. This includes the DIMS software and configuration files that differentiate one DIMS instance from another on the network.

Each DIMS instance has a routable Internet connection from at least one node and an internal local area network on which the DIMS system components are connected on the back end. This means there is at least one IP address block that is shared on the back end, regardless of whether the primary node has its own DNS domain and Internet-accessible IP address (as would be the case for a production service deployment) or uses dynamic addressing on a WiFi or wired interface (as for a local development deployment).

A DIMS deployment that is to be used for public-facing services on the Internet requires a real DNS domain and routable IP address(es), with SSL certificates to secure the web application front end. Administering the system remotely requires setting up SSH keys, both for secure remote access and for remote administration using Ansible.

Accounts in the Trident user portal can be set up from the command line using the tcli user interface, or by using the Trident web application front end.

Single-host Virtual Machine Deployment

Bootstrapping User Base

Trident

This chapter introduces Trident, a “Trusted Information Exchange Toolkit” that facilitates, among other things, the formation of trust groups and communication between members of those trust groups. It walks through the installation and configuration of Trident and its prerequisites; how to use Trident and its various features will be covered in a different section.

Installing Trident manually

This section walks through the steps to use the tcli command line interface to manually configure a Trident deployment with an initial trust group, trust group administrator accounts and default mailing lists. These would be the steps necessary to bootstrap a Trident system for use by a trusted information sharing organization before starting to add regular trust group members and moving into the standard vetting process for growing the trust group.

Before logging in, you can get help on the top level command options using tcli help:

$ tcli help
-=- Trident Help -=-

Welcome to the Trident menu system which is CLI command based.
If a given command is not in help menu the selected user does not have permissions for it.

Each section, items marked [SUB], has its own 'help' command.

The following commands are available on the root level:
 user                 [SUB]                User commands
 system               [SUB]                System commands

Logging in is done using the system subcommand block. To get help on that subcommand block, add the subsection to the command:

$ tcli system help
Help for system
 login                <username> <password> <twofactor> Login
 logout                                    Logout
 whoami                                    Who Am I?
 get                  [SUB]                Get values from the system

The standard Trident administrator account is trident. Log in to it with the secret password configured at the time the Trident packages were installed and the initial tsetup command was used to bootstrap the database.

$ tcli system login trident THE_ACTUAL_SECRET_PASSWORD
Login successful

Now that you are logged in, further subcommand blocks become available. Use help (or just add the subcommand without any options, in some cases) to see what new options are available:

$ tcli system help
Help for system
 login                <username> <password> <twofactor> Login
 logout                                    Logout
 whoami                                    Who Am I?
 swapadmin                                 Swap from regular to sysadmin user
 get                  [SUB]                Get values from the system

To perform system administration actions, you must use swapadmin to change the logged in user to be an administrator:

$ tcli system swapadmin
Now a SysAdmin user

Again, this opens up further options and/or subcommands. Look to see what those are:

$ tcli system help
Help for system
 report                                    Report system statistics
 login                <username> <password> <twofactor> Login
 logout                                    Logout
 whoami                                    Who Am I?
 swapadmin                                 Swap from regular to sysadmin user
 set                  [SUB]                Configure the system
 get                  [SUB]                Get values from the system

To get the current setting of system attributes, use tcli system get followed by the attribute you want to get. Again, you can either add help to see the list, or just use the command tcli system get to see the attributes:

$ tcli system get help
Help for system get
 name                                      System Name - Name of the System
 welcome_text                              Welcome Text - Welcome message shown on login page
 adminname                                 Name of the Admistrator(s) - Name of the Administrator, shown at bottom of the page
 adminemail                                Administrator email address - Email address of the Administrator, linked at the bottom of the page
 copyyears                                 Copyright Years - Years that copyright ownership is claimed
 email_domain                              Email Domain - The domain where emails are sourced from
 url_public                                Public URL - The full URL where Trident is exposed to the public, used for redirects and OAuth2 (Example: https://trident.example.net)
 people_domain                             People Domain - Domain used for people's email addresses and identifiers (Example: people.trident.example.net)
 cli_enabled                               CLI Enabled - Enable the Web CLI (/cli/)
 api_enabled                               API Enabled - Enable the API URL (/api/) thus allowing external tools to access the details provided they have authenticated
 oauth_enabled                             OAuth/OpenID Enabled - Enable OAuth 2.0 and OpenID Connect support (/oauth2/ + /.wellknown/webfinger)
 no_index                                  No Web Indexing - Disallow Web crawlers/robots from indexing and following links
 email_sig                                 Email Signature - Signature appended to mailinglist messages
 require2fa                                Require 2FA - Require Two Factor Authentication (2FA) for every Login
 pw_enforce                                Enforce Rules - When enabled the rules below are enforced on new passwords
 pw_length                                 Minimal Password Length (suggested: 12)
 pw_letters                                Minimum amount of Letters
 pw_uppers                                 Minimum amount of Uppercase characters
 pw_lowers                                 Minimum amount of Lowercase characters
 pw_numbers                                Minimum amount of Numbers
 pw_specials                               Minimum amount of Special characters
 sysadmin_restrict                         IP Restrict SysAdmin - When provided the given CIDR prefixes, space separated, are the only ones that allow the SysAdmin bit to be enabled. The SysAdmin bit is dropped for SysAdmins coming from different prefixes. Note that 127.0.0.1 and ::1 are always included in the set, thus CLI access remains working.
 header_image                              Header Image - Image shown on the Welcome page
 logo_image                                Logo Image - Logo shown in the menu bar
 unknown_image                             Unknown Person Image - Logo shown for users who do not have an image set
 showversion                               Show Trident Version in UI - Show the Trident version in the UI, default enabled so that users can report issues to the Trident Project
 adminemailpublic                          Show Sysadmin E-mail to non-members - Show sysadmin e-mail in the public footer

$ tcli system get
Help for system get
 name                                      System Name - Name of the System
 welcome_text                              Welcome Text - Welcome message shown on login page
 adminname                                 Name of the Admistrator(s) - Name of the Administrator, shown at bottom of the page
 adminemail                                Administrator email address - Email address of the Administrator, linked at the bottom of the page
 . . .
 showversion                               Show Trident Version in UI - Show the Trident version in the UI, default enabled so that users can report issues to the Trident Project
 adminemailpublic                          Show Sysadmin E-mail to non-members - Show sysadmin e-mail in the public footer

On first installation, the Trident configuration database exists, but many attributes are not yet configured. For example, to see the administrator’s name and email address (which are shown on the main page of the web UI), do:

$ tcli system get adminname
unknown
$ tcli system get adminemail
unknown

There is a setting for the email domain, but it is just an example that will not actually work:

$ tcli system get email_domain
trident.example.net

You will need to set it to something that matches the SMTP Mail Transfer Agent (MTA), which is Postfix in this case:

$ tcli system set email_domain prisem.washington.edu
Updated email_domain

If you will be giving members a unique email address that is related to the trust group, rather than their personal or work email address, set the people_domain (which also initially comes with an example default):

$ tcli system get people_domain
people.trident.example.net
$ tcli system set people_domain people.prisem.washington.edu
Updated people_domain

As with the email addresses, the public URL is configured with a non-working example:

$ tcli system get url_public
https://trident.example.net

Set it to match the routable public URL that people will use to get to the Trident portal from the Internet:

$ tcli system set url_public https://zion.prisem.washington.edu
Updated url_public

You may toggle whether or not the web UI shows the administrator’s email address to anyone who is not logged in (i.e., the general public). The default setting is yes:

$ tcli system get adminemailpublic
yes

There is no initial welcome text shown on the web UI. Set it as appropriate:

$ tcli system get welcome_text
Not Configured
$ tcli system set welcome_text "DIMS"
Updated welcome_text

Set the descriptive name of the administrator and the email address used to communicate with them:

$ tcli system set adminname "DIMS Administrator"
Updated adminname
$ tcli system set adminemail trident@prisem.washington.edu
Updated adminemail

You must set the name of the deployed portal that will be presented in the web UI:

$ tcli system get name
Not Configured
$ tcli system set name "DIMS Trident"
Updated name

A trailer is placed on all outgoing email messages. This allows including reminders about information sharing policies or other disclaimers. By default, it reads as follows:

$ tcli system get email_sig
All message content remains the property of the author
and must not be forwarded or redistributed without explicit permission.
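
To change the trailer, use the same set pattern as for the other attributes (the text below is only an example):

$ tcli system set email_sig "All list traffic is sensitive; do not redistribute without permission."
Updated email_sig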

The main web page includes a “header” graphic image that spans the browser window, allowing you to brand the portal. The file must be placed under the webroot directory for the Trident web app to be able to access it. By default, it is located in a subdirectory named gfx/ with the name gm.jpg:

$ tcli system get header_image
/gfx/gm.jpg
$ sudo find / -type d -name gfx
/usr/share/trident/webroot/gfx

There is also a logo that is displayed by the web app:

$ tcli system get logo_image
/gfx/logo.png

You can either replace these files with content of your choosing, or you can add new files with different names and change the configuration settings. The directory with these files may contain other files, so check first:

$ ls /usr/share/trident/webroot/gfx
gm.jpg  info.png  invalid.png  logo.png  logo.svg  red_asterisk.png  search.png  unknown_person.jpg  valid.png  warning.png  xkcd_password_strength.png

If you wish to use your organization’s logo, you must first copy the file onto the system.

$ wget https://www.example.com/images/logo_24.png
--2017-01-13 12:41:27--  https://www.example.com/images/logo_24.png
Resolving www.example.com (www.example.com)... 93.184.216.34
Connecting to www.example.com (www.example.com)|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6220 (6.1K) [image/png]
Saving to: ‘logo_24.png’

logo_24.png                                       100%[============================================================================================================>]   6.07K  --.-KB/s   in 0s

2017-01-13 12:41:28 (125 MB/s) - ‘logo_24.png’ saved [6220/6220]

For this example, we will overwrite the original logo with this new file:

$ sudo mv logo_24.png /usr/share/trident/webroot/gfx/logo.png

For the next example, we will add a new file for the header image, and change the variable to point to it:

$ sudo mv vagrant/our_org_header.png /usr/share/trident/webroot/gfx/
$ tcli system set header_image our_org_header.png
Updated header_image
$ ls -l /usr/share/trident/webroot/gfx/
total 580
-rwxr-xr-x 1 root    root     83078 Sep 12 07:37 gm.jpg
-rwxr-xr-x 1 root    root       580 Sep 12 07:37 info.png
-rwxr-xr-x 1 root    root       424 Sep 12 07:37 invalid.png
-rw-r--r-- 1 ansible ansible   6220 Dec  9  2015 logo.png
-rwxr-xr-x 1 root    root      2541 Sep 12 07:37 logo.svg
-rwxr-xr-x 1 root    root       223 Sep 12 07:37 red_asterisk.png
-rwxr-xr-x 1 root    root      3287 Sep 12 07:37 search.png
-rwxr-xr-x 1 root    root      2994 Sep 12 07:37 unknown_person.jpg
-rw-r--r-- 1 root    root     59250 Jan 13 12:53 usss-1.jpg
-rw-rw-r-- 1 ansible dims    309901 Jan 13 12:50 our_org_header.png
-rwxr-xr-x 1 root    root       389 Sep 12 07:37 valid.png
-rwxr-xr-x 1 root    root       616 Sep 12 07:37 warning.png
-rwxr-xr-x 1 root    root     93029 Sep 12 07:37 xkcd_password_strength.png
$ sudo chown root:root /usr/share/trident/webroot/gfx/*
$ sudo chmod 755 /usr/share/trident/webroot/gfx/*
$ ls -l /usr/share/trident/webroot/gfx/
total 580
-rwxr-xr-x 1 root root  83078 Sep 12 07:37 gm.jpg
-rwxr-xr-x 1 root root    580 Sep 12 07:37 info.png
-rwxr-xr-x 1 root root    424 Sep 12 07:37 invalid.png
-rwxr-xr-x 1 root root   6220 Dec  9  2015 logo.png
-rwxr-xr-x 1 root root   2541 Sep 12 07:37 logo.svg
-rwxr-xr-x 1 root root    223 Sep 12 07:37 red_asterisk.png
-rwxr-xr-x 1 root root   3287 Sep 12 07:37 search.png
-rwxr-xr-x 1 root root   2994 Sep 12 07:37 unknown_person.jpg
-rwxr-xr-x 1 root root 309901 Jan 13 12:50 our_org_header.png
-rwxr-xr-x 1 root root    389 Sep 12 07:37 valid.png
-rwxr-xr-x 1 root root    616 Sep 12 07:37 warning.png
-rwxr-xr-x 1 root root  93029 Sep 12 07:37 xkcd_password_strength.png
$ tcli system get header_image
/gfx/gm.jpg
$ tcli system set header_image /gfx/gm.jpg
Updated header_image
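
Note that the value stored for header_image is a path relative to the webroot (as the /gfx/gm.jpg default shows), so when pointing at a newly added file it is presumably safest to include the /gfx/ prefix as well; this is an assumption based on the default value, not something stated explicitly in the Trident documentation:

$ tcli system set header_image /gfx/our_org_header.png
Updated header_image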

Installing Trident with Ansible

Prerequisites

The following items are necessary before installing Trident via Ansible:

  • Access to and knowledge of how to use Ansible roles foundational to provisioning DIMS systems. More information about these roles can be found at tbd:tbd.
  • Host(s) provisioned by Ansible roles foundational to DIMS systems. If using multiple hosts for a Trident instance, they must all be provisioned with these roles.
  • Access to and knowledge of how to use Ansible roles specific to standing up a working Trident instance. More information about these roles can be found below, and information about how to provision a host with them can be found at tbd:tbd.
  • Latest Trident package OR
  • Access to the github.com Trident repo

Trident Artifact Build Process

Note

You must have access to the Trident GitHub repo (i.e., be able to clone it) in order to build the Debian packages.

The following section outlines the steps needed to obtain or update the Trident source code and build a Debian package from it, so that the resulting artifact is available for use by the Ansible role.

  1. Prerequisite environment, per Trident documentation on their DEV-1.3 branch:

    • Debian Jessie
  2. Prerequisite packages, per Trident documentation on their DEV-1.3 branch:

    • build-essential
    • git
    • pbuilder
  3. Additional packages required, not listed in Trident’s documentation:

    • dh-systemd
    • golang-go
  4. Also, although it is not listed in Trident’s “build” requirements list, you must have Go installed. Trident’s “runtime” requirements list calls for version 1.5.1+, so I have downloaded and installed version 1.5.1:

    $ cd /usr/local
    $ wget https://storage.googleapis.com/golang/go1.5.1.linux-amd64.tar.gz
    $ sudo tar -xzf go1.5.1.linux-amd64.tar.gz
    $ export PATH=$PATH:/usr/local/go/bin
    
  5. If you have a copy of the Trident source code, determine which version it is by running

    $ /usr/sbin/tridentd --version
    
  6. Compare this with the latest version of Trident source code on GitHub. This is a little tricky because there is a mismatch of version numbers between the debian/changelog file in the repo and the tags and branch names.

    As of 13 Jul 2016, the official latest version is 1.2.0.

    Go to the debian/changelog file on the master branch of the Trident repo; the entry at the top shows the latest version.

  7. Update or retrieve source code from GitHub. This may be a git clone or a git pull depending on how you are utilizing the Trident source (whether you need it once or if you are forking the repo).

  8. In root directory of Trident git source, build the package:

    $ dpkg-buildpackage -b -uc -us
    

    This will build the binaries one level up from the trident root dir.

    Note

    The dpkg-buildpackage command will prompt you for your github username and password.

    Note

    The dpkg-buildpackage command runs a script called doc/deps.sh which emits a plethora of “cannot find package X” errors. This is a known issue, see https://github.com/bapril/trident/issues/371. It still seems to build a usable artifact...

  9. Place the Debian package wherever your Ansible role retrieves it from for installation.
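
Putting the steps above together, a minimal build session might look like the following sketch (the branch to check out and the resulting package file names are assumptions; adjust them to the current Trident release):

$ git clone https://github.com/bapril/trident.git
$ cd trident
$ git checkout master
$ head -n 3 debian/changelog        # confirm the version about to be built
$ dpkg-buildpackage -b -uc -us
$ ls ../trident*_amd64.deb          # resulting packages, one level up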

Provisioning Process

The following section outlines the steps needed to provision a host to stand up a working Trident instance.

  1. Ensure all variables for your deployment are set to the correct values. In particular, ensure any Trident-Postgres-Nginx-Postfix networking variables are set correctly.
  2. Apply the postgres Ansible role.
  3. Apply the nginx Ansible role.
  4. Apply the postfix Ansible role.
  5. Apply the trident Ansible role.

Once all the roles have been applied, you should be able to browse to the Nginx proxy address and see the Trident home page. Instructions about how to actually use Trident, set up trust groups, etc., can be found at tbd:tbd.
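
As a rough illustration only (the playbook and inventory names here are assumptions and will differ between deployments), applying the roles in order might look like:

$ ansible-playbook -i inventory/ site.yml --tags postgres
$ ansible-playbook -i inventory/ site.yml --tags nginx
$ ansible-playbook -i inventory/ site.yml --tags postfix
$ ansible-playbook -i inventory/ site.yml --tags trident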

Trident Prerequisites

The following are prerequisites that must be installed and configured before installing and configuring Trident:

  • PostgreSQL 9.1+ database
  • Postfix
  • Nginx

PostgreSQL Database

The Trident documentation gives instructions on how to set up both a local postgres server and Trident database, as well as a remote server and database. In this section, we will cover and expand the instructions for installing and configuring a remote postgres server and Trident database. See Trident’s documentation page for a local installation and configuration.

For remote postgres servers, the Trident documentation recommends temporarily installing Trident on the remote target on which the postgres server will reside, using Trident’s tsetup command to create and set up the Trident database, and then removing the Trident package.

Note

The “In a nutshell” steps in the “Remote Database” section of the Trident documentation seem to conflict both with each other and with the steps outlined in the “Local Database” section, even though the location of the database should really be the only thing that differentiates the two, I believe.

The following is my best interpretation, though it is just that, my interpretation. Notes and todo blocks follow at steps where I’m interpreting.

Essentially, the following steps would need to occur on the remote target:

  1. Install PostgreSQL 9.1+

  2. Create the system trident user

  3. Temporarily install the Trident package(s).

    Note

    Here is a confusing bit from the “nutshell” steps in the “Remote Database” section of the Trident documentation. The first two steps are to “Create the trident user” and “Create the trident database”, and the last step is to “Run tsetup from the remote server as normal”. However, tsetup does those two things (user and database creation) itself.

    The third step says “Provide permissions for the user to access the database”. I’m not sure which user this means; the PostgreSQL trident user, I’m assuming. I’m also assuming that since tsetup creates a trident user for PostgreSQL, it will also give it the appropriate permissions. (I’m assuming this because the “Local Database” section said nothing about giving anyone appropriate permissions.)

    Perhaps I’m confused, and this step means give the system trident user appropriate permissions, but...I don’t think the system user would be accessing the database.

    Either way, for now, until this is clarified, I’m “skipping” this step because it seems to be taken care of by another “step”.

  4. Properly configure the Trident daemon at /etc/trident/trident.conf

    The following is a template of trident.conf:

    #######################################################
    # Trident Configuration
    #######################################################
    # Except for comment lines (anything starting with '#')
    # this file is in the JSON format, thus mind the commas
    # and quotes otherwise Trident can't properly use it.
    #
    # This file should only be readable by the Trident user
    #######################################################
    
    {
      # Where the dbschemas, webroot and templates are located
      "file_root": "/usr/share/trident/",
    
      # Where variable files are stored
      "var_root": "/var/lib/trident/",
    
      # Crypto Keys for JWT (in directory relative to config dir)
      "jwt_key_prv": "jwt.prv",
      "jwt_key_pub": "jwt.pub",
    
      #########################################
      # PostgreSQL Database details
      #########################################
      # PSQL local unix socket
      # Uses PSQL peer authentication
      # This works out of the box on Debian
      #########################################
      "db_host": "/var/run/postgresql/",
      #"db_port": "5432",
      #"db_name": "trident",
      #"db_user": "trident",
      #"db_pass": "trident",
    
      "db_port": "{{ tridentDBPort }}",
      "db_name": "{{ tridentDBName }}",
      "db_user": "{{ tridentDBUser }}",
      "db_pass": "{{ tridentDBPass }}",
    
      # The Nodename is used to identify this instance
      # in a cluster of hosts. The name must be unique.
      #
      # The name is also used as a hostname for SMTP EHLO/HELO
      # messages and thus must be a FQDN.
      #
      # empty => system configured (typically /etc/hostname)
      "nodename": "{{ tridentFQDN }}",
    
      # On which HTTP port to run our Trident Daemon
      "http_port": "{{ tridentHTTPPort }}"
    }
    
  5. Properly configure the postgres pg_hba.conf file (its location varies by distribution and PostgreSQL version)

    The following is a template of pg_hba.conf:

    # PostgreSQL Client Authentication Configuration File
    # ===================================================
    #
    # Refer to the "Client Authentication" section in the PostgreSQL
    # documentation for a complete description of this file.  A short
    # synopsis follows.
    #
    # This file controls: which hosts are allowed to connect, how clients
    # are authenticated, which PostgreSQL user names they can use, which
    # databases they can access.  Records take one of these forms:
    #
    # local      DATABASE  USER  METHOD  [OPTIONS]
    # host       DATABASE  USER  ADDRESS  METHOD  [OPTIONS]
    # hostssl    DATABASE  USER  ADDRESS  METHOD  [OPTIONS]
    # hostnossl  DATABASE  USER  ADDRESS  METHOD  [OPTIONS]
    #
    # (The uppercase items must be replaced by actual values.)
    #
    # The first field is the connection type: "local" is a Unix-domain
    # socket, "host" is either a plain or SSL-encrypted TCP/IP socket,
    # "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a
    # plain TCP/IP socket.
    #
    # DATABASE can be "all", "sameuser", "samerole", "replication", a
    # database name, or a comma-separated list thereof. The "all"
    # keyword does not match "replication". Access to replication
    # must be enabled in a separate record (see example below).
    #
    # USER can be "all", a user name, a group name prefixed with "+", or a
    # comma-separated list thereof.  In both the DATABASE and USER fields
    # you can also write a file name prefixed with "@" to include names
    # from a separate file.
    #
    # ADDRESS specifies the set of hosts the record matches.  It can be a
    # host name, or it is made up of an IP address and a CIDR mask that is
    # an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that
    # specifies the number of significant bits in the mask.  A host name
    # that starts with a dot (.) matches a suffix of the actual host name.
    # Alternatively, you can write an IP address and netmask in separate
    # columns to specify the set of hosts.  Instead of a CIDR-address, you
    # can write "samehost" to match any of the server's own IP addresses,
    # or "samenet" to match any address in any subnet that the server is
    # directly connected to.
    #
    # METHOD can be "trust", "reject", "md5", "password", "gss", "sspi",
    # "krb5", "ident", "peer", "pam", "ldap", "radius" or "cert".  Note that
    # "password" sends passwords in clear text; "md5" is preferred since
    # it sends encrypted passwords.
    #
    # OPTIONS are a set of options for the authentication in the format
    # NAME=VALUE.  The available options depend on the different
    # authentication methods -- refer to the "Client Authentication"
    # section in the documentation for a list of which options are
    # available for which authentication methods.
    #
    # Database and user names containing spaces, commas, quotes and other
    # special characters must be quoted.  Quoting one of the keywords
    # "all", "sameuser", "samerole" or "replication" makes the name lose
    # its special character, and just match a database or username with
    # that name.
    #
    # This file is read on server startup and when the postmaster receives
    # a SIGHUP signal.  If you edit the file on a running system, you have
    # to SIGHUP the postmaster for the changes to take effect.  You can
    # use "pg_ctl reload" to do that.
    
    # Put your actual configuration here
    # ----------------------------------
    #
    # If you want to allow non-local connections, you need to add more
    # "host" records.  In that case you will also need to make PostgreSQL
    # listen on a non-local interface via the listen_addresses
    # configuration parameter, or via the -i or -h command line switches.
    
    # CAUTION: Configuring the system for local "trust" authentication
    # allows any local user to connect as any PostgreSQL user, including
    # the database superuser.  If you do not trust all your local users,
    # use another authentication method.
    
    
    # TYPE  DATABASE        USER            ADDRESS                 METHOD
    
    # "local" is for Unix domain socket connections only
    local   all             all                                     trust
    # IPv4 local connections:
    host    all             all             127.0.0.1/32            trust
    # IPv6 local connections:
    host    all             all             ::1/128                 trust
    # Allow replication connections from localhost, by a user with the
    # replication privilege.
    #local   replication     postgres                                trust
    #host    replication     postgres        127.0.0.1/32            trust
    #host    replication     postgres        ::1/128                 trust
    
    # Allow connections to trident db from remote user via md5
    host     {{ tridentDBName }}        {{ tridentDBUser }}             0.0.0.0/0               md5
    
  6. Ensure reachability of the database port defined in /etc/trident/trident.conf

  7. Create the Trident database using the following command: su - postgres -c "/usr/sbin/tsetup setup_db"

  8. Remove the Trident packages
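
In summary, a session on the remote database host might look roughly like the following (a sketch of the numbered steps above; the package file name, the PostgreSQL version in the pg_hba.conf path, and the port-reachability check are illustrative assumptions):

$ sudo apt-get install postgresql                      # step 1
$ sudo adduser --system trident                        # step 2
$ sudo dpkg -i trident-server_1.0.3_amd64.deb          # step 3 (temporary install)
$ sudo vi /etc/trident/trident.conf                    # step 4
$ sudo vi /etc/postgresql/9.4/main/pg_hba.conf         # step 5 (location varies)
$ nc -vz localhost 5432                                # step 6
$ sudo su - postgres -c "/usr/sbin/tsetup setup_db"    # step 7
$ sudo apt-get remove trident-server                   # step 8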

Nginx Webserver

  1. Install Nginx

  2. Properly configure /etc/nginx/conf.d/trident.conf

    The following is a template of the nginx trident.conf for a production system:

    # The Trident Daemon Upstream
    include /etc/trident/nginx/trident-upstream.inc;
    
    # Redirect all HTTP (80) traffic to HTTPS (443)
    # Trident should only be exposed over HTTPS
    server {
      listen {{ nginxTridentHTTPPort }} default_server;
      listen [::]:{{ nginxTridentHTTPPort }} default_server;
    
            server_name _default_;
    
            rewrite ^ https://$host$request_uri permanent;
    }
    
    # The HTTPS server that exposes Trident
    server {
      listen {{ nginxTridentHTTPSPort }} ssl;
      listen [::]:{{ nginxTridentHTTPSPort }} ssl;
    
      server_name {{ tridentFQDN }};
    
      # May need to variablize these...
      ssl_certificate   trident.crt;
      ssl_certificate_key trident.key;
      ssl_prefer_server_ciphers on;
    
      # And other SSL options, recommended:
      # - ssl_dhparam
      # - ssl_protocols
      # - ssl_ciphers
      # See https://cipherli.st/ for details
    
      # STS header
      add_header Strict-Transport-Security "max-age=31536001";
    
      # HTTP Key Pinning
      add_header Public-Key-Pins "max-age=5184000; pin-sha256=\"...\"";
    
      access_log /var/log/nginx/trident-access.log;
    
      # Include the config for making Trident work
      include /etc/trident/nginx/trident-server.inc;
    }
    

    The following is a template of the nginx trident.conf for a development system:

    # The Trident Daemon Upstream
    include /etc/trident/nginx/trident-upstream.inc;
    
    
    # The HTTP server that exposes Trident - development only
    
    server {
       listen {{ nginxTridentHTTPPort }} default_server;
       listen [::]:{{ nginxTridentHTTPPort }} default_server;
    
      server_name _default_;
    
      access_log /var/log/nginx/trident-access.log;
    
      # Include the config for making Trident work
      include /etc/trident/nginx/trident-server.inc;
    }
    

    Note

    With this config, Nginx will only listen for the Trident daemon on an HTTP port (no HTTPS).

  3. Properly configure Trident Daemon Upstream at /etc/trident/nginx/trident-upstream.inc

    The following is a template of trident-upstream.inc:

    upstream trident-daemon {
      server {{ tridentDBIP }}:{{ tridentDBPort }};
    }
    
  4. Properly configure the Trident server at /etc/trident/nginx/trident-server.inc

    The following is an example of trident-server.inc:

    
    
      # Our webroot (contains static, non-sensitive files, source if public ;)
      root /usr/share/trident/webroot/;
    
      ######################################################
      # Static files
      ######################################################
      location /css/ {
      }
    
      location /favicon.ico {
      }
    
      location /gfx/ {
      }
    
      location /js/ {
      }
    
      ######################################################
      # Forward all requests to the Trident Daemon
      ######################################################
      location / {
        client_max_body_size    0;
        proxy_set_header  Host $host;
        proxy_http_version  1.1;
        proxy_pass    http://trident-daemon;
      }
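
A sketch of the corresponding commands (the Debian package name is assumed; the configuration files are filled in from the templates above):

$ sudo apt-get install nginx
$ sudo vi /etc/nginx/conf.d/trident.conf
$ sudo vi /etc/trident/nginx/trident-upstream.inc
$ sudo vi /etc/trident/nginx/trident-server.inc
$ sudo nginx -t                     # verify the configuration parses
$ sudo service nginx reload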
    
    

Postfix

  1. Install Postfix

  2. Know the answers to the following:

    • What type of mail configuration to use (for Debian’s Postfix package, typically “Internet Site”)
    • The Fully Qualified Domain Name (FQDN) of your server
  3. Properly configure Postfix’s main config file at /etc/postfix/main.cf

    The following is a template of main.cf:

    # See /usr/share/postfix/main.cf.dist for a commented, more complete version
    
    
    # Debian specific:  Specifying a file name will cause the first
    # line of that file to be used as the name.  The Debian default
    # is /etc/mailname.
    #myorigin = /etc/mailname
    
    smtpd_banner = $myhostname ESMTP $mail_name (Ubuntu)
    biff = no
    
    # appending .domain is the MUA's job.
    append_dot_mydomain = no
    
    # Uncomment the next line to generate "delayed mail" warnings
    #delay_warning_time = 4h
    
    readme_directory = no
    
    # TLS parameters
    smtpd_tls_cert_file=/etc/ssl/certs/ssl-cert-snakeoil.pem
    smtpd_tls_key_file=/etc/ssl/private/ssl-cert-snakeoil.key
    smtpd_use_tls=yes
    smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
    smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
    
    # See /usr/share/doc/postfix/TLS_README.gz in the postfix-doc package for
    # information on enabling SSL in the smtp client.
    
    smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated defer_unauth_destination
    #myhostname = dimsdev2.prisem.washington.edu
    myhostname = {{ postfixHostname }}
    alias_maps = hash:/etc/aliases
    alias_database = hash:/etc/aliases
    myorigin = /etc/mailname
    #mydestination = dimsdev2.prisem.washington.edu, localhost.prisem.washington.edu, , localhost
    mydestination = {{ postfixDestinations }}
    relayhost = 
    mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128
    mailbox_size_limit = 0
    recipient_delimiter = +
    inet_interfaces = all
    inet_protocols = all
    
  4. Properly configure /etc/aliases

    The following is a template of aliases:

    # See man 5 aliases for format
    postmaster:    root
    {{ tridentHandlerName }}: "|/usr/sbin/trident-wrapper"
    
  5. You may also need to configure Postfix’s master config file at /etc/postfix/master.cf

    The following is an example of master.cf:

    #
    # Postfix master process configuration file.  For details on the format
    # of the file, see the master(5) manual page (command: "man 5 master" or
    # on-line: http://www.postfix.org/master.5.html).
    #
    # Do not forget to execute "postfix reload" after editing this file.
    #
    # ==========================================================================
    # service type  private unpriv  chroot  wakeup  maxproc command + args
    #               (yes)   (yes)   (yes)   (never) (100)
    # ==========================================================================
    smtp      inet  n       -       -       -       -       smtpd
    #smtp      inet  n       -       -       -       1       postscreen
    #smtpd     pass  -       -       -       -       -       smtpd
    #dnsblog   unix  -       -       -       -       0       dnsblog
    #tlsproxy  unix  -       -       -       -       0       tlsproxy
    #submission inet n       -       -       -       -       smtpd
    #  -o syslog_name=postfix/submission
    #  -o smtpd_tls_security_level=encrypt
    #  -o smtpd_sasl_auth_enable=yes
    #  -o smtpd_reject_unlisted_recipient=no
    #  -o smtpd_client_restrictions=$mua_client_restrictions
    #  -o smtpd_helo_restrictions=$mua_helo_restrictions
    #  -o smtpd_sender_restrictions=$mua_sender_restrictions
    #  -o smtpd_recipient_restrictions=
    #  -o smtpd_relay_restrictions=permit_sasl_authenticated,reject
    #  -o milter_macro_daemon_name=ORIGINATING
    #smtps     inet  n       -       -       -       -       smtpd
    #  -o syslog_name=postfix/smtps
    #  -o smtpd_tls_wrappermode=yes
    #  -o smtpd_sasl_auth_enable=yes
    #  -o smtpd_reject_unlisted_recipient=no
    #  -o smtpd_client_restrictions=$mua_client_restrictions
    #  -o smtpd_helo_restrictions=$mua_helo_restrictions
    #  -o smtpd_sender_restrictions=$mua_sender_restrictions
    #  -o smtpd_recipient_restrictions=
    #  -o smtpd_relay_restrictions=permit_sasl_authenticated,reject
    #  -o milter_macro_daemon_name=ORIGINATING
    #628       inet  n       -       -       -       -       qmqpd
    pickup    unix  n       -       -       60      1       pickup
    cleanup   unix  n       -       -       -       0       cleanup
    qmgr      unix  n       -       n       300     1       qmgr
    #qmgr     unix  n       -       n       300     1       oqmgr
    tlsmgr    unix  -       -       -       1000?   1       tlsmgr
    rewrite   unix  -       -       -       -       -       trivial-rewrite
    bounce    unix  -       -       -       -       0       bounce
    defer     unix  -       -       -       -       0       bounce
    trace     unix  -       -       -       -       0       bounce
    verify    unix  -       -       -       -       1       verify
    flush     unix  n       -       -       1000?   0       flush
    proxymap  unix  -       -       n       -       -       proxymap
    proxywrite unix -       -       n       -       1       proxymap
    smtp      unix  -       -       -       -       -       smtp
    relay     unix  -       -       -       -       -       smtp
    #       -o smtp_helo_timeout=5 -o smtp_connect_timeout=5
    showq     unix  n       -       -       -       -       showq
    error     unix  -       -       -       -       -       error
    retry     unix  -       -       -       -       -       error
    discard   unix  -       -       -       -       -       discard
    local     unix  -       n       n       -       -       local
    virtual   unix  -       n       n       -       -       virtual
    lmtp      unix  -       -       -       -       -       lmtp
    anvil     unix  -       -       -       -       1       anvil
    scache    unix  -       -       -       -       1       scache
    #
    # ====================================================================
    # Interfaces to non-Postfix software. Be sure to examine the manual
    # pages of the non-Postfix software to find out what options it wants.
    #
    # Many of the following services use the Postfix pipe(8) delivery
    # agent.  See the pipe(8) man page for information about ${recipient}
    # and other message envelope options.
    # ====================================================================
    #
    # maildrop. See the Postfix MAILDROP_README file for details.
    # Also specify in main.cf: maildrop_destination_recipient_limit=1
    #
    maildrop  unix  -       n       n       -       -       pipe
      flags=DRhu user=vmail argv=/usr/bin/maildrop -d ${recipient}
    #
    # ====================================================================
    #
    # Recent Cyrus versions can use the existing "lmtp" master.cf entry.
    #
    # Specify in cyrus.conf:
    #   lmtp    cmd="lmtpd -a" listen="localhost:lmtp" proto=tcp4
    #
    # Specify in main.cf one or more of the following:
    #  mailbox_transport = lmtp:inet:localhost
    #  virtual_transport = lmtp:inet:localhost
    #
    # ====================================================================
    #
    # Cyrus 2.1.5 (Amos Gouaux)
    # Also specify in main.cf: cyrus_destination_recipient_limit=1
    #
    #cyrus     unix  -       n       n       -       -       pipe
    #  user=cyrus argv=/cyrus/bin/deliver -e -r ${sender} -m ${extension} ${user}
    #
    # ====================================================================
    # Old example of delivery via Cyrus.
    #
    #old-cyrus unix  -       n       n       -       -       pipe
    #  flags=R user=cyrus argv=/cyrus/bin/deliver -e -m ${extension} ${user}
    #
    # ====================================================================
    #
    # See the Postfix UUCP_README file for configuration details.
    #
    uucp      unix  -       n       n       -       -       pipe
      flags=Fqhu user=uucp argv=uux -r -n -z -a$sender - $nexthop!rmail ($recipient)
    #
    # Other external delivery methods.
    #
    ifmail    unix  -       n       n       -       -       pipe
      flags=F user=ftn argv=/usr/lib/ifmail/ifmail -r $nexthop ($recipient)
    bsmtp     unix  -       n       n       -       -       pipe
      flags=Fq. user=bsmtp argv=/usr/lib/bsmtp/bsmtp -t$nexthop -f$sender $recipient
    scalemail-backend unix	-	n	n	-	2	pipe
      flags=R user=scalemail argv=/usr/lib/scalemail/bin/scalemail-store ${nexthop} ${user} ${extension}
    mailman   unix  -       n       n       -       -       pipe
      flags=FR user=list argv=/usr/lib/mailman/bin/postfix-to-mailman.py
      ${nexthop} ${user}
    
    
  6. You may also need to configure additional email addresses at /etc/postfix/virtual

    The following is a template of virtual:

    mail-handler@example.net {{ tridentHandlerName }}
    @example.net             {{ tridentHandlerName }}
    

Note

The Trident documentation gave the information used to configure the /etc/aliases file and the /etc/postfix/virtual file, but then just said “Of course do configure the rest of Postfix properly.” I don’t really know what that means, which is why I included the master.cf file, since it is part of the /etc/postfix directory. There are a couple of other files there, /etc/postfix/dynamicmaps.cf and /etc/postfix/postfix-files, along with a sasl/ directory and a couple of scripts.
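
Whatever the rest of the Postfix configuration ends up being, after editing /etc/aliases and /etc/postfix/virtual the lookup tables need to be rebuilt and Postfix reloaded. These are standard Postfix commands rather than anything DIMS-specific:

$ sudo newaliases                       # rebuild the aliases database
$ sudo postmap /etc/postfix/virtual     # rebuild the virtual map, if used
$ sudo postfix reload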

Install Trident

Now we can install the Trident server and the Trident CLI.

  1. Retrieve the Trident debian packages from source.prisem.washington.edu

    $ wget http://source.prisem.washington.edu:8442/trident-server_1.0.3_amd64.deb
    $ wget http://source.prisem.washington.edu:8442/trident-cli_1.0.3_amd64.deb
    

    Note

    The version may change; the above commands need to be kept in sync with the current package version.

  2. Properly configure the Trident daemon at /etc/trident/trident.conf

    This template can be seen in the PostgreSQL Database section.

  3. Properly configure Trident daemon defaults at /etc/default/trident

    The following is an example of /etc/default/trident:

    # This is a configuration file for /etc/init.d/trident; it allows you to
    # perform common modifications to the behavior of the Trident daemon
    # startup without editing the init script (and thus getting prompted
    # by dpkg on upgrades).
    
    # Start Trident at startup ? (ignored by systemd)
    TRIDENT_ENABLED=No
    
    # The username as who to run Trident
    DAEMON_USER=trident
    
    # Extra options to pass to the Trident daemon
    DAEMON_OPTS="-username trident -insecurecookies -disabletwofactor -debug -config /etc/trident"
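
With the packages retrieved and the configuration files above in hand, the packages themselves are installed with dpkg; a sketch, assuming the file names downloaded earlier:

$ sudo dpkg -i trident-server_1.0.3_amd64.deb trident-cli_1.0.3_amd64.deb
$ sudo apt-get -f install               # pull in any missing dependencies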
    

Running Trident

There are several ways of running the Trident daemon, but we have divided them into a “secure, non-debug” way and a “non-secure, debug” way.

  • Insecure, debug:

    DAEMON_USER=trident /usr/sbin/tridentd \
       -insecurecookies \
       -disabletwofactor \
       -debug \
       -config /etc/trident/ \
       -daemonize \
       -syslog \
       -verbosedb
    
  • Secure, non-debug:

    DAEMON_USER=trident /usr/sbin/tridentd \
       -config /etc/trident/ \
       -daemonize
    

Note

  • The above code is from a start script used by the Dockerfile created by Linda Parsons ($GIT/dims-dockerfiles/dockerfiles/trident/conf/start.sh). I just grabbed it to show how to run the daemon. We should probably always have syslog enabled...
  • There’s a note in that start script saying that the -daemonize flag does not appear to actually daemonize the Trident daemon; keep that in mind.
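
If the packaged init script or systemd unit is used instead of invoking the daemon by hand (see the TRIDENT_ENABLED and DAEMON_OPTS settings in /etc/default/trident above), the daemon would normally be managed through the service manager; a sketch:

$ sudo service trident start
$ sudo service trident status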

Using tcli on the command line

The following output shows some of the commands available to tcli command line users, and how to log in as a sysadmin user to gain access to more commands.

[dimsenv] ansible@yellow:~ () $ tcli help
-=- Trident Help -=-

Welcome to the Trident menu system which is CLI command based.
If a given command is not in help menu the selected user does not have permissions for it.

Each section, items marked [SUB], has its own 'help' command.

The following commands are available on the root level:
 user                 [SUB]                User commands
 system               [SUB]                System commands
[dimsenv] ansible@yellow:~ () $ tcli user help
Help for user
 password             [SUB]                Password commands
[dimsenv] ansible@yellow:~ () $ tcli system help
Help for system
 login                <username> <password> <twofactor> Login
 logout                                    Logout
 whoami                                    Who Am I?
 get                  [SUB]                Get values from the system
[dimsenv] ansible@yellow:~ () $ tcli system login trident trident123
Login successful
[dimsenv] ansible@yellow:~ () $ tcli system whoami
Username: trident
Fullname:
[dimsenv] ansible@yellow:~ () $ tcli system swapadmin
Now a SysAdmin user
[dimsenv] ansible@yellow:~ () $ tcli system help
Help for system
 report                                    Report system statistics
 login                <username> <password> <twofactor> Login
 logout                                    Logout
 whoami                                    Who Am I?
 swapadmin                                 Swap from regular to sysadmin user
 set                  [SUB]                Configure the system
 get                  [SUB]                Get values from the system
[dimsenv] ansible@yellow:~ () $ tcli user help
Help for user
 new                  <username> <email>   Create a new user
 nominate             <username> <email> <bio_info> <affiliation> <descr> Nominate New User
 set                  [SUB]                Set properties of a user
 get                  [SUB]                Get properties of a user
 list                 <match>              List all users
 merge                <into> <from>        Merge a user
 delete               <username>           Delete a new user
 2fa                  [SUB]                2FA Token Management
 email                [SUB]                Email commands
 password             [SUB]                Password commands
 detail               [SUB]                Manage Contact Details
 language             [SUB]                Manage Language Skills
[dimsenv] ansible@yellow:~ () $

A DIMS system is automatically configured with certain attributes. These attributes are set via tasks in the Trident Ansible role:

---

# file: v2/roles/trident/tasks/main.yml

<snip>

- name: Ensure trident administrator is logged in
  shell: "tcli system login {{ trident.initial_sysadmin.name }} {{ trident.initial_sysadmin.password }}"
  register: tcli_login
  no_log: true
  when: ansible_lsb.codename == "jessie"
  become: yes
  tags: [ trident ]

- name: Require successful login to trident
  fail: "Failed to log in via trident: {{ tcli_login.stdout }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout != "Login successful"
  tags: [ trident ]

- name: Ensure system configuration is present
  shell: "{{ item }}"
  with_items:
   - "tcli system swapadmin"
   - "tcli system set name '{{ trident.name }}'"
   - "tcli system set welcome_text '{{ trident.welcome_text }}'"
   - "tcli system set url_public {{ trident.url_public }}"
   - "tcli system set adminname '{{ trident.adminname }}'"
   - "tcli system set adminemail '{{ trident.adminemail }}'"
   - "tcli system set email_domain '{{ trident.email_domain }}'"
   - "tcli system set people_domain '{{ trident.people_domain }}'"
   - "tcli system set logo_image {{ trident.logo_image }}"
   - "tcli system set header_image {{ trident.header_image }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

<snip>

#EOF

Once the role has been run against the host machine that is to run the Trident application, not only is Trident running and the web application accessible, but the web app also shows that the customization has taken place.

Additionally, we bootstrap global initial admin accounts and an initial trust group with its mailing lists:

---

# file: v2/roles/trident/tasks/main.yml

<snip>

- name: Ensure trident administrator is logged in
  shell: "tcli system login {{ trident.initial_sysadmin.name }} {{ trident.initial_sysadmin.password }}"
  register: tcli_login
  no_log: true
  when: ansible_lsb.codename == "jessie"
  become: yes
  tags: [ trident ]

- name: Require successful login to trident
  fail: "Failed to log in via trident: {{ tcli_login.stdout }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout != "Login successful"
  tags: [ trident ]

<snip>

- name: Ensure initial sysadmin user example email is not present
  shell: "tcli user email remove trident@trident.example.net"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Ensure initial sysadmin user email is present
  shell: "tcli user email add {{ trident.initial_sysadmin.name }} {{ trident.initial_sysadmin.email }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Force initial sysadmin email address to be confirmed
  shell: "tcli user email confirm_force {{ trident.initial_sysadmin.name }} {{ trident.initial_sysadmin.email }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Ensure initial TG is present
  shell: "tcli tg add {{ trident.initial_tg.ident }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Ensure initial TG description is present
  shell: "tcli tg set descr {{ trident.initial_tg.descr }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Ensure initial ML is present
  shell: "tcli ml new {{ trident.initial_tg.ident }} {{ trident.initial_ml }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Ensure global admin accounts are present
  shell: "tcli user new {{ item.key }} {{ item.value.email }}"
  with_dict: "{{ trident_admins }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Ensure global admin accounts have passwords
  shell: "tcli user password set portal {{ item.key }} {{ tridentSysAdminPass }}"
  with_dict: "{{ trident_admins }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Force global admin emails to be confirmed
  shell: "tcli user email confirm_force {{ item.key }} {{ item.value.email }}"
  with_dict: "{{ trident_admins }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Ensure global admin users have global sysadmin rights
  shell: "tcli user set sysadmin {{ item.key }} true"
  with_dict: "{{ trident_admins }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Nominate global admin users to initial TG
  shell: "tcli tg member nominate {{ trident.initial_tg.ident }} {{ item.key }}"
  with_dict: "{{ trident_admins }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Approve global admin users to initial TG
  shell: "tcli tg member approve {{ trident.initial_tg.ident }} {{ item.key }}"
  with_dict: "{{ trident_admins }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

- name: Ensure global admin users have initial TG sysadmin rights
  shell: "tcli tg member promote {{ trident.initial_tg.ident }} {{ item.key }}"
  with_dict: "{{ trident_admins }}"
  when: ansible_lsb.codename == "jessie" and tcli_login.stdout == "Login successful"
  become: yes
  tags: [ trident ]

<snip>

#EOF

At the end of the role run, there are admin accounts that can be used immediately to set up other trust groups and mailing lists, as well as to begin and continue the process of curating the memberships of these trust groups.

To set these things up yourself, you can run the same tcli commands that the role tasks above wrap. The following is a sketch; the username, email address, trust group ident, and mailing list name are placeholders to replace with your own values:
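
$ tcli system login trident THE_ACTUAL_SECRET_PASSWORD
$ tcli system swapadmin
$ tcli user new adminuser admin@example.com
$ tcli user email confirm_force adminuser admin@example.com
$ tcli user password set portal adminuser THE_ADMIN_PASSWORD
$ tcli user set sysadmin adminuser true
$ tcli tg add maintg
$ tcli tg set descr "Main trust group"
$ tcli ml new maintg general
$ tcli tg member nominate maintg adminuser
$ tcli tg member approve maintg adminuser
$ tcli tg member promote maintg adminuser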

Now you should have a pretty good understanding of how tcli works. Always remember to login and then “swapadmin” when you need to change and customize things.

Configuring Trident via web app

Once Trident is running and DNS is working properly, navigate to trident.$category.$deployment in your web browser (where $category and $deployment match the development category and DIMS deployment you are working in) to get to the web GUI.

This will open the following home page:

_images/TridentHomePage.png

Trident home page

To login, click the sign-in button, which will take you to the following page where you can enter your login information:

_images/TridentLogin.png

Trident login page

The next page that opens will be a more or less blank page until you set up some trust groups:

_images/TridentInitialLogin.png

Trident initial login page

In the top right corner will be your profile image (though it will just say “Profile Image” until you upload one), as well as the Trident system name (unconfigured at the beginning), your username, your “UserMode” or status, and the logout link. The “UserMode” is either “Regular” or “Sysadmin”. You must have system administration access in order to do anything besides edit your own profile and look at trust group information for the trust groups you are in.

To switch to a “Sysadmin” UserMode, click the “Regular” UserMode link in the top right corner. This will swap you to “Sysadmin” status and the page will slightly change. This is shown below:

_images/TridentToSysadmin.png

Change to sysadmin

Changing to “sysadmin” allows you to add and configure trust groups, to have access to the Trident command line interface, tcli (or “tickly”), and to view and monitor reports, logs, and settings for this particular Trident system.

User configurations

This section walks through the configuration of a user who has sysadmin privileges. There are a couple differences between what a “regular” user can configure and what a “sysadmin” user can configure. The “password reset” section is not available to users without sysadmin privileges. Additionally, there are a couple profile items hidden from regular users.

To begin, click the “User” tab at the top of the page. This will take you to a table of contents page with links to various things you can edit or look at for your user. These are also itemized in the second row at the top of the page.

_images/TridentUserAccountInfo.png

Options for editing a user

To edit the user’s profile, click the “Profile” link, either in the table of contents list or in the second row at the top of the page. This will take you to a page where you can edit profile information for the user.

_images/TridentUserProfile.png

Profile options

To update the profile, scroll all the way through the options; the “Update Profile” button is at the end of the page. Clicking it will leave you at the Profile page, but if you scroll all the way back down, you’ll see a notice about how many fields were updated and how many were not modified.

_images/TridentUpdateUserProfile.png

Profile update

You can change your user’s username:

_images/TridentUsername.png

Change user’s username

You can change your user’s password:

_images/TridentUserPassword.png

Change user’s password

You can set up two-factor authentication:

_images/TridentUser2FA.png

Setup two-factor authentication

You must add and verify your email address to receive emails from trust groups to which you belong. First, “create” your email:

_images/TridentUserCreateEmail.png

Create user email

Once you submit your email address, you must get a verification code. Click the “Verify” button on this page to get the verification code sent to you via email:

_images/TridentUserVerifyEmail.png

Verify user email

Once you receive the email with the code, put the code in the “Verification Code” box on the following page:

_images/TridentUserEmailVerifyCode.png

Submit verification code

If it is a valid verification code, your email’s status will change from “Unverified” to “Verified”.

_images/TridentUserEmailVerified.png

Verified email status

You can also download your user’s PGP keys:

_images/TridentUserPGPKeys.png

Download PGP keys

You can also view an audit log for your user:

_images/TridentUserAuditLog.png

View user audit log

As a “sysadmin” user, you can do all of these things for all users under your administration. A list of these users can be found by clicking the “User” tab in the second row at the top of the page, when in “Sysadmin” UserMode.

_images/TridentSysadminUserList.png

View user list as sysadmin

Additionally, only a sysadmin can reset another user’s password or remove an email address.

_images/TridentUserPasswordReset.png

Reset a user’s password as sysadmin

_images/TridentUserEmailRemove.png

Remove an email as sysadmin

Sysadmin configurations

Sysadmins can set up trust groups, view information about their system, and use the Trident command line interface, tcli (or “tickly”), through the web app. This section walks through these features.

Trust group configurations

The initial login page will list your trust groups. If you don’t have any, or to add new ones, click the “Add Trust Group” link in the second row at the top of the page.

_images/TridentTrustGroupAdd-1.png

No trust groups, yet.

The following page will start the configuration of the trust group, starting with a name for the trust group.

_images/TridentTrustGroupAdd.png

Add a trust group

Warning

If there isn’t at least one verified email address, this will fail.

Once you have at least one trust group, clicking the “Trust Group” tab at the top of the page will give you an index of the trust groups you have access to. This list is available both to regular users and to sysadmin users (shown here from the regular user perspective):

_images/TridentTGRegular.png

List of trust groups

As a sysadmin user, however, you can do much more than just view a list of trust groups. For all trust groups under your administration, you can manage users, set up mailing lists, view audit logs, set up and use wiki and file storage, as well as set other configurations and download PGP keys.

In order to have access to the wiki and file storage, you must set that up via the group settings:

_images/TridentTrustGroupSettings.png

Some trust group settings

You must select the “Wiki Module” and “Files Module” if you want to use those features:

_images/TridentTrustGroupSettings.png

Some trust group settings

Trust group wiki:

_images/TridentTrustGroupWiki.png

Empty trust group wiki

Trust group files:

_images/TridentTrustGroupFiles.png

Empty trust group file storage

You can then click the tabs near the top of the page or the green buttons in the middle of the page to “Add” a file or directory or to list the files and directories.

Download PGP keys:

_images/TridentTrustGroupPGPKeysDownload.png

Download trust group PGP keys

See list of trust group members:

_images/TridentTrustGroupMembers.png

List of trust group members

To nominate a user, you must search for them via their email address:

_images/TridentTrustGroupNominate.png

Search by email to nominate user

To add mailing lists, choose a trust group, then click the “Mailing List” tab in the second row at the top of the page. There are some default mailing lists when you add a trust group:

_images/TridentTrustGroupMailingList.png

Default trust group mailing lists

Click the “New Mailing List” link in the second row at the top of the page. On the next page, give your mailing list a name:

_images/TridentTrustGroupMailingListAdd.png

Add trust group mailing list

You can then see the newly added mailing list:

_images/TridentTrustGroupMailingListWithNew.png

Default and added mailing list index

Once the mailing list is created, you can update its settings, subscribe or unsubscribe users, and view the PGP key.

To update a mailing list’s settings, choose a mailing list, then click the “Settings” tab in the second row at the top of the page.

_images/TridentTrustGroupMailingListUpdateNew.png

Update mailing list settings

If no users have been subscribed to a mailing list, you’ll see the following page:

_images/TridentTGMLGDNoMember.png

No members on mailing list

To add a user to a mailing list, choose a trust group and a mailing list, then click the “Subscribe” tab in the second row at the top of the page. Type in the username of the user you’d like to subscribe to the list.

_images/TridentTGMLGeneralAdd.png

Add member to mailing list

If the user already exists on a mailing list, you’ll see the following:

_images/TridentTGMLGeneralAlreadyMember.png

Already member on mailing list

To see the users on a mailing list, choose a trust group and a mailing list, and you’ll see a list of users and basic information about them:

_images/TridentTGMLGeneralListPopulated.png

List of users on mailing list

As a user, you can see which mailing lists you are subscribed to within particular trust groups:

_images/TridentTGMLGeneralSubscribed.png

Mailing list subscription status

To unsubscribe a user, choose a trust group and a mailing list, then click the “Unsubscribe” tab in the second row at the top of the page. Then give the username you’d like to unsubscribe from the given mailing list, and click “Unsubscribe”.

_images/TridentTrustGroupMailingListUnsubscribe.png

Unsubscribe a user

System information

To view the Trident System information, you must be a sysadmin. Click the “System” tab in the top row at the top of the page.

_images/TridentSystemOptions.png

Trident system information options

To view the audit log, click the “Audit Log” link in the index, or click the “Audit Log” tab in the second row at the top of the page.

_images/TridentSystemAuditLog.png

Trident system audit log

To view the report, click the “Report” link in the index, or click the “Report” tab in the second row at the top of the page.

_images/TridentSystemReport.png

Trident system report

To change the system settings, click the “Settings” link in the index, or click the “Settings” tab in the second row at the top of the page.

_images/TridentSystemSettings.png

Trident system settings

Don’t forget to click the “Update Settings” button at the bottom of the page for the changes to take effect.

_images/TridentSystemUpdateSettings.png

Update Trident system settings

Basic tcli use

To use tcli via the web app, you must be a sysadmin user. Click the “CLI” tab at the top of the page.

To get started, you can type the “help” command into the box, and you’ll get useful information on how to run tcli:

_images/TridentTCLI.png

Get tcli help

Anything you can run on the command line using tcli, you can run via the web app.

Upgrading configuration across Trident versions

One of the challenges with integrating open source applications into a continuous delivery or automated deployment environment is managing customizations across ongoing releases. From one version of a program to another, the contents of configuration files may change, files may be split apart or merged together, or their names and/or directory paths may change.

The first challenge in automating the configuration and installation of an open source application is figuring out which files to put under Ansible control, and how to template those files so that variables can be used in a way that supports customized deployments.

Each new release is an opportunity for something to break. Simply updating the version number and re-installing may work, but it may also break one or more things in the application. Some breakage is easy to detect when starting a service or running the application, but other problems may not surface until long into execution, and the time between updating and encountering the problem makes them much harder to debug.

To manage the upgrade process, one or more of the following tasks must be performed.

  1. Differencing the contents of files under Ansible control to determine when configuration customization changes are necessary, or whether it is safe to just update and move on.
  2. Differencing the contents of the distribution archive, or the resulting installed files, to detect file name changes, new configuration files, etc. Knowing when the contents of default files have changed, in the face of continuous deployment of files that are under Ansible control, takes some getting used to. It may be necessary to keep a development environment in which a default installation can be performed, or to hand-install the new package on a basic “vanilla” virtual machine, in order to examine the resulting files.
  3. Choosing how to handle file name changes for possible backward-compatibility or multi-version support. This may involve complicated Ansible when conditionals, file names containing version numbers, or other mechanisms that avoid a change leaving the playbook working only with versions <= N or >= N in a mutually exclusive way.

To see how these problems manifest themselves, and how to detect and handle them, let’s take a look at two different releases of the Trident portal system: versions 1.3.8 and 1.4.2.

We start by extracting the contents of each release’s deb archive file into a directory where we can examine and/or compare the files.

$ cd /tmp
$ dpkg -x /vm/cache/sources/trident-server_1.3.8_amd64.deb trident_1.3.8
$ dpkg -x /vm/cache/sources/trident-server_1.4.2_amd64.deb trident_1.4.2

We now have two parallel directories in /tmp. Using the Unix diff program, we can see which files differ in content, or differ in existence (i.e., occur in one directory, but not the other).

Here is an example of changes to file contents:

$ diff -r trident_1.3.8/ trident_1.4.2/
diff -r trident_1.3.8/etc/init.d/trident trident_1.4.2/etc/init.d/trident
109a110,113
>   rotate)
>       start-stop-daemon --stop --quiet --signal USR1 --exec ${DAEMON} --pidfile ${PIDFILE} --name ${DNAME}
>       ;;
>
116c120
<       log_action_msg "Usage: ${SCRIPTNAME} {start|stop|restart|status}" || true
---
>       log_action_msg "Usage: ${SCRIPTNAME} {start|stop|restart|status|rotate}" || true
diff -r trident_1.3.8/etc/trident/nginx/trident-server.inc trident_1.4.2/etc/trident/nginx/trident-server.inc
11,12d10
< #     include
< # ------------------>8
13a12,13
> #     ssl_certificate ...
> #     ...
15c15,17
<
---
> #     include /etc/trident/nginx/trident-server.inc
> # }
> # ------------------>8
23c25,28
<       location /css/ {
---
>       location ~ ^/(css|gfx|js)/ {
>               expires 7d;
>               root /usr/share/;

Here are examples of file system changes, specifically those files in the webroot directory:

 $ diff -r trident_1.3.8/ trident_1.4.2/ | grep '^Only' | grep '/webroot'
 Only in trident_1.3.8/usr/share/trident/webroot/css: epiceditor
 Only in trident_1.3.8/usr/share/trident/webroot/css: form.css
 Only in trident_1.3.8/usr/share/trident/webroot/css: style.css
 Only in trident_1.4.2/usr/share/trident/webroot/css: trident.css
 Only in trident_1.3.8/usr/share/trident/webroot: favicon.ico
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: gm.jpg
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: info.png
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: invalid.png
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: logo.png
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: red_asterisk.png
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: search.png
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: unknown_person.jpg
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: valid.png
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: warning.png
 Only in trident_1.3.8/usr/share/trident/webroot/gfx: xkcd_password_strength.png
 Only in trident_1.3.8/usr/share/trident/webroot: js
 Only in trident_1.3.8/usr/share/trident/webroot: robots-ok.txt
 Only in trident_1.3.8/usr/share/trident/webroot: robots.txt

We can see that one file (form.css) was removed between releases 1.3.8 and 1.4.2, while another file (style.css) was renamed to trident.css, possibly absorbing the contents of the now-absent form.css. Looking at the contents of form.css, it is clear that .styled_form is one of the unique elements defined in that file. Searching the same directory in both versions supports the hypothesis that this file was merged:

$ grep -r styled_form trident_1.3.8/usr/share/trident/webroot/css/
trident_1.3.8/usr/share/trident/webroot/css/style.css:form#wikiform.styled_form
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form .form_hint, .styled_form .required
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form ul
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form li
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form h2
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form label
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input, .fakebutton
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form textarea
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=number]
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=radio]
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=submit], .fakebutton
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input, .styled_form textarea, .fakebutton
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input:focus, .styled_form textarea:focus, .fakebutton
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input:required, .styled_form textarea:required
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input:required:valid, .styled_form textarea:required:valid
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input:focus:invalid, .styled_form textarea:focus:invalid
trident_1.3.8/usr/share/trident/webroot/css/form.css:form.styled_form li.info label, form.styled_form li.error label, form.styled_form li.okay label, form.styled_form li.warning label, form.styl
ed_form li.required label
trident_1.3.8/usr/share/trident/webroot/css/form.css:form.styled_form li.info label
trident_1.3.8/usr/share/trident/webroot/css/form.css:form.styled_form li.error label
trident_1.3.8/usr/share/trident/webroot/css/form.css:form.styled_form li.okay label
trident_1.3.8/usr/share/trident/webroot/css/form.css:form.styled_form li.warning label
trident_1.3.8/usr/share/trident/webroot/css/form.css:form.styled_form li.required label
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input:hover + .form_hint, .styled_form textarea:hover + .form_hint
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input:required:valid + .form_hint, .styled_form textarea:required:valid + .form_hint,
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input:required:valid + .form_hint::before, .styled_form textarea:required:valid + .form_hint::before
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=submit], .fakebutton, .styled_button input
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=submit], .fakebutton, .styled_button input
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=submit]:disabled
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=submit].deny
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=checkbox], input[type=radio]
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=checkbox]:checked, input[type=radio]:checked
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=checkbox]:disabled, input[type=radio]:disabled
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=checkbox]:checked:disabled, input[type=radio]:checked:disabled
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=checkbox]:after, input[type=radio]:after
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type=checkbox]:disabled:after, input[type=radio]:disabled:after
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type="checkbox"]:checked:after,input[type="radio"]:checked:after
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form input[type="checkbox"]:focus
trident_1.3.8/usr/share/trident/webroot/css/form.css:.styled_form textarea.console
$ grep -r styled_form trident_1.4.2/usr/share/trident/webroot/css/
trident_1.4.2/usr/share/trident/webroot/css/trident.css:.login form.styled_form
trident_1.4.2/usr/share/trident/webroot/css/trident.css:.login .styled_form input
trident_1.4.2/usr/share/trident/webroot/css/trident.css:.login .styled_form input[type="submit"]

The problem now is how to support one CSS file named style.css for (at least) version 1.3.8, but a file named trident.css for (at least) version 1.4.2. There still remains the question, “When did this change occur, and how do we instruct Ansible which file to use?”
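
One way to handle the renamed file is with version-conditional tasks in the playbook. The following is a minimal sketch, not the project’s actual playbook: the template names style.css.j2 and trident.css.j2 are hypothetical, and trident.version is the same variable used in the Jinja2 example later in this section.

- name: Install webroot CSS (Trident 1.3.8)
  template:
    src: style.css.j2
    dest: /usr/share/trident/webroot/css/style.css
  when: trident.version in [ '1.3.8' ]

- name: Install webroot CSS (Trident 1.4.2)
  template:
    src: trident.css.j2
    dest: /usr/share/trident/webroot/css/trident.css
  when: trident.version in [ '1.4.2' ]

Listing the covered versions explicitly (rather than using an open-ended comparison) keeps these tasks from silently applying to releases that have not yet been examined.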

If, on the other hand, the file name has not changed but its contents vary significantly (e.g., one version uses a variable named file_root and the other has changed to a variable named file_roots), managing a single file with one name but two different contents becomes more complicated. Either the files must be differentiated by metadata (i.e., the name must include a version number or some other unique string), or Jinja conditionals must be used. The latter mechanism, Jinja conditional inclusion, is a bit simpler and is the easiest to manage in terms of file differencing when maintaining the contents of different versions of the file.

For example, here is how the differences in content across versions of the file trident.conf.j2 can be managed using Jinja conditionals:

# {{ ansible_managed }} [ansible-playbooks v{{ ansibleplaybooks_version }}]
#
#######################################################
# Trident Configuration
#######################################################
# Except for comment lines (anything starting with '#')
# this file is in the JSON format, thus mind the commas
# and quotes otherwise Trident can't properly use it.
#
# This file should only be readable by the Trident user
#######################################################

{
{% if trident.version in [ '1.3.8' ] %}
    "file_root": "/usr/share/trident/",
{% endif %}
{% if trident.version in [ '1.4.2' ] %}
    # Where the dbschemas, webroot and templates are located
    "file_roots": [ "/usr/share/trident/", "/usr/share/pitchfork/" ],
{% endif %}

    # Where variable files are stored
    "var_root": "/var/lib/trident/",

    # TODO(dittrich): Try to get this to rsyslog for sorting, not separate logging
    # Log File location (logrotate rotates it)
    "logfile": "/var/log/trident/trident.log",

    # Crypto Keys for JWT (in directory relative to config dir)
    "jwt_key_prv": "jwt.prv",
    "jwt_key_pub": "jwt.pub",

{% if trident.version in [ '1.4.2' ] %}
    # Content Security Policy
    "csp": "default-src 'self'",

    # CSS: Cascading Style Sheets
    "css": [ "trident", "blockquote", "code", "crumbs", "diff", "form", "loader", "messages", "search", "table", "wiki" ],

    # Javascript: global Javascript for every page
    # (Should actually always be empty)
    "javascript": [],

    # X-Forwarded-For Trusted IP list
    # CIDR prefixes from which we trust the XFF header
    "xff_trusted_cidr": [ "127.0.0.1/8" ],

    # Weak Password Dictionaries
    "pw_weakdicts": [ "10k_most_common.txt" ],
{% endif %}

{% if trident.version in [ '1.3.8' ] %}
    #########################################
    # PostgreSQL Database details
    #########################################
    # PSQL local unix socket
    # Uses PSQL peer authentication
    # This works out of the box on Debian
    #########################################
{% endif %}
{% if trident.version in [ '1.4.2' ] %}
    #########################################
    # PostgreSQL Database details
    #########################################
    # Requires configuration of pg_hba.conf!
    #
    # local unix socket (Debian):
    #   "db_host": "/var/run/postgresql/",
    #   "db_port": "5432",
    #
    # remote:
    #   "db_host": "db.example.org",
    #   "db_port": "5432",
    #########################################
{% endif %}

Emails and other non-official documentation

  • Email from Linda in response to Megan asking for any additional documentation.
To: Megan Boggess <mboggess@uw.edu>
From: Linda Parsons <linda.parsons@nextcentury.com>
Date: April 13, 2016
Subject: Trident emails and any other documentation

Hi Megan,

Yes, the new project is fun, and I hope things are going well for you too... 
There isn't any documentation on Trident other than what they provide at 
trident.li and on their github pages - have Dave get you access to their repo. 
I relied on that documentation to do all the Docker and Ansible stuff.


The README in the dims-dockerfiles repo is the one that describes what I did. 
I may have comments in Ansible files as well that are descriptive - I don't 
have access to the code at the moment. I had the deployment done (or at least 
a working version to get you started) from build through Ansible deployment of 
two docker containers... but there is still work to be done and you will need 
to make the Ansible deployment fit with how you guys are doing things now.

the Postgresql container, and one to actually create the .deb build files to 
install Trident. The "build-trident" (or "trident-build" - not sure but it has 
"build" in the name of the directory) has a script that will pull the current 
source in our git repo (which in turn is from their trident repo - someone needs
to keep that synchronized) and will create the .deb files and push them to our 
sources repo. That is so the actual Docker images can be created using them.  
I made a change to the file that controls the packaging so that it didn't require 
additional software like nginx, postfix, etc. - this is better for docker since 
we may not want all the services on all the containers that need this software. 
For example, to create the database on the postgresql container, you need trident 
installed as well just so you can run their scripts.  Anyway, the .deb packages 
don't force the user to install those services, but of course you will install 
them if you need them. So, I've got nginx and trident on the main trident image. 
The one thing that needs to be done is to also install and configure postfix on 
that image. I had been hoping we could use a separate docker container for that, 
but it would require changes to their source code. So you will need to modify that
Dockerfile to install and configure postfix.

Maybe you could look through the dims-dockerfile stuff and the Ansible playbooks 
and then get back to me if you have questions. I could do a quick hangout to answer 
them.  Also note there are two docker images for the postgresql container - one for 
the default one that is installed in a new environment, and one to install a copy of 
our ops-trust database. The second was used to get the trident system up and running 
on hub.prisem.washington.edu so we could use it and have the Dashboard be able to 
get data from that database. It was also necessary at the time since there apparently 
is a bug in a new install and the sysadmin can't create trust groups from within the 
UI (I have an issue in github for that but no one has responded). However, it cannot 
be used for new systems.

Another thing that needs to be worked out is how to do the certificates for the 
machine running the trident docker containers. Also, if you look at the Ansible 
playbooks, there are commands to start the containers in a development mode and in 
secure (production) mode.  We are currently using development mode since we don't have 
the certs - production mode for the docker containers hasn't been tested.

I don't really have any emails to the trident guys... we had talked about emailing 
Vixie about the bug I mentioned above but I had to leave before that was done. 
I'm not sure why they haven't responded to the bug report on github.  Anyway, what 
I knew was from reading through their docs many times and also from what I knew about 
Postgres databases in general, and then from actually building the system. So I think 
from reading the Dockerfiles and the Ansible playbooks you will get a good brain dump.

You should be able to build and deploy the trident system locally as long as you 
have a VM to install it on and a consul cluster running as well (need the consul 
stuff so the docker containers can talk to each other on the overlay network). 
Its better to use just the regular postgres-trident docker container for postgres 
(which creates a new database) - then you'll see the bug I mentioned. It is 
imperitive that they fix that or let us know what we're doing wrong if anything 
(I posted a log to the github issue that shows the database errors that are 
being produced). It will also allow you to be able to test adding postfix to the mix.

Last I looked to they had not fixed the firewall issue that was preventing us from 
accessing the old ops-trust machines - not sure if that has been fixed yet.

Linda
  • There is an Ansible role called trident-docker-deploy located in $GIT/ansible-playbooks/roles. This role creates a volume container to be paired with a DIMS postgres container (if it doesn’t already exist), as well as a DIMS postgres container and a DIMS Trident container.

    The Dockerfiles and related files and scripts for these containers can be viewed at:

    • Postgres: $GIT/dims-dockerfiles/dockerfiles/postgres-trident
    • Trident: $GIT/dims-dockerfiles/dockerfiles/trident
  • Additionally, Linda created a couple “helper” containers. One container updates source.prisem.washington.edu and another builds off the “fresh-install” DIMS postgres container to install a copy of the DIMS OPS-Trust database.

    These can be viewed at:

    • Build: $GIT/dims-dockerfiles/dockerfiles/trident-build
    • Original Database: $GIT/dims-dockerfiles/dockerfiles/postgres-trident-clone

AMQP and RabbitMQ

This chapter covers configuration and debugging of RabbitMQ, a popular AMQP message bus service.

RabbitMQ use in DIMS

AMQP (specifically RabbitMQ) is discussed in Sections DIMS architectural design and System Software Architecture of DIMS Architecture Design v 2.10.0, and the specifics of the server initially configured for use in DIMS is documented in Section dimsasbuilt:rabbitmq of dimsasbuilt:dimsasbuilt. Its use for processing logs within DIMS is discussed in Section dimsparselogs:introtologparsing of dimsparselogs:parsinglogswithdims.

Attention

While RabbitMQ is documented extensively on their web site, it is sometimes hard to interpret what it says. Another very useful resource is Chapter 8: Administering RabbitMQ from the Web from RabbitMQ in Action: Distributed messaging for everyone, by Alvaro Videla and Jason J. W. Williams.

Basic Service Administration

RabbitMQ is started, stopped, restarted, and queried for status just like any other Ubuntu service, using the service command as root. Its configuration files and settings are found in /etc/rabbitmq and /etc/default/rabbitmq-server, and its log files in /var/log/rabbitmq/.
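
For example, as root (output omitted; the same pattern applies to start, stop, restart, and status):

root@rabbitmq:~# service rabbitmq-server status
root@rabbitmq:~# service rabbitmq-server stop
root@rabbitmq:~# service rabbitmq-server start

The configuration files and logs mentioned above can be examined directly: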

root@rabbitmq:~# cd /etc/rabbitmq
root@rabbitmq:/etc/rabbitmq# tree
.
├── enabled_plugins
├── rabbitmq.config
├── rabbitmq.conf.d
└── rabbitmq-env.conf

1 directory, 3 files
root@rabbitmq:/etc/rabbitmq# cat rabbitmq.config
[
{kernel,
[{inet_dist_listen_min, 45000},
{inet_dist_listen_max, 45000}
]
}
].
root@rabbitmq:/var/log/rabbitmq# cat /etc/default/rabbitmq-server
ulimit -n 1024

Note

The ulimit setting here controls the number of open file handles a process can have. A server with lots of connections needs a higher limit than the default, hence this setting. See [rabbitmq-discuss] Increasing the file descriptors limit and mozilla/opsec-puppet and Increase RabbitMQ file descriptor limit and memory watermark without restart.
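
To raise the limit, the value in /etc/default/rabbitmq-server can be increased and the service restarted. A minimal sketch follows; the value 65536 is illustrative only, not a DIMS-specific setting:

root@rabbitmq:~# cat /etc/default/rabbitmq-server
# Raise the open file handle limit for the RabbitMQ server process.
ulimit -n 65536
root@rabbitmq:~# service rabbitmq-server restart

The effective limits can then be confirmed with rabbitmqctl status, as shown below.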

root@b52:/etc/rabbitmq# rabbitmqctl status | grep -A 4 file_descriptors
 {file_descriptors,
      [{total_limit,924},{total_used,3},{sockets_limit,829},{sockets_used,1}]},
 {processes,[{limit,1048576},{used,200}]},
 {run_queue,0},
 {uptime,82858}]
root@rabbitmq:/etc/rabbitmq# cd /var/log/rabbitmq
root@rabbitmq:/var/log/rabbitmq# tree
.
├── rabbit@rabbitmq.log
├── rabbit@rabbitmq-sasl.log
├── shutdown_log
└── startup_log

0 directories, 4 files

Managing RabbitMQ

RabbitMQ can be administered in two ways: (1) manually, using the built-in web interface, or (2) using command line tools like rabbitmqctl and rabbitmqadmin.

To get access to the management interface, you must enable rabbitmq_management in the RabbitMQ configuration:

 root@rabbitmq:/etc/rabbitmq# cat rabbitmq-env.conf
 #RABBITMQ_NODE_IP_ADDRESS=10.142.29.170
 RABBITMQ_NODE_PORT=5672
 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15672}]"

 # Source other environment files (that include ONLY variable settings,
 # not RabbitMQ configuration
 for ENVFILE in `ls /etc/rabbitmq/rabbitmq.conf.d |sort -r`; do
     . /etc/rabbitmq/rabbitmq.conf.d/$ENVFILE
     done

Once you do this and restart the server, two things become available: a web management interface, and a script named rabbitmqadmin that can be downloaded from the RabbitMQ server itself.
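
On more recent RabbitMQ releases, the management plugin is typically enabled with the rabbitmq-plugins command rather than (or in addition to) RABBITMQ_SERVER_START_ARGS. A sketch, assuming the plugin ships with the installed RabbitMQ package:

root@rabbitmq:~# rabbitmq-plugins enable rabbitmq_management
root@rabbitmq:~# service rabbitmq-server restart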

Using the web interface

You can see the web management interface in Figure RabbitMQ Management Interface Login Screen and Figure RabbitMQ Management Interface Home Screen.

_images/rabbitmq-management-login.png

RabbitMQ Management Interface Login Screen

_images/rabbitmq-management-home.png

RabbitMQ Management Interface Home Screen

Using the command line

The RabbitMQ service daemons are started like any other service on Ubuntu 14.04.

root@b52:~# service rabbitmq-server restart
 * Restarting message broker rabbitmq-server
   ...done.

There are multiple ways in Linux to discover the listening port number. You can identify the process names with ps or pstree and map them to the output of netstat, use lsof, or use the epmd command:

root@b52:~# pstree -p | less
init(1)-+- ...
        |-lightdm(2599)-+-Xorg(2648)
        |  ...
        |               |-lightdm(3363)-+-init(4946)-+-at-spi-bus-laun(5140)-+-dbus-daemon(5144)
        |               |               |            |-rabbitmq-server(19303)---beam.smp(19311)-+-inet_gethost(19492)---inet_gethos+
        |               |               |            |                                          |-{beam.smp}(19408)
        |               |               |            |                                          |-{beam.smp}(19409)
        |               |               |            |                                          | ...
        |               |               |            |                                          |-{beam.smp}(19451)
        |               |               |            |                                          `-{beam.smp}(19452)
        | ...
root@b52:~# netstat -pan | grep beam
tcp        0      0 0.0.0.0:45000           0.0.0.0:*               LISTEN      19311/beam.smp
tcp        0      0 127.0.0.1:51156         127.0.0.1:4369          ESTABLISHED 19311/beam.smp
tcp6       0      0 :::5672                 :::*                    LISTEN      19311/beam.smp
root@b52:~# lsof -i | grep beam
beam.smp  19311        rabbitmq    8u  IPv4 27589259      0t0  TCP *:45000 (LISTEN)
beam.smp  19311        rabbitmq    9u  IPv4 27589261      0t0  TCP localhost:51156->localhost:epmd (ESTABLISHED)
beam.smp  19311        rabbitmq   16u  IPv6 27580219      0t0  TCP *:amqp (LISTEN)
root@b52:~# epmd -names
epmd: up and running on port 4369 with data:
name rabbit at port 45000

There are two ways of getting the exact same information on the runtime status of RabbitMQ. The first uses rabbitmqctl directly. The second uses service rabbitmq-server status. They are both shown here:

 root@rabbitmq:/etc/rabbitmq# rabbitmqctl status
 Status of node rabbit@rabbitmq ...
 [{pid,8815},
  {running_applications,
      [{rabbitmq_management,"RabbitMQ Management Console","0.0.0"},
       {rabbitmq_management_agent,"RabbitMQ Management Agent","0.0.0"},
       {amqp_client,"RabbitMQ AMQP Client","0.0.0"},
       {rabbit,"RabbitMQ","2.7.1"},
       {os_mon,"CPO  CXC 138 46","2.2.7"},
       {sasl,"SASL  CXC 138 11","2.1.10"},
       {rabbitmq_mochiweb,"RabbitMQ Mochiweb Embedding","0.0.0"},
       {webmachine,"webmachine","1.7.0-rmq0.0.0-hg"},
       {mochiweb,"MochiMedia Web Server","1.3-rmq0.0.0-git"},
       {inets,"INETS  CXC 138 49","5.7.1"},
       {mnesia,"MNESIA  CXC 138 12","4.5"},
       {stdlib,"ERTS  CXC 138 10","1.17.5"},
       {kernel,"ERTS  CXC 138 10","2.14.5"}]},
  {os,{unix,linux}},
  {erlang_version,
      "Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:16:16] [rq:16] [async-threads:30] [kernel-poll:true]\n"},
  {memory,
      [{total,31080064},
       {processes,11445592},
       {processes_used,11433880},
       {system,19634472},
       {atom,1336577},
       {atom_used,1313624},
       {binary,117880},
       {code,14301212},
       {ets,1142776}]},
  {vm_memory_high_watermark,0.39999999996434304},
  {vm_memory_limit,6730807705}]
 ...done.
 root@rabbitmq:/etc/rabbitmq# service rabbitmq-server status
 Status of node rabbit@rabbitmq ...
 [{pid,8815},
  {running_applications,
      [{rabbitmq_management,"RabbitMQ Management Console","0.0.0"},
       {rabbitmq_management_agent,"RabbitMQ Management Agent","0.0.0"},
       {amqp_client,"RabbitMQ AMQP Client","0.0.0"},
       {rabbit,"RabbitMQ","2.7.1"},
       {os_mon,"CPO  CXC 138 46","2.2.7"},
       {sasl,"SASL  CXC 138 11","2.1.10"},
       {rabbitmq_mochiweb,"RabbitMQ Mochiweb Embedding","0.0.0"},
       {webmachine,"webmachine","1.7.0-rmq0.0.0-hg"},
       {mochiweb,"MochiMedia Web Server","1.3-rmq0.0.0-git"},
       {inets,"INETS  CXC 138 49","5.7.1"},
       {mnesia,"MNESIA  CXC 138 12","4.5"},
       {stdlib,"ERTS  CXC 138 10","1.17.5"},
       {kernel,"ERTS  CXC 138 10","2.14.5"}]},
  {os,{unix,linux}},
  {erlang_version,
      "Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:16:16] [rq:16] [async-threads:30] [kernel-poll:true]\n"},
  {memory,
      [{total,31103832},
       {processes,11469280},
       {processes_used,11457568},
       {system,19634552},
       {atom,1336577},
       {atom_used,1313689},
       {binary,117880},
       {code,14301212},
       {ets,1142776}]},
  {vm_memory_high_watermark,0.39999999996434304},
  {vm_memory_limit,6730807705}]
 ...done.

The following shows how to get a copy of the rabbitmqadmin script and make it executable from the command line.

root@rabbitmq:/etc/rabbitmq# wget http://localhost:55672/cli/rabbitmqadmin
root@rabbitmq:/etc/rabbitmq# chmod +x rabbitmqadmin

Note

These steps should be performed immediately after the initial RabbitMQ installation when creating Ansible playbooks: turn the script into a Jinja2 template and install it into the $PATH so it can be run directly from the command line (as opposed to being run with a relative path after changing directory into /etc/rabbitmq, as shown here).
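
As a stopgap before such a playbook exists, the script can be installed by hand; a minimal sketch (the destination path is an assumption, not a DIMS convention):

root@rabbitmq:/etc/rabbitmq# install -m 0755 rabbitmqadmin /usr/local/bin/rabbitmqadmin

After this, rabbitmqadmin can be invoked from any directory rather than with the ./rabbitmqadmin relative path used in the examples below.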

The rabbitmqadmin script has a help option that provides information on how to use it.

root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin help subcommands
Usage
=====
  rabbitmqadmin [options] subcommand

  where subcommand is one of:

Display
=======

  list users [<column>...]
  list vhosts [<column>...]
  list connections [<column>...]
  list exchanges [<column>...]
  list bindings [<column>...]
  list permissions [<column>...]
  list channels [<column>...]
  list parameters [<column>...]
  list queues [<column>...]
  list policies [<column>...]
  list nodes [<column>...]
  show overview [<column>...]

Object Manipulation
===================

  declare queue name=... [node=... auto_delete=... durable=... arguments=...]
  declare vhost name=... [tracing=...]
  declare user name=... password=... tags=...
  declare exchange name=... type=... [auto_delete=... internal=... durable=... arguments=...]
  declare policy name=... pattern=... definition=... [priority=... apply-to=...]
  declare parameter component=... name=... value=...
  declare permission vhost=... user=... configure=... write=... read=...
  declare binding source=... destination=... [arguments=... routing_key=... destination_type=...]
  delete queue name=...
  delete vhost name=...
  delete user name=...
  delete exchange name=...
  delete policy name=...
  delete parameter component=... name=...
  delete permission vhost=... user=...
  delete binding source=... destination_type=... destination=... properties_key=...
  close connection name=...
  purge queue name=...

Broker Definitions
==================

  export <file>
  import <file>

Publishing and Consuming
========================

  publish routing_key=... [payload=... payload_encoding=... exchange=...]
  get queue=... [count=... requeue=... payload_file=... encoding=...]

  * If payload is not specified on publish, standard input is used

  * If payload_file is not specified on get, the payload will be shown on
    standard output along with the message metadata

  * If payload_file is specified on get, count must not be set

Here rabbitmqadmin is used to get a list of the currently defined exchanges:

root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin list exchanges
+-------+--------------------+---------+-------------+---------+----------+
| vhost |        name        |  type   | auto_delete | durable | internal |
+-------+--------------------+---------+-------------+---------+----------+
| /     |                    | direct  | False       | True    | False    |
| /     | amq.direct         | direct  | False       | True    | False    |
| /     | amq.fanout         | fanout  | False       | True    | False    |
| /     | amq.headers        | headers | False       | True    | False    |
| /     | amq.match          | headers | False       | True    | False    |
| /     | amq.rabbitmq.log   | topic   | False       | True    | False    |
| /     | amq.rabbitmq.trace | topic   | False       | True    | False    |
| /     | amq.topic          | topic   | False       | True    | False    |
| /     | devops             | fanout  | False       | True    | False    |
| /     | log_task           | direct  | False       | True    | False    |
| /     | logs               | fanout  | False       | False   | False    |
+-------+--------------------+---------+-------------+---------+----------+

We can now define a new fanout exchange where we can direct log messages for later processing using rabbitmqadmin, rather than the web interface:

root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin declare exchange name=health type=fanout auto_delete=false durable=true internal=false
exchange declared
root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin list exchanges
+-------+--------------------+---------+-------------+---------+----------+
| vhost |        name        |  type   | auto_delete | durable | internal |
+-------+--------------------+---------+-------------+---------+----------+
| /     |                    | direct  | False       | True    | False    |
| /     | amq.direct         | direct  | False       | True    | False    |
| /     | amq.fanout         | fanout  | False       | True    | False    |
| /     | amq.headers        | headers | False       | True    | False    |
| /     | amq.match          | headers | False       | True    | False    |
| /     | amq.rabbitmq.log   | topic   | False       | True    | False    |
| /     | amq.rabbitmq.trace | topic   | False       | True    | False    |
| /     | amq.topic          | topic   | False       | True    | False    |
| /     | devops             | fanout  | False       | True    | False    |
| /     | health             | fanout  | False       | True    | False    |
| /     | log_task           | direct  | False       | True    | False    |
| /     | logs               | fanout  | False       | False   | False    |
+-------+--------------------+---------+-------------+---------+----------+
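
To verify that the new exchange behaves as expected, you can bind a temporary queue to it, publish a test message, and read it back, using only subcommands from the help listing above. This is a sketch: the queue name and payload are arbitrary, and output is omitted.

root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin declare queue name=health_test durable=false
root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin declare binding source=health destination=health_test destination_type=queue routing_key=""
root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin publish exchange=health routing_key="" payload="test message"
root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin get queue=health_test requeue=false
root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin delete queue name=health_test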

After creating all of the broker objects we wish to have in the default server (using the web interface and/or rabbitmqadmin), you can export a JSON file that can be put under Ansible control for later import into a newly instantiated RabbitMQ server. (See Loading rabbitmq config at startup.)

Caution

This output contains password hashes (redacted here). Keep this file secure and do not put it in a public source repository without encryption or templating (e.g., with Jinja2).

root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin export broker-objects.json
Exported definitions for localhost to "broker-objects.json"
root@rabbitmq:/etc/rabbitmq# python -m json.tool broker-objects.json
{
    "bindings": [
        {
            "arguments": {},
            "destination": "log_task",
            "destination_type": "queue",
            "routing_key": "log_task",
            "source": "log_task",
            "vhost": "/"
        },
        {
            "arguments": {},
            "destination": "log_test_queue",
            "destination_type": "queue",
            "routing_key": "",
            "source": "test_exchange",
            "vhost": "/"
        },
        {
            "arguments": {},
            "destination": "taskqueue",
            "destination_type": "queue",
            "routing_key": "",
            "source": "test_exchange",
            "vhost": "/"
        },
        {
            "arguments": {},
            "destination": "test_exchange",
            "destination_type": "queue",
            "routing_key": "test_exchange",
            "source": "test_exchange",
            "vhost": "/"
        }
    ],
    "exchanges": [
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "internal": false,
            "name": "test_exchange",
            "type": "direct",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "internal": false,
            "name": "devops",
            "type": "fanout",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "internal": false,
            "name": "test",
            "type": "fanout",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "internal": false,
            "name": "health",
            "type": "fanout",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": false,
            "internal": false,
            "name": "logs",
            "type": "fanout",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "internal": false,
            "name": "log_task",
            "type": "direct",
            "vhost": "/"
        }
    ],
    "permissions": [
        {
            "configure": ".*",
            "read": ".*",
            "user": "rpc_user",
            "vhost": "/",
            "write": ".*"
        },
        {
            "configure": ".*",
            "read": ".*",
            "user": "logmatrix",
            "vhost": "/",
            "write": ".*"
        },
        {
            "configure": ".*",
            "read": ".*",
            "user": "hutchman",
            "vhost": "/",
            "write": ".*"
        }
    ],
    "queues": [
        {
            "arguments": {},
            "auto_delete": false,
            "durable": false,
            "name": "crosscor_test_0.5.5",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "name": "taskqueue",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": false,
            "name": "cifbulk_v1_0.5.5",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "name": "test_exchange",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": false,
            "name": "anon_0.5.5",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "name": "log_task",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": false,
            "name": "cifbulk_v1_test_0.5.5",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": false,
            "name": "crosscor_0.5.5",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "name": "log_queue_test",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": true,
            "name": "log_test_queue",
            "vhost": "/"
        },
        {
            "arguments": {},
            "auto_delete": false,
            "durable": false,
            "name": "anon_test_0.5.5",
            "vhost": "/"
        }
    ],
    "rabbit_version": "2.7.1",
    "users": [
        {
            "name": "hutchman",
            "password_hash": "REDACTED",
            "tags": "administrator"
        },
        {
            "name": "logmatrix",
            "password_hash": "REDACTED",
            "tags": "administrator"
        },
        {
            "name": "rpc_user",
            "password_hash": "REDACTED",
            "tags": ""
        }
    ],
    "vhosts": [
        {
            "name": "/"
        }
    ]
}
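
On a freshly instantiated RabbitMQ server, the same definitions can be loaded back with the import subcommand (with the JSON file normally delivered by Ansible rather than copied by hand):

root@rabbitmq:/etc/rabbitmq# ./rabbitmqadmin import broker-objects.json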

Management with Ansible playbooks

RaspberryPi and Docker

This chapter covers installing and configuring Docker on a Raspberry Pi 2 for prototyping Docker container microservices and supporting DIMS deployment using PXE boot support.

Installing HypriotOS w/Docker

Note

The Raspberry Pi uses a micro SD card to hold the operating system it will boot. To run any operating system, you must first create a bootable micro SD card. You can find many pages with instructions on How to Flash an SD Card for Raspberry Pi. This section uses one such set of instructions for an ARM-based Linux distribution with Docker installed on it.

The folks at Hypriot have instructions for Getting started with Docker on your Raspberry Pi that step through the process of installing one of their pre-configured SD card images onto your Raspberry Pi. Mac users can take advantage of a command-line script to flash the SD card image, available on GitHub in the repo hypriot/flash.

[dimsenv] dittrich@27b:~/git () $ git clone https://github.com/hypriot/flash.git
Cloning into 'flash'...
remote: Counting objects: 100, done.
remote: Total 100 (delta 0), reused 0 (delta 0), pack-reused 100
Receiving objects: 100% (100/100), 25.54 KiB | 0 bytes/s, done.
Resolving deltas: 100% (42/42), done.
Checking connectivity... done.
[dimsenv] dittrich@27b:~/git () $ git checkout -b dims
[dimsenv] dittrich@27b:~/git (dims) $ cd flash
[dimsenv] dittrich@27b:~/git/flash (dims) $ ls
AUTHORS         Darwin          LICENSE         Linux           README.md
[dimsenv] dittrich@27b:~/git/flash (dims) $ tree
.
├── AUTHORS
├── Darwin
│   └── flash
├── LICENSE
├── Linux
│   └── flash
└── README.md

2 directories, 5 files
[dimsenv] dittrich@27b:~/git/flash (dims) $ cd Darwin
[dimsenv] dittrich@27b:~/git/flash/Darwin (dims) $ brew install pv
==> Downloading https://homebrew.bintray.com/bottles/pv-1.6.0.yosemite.bottle.1.tar.gz
brew install awscli/usr/bin/curl -fLA Homebrew 0.9.5 (Ruby 2.0.0-481; OS X 10.10.5) https://homebrew.bintray.com/bottles/pv-1.6.0.yosemite.bottle.1.tar.gz -C 0 -o /Library/Caches/Homebrew/p
v-1.6.0.yosemite.bottle.1.tar.gz.incomplete
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 34692  100 34692    0     0  10668      0  0:00:03  0:00:03 --:--:-- 10671
==> Verifying pv-1.6.0.yosemite.bottle.1.tar.gz checksum
==> Pouring pv-1.6.0.yosemite.bottle.1.tar.gz
tar xf /Library/Caches/Homebrew/pv-1.6.0.yosemite.bottle.1.tar.gz
==> Finishing up
ln -s ../Cellar/pv/1.6.0/bin/pv pv
ln -s ../../../Cellar/pv/1.6.0/share/man/man1/pv.1 pv.1
==> Summary
🍺  /usr/local/Cellar/pv/1.6.0: 4 files, 84K

If you need to enable wireless, create an occidentalis.txt file with the SSID and password for connecting to your wireless access point. PXE boot over ethernet will use the wired interface, but you may want to enable wireless for remote management of the Raspberry Pi.

[dimsenv] dittrich@27b:~/git/flash/Darwin (dims) $ vi occidentalis.txt
# hostname for your Hypriot Raspberry Pi:
hostname=dims-rpi

# basic wireless networking options:
wifi_ssid=REDACTED
wifi_password=REDACTED

Note

The instructions below assume that you have created an occidentalis.txt file. Remove that from the command line if you did not create one.

Insert a micro SD card into one of the memory slots and run the flash script, referencing the most recent version of the hypriot-rpi image file from the SD card images page.

[dimsenv] dittrich@27b:~/git/flash/Darwin (dims*) $ ./flash -c occidentalis.txt http://downloads.hypriot.com/hypriot-rpi-20151004-132414.img.zip

Downloading http://downloads.hypriot.com/hypriot-rpi-20151004-132414.img.zip ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  449M  100  449M    0     0  3025k      0  0:02:32  0:02:32 --:--:--  118k
Uncompressing /tmp/image.img.zip ...
Archive:  /tmp/image.img.zip
  inflating: /tmp/hypriot-rpi-20151004-132414.img
Use /tmp/hypriot-rpi-20151004-132414.img
Filesystem    512-blocks      Used Available Capacity   iused   ifree %iused  Mounted on
/dev/disk1     974749472 905546856  68690616    93% 113257355 8586327   93%   /
devfs                686       686         0   100%      1188       0  100%   /dev
map -hosts             0         0         0   100%         0       0  100%   /net
map auto_home          0         0         0   100%         0       0  100%   /home
/dev/disk2s2    15328216   5154552  10173664    34%    644317 1271708   34%   /Users/dittrich/dims/git
/dev/disk3s1      130780     47284     83496    37%       512       0  100%   /Volumes/NO NAME

Is /dev/disk3s1 correct? y
Unmounting disk3 ...
Unmount of all volumes on disk3 was successful
Unmount of all volumes on disk3 was successful
Flashing /tmp/hypriot-rpi-20151004-132414.img to disk3 ...
Password:
 1.4GiB 0:03:45 [6.34MiB/s] [=====================================================================================================================================================>] 100%

dd: /dev/rdisk3: Invalid argument
0+22889 records in
0+22888 records out
1499987968 bytes transferred in 225.533768 secs (6650835 bytes/sec)
Copying occidentalis.txt to /Volumes/NO NAME/occidentalis.txt ...
Unmounting and ejecting disk3 ...
Unmount of all volumes on disk3 was successful
Unmount of all volumes on disk3 was successful
Disk /dev/disk3 ejected
🍺  Finished.

Insert the SD card into the Raspberry Pi and power it on. It will use DHCP to get an IP address, so these instructions require that you find the system on the network. (In this case, the IP address was identified to be 192.168.0.104.)
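
How you locate it depends on your local network. Since the occidentalis.txt file set the hostname to dims-rpi and HypriotOS runs avahi-daemon, two possibilities (a sketch; adjust the subnet to your environment, and mDNS resolution must work from your workstation) are:

$ ping -c 1 dims-rpi.local
$ nmap -sn 192.168.0.0/24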

Copy your SSH key to the Raspberry Pi for remote SSH access.

[dimsenv] dittrich@27b:~/git/flash/Darwin (dims*) $ ssh-copy-id -i ~/.ssh/dims_dittrich_rsa.pub root@192.168.0.104

/opt/local/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/opt/local/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.0.104's password:

Number of key(s) added:        1

Now try logging into the machine, with:   "ssh 'root@192.168.0.104'"
and check to make sure that only the key(s) you wanted were added.

Since this is the first boot, now is a good time to update the operating system.

[dimsenv] dittrich@27b:~ () $ slogin -i ~/.ssh/dims_dittrich_rsa root@192.168.0.104
Linux dims-rpi 3.18.11-hypriotos-v7+ #2 SMP PREEMPT Sun Apr 12 16:34:20 UTC 2015 armv7l

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Oct 31 06:24:35 2015 from 192.168.0.5
HypriotOS: root@dims-rpi in ~
$ apt-get update
Get:1 http://mirrordirector.raspbian.org wheezy Release.gpg [490 B]
Get:2 http://mirrordirector.raspbian.org wheezy Release [14.4 kB]
...
HypriotOS: root@dims-rpi in ~
$ aptitude safe-upgrade
The following packages will be upgraded:
  bind9-host curl dpkg libbind9-80 libcurl3 libcurl3-gnutls libdns88 libexpat1 libisc84 libisccc80 libisccfg82 liblwres80 libsqlite3-0 libssl1.0.0 openssl sudo tzdata wpasupplicant
18 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 8,700 kB of archives. After unpacking 957 kB will be freed.
Do you want to continue? [Y/n/?] y
Get: 1 http://mirrordirector.raspbian.org/raspbian/ wheezy/main dpkg armhf 1.16.16+rpi1 [2,599 kB]
...
Setting up sudo (1.8.5p2-1+nmu3) ...
Setting up wpasupplicant (1.0-3+deb7u2) ...

Current status: 0 updates [-18].

If you are not in central Europe, you may also want to set the time zone.

HypriotOS: root@dims-rpi in ~
$ dpkg-reconfigure tzdata

Current default time zone: 'US/Pacific-New'
Local time is now:      Fri Oct 30 22:29:49 PDT 2015.
Universal Time is now:  Sat Oct 31 05:29:49 UTC 2015.

Installing a Persistent Docker Container

The Hypriot web page shows how to download and run a Docker container that serves a web page, proving that the Raspberry Pi is online and working. As soon as you reboot the Raspberry Pi, however, the container will stop and you will have to log in and manually re-run it.

The container can be made persistent across reboots using supervisord, which is demonstrated in this section.

Install and Test the Container

Start by running the Docker container as described in Getting started with Docker on your Raspberry Pi, to make sure it can run standalone and that you can connect to it over the network.

HypriotOS: root@dims-rpi in ~
$ docker run -d -p 80:80 hypriot/rpi-busybox-httpd
Unable to find image 'hypriot/rpi-busybox-httpd:latest' locally
latest: Pulling from hypriot/rpi-busybox-httpd
78666be98989: Pull complete
65c121b6f9de: Pull complete
4674ad400a98: Pull complete
d0cb6fa4fa79: Pull complete
Digest: sha256:c00342f952d97628bf5dda457d3b409c37df687c859df82b9424f61264f54cd1
Status: Downloaded newer image for hypriot/rpi-busybox-httpd:latest
e0131b218070ef8a0c82a8bde07b749a4d3e3b4fb7ca15930e3148c1252dee1d
HypriotOS: root@dims-rpi in ~
$ docker ps
CONTAINER ID        IMAGE                              COMMAND                CREATED             STATUS              PORTS                NAMES
e0131b218070        hypriot/rpi-busybox-httpd:latest   "/bin/busybox httpd    7 seconds ago       Up 6 seconds        0.0.0.0:80->80/tcp   admiring_heisenberg

Validate that the port (in this case, tcp6/80) is now bound and actively listening.

HypriotOS: root@dims-rpi in ~
$ netstat -pan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      2105/sshd
tcp        0    184 192.168.0.104:22        192.168.0.5:61271       ESTABLISHED 1518/sshd: root [priv
tcp6       0      0 :::80                   :::*                    LISTEN      11430/docker-proxy
tcp6       0      0 :::22                   :::*                    LISTEN      763/sshd
udp        0      0 0.0.0.0:7712            0.0.0.0:*                           1951/dhclient
udp        0      0 0.0.0.0:68              0.0.0.0:*                           1951/dhclient
udp        0      0 172.17.42.1:123         0.0.0.0:*                           1717/ntpd
udp        0      0 192.168.0.104:123       0.0.0.0:*                           1717/ntpd
udp        0      0 127.0.0.1:123           0.0.0.0:*                           1717/ntpd
udp        0      0 0.0.0.0:123             0.0.0.0:*                           1717/ntpd
udp        0      0 0.0.0.0:5353            0.0.0.0:*                           1822/avahi-daemon:
udp        0      0 0.0.0.0:42246           0.0.0.0:*                           1822/avahi-daemon:
...

If you can connect to the server, you will see Hypriot’s page:

Figure: Hypriot test page
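
If a browser is not handy, the same check can be made from the command line on another host on the subnet. This is only a convenience check; the address below is the one assigned to the Raspberry Pi in the transcripts above.

$ curl -s -o /dev/null -w '%{http_code}\n' http://192.168.0.104/

A printed status of 200 indicates the container is serving the test page.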

Install and Test Supervisor

Now install the supervisor package.

HypriotOS: root@dims-rpi in ~
$ apt-get install supervisor
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  file libmagic1 mime-support python python-medusa python-meld3 python-minimal python-pkg-resources python-support python2.7 python2.7-minimal
Suggested packages:
  python-doc python-tk python-medusa-doc python-distribute python-distribute-doc python2.7-doc binfmt-support
The following NEW packages will be installed:
  file libmagic1 mime-support python python-medusa python-meld3 python-minimal python-pkg-resources python-support python2.7 python2.7-minimal supervisor
0 upgraded, 12 newly installed, 0 to remove and 0 not upgraded.
Need to get 5,273 kB of archives.
After this operation, 19.2 MB of additional disk space will be used.
Do you want to continue [Y/n]? y
Get:1 http://mirrordirector.raspbian.org/raspbian/ wheezy/main libmagic1 armhf 5.11-2+deb7u8 [201 kB]
Get:2 http://mirrordirector.raspbian.org/raspbian/ wheezy/main file armhf 5.11-2+deb7u8 [53.1 kB]
...
Setting up python-meld3 (0.6.5-3.1) ...
Setting up supervisor (3.0a8-1.1+deb7u1) ...
Starting supervisor: supervisord.
Processing triggers for python-support ...

Verify that it is running.

HypriotOS: root@dims-rpi in ~
$ service supervisor status
supervisord is running

We will now configure the persistence mechanism (i.e., a supervisord configuration file) that uses a small wrapper script to actually start the container. Here is what the run script looks like:

HypriotOS: root@dims-rpi in ~
$ cat rpi-busybox-httpd.run
#!/bin/bash

NAME=${1:-rpi-busybox-httpd}

# Remove any stopped container with the specified name.
/usr/bin/docker rm $NAME 2>/dev/null

# Run the container with the specified name.
/usr/bin/docker run \
        -a stdout \
        --rm \
        --name $NAME \
        -p 80:80 \
        hypriot/rpi-busybox-httpd

The run script is then referenced from the supervisord configuration file, which is placed into the conf.d directory along with any other configuration files that supervisord will manage. Because the complexity lives in the run script, the command line in the configuration stays very simple.

HypriotOS: root@dims-rpi in ~
$ cat /etc/supervisor/conf.d/rpi-busybox-httpd.conf
[program:rpi-busybox-httpd]
command=/root/rpi-busybox-httpd.run "%(program_name)s_%(process_num)02d"
autostart=true
autorestart=true
startretries=100
numprocs=1
process_name=%(program_name)s_%(process_num)02d
user=root
env=HOSTNAME="dims-rpi",SHELL="/bin/bash",USER="root",PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",LANG="en_US"

Make sure that supervisord can restart with this configuration file in place, and that port tcp6/80 is still listening.

HypriotOS: root@dims-rpi in ~
$ service supervisor restart
Restarting supervisor: supervisord.
HypriotOS: root@dims-rpi in ~
$ netstat -pan --inet
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      2105/sshd
tcp        0    184 192.168.0.104:22        192.168.0.5:61271       ESTABLISHED 2116/0
udp        0      0 0.0.0.0:7712            0.0.0.0:*                           1951/dhclient
udp        0      0 0.0.0.0:68              0.0.0.0:*                           1951/dhclient
udp        0      0 172.17.42.1:123         0.0.0.0:*                           1717/ntpd
udp        0      0 192.168.0.104:123       0.0.0.0:*                           1717/ntpd
udp        0      0 127.0.0.1:123           0.0.0.0:*                           1717/ntpd
udp        0      0 0.0.0.0:123             0.0.0.0:*                           1717/ntpd
udp        0      0 0.0.0.0:5353            0.0.0.0:*                           1822/avahi-daemon:
udp        0      0 0.0.0.0:42246           0.0.0.0:*                           1822/avahi-daemon:
HypriotOS: root@dims-rpi in ~
$ docker ps
CONTAINER ID        IMAGE                              COMMAND                CREATED             STATUS              PORTS                NAMES
53d51a7f1c17        hypriot/rpi-busybox-httpd:latest   "/bin/busybox httpd    12 seconds ago      Up 11 seconds       0.0.0.0:80->80/tcp   rpi-busybox-httpd_00

Test the server remotely by loading the URL http://192.168.0.104 from a browser on the same subnet to confirm that the Hypriot test page seen in Figure Hypriot test page is still being served.

Now, reboot the Raspberry Pi to make sure that supervisord starts the container at boot time.

HypriotOS: root@dims-rpi in ~
$ /sbin/shutdown -r now

Broadcast message from root@dims-rpi (pts/0) (Sat Oct 31 18:06:08 2015):
The system is going down for reboot NOW!
HypriotOS: root@dims-rpi in ~
$ Connection to 192.168.0.104 closed by remote host.
Connection to 192.168.0.104 closed.

Log in remotely again and validate that the container is running.

[dimsenv] dittrich@27b:~/git/homepage (develop*) $ !slo
slogin -i ~/.ssh/dims_dittrich_rsa root@192.168.0.104
Linux dims-rpi 3.18.11-hypriotos-v7+ #2 SMP PREEMPT Sun Apr 12 16:34:20 UTC 2015 armv7l

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Oct 31 16:33:23 2015 from 192.168.0.5
HypriotOS: root@dims-rpi in ~
$ date
Sat Oct 31 18:07:25 PDT 2015
HypriotOS: root@dims-rpi in ~
$ docker ps
CONTAINER ID        IMAGE                              COMMAND                CREATED              STATUS              PORTS                NAMES
3a8b96428ab4        hypriot/rpi-busybox-httpd:latest   "/bin/busybox httpd    About a minute ago   Up About a minute   0.0.0.0:80->80/tcp   rpi-busybox-httpd_00

Lastly, load the URL http://192.168.0.104 one more time to confirm that the Hypriot test page seen in Figure Hypriot test page is still being served after the reboot.

You can also validate supervisord activity by checking its log files, which are placed by default in /var/log/supervisor:

HypriotOS: root@dims-rpi in ~
$ cd /var/log/supervisor
HypriotOS: root@dims-rpi in /var/log/supervisor
$ ls -l
total 12
-rw------- 1 root root    0 Nov  1 00:16 rpi-busybox-httpd_00-stderr---supervisor-d5okeu.log
-rw------- 1 root root   21 Nov  1 00:16 rpi-busybox-httpd_00-stdout---supervisor-dos6Dz.log
-rw-r--r-- 1 root root 7495 Nov  1 00:16 supervisord.log
HypriotOS: root@dims-rpi in /var/log/supervisor
$ cat rpi-busybox-httpd_00-stdout---supervisor-dos6Dz.log
rpi-busybox-httpd_00
HypriotOS: pi@dims-rpi in /var/log/supervisor
$ cat supervisord.log
2015-10-30 22:32:54,750 CRIT Supervisor running as root (no user in config file)
2015-10-30 22:32:54,947 INFO RPC interface 'supervisor' initialized
2015-10-30 22:32:54,947 WARN cElementTree not installed, using slower XML parser for XML-RPC
2015-10-30 22:32:54,948 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2015-10-30 22:32:54,951 INFO daemonizing the supervisord process
2015-10-30 22:32:54,954 INFO supervisord started with pid 4744
2015-10-31 02:17:12,001 CRIT Supervisor running as root (no user in config file)
2015-10-31 02:17:12,282 INFO RPC interface 'supervisor' initialized
2015-10-31 02:17:12,282 WARN cElementTree not installed, using slower XML parser for XML-RPC
2015-10-31 02:17:12,283 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2015-10-31 02:17:12,286 INFO daemonizing the supervisord process
2015-10-31 02:17:12,289 INFO supervisord started with pid 1873
2015-10-31 18:03:22,227 WARN received SIGTERM indicating exit request
2015-10-31 18:03:27,621 CRIT Supervisor running as root (no user in config file)
2015-10-31 18:03:27,621 WARN Included extra file "/etc/supervisor/conf.d/rpi-busybox-httpd.conf" during parsing
2015-10-31 18:03:27,815 INFO RPC interface 'supervisor' initialized
2015-10-31 18:03:27,816 WARN cElementTree not installed, using slower XML parser for XML-RPC
2015-10-31 18:03:27,816 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2015-10-31 18:03:27,819 INFO daemonizing the supervisord process
2015-10-31 18:03:27,822 INFO supervisord started with pid 2501
2015-10-31 18:03:28,829 INFO spawned: 'rpi-busybox-httpd_00' with pid 2505
2015-10-31 18:03:29,832 INFO success: rpi-busybox-httpd_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-10-31 18:06:09,939 WARN received SIGTERM indicating exit request
2015-10-31 18:06:09,943 INFO waiting for rpi-busybox-httpd_00 to die
2015-10-31 18:06:10,275 INFO stopped: rpi-busybox-httpd_00 (terminated by SIGTERM)
2015-10-31 18:06:10,277 WARN received SIGTERM indicating exit request
2015-10-31 18:06:18,801 CRIT Supervisor running as root (no user in config file)
2015-10-31 18:06:18,803 WARN Included extra file "/etc/supervisor/conf.d/rpi-busybox-httpd.conf" during parsing
2015-10-31 18:06:19,149 INFO RPC interface 'supervisor' initialized
2015-10-31 18:06:19,149 WARN cElementTree not installed, using slower XML parser for XML-RPC
2015-10-31 18:06:19,150 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2015-10-31 18:06:19,154 INFO daemonizing the supervisord process
2015-10-31 18:06:19,157 INFO supervisord started with pid 1894
2015-10-31 18:06:20,169 INFO spawned: 'rpi-busybox-httpd_00' with pid 2079
2015-10-31 18:06:21,537 INFO success: rpi-busybox-httpd_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

Caution

The above httpd container uses Busybox (presumably its ash shell), and appears to ignore any signals it is sent. A more robust container that traps signals and exits cleanly (e.g., one based on nginx) should be used instead.
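
A replacement run script would follow the same pattern as the one above. The sketch below is illustrative only; the image name hypriot/rpi-nginx is an assumption, not a tested recommendation, so substitute whatever ARM-compatible nginx image you choose.

#!/bin/bash
#
# Illustrative sketch only: wrapper to run an nginx-based container
# under supervisord. The image name below is an assumption; substitute
# an ARM-compatible nginx image of your choosing.

NAME=${1:-rpi-nginx}

# Remove any stopped container with the specified name.
/usr/bin/docker rm $NAME 2>/dev/null

# Replace the shell with the docker client so that signals sent by
# supervisord reach it directly.
exec /usr/bin/docker run \
        -a stdout \
        --rm \
        --name $NAME \
        -p 80:80 \
        hypriot/rpi-nginx

Using exec on the final command means supervisord's SIGTERM goes straight to the docker client rather than to an intermediate shell, which makes stop and restart behavior more predictable.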

Extending to other Services

Extending supervisord control to other services is as simple as repeating the steps in Section Installing a Persistent Docker Container with other run scripts and supervisord configuration files, as sketched below.
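
As a concrete (though hypothetical) illustration, a new service called myservice would get its own wrapper script and configuration file. The name, port, and image below are placeholders, not actual DIMS components.

/root/myservice.run:

#!/bin/bash
# Hypothetical wrapper script for a container named "myservice".
NAME=${1:-myservice}

# Remove any stopped container with the specified name.
/usr/bin/docker rm $NAME 2>/dev/null

# Run the container in the foreground so supervisord can manage it.
exec /usr/bin/docker run \
        -a stdout \
        --rm \
        --name $NAME \
        -p 8080:8080 \
        example/myservice

/etc/supervisor/conf.d/myservice.conf:

[program:myservice]
command=/root/myservice.run "%(program_name)s_%(process_num)02d"
autostart=true
autorestart=true
startretries=100
numprocs=1
process_name=%(program_name)s_%(process_num)02d
user=root

Once both files are in place, restarting supervisord (service supervisor restart, as shown earlier) causes it to pick up the new program section.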




Docker Datacenter

This chapter documents email exchanges between DIMS team members and Docker engineers about setting up and evaluating Docker Datacenter.

Initial Inquiry

This section includes the PDF showing the basics of Docker Datacenter.

Figure: Basics of Docker Datacenter PDF.

This PDF was sent along with the response to our initial inquiry to Docker about evaluating Docker Datacenter on 3/2/16.

Figure: Image 1 of email.

Figure: Image 2 of email.

Jeremy also set up a call with other Docker engineers on 3/2/16.

Figure: Email re: call with Docker engineers.

Docker Trusted Registry Issues

This section documents issues Megan was having when trying to set up a Docker Trusted Registry as part of a local Docker Datacenter.

Figure: DTR issues.

Further Information

As more is learned about Docker Datacenter, particularly administration-related details, it will be documented here.

Managing Long-running Services

This chapter covers the process of keeping a service program alive across system reboots, using supervisord or Upstart. Regardless of which of these mechanisms is used, the concept is similar:

  • A program that provides a network service is supposed to be started when the system starts, and stopped when the system is brought down. This should be done cleanly, so that any required state is maintained across reboots.

  • If the program exits for any reason, that exit status should be checked and acted on so that the service remains available whenever it is supposed to be available. In practice, this means that when the service program exits with an unexpected return code, it is restarted (a configuration sketch follows below).

    Note

    If the program is supposed to be turned off, and it exits with an expected “normal” exit code, it is left off until it is explicitly started again.
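
In supervisord terms, this exit-code behavior corresponds to the autorestart and exitcodes settings in a program section. The stanza below is only a sketch; the program name and command are placeholders, not part of any DIMS configuration.

[program:myservice]
command=/usr/local/bin/myservice
; Start the program at supervisord startup (i.e., at boot).
autostart=true
; Restart only when the program exits with a code not listed in exitcodes.
autorestart=unexpected
exitcodes=0
; Give up after a few failed start attempts.
startretries=3

The Upstart analogue is the respawn stanza in a job configuration file, optionally bounded with respawn limit.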

The supervisord program is much simpler than Upstart, but in some cases it is sufficient to get the job done with a minimum of effort, and it is much easier to debug. Upstart, on the other hand, is complex and feature-rich, supporting more sophisticated capabilities (e.g., monitoring multiple hierarchical, dependent services to control starting and stopping service daemons in complex inter-dependent situations). This flexibility comes at the cost of much greater difficulty in designing, developing, and, most importantly, debugging these services, and it requires significantly more system administration and programming experience. The section on Upstart includes some techniques for debugging services.

Note

Section RaspberryPi and Docker covers this topic in the specific context of a prototype Docker containerized service using HypriotOS on a Raspberry Pi. This section covers the same material in the context of the primary operating system used by the DIMS project, Ubuntu.

Services using supervisord

Services using Upstart

By default, Upstart does not log very much. To see the logging level currently set, do:

$ sudo initctl log-priority
message

To increase the logging level, do:

$ sudo initctl log-priority info
$ sudo initctl log-priority
info

Now you can follow the system logs using sudo tail -f /var/log/syslog and watch events. In this case, we want to see all of the init events associated with restarting the OpenVPN tunnel (which is the pathway used by the Consul agents for communicating).

To identify which events are associated with the action we are about to cause, use the logger program to insert a marker immediately before the restart is triggered. Then wait until the service appears to be completely restarted before inserting another marker and copying the log output.

Attention

Because services are stopped and started asynchronously in the background, the only marker that is easy to set accurately is the one immediately before the restart is triggered. If another && were added to insert a marker immediately after the sudo service openvpn restart command returned, the shell would run that logger command while the restart actions were still going on in the background, placing the marker in the middle of them.

Be careful to keep this asynchrony in mind and to separate the act of the shell returning from the unrelated act of the service actually being restarted, or you will not get the results you expect.

Additionally, on a busy system there may also be other events that show up in the log file between the logger command and the initiation of the restart action (interspersed with the logs that are important for our purposes). You will need to carefully delete the log entries that are not relevant in order to minimize the “noise” of all the state transition messages from init.
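
One way to carve out just the relevant window after the fact is to print only the lines between the two markers. The example below assumes the same tag and marker messages used in the transcript that follows.

$ sed -n '/DITTRICH: Restarting OpenVPN/,/DITTRICH: Done/p' /var/log/syslog

Lines outside the two logger markers are dropped automatically, though anything unrelated that was logged inside the window still has to be removed by hand.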

$ logger -t DITTRICH -p local0.info "Restarting OpenVPN" && sudo service openvpn restart
* Stopping virtual private network daemon(s)...
*   Stopping VPN '01_prsm_dimsdemo1'
  ...done.
*   Stopping VPN '02_uwapl_dimsdemo1'
  ...done.
* Starting virtual private network daemon(s)...
*   Autostarting VPN '01_prsm_dimsdemo1'
*   Autostarting VPN '02_uwapl_dimsdemo1'
$ logger -t DITTRICH -p local0.info "Done"
Jun  4 20:07:16 dimsdemo1.node.consul DITTRICH: Restarting OpenVPN
Jun  4 20:07:16 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[14113]: event_wait : Interrupted system call (code=4)
Jun  4 20:07:16 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[14113]: /sbin/ip route del 10.142.29.0/24
Jun  4 20:07:16 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[14113]: ERROR: Linux route delete command failed: external program exited with error status: 2
Jun  4 20:07:16 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[14113]: Closing TUN/TAP interface
Jun  4 20:07:16 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[14113]: /sbin/ip addr del dev tun0 10.86.86.4/24
Jun  4 20:07:16 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[14113]: Linux ip addr del failed: external program exited with error status: 2
Jun  4 20:07:16 dimsdemo1.node.consul NetworkManager[1055]:    SCPlugin-Ifupdown: devices removed (path: /sys/devices/virtual/net/tun0, iface: tun0)
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.461020] init: Handling queues-device-removed event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.461202] init: Handling queues-device-removed event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.461321] init: Handling net-device-removed event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.461372] init: network-interface (tun0) goal changed from start to stop
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.461400] init: network-interface (tun0) state changed from running to stopping
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.461449] init: Handling stopping event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.461482] init: network-interface (tun0) state changed from stopping to killed
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.461517] init: network-interface (tun0) state changed from killed to post-stop
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.462204] init: network-interface (tun0) post-stop process (26911)
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.463454] init: network-interface (tun0) post-stop process (26911) exited normally
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.463512] init: network-interface (tun0) state changed from post-stop to waiting
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.463686] init: Handling stopped event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.463772] init: startpar-bridge (network-interface-tun0-stopped) goal changed from stop to start
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.463807] init: startpar-bridge (network-interface-tun0-stopped) state changed from waiting to starting
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.463929] init: network-interface-security (network-interface/tun0) goal changed from start to stop
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.463956] init: network-interface-security (network-interface/tun0) state changed from running to stopping
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464026] init: Handling starting event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464080] init: startpar-bridge (network-interface-tun0-stopped) state changed from starting to security
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464113] init: startpar-bridge (network-interface-tun0-stopped) state changed from security to pre-start
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464146] init: startpar-bridge (network-interface-tun0-stopped) state changed from pre-start to spawned
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464639] init: startpar-bridge (network-interface-tun0-stopped) main process (26914)
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464660] init: startpar-bridge (network-interface-tun0-stopped) state changed from spawned to post-start
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464705] init: startpar-bridge (network-interface-tun0-stopped) state changed from post-start to running
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464784] init: Handling stopping event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464903] init: network-interface-security (network-interface/tun0) state changed from stopping to killed
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464936] init: network-interface-security (network-interface/tun0) state changed from killed to post-stop
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.464967] init: network-interface-security (network-interface/tun0) state changed from post-stop to waiting
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.465100] init: Handling started event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.465180] init: Handling stopped event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.465236] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) goal changed from stop to start
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.465267] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from waiting to starting
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.465339] init: Handling starting event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.465379] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from starting to security
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.465410] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from security to pre-start
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.465438] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from pre-start to spawned
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466165] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) main process (26915)
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466190] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from spawned to post-start
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466244] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from post-start to running
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466331] init: Handling started event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466610] init: startpar-bridge (network-interface-tun0-stopped) main process (26914) exited normally
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466667] init: startpar-bridge (network-interface-tun0-stopped) goal changed from start to stop
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466729] init: startpar-bridge (network-interface-tun0-stopped) state changed from running to stopping
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466796] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) main process (26915) exited normally
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466848] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) goal changed from start to stop
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466883] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from running to stopping
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466921] init: Handling stopping event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466959] init: startpar-bridge (network-interface-tun0-stopped) state changed from stopping to killed
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.466990] init: startpar-bridge (network-interface-tun0-stopped) state changed from killed to post-stop
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.467020] init: startpar-bridge (network-interface-tun0-stopped) state changed from post-stop to waiting
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.467134] init: Handling stopping event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.467169] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from stopping to killed
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.467199] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from killed to post-stop
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.467248] init: startpar-bridge (network-interface-security-network-interface/tun0-stopped) state changed from post-stop to waiting
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.467398] init: Handling stopped event
Jun  4 20:07:16 dimsdemo1.node.consul kernel: [58061.467490] init: Handling stopped event
Jun  4 20:07:16 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[14113]: SIGTERM[hard,] received, process exiting
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: event_wait : Interrupted system call (code=4)
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: /sbin/ip route del 38.111.193.0/24
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: ERROR: Linux route delete command failed: external program exited with error status: 2
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: /sbin/ip route del 199.168.91.0/24
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: ERROR: Linux route delete command failed: external program exited with error status: 2
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: /sbin/ip route del 192.168.88.0/24
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: ERROR: Linux route delete command failed: external program exited with error status: 2
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: Closing TUN/TAP interface
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: /sbin/ip addr del dev tun88 10.88.88.5/24
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: Linux ip addr del failed: external program exited with error status: 2
Jun  4 20:07:17 dimsdemo1.node.consul NetworkManager[1055]:    SCPlugin-Ifupdown: devices removed (path: /sys/devices/virtual/net/tun88, iface: tun88)
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.504410] init: Handling queues-device-removed event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.504612] init: Handling queues-device-removed event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.504723] init: Handling net-device-removed event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.504763] init: network-interface (tun88) goal changed from start to stop
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.504799] init: network-interface (tun88) state changed from running to stopping
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.504844] init: Handling stopping event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.504877] init: network-interface (tun88) state changed from stopping to killed
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.504907] init: network-interface (tun88) state changed from killed to post-stop
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.505652] init: network-interface (tun88) post-stop process (26927)
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.506919] init: network-interface (tun88) post-stop process (26927) exited normally
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.506976] init: network-interface (tun88) state changed from post-stop to waiting
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.507159] init: Handling stopped event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.507234] init: startpar-bridge (network-interface-tun88-stopped) goal changed from stop to start
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.507263] init: startpar-bridge (network-interface-tun88-stopped) state changed from waiting to starting
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.507431] init: network-interface-security (network-interface/tun88) goal changed from start to stop
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.507470] init: network-interface-security (network-interface/tun88) state changed from running to stopping
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.507511] init: Handling starting event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.507554] init: startpar-bridge (network-interface-tun88-stopped) state changed from starting to security
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.507575] init: startpar-bridge (network-interface-tun88-stopped) state changed from security to pre-start
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.507594] init: startpar-bridge (network-interface-tun88-stopped) state changed from pre-start to spawned
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508094] init: startpar-bridge (network-interface-tun88-stopped) main process (26930)
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508133] init: startpar-bridge (network-interface-tun88-stopped) state changed from spawned to post-start
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508181] init: startpar-bridge (network-interface-tun88-stopped) state changed from post-start to running
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508275] init: Handling stopping event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508410] init: network-interface-security (network-interface/tun88) state changed from stopping to killed
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508441] init: network-interface-security (network-interface/tun88) state changed from killed to post-stop
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508473] init: network-interface-security (network-interface/tun88) state changed from post-stop to waiting
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508609] init: Handling started event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508713] init: Handling stopped event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508803] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) goal changed from stop to start
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508863] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from waiting to starting
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.508967] init: Handling starting event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.509008] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from starting to security
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.509060] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from security to pre-start
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.509109] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from pre-start to spawned
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.509733] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) main process (26931)
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.509753] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from spawned to post-start
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.509804] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from post-start to running
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.509897] init: Handling started event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510246] init: startpar-bridge (network-interface-tun88-stopped) main process (26930) exited normally
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510303] init: startpar-bridge (network-interface-tun88-stopped) goal changed from start to stop
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510366] init: startpar-bridge (network-interface-tun88-stopped) state changed from running to stopping
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510433] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) main process (26931) exited normally
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510501] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) goal changed from start to stop
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510535] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from running to stopping
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510573] init: Handling stopping event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510610] init: startpar-bridge (network-interface-tun88-stopped) state changed from stopping to killed
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510642] init: startpar-bridge (network-interface-tun88-stopped) state changed from killed to post-stop
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510672] init: startpar-bridge (network-interface-tun88-stopped) state changed from post-stop to waiting
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510785] init: Handling stopping event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510819] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from stopping to killed
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510849] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from killed to post-stop
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.510879] init: startpar-bridge (network-interface-security-network-interface/tun88-stopped) state changed from post-stop to waiting
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.511028] init: Handling stopped event
Jun  4 20:07:17 dimsdemo1.node.consul kernel: [58061.511120] init: Handling stopped event
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[14127]: SIGTERM[hard,] received, process exiting
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26949]: OpenVPN 2.3.2 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [eurephia] [MH] [IPv6] built on Dec  1 2014
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26949]: Control Channel Authentication: tls-auth using INLINE static key file
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26949]: Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26949]: Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26949]: Socket Buffers: R=[212992->131072] S=[212992->131072]
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: NOTE: UID/GID downgrade will be delayed because of --client, --pull, or --up-delay
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: UDPv4 link local: [undef]
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: UDPv4 link remote: [AF_INET]140.142.29.115:500
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26963]: OpenVPN 2.3.2 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [eurephia] [MH] [IPv6] built on Dec  1 2014
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26963]: Control Channel Authentication: tls-auth using INLINE static key file
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26963]: Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26963]: Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26963]: Socket Buffers: R=[212992->131072] S=[212992->131072]
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: NOTE: UID/GID downgrade will be delayed because of --client, --pull, or --up-delay
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: UDPv4 link local: [undef]
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: UDPv4 link remote: [AF_INET]140.142.29.118:8989
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: TLS: Initial packet from [AF_INET]140.142.29.118:8989, sid=adf2b40a afa33d74
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: TLS: Initial packet from [AF_INET]140.142.29.115:500, sid=3cf9074f 2e93fa51
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: Data Channel Encrypt: Cipher 'AES-128-CBC' initialized with 128 bit key
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: Data Channel Encrypt: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: Data Channel Decrypt: Cipher 'AES-128-CBC' initialized with 128 bit key
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: Data Channel Decrypt: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 2048 bit RSA
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: [eclipse-prisem] Peer Connection Initiated with [AF_INET]140.142.29.115:500
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: Data Channel Encrypt: Cipher 'AES-128-CBC' initialized with 128 bit key
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: Data Channel Encrypt: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: Data Channel Decrypt: Cipher 'AES-128-CBC' initialized with 128 bit key
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: Data Channel Decrypt: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 2048 bit RSA
Jun  4 20:07:17 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: [server] Peer Connection Initiated with [AF_INET]140.142.29.118:8989
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: SENT CONTROL [eclipse-prisem]: 'PUSH_REQUEST' (status=1)
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: PUSH: Received control message: ...
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: OPTIONS IMPORT: timers and/or timeouts modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: OPTIONS IMPORT: LZO parms modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: OPTIONS IMPORT: --ifconfig/up options modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: OPTIONS IMPORT: route options modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: OPTIONS IMPORT: route-related options modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: OPTIONS IMPORT: --ip-win32 and/or --dhcp-option options modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: ROUTE_GATEWAY 192.168.0.1/255.255.255.0 IFACE=wlan0 HWADDR=d0:53:49:d7:9e:bd
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: TUN/TAP device tun0 opened
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: TUN/TAP TX queue length set to 100
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: do_ifconfig, tt->ipv6=0, tt->did_ifconfig_ipv6_setup=0
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: /sbin/ip link set dev tun0 up mtu 1500
Jun  4 20:07:19 dimsdemo1.node.consul NetworkManager[1055]:    SCPlugin-Ifupdown: devices added (path: /sys/devices/virtual/net/tun0, iface: tun0)
Jun  4 20:07:19 dimsdemo1.node.consul NetworkManager[1055]:    SCPlugin-Ifupdown: device added (path: /sys/devices/virtual/net/tun0, iface: tun0): no ifupdown configuration found.
Jun  4 20:07:19 dimsdemo1.node.consul NetworkManager[1055]: <warn> /sys/devices/virtual/net/tun0: couldn't determine device driver; ignoring...
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: /sbin/ip addr add dev tun0 10.86.86.4/24 broadcast 10.86.86.255
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.897552] init: Handling net-device-added event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.897768] init: network-interface (tun0) goal changed from stop to start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.897831] init: network-interface (tun0) state changed from waiting to starting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.897933] init: Handling starting event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.898119] init: network-interface-security (network-interface/tun0) goal changed from stop to start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.898175] init: network-interface-security (network-interface/tun0) state changed from waiting to starting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.898246] init: Handling starting event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.898319] init: network-interface-security (network-interface/tun0) state changed from starting to security
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.898373] init: network-interface-security (network-interface/tun0) state changed from security to pre-start
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: /sbin/ip route add 10.142.29.0/24 via 10.86.86.1
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.899415] init: network-interface-security (network-interface/tun0) pre-start process (27032)
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.899754] init: Handling queues-device-added event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.900062] init: Handling queues-device-added event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.900301] init: network-interface-security (network-interface/tun0) pre-start process (27032) exited normally
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.900403] init: network-interface-security (network-interface/tun0) state changed from pre-start to spawned
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.900465] init: network-interface-security (network-interface/tun0) state changed from spawned to post-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.900527] init: network-interface-security (network-interface/tun0) state changed from post-start to running
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.900591] init: network-interface (tun0) state changed from starting to security
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.900641] init: network-interface (tun0) state changed from security to pre-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.901534] init: network-interface (tun0) pre-start process (27033)
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.901884] init: Handling started event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.902189] init: startpar-bridge (network-interface-security-network-interface/tun0-started) goal changed from stop to start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.902361] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from waiting to starting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.902728] init: Handling starting event
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: GID set to nogroup
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: UID set to nobody
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-01_prsm_dimsdemo1[26950]: Initialization Sequence Completed
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.902874] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from starting to security
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.903036] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from security to pre-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.903191] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from pre-start to spawned
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.904568] init: startpar-bridge (network-interface-security-network-interface/tun0-started) main process (27035)
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.904606] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from spawned to post-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.904693] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from post-start to running
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.904841] init: Handling started event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.905285] init: startpar-bridge (network-interface-security-network-interface/tun0-started) main process (27035) exited normally
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.905430] init: startpar-bridge (network-interface-security-network-interface/tun0-started) goal changed from start to stop
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.905509] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from running to stopping
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.905583] init: Handling stopping event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.905688] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from stopping to killed
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.905752] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from killed to post-stop
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.905809] init: startpar-bridge (network-interface-security-network-interface/tun0-started) state changed from post-stop to waiting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.906042] init: Handling stopped event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907410] init: network-interface (tun0) pre-start process (27033) exited normally
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907464] init: network-interface (tun0) state changed from pre-start to spawned
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907497] init: network-interface (tun0) state changed from spawned to post-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907531] init: network-interface (tun0) state changed from post-start to running
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907616] init: Handling started event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907693] init: startpar-bridge (network-interface-tun0-started) goal changed from stop to start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907727] init: startpar-bridge (network-interface-tun0-started) state changed from waiting to starting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907816] init: Handling starting event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907870] init: startpar-bridge (network-interface-tun0-started) state changed from starting to security
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907897] init: startpar-bridge (network-interface-tun0-started) state changed from security to pre-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.907927] init: startpar-bridge (network-interface-tun0-started) state changed from pre-start to spawned
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.908460] init: startpar-bridge (network-interface-tun0-started) main process (27039)
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.908481] init: startpar-bridge (network-interface-tun0-started) state changed from spawned to post-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.908526] init: startpar-bridge (network-interface-tun0-started) state changed from post-start to running
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.908606] init: Handling started event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.908945] init: startpar-bridge (network-interface-tun0-started) main process (27039) exited normally
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.909008] init: startpar-bridge (network-interface-tun0-started) goal changed from start to stop
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.909044] init: startpar-bridge (network-interface-tun0-started) state changed from running to stopping
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.909082] init: Handling stopping event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.909120] init: startpar-bridge (network-interface-tun0-started) state changed from stopping to killed
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.909151] init: startpar-bridge (network-interface-tun0-started) state changed from killed to post-stop
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.909183] init: startpar-bridge (network-interface-tun0-started) state changed from post-stop to waiting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58063.909293] init: Handling stopped event
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: PUSH: Received control message: ...
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: OPTIONS IMPORT: timers and/or timeouts modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: OPTIONS IMPORT: --ifconfig/up options modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: OPTIONS IMPORT: route options modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: OPTIONS IMPORT: route-related options modified
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: ROUTE_GATEWAY 192.168.0.1/255.255.255.0 IFACE=wlan0 HWADDR=d0:53:49:d7:9e:bd
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: TUN/TAP device tun88 opened
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: TUN/TAP TX queue length set to 100
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: do_ifconfig, tt->ipv6=0, tt->did_ifconfig_ipv6_setup=0
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: /sbin/ip link set dev tun88 up mtu 1500
Jun  4 20:07:19 dimsdemo1.node.consul NetworkManager[1055]:    SCPlugin-Ifupdown: devices added (path: /sys/devices/virtual/net/tun88, iface: tun88)
Jun  4 20:07:19 dimsdemo1.node.consul NetworkManager[1055]:    SCPlugin-Ifupdown: device added (path: /sys/devices/virtual/net/tun88, iface: tun88): no ifupdown configuration found.
Jun  4 20:07:19 dimsdemo1.node.consul NetworkManager[1055]: <warn> /sys/devices/virtual/net/tun88: couldn't determine device driver; ignoring...
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: /sbin/ip addr add dev tun88 10.88.88.2/24 broadcast 10.88.88.255
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: /sbin/ip route add 192.168.88.0/24 via 10.88.88.1
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.341486] init: Handling net-device-added event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.341622] init: network-interface (tun88) goal changed from stop to start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.341655] init: network-interface (tun88) state changed from waiting to starting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.341714] init: Handling starting event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.341838] init: network-interface-security (network-interface/tun88) goal changed from stop to start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.341869] init: network-interface-security (network-interface/tun88) state changed from waiting to starting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.341905] init: Handling starting event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.341945] init: network-interface-security (network-interface/tun88) state changed from starting to security
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.341976] init: network-interface-security (network-interface/tun88) state changed from security to pre-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.342560] init: network-interface-security (network-interface/tun88) pre-start process (27060)
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.342787] init: Handling queues-device-added event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.342956] init: Handling queues-device-added event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.343091] init: network-interface-security (network-interface/tun88) pre-start process (27060) exited normally
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.343149] init: network-interface-security (network-interface/tun88) state changed from pre-start to spawned
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.343187] init: network-interface-security (network-interface/tun88) state changed from spawned to post-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.343217] init: network-interface-security (network-interface/tun88) state changed from post-start to running
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.343275] init: network-interface (tun88) state changed from starting to security
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.343310] init: network-interface (tun88) state changed from security to pre-start
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: /sbin/ip route add 199.168.91.0/24 via 10.88.88.1
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: /sbin/ip route add 38.111.193.0/24 via 10.88.88.1
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: GID set to nogroup
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: UID set to nobody
Jun  4 20:07:19 dimsdemo1.node.consul ovpn-02_uwapl_dimsdemo1[26964]: Initialization Sequence Completed
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.343904] init: network-interface (tun88) pre-start process (27062)
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344021] init: Handling started event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344112] init: startpar-bridge (network-interface-security-network-interface/tun88-started) goal changed from stop to start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344155] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from waiting to starting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344310] init: Handling starting event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344352] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from starting to security
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344387] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from security to pre-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344418] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from pre-start to spawned
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344889] init: startpar-bridge (network-interface-security-network-interface/tun88-started) main process (27064)
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344908] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from spawned to post-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.344956] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from post-start to running
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.345036] init: Handling started event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.345420] init: startpar-bridge (network-interface-security-network-interface/tun88-started) main process (27064) exited normally
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.345490] init: startpar-bridge (network-interface-security-network-interface/tun88-started) goal changed from start to stop
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.345534] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from running to stopping
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.345573] init: Handling stopping event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.345641] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from stopping to killed
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.345680] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from killed to post-stop
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.345709] init: startpar-bridge (network-interface-security-network-interface/tun88-started) state changed from post-stop to waiting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.345834] init: Handling stopped event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347178] init: network-interface (tun88) pre-start process (27062) exited normally
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347251] init: network-interface (tun88) state changed from pre-start to spawned
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347299] init: network-interface (tun88) state changed from spawned to post-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347333] init: network-interface (tun88) state changed from post-start to running
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347414] init: Handling started event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347488] init: startpar-bridge (network-interface-tun88-started) goal changed from stop to start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347525] init: startpar-bridge (network-interface-tun88-started) state changed from waiting to starting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347619] init: Handling starting event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347660] init: startpar-bridge (network-interface-tun88-started) state changed from starting to security
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347691] init: startpar-bridge (network-interface-tun88-started) state changed from security to pre-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.347719] init: startpar-bridge (network-interface-tun88-started) state changed from pre-start to spawned
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348254] init: startpar-bridge (network-interface-tun88-started) main process (27069)
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348277] init: startpar-bridge (network-interface-tun88-started) state changed from spawned to post-start
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348328] init: startpar-bridge (network-interface-tun88-started) state changed from post-start to running
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348422] init: Handling started event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348731] init: startpar-bridge (network-interface-tun88-started) main process (27069) exited normally
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348796] init: startpar-bridge (network-interface-tun88-started) goal changed from start to stop
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348841] init: startpar-bridge (network-interface-tun88-started) state changed from running to stopping
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348874] init: Handling stopping event
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348913] init: startpar-bridge (network-interface-tun88-started) state changed from stopping to killed
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348934] init: startpar-bridge (network-interface-tun88-started) state changed from killed to post-stop
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.348953] init: startpar-bridge (network-interface-tun88-started) state changed from post-stop to waiting
Jun  4 20:07:19 dimsdemo1.node.consul kernel: [58064.349059] init: Handling stopped event
Jun  4 20:07:36 dimsdemo1.node.consul DITTRICH: Done

Diagnosing System Problems and Outages

Using dimscli

This chapter covers using dimscli as a distributed shell for diagnosing problems throughout a DIMS deployment.

Ansible has two primary CLI programs, ansible and ansible-playbook. Both are passed the set of hosts on which they are to operate by way of an Inventory.

Note

Read about Ansible and how it is used by the DIMS project in Section ansibleplaybooks:ansiblefundamentals of ansibleplaybooks:ansibleplaybooks.
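
For reference, the equivalent ad hoc invocation with the plain ansible program looks roughly like the following (a sketch only, using the inventory file shown below):

# Run an ad hoc module against every host in an inventory file
ansible all -i complete_inventory -m command -a "uptime"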

[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ cat complete_inventory
[all]
floyd2-p.prisem.washington.edu
foswiki-int.prisem.washington.edu
git.prisem.washington.edu
hub.prisem.washington.edu
jenkins-int.prisem.washington.edu
jira-int.prisem.washington.edu
lapp-int.prisem.washington.edu
lapp.prisem.washington.edu
linda-vm1.prisem.washington.edu
rabbitmq.prisem.washington.edu
sso.prisem.washington.edu
time.prisem.washington.edu
u12-dev-svr-1.prisem.washington.edu
u12-dev-ws-1.prisem.washington.edu
wellington.prisem.washington.edu

Using this inventory, the command and shell modules can be used to run diagnostic commands on all of these hosts at once.

[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible command --program "uptime" --inventory complete_inventory --remote-port 8422 --remote-user dittrich
+-------------------------------------+--------+-------------------------------------------------------------------------+
| Host                                | Status | Results                                                                 |
+-------------------------------------+--------+-------------------------------------------------------------------------+
| rabbitmq.prisem.washington.edu      | GOOD   |  22:07:53 up 33 days,  4:32,  1 user,  load average: 0.07, 0.13, 0.09   |
| wellington.prisem.washington.edu    | GOOD   |  22:07:57 up 159 days, 12:16,  1 user,  load average: 1.16, 0.86, 0.58  |
| linda-vm1.prisem.washington.edu     | GOOD   |  22:07:54 up 159 days, 12:03,  1 user,  load average: 0.00, 0.01, 0.05  |
| git.prisem.washington.edu           | GOOD   |  22:07:54 up 159 days, 12:03,  2 users,  load average: 0.00, 0.01, 0.05 |
| time.prisem.washington.edu          | GOOD   |  22:07:55 up 33 days,  4:33,  2 users,  load average: 0.01, 0.07, 0.12  |
| jenkins-int.prisem.washington.edu   | GOOD   |  22:07:55 up 159 days, 12:03,  1 user,  load average: 0.00, 0.01, 0.05  |
| u12-dev-ws-1.prisem.washington.edu  | GOOD   |  22:07:56 up 159 days, 12:03,  1 user,  load average: 0.00, 0.02, 0.05  |
| sso.prisem.washington.edu           | GOOD   |  22:07:56 up 159 days, 12:03,  1 user,  load average: 0.00, 0.01, 0.05  |
| lapp-int.prisem.washington.edu      | GOOD   |  22:07:54 up 159 days, 12:04,  2 users,  load average: 0.00, 0.01, 0.05 |
| foswiki-int.prisem.washington.edu   | GOOD   |  22:07:55 up 159 days, 12:04,  1 user,  load average: 0.00, 0.01, 0.05  |
| u12-dev-svr-1.prisem.washington.edu | GOOD   |  22:07:59 up 155 days, 14:56,  1 user,  load average: 0.05, 0.08, 0.06  |
| hub.prisem.washington.edu           | GOOD   |  06:07:53 up 141 days, 12:19,  1 user,  load average: 0.08, 0.03, 0.05  |
| floyd2-p.prisem.washington.edu      | GOOD   |  22:07:53 up 33 days,  4:32,  1 user,  load average: 0.00, 0.01, 0.05   |
| jira-int.prisem.washington.edu      | GOOD   |  22:07:54 up 159 days, 12:03,  2 users,  load average: 0.00, 0.01, 0.05 |
| lapp.prisem.washington.edu          | GOOD   |  22:07:54 up 159 days, 12:04,  2 users,  load average: 0.00, 0.01, 0.05 |
+-------------------------------------+--------+-------------------------------------------------------------------------+
 To: dims-devops@uw.ops-trust.net
 From: Jenkins <dims@eclipse.prisem.washington.edu>
 Subject: [dims devops] [Jenkins] [FAILURE] jenkins-update-cifbulk-server-develop-16
 Date: Thu Jan 14 20:35:21 PST 2016
 Message-ID: <20160115043521.C7D5E1C004F@jenkins>

 Started by an SCM change
 [EnvInject] - Loading node environment variables.
 Building in workspace /var/lib/jenkins/jobs/update-cifbulk-server-develop/workspace

 Deleting project workspace... done

 [ssh-agent] Using credentials ansible (Ansible user ssh key - root)
 [ssh-agent] Looking for ssh-agent implementation...
 [ssh-agent]   Java/JNR ssh-agent
 [ssh-agent] Started.

  ...

 TASK: [cifbulk-server | Make config change available and restart if updating existing] ***
 <rabbitmq.prisem.washington.edu> REMOTE_MODULE command . /opt/dims/envs/dimsenv/bin/activate && supervisorctl -c /etc/supervisord.conf reread #USE_SHELL
 failed: [rabbitmq.prisem.washington.edu] => (item=reread) => {"changed": true, "cmd": ". /opt/dims/envs/dimsenv/bin/activate && supervisorctl -c /etc/supervisord.conf reread", "delta": "0:00:00.229614", "end": "2016-01-14 20:34:49.409784", "item": "reread", "rc": 2, "start": "2016-01-14 20:34:49.180170"}
 stderr: Error: could not find config file /etc/supervisord.conf
 For help, use /usr/bin/supervisorctl -h
 <rabbitmq.prisem.washington.edu> REMOTE_MODULE command . /opt/dims/envs/dimsenv/bin/activate && supervisorctl -c /etc/supervisord.conf update #USE_SHELL
 failed: [rabbitmq.prisem.washington.edu] => (item=update) => {"changed": true, "cmd": ". /opt/dims/envs/dimsenv/bin/activate && supervisorctl -c /etc/supervisord.conf update", "delta": "0:00:00.235882", "end": "2016-01-14 20:34:50.097224", "item": "update", "rc": 2, "start": "2016-01-14 20:34:49.861342"}
 stderr: Error: could not find config file /etc/supervisord.conf
 For help, use /usr/bin/supervisorctl -h

 FATAL: all hosts have already failed -- aborting

 PLAY RECAP ********************************************************************
            to retry, use: --limit @/var/lib/jenkins/cifbulk-server-configure.retry

 rabbitmq.prisem.washington.edu : ok=11   changed=4    unreachable=0    failed=1

 Build step 'Execute shell' marked build as failure
 [ssh-agent] Stopped.
 Warning: you have no plugins providing access control for builds, so falling back to legacy behavior of permitting any downstream builds to be triggered
 Finished: FAILURE
 --
 [[ UW/DIMS ]]: All message content remains the property of the author
 and must not be forwarded or redistributed without explicit permission.
[dimsenv] dittrich@dimsdemo1:~/dims/git/ansible-playbooks (develop*) $ grep -r supervisord.conf
roles/supervisor-install/tasks/main.yml:  template: "src=supervisord.conf.j2 dest={{ dims_supervisord_conf }} owner=root group=root"
roles/supervisor-install/tasks/main.yml:  file: path=/etc/dims-supervisord.conf state=absent
roles/supervisor-install/templates/supervisor.j2:DAEMON_OPTS="-c {{ dims_supervisord_conf }} $DAEMON_OPTS"
roles/cifbulk-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} {{ item }}"
roles/cifbulk-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} start {{ name_base }}:"
roles/prisem-scripts-deploy/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} restart {{ item }}:"
roles/anon-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} {{ item }}"
roles/anon-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} start {{ name_base }}:"
roles/consul-install/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} remove {{ consul_basename }}"
roles/consul-install/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} {{ item }}"
roles/consul-install/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} start {{ consul_basename }}:"
roles/crosscor-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} {{ item }}"
roles/crosscor-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} start {{ name_base }}:"
group_vars/all:dims_supervisord_conf: '/etc/supervisord.conf'
[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible shell --program "find /etc -name supervisord.conf" --inventory complete_inventory --remote-port 8422 --remote-u
ser dittrich
+-------------------------------------+--------+----------------------------------+
| Host                                | Status | Results                          |
+-------------------------------------+--------+----------------------------------+
| rabbitmq.prisem.washington.edu      | GOOD   | /etc/supervisor/supervisord.conf |
| wellington.prisem.washington.edu    | GOOD   |                                  |
| hub.prisem.washington.edu           | GOOD   |                                  |
| git.prisem.washington.edu           | GOOD   | /etc/supervisor/supervisord.conf |
| u12-dev-ws-1.prisem.washington.edu  | GOOD   |                                  |
| sso.prisem.washington.edu           | GOOD   |                                  |
| jenkins-int.prisem.washington.edu   | GOOD   | /etc/supervisor/supervisord.conf |
| foswiki-int.prisem.washington.edu   | GOOD   |                                  |
| lapp-int.prisem.washington.edu      | GOOD   |                                  |
| u12-dev-svr-1.prisem.washington.edu | GOOD   | /etc/supervisor/supervisord.conf |
| linda-vm1.prisem.washington.edu     | GOOD   |                                  |
| lapp.prisem.washington.edu          | GOOD   |                                  |
| floyd2-p.prisem.washington.edu      | GOOD   |                                  |
| jira-int.prisem.washington.edu      | GOOD   | /etc/supervisor/supervisord.conf |
| time.prisem.washington.edu          | GOOD   |                                  |
+-------------------------------------+--------+----------------------------------+
[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible shell --program "find /etc -name '*supervisor'*" --inventory complete_inventory --remote-port 8422 --remote-use
r dittrich
+-------------------------------------+--------+-------------------------------------------------+
| Host                                | Status | Results                                         |
+-------------------------------------+--------+-------------------------------------------------+
| rabbitmq.prisem.washington.edu      | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/default/supervisor                         |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf.20140214204135 |
|                                     |        | /etc/supervisor/supervisord.conf.20140214200547 |
|                                     |        | /etc/supervisor/supervisord.conf.20140616162335 |
|                                     |        | /etc/supervisor/supervisord.conf.20140814132409 |
|                                     |        | /etc/supervisor/supervisord.conf.20140616162451 |
|                                     |        | /etc/supervisor/supervisord.conf.20140616162248 |
|                                     |        | /etc/supervisor/supervisord.conf.20140131230939 |
|                                     |        | /etc/supervisor/supervisord.conf.20140222154901 |
|                                     |        | /etc/supervisor/supervisord.conf.20140214194415 |
|                                     |        | /etc/supervisor/supervisord.conf.20140222155042 |
|                                     |        | /etc/supervisor/supervisord.conf.20150208174308 |
|                                     |        | /etc/supervisor/supervisord.conf.20140814132717 |
|                                     |        | /etc/supervisor/supervisord.conf.20140215134451 |
|                                     |        | /etc/supervisor/supervisord.conf.20150208174742 |
|                                     |        | /etc/supervisor/supervisord.conf.20140911193305 |
|                                     |        | /etc/supervisor/supervisord.conf.20140219200951 |
|                                     |        | /etc/supervisor/supervisord.conf.20140911202633 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/supervisor/supervisord.conf.20140222154751 |
|                                     |        | /etc/supervisor/supervisord.conf.20150208174403 |
|                                     |        | /etc/supervisor/supervisord.conf.20140814132351 |
|                                     |        | /etc/supervisor/supervisord.conf.20140814132759 |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| wellington.prisem.washington.edu    | GOOD   |                                                 |
| linda-vm1.prisem.washington.edu     | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/dims-supervisord.conf                      |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| git.prisem.washington.edu           | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/default/supervisor                         |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| time.prisem.washington.edu          | GOOD   |                                                 |
| jenkins-int.prisem.washington.edu   | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/default/supervisor                         |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| u12-dev-ws-1.prisem.washington.edu  | GOOD   |                                                 |
| sso.prisem.washington.edu           | GOOD   |                                                 |
| lapp-int.prisem.washington.edu      | GOOD   |                                                 |
| foswiki-int.prisem.washington.edu   | GOOD   |                                                 |
| u12-dev-svr-1.prisem.washington.edu | GOOD   | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/rc0.d/K20supervisor                        |
| hub.prisem.washington.edu           | GOOD   |                                                 |
| floyd2-p.prisem.washington.edu      | GOOD   |                                                 |
| jira-int.prisem.washington.edu      | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/default/supervisor                         |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| lapp.prisem.washington.edu          | GOOD   |                                                 |
+-------------------------------------+--------+-------------------------------------------------+

While the concept of putting a list of host names into a file with a label is simple to understand, it is not very flexible or scalable. Ansible supports a concept called a Dynamic Inventory. Rather than passing a hosts file using -i or --inventory, you can pass a Python script that produces a special JSON object.
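
Ansible accepts any executable here, not just Python. Purely as an illustration, a minimal dynamic inventory could be sketched as a shell script that emits the expected JSON (the group and host names below are examples only):

#!/bin/bash
# Minimal dynamic inventory sketch: Ansible invokes it with --list, and with
# --host <name> for per-host variables (answered here with an empty object).
if [ "$1" = "--list" ]; then
  cat <<'EOF'
{
  "dims": {
    "hosts": ["rabbitmq.prisem.washington.edu", "time.prisem.washington.edu"]
  },
  "_meta": { "hostvars": {} }
}
EOF
else
  echo '{}'
fi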

A less widely known feature is that you can also trigger creation of a dynamic inventory within ansible or ansible-playbook by passing a list to the -i or --inventory option. Rather than creating a temporary file with [all] at the top, followed by a list of three host names, and then passing that file with -i or --inventory, just pass a comma-separated list instead:

[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible shell --program "find /etc -name supervisord.conf" --inventory rabbitmq.prisem.washington.edu,time.prisem.washi
ngton.edu,u12-dev-svr-1.prisem.washington.edu --remote-port 8422 --remote-user dittrich
+-------------------------------------+--------+----------------------------------+
| Host                                | Status | Results                          |
+-------------------------------------+--------+----------------------------------+
| rabbitmq.prisem.washington.edu      | GOOD   | /etc/supervisor/supervisord.conf |
| time.prisem.washington.edu          | GOOD   |                                  |
| u12-dev-svr-1.prisem.washington.edu | GOOD   | /etc/supervisor/supervisord.conf |
+-------------------------------------+--------+----------------------------------+

There is a subtle trick for passing just a single host, and that is to pass the name with a trailing comma (,), as seen here:

[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible shell --program "find /etc -name supervisord.conf" --inventory rabbitmq.prisem.washington.edu, --remote-port 84
22 --remote-user dittrich
+--------------------------------+--------+----------------------------------+
| Host                           | Status | Results                          |
+--------------------------------+--------+----------------------------------+
| rabbitmq.prisem.washington.edu | GOOD   | /etc/supervisor/supervisord.conf |
+--------------------------------+--------+----------------------------------+
...

Debugging Vagrant

Vagrant has a mechanism for enabling debugging output that shows what it is doing: set the environment variable VAGRANT_LOG=debug before running vagrant.

$ vagrant halt
$ VAGRANT_LOG=debug vagrant up --no-provision > /tmp/debug.log.1 2>&1

The debugging log looks like the following:

 INFO global: Vagrant version: 1.8.6
 INFO global: Ruby version: 2.2.5
 INFO global: RubyGems version: 2.4.5.1
 INFO global: VAGRANT_LOG="debug"
 INFO global: VAGRANT_OLD_ENV_TMPDIR="/tmp"
 INFO global: VAGRANT_OLD_ENV_COMMAND=""
 INFO global: VAGRANT_OLD_ENV_LANG="en_US.UTF-8"
 INFO global: VAGRANT_OLD_ENV_UNDEFINED="__undefined__"
 INFO global: VAGRANT_OLD_ENV_TERM="screen-256color"
 INFO global: VAGRANT_OLD_ENV_VAGRANT_LOG="debug"

 . . .

 INFO global: VAGRANT_INTERNAL_BUNDLERIZED="1"
 INFO global: Plugins:
 INFO global:   - bundler = 1.12.5
 INFO global:   - unf_ext = 0.0.7.2
 INFO global:   - unf = 0.1.4
 INFO global:   - domain_name = 0.5.20161129
 INFO global:   - http-cookie = 1.0.3
 INFO global:   - i18n = 0.7.0
 INFO global:   - log4r = 1.1.10
 INFO global:   - micromachine = 2.0.0
 INFO global:   - mime-types-data = 3.2016.0521
 INFO global:   - mime-types = 3.1
 INFO global:   - net-ssh = 3.0.2
 INFO global:   - net-scp = 1.1.2
 INFO global:   - netrc = 0.11.0
 INFO global:   - rest-client = 2.0.0
 INFO global:   - vagrant-scp = 0.5.7
 INFO global:   - vagrant-share = 1.1.6
 INFO global:   - vagrant-triggers = 0.5.3
 INFO global:   - vagrant-vbguest = 0.13.0

 . . .

 INFO vagrant: `vagrant` invoked: ["up"]
DEBUG vagrant: Creating Vagrant environment
 INFO environment: Environment initialized (#<Vagrant::Environment:0x00000002618e68>)
 INFO environment:   - cwd: /vm/run/blue14
 INFO environment: Home path: /home/ansible/.vagrant.d
DEBUG environment: Effective local data path: /vm/run/blue14/.vagrant
 INFO environment: Local data path: /vm/run/blue14/.vagrant
DEBUG environment: Creating: /vm/run/blue14/.vagrant
 INFO environment: Running hook: environment_plugins_loaded
 INFO runner: Preparing hooks for middleware sequence...
 INFO runner: 3 hooks defined.
 INFO runner: Running action: environment_plugins_loaded #<Vagrant::Action::Builder:0x000000025278b0>

 . .

DEBUG meta: Finding driver for VirtualBox version: 5.1.10
 INFO meta: Using VirtualBox driver: VagrantPlugins::ProviderVirtualBox::Driver::Version_5_1
 INFO base: VBoxManage path: VBoxManage
 INFO subprocess: Starting process: ["/usr/bin/VBoxManage", "showvminfo", "d1f7ffcb-3fab-4878-a77d-5fdb8d2f7fae"]
 INFO subprocess: Command not in installer, restoring original environment...
DEBUG subprocess: Selecting on IO
DEBUG subprocess: stdout: Name:            blue14_default_1482088614789_39851
Groups:          /
Guest OS:        Ubuntu (64-bit)
UUID:            d1f7ffcb-3fab-4878-a77d-5fdb8d2f7fae
Config file:     /home/ansible/VirtualBox VMs/blue14_default_1482088614789_39851/blue14_default_1482088614789_39851.vbox
Snapshot folder: /home/ansible/VirtualBox VMs/blue14_default_1482088614789_39851/Snapshots
Log folder:      /home/ansible/VirtualBox VMs/blue14_default_1482088614789_39851/Logs
Hardware UUID:   d1f7ffcb-3fab-4878-a77d-5fdb8d2f7fae
Memory size:     3072MB
Page Fusion:     off
VRAM size:       32MB
CPU exec cap:    100%

 . . .

Effective Paravirt. Provider: KVM
State:           powered off (since 2016-10-30T20:11:22.000000000)
Monitor count:   1
3D Acceleration: off
2D Video Acceleration: off
Teleporter Enabled: off

 . . .

For this debugging scenario, we are trying to add the ability to toggle whether Vagrant brings up the VirtualBox VM with or without a GUI (i.e., “headless” or not). The line we are concerned with is the following, which shows the startvm command used to run the VirtualBox VM:

INFO subprocess: Starting process: ["/usr/bin/VBoxManage", "startvm", "89e0e942-3b3b-4f0a-b0e4-6d0bb51fef04", "--type", "headless"]

The default for Vagrant is to start VMs in headless mode. To instead boot with a GUI, the Vagrantfile should contain a provider block with the following setting:

config.vm.provider "virtualbox" do |v|
  v.gui = true
end

Note

It is important to note that the Vagrantfile is Ruby code, and that the above sets a Ruby boolean to the value true, which is not necessarily the same as the string "true".

Rather than requiring that the user edit the Vagrantfile, it would be more convenient to support passing an environment variable into the child process.

Using the following code snippets, we can inherit an environment variable (which is a string) and turn it into a boolean using a string comparison in a ternary expression.

# Set GUI to boolean true if environment variable GUI == 'true', otherwise false
GUI = ENV['GUI'].nil? ? false : (ENV['GUI'] == 'true')

. . .

  # Conditionally control whether startvm uses "--type gui"
  # or "--type headless" using GUI (set earlier)
  config.vm.provider "virtualbox" do |v|
    v.gui = GUI
  end

. . .

Now we can test the setting of the environment variable on a vagrant command line, again with debug logging enabled and redirected into a second log file.

$ vagrant halt
==> default: Attempting graceful shutdown of VM...
$ vagrant destroy --force
==> default: Destroying VM and associated drives...
$ GUI=true VAGRANT_LOG=debug vagrant up --no-provision > /tmp/debug.log.2 2>&1

Now, searching for the specific string in both log files, we can compare the results and see that we have the desired effect:

$ grep 'Starting process.*startvm' /tmp/debug.log.{1,2}
/tmp/debug.log.1: INFO subprocess: Starting process: ["/usr/bin/VBoxManage", "startvm", "89e0e942-3b3b-4f0a-b0e4-6d0bb51fef04", "--type", "headless"]
/tmp/debug.log.2: INFO subprocess: Starting process: ["/usr/bin/VBoxManage", "startvm", "3921e4e9-fdb4-4191-90b3-f7415ec0b37d", "--type", "gui"]

Other Tools for Diagnosing System Problems

smartmontools

Hardware makes up the physical layer of the DIMS system. Developers are currently using Dell Precision M4800 laptops to develop the software layers of DIMS.

These laptops have had multiple issues, including failing to sleep properly and heating up to extreme temperatures, especially when not sitting on solid, well-ventilated surfaces, and these problems have led to hard drive malfunctions. At least one laptop has completely stopped being able to boot. Multiple other laptops have struggled during the boot process and have shown other problems that may indicate a near-term hard drive failure.

In an effort to make this black box less opaque, and to spot indicators of an impending failure before it happens, we are now employing a tool called smartmontools. This package comes with two tools, smartctl and smartd, which control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology (SMART) built in to many modern hard drives, including the ones in the developer laptops. When used as a daemon, it can give advance warning of disk degradation and failure. (For more information, see smartmontools home.)

The package will be added to the list of base packages installed on all DIMS systems, and the rest of this section is devoted to a brief introduction to using the tool.
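
In the meantime, the package can be installed manually; a minimal sketch for Ubuntu/Debian systems:

# Install smartmontools (provides both smartctl and smartd)
sudo apt-get update
sudo apt-get install -y smartmontools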

Note

These instructions were taken from ubuntu smartmontools docs. If the procedure differs on other Linux flavors (particularly Debian Jessie), new instructions will be added.

You will be using the smartctl utility to manually monitor your drives. First, you need to double check that your hard drive is SMART-enabled.

[dimsenv] mboggess@dimsdev2:it/dims-adminguide/docs/source (develop*) $ sudo smartctl -i /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-42-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Laptop SSHD
Device Model:     ST1000LM014-1EJ164
Serial Number:    W771CY1P
LU WWN Device Id: 5 000c50 089fc94f9
Firmware Version: DEMB
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Oct 14 11:08:25 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

This output gives you information about the hard drive, including whether SMART is supported and enabled.

In the event that somehow SMART is available but not enabled, run

sudo smartctl -s on /dev/sda

There are several different types of tests you can run via smartctl. A full list is documented in the help/usage output, which you can obtain by running

[dimsenv] mboggess@dimsdev2:it/dims-adminguide/docs/source (develop*) $ smartctl -h
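
For example, the short self-test (the quickest of the standard tests) can be started as follows, using the same device path as in the examples above:

# Start the short SMART self-test on the first disk
sudo smartctl -t short /dev/sda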

To find an estimate of the time it will take to complete the various tests, run

[dimsenv] mboggess@dimsdev2:it/dims-adminguide/docs/source (develop*) $ sudo smartctl -c /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-42-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  139) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 191) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x10b5) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

As you can see, the long test is rather long: 191 minutes!

To start the long test, run

[dimsenv] mboggess@dimsdev2:it/dims-adminguide/docs/source (develop*) $ sudo smartctl -t long /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-42-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 191 minutes for test to complete.
Test will complete after Fri Oct 14 15:00:32 2016

Use smartctl -X to abort test.

To abort the test:

[dimsenv] mboggess@dimsdev2:it/dims-adminguide/docs/source (develop*) $ sudo smartctl -X /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-42-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!

To get test results for a SATA drive, run

[dimsenv] mboggess@dimsdev2:it/dims-adminguide/docs/source (develop*) $ sudo smartctl -a -d ata /dev/sda

To get test results for an IDE drive, run

[dimsenv] mboggess@dimsdev2:it/dims-adminguide/docs/source (develop*) $ sudo smartctl -a /dev/sda

Additionally, you can run smartmontools as a daemon, but for now, that will be left for an admin to research and develop on their own. In the future, this has potential to be turned into an Ansible role. Documentation from Ubuntu on how to use smartmontools as a daemon can be found in the daemon subsection of the Ubuntu smartmontools documentation.
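
As a starting point only, the following sketch shows what enabling the daemon might look like on Ubuntu; the file locations and the schedule directive follow the Ubuntu packaging and the smartd.conf man page, and should be verified against your release before use:

# Enable smartd at boot (Debian/Ubuntu default file; verify on your release)
sudo sed -i 's/^#\?start_smartd=.*/start_smartd=yes/' /etc/default/smartmontools

# Monitor /dev/sda: short self-test daily at 02:00, long test Saturdays at
# 03:00, mail warnings to root (see man smartd.conf for the -s syntax)
echo '/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root' | \
  sudo tee -a /etc/smartd.conf

sudo service smartmontools restart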

Managing CoreOS with Systemd and Other Tools

This chapter covers using systemctl and other debugging commands and services for diagnosing problems on a CoreOS system.

CoreOS uses systemd as both a system and service manager and as an init system. The tool systemctl has many commands which allow a user to look at and control the state of systemd.
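
A few of the most commonly used queries are shown below; the unit name is only an example:

systemctl list-units --type=service   # service units systemd currently has loaded
systemctl status docker.service       # state and recent log output of one unit
systemctl list-unit-files             # every unit file installed on the system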

This is by no means an exhaustive list or description of the capabilities of the tools described here, merely an overview of the tools and their most useful commands. See the links provided within this chapter for more information. For more debugging information relevant to DIMS, see dimsdockerfiles:debuggingcoreos.

State of systemd

There are a few ways to check on the state of systemd as a whole.

  1. Check all running units and their state on a node at once.

    core@core-01 ~ $ systemctl
    UNIT                      LOAD   ACTIVE     SUB          DESCRIPTIO
    boot.automount            loaded active     waiting      Boot parti
    sys-devices-pci0000:00-0000:00:01.1-ata1-host0-target0:0:0-0:0:0:0-
    sys-devices-pci0000:00-0000:00:01.1-ata1-host0-target0:0:0-0:0:0:0-
    sys-devices-pci0000:00-0000:00:01.1-ata1-host0-target0:0:0-0:0:0:0-
    sys-devices-pci0000:00-0000:00:01.1-ata1-host0-target0:0:0-0:0:0:0-
    sys-devices-pci0000:00-0000:00:01.1-ata1-host0-target0:0:0-0:0:0:0-
    sys-devices-pci0000:00-0000:00:01.1-ata1-host0-target0:0:0-0:0:0:0-
    sys-devices-pci0000:00-0000:00:01.1-ata1-host0-target0:0:0-0:0:0:0-
    sys-devices-pci0000:00-0000:00:01.1-ata1-host0-target0:0:0-0:0:0:0-
    sys-devices-pci0000:00-0000:00:03.0-virtio0-net-eth0.device loaded
    sys-devices-pci0000:00-0000:00:08.0-virtio1-net-eth1.device loaded
    sys-devices-platform-serial8250-tty-ttyS0.device loaded active
    sys-devices-platform-serial8250-tty-ttyS1.device loaded active
    sys-devices-platform-serial8250-tty-ttyS2.device loaded active
    sys-devices-platform-serial8250-tty-ttyS3.device loaded active
    sys-devices-virtual-net-docker0.device loaded active     plugged
    sys-devices-virtual-net-vethcbb3671.device loaded active     plugge
    sys-devices-virtual-tty-ttyprintk.device loaded active     plugged
    sys-subsystem-net-devices-docker0.device loaded active     plugged
    sys-subsystem-net-devices-eth0.device loaded active     plugged
    sys-subsystem-net-devices-eth1.device loaded active     plugged
    sys-subsystem-net-devices-vethcbb3671.device loaded active     plug
    -.mount                   loaded active     mounted      /
    boot.mount                loaded active     mounted      Boot parti
    dev-hugepages.mount       loaded active     mounted      Huge Pages
    dev-mqueue.mount          loaded active     mounted      POSIX Mess
    media.mount               loaded active     mounted      External M
    sys-kernel-debug.mount    loaded active     mounted      Debug File
    tmp.mount                 loaded active     mounted      Temporary
    usr-share-oem.mount       loaded active     mounted      /usr/share
    usr.mount                 loaded active     mounted      /usr
    coreos-cloudinit-vagrant-user.path loaded active     running      c
    motdgen.path              loaded active     waiting      Watch for
    systemd-ask-password-console.path loaded active     waiting      Di
    systemd-ask-password-wall.path loaded active     waiting      Forwa
    user-cloudinit@var-lib-coreos\x2dinstall-user_data.path loaded acti
    user-configdrive.path     loaded active     waiting      Watch for
    docker-201c7bd05ea49b654aa8b02a92dbb739a06dd3e8a4cc7813dcdc15aa4282
    docker-5f41c7d23012a856462d3a7876d7165715164d2b2c6edf3f94449c21d594
    docker-8323ab8192308e5a65102dffb109466c6a7c7f43ff28f356ea154a668b5f
    app-overlay.service       loaded activating auto-restart App overla
    audit-rules.service       loaded active     exited       Load Secur
    consul.service            loaded active     running      Consul boo
    coreos-setup-environment.service loaded active     exited       Mod
    data-overlay.service      loaded activating auto-restart Data overl
    dbus.service              loaded active     running      D-Bus Syst
    docker.service            loaded active     running      Docker App
    etcd2.service             loaded active     running      etcd2
    fleet.service             loaded active     running      fleet daem
    getty@tty1.service        loaded active     running      Getty on t
    kmod-static-nodes.service loaded active     exited       Create lis
    locksmithd.service        loaded active     running      Cluster re
    settimezone.service       loaded active     exited       Set the ti
    sshd-keygen.service       loaded active     exited       Generate s
    sshd@2-10.0.2.15:22-10.0.2.2:33932.service loaded active     runnin
    swarm-agent.service       loaded active     running      Swarm agen
    swarm-manager.service     loaded active     running      Swarm mana
    system-cloudinit@usr-share-oem-cloud\x2dconfig.yml.service loaded a
    system-cloudinit@var-tmp-hostname.yml.service loaded active     exi
    system-cloudinit@var-tmp-networks.yml.service loaded active     exi
    systemd-journal-flush.service loaded active     exited       Flush
    systemd-journald.service  loaded active     running      Journal Se
    systemd-logind.service    loaded active     running      Login Serv
    systemd-networkd.service  loaded active     running      Network Se
    systemd-random-seed.service loaded active     exited       Load/Sav
    systemd-resolved.service  loaded active     running      Network Na
    systemd-sysctl.service    loaded active     exited       Apply Kern
    systemd-timesyncd.service loaded active     running      Network Ti
    systemd-tmpfiles-setup-dev.service loaded active     exited       C
    ...skipping...
    systemd-udev-trigger.service loaded active     exited       udev Co
    systemd-udevd.service     loaded active     running      udev Kerne
    systemd-update-utmp.service loaded active     exited       Update U
    systemd-vconsole-setup.service loaded active     exited       Setup
    update-engine.service     loaded active     running      Update Eng
    user-cloudinit@var-lib-coreos\x2dvagrant-vagrantfile\x2duser\x2ddat
    -.slice                   loaded active     active       Root Slice
    system-addon\x2dconfig.slice loaded active     active       system-
    system-addon\x2drun.slice loaded active     active       system-add
    system-getty.slice        loaded active     active       system-get
    system-sshd.slice         loaded active     active       system-ssh
    system-system\x2dcloudinit.slice loaded active     active       sys
    system-user\x2dcloudinit.slice loaded active     active       syste
    system.slice              loaded active     active       System Sli
    user.slice                loaded active     active       User and S
    dbus.socket               loaded active     running      D-Bus Syst
    docker-tcp.socket         loaded active     running      Docker Soc
    docker.socket             loaded active     running      Docker Soc
    fleet.socket              loaded active     running      Fleet API
    rkt-metadata.socket       loaded active     listening    rkt metada
    sshd.socket               loaded active     listening    OpenSSH Se
    systemd-initctl.socket    loaded active     listening    /dev/initc
    systemd-journald-audit.socket loaded active     running      Journa
    systemd-journald-dev-log.socket loaded active     running      Jour
    systemd-journald.socket   loaded active     running      Journal So
    systemd-networkd.socket   loaded active     running      networkd r
    systemd-udevd-control.socket loaded active     running      udev Co
    systemd-udevd-kernel.socket loaded active     running      udev Ker
    basic.target              loaded active     active       Basic Syst
    cryptsetup.target         loaded active     active       Encrypted
    getty.target              loaded active     active       Login Prom
    local-fs-pre.target       loaded active     active       Local File
    local-fs.target           loaded active     active       Local File
    multi-user.target         loaded active     active       Multi-User
    network.target            loaded active     active       Network
    paths.target              loaded active     active       Paths
    remote-fs.target          loaded active     active       Remote Fil
    slices.target             loaded active     active       Slices
    sockets.target            loaded active     active       Sockets
    swap.target               loaded active     active       Swap
    sysinit.target            loaded active     active       System Ini
    system-config.target      loaded active     active       Load syste
    time-sync.target          loaded active     active       System Tim
    timers.target             loaded active     active       Timers
    user-config.target        loaded active     active       Load user-
    logrotate.timer           loaded active     waiting      Daily Log
    rkt-gc.timer              loaded active     waiting      Periodic G
    systemd-tmpfiles-clean.timer loaded active     waiting      Daily C
    
    LOAD   = Reflects whether the unit definition was properly loaded.
    ACTIVE = The high-level unit activation state, i.e. generalization
    SUB    = The low-level unit activation state, values depend on unit
    
    119 loaded units listed. Pass --all to see loaded but inactive unit
    To show all installed unit files use 'systemctl list-unit-files'.
    

    This shows all loaded units and their state, as well as a brief description of the units.

  2. For a slightly more organized look at the state of a node, along with a list of failed units, queued jobs, and a process tree based on CGroup:

    [dimsenv] mboggess@dimsdev2:~/core-local () $ vagrant ssh core-03
    VM name: core-03 - IP: 172.17.8.103
    Last login: Tue Jan 26 15:49:34 2016 from 10.0.2.2
    CoreOS beta (877.1.0)
    core@core-03 ~ $ systemctl status
    ● core-03
        State: starting
         Jobs: 4 queued
       Failed: 0 units
        Since: Wed 2016-01-27 12:40:52 EST; 1min 0s ago
       CGroup: /
               ├─1 /usr/lib/systemd/systemd --switched-root --system --
               └─system.slice
                 ├─dbus.service
                 │ └─509 /usr/bin/dbus-daemon --system --address=system
                 ├─update-engine.service
                 │ └─502 /usr/sbin/update_engine -foreground -logtostde
                 ├─system-sshd.slice
                 │ └─sshd@2-10.0.2.15:22-10.0.2.2:58499.service
                 │   ├─869 sshd: core [priv]
                 │   ├─871 sshd: core@pts/0
                 │   ├─872 -bash
                 │   ├─878 systemctl status
                 │   └─879 systemctl status
                 ├─systemd-journald.service
                 │ └─387 /usr/lib/systemd/systemd-journald
                 ├─systemd-resolved.service
                 │ └─543 /usr/lib/systemd/systemd-resolved
                 ├─systemd-timesyncd.service
                 │ └─476 /usr/lib/systemd/systemd-timesyncd
                 ├─systemd-logind.service
                 │ └─505 /usr/lib/systemd/systemd-logind
                 ├─systemd-networkd.service
                 │ └─837 /usr/lib/systemd/systemd-networkd
                 ├─system-getty.slice
                 │ └─getty@tty1.service
                 │   └─507 /sbin/agetty --noclear tty1 linux
                 ├─system-user\x2dcloudinit.slice
                 │ └─user-cloudinit@var-lib-coreos\x2dvagrant-vagrantfi
                 │   └─658 /usr/bin/coreos-cloudinit --from-file=/var/l
                 ├─systemd-udevd.service
                 │ └─414 /usr/lib/systemd/systemd-udevd
                 ├─locksmithd.service
                 │ └─504 /usr/lib/locksmith/locksmithd
                 └─docker.service
                   ├─547 docker daemon --dns 172.18.0.1 --dns 8.8.8.8 -
                   └─control
                     └─742 /usr/bin/systemctl stop docker
    

    This shows the status of the node (line 7), how many jobs are queued (line 8), and any failed units (line 9). It also shows which services have started, and what command they are running at the time this status “snapshot” was taken.

    core@core-01 ~ $ systemctl status
    ● core-01
        State: running
         Jobs: 2 queued
       Failed: 0 units
        Since: Wed 2016-01-27 12:40:13 EST; 3min 28s ago
       CGroup: /
               ├─1 /usr/lib/systemd/systemd --switched-root --system --
               └─system.slice
                 ├─docker-5f41c7d23012a856462d3a7876d7165715164d2b2c6ed
                 │ └─1475 /swarm join --addr=172.17.8.101:2376 consul:/
                 ├─dbus.service
                 │ └─508 /usr/bin/dbus-daemon --system --address=system
                 ├─update-engine.service
                 │ └─517 /usr/sbin/update_engine -foreground -logtostde
                 ├─system-sshd.slice
                 │ └─sshd@2-10.0.2.15:22-10.0.2.2:33932.service
                 │   ├─ 860 sshd: core [priv]
                 │   ├─ 862 sshd: core@pts/0
                 │   ├─ 863 -bash
                 │   ├─1499 systemctl status
                 │   └─1500 systemctl status
                 ├─docker-201c7bd05ea49b654aa8b02a92dbb739a06dd3e8a4cc7
                 │ └─1461 /swarm manage -H tcp://172.17.8.101:8333 cons
                 ├─swarm-agent.service
                 │ ├─1437 /bin/bash /home/core/runswarmagent.sh 172.17.
                 │ └─1449 /usr/bin/docker run --name swarm-agent --net=
                 ├─systemd-journald.service
                 │ └─398 /usr/lib/systemd/systemd-journald
                 ├─fleet.service
                 │ └─918 /usr/bin/fleetd
                 ├─systemd-resolved.service
                 │ └─554 /usr/lib/systemd/systemd-resolved
                 ├─systemd-timesyncd.service
                 │ └─476 /usr/lib/systemd/systemd-timesyncd
                 ├─swarm-manager.service
                 │ ├─1405 /bin/bash /home/core/runswarmmanager.sh 172.1
                 │ └─1421 /usr/bin/docker run --name swarm-manager --ne
                 ├─systemd-logind.service
                 │ └─505 /usr/lib/systemd/systemd-logind
                 ├─systemd-networkd.service
                 │ └─829 /usr/lib/systemd/systemd-networkd
                 ├─system-getty.slice
                 │ └─getty@tty1.service
                 │   └─498 /sbin/agetty --noclear tty1 linux
                 ├─systemd-udevd.service
                 │ └─425 /usr/lib/systemd/systemd-udevd
                 ├─consul.service
                 │ ├─940 /bin/sh -c NUM_SERVERS=$(fleetctl list-machine
                 │ └─973 /usr/bin/docker run --name=consul-core-01 -v /
                 ├─docker-8323ab8192308e5a65102dffb109466c6a7c7f43ff28f
                 │ └─1371 /bin/consul agent -config-dir=/config -node c
                 ├─locksmithd.service
                 │ └─1125 /usr/lib/locksmith/locksmithd
                 ├─docker.service
                 │ ├─ 877 docker daemon --dns 172.18.0.1 --dns 8.8.8.8
                 │ ├─1004 docker-proxy -proto tcp -host-ip 172.17.8.101
                 │ ├─1011 docker-proxy -proto tcp -host-ip 172.17.8.101
                 │ ├─1027 docker-proxy -proto tcp -host-ip 172.17.8.101
                 │ ├─1036 docker-proxy -proto tcp -host-ip 172.17.8.101
                 │ ├─1057 docker-proxy -proto udp -host-ip 172.17.8.101
                 │ ├─1071 docker-proxy -proto tcp -host-ip 172.17.8.101
                 │ ├─1089 docker-proxy -proto udp -host-ip 172.17.8.101
                 │ ├─1108 docker-proxy -proto tcp -host-ip 172.17.8.101
                 │ └─1117 docker-proxy -proto udp -host-ip 172.18.0.1 -
                 └─etcd2.service
                   └─912 /usr/bin/etcd2 -name core-01 -initial-advertis
    core@core-01 ~ $ docker ps
    CONTAINER ID        IMAGE               COMMAND                  CR
    EATED              STATUS              PORTS
    
    
                                       NAMES
    5f41c7d23012        swarm:latest        "/swarm join --addr=1"   Ab
    out a minute ago   Up About a minute
    
    
                                       swarm-agent
    201c7bd05ea4        swarm:latest        "/swarm manage -H tcp"   Ab
    out a minute ago   Up About a minute
    
    
                                       swarm-manager
    8323ab819230        progrium/consul     "/bin/start -node cor"   2
    minutes ago        Up 2 minutes        172.17.8.101:8300-8302->8300
    -8302/tcp, 172.17.8.101:8400->8400/tcp, 172.17.8.101:8500->8500/tcp
    , 172.18.0.1:53->53/udp, 172.17.8.101:8600->8600/tcp, 172.17.8.101:
    8301-8302->8301-8302/udp, 53/tcp   consul-core-01
    

    This shows the status of another node in the cluster at a different point in the startup process. It still shows the state of the node, the number of queued jobs, and any failed units, but there are many more services in the process tree. Finally, the docker ps command at the end of the listing shows how to check on the status of active, running Docker containers.

    Note

    If docker ps seems to “hang”, this generally means one or more Docker containers are still trying to start. Be patient, and they should show up. To check that the Docker daemon is indeed running, try running docker info. It might also hang until whatever container is activating has started, but as long as it doesn’t return immediately with “Cannot connect to the Docker daemon. Is the docker daemon running on this host?”, Docker is working; just be patient.

    If docker ps doesn’t hang but returns only the headings and no containers when you are expecting containers, run docker ps -a. This will show all Docker containers, even ones that have exited or failed for some reason.
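
    As a quick sanity check, something along these lines (a minimal sketch; the exact output of docker info will vary) can confirm whether the daemon is answering at all and whether any containers have exited:

    core@core-01 ~ $ docker info > /dev/null && echo "Docker daemon is responding"
    Docker daemon is responding
    core@core-01 ~ $ docker ps -a --filter status=exited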

  3. systemd logs output to its journal. The journal is queried by a tool called journalctl. To see all journal output of all systemd processes since the node was created, run

    journalctl

    This produces a lot of output, so it won’t be shown here. Use this tool to see the output of every unit in one combined stream, which is particularly useful if you’re trying to see how different services might be affecting each other.

  4. To only see journal output for the last boot, run

    journalctl -b

    This is the same type of output as plain journalctl, but limited to entries since the last boot.
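
    journalctl also supports filtering, which helps when the full journal is overwhelming. A few example invocations (standard journalctl options, shown here as a sketch):

    journalctl -b -p err              # only messages of priority "err" or worse from this boot
    journalctl --since "1 hour ago"   # only messages from the last hour
    journalctl -n 100                 # only the last 100 journal entries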

State of systemd units

All services run by systemd on a node are referred to as units. You can check the state of these units individually.

  1. Check the status of a unit and get the tail of its log output.

    core@core-01 ~ $ systemctl status consul.service -l
    ● consul.service - Consul bootstrap
       Loaded: loaded (/run/systemd/system/consul.service; disabled; ve
    ndor preset: disabled)
       Active: active (running) since Wed 2016-01-27 12:41:56 EST; 37mi
    n ago
      Process: 941 ExecStartPost=/bin/sh -c /usr/bin/etcdctl set "/serv
    ices/consul/bootstrap/servers/$COREOS_PUBLIC_IPV4" "$COREOS_PUBLIC_
    IPV4" (code=exited, status=0/SUCCESS)
      Process: 932 ExecStartPre=/bin/sh -c /usr/bin/etcdctl mk /service
    s/consul/bootstrap/host $COREOS_PUBLIC_IPV4 || sleep 10 (code=exite
    d, status=0/SUCCESS)
      Process: 926 ExecStartPre=/usr/bin/docker rm consul-%H (code=exit
    ed, status=0/SUCCESS)
      Process: 921 ExecStartPre=/usr/bin/docker kill consul-%H (code=ex
    ited, status=1/FAILURE)
     Main PID: 940 (sh)
       Memory: 28.0M
          CPU: 117ms
       CGroup: /system.slice/consul.service
               ├─940 /bin/sh -c NUM_SERVERS=$(fleetctl list-machines |
    grep -v "MACHINE" |wc -l)       && EXPECT=$(if [ $NUM_SERVERS -lt 3
     ] ; then echo 1; else echo 3; fi)       && JOIN_IP=$(etcdctl ls /s
    ervices/consul/bootstrap/servers          | grep -v $COREOS_PUBLIC_
    IPV4          | cut -d '/' -f 6          | head -n 1)       && JOIN
    =$(if [ "$JOIN_IP" != "" ] ; then sleep 10; echo "-join $JOIN_IP";
    else echo "-bootstrap-expect $EXPECT"; fi)       && /usr/bin/docker
     run --name=consul-core-01 -v /mnt:/data            -p 172.17.8.101
    :8300:8300            -p 172.17.8.101:8301:8301            -p 172.1
    7.8.101:8301:8301/udp            -p 172.17.8.101:8302:8302
       -p 172.17.8.101:8302:8302/udp            -p 172.17.8.101:8400:84
    00            -p 172.17.8.101:8500:8500            -p 172.17.8.101:
    8600:8600            -p 172.18.0.1:53:53/udp            progrium/co
    nsul -node core-01 -server -dc=local -advertise 172.17.8.101 $JOIN
               └─973 /usr/bin/docker run --name=consul-core-01 -v /mnt:
    /data -p 172.17.8.101:8300:8300 -p 172.17.8.101:8301:8301 -p 172.17
    .8.101:8301:8301/udp -p 172.17.8.101:8302:8302 -p 172.17.8.101:8302
    :8302/udp -p 172.17.8.101:8400:8400 -p 172.17.8.101:8500:8500 -p 17
    2.17.8.101:8600:8600 -p 172.18.0.1:53:53/udp progrium/consul -node
    core-01 -server -dc=local -advertise 172.17.8.101 -bootstrap-expect
     1
    
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [WARN] raft: R
    ejecting vote from 172.17.8.103:8300 since our last term is greater
     (43, 1)
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [WARN] raft: H
    eartbeat timeout reached, starting election
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: N
    ode at 172.17.8.101:8300 [Candidate] entering Candidate state
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: E
    lection won. Tally: 2
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: N
    ode at 172.17.8.101:8300 [Leader] entering Leader state
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] consul:
     cluster leadership acquired
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] consul:
     New leader elected: core-01
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [WARN] raft: A
    ppendEntries to 172.17.8.103:8300 rejected, sending older logs (nex
    t: 479)
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: p
    ipelining replication to peer 172.17.8.102:8300
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: p
    ipelining replication to peer 172.17.8.103:8300
    

    The -l is important as the output will be truncated without it.

    This command shows a multitude of things. It gives you a unit’s state as well as the location of the unit file from which the unit is run. Unit files can be placed in multiple locations, and they are applied according to a hierarchy, but the file shown on the Loaded: line is the one that systemd actually runs.

    This command also shows the status of any commands used in stopping or starting the service (i.e., all the ExecStart* or ExecStop* directives in the unit file); see the Process: lines in the output. This is particularly useful if you have Exec* directives that could be the cause of a unit failure.

    The command run from the ExecStart directive is shown under CGroup:.

    Finally, this command gives essentially the tail of the service’s journal output. As you can see near the end of that output, a Consul leader was elected!
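
    If you are not sure which unit to look at in the first place, you can list only the units that have failed, or check a single unit’s state, with standard systemctl invocations like these (a minimal sketch):

    core@core-01 ~ $ systemctl list-units --state=failed
    core@core-01 ~ $ systemctl is-active consul.service
    active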

  2. To see the unit file systemd runs, run

    core@core-01 ~ $ systemctl cat consul.service
    # /run/systemd/system/consul.service
    [Unit]
    Description=Consul bootstrap
    Requires=docker.service fleet.service
    After=docker.service fleet.service
    
    [Service]
    EnvironmentFile=/etc/environment
    TimeoutStartSec=0
    ExecStartPre=-/usr/bin/docker kill consul-%H
    ExecStartPre=-/usr/bin/docker rm consul-%H
    ExecStartPre=/bin/sh -c "/usr/bin/etcdctl mk /services/consul/boots
    ExecStart=/bin/sh -c "NUM_SERVERS=$(fleetctl list-machines | grep -
         && EXPECT=$(if [ $NUM_SERVERS -lt 3 ] ; then echo 1; else echo
         && JOIN_IP=$(etcdctl ls /services/consul/bootstrap/servers \
            | grep -v $COREOS_PUBLIC_IPV4 \
            | cut -d '/' -f 6 \
            | head -n 1) \
         && JOIN=$(if [ \"$JOIN_IP\" != \"\" ] ; then sleep 10; echo \"
         && /usr/bin/docker run --name=consul-%H -v /mnt:/data \
              -p ${COREOS_PUBLIC_IPV4}:8300:8300 \
              -p ${COREOS_PUBLIC_IPV4}:8301:8301 \
              -p ${COREOS_PUBLIC_IPV4}:8301:8301/udp \
              -p ${COREOS_PUBLIC_IPV4}:8302:8302 \
              -p ${COREOS_PUBLIC_IPV4}:8302:8302/udp \
              -p ${COREOS_PUBLIC_IPV4}:8400:8400 \
              -p ${COREOS_PUBLIC_IPV4}:8500:8500 \
              -p ${COREOS_PUBLIC_IPV4}:8600:8600 \
              -p 172.18.0.1:53:53/udp \
              progrium/consul -node %H -server -dc=local -advertise ${C
    ExecStartPost=/bin/sh -c "/usr/bin/etcdctl set \"/services/consul/b
    ExecStop=/bin/sh -c "/usr/bin/etcdctl rm \"/services/consul/bootstr
    ExecStop=/bin/sh -c "/usr/bin/etcdctl rm /services/consul/bootstrap
    ExecStop=/usr/bin/docker stop consul-%H
    Restart=always
    RestartSec=10s
    LimitNOFILE=40000
    
    [Install]
    WantedBy=multi-user.target
    

    This command shows the service’s unit file directives. It also shows, in the comment at the top, the location of the file. In this unit file, there are directives under three headings: “Unit”, “Service”, and “Install”. To learn more about what can go in each of these sections of a unit file, see freedesktop.org’s page on systemd unit files.

  3. To make changes to a unit file, run

    systemctl edit consul.service

    This will actually create a brand new override file in which you can add directives to, or override directives from, the unit definition. For slightly more information, see DigitalOcean’s How to Use Systemctl to Manage Systemd Services and Units.
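
    For example, systemctl edit consul.service typically opens an editor on an override file (commonly /etc/systemd/system/consul.service.d/override.conf; the exact path may differ), into which you could put a small snippet like the following sketch to change the restart delay without touching the original unit file:

    [Service]
    # Override only the directives you want to change; everything else
    # still comes from the original consul.service unit file.
    RestartSec=30s

    After saving, run sudo systemctl daemon-reload and restart the unit (see Managing systemd units below) for the change to take effect.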

  4. You can also edit the actual unit file, rather than just creating an override file, by running

    systemctl edit --full consul.service

  5. systemd unit files have many directives used to configure the units. Some of these are set or have defaults that you may not be aware of. To see a list of the directives for a given unit and what these directives are set to, run

    core@core-01 ~ $ systemctl show consul.service
    Type=simple
    Restart=always
    NotifyAccess=none
    RestartUSec=10s
    TimeoutStartUSec=0
    TimeoutStopUSec=1min 30s
    WatchdogUSec=0
    WatchdogTimestamp=Wed 2016-01-27 12:41:56 EST
    WatchdogTimestampMonotonic=102810100
    StartLimitInterval=10000000
    StartLimitBurst=5
    StartLimitAction=none
    FailureAction=none
    PermissionsStartOnly=no
    RootDirectoryStartOnly=no
    RemainAfterExit=no
    GuessMainPID=yes
    MainPID=940
    ControlPID=0
    FileDescriptorStoreMax=0
    StatusErrno=0
    Result=success
    ExecMainStartTimestamp=Wed 2016-01-27 12:41:56 EST
    ExecMainStartTimestampMonotonic=102810054
    ExecMainExitTimestampMonotonic=0
    ExecMainPID=940
    ExecMainCode=0
    ExecMainStatus=0
    ExecStartPre={ path=/usr/bin/docker ; argv[]=/usr/bin/docker kill c
    ExecStartPre={ path=/usr/bin/docker ; argv[]=/usr/bin/docker rm con
    ExecStartPre={ path=/bin/sh ; argv[]=/bin/sh -c /usr/bin/etcdctl mk
    ExecStart={ path=/bin/sh ; argv[]=/bin/sh -c NUM_SERVERS=$(fleetctl
    ExecStartPost={ path=/bin/sh ; argv[]=/bin/sh -c /usr/bin/etcdctl s
    ExecStop={ path=/bin/sh ; argv[]=/bin/sh -c /usr/bin/etcdctl rm "/s
    ExecStop={ path=/bin/sh ; argv[]=/bin/sh -c /usr/bin/etcdctl rm /se
    ExecStop={ path=/usr/bin/docker ; argv[]=/usr/bin/docker stop consu
    Slice=system.slice
    ControlGroup=/system.slice/consul.service
    MemoryCurrent=29401088
    CPUUsageNSec=141291138
    Delegate=no
    CPUAccounting=no
    CPUShares=18446744073709551615
    StartupCPUShares=18446744073709551615
    CPUQuotaPerSecUSec=infinity
    BlockIOAccounting=no
    BlockIOWeight=18446744073709551615
    StartupBlockIOWeight=18446744073709551615
    MemoryAccounting=no
    MemoryLimit=18446744073709551615
    DevicePolicy=auto
    EnvironmentFile=/etc/environment (ignore_errors=no)
    UMask=0022
    LimitCPU=18446744073709551615
    LimitFSIZE=18446744073709551615
    LimitDATA=18446744073709551615
    LimitSTACK=18446744073709551615
    LimitCORE=18446744073709551615
    LimitRSS=18446744073709551615
    LimitNOFILE=40000
    LimitAS=18446744073709551615
    LimitNPROC=3873
    LimitMEMLOCK=65536
    LimitLOCKS=18446744073709551615
    LimitSIGPENDING=3873
    LimitMSGQUEUE=819200
    LimitNICE=0
    LimitRTPRIO=0
    LimitRTTIME=18446744073709551615
    OOMScoreAdjust=0
    Nice=0
    IOScheduling=0
    CPUSchedulingPolicy=0
    CPUSchedulingPriority=0
    TimerSlackNSec=50000
    CPUSchedulingResetOnFork=no
    NonBlocking=no
    StandardInput=null
    StandardOutput=journal
    StandardError=inherit
    TTYReset=no
    TTYVHangup=no
    TTYVTDisallocate=no
    SyslogPriority=30
    SyslogLevelPrefix=yes
    SecureBits=0
    CapabilityBoundingSet=18446744073709551615
    MountFlags=0
    PrivateTmp=no
    PrivateNetwork=no
    PrivateDevices=no
    ProtectHome=no
    ProtectSystem=no
    SameProcessGroup=no
    UtmpMode=init
    IgnoreSIGPIPE=yes
    NoNewPrivileges=no
    SystemCallErrorNumber=0
    RuntimeDirectoryMode=0755
    KillMode=control-group
    KillSignal=15
    SendSIGKILL=yes
    SendSIGHUP=no
    Id=consul.service
    Names=consul.service
    Requires=basic.target docker.service fleet.service
    Wants=system.slice
    RequiredBy=swarm-manager.service
    Conflicts=shutdown.target
    Before=shutdown.target swarm-manager.service
    After=system.slice systemd-journald.socket fleet.service docker.ser
    Description=Consul bootstrap
    LoadState=loaded
    ActiveState=active
    SubState=running
    FragmentPath=/run/systemd/system/consul.service
    UnitFileState=disabled
    UnitFilePreset=disabled
    InactiveExitTimestamp=Wed 2016-01-27 12:41:55 EST
    InactiveExitTimestampMonotonic=102215240
    ActiveEnterTimestamp=Wed 2016-01-27 12:41:56 EST
    ActiveEnterTimestampMonotonic=102891180
    ActiveExitTimestampMonotonic=0
    InactiveEnterTimestampMonotonic=0
    CanStart=yes
    CanStop=yes
    CanReload=no
    CanIsolate=no
    StopWhenUnneeded=no
    RefuseManualStart=no
    RefuseManualStop=no
    AllowIsolate=no
    DefaultDependencies=yes
    OnFailureJobMode=replace
    IgnoreOnIsolate=no
    IgnoreOnSnapshot=no
    NeedDaemonReload=no
    JobTimeoutUSec=0
    JobTimeoutAction=none
    ConditionResult=yes
    AssertResult=yes
    ConditionTimestamp=Wed 2016-01-27 12:41:55 EST
    ConditionTimestampMonotonic=102214129
    AssertTimestamp=Wed 2016-01-27 12:41:55 EST
    AssertTimestampMonotonic=102214129
    Transient=no
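
    If you only care about a few of these directives, you can ask for them by name with the -p (--property) option rather than scanning the full list. For example, using values taken from the listing above:

    core@core-01 ~ $ systemctl show consul.service -p MainPID -p Restart -p LimitNOFILE
    MainPID=940
    Restart=always
    LimitNOFILE=40000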
    
  6. To see all logs of a given unit since the node was created, run

    journalctl -u consul.service

  7. To see the logs of a given unit since the last boot, run

    journalctl -b -u consul.service

  8. To follow (tail) the logs of a unit in real time, run

    journalctl -fu consul.service

  9. To see logs with explanatory text added, run

    core@core-01 ~ $ journalctl -b -x -u consul.service
    -- Logs begin at Tue 2016-01-26 15:47:27 EST, end at Wed 2016-01-27 13:50:21 EST. --
    Jan 27 12:41:55 core-01 systemd[1]: Starting Consul bootstrap...
    -- Subject: Unit consul.service has begun start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit consul.service has begun starting up.
    Jan 27 12:41:56 core-01 docker[921]: Error response from daemon: Cannot kill container consul-core-01: notrunning: Container cb7c6
    Jan 27 12:41:56 core-01 docker[921]: Error: failed to kill containers: [consul-core-01]
    Jan 27 12:41:56 core-01 docker[926]: consul-core-01
    Jan 27 12:41:56 core-01 sh[932]: 172.17.8.101
    Jan 27 12:41:56 core-01 sh[940]: Error retrieving list of active machines: googleapi: Error 503: fleet server unable to communicat
    Jan 27 12:41:56 core-01 sh[941]: 172.17.8.101
    Jan 27 12:41:56 core-01 systemd[1]: Started Consul bootstrap.
    -- Subject: Unit consul.service has finished start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit consul.service has finished starting up.
    --
    -- The start-up result is done.
    Jan 27 12:42:39 core-01 sh[940]: ==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.
    Jan 27 12:42:39 core-01 sh[940]: ==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
    Jan 27 12:42:39 core-01 sh[940]: ==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
    Jan 27 12:42:39 core-01 sh[940]: ==> Starting raft data migration...
    Jan 27 12:42:39 core-01 sh[940]: ==> Starting Consul agent...
    Jan 27 12:42:39 core-01 sh[940]: ==> Starting Consul agent RPC...
    Jan 27 12:42:39 core-01 sh[940]: ==> Consul agent running!
    Jan 27 12:42:39 core-01 sh[940]: Node name: 'core-01'
    Jan 27 12:42:39 core-01 sh[940]: Datacenter: 'local'
    Jan 27 12:42:39 core-01 sh[940]: Server: true (bootstrap: true)
    Jan 27 12:42:39 core-01 sh[940]: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 53, RPC: 8400)
    Jan 27 12:42:39 core-01 sh[940]: Cluster Addr: 172.17.8.101 (LAN: 8301, WAN: 8302)
    Jan 27 12:42:39 core-01 sh[940]: Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
    Jan 27 12:42:39 core-01 sh[940]: Atlas: <disabled>
    Jan 27 12:42:39 core-01 sh[940]: ==> Log data will now stream in as it occurs:
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [INFO] serf: EventMemberJoin: core-01 172.17.8.101
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [INFO] serf: EventMemberJoin: core-01.local 172.17.8.101
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [INFO] raft: Node at 172.17.8.101:8300 [Follower] entering Follower state
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [WARN] serf: Failed to re-join any previously known node
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [WARN] serf: Failed to re-join any previously known node
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [INFO] consul: adding server core-01 (Addr: 172.17.8.101:8300) (DC: local)
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [INFO] consul: adding server core-01.local (Addr: 172.17.8.101:8300) (DC: loc
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [ERR] agent: failed to sync remote state: No cluster leader
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [ERR] http: Request /v1/kv/docker/nodes/172.19.0.1:2376, error: No cluster le
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [ERR] http: Request /v1/kv/docker/nodes/172.19.0.1:2376, error: No cluster le
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [INFO] serf: EventMemberJoin: core-02 172.17.8.102
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [INFO] consul: adding server core-02 (Addr: 172.17.8.102:8300) (DC: local)
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [ERR] http: Request /v1/kv/docker/nodes/172.19.0.1:2376, error: No cluster le
    Jan 27 12:42:39 core-01 sh[940]: 2016/01/27 17:42:39 [ERR] http: Request /v1/kv/docker/nodes/172.19.0.1:2376, error: No cluster le
    Jan 27 12:42:40 core-01 sh[940]: 2016/01/27 17:42:40 [WARN] raft: Heartbeat timeout reached, starting election
    Jan 27 12:42:40 core-01 sh[940]: 2016/01/27 17:42:40 [INFO] raft: Node at 172.17.8.101:8300 [Candidate] entering Candidate state
    Jan 27 12:42:40 core-01 sh[940]: 2016/01/27 17:42:40 [ERR] raft: Failed to make RequestVote RPC to 172.17.8.103:8300: dial tcp 172
    Jan 27 12:42:40 core-01 sh[940]: 2016/01/27 17:42:40 [INFO] raft: Election won. Tally: 2
    Jan 27 12:42:40 core-01 sh[940]: 2016/01/27 17:42:40 [INFO] raft: Node at 172.17.8.101:8300 [Leader] entering Leader state
    ...skipping...
    Jan 27 12:42:41 core-01 sh[940]: 2016/01/27 17:42:41 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:42:41 core-01 sh[940]: 2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:42:41 core-01 sh[940]: 2016/01/27 17:42:41 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:42:41 core-01 sh[940]: 2016/01/27 17:42:41 [WARN] raft: Failed to contact 172.17.8.103:8300 in 509.786599ms
    Jan 27 12:42:41 core-01 sh[940]: 2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:42:41 core-01 sh[940]: 2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:42:41 core-01 sh[940]: 2016/01/27 17:42:41 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:42:41 core-01 sh[940]: 2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:42:41 core-01 sh[940]: 2016/01/27 17:42:41 [WARN] raft: Failed to contact 172.17.8.103:8300 in 981.100031ms
    Jan 27 12:42:42 core-01 sh[940]: 2016/01/27 17:42:42 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:42:42 core-01 sh[940]: 2016/01/27 17:42:42 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:42:42 core-01 sh[940]: 2016/01/27 17:42:42 [WARN] raft: Failed to contact 172.17.8.103:8300 in 1.480625817s
    Jan 27 12:42:42 core-01 sh[940]: 2016/01/27 17:42:42 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:42:42 core-01 sh[940]: 2016/01/27 17:42:42 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:42:43 core-01 sh[940]: 2016/01/27 17:42:43 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:42:44 core-01 sh[940]: 2016/01/27 17:42:44 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:42:44 core-01 sh[940]: 2016/01/27 17:42:44 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:42:46 core-01 sh[940]: 2016/01/27 17:42:46 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:42:47 core-01 sh[940]: 2016/01/27 17:42:47 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:42:51 core-01 sh[940]: 2016/01/27 17:42:51 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:42:52 core-01 sh[940]: 2016/01/27 17:42:52 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:43:02 core-01 sh[940]: 2016/01/27 17:43:02 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:43:05 core-01 sh[940]: 2016/01/27 17:43:05 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:43:14 core-01 sh[940]: 2016/01/27 17:43:14 [ERR] raft: Failed to AppendEntries to 172.17.8.103:8300: dial tcp 172.17.8.1
    Jan 27 12:43:17 core-01 sh[940]: 2016/01/27 17:43:17 [ERR] raft: Failed to heartbeat to 172.17.8.103:8300: dial tcp 172.17.8.103:8
    Jan 27 12:43:23 core-01 sh[940]: 2016/01/27 17:43:23 [INFO] serf: EventMemberJoin: core-03 172.17.8.103
    Jan 27 12:43:23 core-01 sh[940]: 2016/01/27 17:43:23 [INFO] consul: adding server core-03 (Addr: 172.17.8.103:8300) (DC: local)
    Jan 27 12:43:23 core-01 sh[940]: 2016/01/27 17:43:23 [INFO] consul: member 'core-03' joined, marking health alive
    Jan 27 12:43:24 core-01 sh[940]: 2016/01/27 17:43:24 [WARN] raft: AppendEntries to 172.17.8.103:8300 rejected, sending older logs
    Jan 27 12:43:24 core-01 sh[940]: 2016/01/27 17:43:24 [WARN] raft: Rejecting vote from 172.17.8.103:8300 since we have a leader: 17
    Jan 27 12:43:24 core-01 sh[940]: 2016/01/27 17:43:24 [WARN] raft: Failed to contact 172.17.8.103:8300 in 500.297851ms
    Jan 27 12:43:25 core-01 sh[940]: 2016/01/27 17:43:25 [WARN] raft: Failed to contact 172.17.8.103:8300 in 938.153601ms
    Jan 27 12:43:25 core-01 sh[940]: 2016/01/27 17:43:25 [WARN] raft: Rejecting vote from 172.17.8.103:8300 since we have a leader: 17
    Jan 27 12:43:25 core-01 sh[940]: 2016/01/27 17:43:25 [WARN] raft: Failed to contact 172.17.8.103:8300 in 1.424666193s
    Jan 27 12:43:27 core-01 sh[940]: 2016/01/27 17:43:27 [WARN] raft: Rejecting vote from 172.17.8.103:8300 since we have a leader: 17
    Jan 27 12:43:28 core-01 sh[940]: 2016/01/27 17:43:28 [WARN] raft: Rejecting vote from 172.17.8.103:8300 since we have a leader: 17
    Jan 27 12:43:30 core-01 sh[940]: 2016/01/27 17:43:30 [WARN] raft: Rejecting vote from 172.17.8.103:8300 since we have a leader: 17
    Jan 27 12:43:31 core-01 sh[940]: 2016/01/27 17:43:31 [WARN] raft: Rejecting vote from 172.17.8.103:8300 since we have a leader: 17
    Jan 27 12:43:33 core-01 sh[940]: 2016/01/27 17:43:33 [WARN] raft: Rejecting vote from 172.17.8.103:8300 since we have a leader: 17
    Jan 27 12:43:34 core-01 sh[940]: 2016/01/27 17:43:34 [WARN] raft: Rejecting vote from 172.17.8.103:8300 since we have a leader: 17
    Jan 27 12:43:34 core-01 sh[940]: 2016/01/27 17:43:34 [ERR] raft: peer 172.17.8.103:8300 has newer term, stopping replication
    Jan 27 12:43:34 core-01 sh[940]: 2016/01/27 17:43:34 [INFO] raft: Node at 172.17.8.101:8300 [Follower] entering Follower state
    Jan 27 12:43:34 core-01 sh[940]: 2016/01/27 17:43:34 [INFO] consul: cluster leadership lost
    Jan 27 12:43:34 core-01 sh[940]: 2016/01/27 17:43:34 [INFO] raft: aborting pipeline replication to peer 172.17.8.102:8300
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [WARN] raft: Rejecting vote from 172.17.8.103:8300 since our last term is gre
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [WARN] raft: Heartbeat timeout reached, starting election
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: Node at 172.17.8.101:8300 [Candidate] entering Candidate state
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: Election won. Tally: 2
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: Node at 172.17.8.101:8300 [Leader] entering Leader state
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] consul: cluster leadership acquired
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] consul: New leader elected: core-01
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [WARN] raft: AppendEntries to 172.17.8.103:8300 rejected, sending older logs
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: pipelining replication to peer 172.17.8.102:8300
    Jan 27 12:43:35 core-01 sh[940]: 2016/01/27 17:43:35 [INFO] raft: pipelining replication to peer 172.17.8.103:8300
    Jan 27 13:30:47 core-01 sh[940]: 2016/01/27 18:30:47 [INFO] agent.rpc: Accepted client: 127.0.0.1:44510
    

    The first line of output gives the date/time range of all available logs. As you can see, though, the first log entry in this set is not from Jan 26, as that range would allow, but from Jan 27, which is the last time this node was booted.

    This service started up just fine, so there are no failures to point out, but this is where you would find them, along with any available explanation for those failures.

  10. If the unit you are running manages a Docker container, not all relevant and helpful information may be available to you via journalctl. To see logs from the Docker container itself, run

    core@core-01 ~ $ docker logs consul-core-01
    ==> WARNING: BootstrapExpect Mode is specified as 1; this is the sa
    me as Bootstrap mode.
    ==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
    ==> WARNING: It is highly recommended to set GOMAXPROCS higher than
     1
    ==> Starting raft data migration...
    ==> Starting Consul agent...
    ==> Starting Consul agent RPC...
    ==> Consul agent running!
             Node name: 'core-01'
            Datacenter: 'local'
                Server: true (bootstrap: true)
           Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 53, RPC: 8
    400)
          Cluster Addr: 172.17.8.101 (LAN: 8301, WAN: 8302)
        Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
                 Atlas: <disabled>
    
    ==> Log data will now stream in as it occurs:
    
        2016/01/27 17:42:39 [INFO] serf: EventMemberJoin: core-01 172.1
    7.8.101
        2016/01/27 17:42:39 [INFO] serf: EventMemberJoin: core-01.local
     172.17.8.101
        2016/01/27 17:42:39 [INFO] raft: Node at 172.17.8.101:8300 [Fol
    lower] entering Follower state
        2016/01/27 17:42:39 [WARN] serf: Failed to re-join any previous
    ly known node
        2016/01/27 17:42:39 [WARN] serf: Failed to re-join any previous
    ly known node
        2016/01/27 17:42:39 [INFO] consul: adding server core-01 (Addr:
     172.17.8.101:8300) (DC: local)
        2016/01/27 17:42:39 [INFO] consul: adding server core-01.local
    (Addr: 172.17.8.101:8300) (DC: local)
        2016/01/27 17:42:39 [ERR] agent: failed to sync remote state: N
    o cluster leader
        2016/01/27 17:42:39 [ERR] http: Request /v1/kv/docker/nodes/172
    .19.0.1:2376, error: No cluster leader
        2016/01/27 17:42:39 [ERR] http: Request /v1/kv/docker/nodes/172
    .19.0.1:2376, error: No cluster leader
        2016/01/27 17:42:39 [INFO] serf: EventMemberJoin: core-02 172.1
    7.8.102
        2016/01/27 17:42:39 [INFO] consul: adding server core-02 (Addr:
     172.17.8.102:8300) (DC: local)
        2016/01/27 17:42:39 [ERR] http: Request /v1/kv/docker/nodes/172
    .19.0.1:2376, error: No cluster leader
        2016/01/27 17:42:39 [ERR] http: Request /v1/kv/docker/nodes/172
    .19.0.1:2376, error: No cluster leader
        2016/01/27 17:42:40 [WARN] raft: Heartbeat timeout reached, sta
    rting election
        2016/01/27 17:42:40 [INFO] raft: Node at 172.17.8.101:8300 [Can
    didate] entering Candidate state
        2016/01/27 17:42:40 [ERR] raft: Failed to make RequestVote RPC
    to 172.17.8.103:8300: dial tcp 172.17.8.103:8300: connection refuse
    d
        2016/01/27 17:42:40 [INFO] raft: Election won. Tally: 2
        2016/01/27 17:42:40 [INFO] raft: Node at 172.17.8.101:8300 [Lea
    der] entering Leader state
        2016/01/27 17:42:40 [INFO] consul: cluster leadership acquired
        2016/01/27 17:42:40 [INFO] consul: New leader elected: core-01
        2016/01/27 17:42:40 [INFO] raft: Disabling EnableSingleNode (bo
    otstrap)
        2016/01/27 17:42:40 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:40 [INFO] raft: pipelining replication to peer
     172.17.8.102:8300
        2016/01/27 17:42:40 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:40 [INFO] consul: member 'core-03' reaped, der
    egistering
        2016/01/27 17:42:41 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [WARN] raft: Failed to contact 172.17.8.103
    :8300 in 509.786599ms
        2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:41 [WARN] raft: Failed to contact 172.17.8.103
    :8300 in 981.100031ms
        2016/01/27 17:42:42 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:42 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:42 [WARN] raft: Failed to contact 172.17.8.103
    :8300 in 1.480625817s
        2016/01/27 17:42:42 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:42 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:43 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:44 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:44 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:46 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:47 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:51 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:42:52 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: connection refused
        2016/01/27 17:43:02 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: no route to host
        2016/01/27 17:43:05 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: no route to host
        2016/01/27 17:43:14 [ERR] raft: Failed to AppendEntries to 172.
    17.8.103:8300: dial tcp 172.17.8.103:8300: no route to host
        2016/01/27 17:43:17 [ERR] raft: Failed to heartbeat to 172.17.8
    .103:8300: dial tcp 172.17.8.103:8300: no route to host
        2016/01/27 17:43:23 [INFO] serf: EventMemberJoin: core-03 172.1
    7.8.103
        2016/01/27 17:43:23 [INFO] consul: adding server core-03 (Addr:
     172.17.8.103:8300) (DC: local)
        2016/01/27 17:43:23 [INFO] consul: member 'core-03' joined, mar
    king health alive
        2016/01/27 17:43:24 [WARN] raft: AppendEntries to 172.17.8.103:
    8300 rejected, sending older logs (next: 479)
        2016/01/27 17:43:24 [WARN] raft: Rejecting vote from 172.17.8.1
    03:8300 since we have a leader: 172.17.8.101:8300
        2016/01/27 17:43:24 [WARN] raft: Failed to contact 172.17.8.103
    :8300 in 500.297851ms
        2016/01/27 17:43:25 [WARN] raft: Failed to contact 172.17.8.103
    :8300 in 938.153601ms
        2016/01/27 17:43:25 [WARN] raft: Rejecting vote from 172.17.8.1
    03:8300 since we have a leader: 172.17.8.101:8300
        2016/01/27 17:43:25 [WARN] raft: Failed to contact 172.17.8.103
    :8300 in 1.424666193s
        2016/01/27 17:43:27 [WARN] raft: Rejecting vote from 172.17.8.1
    03:8300 since we have a leader: 172.17.8.101:8300
        2016/01/27 17:43:28 [WARN] raft: Rejecting vote from 172.17.8.1
    03:8300 since we have a leader: 172.17.8.101:8300
        2016/01/27 17:43:30 [WARN] raft: Rejecting vote from 172.17.8.1
    03:8300 since we have a leader: 172.17.8.101:8300
        2016/01/27 17:43:31 [WARN] raft: Rejecting vote from 172.17.8.1
    03:8300 since we have a leader: 172.17.8.101:8300
        2016/01/27 17:43:33 [WARN] raft: Rejecting vote from 172.17.8.1
    03:8300 since we have a leader: 172.17.8.101:8300
        2016/01/27 17:43:34 [WARN] raft: Rejecting vote from 172.17.8.1
    03:8300 since we have a leader: 172.17.8.101:8300
        2016/01/27 17:43:34 [ERR] raft: peer 172.17.8.103:8300 has newe
    r term, stopping replication
        2016/01/27 17:43:34 [INFO] raft: Node at 172.17.8.101:8300 [Fol
    lower] entering Follower state
        2016/01/27 17:43:34 [INFO] consul: cluster leadership lost
        2016/01/27 17:43:34 [INFO] raft: aborting pipeline replication
    to peer 172.17.8.102:8300
        2016/01/27 17:43:35 [WARN] raft: Rejecting vote from 172.17.8.1
    03:8300 since our last term is greater (43, 1)
        2016/01/27 17:43:35 [WARN] raft: Heartbeat timeout reached, sta
    rting election
        2016/01/27 17:43:35 [INFO] raft: Node at 172.17.8.101:8300 [Can
    didate] entering Candidate state
        2016/01/27 17:43:35 [INFO] raft: Election won. Tally: 2
        2016/01/27 17:43:35 [INFO] raft: Node at 172.17.8.101:8300 [Lea
    der] entering Leader state
        2016/01/27 17:43:35 [INFO] consul: cluster leadership acquired
        2016/01/27 17:43:35 [INFO] consul: New leader elected: core-01
        2016/01/27 17:43:35 [WARN] raft: AppendEntries to 172.17.8.103:
    8300 rejected, sending older logs (next: 479)
        2016/01/27 17:43:35 [INFO] raft: pipelining replication to peer
     172.17.8.102:8300
        2016/01/27 17:43:35 [INFO] raft: pipelining replication to peer
     172.17.8.103:8300
        2016/01/27 18:30:47 [INFO] agent.rpc: Accepted client: 127.0.0.
    1:44510
    

    This is generally the same output you can get from journalctl, but the Docker logs can contain information that does not show up in journalctl by itself.

    Note

    The name of the systemd service and the name of the Docker container might NOT be the same (though they can be). If, as in this example, you name your service “foo” (so the unit is “foo.service”) but name your Docker container “foo-$hostname”, running docker logs foo.service or docker logs foo will not work. Don’t get upset with Docker when it tells you there’s no such container “foo.service” when you actually named the container “foo-$hostname”. :)

  11. To follow the logs in real time, run

    docker logs -f consul-core-01
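
    If you need more detail about a container that a unit is managing, docker inspect can also be helpful. For example, a sketch like the following (the Go template fields shown are assumptions about the inspect output for this Docker version) prints just the container state and restart count:

    docker inspect --format '{{.State.Status}} (restarts: {{.RestartCount}})' consul-core-01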

Managing systemd units

  1. You can start, stop, restart, and reload units with

    sudo systemctl {start|stop|reload|restart} consul.service

    You must run these commands with sudo.

    The “reload” option works for units which can reload their configurations without restarting.
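
    Note that the systemctl show output earlier reported UnitFileState=disabled for consul.service; whether a unit is also started automatically at boot is controlled separately with enable/disable. A sketch, assuming the unit file is installed in a persistent location rather than generated at boot:

    sudo systemctl enable consul.service    # start automatically at boot
    sudo systemctl disable consul.service   # do not start at boot (can still be started manually)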

  2. When you make changes to a unit file and are going to restart that unit, you must first let systemd know that the unit files have changed:

    sudo systemctl daemon-reload

Warning

This may seem obvious, but it’s a good thing to remember: if a systemd unit is running a Docker container, restarting the unit does not necessarily mean the old Docker container is removed and a new container created when the unit starts again.
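
If you really do want a fresh container, and the unit does not already remove it in its ExecStartPre directives (as consul.service does above), a minimal sketch of the manual sequence would be:

sudo systemctl stop consul.service
docker rm -f consul-core-01          # container name is host-specific in this example
sudo systemctl start consul.service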

Managing VirtualBox VMs

This chapter covers using VirtualBox command-line tools, most importantly VBoxManage, to manage core DIMS virtual machines.

Note

See also the descriptions of dimsasbuilt:wellington and dimsasbuilt:stirling in dimsasbuilt:dimsasbuilt.

Remotely Managing VirtualBox

VirtualBox can be managed remotely using X11 (“X Window System”) clients like those in virt tools. From a system running an X11 server, you can also connect with SSH and run VBoxManage directly on the remote host:

[root@wellington ~]# VBoxManage list runningvms
"vpn" {4f6ed378-8a9d-4c69-a380-2c194bc4eae0}
"foswiki" {8978f52d-1251-4fea-a3d7-8d9a0950bad1}
"lapp" {511b9f91-9323-476e-baf3-9bc64f97511e}
"jira" {c873db45-b81a-47fe-a5e3-6bdfe96b0dea}
"jenkins" {28e023eb-f4c4-40f5-b4e8-d37cfafde3be}
"linda-vm1" {df5fdc5e-d508-4007-9f5d-84a000a2b5c5}
"sso" {3916fa49-d251-4ced-9275-c8757aceaf66}
"u12-dev-ws-1" {9f58eca0-b3a6-451e-9b2b-f458c75d6869}
"u12-dev-svr-1" {cc1fefa3-61f4-4d67-b767-1f4add8f760a}
"hub" {4b530a22-df34-4fd2-89df-2e0a5844b397}
[lparsons@wellington ~]$ vboxmanage list bridgedifs
Name:            em1
GUID:            00316d65-0000-4000-8000-f04da240a9e1
DHCP:            Disabled
IPAddress:       172.28.234.234
NetworkMask:     255.255.255.0
IPV6Address:     fe80:0000:0000:0000:f24d:a2ff:fe40:a9e1
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: f0:4d:a2:40:a9:e1
MediumType:      Ethernet
Status:          Up
VBoxNetworkName: HostInterfaceNetworking-em1

Name:            em2
GUID:            00326d65-0000-4000-8000-f04da240a9e3
DHCP:            Disabled
IPAddress:       0.0.0.0
NetworkMask:     0.0.0.0
IPV6Address:
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: f0:4d:a2:40:a9:e3
MediumType:      Ethernet
Status:          Down
VBoxNetworkName: HostInterfaceNetworking-em2

Name:            em3
GUID:            00336d65-0000-4000-8000-f04da240a9e5
DHCP:            Disabled
IPAddress:       0.0.0.0
NetworkMask:     0.0.0.0
IPV6Address:
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: f0:4d:a2:40:a9:e5
MediumType:      Ethernet
Status:          Down
VBoxNetworkName: HostInterfaceNetworking-em3

Name:            em4
GUID:            00346d65-0000-4000-8000-f04da240a9e7
DHCP:            Disabled
IPAddress:       10.11.11.1
NetworkMask:     255.255.255.0
IPV6Address:     fe80:0000:0000:0000:f24d:a2ff:fe40:a9e7
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: f0:4d:a2:40:a9:e7
MediumType:      Ethernet
Status:          Up
VBoxNetworkName: HostInterfaceNetworking-em4
[lparsons@wellington ~]$ vboxmanage list hostonlyifs
Name:            vboxnet0
GUID:            786f6276-656e-4074-8000-0a0027000000
DHCP:            Disabled
IPAddress:       192.168.88.0
NetworkMask:     255.255.255.0
IPV6Address:     fe80:0000:0000:0000:0800:27ff:fe00:0000
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: 0a:00:27:00:00:00
MediumType:      Ethernet
Status:          Up
VBoxNetworkName: HostInterfaceNetworking-vboxnet0

Name:            vboxnet1
GUID:            786f6276-656e-4174-8000-0a0027000001
DHCP:            Disabled
IPAddress:       192.168.57.1
NetworkMask:     255.255.255.0
IPV6Address:
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: 0a:00:27:00:00:01
MediumType:      Ethernet
Status:          Down
VBoxNetworkName: HostInterfaceNetworking-vboxnet1

Name:            vboxnet2
GUID:            786f6276-656e-4274-8000-0a0027000002
DHCP:            Disabled
IPAddress:       192.168.58.1
NetworkMask:     255.255.255.0
IPV6Address:
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: 0a:00:27:00:00:02
MediumType:      Ethernet
Status:          Down
VBoxNetworkName: HostInterfaceNetworking-vboxnet2

Name:            vboxnet3
GUID:            786f6276-656e-4374-8000-0a0027000003
DHCP:            Disabled
IPAddress:       172.17.8.1
NetworkMask:     255.255.255.0
IPV6Address:     fe80:0000:0000:0000:0800:27ff:fe00:0003
IPV6NetworkMaskPrefixLength: 64
HardwareAddress: 0a:00:27:00:00:03
MediumType:      Ethernet
Status:          Up
VBoxNetworkName: HostInterfaceNetworking-vboxnet3
[lparsons@wellington ~]$ sudo vboxmanage list dhcpservers
NetworkName:    HostInterfaceNetworking-vboxnet0
IP:             192.168.88.100
NetworkMask:    255.255.255.0
lowerIPAddress: 192.168.88.102
upperIPAddress: 192.168.88.254
Enabled:        Yes

NetworkName:    HostInterfaceNetworking-vboxnet2
IP:             0.0.0.0
NetworkMask:    0.0.0.0
lowerIPAddress: 0.0.0.0
upperIPAddress: 0.0.0.0
Enabled:        No

NetworkName:    HostInterfaceNetworking-vboxnet1
IP:             0.0.0.0
NetworkMask:    0.0.0.0
lowerIPAddress: 0.0.0.0
upperIPAddress: 0.0.0.0
Enabled:        No

See also: http://superuser.com/questions/375316/closing-gui-session-while-running-virtual-mashine-virtual-box
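
The link above discusses keeping VMs running after closing a GUI session; the usual way to avoid the problem entirely is to start and control VMs headlessly with VBoxManage. A minimal sketch, using the jenkins VM from the listing above:

[lparsons@wellington ~]$ VBoxManage startvm "jenkins" --type headless
[lparsons@wellington ~]$ VBoxManage controlvm "jenkins" savestate        # suspend to disk
[lparsons@wellington ~]$ VBoxManage controlvm "jenkins" acpipowerbutton  # clean shutdown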

Appendices

Add New Connection to Apache Directory Studio

Note

These instructions are based on contents from this original DIMS project FosWiki Provision New Users page.

Note

We are in the process of moving to a “split-horizon DNS” configuration using the subdomains ops.develop and/or devops.develop, as opposed to the original monolithic domain prisem.washington.edu that was being overlaid with both routable and non-routable IP address mappings. As a result, some configuration using the original prisem.washington.edu domain remains, such as the DN entry information shown below.

If you have never connected to our LDAP before, you will need to add the connection to Apache Directory Studio (apache-directory-studio). You can see your saved connections in the Connections tab. To add a new connection, do the following:

  1. On the LDAP menu, select New Connection. The Network Parameter dialog will display.

    _images/apache-directory-studio-newconnection.png

    Entering Network Parameters

    1. Enter a name for the connection. Use ldap.devops.develop
    2. Enter hostname: ldap.devops.develop
    3. Port should be 389
    4. No encryption
  2. You can click Check Network Parameter to check the connection

  3. Click Next. The Authentication dialog will display.

    _images/apache-directory-studio-connect.png

    LDAP Connection Authentication

    1. Leave Authentication Method as Simple Authentication
    2. Bind DN or user: cn=admin,dc=prisem,dc=washington,dc=edu
    3. Bind password: [See the FosWiki Provision New Users page for password.]
    4. Click the checkbox to save the password if it is not already checked.
    5. Click the Check Authentication button to make sure you can authenticate.
  4. Click Finish. The new connection will appear in the Connections list and will open. If you minimize the Welcome window, the LDAP Browser window will occupy the full application window and will remain visible as you operate on the connection.

    _images/apache-directory-studio-browser.png

    Main LDAP Browser window
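
If you want to verify the same connection and bind parameters from the command line (outside Apache Directory Studio), an ldapsearch invocation along these lines should work; the search base shown is an assumption derived from the bind DN above:

ldapsearch -x -H ldap://ldap.devops.develop:389 \
    -D "cn=admin,dc=prisem,dc=washington,dc=edu" -W \
    -b "dc=prisem,dc=washington,dc=edu" "(objectClass=*)"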

Contact

Section author: Dave Dittrich (@davedittrich) <dittrich @ u.washington.edu>

License

Copyright © 2014, 2016 University of Washington. All rights reserved.

Berkeley Three Clause License
=============================

Copyright (c) 2014, 2015 University of Washington. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.