6. Site Configuration Tutorial

The following file is an example site definition provided in the PulpDist source tree (as misc/example_site.json) for demonstration purposes:

{
  "SITE_SETTINGS": [
    {
      "site_id": "default",
      "name": "Default Site",
      "storage_prefix": "/var/www/pub",
      "server_prefixes": {
        "demo_server": "sync_demo",
        "other_demo_server": "sync_demo_trees"
      },
      "source_prefixes": {
        "sync_demo": "sync_demo_trees"
      },
      "exclude_from_sync": ["*dull*"],
      "exclude_from_listing": ["*justfortesting*"]
    },
    {
      "site_id": "other",
      "name": "Other Site",
      "storage_prefix": "/var/www/pub/sync_demo"
    }
  ],
  "LOCAL_MIRRORS": [
    {
      "mirror_id": "simple_sync",
      "tree_id": "simple_sync",
      "exclude_from_sync": ["*skip*"],
      "sync_filters": ["exclude_irrelevant/"],
      "notes": {
        "basic": "note",
        "site_custom": {
          "origin": "PulpDist example repository"
        }
      }
    },
    {
      "mirror_id": "versioned_sync",
      "tree_id": "versioned_sync",
      "site_id": "other",
      "exclude_from_sync": ["*skip*"],
      "sync_filters": ["exclude_dull/"],
      "exclude_from_listing": ["relevant-but*"],
      "notes": {
        "site_custom": {
          "origin": "PulpDist example repository"
        }
      }
    },
    {
      "mirror_id": "snapshot_sync",
      "tree_id": "snapshot_sync",
      "notes": {
        "site_custom": {
          "origin": "PulpDist example repository"
        }
      }
    }
  ],
  "REMOTE_TREES": [
    {
      "tree_id": "simple_sync",
      "name": "Simple Sync Demo",
      "description": "Demonstration of the simple tree sync plugin",
      "tree_path": "simple",
      "sync_type": "simple",
      "sync_hours": 0,
      "source_id": "sync_demo"
    },
    {
      "tree_id": "versioned_sync",
      "name": "Versioned Sync Demo",
      "description": "Demonstration of the versioned tree sync plugin",
      "tree_path": "versioned",
      "sync_type": "versioned",
      "sync_hours": 12,
      "source_id": "sync_demo_other",
      "listing_pattern": "relevant*",
      "exclude_from_sync": ["*skip*"],
      "sync_filters": ["exclude_irrelevant/"]
    },
    {
      "tree_id": "snapshot_sync",
      "name": "Snapshot Sync Demo",
      "description": "Demonstration of the snapshot tree sync plugin",
      "tree_path": "snapshot",
      "sync_type": "snapshot",
      "sync_hours": 1,
      "source_id": "sync_demo",
      "listing_prefix": "re*ev",
      "latest_link": "latest-relevant",
      "exclude_from_listing": ["relevant-but*"],
      "exclude_from_sync": ["*skip*"],
      "sync_filters": ["exclude_irrelevant/", "exclude_dull/"]
    }
  ],
  "REMOTE_SOURCES": [
    {
      "source_id": "sync_demo",
      "server_id": "demo_server",
      "name": "Sync Demo Trees",
      "remote_path": "demo",
      "listing_suffix": "*"
    },
    {
      "source_id": "sync_demo_other",
      "server_id": "other_demo_server",
      "name": "Other Sync Demo Trees",
      "remote_path": "demo",
      "listing_suffix": "*"
    }
  ],
  "REMOTE_SERVERS": [
    {
      "server_id": "demo_server",
      "name": "Sync Demo Server",
      "dns": "localhost"
    },
    {
      "server_id": "other_demo_server",
      "name": "Other Sync Demo Server",
      "dns": "localhost"
    }
  ],
  "RAW_REPOS": [
    {
      "repo_id": "raw_sync",
      "display_name": "Raw Sync Demo",
      "description": "Demonstration of raw sync configuration in site config",
      "notes": {
        "pulpdist": {
          "sync_hours": 24
        },
        "site_custom": {
          "origin": "PulpDist example repository"
        }
      },
      "importer_type_id": "simple_tree",
      "importer_config": {
        "tree_name": "Raw Simple Tree",
        "remote_server": "localhost",
        "remote_path": "/demo/simple/",
        "local_path": "/var/www/pub/sync_demo_raw/",
        "exclude_from_sync": ["*skip*"],
        "sync_filters": ["exclude_irrelevant/", "exclude_dull/"]
      }
    }
  ]
}

The example configuration is actually based on the PulpDist test suite - it is designed to exercise most of the major features of the PulpDist plugins in a single comprehensive scenario (some other key features, such as the use of PROTECTED files to prevent the deletion of directories, or the creation of symlinks to the most recent snapshot directory, are testing by setting up the standard scenario and adjusting some of the settings or the filesystem layout appropriately). This section aims to break the example down into components and explain how each of them works.

6.1. Working with the Example Configuration

The example configuration is designed to be used with a local rsync daemon and the misc/create_demo_tree.py script in the source repo.

Using /var/pulpdist_example_data as the location for our demonstration tree, then /etc/rsyncd.conf should look something like this:

log file = /var/log/rsyncd.log

[demo]
comment="PulpDist Example Data Source"
path=/var/pulpdist_example_data

With pulpdist installed (or else with the src directory in a source checkout as the current directory), the following command will create a demonstration tree:

python create_demo_tree.py /var/pulpdist_example_data

The file tree created is laid out as follows (see below for details of the subtree layout represented by ...):

simple/
  ...
versioned/
  ignored/
    ...
  relevant-1/
    ...
  relevant-2/
    ...
  relevant-3/
    ...
  relevant-4/
    ...
  relevant-but-not-really/
    ...
snapshot/
  ignored/
    ...
  relevant-1/
    STATUS
    ...
  relevant-2/
    STATUS
    ...
  relevant-3/
    ...
  relevant-4/
    STATUS
    ...
  relevant-but-not-really/
    ...

The common subtrees all look like the following:

data.txt
data2.txt
skip.txt
subdir/
  data.txt
  data2.txt
  skip.txt
  subdir/
    data.txt
    data2.txt
    skip.txt
subdir2/
  data.txt
  data2.txt
  dull/
    data.txt
    data2.txt
    skip.txt
  skip.txt

All STATUS files contain the text FINISHED (and nothing else), while the example text files contain the text PulpDist test data!.

6.2. The Raw Repo Definition

The example configuration includes a single Raw Repo Definition. For ease of reference, it is reproduced here:

"RAW_REPOS": [
  {
    "repo_id": "raw_sync",
    "display_name": "Raw Sync Demo",
    "description": "Demonstration of raw sync configuration in site config",
    "notes": {
      "pulpdist": {
        "sync_hours": 24
      },
      "site_custom": {
        "origin": "PulpDist example repository"
      }
    },
    "importer_type_id": "simple_tree",
    "importer_config": {
      "tree_name": "Raw Simple Tree",
      "remote_server": "localhost",
      "remote_path": "/demo/simple/",
      "local_path": "/var/www/pub/sync_demo_raw/",
      "exclude_from_sync": ["*skip*"],
      "sync_filters": ["exclude_irrelevant/", "exclude_dull/"]
    }
  }
]

Raw repos map almost directly to the Pulp settings for the corresponding plugin. This has the advantage of making them entirely self contained and very flexible, but also makes their configuration very repetitive if multiple trees are being mirrored from the same source location.

The first three fields, repo_id, display_name and description are mainly of significance for humans. The repo ID is the unique string identifier used to refer to this repository in the command line interface, while the display name and description are shown in the web interface.

The notes field uses a feature of Pulp that allows arbitrary additional information to be associated with each repository. The site_custom data is just there as an example, but the pulpdist metadata section is used to control interaction with command line client. In this case, the value 24 means that the python -m pulpdist.manage_repos cron_sync command will synchronise this repo at midnight each day if synchronisation is enabled on the repo (like all trees in the example configuration, this one has synchronisation disabled by default).

The importer_type_id field indicates which kind of synchronisation operation is being defined. The value of simple_tree indicates that this configuration entry will set up a Simple Tree Sync on the server.

Finally, the importer_config field actually sets up the synchronisation operation. In this case, a simple tree sync maps directly to a single call to rsync, so there isn’t a great deal to be configured.

The tree_name value (along with repo_id) will appear in the sync operation logs created by the server.

The remote_server and remote_path operations are used to identify the location of the source rsync daemon (rsync over ssh is not currently supported). The local_path entry states exactly where to save the mirrored files. For the example configuration, this means files will be retrieved from rsync://localhost/demo/simple/ and saved to /var/www/pub/sync_demo_raw (the Pulp plugins run as the Apache user, and saving the files to pub makes it easy to share them again).

The last two entries are a little more interesting, as they map to rsync’s filtering options. Any files or directories mentioned in exclude_from_sync are passed via rsync’s --exclude option, while those mentioned in sync_filters are passed with the --filter option. This offers a great deal of flexibility in determining exactly what gets copied from the data source into the local mirror.

6.2.1. Synchronization Behaviour

The effect of this configuration is that, after running the following two commands:

python -m pulpdist.manage_repos enable --repo raw_sync --force
python -m pulpdist.manage_repos sync --repo raw_sync --force

The following filtered tree layout should be seen in /var/www/pub/sync_demo_raw:

data.txt
data2.txt
subdir/
  data.txt
  data2.txt
  subdir/
    data.txt
    data2.txt
subdir2/
  data.txt
  data2.txt

The skip.txt files because they match the pattern in the exclude_from_sync filter.

The dull directory and its contents get excluded by the exclude_dull/ entry in the sync_filters setting.

6.3. Local Mirror Definition: Simple Tree

Where a raw repo definition aims to include all the information needed to configure the rsync task directly, local mirror definitions are designed to work as part of a wider mirroring network, where various upstream servers publish trees for consumption by downstream clients. A local mirror definition is converted to a raw repo definition by the command line client before being uploaded to the Pulp server at a site.

The example configuration includes a number of Local Mirror Definitions.To introduce the concepts involved, we’ll first review the simplest of the definitions, which describes a Simple Tree Sync task, just like the example raw repo definition.

6.3.1. Defining the Local Mirror

The basic mirror definition appears in the LOCAL_MIRRORS section of the configuration file:

"LOCAL_MIRRORS": [
  {
    "mirror_id": "simple_sync",
    "tree_id": "simple_sync",
    "exclude_from_sync": ["*skip*"],
    "sync_filters": ["exclude_irrelevant/"],
    "notes": {
      "basic": "note",
      "site_custom": {
        "origin": "PulpDist example repository"
      }
    }
  }
]

This example creates a local mirror named simple_sync at the default site (see below for more on sites), which will be a copy of the remote tree simple_sync. While the mirror and the remote tree have the same name in the example, that isn’t a requirement in general.

The notes entry just defines a few arbitrary notes that will be added to the tree definition. This can be used to record additional information about the mirror, such as the initial rationale for creating it.

The exclude_from_sync and sync_filters entries contribute to the filter settings in the derived raw repo definition.

A local mirror definition can actually override most of the settings defined for the remote tree being mirrored. However, this particular example doesn’t do that. See the config reference for details.

6.3.2. Defining the Remote Tree

The tree_id entry names a particular Remote Tree Definition in the REMOTE_TREES section:

"REMOTE_TREES": [
  {
    "tree_id": "simple_sync",
    "name": "Simple Sync Demo",
    "description": "Demonstration of the simple tree sync plugin",
    "tree_path": "simple",
    "sync_type": "simple",
    "sync_hours": 0,
    "source_id": "sync_demo"
  }
]

The tree_id is just a unique identifier for the tree, while the name and description fields are used for display to users.

The tree_path defines the name of the directory to be synchronised, relative to the base location defined by the source_id.

It is expected that this configuration format will eventually be expanded to include a list of alternate sources for the tree, but that feature is not yet supported.

The sync_type setting selects the specific importer plugin to be used. Currently only PulpDist provided plugins are supported, but this may change in future versions.

As in the raw repo example, the sync_hours ties into the cron_sync scheduling command. In this case, a setting of 0 servers to disable automatic synchronisation, even if synchronisation is enabled for the repo.

Most of the settings in the tree definition are inherited by local mirrors that don’t override them. See the config reference for details.

6.3.3. Defining the Remote Source

The source_id entry names a particular Remote Source Definition in the REMOTE_SOURCES section:

"REMOTE_SOURCES": [
  {
    "source_id": "sync_demo",
    "server_id": "demo_server",
    "name": "Sync Demo Trees",
    "remote_path": "demo",
    "listing_suffix": "*"
  }
]

The source_id is just a unique identifier for the source, while the name field is intended for display to users.

The remote_path setting defines an the leading path component to use for the remote path when deriving the raw repo definition.

The server-id defines the rsync server that hosts the content provided by this source.

The listing_suffix isn’t relevant for a simple tree definition, but can be of significance for versioned and snapshot trees. It will be discussed in more detail later in the tutorial.

See the config reference for additional options and details.

6.3.4. Defining the Remote Server

The server_id entry names a particular Remote Server Definition in the REMOTE_SERVERS section:

"REMOTE_SERVERS": [
  {
    "server_id": "demo_server",
    "name": "Sync Demo Server",
    "dns": "localhost"
  }
]

The server_id is just a unique identifier for the source, while the name field is intended for display to users.

The dns field is either a hostname or IP address for the source rsync server.

See the config reference for additional options and details.

6.3.5. Defining the Local Site

A local mirror definition may include a site_id setting that names a particular local site configuration to be used when deriving the raw repo definition. If no specific site is named, then the default site definition is used. The default site definition is also used to provide default values that are used when a specific site definition doesn’t replace them with more specific values.

This particular mirror definition is for the default Site Definition in the SITE_SETTINGS section:

"SITE_SETTINGS": [
  {
    "site_id": "default",
    "name": "Default Site",
    "storage_prefix": "/var/www/pub",
    "server_prefixes": {
      "demo_server": "sync_demo",
      "other_demo_server": "sync_demo/sync_demo_trees"
    },
    "source_prefixes": {
      "sync_demo": "sync_demo_trees"
    },
    "exclude_from_sync": ["*dull*"],
    "exclude_from_listing": ["*justfortesting*"]
  }
]

The site_id is just a unique identifier for the site, while the name field is intended for display to users.

The storage_prefix is included in all local paths.

The server_prefixes and source_prefixes mappings are used to map server_id and source_id values to local path components. For this local mirror, the relevant entries are demo_server and sync_demo respectively.

The exclude_from_sync and exclude_from_listing settings affect the filtering used for various rsync operations. For a simple sync operation, only the exclude_from_sync operation is relevant.

See the config reference for additional options and details.

6.3.6. Equivalent Raw Repo Definition

A local mirror definition isn’t used to configure a repo directly. Instead, an equivalent raw repo definition is derived from the local mirror definition and all of the related settings. The config reference gives an overview of this process.

For the simple tree mirror, the equivalent definition would look like this:

{
  "repo_id": "simple__default",
  "display_name": "Simple Sync Demo",
  "description": "Demonstration of the simple tree sync plugin",
  "notes": {
     "basic": "note",
     "pulpdist": {
       "mirror_id": "simple_sync",
       "server_id": "demo_server",
       "site_id": "default",
       "source_id": "sync_demo",
       "sync_hours": 0,
       "tree_id": "simple_sync"
     },
     "site_custom": {
       "origin": "PulpDist example repository"
     }
  },
  "importer_type_id": "simple_tree",
  "importer_config": {
    "tree_name": "simple_sync__default",
    "remote_server": "localhost",
    "remote_path": "/demo/simple/",
    "local_path": "/var/www/pub/sync_demo/sync_demo_trees/simple/",
    "exclude_from_sync": ["*dull*", "*skip*"],
    "sync_filters": ["exclude_irrelevant/"]
  }
}

The repo_id is a combination of the mirror_id and the site_id. This allows multiple nominal sites to be configured on the same Pulp server without identifier conflicts. Note that the command line client displays these merged IDs a little differently (<mirror_id>(<site_id>)). To select a mirror by its repo id, use the back end form with the double underscore separator (<mirror_id>__<site_id>).

The display_name and description in this case come directly from the remote tree definition.

The notes are a combination of those specified in the local mirror definition, along with those automatically created by the derivation process. The derived notes include the identifiers for each of the components used to derive the repo definition, along with the sync_hours setting for use by the cron_sync scheduling operation.

The importer_type_id is derived from the sync_type setting in the remote tree definition.

The import configuration details used for a simple sync operation are common to all supported importer plugins.

tree_name is always just the derived repo_id for the local mirror.

remote_server is the dns property of the remote server definition.

remote_path in this case is a combination of the remote_path entry in the remote source definition and the tree_path entry in the remote tree definition.

local_path is a combination of the storage_prefix from the site settings, the prefixes for the remote server and source respectively (both retrieved from the site settings) and finishing with the tree_path entry from the remote tree definition (this is one of those settings where the value from the remote tree definition is used if the local mirror definition doesn’t override it).

The exclude_from_sync setting includes the value from the local mirror definition along with the value from the default site settings.

The sync_filters setting is taken directly from the local mirror definition, as this particular remote tree definition omits all of the filtering options.

Unlisted configuration options are left at their default values.

6.3.7. Synchronization Behaviour

The effect of this configuration is that, after running the following two commands:

python -m pulpdist.manage_repos enable --mirror simple_sync --force
python -m pulpdist.manage_repos sync --mirror simple_sync --force

The following filtered tree layout should be seen in /var/www/pub/sync_demo/sync_demo_trees/simple:

data.txt
data2.txt
subdir/
  data.txt
  data2.txt
  subdir/
    data.txt
    data2.txt
subdir2/
  data.txt
  data2.txt

This is the same as the tree layout produced by the example raw repo definition.

6.3.8. Why Use Mirror Definitions?

From the worked example, it may seem that mirror definitions are actually harder to use than the equivalent raw repo definitions. If you only want to mirror a single tree, this is true (that’s why the option to provide a raw repo definition exists).

The primary use case for PulpDist, however, is for an internal mirroring network, where any given rsync server will be publishing multiple trees, and any given site will be downloading multiple trees (potentially from different sources).

The advantage of the mirror definition format is that it allows this arrangement to be modelled directly - when setting up a new local mirror for an existing remote tree, all you need to know is the id of the remote tree and the id of the site where the mirror is being created, rather than all of the details necessary to create the raw repo definition by hand. Avoiding the data duplication also helps ensure consistency between mirrors, and also makes various data changes substantially easier (for example, changing the hostname of a particular upstream rsync server).

6.4. Local Mirror Definition: Versioned Tree

Where a simple sync definition maps directly to a single invocation of rsync, a versioned sync performs an initial listing step to identify a set of remote directories. A separate rsync task is then invoked for each directory. This is useful when a subset of directories from a particular remote directory are being split out to separate locations in the local mirror.

The versioned tree definition in the example site configuration is set up to show the mechanism for limiting a mirror definition to a specific site. It also shows the additional filtering options that become available once the mirroring plugin switches to the two-step process of first doing a remote listing to identify the trees to be synchronised and then issuing a separate rsync command to mirror each tree.

The “versioned tree” name comes from the original use case for this plugin, which is to mirror a subset of versions from a product directory where each version is split out into a separate directory, but new maintenance releases may be added to old version directories. In practice, the plugin works for any tree where it is desirable to mirror a subset of the available top-level directories using a set of selection filters that differ from those used for the actual mirror operations.

6.4.1. Defining the Local Mirror

The basic mirror definition appears in the LOCAL_MIRRORS section of the configuration file:

"LOCAL_MIRRORS": [
  {
    "mirror_id": "versioned_sync",
    "tree_id": "versioned_sync",
    "site_id": "other",
    "exclude_from_sync": ["*skip*"],
    "sync_filters": ["exclude_dull/"],
    "exclude_from_listing": ["relevant-but*"],
    "notes": {
      "site_custom": {
        "origin": "PulpDist example repository"
      }
    }
  }
]

This example creates a local mirror named versioned_sync at the site other, which will be a copy of the remote tree versioned_sync. As with the simple_sync example, using the same name for the local mirror and the remote tree is entirely optional.

The notes entry is again used to record additional information about the mirror, such as the initial rationale for creating it.

The exclude_from_sync and sync_filters entries contribute to the filter settings in the derived raw repo definition. These filters apply to the step of synchronising the individual trees

The exclude_from_listing setting controls which remote directories will be synchronised at all.

See the config reference for additional options and details.

6.4.2. Defining the Remote Tree

The tree_id entry names a particular Remote Tree Definition in the REMOTE_TREES section:

"REMOTE_TREES": [
  {
    "tree_id": "versioned_sync",
    "name": "Versioned Sync Demo",
    "description": "Demonstration of the versioned tree sync plugin",
    "tree_path": "versioned",
    "sync_type": "versioned",
    "sync_hours": 12,
    "source_id": "sync_demo_other",
    "listing_pattern": "relevant*",
    "exclude_from_sync": ["*skip*"],
    "sync_filters": ["exclude_irrelevant/"]
  }
]

The settings here are largely the same as those for the simple local mirror.

The setting of 12``for ``sync_hours indicates that cron_sync should sync this repo at 12 AM and 12 PM each day.

The listing_pattern setting restricts the trees which will be considered for synchronisation, while exclude_from_sync and sync_filters contribute to the rsync settings for the actual tree synchronisation tasks.

See the config reference for additional options and details.

6.4.3. Defining the Remote Source

The source_id entry names a particular Remote Source Definition in the REMOTE_SOURCES section:

"REMOTE_SOURCES": [
  {
    "source_id": "sync_demo_other",
    "server_id": "other_demo_server",
    "name": "Other Sync Demo Trees",
    "remote_path": "demo",
    "listing_suffix": "*"
  }
]

Aside from referring to a different remote server, the settings here are essentially the same as those for the simple local mirror.

While the listing_suffix can be relevant for versioned tree definitions, in this case it is superseded by the listing_pattern setting in the remote tree definition.

See the config reference for additional options and details.

6.4.4. Defining the Remote Server

The server_id entry names a particular Remote Server Definition in the REMOTE_SERVERS section:

"REMOTE_SERVERS": [
  {
    "server_id": "other_demo_server",
    "name": "Other Sync Demo Server",
    "dns": "localhost"
  }
]

As this is just an example site configuration, the “other” remote server also resolves to the local machine. This can also occur in real mirroring networks if multiple logical servers end up being combined on a single physical server.

See the config reference for additional options and details.

6.4.5. Defining the Local Site

The scope of this mirror is limited to a specific site. This means the settings for the named site become relevant, while those for the default site also still apply.

Both of these Site Definitions are given in the SITE_SETTINGS section:

"SITE_SETTINGS": [
  {
    "site_id": "default",
    "name": "Default Site",
    "storage_prefix": "/var/www/pub",
    "server_prefixes": {
      "demo_server": "sync_demo",
      "other_demo_server": "sync_demo_trees"
    },
    "source_prefixes": {
      "sync_demo": "sync_demo_trees"
    },
    "exclude_from_sync": ["*dull*"],
    "exclude_from_listing": ["*justfortesting*"]
  },
  {
    "site_id": "other",
    "name": "Other Site",
    "storage_prefix": "/var/www/pub/sync_demo"
  }
]

The interesting point to note is that this site definition overrides the storage_prefix setting. This will be used in preference to the default setting when deriving the raw repo configuration.

See the config reference for additional options and details.

6.4.6. Equivalent Raw Repo Definition

For the versioned tree mirror, the equivalent raw repo definition looks like this:

{
  "repo_id": "versioned__other",
  "display_name": "Versioned Sync Demo",
  "description": "Demonstration of the versioned tree sync plugin",
  "notes": {
    "pulpdist": {
      "mirror_id": "versioned_sync",
      "source_id": "sync_demo_other",
      "server_id": "other_demo_server",
      "site_id": "other",
      "sync_hours": 12,
      "tree_id": "versioned_sync"
    },
    "site_custom": {
      "origin": "PulpDist example repository"
    }
  },
  "importer_type_id": "versioned_tree",
  "importer_config": {
    "tree_name": "versioned_sync__other",
    "remote_server": "localhost",
    "remote_path": "/demo/versioned/",
    "local_path": "/var/www/pub/sync_demo/sync_demo_trees/versioned/",
    "exclude_from_sync": ["*dull*", "*skip*"],
    "sync_filters": ["exclude_dull/", "exclude_irrelevant/"],
    "listing_pattern": "relevant*",
    "exclude_from_listing": ["*justfortesting*", "relevant-but*"]
  }
}

The derivation of most of these settings is essentially the same as that for the simple mirror.

local_path is slightly different, in that the storage_prefix comes from the settings for the other site, while the prefix for the remote server still comes from the default site, and there is no prefix at all for the nominated remote source.

The exclude_from_sync setting includes the value from the local mirror definition along with the value from the default site settings. Note that the duplicate value from the remote tree settings has been omitted.

The sync_filters setting includes the filter options from both the local mirror and remote tree defitions.

The completely new configuration settings all relate to the remote listing step.

The listing_pattern is taken directly from the remote tree configuration and is passed to rsync to indicate which directories to include in the listing.

The exclude_from_listing setting includes the value from the local mirror definition along with the value from the default site settings.

Unlisted configuration options are left at their default values.

6.4.7. Synchronization Behaviour

The effect of this configuration is that, after running the following two commands:

python -m pulpdist.manage_repos enable --mirror versioned_sync --force
python -m pulpdist.manage_repos sync --mirror versioned_sync --force

The following filtered tree layout should be seen in /var/www/pub/sync_demo/sync_demo_trees/versioned:

relevant-1/
  ...
relevant-2/
  ...
relevant-3/
  ...
relevant-4/
  ...

Where the individual tree layouts represented by ... are the same as those produced by both the local mirror and raw repo simple sync definitions.

The ignored directory is omitted because it does not match the derived listing_pattern setting.

The relevant-but-not-really directory is omitted because it matches one of the patterns in the exclude_from_listing setting.

6.5. Local Mirror Definition: Snapshot Tree

Snapshot tree definitions are very similar to versioned tree definitions, as they also perform an initial directory listing step before proceeding to separate sync operations for each identified directory.

The difference is that snapshot sync operations are designed for systems where individual trees are never modified after their initial creation (for example, a system which creates automatic nightly builds with a date-based naming scheme for the build directories).

The state of individual trees is recorded in a STATUS at the root of each directory. If this file exists and contains the text FINISHED then it indicates that the tree is available for synchronisation (if present at the remote site) or has already been synchronised (if present at the local site). For large trees, this allows a lot of wasted data transfers to be skipped: already synchronised trees don’t need to be checked for changes, and unusable trees from the remote site don’t need to be copied in the first place.

6.5.1. Defining the Local Mirror

The basic mirror definition appears in the LOCAL_MIRRORS section of the configuration file:

"LOCAL_MIRRORS": [
  {
    "mirror_id": "snapshot_sync",
    "tree_id": "snapshot_sync",
    "notes": {
      "site_custom": {
        "origin": "PulpDist example repository"
      }
    }
  }
]

This example aims to show an almost minimal local mirror definition. The only optional information here is the note indicating why this mirror exists.

See the config reference for additional options and details.

6.5.2. Defining the Remote Tree

The tree_id entry names a particular Remote Tree Definition in the REMOTE_TREES section:

"REMOTE_TREES": [
  {
    "tree_id": "snapshot_sync",
    "name": "Snapshot Sync Demo",
    "description": "Demonstration of the snapshot tree sync plugin",
    "tree_path": "snapshot",
    "sync_type": "snapshot",
    "sync_hours": 1,
    "source_id": "sync_demo",
    "listing_prefix": "re*ev",
    "latest_link": "latest-relevant",
    "exclude_from_listing": ["relevant-but*"],
    "exclude_from_sync": ["*skip*"],
    "sync_filters": ["exclude_irrelevant/", "exclude_dull/"]
  }
]

The settings here are largely the same as those for the simple local mirror.

The setting of 1``for ``sync_hours indicates that cron_sync should sync this repo every hour.

The listing_prefix setting is another way to restrict the trees which will be considered for synchronisation. Unlike listing_pattern, which completely defines the inclusion filter, listing_prefix is combined with the listing_suffix setting from the relevant remote source definition.

The exclude_from_listing filter provides a pattern for directories that would otherwise match the inclusion filter, but should still not be synchronised.

The latest_link setting indicates that a symlink should be created that always points to the most recently synchronised tree, and that it should be called latest-relevant

As with the versioned tree, exclude_from_sync and sync_filters contribute to the rsync settings for the actual tree synchronisation tasks.

See the config reference for additional options and details.

6.5.3. Defining the Remote Source

The source_id entry names a particular Remote Source Definition in the REMOTE_SOURCES section:

"REMOTE_SOURCES": [
  {
    "source_id": "sync_demo",
    "server_id": "demo_server",
    "name": "Sync Demo Trees",
    "remote_path": "demo",
    "listing_suffix": "*"
  }
]

This is the exact same source as is used for the simple local mirror definition.

The listing_suffix becomes relevant in this case, as this source is now being used for a sync operation with a listing step based on listing_prefix. While the example configuration allows any suffix, real deployments may use this setting to enforce a standard version numbering or date formatting scheme for a particular remote source.

See the config reference for additional options and details.

6.5.4. Defining the Remote Server

The server_id entry names a particular Remote Server Definition in the REMOTE_SERVERS section:

"REMOTE_SERVERS": [
  {
    "server_id": "demo_server",
    "name": "Sync Demo Server",
    "dns": "localhost"
  }
]

As the remote server is specified by the remote source, this is the exact same server as is used for the simple local mirror definition.

See the config reference for additional options and details.

6.5.5. Defining the Local Site

Like the simple local mirror, the snapshot mirror example uses the default site settings directly.

This Site Definitions is given in the SITE_SETTINGS section:

"SITE_SETTINGS": [
  {
    "site_id": "default",
    "name": "Default Site",
    "storage_prefix": "/var/www/pub",
    "server_prefixes": {
      "demo_server": "sync_demo",
      "other_demo_server": "sync_demo_trees"
    },
    "source_prefixes": {
      "sync_demo": "sync_demo_trees"
    },
    "exclude_from_sync": ["*dull*"],
    "exclude_from_listing": ["*justfortesting*"]
  }
]

The only difference with the simple local mirror is that the exclude_from_listing setting becomes relevant, as the snapshot sync plugin includes the listing step.

See the config reference for additional options and details.

6.5.6. Equivalent Raw Repo Definition

For the versioned tree mirror, the equivalent raw repo definition looks like this:

{
  "repo_id": "snapshot_sync__default",
  "display_name": "Snapshot Sync Demo",
  "description": "Demonstration of the snapshot tree sync plugin",
  "notes": {
    "pulpdist": {
      "mirror_id": "snapshot_sync",
      "source_id": "sync_demo",
      "server_id": "demo_server",
      "sync_hours": 1,
      "site_id": "default",
      "tree_id": "snapshot_sync"
    },
    "site_custom": {
      "origin": "PulpDist example repository"
    }
  },
  "importer_type_id": "snapshot_tree",
  "importer_config": {
    "sync_filters": ["exclude_irrelevant/", "exclude_dull/"],
    "remote_path": "/test_data/snapshot/",
    "latest_link_name": "latest-relevant",
    "tree_name": "snapshot_sync__default",
    "exclude_from_sync": ["*dull*", "*skip*"],
    "exclude_from_listing": ["*justfortesting*", "relevant-but*"],
    "remote_server": "localhost",
    "listing_pattern": "re*ev*",
    "local_path": "/var/www/pub/sync_demo/sync_demo_trees/snapshot/"
  }
}

The derivation of most of these settings is essentially the same as in the previous examples.

The exclude_from_sync setting includes the value from the remote tree definition along with the value from the default site settings.

The latest_link_name and sync_filters settings are taken directly from the remote tree settings.

The listing_pattern is derived by concatenating the listing_prefix from the remote tree settings with the listing_suffix from the remote source settings.

The exclude_from_listing setting includes the value from the remote tree definition along with the value from the default site settings.

Unlisted configuration options are left at their default values.

6.5.7. Synchronization Behaviour

The effect of this configuration is that, after running the following two commands:

python -m pulpdist.manage_repos enable --mirror snapshot_sync --force
python -m pulpdist.manage_repos sync --mirror snapshot_sync --force

The following filtered tree layout should be seen in /var/www/pub/sync_demo/sync_demo_trees/snapshot:

relevant-1/
  ...
relevant-2/
  ...
relevant-4/
  ...
latest-relevant -> ./relevant-4

Where the individual tree layouts represented by ... are the same as those produced by both the local mirror and raw repo simple sync definitions.

The ignored directory is omitted because it does not match the derived listing_pattern setting.

The relevant-but-not-really directory is omitted because it matches one of the patterns in the exclude_from_listing setting.

The relevant-3 directory is omitted because it does not contain the STATUS file to indicate that the tree is valid.

The latest-relevant symlink refers to relevant-4 as that is the most recent tree to be synchronised.