Interra Catalog Documentation

Note

These docs are a work in progress.

Introduction

For the pros and cons of this library, see the ReadMe.

The following is an overview of concepts and use.

Metadata

The purpose of this project is to make it as easy as possible to catalog metadata. The primary use case is “open data”, but it can be adapted to any schema that takes a similar shape. Files can also be stored and published alongside the metadata. Services such as Carto can be used to provide an API for interacting with files, or what open data catalogs refer to as a DataStore.

A document store offers a good solution for storing metadata because it limits the surface area for receiving (JSON objects), storing and editing (JSON objects with references), and publishing (JSON objects) metadata.

By limiting the scope and functionality of a metadata catalog, this project is designed to make it easier to interact with outside services. For example, the integration with ElasticSearch consists of only two methods.

Collections and Docs

Content in this catalog is divided into collections and documents, similar to MongoDB or other document stores. Collections are types of content such as a “dataset” or “organization”, though they can be anything defined by a schema. Docs are the individual content items.

The content model contains FileStorage and MongoDB sub-classes, which are the options for storage. The FileStorage class treats the local file system as a document database, storing and retrieving results from disk. Using files to store metadata is a primary advantage of this project; however, a Mongo option is offered since it is necessary for Interra Catalog Admin, and the methods for interacting with the data (e.g. InsertOne) are identical. Note that the Mongo methods are not fully supported yet.
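The idea of treating the file system as a document database can be illustrated with a minimal sketch. The class name `FileStoreSketch`, its method names, and the `<rootDir>/<collection>/<id>.json` layout are all assumptions made for illustration; they are not the project's actual FileStorage API.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical class for illustration; not the project's actual FileStorage.
class FileStoreSketch {
  constructor(rootDir) {
    this.rootDir = rootDir;
  }

  // Path of one doc: <rootDir>/<collection>/<id>.json (assumed layout).
  docPath(collection, id) {
    return path.join(this.rootDir, collection, `${id}.json`);
  }

  // Write one doc to disk, creating the collection folder if needed.
  insertOne(collection, id, doc) {
    fs.mkdirSync(path.join(this.rootDir, collection), { recursive: true });
    fs.writeFileSync(this.docPath(collection, id), JSON.stringify(doc, null, 2));
    return doc;
  }

  // Read one doc back, or return null when it does not exist.
  findOne(collection, id) {
    const file = this.docPath(collection, id);
    if (!fs.existsSync(file)) return null;
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  }
}

// Usage: store and retrieve a dataset in a temporary directory.
const store = new FileStoreSketch(fs.mkdtempSync(path.join(os.tmpdir(), 'catalog-')));
store.insertOne('dataset', 'bike-lanes', { title: 'Bike Lanes' });
const found = store.findOne('dataset', 'bike-lanes');
console.log(found.title);
```

Because each doc is a plain JSON file, a store like this can be inspected, versioned, and edited with ordinary file tools, which is the advantage the paragraph above describes.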

Structure

This project consists of the following:

config.yml
models/
schemas/
build/
sites/
app/
cli.js
plopfile.js

The rest of the files and folders are artifacts of react boilerplate which drives the app/.

config.yml

Contains variables for the locations of the sites/, build/, and schemas/ directories and for the storage mechanism. The default storage option is FileStorage. Mongo is also an option but is not fully supported yet.

models/

Contains classes for creating, storing and building catalogs.

schemas/

Contains the base schemas for the catalog. To add new schemas, it is recommended to change the schema directory in the base config.yml file.

build/

Contains the fully-built sites separated by site name. Each site consists of an export of the collection data of the site as well as the built version of the app. The site folder is the production version of the site.

sites/

Contains the data for each site separated by site name. Each site contains a config.yml file, media files, collection data, and harvest sources.

app/

Contains the react app that builds and renders the catalog.

cli.js

A CLI for performing site tasks, built using Caporal. Run node cli.js to see a list of available commands:

validate-site-contents site
validate-site site
build-collection-data site
build-collection-data-item site collection interraId
build-config site
build-datajson site
build-routes site
build-schema site
build-search site
build-swagger site
build-apis site
build-site site
run-dev site
run-dev-dll
run-dev-tunnel site
harvest-cache site
harvest-run site
load-doc site collection interraId

React App Front-End

The front-end app is built using react boilerplate. This will likely be separated from the models at some point. Data for each site is exported to the build/ folder. Content data is exported to build/SITE-NAME/api/v1. Each site also exports swagger-based documentation of the APIs at /api, which is written to build/SITE-NAME/api/v1/swagger.json.

Development

Thanks to react boilerplate, this project includes a hot-reloading dev server. To run it, first build the DLLs with node cli.js run-dev-dll, then start the dev server with node cli.js run-dev SITE-NAME.

Create a New Site

Sites are stored in the sitesDir, which is set in the root directory’s config.yml file. The default sites directory is ./sites.

Use Plop Generator

A new site can be created using plop: node_modules/.bin/plop.

Create a Site Manually

New sites can be created manually by copying the “test-site” in ./internals/models/tests/sites/test-site.

Use Catalog Admin

Interra Catalog Admin includes a user interface for creating and editing sites.

Validating Sites

To validate your site configuration type: node cli.js validate-site SITE.

Site Configuration

Below is a description of the site configuration file: config.yml.

Properties

Property                    Type     Description                                     Required
name                        string   The name of the site                            Yes
schema                      string   The schema of the site                          Yes
identifier                  string   Unique ID of the site                           Yes
description                 string   Description of the site                         No
search                      string   Search settings of the site                     Yes
private                     object   Private settings of the site                    No
front-page-icon-collection  string   The collection to be used for front page icons  No
front-page-icons            array[]  The icons to be used for front page icons       No
fontConfig                  object   Configuration object for fonts                  No

Additional properties are allowed.

name

The name of the site

  • Type: string
  • Required: No

schema

The schema of the site

  • Type: string
  • Required: No
  • Allowed values:
    • "pod-full"
    • "pod"
    • "test-schema"

identifier

Unique ID of the site

  • Type: string
  • Required: No

private

Private settings for the site that are not exported to the production instance. Includes settings for AWS and other services.

  • Type: object
  • Required: No

description

Description of the site

  • Type: string
  • Required: No

front-page-icon-collection

The collection to be used for front page icons

  • Type: string
  • Required: No

front-page-icons

The icons to be used for front page icons

  • Type: array[]
  • Required: No

fontConfig

Configuration object for fonts

  • Type: object
  • Required: No

Schemas

Schemas are collections of JSON-Schema files along with a few other settings. Each schema should have the following files and folders:

collections/
hooks/
config.yml
map.yml
UISchema.yml
PageSchema.yml

collections/

A JSON-Schema representation of each collection for the catalog. $ref: references are supported.

hooks/

Hooks for overriding docs. This is currently required.

config.yml

The schema’s config.yml file has the following properties:

  • name Human readable name of the schema
  • api Currently only “1” is supported
  • collections A list of collections the schema describes
  • facets A list of facets that a catalog’s search page would use for this schema. Will be moved to individual sites.
  • references An object listing each collection that contains references to other collections and on what properties they are connected.
  • routeCollections Collections that should have routes in the catalog.

map.yml

Each doc has several required fields that the catalog needs:

  • title
  • identifier
  • created
  • modified

The map.yml file allows schemas to map one of the required fields to an existing field in the schema. This keeps schemas from having to store redundant data and lets them remain as untouched as possible. For example, Project Open Data uses name for the Organization instead of title. The map.yml file allows name to be mapped to title for organizations for use in the catalog.
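The mapping can be pictured as a lookup performed whenever the catalog reads a required field. The sketch below is illustrative only: the `organizationMap` object stands in for the parsed contents of map.yml, its shape is an assumption, and `getMappedField` is a hypothetical helper rather than a function from this project.

```javascript
// Hypothetical map for the "organization" collection: required field -> schema field.
// In the project this would come from map.yml; the shape here is an assumption.
const organizationMap = { title: 'name' };

// Read a required catalog field from a doc, following the map when one applies.
function getMappedField(doc, field, map) {
  const hasMapping = Object.prototype.hasOwnProperty.call(map, field);
  const sourceField = hasMapping ? map[field] : field;
  return doc[sourceField];
}

// A Project Open Data style organization that uses "name" instead of "title".
const org = { name: 'Department of Parks', identifier: 'parks' };

console.log(getMappedField(org, 'title', organizationMap));      // Department of Parks
console.log(getMappedField(org, 'identifier', organizationMap)); // parks
```

The doc itself is never modified, so the source schema stays untouched while the catalog still finds a title.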

UISchema.yml

Used to map fields in each schema to a widget for the document creation and edit forms in the Interra Catalog Admin project. This uses the React JSON-Schema Form library, which provides that API. See that project’s documentation for more details.

PageSchema.yml

Similar to the UISchema.yml file but for rendering collection pages in the react app. Still under development.

Harvesting

Harvests are configured and stored in the sites/SITE-NAME/harvest folder.

Harvest Sources

The sources.json file in the harvest folder provides a list of harvest sources. The list uses the following format:

  • id The human readable identifier for the harvest source
  • source The source of the harvest. Can be a remote http:// or https:// source or a local source file://.
  • type The type of source. Currently DataJSON is the only option.
  • filters Allows the filtering of sources by a key and value that will need to appear in each source document that is included in the harvest.
  • exclude The opposite of filters: source documents with a matching key and value are left out of the harvest.
  • overrides Override a value in each doc.
  • defaults Provides a default value only if that value is missing from each source doc.
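How the four processing options combine can be sketched as follows. This is a rough illustration under assumptions: the source entry, the key/value object shape used for filters, exclude, overrides, and defaults, and the helper names are all hypothetical, and the project's own harvester may behave differently.

```javascript
// Hypothetical harvest source entry; the key/value object shape of filters,
// exclude, overrides and defaults is an assumption for this sketch.
const source = {
  id: 'example-source',
  source: 'file://data.json',
  type: 'DataJSON',
  filters: { accessLevel: 'public' },
  exclude: { theme: 'internal' },
  overrides: { publisher: 'Example City' },
  defaults: { license: 'unspecified' },
};

// True when the doc has every key/value pair in `pairs`.
function matchesAll(doc, pairs = {}) {
  return Object.entries(pairs).every(([key, value]) => doc[key] === value);
}

// True when the doc has at least one key/value pair in `pairs`.
function matchesAny(doc, pairs = {}) {
  return Object.entries(pairs).some(([key, value]) => doc[key] === value);
}

function applySource(docs, src) {
  return docs
    .filter((doc) => matchesAll(doc, src.filters))
    .filter((doc) => !matchesAny(doc, src.exclude))
    // Defaults fill gaps, doc values win over defaults, overrides win over both.
    .map((doc) => ({ ...src.defaults, ...doc, ...src.overrides }));
}

const harvested = applySource(
  [
    { title: 'Parks', accessLevel: 'public' },
    { title: 'Payroll', accessLevel: 'non-public' },
    { title: 'Drafts', accessLevel: 'public', theme: 'internal' },
  ],
  source
);
console.log(harvested);
```

In this example only the "Parks" doc survives: "Payroll" fails the filter, "Drafts" is excluded, and the remaining doc gains the default license and the overridden publisher.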

Running Harvests

Caching

Harvest sources are first cached to local files before processing. This makes dealing with remote source timeout issues easier. Cached sources are stored in the harvest/SOURCE-NAME/SOURCE-TYPE folder. To cache the sources, type:

node cli.js harvest-cache SITE-NAME

Running

Once files are cached type the following to run a harvest:

node cli.js harvest-run SITE-NAME

The harvested documents are now stored in the site’s collections folder as site documents. The harvest source is added to the interra object in each doc.

Publish Site

Sites are published as single page apps in the build/ folder.

The published react app uses data from two sources:

  1. Exported collection data and apis in the api/v1 folder.
  2. Contents of the site’s config.yml that are exported by webpack as the interraConfig global variable for the app

Running the node cli.js command shows the available build commands.

Building Collection Data

Collection data is stored in sites/SITE-NAME/collections. The data stored there is “referenced”, meaning it contains references to documents stored in other collections. When data is exported, the documents are “dereferenced” so that they contain the full document with all of their referenced objects. Documents use the interra.id to reference each other.
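Dereferencing can be illustrated with a small sketch. The reference format (`{ ref: ... }`), the shape of the `references` setting, and the function name are assumptions made for this example; only the use of interra.id as the link between documents comes from the description above.

```javascript
// Assumed shape of the schema's `references` setting:
// collection -> { property: referenced collection }.
const references = { dataset: { publisher: 'organization' } };

// Stored ("referenced") docs, keyed by collection. Each doc carries interra.id.
const stored = {
  organization: [{ interra: { id: 'parks' }, name: 'Department of Parks' }],
  dataset: [
    // Hypothetical reference format: { ref: <interra.id of the target doc> }.
    { interra: { id: 'trails' }, title: 'Trails', publisher: { ref: 'parks' } },
  ],
};

// Return a copy of the doc with each referenced property replaced by the full target doc.
function dereference(collection, doc) {
  const result = { ...doc };
  for (const [property, target] of Object.entries(references[collection] || {})) {
    const pointer = doc[property];
    if (!pointer) continue;
    const match = stored[target].find((candidate) => candidate.interra.id === pointer.ref);
    if (match) result[property] = match;
  }
  return result;
}

const exported = dereference('dataset', stored.dataset[0]);
console.log(exported.publisher.name); // Department of Parks
```

The stored doc keeps its lightweight reference, while the exported copy embeds the full organization, so a consumer of the exported data needs no second lookup.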

Collection data is exported to build/SITE-NAME/api/v1/collections. To export all collection data, run:

node cli.js build-collection-data SITE-NAME

To export an individual document, run:

node cli.js build-collection-data-item SITE-NAME COLLECTION-NAME DOCUMENT-NAME

Building APIs

APIs are built using the node cli.js command. Available build commands include:

  • build-datajson builds Project Open Data’s data.json file.
  • build-routes builds a list of available routes in the routes.json file. Used by the react app to render collection pages.
  • build-schema builds a description of the site schema in the schema.json file.
  • build-search builds a search index in the search-index.json file if elasticLunr or simpleSearch is used.
  • build-swagger builds a swagger API file for the site at swagger.json.

All APIs for a site can be built at once using node cli.js build-apis SITE-NAME.

Building the React App

A version of the react app is exported to each site directory. The only difference between them is the site configuration contained in the interraConfig variable, which is exported as part of the app by webpack.

To build the app run node cli.js build-site SITE-NAME.
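The build steps in this section can be chained from a small Node script. The sketch below is one possible arrangement: the command names are taken from the CLI list earlier in this document, but the order shown and the helper names `publishCommands` and `publish` are assumptions, not part of the project.

```javascript
const { execFileSync } = require('child_process');

// Build the argument lists for a full publish. The command names come from the
// CLI list in this document; running them in this order is an assumption.
function publishCommands(site) {
  return [
    ['cli.js', 'validate-site', site],
    ['cli.js', 'build-collection-data', site],
    ['cli.js', 'build-apis', site],
    ['cli.js', 'build-site', site],
  ];
}

// Run each step in turn; execFileSync throws on a non-zero exit code,
// which stops the sequence at the first failing step.
function publish(site) {
  for (const args of publishCommands(site)) {
    execFileSync('node', args, { stdio: 'inherit' });
  }
}

// Print the plan only. Call publish('my-site') from the project root to run it.
console.log(publishCommands('my-site').map((args) => `node ${args.join(' ')}`).join('\n'));
```

Passing the arguments as an array to execFileSync avoids shell quoting problems if a site name contains spaces or other special characters.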

Configuration

Front Page Icons

The front page icons are associated with a certain collection. To set the collection and the items to display, add the following in config.yml:

front-page-icon-collection: [COLLECTION]
front-page-icons:
 - [COLLECTION ITEM IDS]

For example:

front-page-icon-collection: theme
front-page-icons:
 - city-planning
 - finance-and-budgeting
 - health-care
 - public-safety
 - transportation

Adding Icons to Collection Items

The actual icon types are added to the collection items with the icon key. For example:

{
  "title": "City Planning",
  "identifier": "city-planning",
  "icon": "building-12"
}

Available Icons

Below is the default icon list:

[Images: default icon list, _images/fonts1.png through _images/fonts5.png]