Image Webs Cloud Toolkit

This documentation explains how to build Image Webs.

Who is this for?

The tutorial section is written for anyone who wants to visualize their own large-scale image collection as an image web. No programming is required. The manual section is for users who want to modify the code of the matching pipeline that builds the image webs. All of the code is open source and available on GitHub.

What is this?

The Image Webs Cloud Toolkit (IWCT for short) creates a computing environment on Amazon’s EC2 cloud with the Image Webs software pipeline installed and ready to use. This environment consists of two elements: the workstation and the cluster. You connect to the workstation from your local machine. From the workstation, you can prepare your image datasets and launch a cluster to run the processing pipeline on your datasets.

_images/iwct_overview_diagram.png

IWCT Computing Environment

workstation: A persistent remote workstation set up for you in the AWS cloud.
cluster: A temporary cluster set up for you on demand in the AWS cloud.

How much does it cost?

The IWCT software is open source and free to use. However, you pay Amazon for whatever resources you use. Expect to pay roughly $1 per day to run the workstation and $6 per hour to run a typical cluster. See the estimating costs section for details.

Contents

Tutorial

Setup

You must install the NX Client on any computer from which you wish to work. Also, because IWCT launches computing resources on the public Amazon Web Services cloud, you’ll need to create an AWS account.

Note

You’ll need to provide a credit card and phone number to create an AWS account.

Install NX Client

The NX Client allows you to use another computer remotely as if it were your local machine. It provides a fast full-screen GUI session even across poor connections. Please install the NX Client for your OS.

Create AWS Account
  1. Go to http://aws.amazon.com
  2. Click the Sign Up button
  3. Follow the on-screen instructions

Note

If you are a student or researcher, you can request a grant for free AWS credits. When your grant credit is exhausted, your credit card will be charged once again.

Warning

You are responsible for any charges you accrue for use of AWS resources. When you complete a work session, always log in to your AWS Management Console and check that your workstation instances are stopped and your cluster instances are terminated to avoid accidentally wasting resources.

Launch workstation

The IWCT workstation console allows you to start, stop, and connect to your workstations from any internet-connected computer.

Note

Make sure NX Client is installed on the computer you are using and that you can log in to the AWS Management Console.

To start and connect to your workstation:

  1. Go to the IWCT workstation console: http://iwct.dasgizmo.net
  2. Click login and enter your username and password (or click register if you don’t have an account).
  3. Click the connect button next to a workstation (or the create new workstation button if no workstations are listed).

Press Ctrl-Alt-F to switch NX Client to fullscreen mode. Here are some more handy NX Client keyboard shortcuts.

Note

After you log in, you’ll be prompted for your AWS Access Key. Follow the guide How to get your AWS Access Key to obtain it.

Prepare dataset

Note

Perform the following steps from your IWCT Workstation (i.e. connect using the NX client). You may want to open this page in a browser running on the IWCT Workstation for convenience.

You can either use a default example image dataset or provide your own image files.

Optional - Prepare your own dataset

To use your own image collection, you’ll need to place JPEG image files in an archive file (tar or zip). Any directory structure is fine; non-JPEG files will be ignored.

Note

For convenience, place your .zip or .tar file on a webserver so you can easily download it to your workstation in the next step.
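The archive can be built with any tool. As one possible approach, the sketch below uses Python's standard zipfile module to collect every JPEG under a directory into a zip file while keeping the directory structure. The function name build_image_archive and the choice of .jpg/.jpeg suffixes are assumptions made for illustration; they are not part of IWCT.

```python
import zipfile
from pathlib import Path

# Assumption: JPEG files are recognised by these suffixes (case-insensitive).
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def build_image_archive(image_dir, archive_path):
    """Zip every JPEG under image_dir, keeping relative paths.

    Returns the number of files written to the archive.
    """
    image_dir = Path(image_dir)
    count = 0
    # JPEG data is already compressed, so the files are stored as-is.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as archive:
        for path in sorted(image_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in JPEG_SUFFIXES:
                arcname = path.relative_to(image_dir).as_posix()
                archive.write(path, arcname=arcname)
                count += 1
    return count
```

For example, build_image_archive("my_photos", "my_photos.zip") would write my_photos.zip in the current directory; both names are placeholders.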

Import Data

Open a terminal (right click on desktop and select Open in terminal from the context menu) and enter the following command.

iwct_import_dataset

At the prompt Please enter url to dataset archive:, provide the path to your tar or zip image archive in URL format (or use the URL for the example dataset).

Note

If your file is stored locally on the workstation, use the URL prefix file:// instead of http://.
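If you are unsure how to write a local path in URL format, Python's pathlib can produce it. This is only a convenience sketch; the helper name local_archive_url and the example path are hypothetical.

```python
from pathlib import Path


def local_archive_url(archive_path):
    """Return a file:// URL for a local archive path."""
    # as_uri() requires an absolute path, so resolve it first.
    return Path(archive_path).resolve().as_uri()


# Hypothetical archive location on the workstation.
print(local_archive_url("/home/ubuntu/Desktop/my_images.zip"))
```

The printed value can be pasted at the URL prompt.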

At the prompt Please enter name for dataset (no spaces):, provide a name for your dataset (no spaces - only letters, numbers, underscores).

This creates a new dataset directory, /home/ubuntu/Desktop/datasets/dataset_name, which holds a PERT file, imageid_to_image.pert, containing your dataset images.
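To check a candidate name before running the import, the naming rule stated above (letters, numbers, and underscores only) can be expressed as a regular expression. The sketch below is an illustration of that rule and of the resulting file location; it is not the validation that iwct_import_dataset itself performs, and the function names are assumptions.

```python
import re
from pathlib import PurePosixPath

# Dataset root and PERT file name as described in this section.
DATASETS_ROOT = PurePosixPath("/home/ubuntu/Desktop/datasets")
PERT_FILENAME = "imageid_to_image.pert"

# Assumption: "letters, numbers, underscores" means ASCII characters only.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dataset_name(name):
    """Return True if name uses only letters, digits, and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None


def dataset_pert_path(name):
    """Return the expected location of the PERT file for a dataset name."""
    if not is_valid_dataset_name(name):
        raise ValueError("invalid dataset name: %r" % name)
    return DATASETS_ROOT / name / PERT_FILENAME
```

A name such as my_photos passes the check, while my photos (with a space) does not.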

Launch cluster

Log in or register a new account at http://www.mapr.com/Log-yourself-in-mapr

_images/register_with_mapr.png

Open a terminal and start a cluster with 5 nodes:

cirrus_cluster_cli create 5

When the prompt shows this message, open the indicated URL in a browser.

_images/master_ready_console.png

Hint

Hover and right click on the URL and select Open link in browser from the context menu.


Click the Proceed anyway button.

Note

You can safely ignore the scary looking SSL warning.

_images/ssl_warning.png

Login as the root user.

username: root
password: mapr

_images/login.png

Click the Add license from web button.

_images/add_license.png

If you have not already done so, create a MapR account (select the remember me option to skip this step next time).


Select the M3 license and click the Register button.

_images/register_cluster.png

Click the Return to your MapR Cluster Ui link.


Click the Apply Licenses button.

_images/apply_license.png

Return to the console and press the ENTER key to continue.

_images/continue_console.png

When the script finishes, the MapR control panel should look like this (one green square for each node you requested).

_images/mapr_console.png

Run pipeline

Open a terminal (right click on desktop and select “Open in terminal” from the context menu) and enter the following command.

iwct_run_pipeline

Next you will be prompted to select a dataset from the list of datasets you previously imported using the iwct_import_dataset command.

View Results

When the iwct_run_pipeline command completes, you will have a results/html directory with a self-contained static website that can be used to view the results. Simply copy the directory to any webserver to publish your image-graph viewer on the web.

An easy way to view the results locally is to start a Python web server for the html directory like this:

cd /home/ubuntu/Desktop/datasets/{dataset_name}/results/html
python -m SimpleHTTPServer

Now open a web browser to the URL printed by the python command (e.g. http://localhost:8000).

Note

The viewer inside the html directory is self-contained and portable. Simply copy the html directory to any webserver to share your image-graph viewer with others.
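SimpleHTTPServer is the Python 2 module name; on Python 3 the equivalent command is python3 -m http.server. If you prefer to start the server from a script, the following is a minimal sketch for Python 3.7 or later. The function name serve_directory is an assumption made for illustration and is not part of IWCT.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve a directory over HTTP on localhost in a background thread.

    Returns the server object; call shutdown() on it to stop serving.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling serve_directory with the path to your results/html directory makes the viewer available at http://localhost:8000 until the script exits or shutdown() is called.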

Shutdown

When you are finished running the image webs pipeline, don’t forget to terminate the cluster. Otherwise, you may end up with a LARGE bill.

cirrus_cluster_cli destroy

Warning

All files on the maprfs:// file system will be lost. Be sure any data you wish to retain has been copied to the IWCT workstation first.

When you are finished using the workstation, click the “Turn Off” button in the IWCT Console. While your workstation is shut off, you are not charged for instance-hours, but you continue to pay for retaining the instance storage (on an EBS volume). You can later return to the IWCT Console and click “Turn On” to resume where you left off. When you are done with the workstation and no longer wish to retain the storage, open the Options menu and click Destroy to delete the workstation and its storage.

Warning

You are responsible for any charges you accrue for use of AWS resources. When you complete a work session, always log in to your AWS Management Console and check that your workstation instances are stopped and your cluster instances are terminated to avoid accidentally wasting resources.

Manual

Workstation

Add Storage

This section explains how to add storage space.

  1. Log in to the IWCT Console
  2. Click the “Options” button next to your workstation
  3. Click the “Add Storage” item from the options menu
  4. Follow the on-screen instructions
  5. Start the workstation and connect to it to verify that the storage was increased

Cluster

  • How to create, manage, and destroy a cluster

Estimating costs

The IWCT software is open source and free to use. However, you pay Amazon for whatever resources you use.

workstation: The workstation costs $0.14/hr when running (or about $1 for a typical workday). The workstation can be turned on and off as needed (and the data persists). The workstation storage costs < $1 per month. When you are done using a workstation and no longer want to pay to keep the data, you can destroy it.
cluster: You can start a private cluster from your workstation in minutes when you need to compute, copy the output from the cluster FS back to your remote desktop, and destroy the cluster. The cost of running a cluster depends on the number of nodes and the type of nodes.
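As a rough worked example, the arithmetic behind these figures can be sketched as follows. The workstation rate is the $0.14/hr quoted above; the cluster rate defaults to the approximate $6 per hour this documentation gives for a typical cluster and should be replaced with the rate for your own node count and node type. Storage charges are not included, and the function name is an assumption made for illustration.

```python
# Hourly rates taken from this documentation; actual AWS prices may differ.
WORKSTATION_RATE_PER_HOUR = 0.14
TYPICAL_CLUSTER_RATE_PER_HOUR = 6.00  # rough figure for a "typical" cluster


def estimate_session_cost(workstation_hours, cluster_hours=0.0,
                          cluster_rate_per_hour=TYPICAL_CLUSTER_RATE_PER_HOUR):
    """Estimate the cost in dollars of one work session (storage excluded)."""
    total = (workstation_hours * WORKSTATION_RATE_PER_HOUR
             + cluster_hours * cluster_rate_per_hour)
    return round(total, 2)
```

Under these assumptions, an 8-hour workday on the workstation alone comes to about $1.12, and adding 2 hours of a typical cluster brings the estimate to about $13.12.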

Guides

How to get your AWS Access Key

Go to https://console.aws.amazon.com

Click on your username in the top navigation bar and then click the Security Credentials option in the drop down menu.

_images/how_to_get_key_security_cred.png

Click the Continue button to dismiss the dialog.

_images/how_to_get_key_ignore_iam.png

Click the + button next to Access Keys and then click the Create new root key button.

_images/how_to_get_key_new_access_keys.png

Click the Download key file button.

_images/how_to_get_key_download.png

Open the downloaded file rootkey.csv to copy and paste your Access Key ID and Secret.

_images/how_to_get_key_view_csv.png

How to transfer files to and from IWCT Workstations

Connect To Server

Use the “Connect To Server” feature to drag and drop files to and from a remote server (via SSH, FTP, or a Windows share). To do this, click Places in the bar at the top and click Connect To Server... from the drop-down menu.

SCP

Example to copy a file

scp username1@source_host:directory1/filename1 username2@destination_host:directory2/filename2

Example to copy a directory recursively

scp -r username1@source_host:directory1 username2@destination_host:directory2

Note

Use the Options menu in the IWCT workstation console to generate the SSH or SCP command to copy files to or from your workstation.