Image Webs Cloud Toolkit¶
This documentation explains how to build Image Webs.
Who is this for?
The tutorial section is written for anyone who wants to visualize their own large-scale image collection as an image web. No programming is required. The manual section is for users who want to hack on the code of the matching pipeline that builds the image webs. All of the code is open source and available from GitHub.
What is this?
The Image Webs Cloud Toolkit (IWCT for short) creates a computing environment on Amazon’s EC2 cloud with the Image Webs software pipeline installed and ready to use. This environment consists of two elements: the workstation and the cluster. You connect to the workstation from your local machine. From the workstation, you can prepare your image datasets and launch a cluster to run the processing pipeline on your datasets.

IWCT Computing Environment
| Element | Description |
|---|---|
| workstation | A persistent remote workstation set up for you in the AWS cloud. |
| cluster | A temporary cluster set up for you on demand in the AWS cloud. |
How much does it cost?
The IWCT software is open source and free to use. However, you pay Amazon for whatever resources you use: roughly $1 per day to run the workstation and $6 per hour to run a typical cluster. See the estimating costs section for details.
Contents¶
Tutorial¶
Setup¶
You must install the NX client on any computer from which you wish to work. Also, because IWCT launches computing resources on the public Amazon Web Services cloud, you’ll need to create an AWS account.
Note
You’ll need to provide a credit card and phone number to create an AWS account.
Install NX Client¶
The NX client allows you to use another computer remotely as if it were your local machine. It provides a fast full-screen GUI session even across poor connections. Please install the NX client for your OS:
Create AWS Account¶
- Go to http://aws.amazon.com
- Click the Sign Up button
- Follow the on-screen instructions
Note
If you are a student or researcher, you can request a grant for free AWS credits. When your grant credit is exhausted, your credit card will be charged once again.
Warning
You are responsible for any charges you accrue for use of AWS resources. When you complete a work session, always log in to your AWS management console and check that your workstation instances are stopped and your cluster instances are terminated to avoid accidentally wasting resources.
Launch workstation¶
The IWCT workstation console allows you to start, stop, and connect to your workstations from any internet-connected computer.
Note
Make sure the NX client is installed on the computer you are using and that you can log in to the AWS Management Console.
To start and connect to your workstation:
- Go to the IWCT workstation console: http://iwct.dasgizmo.net
- Click login and enter your username and password (or click register if you don’t have an account).
- Click the connect button next to a workstation (or the create new workstation button if no workstations are listed).
Use ctrl-alt-f to switch the NX client to fullscreen mode. Here are some more handy NX Client keyboard shortcuts.
Note
After login, you’ll be prompted for your AWS Access Key. Follow the guide How to get your AWS Access Key to obtain it.
Prepare dataset¶
Note
Perform the following steps from your IWCT Workstation (i.e. connect using the NX client). You may want to open this page in a browser running on the IWCT Workstation for convenience.
You can either use the default example image dataset or provide your own image files.
Optional - Prepare your own dataset¶
To use your own image collection, place your JPEG image files in an archive file (tar or zip). Any directory structure is fine; non-JPEG files will be ignored.
Note
For convenience, place your .zip or .tar file on a webserver so you can easily download it to your workstation in the next step.
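One way to build such an archive from a shell is sketched below. It is only an illustration: `pack_jpegs` is a hypothetical helper (not part of IWCT), the paths in the usage comment are placeholders, and it assumes GNU `find` and `tar` as found on Ubuntu.

```shell
# pack_jpegs SRC_DIR ARCHIVE
# Hypothetical helper: pack every JPEG file under SRC_DIR into a tar archive,
# keeping the directory structure. Matching is case-insensitive.
pack_jpegs() {
  find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -print0 \
    | tar --null -cf "$2" --files-from=-
}

# Usage (placeholder paths):
#   pack_jpegs ~/my_photos ~/my_dataset.tar
#   tar -tf ~/my_dataset.tar    # list what was packed
```

Filtering is optional, since non-JPEG files are ignored on import, but it keeps the archive smaller and the upload faster.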
Import Data¶
Open a terminal (right click on desktop and select Open in terminal from the context menu) and enter the following command.
iwct_import_dataset
At the prompt "Please enter url to dataset archive:", provide the path to your tar or zip image archive in URL format (or use the URL for the example dataset).
Note
If your file has already been copied to the workstation, use the URL prefix file:// instead of http://.
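For a local archive, the `file://` URL is just the prefix followed by the file's absolute path. A small sketch, assuming a hypothetical helper named `file_url` (not an IWCT command) and a path without characters that would need URL escaping:

```shell
# file_url PATH: print PATH as an absolute file:// URL.
# Hypothetical helper for illustration; it does not percent-encode
# spaces or other special characters.
file_url() {
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  # Avoid a doubled slash when the file sits directly under /.
  if [ "$dir" = "/" ]; then dir=""; fi
  printf 'file://%s/%s\n' "$dir" "$(basename "$1")"
}

# Usage (placeholder path):
#   file_url ~/my_dataset.tar
```

The printed URL can be pasted at the archive prompt.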
At the prompt "Please enter name for dataset (no spaces):", provide a name for your dataset (no spaces; only letters, numbers, and underscores).
This creates a new dataset directory /home/ubuntu/Desktop/datasets/dataset_name containing a PERT file, imageid_to_image.pert, which holds your dataset images.
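The naming rule from the prompt (letters, numbers, and underscores only) can be checked before running the import. A minimal sketch, assuming a hypothetical helper named `is_valid_dataset_name` that is not part of IWCT:

```shell
# is_valid_dataset_name NAME
# Hypothetical helper: succeed only if NAME is non-empty and consists solely
# of letters, digits, and underscores.
is_valid_dataset_name() {
  case "$1" in
    '' | *[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage (example names are placeholders):
#   is_valid_dataset_name paris_2012 && echo "ok"
#   is_valid_dataset_name "my photos" || echo "rejected: contains a space"
```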
Launch cluster¶
Log in or register a new account at http://www.mapr.com/Log-yourself-in-mapr

Open a terminal and start a cluster with 5 nodes:
cirrus_cluster_cli create 5
When the prompt shows this message, open the indicated URL in a browser.

Hint
Hover and right click on the URL and select Open link in browser from the context menu.
Click the Proceed anyway button.
Note
You can safely ignore the scary looking SSL warning.

Log in as the root user (username: root, password: mapr).

Click the Add license from web button.

If not yet done, create a MapR account (set the remember me option to skip this step next time).
Select M3 license and click the Register button.

Click the Return to your MapR Cluster Ui link.
Click the Apply Licenses button.

Return to the console and press ENTER key to continue...

When the script finishes, the MapR control panel should look like this (one green square for each node you requested).

Run pipeline¶
Open a terminal (right click on desktop and select “Open in terminal” from the context menu) and enter the following command.
iwct_run_pipeline
Next you will be prompted to select a dataset from the list of datasets you previously imported using the iwct_import_dataset command.
View Results¶
When the iwct_run_pipeline command completes, you will have a results/html directory with a self-contained static website that can be used to view the results. Simply copy the directory to any webserver to publish your image-graph viewer on the web.
An easy way to view the results locally is to start a Python web server in the html directory like this (the command below is for Python 2; on Python 3 the equivalent is python3 -m http.server):
cd /home/ubuntu/Desktop/datasets/{dataset_name}/results/html
python -m SimpleHTTPServer
Now open a web browser to the URL printed by the python command (e.g. http://localhost:8000).
Note
The viewer inside the html directory is self-contained and portable. Simply copy the html directory to any webserver to share your image-graph viewer with others.
Shutdown¶
When you are finished running the image webs pipeline, don’t forget to terminate the cluster. Otherwise, you may end up with a LARGE bill.
cirrus_cluster_cli destroy
Warning
All files on the maprfs:// file system will be lost. Be sure any data you wish to retain is copied to the IWCT workstation first.
When you are finished using the workstation, click the “Turn Off” button in the IWCT Console. When your workstation is shut off, you are not charged for instance-hours but continue to pay for retaining the instance storage (on an EBS volume). You can later return to the IWCT Console and click “Turn On” to resume where you left off. When you are done with the workstation and no longer wish to retain the storage, open the Options menu and click Destroy to delete the workstation and its storage.
Warning
You are responsible for any charges you accrue for use of AWS resources. When you complete a work session, always log in to your AWS management console and check that your workstation instances are stopped and your cluster instances are terminated to avoid accidentally wasting resources.
Manual¶
Workstation¶
Add Storage¶
This section explains how to add storage space.
- Log in to the IWCT Console
- Click the “Options” button next to your workstation
- Click the “Add Storage” item from the options menu
- Follow the on-screen instructions
- Start and connect to the workstation to verify that its storage was increased
Cluster¶
- How to create, manage, and destroy a cluster
Estimating costs¶
The IWCT software is open source and free to use. However, you pay Amazon for whatever resources you use.

| Element | Cost |
|---|---|
| workstation | The workstation costs $0.14/hr when running (or about $1 for a typical workday). The workstation can be turned on and off as needed (and the data persists). The workstation storage costs < $1 per month. When you are done using a workstation and no longer want to pay to keep the data, you can destroy it. |
| cluster | You can start a private cluster from your workstation in minutes when you need to compute, copy the output from the cluster FS back to your remote desktop, and destroy the cluster. The cost of running a cluster depends on the number and type of nodes. |
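As a rough worked example of the arithmetic, the sketch below multiplies hours by the rates quoted on this page: $0.14/hr for the workstation and roughly $6/hr for a typical cluster. Both rates are assumptions that will drift with AWS pricing and cluster size, `estimate_cost` is a hypothetical helper rather than an IWCT command, and storage charges are left out.

```shell
# estimate_cost WORKSTATION_HOURS CLUSTER_HOURS
# Hypothetical helper: print an approximate session cost in US dollars.
# Assumed rates (from this page): $0.14/hr workstation, $6.00/hr typical cluster.
estimate_cost() {
  LC_ALL=C awk -v w="$1" -v c="$2" \
    'BEGIN { printf "%.2f\n", w * 0.14 + c * 6.00 }'
}

# An 8-hour workday on the workstation plus a 2-hour cluster run.
estimate_cost 8 2   # prints 13.12
```

Here the cluster accounts for $12.00 of the $13.12 total, which is why destroying it promptly matters far more than turning off the workstation.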
Guides¶
How to get your AWS Access Key¶
Go to https://console.aws.amazon.com
Click on your username in the top navigation bar and then click the Security Credentials option in the drop down menu.

Click the Continue button to dismiss the dialog.

Click the + button next to Access Keys and then click the Create new root key button.

Click the Download key file button.

Open the downloaded file rootkey.csv to copy and paste your Access Key ID and Secret.

How to transfer files to and from IWCT Workstations¶
Connect To Server¶
Use the “Connect To Server” feature to drag and drop files from a remote server (via SSH, FTP, or a Windows share). To do this, click Places in the bar at the top and then click Connect To Server... in the drop-down menu.
SCP¶
Example: copy a single file
scp username1@source_host:directory1/filename1 username2@destination_host:directory2/filename2
Example: copy a directory recursively
scp -r username1@source_host:directory1 username2@destination_host:directory2
Note
Use the Options menu in the IWCT workstation console to generate the SSH or SCP command to copy files to or from your workstation.