Welcome to Clearwater

Project Clearwater is an open-source IMS core, developed by Metaswitch Networks and released under the GNU GPLv3. You can find more information about it on our website.

Latest Release

The latest stable release of Clearwater is “The Year of the Flood”.

Architecture

Clearwater is architected from the ground up to be fully horizontally scalable and modular, and to support running in virtualized environments. See our Clearwater Architecture page for a bird’s eye view of a Clearwater deployment and a guided tour of the various functional components that comprise it.

Looking Deeper

To look at the optional extra features and functions of your Clearwater deployment, and for discussions about Clearwater scaling and redundancy, see Exploring Clearwater.

Getting Source Code

All the source code is on GitHub, in the following repositories (and their submodules).

  • chef - Chef recipes for Clearwater deployment
  • clearwater-infrastructure - General infrastructure for Clearwater deployments
  • clearwater-logging - Logging infrastructure for Clearwater deployments
  • clearwater-live-test - Live test for Clearwater deployments
  • clearwater-readthedocs - This documentation repository
  • crest - RESTful HTTP service built on Cassandra - provides Homer (the Clearwater XDMS) and Homestead-prov (the Clearwater provisioning backend)
  • ellis - Clearwater provisioning server
  • sprout - Sprout and Bono, the Clearwater SIP router and edge proxy
  • homestead - Homestead, the Clearwater HSS-cache
  • ralf - Ralf, the Clearwater CTF

Contributing

You can contribute by making a GitHub pull request. See our documented Pull request process.

There is more information about contributing to Project Clearwater on the community page of our project website.

Help

If you want help, or want to help others by making Clearwater better, see the Support page.

License and Acknowledgements

Clearwater’s license is documented in LICENSE.txt.

It uses other open-source components as acknowledged in README.txt.

Clearwater Architecture

Clearwater was designed from the ground up to be optimized for deployment in virtualized and cloud environments. It leans heavily on established design patterns for building and deploying massively scalable web applications, adapting these design patterns to fit the constraints of SIP and IMS. The Clearwater architecture therefore has some similarities to the traditional IMS architecture but is not identical.

In particular ...

  • All components are horizontally scalable using simple, stateless load-balancing.
  • Minimal long-lived state is stored on cluster nodes, avoiding the need for complex data replication schemes. Most long-lived state is stored in back-end service nodes using cloud-optimized storage technologies such as Cassandra.
  • Interfaces between the front-end SIP components and the back-end services use RESTful web services interfaces.
  • Interfaces between the various components use connection pooling with statistical recycling of connections to ensure load is spread evenly as nodes are added and removed from each layer.

The following diagram shows a Clearwater deployment.

(Diagram: Clearwater architecture)

Bono (Edge Proxy)

The Bono nodes form a horizontally scalable SIP edge proxy providing both a SIP IMS Gm compliant interface and a WebRTC interface to clients. Client connections are load balanced across the nodes. The Bono node provides the anchor point for the client’s connection to the Clearwater system, including support for various NAT traversal mechanisms. A client is therefore anchored to a particular Bono node for the duration of its registration, but can move to another Bono node if the connection or client fails.

Sprout (SIP Router)

The Sprout nodes act as a horizontally scalable, combined SIP registrar and authoritative routing proxy, and handle client authentication and the ISC interface to application servers. The Sprout nodes also contain the in-built MMTEL application server. The Sprout cluster includes a memcached cluster storing client registration data. SIP transactions are load balanced across the Sprout cluster, so there is no long-lived association between a client and a particular Sprout node. Sprout uses web services interfaces provided by Homestead and Homer to retrieve HSS configuration such as authentication data/user profiles and MMTEL service settings.

Homestead (HSS Cache)

Homestead provides a web services interface to Sprout for retrieving authentication credentials and user profile information. It can either master the data (in which case it exposes a web services provisioning interface) or can pull the data from an IMS compliant HSS over the Cx interface. The Homestead nodes run as a cluster using Cassandra as the store for mastered/cached data.

Homer (XDMS)

Homer is a standard XDMS used to store MMTEL service settings documents for each user of the system. Documents are created, read, updated and deleted using a standard XCAP interface. As with Homestead, the Homer nodes run as a cluster using Cassandra as the data store.

Ralf (CTF)

Ralf provides an HTTP API that both Bono and Sprout can use to report billable events that should be passed to the CDF (Charging Data Function) over the Rf billing interface. Ralf uses a cluster of Memcached instances and a cluster of Chronos instances to store and manage session state, allowing it to conform to the Rf protocol.

Ellis

Ellis is a sample provisioning portal providing self sign-up, password management, line management and control of MMTEL service settings. It is not intended to be a part of production Clearwater deployments (for one thing, its MySQL underpinnings make it hard to scale horizontally) but to make the system easy to use out of the box.

Load Balancing

In a cloud-scalable system like Clearwater, load balancing is an important part of making the system horizontally scale in a robust way. Clearwater uses a variation on DNS load balancing to ensure even loading when clusters are being elastically resized to adapt to changes in total load.

As an example, a single domain name is configured for all the Sprout nodes. Each Bono node maintains a pool of SIP connections to the Sprout nodes, with the target node for each connection selected at random from the list of addresses returned by DNS. Bono selects a connection at random for each SIP transaction forwarded to Sprout. The connections in the pool are recycled on failure and periodically, selecting a different address from the list returned by the DNS server each time.
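As a rough illustration of what this looks like in DNS (the domain name and addresses here are hypothetical - substitute your own deployment’s Sprout domain), you can list the A records that this style of load balancing operates over:

# Hypothetical Sprout cluster domain with one A record per Sprout node.
dig +short sprout.example.com
# 10.0.0.11
# 10.0.0.12
# 10.0.0.13

Bono picks a target at random from a list like this for each new connection, and recycles connections periodically, so load re-spreads naturally as records are added or removed.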

A similar technique is used for the HTTP connections between Sprout and Homer/Homestead - each Sprout maintains a pool of connections load balanced across the Homer/Homestead clusters and periodically forces these connections to be recycled.

Reliability and Redundancy

Traditional telco products achieve reliability using low-level data replication, often in a one-to-one design. This is both complex and costly - and does not adapt well to a virtualized/cloud environment.

The Clearwater approach to reliability is to follow common design patterns for scalable web services - keeping most components largely stateless and storing long-lived state in specially designed, reliable, scalable, clustered data stores.

Both Bono and Sprout operate as transaction-stateful rather than dialog-stateful proxies - transaction state is typically short-lived compared to dialog state. As the anchor point for client connections (for NAT traversal), the Bono node used by a client remains on the signaling path for the duration of a SIP dialog. Any individual Sprout node is only in the signaling path for the initial transaction, and subsequent requests can be load balanced to any node in the Sprout cluster, so failure of a Sprout node does not cause established SIP dialogs to fail. Long-lived SIP state such as registration data and event subscription state is stored in a clustered, redundant shared data store (memcached) so is not tied to any individual Sprout node.

Homer and Homestead similarly only retain local state for pending requests - all long-lived state is stored redundantly in the associated Cassandra cluster.

Cloud Security

SIP communications are divided into a trusted zone (for flows between Sprout nodes, Bono nodes and trusted application servers) and an untrusted zone (for message flows between Bono nodes and external clients or other systems). These zones use different ports allowing the trusted zone to be isolated using security groups and/or firewall rules, while standard SIP authentication mechanisms are used to protect the untrusted ports.

Other interfaces such as the XCAP and Homestead interfaces use a combination of locked down ports, standard authentication schemes and shared secret API keys for security.
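As a concrete sketch of the kind of firewall rule this zoning implies (the port number and subnet below are illustrative placeholders, not Clearwater’s actual port assignments - consult the Clearwater IP Port Usage page for the real list):

# Hypothetical values: a trusted-zone SIP port and the subnet holding Sprout/Bono nodes.
TRUSTED_SIP_PORT=5058
TRUSTED_SUBNET=10.0.0.0/24
# Accept trusted-zone SIP traffic only from other Clearwater nodes...
sudo iptables -A INPUT -p tcp --dport "$TRUSTED_SIP_PORT" -s "$TRUSTED_SUBNET" -j ACCEPT
# ...and drop it from everywhere else. Untrusted-zone ports stay open and rely on SIP authentication.
sudo iptables -A INPUT -p tcp --dport "$TRUSTED_SIP_PORT" -j DROP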

Clearwater Tour

This document is a quick tour of the key features of Clearwater, from a subscriber’s point of view.

Self-service account and line creation

In a browser, go to the Account Signup tab at http://ellis.<domain>/signup.html and supply details as requested:

  • Use whatever name you like.
  • Use a real email address, or the password reset function won’t work.
  • The signup code to enter in the final box should have been given to you by the deployment administrator.
  • Hit “Sign up”.

You are now shown a new account with a phone number (which may take a second or two to show up).

  • The password for each number is only shown once - it is only stored encrypted in the system, for security, so cannot be retrieved by the system later. It can be reset by the user if they forget it (see button towards the right).

Try adding and deleting extra lines. Lines can have non-routable numbers for a purely on-net service, or PSTN numbers. Make sure you end up with just a PSTN number assigned to the account.

Address book: for this system there is a single address book, to which all numbers are added by default - see the checkbox by the line.

Connecting an X-Lite client

Now register for the new PSTN line on your newly created account with your as yet unregistered X-Lite phone.

  • Download X-Lite.
  • Install and start X-Lite
  • Navigate to Softphone / Account Settings
  • Account tab:
    • Account name: Whatever you like.
    • User ID: “SIP Username” from the line you’re adding
    • Domain: <deployment domain>, e.g., ngv.example.com
    • Password: “SIP Password” from the line you’re adding
    • Display name: Whatever you like, or omit
    • Authorization name: <SIP Username>@<domain> e.g. 6505550611@ngv.example.com
    • “Register with domain and receive calls” should be selected
    • Send outbound via: Domain
  • Topology tab:
    • Select “Auto-detect firewall traversal method using ICE (recommended)”
    • Server address: <domain>
    • User name: <SIP Username>@<domain>
    • Password: “SIP Password” from the line
  • Transport tab:
    • Signaling transport: TCP
  • Hit “OK”.
  • The phone should register.

We have now registered for the new line.

Connecting an Android phone

See here for instructions.

Making calls

  • Make a call using your new line to a PSTN number (e.g. your regular mobile phone number). Clearwater performs an ENUM lookup to decide how to route the call, and rings the number.
  • Make a call using a PSTN number (e.g. your regular mobile phone) to your new line. Use the online global address book in Ellis to find the number to call.
  • Make a call between X-Lite and the Android client, optionally using the address book.

We’ve gone from no user account, no line, and no service to a configured user and working line in five minutes. All of this works without any on-premises equipment, without dedicated servers in a data centre, and without any admin intervention at all.

Supported dialling formats

A brief note on supported dialling formats:

  • Dialling between devices on the Clearwater system. It is easiest to always dial the ten digit number, even when calling a PSTN number. Seven digit dialling is not supported, and dialling 1+10D is not supported. In theory dialling +1-10D if the destination is a Clearwater PSTN number should work, but some clients (e.g. X-Lite) silently strip out the + before sending the INVITE, so only do this if you are absolutely sure your client supports it.
  • Dialling out to the PSTN. You must dial as per the NANP. Calls to US numbers can use 10D, 1-10D or +1-10D format as appropriate (the latter only if your device supports +), and international calls must use the 011 prefix (+ followed by country code does not work). If you try to make a call using one of these number formats and it doesn’t connect, then it almost certainly wouldn’t connect if you were dialling the same number via the configured SIP trunk directly.

WebRTC support

See WebRTC support in Clearwater for how to use a browser instead of a SIP phone as a client.

VoLTE call services

Call barring

  • From Ellis, press the Configure button on the X-Lite PSTN number; this will pop up the services management interface.
  • Select the Barring tab and choose the bars you wish to apply to incoming/outgoing calls.
  • Click OK, then attempt to call X-Lite from the PSTN and see that the call is rejected without the called party being notified of anything.

Call diversion

  • From Ellis, press the Configure button on the X-Lite PSTN number; this will pop up the services management interface.
  • Select the Barring tab and turn off barring.
  • Select the Redirect tab, enable call diversion and add a new rule to unconditionally divert calls to your Android client if you are using one, or to your chosen call diversion destination if not.
  • Click OK, then call X-Lite from the PSTN and see that that number is not notified but the diverted-to number is, and can accept the call.

Other features

Clearwater also supports privacy, call hold, and call waiting.

Exploring Clearwater

After following the Install Instructions you will have a Clearwater deployment which is capable of handling on-net calls and has a number of MMTel services available through the built-in TAS.

Clearwater supports many optional extra features that you may choose to add to your deployment. Descriptions of each feature, along with the process to enable it on your system, can be found here.

Integrating Clearwater with existing services

An install of Clearwater will, by default, run with no external requirements and allow simple on-net calls between lines provisioned in Ellis. Clearwater also supports interoperating with various other SIP core devices to add more functionality and integrate with an existing SIP core.

Call features

Clearwater has a number of call features provided by its built-in TAS.

Clearwater’s support for all IR.92 Supplementary Services is discussed in this document.

Scaling and Redundancy

Clearwater is designed from the ground up to scale horizontally across multiple instances to allow it to handle large subscriber counts and to support a high level of redundancy.

Operational support

To help you manage your deployment, Clearwater provides a range of operational support tools.

Support

There are three support mechanisms for Clearwater.

  • Mailing list - This is for questions about Clearwater.
  • Issues - These are for problems with Clearwater. If you’re not sure whether some behavior is expected or not, it’s best to ask on the mailing list first.
  • Pull Requests - These allow you to contribute fixes and enhancements back to Clearwater.

Mailing List

The mailing list is managed at lists.projectclearwater.org. When posting, please try to

  • provide a clear subject line
  • provide as much diagnostic information as possible.

You’re more likely to get your question answered quickly and correctly that way!

Issues

Clearwater uses GitHub’s Issues system to track problems. To access the Issues system, first determine which project the issue is with - ask on the mailing list if you’re not sure. Then go to that component’s repository page on GitHub and click the Issues tab at the top.

Remember to do a search for existing issues covering your problem before raising a new one!

Pull Requests

Clearwater follows the “Fork & Pull” model of collaborative development, with changes being offered to the main Clearwater codebase via pull requests. If you’re interested in contributing,

  • thanks!
  • see the GitHub docs for how to create a pull request
  • before creating a pull request, please make sure you’ve successfully
    • ensured the code matches our Ruby or C++ coding guidelines
    • compiled the code
    • run the unit tests for the component
    • run the live tests against your changes.

Troubleshooting and Recovery

This document describes how to troubleshoot some common problems, and associated recovery strategies.

General

  • Clearwater components are monitored by monit and should be restarted by it if the component fails or hangs. You can check that components are running by issuing monit status. If components are not being monitored as expected, run monit monitor <component> to monitor it. To restart a component, we recommend using service <component> stop to stop the component, and allowing monit to automatically start the component again (see the example after this list).
  • The Clearwater diagnostics monitor detects crashes in native clearwater processes (Bono, Sprout and Homestead) and captures a diagnostics bundle containing a core file (among other useful information). A diagnostics bundle can also be created by running a command line script (sudo /usr/share/clearwater/bin/gather_diags).
  • By default each component logs to /var/log/<service>/, at log level 2 (which only includes errors and very high level events). To see more detailed logs you can enable debug logging; details for how to do this for each component are below. Note that if you want to run stress through your deployment, you should revert the log levels back to the default level.
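For example, a typical investigation of a misbehaving component might look like the following (sprout is used as the example component here - substitute the component you are working with):

sudo monit status            # check which processes monit is tracking and their current state
sudo monit monitor sprout    # start monitoring the component if it has dropped out of monitoring
sudo service sprout stop     # stop the component; monit will automatically start it again
ls /var/log/sprout/          # per-component logs live under /var/log/<service>/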

Ellis

The most common problem with Ellis is that it reports “Failed to update server”. This can happen for several reasons.

  • If Ellis reports “Failed to update server” when allocating a new number (either after an explicit request or as part of creating a whole new account), check that ellis has free numbers to allocate. The create_numbers.py script is safe to re-run, to ensure that numbers have been allocated.
  • Check the /var/log/ellis/ellis-*.log files. If these indicate that a timeout occurred communicating with Homer or Homestead-prov, check that the DNS entries for Homer and Homestead-prov exist and are configured correctly. If these are already correct, check Homer or Homestead-prov to see if they are behaving incorrectly.
  • To turn on debug logging for Ellis, write LOG_LEVEL = logging.DEBUG to the local_settings.py file (at /usr/share/clearwater/ellis/local_settings.py). Then restart clearwater-infrastructure (sudo service clearwater-infrastructure restart), and restart Ellis (sudo service ellis stop - it will be restarted by monit).

To examine Ellis’ database, run mysql (as root), then type use ellis; to set the correct database. You can then issue standard SQL queries on the users and numbers tables, e.g. SELECT * FROM users WHERE email = '<email address>'.
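For example (the email address used here is a hypothetical placeholder):

mysql                    # run as root, as described above
use ellis;
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM numbers LIMIT 10;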

Homer and Homestead

The most common problem on Homer and Homestead is failing to read or write to the Cassandra database.

  • Check that Cassandra is running (sudo monit status). If not, check its /var/log/cassandra/*.log files.
  • Check that Cassandra is configured correctly. First access the command-line CQL interface by running cqlsh.
    • If you’re on Homer, type use homer; to set the correct database and then describe tables; - this should report simservs. If this is missing, recreate it by running /usr/share/clearwater/cassandra-schemas/homer.sh.
    • If you’re on Homestead, there are two databases.
    • Type use homestead_provisioning; to set the provisioning database and then describe tables; - this should report service_profiles, public, implicit_registration_sets and private.
    • Type use homestead_cache; to set the cache database and then describe tables; as before - this should report impi, impi_mapping and impu.
    • If any of these are missing, recreate them by running /usr/share/clearwater/cassandra-schemas/homestead_cache.sh and /usr/share/clearwater/cassandra-schemas/homestead_provisioning.sh.
  • Check that Cassandra is clustered correctly (if running a multi-node system). nodetool status tells you which nodes are in the cluster, and how the keyspace is distributed among them.
  • If this doesn’t help, Homer logs to /var/log/homer/homer-*.log and Homestead logs to /var/log/homestead/homestead-*.log and /var/log/homestead-prov/homestead-*.log. To turn on debug logging for Homer or Homestead-prov, write LOG_LEVEL = logging.DEBUG to the local_settings.py file (at /usr/share/clearwater/<homer|homestead>/local_settings.py). Then restart clearwater-infrastructure (sudo service clearwater-infrastructure restart), and restart Homer/Homestead-prov (sudo service <homer|homestead-prov> stop - they will be restarted by monit). To turn on debug logging for Homestead write log_level=5 to /etc/clearwater/user_settings (creating it if it doesn’t exist already), then restart Homestead (sudo service homestead stop - it will be restarted by monit).

To examine Homer or Homestead’s database, run cqlsh and then type use homer;, use homestead_provisioning; or use homestead_cache; to set the correct database. You can then issue CQL queries such as SELECT * FROM impi WHERE private_id = '<private user ID>'.
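For example, to look up the cached data for a single private user identity on a Homestead node (the private ID used here is a hypothetical placeholder):

cqlsh
use homestead_cache;
describe tables;
SELECT * FROM impi WHERE private_id = '6505550000@example.com';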

Sprout

The most common problem on Sprout is lack of communication with other nodes and processes, causing registration or calls to fail. Check that Homer, Homestead and memcached are reachable and responding.

Sprout maintains registration state in a memcached cluster. It’s a little clunky to examine this data, but you can get some basic information out by running . /etc/clearwater/config ; telnet $local_ip 11211 to connect to memcached, then issuing stats items. This returns a list of entries of the form STAT items:<slab ID>:.... You can then query the keys in each of the slabs with stats cachedump <slab ID> 0.
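Putting those commands together, a minimal inspection session looks like this (the slab ID 2 is just an example taken from the stats items output):

. /etc/clearwater/config
telnet $local_ip 11211
stats items          # lists slabs, as entries of the form STAT items:<slab ID>:...
stats cachedump 2 0  # dump the keys stored in slab 2
quit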

Memcached logs to /var/log/memcached.log. It logs very little by default, but it is possible to make it more verbose by editing /etc/memcached_11211.conf, uncommenting the -vv line, and then restarting memcached.

To turn on debug logging for Sprout write log_level=5 to /etc/clearwater/user_settings (creating it if it doesn’t exist already), then restart Sprout (sudo service sprout stop - it will be restarted by monit).
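For example, one way to apply this on a Sprout node (the same pattern works for Bono and Ralf, which use the same user_settings file):

# Append the debug log level setting, creating the file if it doesn't exist.
echo 'log_level=5' | sudo tee -a /etc/clearwater/user_settings
# Stop Sprout; monit restarts it with the new log level.
sudo service sprout stop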

Sprout also uses Chronos to track registration, subscription and authorization timeouts. Chronos logs to /var/log/chronos/chronos*. Details of how to edit the Chronos configuration are here.

If you see Sprout dying/restarting with no apparent cause in /var/log/sprout/sprout*.txt, check /var/log/monit.log and /var/log/syslog around that time - these can sometimes give clues as to the cause.

Bono

The most common problem on Bono is lack of communication with Sprout. Check that Sprout is reachable and responding.

To turn on debug logging for Bono write log_level=5 to /etc/clearwater/user_settings (creating it if it doesn’t exist already), then restart Bono (sudo service bono stop - it will be restarted by monit).

If you see Bono dying/restarting with no apparent cause in /var/log/bono/bono*.txt, check /var/log/monit.log and /var/log/syslog around that time - these can sometimes give clues as to the cause.

Ralf

The most common problem on Ralf is lack of communication with a CCF. Check that your CCF is reachable and responding (if you don’t have a CCF, you don’t need a Ralf).

To turn on debug logging for Ralf write log_level=5 to /etc/clearwater/user_settings (creating it if it doesn’t exist already), then restart Ralf (sudo service ralf stop - it will be restarted by monit).

Ralf also uses Chronos to track call timeouts. Chronos logs to /var/log/chronos/chronos*. Details of how to edit the Chronos configuration are here.

If you see Ralf dying/restarting with no apparent cause in /var/log/ralf/ralf*.txt, check /var/log/monit.log and /var/log/syslog around that time - these can sometimes give clues as to the cause.

Deployment Management

Clearwater comes with a system that automates clustering and configuration sharing. If you cannot scale your deployment up or down, or if configuration changes are not being applied, this system may not be working.

  • The management system logs to /var/log/clearwater-etcd, /var/log/clearwater-cluster-manager and /var/log/clearwater-config-manager. To turn on debug logging write log_level=5 to /etc/clearwater/user_settings (creating it if it doesn’t exist already), then restart the etcd processes (sudo service <clearwater-config-manager|clearwater-cluster-manager> stop - they will be restarted by monit).

  • /usr/share/clearwater/clearwater-cluster-manager/scripts/check_cluster_state will display information about the state of the various data-store clusters used by Clearwater.

  • sudo /usr/share/clearwater/clearwater-config-manager/scripts/check_config_sync will display whether the node has learned shared configuration.

  • The following commands can be useful for inspecting the state of the underlying etcd cluster used by the management system:

    clearwater-etcdctl cluster-health
    clearwater-etcdctl member list
    

Getting Help

If none of the above helped, please try the mailing list.

Installation Instructions

These pages will guide you through installing a Clearwater deployment from scratch.

Installation Methods

What are my choices?
  1. All-in-one image, either using an AMI on Amazon EC2 or an OVF image on a private virtualization platform such as VMware or VirtualBox. This is the recommended method for trying Clearwater out.

    This installation method is very easy but offers no redundancy or scalability and relatively limited performance. As a result, it is great for familiarizing yourself with Clearwater before moving up to a larger-scale deployment using one of the following methods.

  2. An automated install, using the Chef orchestration system. This is the recommended install for spinning up large-scale deployments, since it can handle creating an arbitrarily sized deployment in a single command.

    This installation method does have its limitations though. It is currently only supported on Amazon’s EC2 cloud and assumes that DNS is being handled by Amazon’s Route53 and that Route53 controls the deployment’s root domain. It also requires access to a running Chef server. Setting this up is non-trivial but only needs to be done once, after which multiple deployments can be spun up very easily.

  3. A manual install, using Debian packages and hand-configuring each machine. This is the recommended method if Chef is not supported on your virtualization platform or your DNS is not provided by Amazon’s Route53.

    This install can be performed on any collection of machines (at least 5 are needed) running Ubuntu 14.04 as it makes no assumptions about the environment in which the machines are running. On the other hand, it requires manually configuring every machine, firewalls and DNS entries, meaning it is not a good choice for spinning up a large-scale deployment.

  4. Installing from source. If you are running on a non-Ubuntu-based OS, or need to test a code fix or enhancement you’ve made, you can also install Clearwater from source, building the code yourself. Per-component instructions are provided that cover the route from source code to running services. Familiarity with performing a manual install on Ubuntu will help with configuring your network correctly after the software is installed.

    If you install from source, especially on a non-Ubuntu OS, we’d love to hear about your experience, good or bad.

Installation Steps

The installation process for a full Clearwater deployment (i.e. not an all-in-one) can be described at a high level as follows:

  • Sourcing enough machines to host the deployment (minimum is 6) and installing an OS on them all. Ubuntu 14.04 is the recommended and tested OS for hosting Clearwater.
  • Preparing each machine to allow installation of the Clearwater software.
  • Installing the Clearwater software onto the machines.
  • Configuring firewalls to allow the various machines to talk to each other as required.
  • Configuring DNS records to allow the machines to find each other and to expose the deployment to clients.

Detailed Instructions

Once you’ve decided on your install method, follow the appropriate link below.

If you hit problems during this process, see the Troubleshooting and Recovery steps.

If you want to deploy with IPv6 addresses, see the IPv6 notes.

Next Steps

Once you have installed your deployment, you’ll want to test that it works.

All-in-one Images

While Clearwater is designed to be massively horizontally scalable, it is also possible to install all Clearwater components on a single node. This makes installation much simpler, and is useful for familiarizing yourself with Clearwater before moving up to a larger-scale deployment using one of the other installation methods.

This page describes the all-in-one images, their capabilities and restrictions and the installation options available.

Images

All-in-one images consist of

  • Ubuntu 14.04, configured to use DHCP
  • bono, sprout, homestead, homer and ellis
  • Clearwater auto-configuration scripts.

On boot, the image retrieves its IP configuration over DHCP and the auto-configuration scripts then configure the bono, sprout, homestead, homer and ellis software to match.

The image is designed to run on a virtual machine with a single core, 2GB RAM and 8GB of disk space. On EC2, this is a t2.small.

Capabilities and Restrictions

Since the all-in-one images include all of bono, sprout, homestead, homer and ellis, they are capable of almost anything a full deployment is capable of.

The key restrictions of all-in-one images are

  • hard-coded realm - the all-in-one image uses a hard-coded realm of example.com so your SIP URI might be sip:6505551234@example.com - by default, SIP uses this realm for routing but example.com won’t resolve to your host so you need to configure an outbound proxy on all your SIP clients (more details later)
  • performance - since all software runs on a single virtual machine, performance is significantly lower than even the smallest scale deployment
  • scalability - there is no option to scale up and add more virtual machines to boost capacity - for this, you must create a normal deployment
  • fault-tolerance - since everything runs on a single virtual machine, if that virtual machine fails, the service as a whole fails.

Installation Options

All-in-one images can be installed on EC2 or on your own virtualization platform, as long as it supports OVF (Open Virtualization Format).

Manual Build

If your virtualization platform is not EC2 and doesn’t support OVF, you may still be able to manually build an all-in-one node. To do so,

  1. install Ubuntu 14.04 - 64bit server edition

  2. find the preseed/late_command entry in the all-in-one image’s install script - as of writing this is as follows, but please check the linked file for the latest version

    d-i preseed/late_command string in-target bash -c '{ echo "#!/bin/bash" ; \
                                                echo "set -e" ; \
                                                echo "repo=... # filled in by make_ovf.sh" ; \
                                                echo "curl -L https://raw.githubusercontent.com/Metaswitch/clearwater-infrastructure/master/scripts/clearwater-aio-install.sh | sudo bash -s clearwater-auto-config-generic $repo" ; \
                                                echo "python create_numbers.py --start 6505550000 --count 1000" ; \
                                                echo "rm /etc/rc2.d/S99zclearwater-aio-first-boot" ; \
                                                echo "poweroff" ; } > /etc/rc2.d/S99zclearwater-aio-first-boot ; \
                                              chmod a+x /etc/rc2.d/S99zclearwater-aio-first-boot'
    
  3. run the command (stripping the d-i preseed/late_command string in-target prefix, filling in the repo variable - the default is http://repo.cw-ngv.com/stable, and stripping the \) - this will create an /etc/rc2.d/S99zclearwater-aio-first-boot script

  4. run the /etc/rc2.d/S99zclearwater-aio-first-boot script - this will install the all-in-one software and then shutdown (ready for an image to be taken)

  5. restart the node.

Automated Install Instructions

These instructions will take you through preparing for an automated install of Clearwater using Chef. For a high level look at the install process, and a discussion of the various install methods, see Installation Instructions. The automated install is the suggested method for installing a large-scale deployment of Clearwater. It can also be used to install an all-in-one node.

The automated install is only supported for deployments running in Amazon’s EC2 cloud, where DNS is being provided by Amazon’s Route53 service. If your proposed deployment doesn’t meet these requirements, you should use the Manual Install instructions instead.

The Install Process

The automated install system is made up of two phases:

  • A series of steps that need to be performed once, before any deployment is created.
  • A simpler set of instructions that will actually create the deployment.

Once the first phase has been completed, multiple deployments (of various sizes) can be created by repeating the second phase multiple times.

The first phase:

The second phase:

  • Creating a deployment environment
  • The automated install supports the existence and management of multiple deployments simultaneously; each deployment lives in its own environment to keep them separate.
  • Creating the deployment
  • Actually provisioning the servers, installing the Clearwater software and configuring DNS.

Next steps

Once you’ve followed the instructions above, your Clearwater deployment is ready for use. Be aware that the newly created DNS entries may take some time to propagate fully through your network and until propagation is complete, your deployment may not function correctly. This propagation usually takes no more than 5 minutes.

To test the deployment, you can try making some real calls, or run the provided live test framework.

Manual Install Instructions

These instructions will take you through installing a minimal Clearwater system using the latest binary packages provided by the Clearwater project. For a high level look at the install process, and a discussion of alternative install methods, see Installation Instructions.

Prerequisites

Before starting this process you will need the following:

  • Six machines running clean installs of Ubuntu 14.04 - 64bit server edition.
    • The software has been tested on Amazon EC2 t2.small instances (i.e. 1 vCPU, 2 GB RAM), so any machines at least as powerful as one of them will be sufficient.
    • Each machine will take on a separate role in the final deployment. The system requirements for each role are the same, so the allocation of roles to machines can be arbitrary.
    • The firewalls of these devices must be independently configurable. This may require some attention when commissioning the machines. For example, in Amazon’s EC2, they should all be created in separate security groups.
    • On Amazon EC2, we’ve tested both within a VPC and without. If using a VPC, we recommend using the “VPC with a Single Public Subnet” model (in the “VPC Wizard”) as this is simplest.
  • A publicly accessible IP address for each of the above machines and a private IP address for each of them (these may be the same address depending on the machine environment). These will be referred to as <publicIP> and <privateIP> below. (If running on Amazon EC2 in a VPC, you must explicitly add this IP address by ticking the “Automatically assign a public IP address” checkbox on creation.)
  • The FQDN of the machine, which resolves to the machine’s public IP address (if the machine has no FQDN, you should instead use the public IP). Referred to as <hostname> below.
  • SSH access to the above machines to a user authorised to use sudo. If your system does not come with such a user pre-configured, add a user with sudo adduser <username> and then authorize them to use sudo with sudo adduser <username> sudo.
  • A DNS zone in which to install your deployment and the ability to configure records within that zone. This zone will be referred to as <zone> below.
  • If you are not using the Project Clearwater provided Debian repository, you will need to know the URL (and, if applicable, the public GPG key) of your repository.

Configure the APT software sources

Configure each machine so that APT can use the Clearwater repository server.

Under sudo, create /etc/apt/sources.list.d/clearwater.list with the following contents:

deb http://repo.cw-ngv.com/stable binary/

Note: If you are not installing from the provided Clearwater Debian repository, replace the URL in this file so that it points to your Debian package repository.

Once this is created, install the signing key used by the Clearwater server with:

curl -L http://repo.cw-ngv.com/repo_key | sudo apt-key add -

You should check the key fingerprint with:

sudo apt-key finger

The output should contain the following - check the fingerprint carefully.

pub   4096R/22B97904 2013-04-30
      Key fingerprint = 9213 4604 DE32 7DF7 FEB7  2026 111D BE47 22B9 7904
uid                  Project Clearwater Maintainers <maintainers@projectclearwater.org>
sub   4096R/46EC5B7F 2013-04-30

Once the above steps have been performed, run the following to re-index your package manager:

sudo apt-get update

Determine Machine Roles

At this point, you should decide (if you haven’t already) which of the six machines will take on which of the Clearwater roles.

The six roles are:

  • ellis
  • bono - This role also hosts a restund STUN server
  • sprout
  • homer
  • homestead
  • ralf

Firewall configuration

We need to make sure the Clearwater nodes can all talk to each other. To do this, you will need to open up some ports in the firewalls in your network. The ports used by Clearwater are listed in Clearwater IP Port Usage. Configure all of these ports to be open to the appropriate hosts before continuing to the next step. If you are running on a platform that has multiple physical or virtual interfaces and the option to apply different firewall rules on each, make sure that you open these ports on the correct interfaces.

Create the per-node configuration

On each machine create the file /etc/clearwater/local_config with the following contents.

local_ip=<privateIP>
public_ip=<publicIP>
public_hostname=<hostname>
etcd_cluster="<comma separated list of private IPs>"

Note that the etcd_cluster variable should be set to a comma-separated list containing the private IP addresses of the nodes you created above. For example, if the nodes had addresses 10.0.0.1 to 10.0.0.6, etcd_cluster should be set to "10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6".
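Putting this together, a completed local_config for the first node of that example cluster might look like the following (the public address and hostname are illustrative placeholders):

local_ip=10.0.0.1
public_ip=203.0.113.1
public_hostname=node-1.example.com
etcd_cluster="10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6"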

If you are creating a geographically redundant deployment, then:

  • etcd_cluster should contain the IP addresses of nodes in both sites
  • you should set local_site_name and remote_site_name in /etc/clearwater/local_config.

These names are arbitrary, but should reflect the node’s location (e.g. a node in site A should have local_site_name=siteA and remote_site_name=siteB, whereas a node in site B should have local_site_name=siteB and remote_site_name=siteA).
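For example, a node in site A would add the following two lines to its /etc/clearwater/local_config:

local_site_name=siteA
remote_site_name=siteB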

If this machine will be a Sprout or Ralf node, create the file /etc/chronos/chronos.conf with the following contents:

[http]
bind-address = <privateIP>
bind-port = 7253
threads = 50

[logging]
folder = /var/log/chronos
level = 2

[alarms]
enabled = true

[exceptions]
max_ttl = 600

Install Node-Specific Software

SSH onto each machine in turn and follow the appropriate instructions below according to the role the node will take in the deployment:

Ellis

Install the Ellis package with:

sudo DEBIAN_FRONTEND=noninteractive apt-get install ellis --yes
sudo DEBIAN_FRONTEND=noninteractive apt-get install clearwater-management --yes

Bono

Install the Bono and Restund packages with:

sudo DEBIAN_FRONTEND=noninteractive apt-get install bono restund --yes
sudo DEBIAN_FRONTEND=noninteractive apt-get install clearwater-management --yes

Sprout

Install the Sprout package with:

sudo DEBIAN_FRONTEND=noninteractive apt-get install sprout --yes
sudo DEBIAN_FRONTEND=noninteractive apt-get install clearwater-management --yes

If you want the Sprout nodes to include a Memento Application server, then install the Memento packages with:

sudo DEBIAN_FRONTEND=noninteractive apt-get install memento-as memento-nginx --yes

Homer

Install the Homer packages with:

sudo DEBIAN_FRONTEND=noninteractive apt-get install homer --yes
sudo DEBIAN_FRONTEND=noninteractive apt-get install clearwater-management --yes

Homestead

Install the Homestead packages with:

sudo DEBIAN_FRONTEND=noninteractive apt-get install homestead homestead-prov clearwater-prov-tools --yes
sudo DEBIAN_FRONTEND=noninteractive apt-get install clearwater-management --yes

Ralf

Install the Ralf package with:

sudo DEBIAN_FRONTEND=noninteractive apt-get install ralf --yes
sudo DEBIAN_FRONTEND=noninteractive apt-get install clearwater-management --yes

SNMP statistics

Sprout, Bono and Homestead nodes expose statistics over SNMP. This function is not installed by default. If you want to enable it, follow the instructions in our SNMP documentation.

Provide Shared Configuration

Log onto any node in the deployment and create the file /etc/clearwater/shared_config with the following contents:

# Deployment definitions
home_domain=<zone>
sprout_hostname=sprout.<zone>
hs_hostname=hs.<zone>:8888
hs_provisioning_hostname=hs.<zone>:8889
ralf_hostname=ralf.<zone>:10888
xdms_hostname=homer.<zone>:7888

# Email server configuration
smtp_smarthost=<smtp server>
smtp_username=<username>
smtp_password=<password>
email_recovery_sender=clearwater@example.org

# Keys
signup_key=<secret>
turn_workaround=<secret>
ellis_api_key=<secret>
ellis_cookie_key=<secret>

If you wish to enable the optional I-CSCF function, also add the following:

# I-CSCF/S-CSCF configuration
icscf=5052
upstream_hostname=<sprout_hostname>
upstream_port=5052

If you wish to enable the optional external HSS lookups, add the following:

# HSS configuration
hss_hostname=<address of your HSS>
hss_port=3868

If you want to host multiple domains from the same Clearwater deployment, add the following (and configure DNS to route all domains to the same servers):

# Additional domains
additional_home_domains=<domain 1>,<domain 2>,<domain 3>...

If you want your Sprout nodes to include Gemini/Memento Application Servers add the following:

# Application Servers
gemini_enabled='Y'
memento_enabled='Y'

See the Chef instructions for more information on how to fill these in. The values marked <secret> must be set to secure values to protect your deployment from unauthorized access. To modify these settings after the deployment is created, follow these instructions.
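One possible way to generate suitably random values for these keys (this is just a suggestion, not a Clearwater requirement - any sufficiently unguessable strings will do) is:

for key in signup_key turn_workaround ellis_api_key ellis_cookie_key; do
  echo "$key=$(openssl rand -hex 16)"
done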

Now run the following to upload the configuration to a shared database and propagate it around the cluster.

/usr/share/clearwater/clearwater-config-manager/scripts/upload_shared_config

Setting up S-CSCF configuration

If I-CSCF functionality is enabled, then you will need to set up the S-CSCF configuration. S-CSCF configuration is stored in the /etc/clearwater/s-cscf.json file on each sprout node. The file stores the configuration of each S-CSCF, their capabilities, and their relative weighting and priorities.

If you require I-CSCF functionality, log onto your sprout node and create /etc/clearwater/s-cscf.json with the following contents:

{
   "s-cscfs" : [
       {   "server" : "sip:<sprout_domain>:5054;transport=TCP",
           "priority" : 0,
           "weight" : 100,
           "capabilities" : [<comma separated capabilities>]
       }
   ]
}

Then upload it to the shared configuration database by running sudo /usr/share/clearwater/clearwater-config-manager/scripts/upload_scscf_json. This means that any sprout nodes that you add to the cluster will automatically learn the configuration.

Provision Telephone Numbers in Ellis

Log onto your Ellis node and provision a pool of numbers in Ellis. The command given here will generate 1000 numbers starting at sip:6505550000@<zone>, meaning none of the generated numbers will be routable outside of the Clearwater deployment. For more details on creating numbers, see the create_numbers.py documentation.

sudo bash -c "export PATH=/usr/share/clearwater/ellis/env/bin:$PATH ;
              cd /usr/share/clearwater/ellis/src/metaswitch/ellis/tools/ ;
              python create_numbers.py --start 6505550000 --count 1000"

On success, you should see some output from python about importing eggs and then the following.

Created 1000 numbers, 0 already present in database

This command is idempotent, so it’s safe to run it multiple times. If you’ve run it once before, you’ll see the following instead.

Created 0 numbers, 1000 already present in database

DNS Records

Clearwater uses DNS records to allow each node to find the other nodes it needs to talk to in order to carry calls. At this point, you should create the DNS entries for your deployment before continuing to the next step. Clearwater DNS Usage describes the entries that are required before Clearwater will be able to carry service.

Although not required, we also suggest that you configure individual DNS records for each of the machines in your deployment to allow easy access to them if needed.

Be aware that DNS record creation can take time to propagate. You can check whether your newly configured records have propagated successfully by running dig <record> on each Clearwater machine and checking that the correct IP address is returned.
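For example, using a hypothetical deployment zone of example.com (check Clearwater DNS Usage for the full set of records your deployment actually needs):

dig +short bono.example.com     # should return the address(es) you configured for the Bono nodes
dig +short sprout.example.com   # should return the address(es) you configured for the Sprout cluster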

Where next?

Once you’ve reached this point, your Clearwater deployment is ready to handle calls. See the following pages for instructions on making your first call and running the supplied regression test suite.

Larger-Scale Deployments

If you’re intending to spin up a larger-scale deployment containing more than one node of each type, it’s recommended that you use the automated install process, as this makes scaling up and down very straightforward. If for some reason you can’t, you can add nodes to the deployment using the Elastic Scaling Instructions.

Standalone Application Servers

Gemini and Memento can run integrated into the Sprout nodes, or they can be run as standalone application servers.

To install Gemini or Memento as a standalone server, follow the same process as installing a Sprout node, but don’t add them to the existing Sprout DNS cluster.

The sprout_hostname setting in /etc/clearwater/shared_config on standalone application servers should be set to the hostname of the standalone application server cluster, for example memento.cw-ngv.com.
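For example, for a standalone Memento cluster using the hostname above, the relevant line in /etc/clearwater/shared_config on those nodes would be:

sprout_hostname=memento.cw-ngv.com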

All-in-one EC2 AMI Installation

This page describes how to launch and run an all-in-one image in Amazon’s EC2 environment.

Launch Process

Project Clearwater’s all-in-one node is already available as a pre-built AMI, which can be found in the Community AMIs list on the US East region of EC2. Launching this follows exactly the same process as for other EC2 AMIs.

Before you launch the node, you will need an EC2 keypair, and a security group configured to provide access to the required ports.

To launch the node

  • From the EC2 console, make sure you’re in the US East region, then select “Instances”, “Launch instance” and then “Classic Wizard”
  • Select the “Community AMIs” tab, and search for “Clearwater”
  • Press “Select” for the Clearwater all-in-one AMI. Take the most recent version unless you have a good reason not to.
  • Choose the Instance Type you require (the node runs fine for basic functional testing on a t2.small)
  • From the remaining pages of parameters, the only ones that need to be set are Name (give the node whatever convenient name you wish), keypair and security group.

On the last page, press “Launch”, and wait for the node to be started by EC2.

Running and Using the Image

Once the node has launched, you can SSH to it using the keypair you supplied at launch time, and username ubuntu.

You can then try making your first call and running the live tests - for these you will need the signup key, which is set to secret by default. You will probably want to change this to a more secure value - see “Modifying Clearwater settings” for how to do this.

All-in-one OVF Installation

This page describes how to install an all-in-one image on your own virtualization platform using OVF (Open Virtualization Format).

Supported Platforms

This process should work on any virtualization platform that supports OVFs using x86-64 CPUs, but has only been tested on the platforms described below.

The image uses DHCP to get its IP configuration, so the virtualization platform must either serve DHCP natively or be connected to a network with a DHCP server.

If you install/run the OVF on another platform, please let us know on the mailing list how you get on, so we can update this page.

Installation Process

To install the OVF, you must first download it from http://vm-images.cw-ngv.com/cw-aio.ova and save it to disk.

Then, you must import it into your virtualization platform. The process for this varies.

  • On VMware Player, choose File->Open a Virtual Machine from the menu and select the cw-aio.ova file you downloaded. On the Import Virtual Machine dialog that appears, the defaults are normally fine, so you can just click Import.
  • On VirtualBox, choose File->Import Appliance... from the menu. In the Appliance Import Wizard, click Choose..., select the cw-aio.ova file you downloaded and click Next. On the next tab, you can view the settings and then click Import.
  • On VMware ESXi, using the VMware vSphere Client, choose File->Deploy OVF Template... from the menu. Select the cw-aio.ova file you downloaded and click through assigning it a suitable name, location and attached network (which must support DHCP) before finally clicking Finish to create the virtual machine.

Running and Using the Image

Once you’ve installed the virtual machine, you should start it in the usual way for your virtualization platform.

If you attach to the console, you should see an Ubuntu loading screen and then be dropped at a cw-aio login prompt. The username is ubuntu and the password is cw-aio. Note that the console is hard-coded to use a US keyboard, so if you have a different keyboard you might find that keys are remapped - in particular the - key.

The OVF provides three network services.

  • SSH - username is ubuntu and password is cw-aio
  • HTTP to ellis for subscriber management - sign-up code is secret. You will probably want to change this to a more secure value - see “Modifying Clearwater settings” for how to do this.
  • SIP to bono for call signaling - credentials are provisioned through ellis.

How these network services are exposed can vary depending on the capabilities of the platform.

  • VMware Player sets up a private network between your PC and the virtual machine and normally assigns the virtual machine IP address 192.168.28.128. To access ellis, you’ll need to point your browser at http://192.168.28.128. To register over SIP, you’ll need to configure an outbound proxy of 192.168.28.128 port 5060.
  • VirtualBox uses NAT on the local IP address, exposing SSH on port 8022, HTTP on port 8080 and SIP on port 8060. To access ellis, you’ll need to point your browser at http://localhost:8080. To register over SIP, you’ll need to configure an outbound proxy of localhost port 8060.
  • VMware ESXi runs the host as normal on the network, so you can connect to it directly. To find out its IP address, log in over the console and type hostname -I. To access ellis, just point your browser at this IP address. To register over SIP, you’ll need to configure an outbound proxy for this IP address.

Once you’ve successfully connected to ellis, try making your first call - just remember to configure the SIP outbound proxy as discussed above.

Installing a Chef server

This is the first step in preparing to install a Clearwater deployment using the automated install process. These instructions will guide you through installing a Chef server on an EC2 instance.

Prerequisites

  • An Amazon EC2 account.
  • A DNS root domain configured as a hosted zone with Route53 (Amazon’s built-in DNS service, accessible from the EC2 console). This domain will be referred to as <zone> in this document.

Create the instance

Create a t2.small AWS EC2 instance running Ubuntu Server 14.04.2 LTS using the AWS web interface. The SSH keypair you provide here is referred to below as <amazon_ssh_key>. It is easiest if you use the same SSH keypair for all of your instances.

Configure its security group to allow SSH, HTTP, and HTTPS access.

Configure a DNS entry for this machine, chef-server.<zone>. (The precise name isn’t important, but we use this consistently in the documentation that follows.) It should have a non-aliased A record pointing at the public IP address of the instance as displayed in the EC2 console.

Once the instance is up and running and you can connect to it over SSH, you may continue to the next steps.

If you make a mistake, simply delete the instance permanently by selecting “Terminate” in the EC2 console, and start again. The terminated instance may take a few minutes to disappear from the console.

Install and configure the Chef server

The chef documentation explains how to install and configure the chef server. These instructions involve setting up a user and an organization.

  • The user represents you as a user of chef. Pick whatever user name and password you like. We refer to these as <chef-user-name> and <chef-user-password>
  • Organizations allow different groups to use the same chef server, but be isolated from one another. You can choose any organization name you like (e.g. “clearwater”). We refer to this as <org-name>

Follow steps 1-6 in the chef docs.

Once you have completed these steps, copy the <chef-user-name>.pem file off of the chef server - you will need it when installing a chef workstation.

Next steps

Once your server is installed, you can continue on to install a chef workstation.

Installing a Chef Client

These instructions cover commissioning a Chef client node on an EC2 server as part of the automated install process for Clearwater.

Prerequisites

  • An Amazon EC2 account.
  • A DNS root domain configured with Route53 (Amazon’s built-in DNS service, accessible from the EC2 console). This domain will be referred to as <zone> in this document.
  • You must have installed a Chef server and thus know the <chef-user-name> and <chef-user-password> for your server.
  • A web-browser with which you can visit the Chef server Web UI.

Create the instance

Create a t2.micro AWS EC2 instance running Ubuntu Server 14.04.2 LTS using the AWS web interface. Configure its security group to allow access on port 22 (for SSH). The SSH keypair you provide here is referred to below as <amazon_ssh_key>. It is easiest if you use the same SSH keypair for all of your instances.

Configure a DNS entry for this machine, chef-workstation.<zone>. (The precise name isn’t important, but we use this consistently in the documentation that follows.) It should have a non-aliased A record pointing at the public IP address of the instance as displayed in the EC2 console.

Once the instance is up and running and you can connect to it over SSH, you may continue to the next steps.

If you make a mistake, simply delete the instance permanently by selecting “Terminate” in the EC2 console, and start again. The terminated instance may take a few minutes to disappear from the console.

Install Ruby 1.9.3

The Clearwater chef plugins use features from Ruby 1.9.3. To start, run the following command.

curl -L https://get.rvm.io | bash -s stable

This may fail due to missing GPG signatures. If this happens it will suggest a command to run to resolve the problem (e.g. gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3). Run the command suggested, then run the above command again, which should now succeed.

Next install the required ruby version.

source ~/.rvm/scripts/rvm
rvm autolibs enable
rvm install 1.9.3
rvm use 1.9.3

At this point, ruby --version should indicate that 1.9.3 is in use.

Installing the Clearwater Chef extensions

On the chef workstation machine, install git and dependent libraries.

sudo apt-get install git libxml2-dev libxslt1-dev

Clone the Clearwater Chef repository.

git clone -b stable --recursive git://github.com/Metaswitch/chef.git ~/chef

This will have created a chef folder in your home directory; navigate there now.

cd ~/chef

Finally, install the Ruby libraries that are needed by our scripts.

bundle install

Creating a Chef user

You will need to configure yourself as a user on the chef server in order to use chef.

  • If you are the person who created the chef server, you will already have added yourself as a user, and will know your username and organization name and have a private key (<chef-user-name>, <org-name> and <chef-user-name>.pem respectively). These will be needed later.

  • If you did not create the chef server, you will need to add an account for yourself. SSH onto the chef server and run the following commands, substituting in appropriate values for USER_NAME, FIRST_NAME, LAST_NAME, EMAIL, PASSWORD and ORG_NAME. We’ll refer to the username as <chef-user-name> and the organization as <org-name>. This will create a <chef-user-name>.pem file in the current directory - save it for later.

    sudo chef-server-ctl user-create USER_NAME FIRST_NAME LAST_NAME EMAIL PASSWORD --filename USER_NAME.pem
    sudo chef-server-ctl org-user-add ORG_NAME USER_NAME --admin
    

Configure the chef workstation machine

Back on the chef workstation machine, create a .chef folder in your home directory.

mkdir ~/.chef

Copy the <chef-user-name>.pem file you saved off earlier to ~/.chef/<chef-user-name>.pem

Copy the validator key from the chef server to the chef workstation machine. You will need either to copy the Amazon SSH key to the workstation and run the copy from there, or to copy the validator key via an intermediate machine that has the SSH key available.

scp -i <amazon_ssh_key>.pem ubuntu@chef-server.<zone>:<org-name>-validator.pem ~/.chef/

or (on an intermediate box with the SSH key available)

scp -i <amazon_ssh_key>.pem ubuntu@chef-server.<zone>:<org-name>-validator.pem .
scp -i <amazon_ssh_key>.pem <org-name>-validator.pem ubuntu@chef-workstation.<zone>:~/.chef/

Configure knife using the built-in auto-configuration tool.

knife configure
  • Use the default value for the config location.
  • The Chef server URL should be https://chef-server.<zone>/organizations/<org-name>
  • The Chef client name should be <chef-user-name>
  • The validation client name should be <org-name>-validator
  • The validation key location should be ~/.chef/<org-name>-validator.pem.
  • The chef repository path should be ~/chef/
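
Once knife configure has run, the generated ~/.chef/knife.rb should contain values along these lines. This is a sketch using the names from earlier in this document; the exact set of settings written may vary with your chef version.

node_name                "<chef-user-name>"
client_key               "~/.chef/<chef-user-name>.pem"
validation_client_name   "<org-name>-validator"
validation_key           "~/.chef/<org-name>-validator.pem"
chef_server_url          "https://chef-server.<zone>/organizations/<org-name>"
cookbook_path            ["~/chef/"]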

Obtain AWS access keys

To allow the Clearwater extensions to create AWS instances or configure Route53 DNS entries, you will need to supply your AWS access key and secret access key. To find your AWS keys, you must be logged in as the main AWS user, not an IAM user. Go to http://aws.amazon.com and click on My Account/Console then Security Credentials. From there, under the Access Credentials section of the page, click on the Access Keys tab to view your access key. The access key is referred to as <accessKey> below. To see your secret access key, just click on the Show link under Secret Access Key. The secret access key will be referred to as <secretKey> below.

Add deployment-specific configuration

Now add the following lines to the bottom of your ~/.chef/knife.rb file.

# AWS deployment keys.
knife[:aws_access_key_id]     = "<accessKey>"
knife[:aws_secret_access_key] = "<secretKey>"

# Signup key. Anyone with this key can create accounts
# on the deployment. Set to a secure value.
knife[:signup_key]        = "secret"

# TURN workaround password, used by faulty WebRTC clients.
# Anyone with this password can use the deployment to send
# arbitrary amounts of data. Set to a secure value.
knife[:turn_workaround]   = "password"

# Ellis API key. Used by internal scripts to
# provision, update and delete user accounts without a password.
# Set to a secure value.
knife[:ellis_api_key]     = "secret"

# Ellis cookie key. Used to prevent spoofing of Ellis cookies. Set
# to a secure value.
knife[:ellis_cookie_key]  = "secret"

# SMTP credentials as supplied by your email provider.
knife[:smtp_server]       = "localhost"
knife[:smtp_username]     = ""
knife[:smtp_password]     = ""

# Sender to use for password recovery emails. For some
# SMTP servers (e.g., Amazon SES) this email address
# must be validated or email sending will fail.
knife[:email_sender]      = "clearwater@example.com"

Fill in the values appropriate to your deployment using a text editor as directed.

  • The AWS deployment keys are the ones you obtained above.

  • The keys and passwords marked “Set to a secure value” above should be set to secure random values, to protect your deployment from unauthorised access. An easy way to generate a secure random key on a Linux system is as follows:

    head -c6 /dev/random | base64
    

The signup_key must be supplied by new users when they create an account on the system.

The turn_workaround must be supplied by certain WebRTC clients when using TURN. It controls access to media relay function.

The ellis_api_key and ellis_cookie_key are used internally.

  • The SMTP credentials are required only for password recovery. If you leave them unchanged, this function will not work.

Test your settings

Test that knife is configured correctly

knife client list

This should return a list of clients and not raise any errors.

Upload Clearwater definitions to Chef server

The Chef server needs to be told the definitions for the various Clearwater node types. To do this, run

cd ~/chef
knife cookbook upload apt
knife cookbook upload chef-solo-search
knife cookbook upload clearwater
find roles/*.rb -exec knife role from file {} \;

You will need to re-do this step if you make any changes to your knife.rb settings.

Next steps

At this point, the Chef server is up and running and ready to manage installs and the chef client is ready to create deployments. The next step is to create a deployment environment.

Creating a Chef deployment environment

These instructions make up part of the automated install process for Clearwater. They will take you through creating a deployment environment, which will be used to define the specific settings for a Clearwater deployment.

Prerequisites

Creating a keypair

You need to configure the AWS keypair used to access your Clearwater deployment. You may reuse the one you created when creating your Chef client and server; or you may create a new one (EC2 Dashboard > Network & Security > Key Pairs > Create Key Pair) for finer-grained access rights. The name you give your keypair will be referred to as <keypair_name> from this point.

Amazon will give you a <keypair_name>.pem file; copy that file to the /home/ubuntu/.chef/ directory on your Chef client, and make it readable only to your user with chmod 0700 /home/ubuntu/.chef ; chmod 0400 /home/ubuntu/.chef/<keypair_name>.pem.

Creating the environment

Before creating an environment, choose a name (e.g. “clearwater”) which will be referred to as <name> in the following steps. Now, on the chef workstation machine, create a file in ~/chef/environments/<name>.rb with the following contents:

name "<name>"
description "Clearwater deployment - <name>"
cookbook_versions "clearwater" => "= 0.1.0"
override_attributes "clearwater" => {
  "root_domain" => "<zone>",
  "availability_zones" => ["us-east-1a", "us-east-1b"],
  "repo_server" => "http://repo.cw-ngv.com/stable",
  "number_start" => "6505550000",
  "number_count" => 1000,
  "keypair" => "<keypair_name>",
  "keypair_dir" => "~/.chef/",
  "pstn_number_count" => 0,
  "gr" => false
}

Note that the value of keypair should not include the trailing .pem. Note also that for an all-in-one node the root_domain parameter is superfluous and will be ignored.

By default, your deployment will be created in the US East (North Virginia) region. However, if you want to deploy in another region, you must

  • set the region property in the override_attributes "clearwater" block (e.g. to us-west-2)
  • set the availability_zones property correspondingly (e.g. ["us-west-2a", "us-west-2b"])
  • open ~/chef/plugins/knife/boxes.rb, search for @@default_image, and comment in the EC2 AMI entry for the region you desire.
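
For example, to deploy in US West (Oregon), the override_attributes "clearwater" block above would contain lines like the following (illustrative only; pick the zones that suit you):

  "region" => "us-west-2",
  "availability_zones" => ["us-west-2a", "us-west-2b"],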

These fields override attributes defined and documented in the clearwater-infrastructure role.

If you want to use a different SIP registration period from the default (which is 5 minutes) add a line like "reg_max_expires" => <timeout_in_secs>, to the override_attributes "clearwater" block.

If you want to test geographic redundancy function, use "gr" => true instead of "gr" => false. This will cause odd-numbered and even-numbered nodes to be treated as separate logical “sites” - they are not actually in different EC2 regions (cross-region security group rules make that difficult), but this does allow you to see and test GR function.

To modify these settings after the deployment is created, follow these instructions.

Uploading the environment

The newly created environment needs to be uploaded to the Chef server before it can be used.

cd ~/chef
knife environment from file environments/<name>.rb

Next steps

At this point, your deployment environment is created and can be used to create a new deployment.

Creating a deployment with Chef

This is the final stage in creating a Clearwater deployment using the automated install process. Here we will actually create the deployment - commissioning servers and configuring DNS records.

Prerequisites

Creating a Deployment

You now have two options: you can create an All-in-One node, where all the Clearwater components run on a single machine instance, or a larger deployment, which can potentially have numerous instances of each component.

Creating an All-in-One (“AIO”) node

To create a single machine instance running all the Clearwater components, run the following on the chef workstation machine.

cd ~/chef
knife box create cw_aio -E <name>

Optional arguments

The following modifier is available.

  • --index <value> - Name the new node <name>-cw_aio-<value> to permit distinguishing it from others.

Creating a larger deployment

To kick off construction of the deployment, run the following on the chef workstation machine.

cd ~/chef
knife deployment resize -E <name> --ralf-count 1 -V

Follow the on-screen prompts.

This will:

  • Commission AWS instances
  • Install the Clearwater software
  • Configure security groups
  • Configure DNS
  • Start the Clearwater services

Optional arguments

The following modifiers are available to set the scale of your deployment.

  • --bono-count NUM - Create NUM Bono nodes (default is 1)
  • --sprout-count NUM - Create NUM Sprout nodes (default is 1)
  • --homer-count NUM - Create NUM Homer nodes (default is 1)
  • --homestead-count NUM - Create NUM Homestead nodes (default is 1)
  • --ralf-count NUM - Create NUM Ralf nodes (default is 0)
  • --subscribers NUM - Auto-scale the deployment to handle NUM subscribers.
  • Due to a known limitation of the install process, Ellis will allocate 1000 numbers regardless of this value.
  • To bulk provision subscribers (without using Ellis), follow these instructions
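
For example, to build a deployment with two of each core node type plus a single Ralf node, you might run something like the following (the counts are purely illustrative):

cd ~/chef
knife deployment resize -E <name> --bono-count 2 --sprout-count 2 --homer-count 2 --homestead-count 2 --ralf-count 1 -V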

More detailed documentation on the available Chef commands is available here.

Next steps

Now that your deployment is installed and ready to use, you’ll want to test it.

Making calls through Clearwater

These instructions will take you through the process of making a call on a Clearwater deployment.

Prerequisites

  • You’ve installed Clearwater
  • You have access to two SIP clients.
  • If you have installed Clearwater on VirtualBox using the All-In-One image you must use Zoiper as one of your clients. For the other client (or for other install modes) you may use any standard SIP client; we’ve tested with the following:
  • You have access to a web-browser. We’ve tested with:
  • Google Chrome

Work out your base domain

If you installed Clearwater manually, your base DNS name will simply be <zone>. If you installed using the automated install process, your base DNS name will be <name>.<zone>. If you installed an All-in-One node, your base name will be example.com.

For the rest of these instructions, the base DNS name will be referred to as <domain>.

Work out your All-in-One node’s identity

This step is only required if you installed an All-in-One node, either from an AMI or an OVF. If you installed manually or using the automated install process, just move on to the next step.

If you installed an All-in-One node from an Amazon AMI, you need the public DNS name that EC2 has assigned to your node. This will look something like ec2-12-34-56-78.compute-1.amazonaws.com and can be found on the EC2 Dashboard on the “instances” panel.

If you installed an All-in-One node from an OVF image in VMPlayer or VMWare, you need the IP address that was assigned to the node via DHCP. You can find this out by logging into the node’s console and typing hostname -I.

If you installed an All-in-One node from an OVF in VirtualBox, you simply need localhost.

For the rest of these instructions, the All-in-One node’s identity will be referred to as <aio-identity>.

Work out your Ellis URL

If you installed Clearwater manually or using the automated install process, your Ellis URL will simply be http://ellis.<domain>. If you installed an All-in-One node, your Ellis URL will be http://<aio-identity>.

Create a number for your client

In your browser, navigate to the Ellis URL you worked out above.

Sign up as a new user, using the signup code you set as signup_key when configuring your deployment.

Ellis will automatically allocate you a new number and display its password to you. Remember this password as it will only be displayed once. From now on, we will use <username> to refer to the SIP username (e.g. 6505551234) and <password> to refer to the password.

Configure your client

Client configuration methods vary by client, but the following information should be sufficient to allow your client to register with Clearwater.

  • SIP Username: <username>
  • SIP Password: <password>
  • SIP Domain: <domain>
  • Authorization Name: <username>@<domain>
  • Transport: TCP
  • STUN/TURN/ICE:
  • Enabled: true
  • Server: <domain> (or <aio-identity> on an All-in-One node)
  • Username: <username>@<domain>
  • Password: <password>

Extra configuration to use an All-in-One node

If you are using an All-in-One node, you will also need to configure an outbound proxy at your client.

  • Outbound Proxy address: <aio-identity>
  • Port: 5060 (or 8060 if installed in VirtualBox)

Once these settings have been applied, your client will register with Clearwater. Note that X-Lite may need to be restarted before it will set up a STUN connection.

Configure a second number and client

Create a new number in Ellis, either by creating a new Ellis user, or by clicking the Add Number button in the Ellis UI to add one for the existing user.

Configure a second SIP client with the new number’s credentials as above.

Make the call

From one client (Zoiper if running an All-in-One node in VirtualBox), dial the <username> of the other client to make the call. Answer the call and check you have two-way media.

Next steps

Now that you’ve got a basic call working, check that all the features of your deployment are working by running the live tests or explore Clearwater to see what else Clearwater offers.

Using the Android SIP Client with Clearwater

These instructions detail how to enable the stock Android SIP client. The instructions vary depending on your handset and Android version; in particular, the Nexus 6 running 5.1.1 has different menus from the other known supported devices listed. If you get them working on another device, please add to the list. Equally, if they do not work, please add your device to the unsupported list.

Instructions

  1. Ensure that you have a data connection (either through 3G or WiFi).
  2. Launch your standard dialer (often called Phone or Dialer).
  3. Bring up the menu. Depending on your phone, this may be via a physical Menu button, or via an icon (consisting of three dots in a vertical line in either the top right or bottom right).
  4. From the menu, navigate to the internet calling accounts menu.
    • On many devices this is located in Settings (or Call Settings) > Internet Call Settings > Accounts.
    • On the Nexus 6 running 5.1.1 this is located in Settings > Calls > Calling accounts > Internet Call Settings > Internet calling (SIP) accounts.
  5. Press Add account and enter its configuration.
    • Set Username, Password and Server to their usual values.
    • Open the Optional settings panel to see and set the following options.
    • Set Transport type to TCP.
    • Set Authentication username to “sip:<username>@<server>”, substituting in your own SIP username and server domain (note: once we switch to storing user digest by private id, this will change).
    • Some SIP clients do not have an Authentication username field. To make calls through Clearwater using these clients you will have to configure an external HSS to allow one subscriber to have two private identities. This will allow the subscriber to have one private identity of the form 1234@example.com (the IMS standard form), and one private identity 1234 (which the phone will default to in the absence of an Authentication username).
  6. Once done, choose Save.
  7. Enable making internet calls.
    • On many devices this is located in the Call menu (which can be accessed from the dialler by selecting Settings as above). Once here, under Internet Call Settings, select Use Internet Calling and set this to Ask for each call.
    • On a Nexus 6 running 5.1.1 this is located in the Calling accounts menu. From here, select Use internet calling and set this to Only for Internet Calls.
  8. Enable receiving internet calls (note this significantly reduces your battery life).
    • On many devices this is located in Settings > Internet Call Settings > Accounts; once here, select Receive Incoming Calls.
    • On the Nexus 6 running 5.1.1 this is located in Settings > Calls > Calling accounts; once here, select Receive Incoming Calls.

Adding a SIP contact to your phone book
  • For many devices:
    • navigate to the People App. Select + in the top right to add a new contact to the phone book.
    • Under Add more information select Add another field. In this menu choose Internet call and enter your contact’s SIP address.
  • For the Nexus 6 running 5.1.1:
    • navigate to the Contacts App. Select + in the bottom right to add a new contact to the phone book.
    • Under the Sip field, enter your contact’s SIP address.

Making a call
  • Make a call from the standard dialler to another Clearwater line, e.g. 6505551234 and hit the Call button.
  • You will be prompted whether to use internet calling. Select internet calling and your call should connect.

Known supported devices

  • Samsung Galaxy Nexus - 4.2.2
  • Orange San Francisco II - 2.3.5
  • HTC One M9 - 5.1
  • Samsung Galaxy S4 - 4.3
  • Nexus 6 - 5.1.1

Using Zoiper with Clearwater

These instructions detail how to configure the Zoiper Android/iOS SIP client to work against a Clearwater system.

Instructions

  1. Download the application from the Play Store or iTunes.
  2. Once installed, go to Config -> Accounts -> Add account -> SIP
  3. Fill in the following details:
  • Account name: Clearwater
  • Host: the root domain of your Clearwater deployment
  • Username: your username, e.g. 6505551234
  • Password: password for this account
  • Authentication user: <username>@<server> e.g. 6505551234@my-clearwater.com
  4. Hit Save
  5. If your account was successfully enabled you should see a green tick notification
  6. Go back to the main Config menu and select Codecs
  7. Unselect everything except uLaw and aLaw.
  8. Hit Save
  9. You are now ready to make calls.

Known supported devices

  • Samsung Galaxy S3
  • Samsung Galaxy S2
  • Samsung Galaxy S4
  • Nexus One
  • Huawei G510
  • Huawei G610
  • Qmobile A8
  • HTC 1X
  • iPhone 4
  • iPhone 4S

Known unsupported devices

  • Galaxy Nexus (no media)

Upgrading a Clearwater Deployment

This article explains how to upgrade a Project Clearwater deployment.

Quick Upgrade

The quickest way to upgrade is to simply upgrade all your nodes in any order. There is no need to wait for one node to finish upgrading before moving onto the next.

This is recommended if you:

  • Do not have a fault-tolerant deployment (e.g. the All-in-One image, or a deployment with one node of each type) OR
  • Do not need to provide uninterrupted service to users of your system.

Note that during a quick upgrade your users may be unable to register, or make and receive calls. If you do have a fault-tolerant deployment and need uninterrupted service, see “Seamless Upgrade” below.

Manual Install

If you installed your system using the Manual Install Instructions, run sudo apt-get update -o Dir::Etc::sourcelist="sources.list.d/clearwater.list" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0" && sudo apt-get install clearwater-infrastructure && sudo clearwater-upgrade on each node.

Chef Install

If you installed your deployment with chef:

  • Follow the instructions to update the Chef server
  • Run sudo apt-get update -o Dir::Etc::sourcelist="sources.list.d/clearwater.list" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0" && sudo apt-get install clearwater-infrastructure && sudo chef-client && sudo clearwater-upgrade on each node.

Seamless Upgrade

If your deployment contains at least two nodes of each type (excluding Ellis) it is possible to perform a seamless upgrade that does not result in any loss of service.

To achieve this the nodes must be upgraded one at a time and in a specific order: first all Ralf nodes (if present), then Homestead, Homer, Sprout, Memento (if present), Gemini (if present), Bono, and finally Ellis.

For example if your deployment has two Homesteads, two Homers, two Sprouts, two Bonos, and one Ellis, you should upgrade them in the following order:

  • Homestead-1
  • Homestead-2
  • Homer-1
  • Homer-2
  • Sprout-1
  • Sprout-2
  • Bono-1
  • Bono-2
  • Ellis

Manual Install

If you installed your system using the Manual Install Instructions run sudo apt-get update -o Dir::Etc::sourcelist="sources.list.d/clearwater.list" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0" && sudo apt-get install clearwater-infrastructure && sudo clearwater-upgrade on each node in the order described above.

Chef Install

If you installed your deployment with chef:

  • Follow the instructions to update the Chef server
  • Run sudo apt-get update -o Dir::Etc::sourcelist="sources.list.d/clearwater.list" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0" && sudo apt-get install clearwater-infrastructure && sudo chef-client && sudo clearwater-upgrade on each node in the order described above.

Modifying Clearwater Settings

This page discusses how to change settings on a Clearwater system. Most settings can be changed on an existing deployment (e.g. security keys and details of external servers), but some are so integral to the system (e.g. the SIP home domain) that the best way to change them is to recreate the Clearwater deployment entirely.

Modifying Settings in /etc/clearwater/shared_config

This will have a service impact of up to five minutes.

Settings in /etc/clearwater/shared_config can be safely changed without entirely recreating the system. The one exception to this is the home_domain; if you want to change this go to the “Starting from scratch” section instead.

To change one of these settings, if you are using Clearwater’s automatic configuration sharing functionality:

  • Edit /etc/clearwater/shared_config on one node and change the setting to the new value.
  • Run /usr/share/clearwater/clearwater-config-manager/scripts/upload_shared_config to upload the new config to etcd.

If you are not using automatic clustering, do the following on each node:

  • Edit /etc/clearwater/shared_config and change the setting to the new value.
  • Run sudo service clearwater-infrastructure restart to ensure that the configuration changes are applied consistently across the node.
  • Restart the individual component by running the appropriate command below. This will stop the service and allow monit to restart it.
    • Sprout - sudo service sprout quiesce
    • Bono - sudo service bono quiesce
    • Homestead - sudo service homestead stop && sudo service homestead-prov stop
    • Homer - sudo service homer stop
    • Ralf - sudo service ralf stop
    • Ellis - sudo service ellis stop
    • Memento - sudo service memento stop

Modifying Sprout JSON Configuration

This configuration can be freely modified without impacting service.

Some of the more complex sprout-specific configuration is stored in JSON files

  • /etc/clearwater/s-cscf.json - contains information to allow the Sprout I-CSCF to select an appropriate S-CSCF to handle some requests.
  • /etc/clearwater/bgcf.json - contains routing rules for the Sprout BGCF.
  • /etc/clearwater/enum.json - contains ENUM rules when using file-based ENUM instead of an external ENUM server.

To change one of these files, if you are using Clearwater’s automatic configuration sharing functionality:

  • Edit the file on one of your sprout nodes.
  • Run one of sudo /usr/share/clearwater/clearwater-config-manager/scripts/upload_{scscf|bgcf|enum}_json depending on which file you modified.
  • The change will be automatically propagated around the deployment and will start being used.

If you are not using automatic clustering do the following on each node:

  • Make the necessary changes to the file.
  • Run sudo service sprout reload to make sprout pick up the changes.

Starting from scratch

This will have a service impact of up to half an hour.

If other settings (such as the Clearwater home domain) are being changed, we recommend that users delete their old deployment and create a new one from scratch, either with Chef or manually. This ensures that the new settings are consistently applied.

Clearwater Configuration Options Reference

This document describes all the Clearwater configuration options that can be set in /etc/clearwater/shared_config, /etc/clearwater/local_config or /etc/clearwater/user_settings.

You should follow this process when changing most of these settings. However, for settings in the “Local settings” or “User settings” sections you should:

  • Modify the configuration file
  • Run sudo service clearwater-infrastructure restart to regenerate any dependent configuration files
  • Restart the relevant Clearwater service(s) using the following commands as appropriate for the node.
    • Sprout - sudo service sprout quiesce
    • Bono - sudo service bono quiesce
    • Homestead - sudo service homestead stop && sudo service homestead-prov stop
    • Homer - sudo service homer stop
    • Ralf - sudo service ralf stop
    • Ellis - sudo service ellis stop
    • Memento - sudo service memento stop

Local Settings

This section describes settings that are specific to a single node and are not applicable to any other nodes in the deployment. They are entered early on in the node’s life and are not normally changed. These options should be set in /etc/clearwater/local_config. Once this file has been created, it is highly recommended that you do not change it unless instructed to do so. If you find yourself needing to change these settings, you should destroy and recreate the node instead.

  • local_ip - this should be set to an IP address which is configured on an interface on this system, and can communicate on an internal network with other Clearwater nodes and IMS core components like the HSS.
  • public_ip - this should be set to an IP address accessible to external clients (SIP UEs for Bono, web browsers for Ellis). It does not need to be configured on a local interface on the system - for example, in a cloud environment which puts instances behind a NAT.
  • public_hostname - this should be set to a hostname which resolves to public_ip, and will communicate with only this node (i.e. not be round-robined to other nodes). It can be set to public_ip if necessary.
  • node_idx - an index number used to distinguish this node from others of the same type in the cluster (for example, sprout-1 and sprout-2). Optional.
  • etcd_cluster - this is a comma separated list of IP addresses, for example etcd_cluster=10.0.0.1,10.0.0.2. It should be set in one of two ways:
  • If the node is forming a new deployment, it should contain the IP addresses of all the nodes that are forming the new deployment (including this node).
  • If the node is joining an existing deployment, it should contain the IP addresses of all the nodes that are currently in the deployment.
  • etcd_cluster_key - this is the name of the etcd datastore clusters that this node should join. It defaults to the function of the node (e.g. a Homestead node defaults to using ‘homestead’ as its etcd datastore cluster name when it joins the Cassandra cluster). This must be set explicitly on nodes that colocate function.
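
As an illustration, a complete /etc/clearwater/local_config for the first Sprout node in a small deployment might look something like this (all addresses and hostnames are placeholders):

local_ip=10.0.0.1
public_ip=203.0.113.1
public_hostname=sprout-1.example.com
node_idx=1
etcd_cluster=10.0.0.1,10.0.0.2,10.0.0.3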

Core options

This section describes options for the basic configuration of a Clearwater deployment - such as the hostnames of the six node types and external services such as email servers or the Home Subscriber Server. These options should be set in the /etc/clearwater/shared_config file (in the format name=value, e.g. home_domain=example.com).

  • home_domain - this is the main SIP domain of the deployment, and determines which SIP URIs Clearwater will treat as local. It will usually be a hostname resolving to all the P-CSCFs (e.g. the Bono nodes). Other domains can be specified through additional_home_domains, but Clearwater will treat this one as the default (for example, when handling tel: URIs).
  • sprout_hostname - a hostname that resolves by DNS round-robin to all Sprout nodes in the cluster.
  • bono_hostname - a hostname that resolves by DNS round-robin to all Bono nodes in the cluster.
  • hs_hostname - a hostname that resolves by DNS round-robin to all Homesteads in the cluster. Should include the HTTP port (usually 8888). This is also used (without the port) as the Origin-Realm of the Diameter messages Homestead sends.
  • hs_provisioning_hostname - a hostname that resolves by DNS round-robin to all Homesteads in the cluster. Should include the HTTP provisioning port (usually 8889). Not needed when using an external HSS.
  • ralf_hostname - a hostname that resolves by DNS round-robin to all Ralf nodes in the cluster. Should include the port (usually 9888). This is also used (without the port) as the Origin-Realm of the Diameter messages Ralf sends. Optional if no Ralf nodes exist.
  • cdf_identity - a Diameter identity that represents the address of an online Charging Function. Subscribers provisioned through Ellis will have this set as their Primary Charging Collection Function on P-Charging-Function-Addresses headers on responses to their successful REGISTERs, and Bono will add it similarly to originating requests.
  • xdms_hostname - a hostname that resolves by DNS round-robin to all Homer nodes in the cluster. Should include the port (usually 7888).
  • hss_realm - this sets the Destination-Realm of your external HSS. When this field is set, Homestead will then attempt to set up multiple Diameter connections using an SRV lookup on this realm.
  • hss_hostname - this sets the Destination-Host of your external HSS, if you have one. Homestead will also try and establish a Diameter connection to this host (on port 3868) if no SRV-discovered peers exist.
  • signup_key - this sets the password which Ellis will require before allowing self-sign-up.
  • turn_workaround - if your STUN/TURN clients are not able to authenticate properly (for example, because they can’t send the @ sign), this specifies an additional password which will authenticate clients even without a correct username.
  • smtp_smarthost - Ellis allows password recovery by email. This sets the SMTP server used to send those emails.
  • smtp_username - Ellis allows password recovery by email. This sets the username used to log in to the SMTP server.
  • smtp_password - Ellis allows password recovery by email. This sets the password used to log in to the SMTP server.
  • email_recovery_sender - Ellis allows password recovery by email. This sets the email address those emails are sent from.
  • ellis_api_key - sets a key which can be used to authenticate automated requests to Ellis, by setting it as the value of the X-NGV-API header. This is used to expire demo users regularly.
  • ellis_hostname - a hostname that resolves to Ellis, if you don’t want to use ellis.home_domain. This should match Ellis’s SSL certificate, if you are using one.
  • memento_hostname - a hostname that resolves by DNS round-robin to all Mementos in the cluster (the default is memento.<home_domain>). This should match Memento’s SSL certificate, if you are using one.
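
As an illustration, the core section of /etc/clearwater/shared_config for a deployment using the home domain example.com might look something like the following (all hostnames, ports and secrets are placeholders, and a real file may contain further options from the sections below):

home_domain=example.com
sprout_hostname=sprout.example.com
bono_hostname=bono.example.com
hs_hostname=hs.example.com:8888
hs_provisioning_hostname=hs.example.com:8889
ralf_hostname=ralf.example.com:9888
xdms_hostname=homer.example.com:7888
signup_key=secret
turn_workaround=password
ellis_api_key=secret
ellis_cookie_key=secret
smtp_smarthost=localhost
smtp_username=username
smtp_password=password
email_recovery_sender=clearwater@example.com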

Advanced options

This section describes optional configuration options, particularly for ensuring conformance with other IMS devices such as HSSes, ENUM servers, application servers with strict requirements on Record-Route headers, and non-Clearwater I-CSCFs. These options should be set in the /etc/clearwater/shared_config file (in the format name=value, e.g. icscf=5052).

  • icscf - the port which Sprout nodes are providing I-CSCF service on. If not set, Sprout will only provide S-CSCF function.

  • scscf - the port which Sprout nodes are providing S-CSCF service on. If this is not set but icscf is, Sprout will only provide I-CSCF function. If neither is set, this will default to 5054 and Sprout will only provide S-CSCF function.

  • homestead_provisioning_port - the HTTP port the Homestead provisioning interface listens on. Defaults to 8889. Not needed when using an external HSS.

  • sas_server - the IP address or hostname of your Metaswitch Service Assurance Server for call logging and troubleshooting. Optional.

  • reg_max_expires - determines the maximum expires= parameter Sprout will set on Contact headers at registrations, and therefore the amount of time before a UE has to re-register - must be less than 2^31 ms (approximately 25 days). Default is 300 (seconds).

  • sub_max_expires - determines the maximum Expires header Sprout will set in subscription responses, and therefore the amount of time before a UE has to re-subscribe - must be less than 2^31 ms (approximately 25 days).

  • upstream_hostname - the I-CSCF which Bono should pass requests to. Defaults to the sprout_hostname.

  • upstream_port - the port on the I-CSCF which Bono should pass requests to. Defaults to 5052. If set to 0, Bono will use SRV resolution of the upstream_hostname hostname to determine a target for traffic.

  • sprout_rr_level - this determines how the Sprout S-CSCF adds Record-Route headers. Possible values are:

    • pcscf - a Record-Route header is only added just after requests come from or go to a P-CSCF - that is, at the start of originating handling and the end of terminating handling
    • pcscf,icscf - a Record-Route header is added just after requests come from or go to a P-CSCF or I-CSCF - that is, at the start and end of originating handling and the start and end of terminating handling
    • pcscf,icscf,as - a Record-Route header is added after requests come from or go to a P-CSCF, I-CSCF or application server - that is, at the start and end of originating handling, the start and end of terminating handling, and between each application server invoked
  • force_hss_peer - when set to an IP address or hostname, Homestead will create a connection to the HSS using this value, but will still use the hss_realm and hss_hostname settings for the Destination-Host and Destination-Realm Diameter AVPs. This is useful when your HSS’s Diameter configuration does not match the DNS records.

  • hss_mar_lowercase_unknown - some Home Subscriber Servers (particularly old releases of OpenIMSCore HSS) expect the string ‘unknown’ rather than ‘Unknown’ in Multimedia-Auth-Requests when Clearwater cannot tell what authentication type is expected. Setting this option to ‘Y’ will make Homestead send requests in this format.

  • hss_mar_force_digest - if Clearwater cannot tell what authentication type a subscriber is trying to use, this forces it to assume ‘SIP Digest’ and report that in the Multimedia-Auth-Request, rather than ‘Unknown’.

  • hss_mar_force_aka - if Clearwater cannot tell what authentication type a subscriber is trying to use, this forces it to assume ‘Digest-AKA-v1’ and report that in the Multimedia-Auth-Request, rather than ‘Unknown’.

  • force_third_party_reg_body - if the HSS does not allow the IncludeRegisterRequest/IncludeRegisterResponse fields (which were added in 3GPP Rel 9) to be configured, setting force_third_party_reg_body=Y makes Clearwater behave as though they had been sent, allowing interop with application servers that need them.

  • enforce_user_phone - by default, Clearwater will do an ENUM lookup on any SIP URI that looks like a phone number, due to client support for user-phone not being widespread. When this option is set to ‘Y’, Clearwater will only do ENUM lookups for URIs which have the user=phone parameter.

  • enforce_global_only_lookups - by default, Clearwater will do ENUM lookups for SIP and Tel URIs containing global and local numbers (as defined in RFC 3966). When this option is set to ‘Y’, Clearwater will only do ENUM lookups for SIP and Tel URIs that contain global numbers.

  • hs_listen_port - the Diameter port which Homestead listens on. Defaults to 3868.

  • ralf_listen_port - the Diameter port which Ralf listens on. Defaults to 3869 to avoid clashes when colocated with Homestead.

  • alias_list - this defines additional hostnames and IP addresses which Sprout or Bono will treat as local for the purposes of SIP routing (e.g. when removing Route headers).

  • default_session_expires - determines the Session-Expires value which Sprout will add to INVITEs, to force UEs to send keepalive messages during calls so they can be tracked for billing purposes. This cannot be set to a value less than 90 seconds, as specified in RFC 4028, section 4.

  • max_session_expires - determines the maximum Session-Expires/Min-SE value which Sprout will accept in requests. This cannot be set to a value less than 90 seconds, as specified in RFC 4028, sections 4 and 5.

  • enum_server - a comma-separated list of DNS servers which can handle ENUM queries.

  • enum_suffix - determines the DNS suffix used for ENUM requests (after the digits of the number). Defaults to “e164.arpa”

  • enum_file - if set (to a file path), and if enum_server is not set, Sprout will use this local JSON file for ENUM lookups rather than a DNS server. An example file is at http://clearwater.readthedocs.org/en/stable/ENUM/index.html#deciding-on-enum-rules.

  • icscf_uri - the SIP address of the external I-CSCF integrated with your Sprout node (if you have one).

  • scscf_uri - the SIP address of the Sprout S-CSCF. This defaults to sip:$sprout_hostname:$scscf;transport=TCP - this includes a specific port, so if you need NAPTR/SRV resolution, it must be changed to not include the port.

  • additional_home_domains - this option defines a set of home domains which Sprout and Bono will regard as locally hosted (i.e. allowing users to register, not routing calls via an external trunk). It is a comma-separated list.

  • billing_realm - this sets the Destination-Realm on Diameter messages to your external CDF. CDF connections are not based on this but on configuration at the P-CSCF (which sets the P-Charging-Function-Addresses header).

  • diameter_timeout_ms - determines the number of milliseconds Homestead will wait for a response from the HSS before failing a request. Defaults to 200.

  • max_peers - determines the maximum number of Diameter peers which Ralf or Homestead can have open connections to at the same time.

  • num_http_threads (Ralf/Memento) - determines the number of threads that will be used to process HTTP requests. For Memento this defaults to the number of CPU cores on the system. For Ralf it defaults to 50 times the number of CPU cores (Memento and Ralf use different threading models, hence the different defaults). Note that for Homestead, this can only be set in /etc/clearwater/user_settings.

  • num_http_worker_threads - determines the number of threads that will be used to process HTTP requests once they have been parsed. Only used by Memento.

  • ralf_diameteridentity - determines the Origin-Host that will be set on the Diameter messages Ralf sends. Defaults to public_hostname (with some formatting changes if public_hostname is an IPv6 address).

  • hs_diameteridentity - determines the Origin-Host that will be set on the Diameter messages Homestead sends. Defaults to public_hostname (with some formatting changes if public_hostname is an IPv6 address).

  • gemini_enabled - When this field is set to ‘Y’, then the node (either a Sprout or a standalone application server) will include a Gemini AS.

  • memento_enabled - When this field is set to ‘Y’, then the node (either a Sprout or a standalone application server) will include a Memento AS.

  • max_call_list_length - determines the maximum number of complete calls a subscriber can have in the call list store. This defaults to no limit. This is only relevant if the node includes a Memento AS.

  • call_list_store_ttl - determines how long each call list fragment should be kept in the call list store. This defaults to 604800 seconds (1 week). This is only relevant if the node includes a Memento AS.

  • memento_disk_limit - determines the maximum size that the call lists database may occupy. This defaults to 20% of disk space. This is only relevant if the node includes a Memento AS. Can be specified in Bytes, Kilobytes, Megabytes, Gigabytes, or a percentage of the available disk. For example:

    memento_disk_limit=10240 # Bytes
    memento_disk_limit=100k  # Kilobytes
    memento_disk_limit=100M  # Megabytes
    memento_disk_limit=100G  # Gigabytes
    memento_disk_limit=45%   # Percentage of available disk
    
  • memento_threads - determines the number of threads dedicated to adding call list fragments to the call list store. This defaults to 25 threads. This is only relevant if the node includes a Memento AS.

  • memento_notify_url - If set to an HTTP URL, memento will make a POST request to this URL whenever a subscriber’s call list changes. The body of the POST request will be a JSON document with the subscriber’s IMPU in a field named impu. This is only relevant if the node includes a Memento AS. If empty, no notifications will be sent. Defaults to empty.

  • signaling_dns_server - a comma-separated list of DNS servers for non-ENUM queries. Defaults to 127.0.0.1 (i.e. uses dnsmasq)

  • target_latency_us - Target latency (in microsecs) for requests above which throttling applies. This defaults to 100000 microsecs

  • max_tokens - Maximum number of tokens allowed in the token bucket (used by the throttling code). This defaults to 20 tokens

  • init_token_rate - Initial token refill rate of tokens in the token bucket (used by the throttling code). This defaults to 250 tokens per second per core

  • min_token_rate - Minimum token refill rate of tokens in the token bucket (used by the throttling code). This defaults to 10.0

  • override_npdi - Whether the I-CSCF, S-CSCF and BGCF should check for number portability data on requests that already have the ‘npdi’ indicator. This defaults to false

  • exception_max_ttl - determines the maximum time before a process exits if it crashes. This defaults to 600 seconds

  • check_destination_host - determines whether the node checks the Destination-Host on a Diameter request when deciding whether it should process the request. This defaults to true.

  • astaire_cpu_limit_percentage - the maximum percentage of total CPU that Astaire is allowed to consume when resyncing memcached data (as part of a scale-up, scale-down, or following a memcached failure). Note that this only limits the CPU usage of the Astaire process, and does not affect memcached’s CPU usage. Must be an integer. Defaults to 5.

  • sip_blacklist_duration - the time in seconds for which SIP peers are blacklisted when they are unresponsive (defaults to 30 seconds).

  • http_blacklist_duration - the time in seconds for which HTTP peers are blacklisted when they are unresponsive (defaults to 30 seconds).

  • diameter_blacklist_duration - the time in seconds for which Diameter peers are blacklisted when they are unresponsive (defaults to 30 seconds).

  • snmp_ip - the IP address to send alarms to (defaults to being unset). If this is set then Sprout, Ralf, Homestead and Chronos will send alarms - more details on the alarms are here. This can be a single IP address, or a comma-separated list of IP addresses.

  • impu_cache_ttl - the number of seconds for which Homestead will cache the SIP Digest from a Multimedia-Auth-Request. Defaults to 0, as Sprout does enough caching to ensure that it can handle an authenticated REGISTER after a challenge, and subsequent challenges should be rare.

  • sip_tcp_connect_timeout - the time in milliseconds to wait for a SIP TCP connection to be established (defaults to 2000 milliseconds).

  • sip_tcp_send_timeout - the time in milliseconds to wait for sent data to be acknowledged at the TCP level on a SIP TCP connection (defaults to 2000 milliseconds).

  • session_continued_timeout_ms - if an Application Server with default handling of ‘continue session’ is unresponsive, this is the time that Sprout will wait (in milliseconds) before bypassing the AS and moving onto the next AS in the chain (defaults to 2000 milliseconds).

  • session_terminated_timeout_ms - if an Application Server with default handling of ‘terminate session’ is unresponsive, this is the time that Sprout will wait (in milliseconds) before terminating the session (defaults to 4000 milliseconds).

  • pbxes - a comma separated list of IP addresses that Bono considers to be PBXes that are incapable of registering. Non-REGISTER requests from these addresses are passed upstream to Sprout with a Proxy-Authorization header. It is strongly recommended that Sprout’s non_register_authentication option is set to if_proxy_authorization_present so that the request will be challenged. Bono also permits requests to these addresses from the core to pass through it.

  • pbx_service_route - the SIP URI to which Bono routes originating calls from non-registering PBXes (which are identified by the pbxes option). This is used to route requests directly to the S-CSCF rather than going via an I-CSCF (which could change the route header and prevent the S-CSCF from processing the request properly). This URI is used verbatim and should almost always include the lr, orig, and auto-reg parameters. If this option is not specified, the requests are routed to the address specified by the upstream_hostname and upstream_port options.

    • e.g. sip:sprout.example.com:5054;transport=tcp;lr;orig;auto-reg
  • non_register_authentication - controls when Sprout will challenge a non-REGISTER request using SIP Proxy-Authentication. Possible values are never (meaning Sprout will never challenge) or if_proxy_authorization_present (meaning Sprout will only challenge requests that have a Proxy-Authorization header).

  • ralf_threads - used on Sprout nodes, this determines how many worker threads should be started to do Ralf request processing (defaults to 25).

Experimental options

This section describes optional configuration options which may be useful, but are not heavily-used or well-tested by the main Clearwater development team. These options should be set in the /etc/clearwater/shared_config file (in the format name=value, e.g. cassandra_hostname=db.example.com).

  • cassandra_hostname - if using an external Cassandra cluster (which is a fairly uncommon configuration), a hostname that resolves to one or more Cassandra nodes.
  • ralf_secure_listen_port - this determines the port Ralf listens on for TLS-secured Diameter connections.
  • hs_secure_listen_port - this determines the port Homestead listens on for TLS-secured Diameter connections.
  • ellis_cookie_key - an arbitrary string that enables Ellis nodes to determine whether they should be in the same cluster. This function is not presently used.
  • stateless_proxies - a comma separated list of domain names that are treated as SIP stateless proxies. Stateless proxies are not blacklisted if a SIP transaction sent to them times out. This field should reflect how the servers are identified in SIP. For example if a cluster of nodes is identified by the name ‘cluster.example.com’, the option should be set to ‘cluster.example.com’ instead of the hostnames or IP addresses of individual servers.
  • hss_reregistration_time - determines how many seconds should pass before Homestead sends a Server-Assignment-Request with type RE_REGISTRATION to the HSS. (On first registration, it will always send a SAR with type REGISTRATION). This determines a minimum value - after this many seconds have passed, Homestead will send the Server-Assignment-Request when the next REGISTER is received. Note that Homestead invalidates its cache of the registration and iFCs after twice this many seconds have passed, so it is not safe to set this to less than half of reg_max_expires. The default value of this option is whichever is the greater of the following.
    • Half of the value of reg_max_expires.

User settings

This section describes settings that may vary between systems in the same deployment, such as log level (which may be increased on certain machines to track down specific issues) and performance settings (which may vary if some servers in your deployment are more powerful than others). These settings are set in /etc/clearwater/user_settings, not /etc/clearwater/shared_config (in the format name=value, e.g. log_level=5).

  • log_level - determines how verbose Clearwater’s logging is, from 1 (error logs only) to 5 (debug-level logs). Defaults to 2.
  • log_directory - determines which folder the logs are created in. This folder must exist, and be owned by the service. Defaults to /var/log/ (this folder is created and has the correct permissions set for it by the install scripts of the service).
  • max_log_directory_size - determines the maximum size of each Clearwater process’s log_directory in bytes. Defaults to 1GB. If you are co-locating multiple Clearwater processes, you’ll need to reduce this value proportionally.
  • num_pjsip_threads - determines how many PJSIP transport-layer threads should run at once. Defaults to 1, and it may be dangerous to change this as it is not necessarily thread-safe.
  • num_worker_threads - for Sprout and Bono nodes, determines how many worker threads should be started to do SIP/IMS processing. Defaults to 50 times the number of CPU cores on the system.
  • upstream_connections - determines the maximum number of TCP connections which Bono will open to the I-CSCF(s). Defaults to 50.
  • upstream_recycle_connections - the average number of seconds before Bono will destroy and re-create a connection to Sprout. A higher value means slightly less work, but means that DNS changes will not take effect as quickly (as new Sprout nodes added to DNS will only start to receive messages when Bono creates a new connection and does a fresh DNS lookup).
  • authentication - by default, Clearwater performs authentication challenges (SIP Digest or IMS AKA depending on HSS configuration). When this is set to ‘Y’, it simply accepts all REGISTERs - obviously this is very insecure and should not be used in production.
  • num_http_threads (Homestead) - determines the number of HTTP worker threads that will be used to process requests. Defaults to 50 times the number of CPU cores on the system.

Other configuration options

There is further documentation for Chronos configuration here and Homer/Homestead-prov configuration here.

Migrating to Automatic Clustering and Configuration Sharing

Clearwater now supports an automatic clustering and configuration sharing feature. This makes Clearwater deployments much easier to manage. However, deployments created before the ‘For Whom The Bell Tolls’ release do not use this feature. This article explains how to migrate a deployment to take advantage of the new feature.

Upgrade the Deployment

Upgrade to the latest stable Clearwater release. You will also need to update your firewall settings to support the new clearwater management packages; open ports 2380 and 4000 between every node (see here for the complete list).
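
How you open these ports depends on how your deployment's firewall is managed. As a sketch, if you manage iptables directly on each node, rules along the following lines (repeated for each peer node; the address is illustrative) would open them:

sudo iptables -A INPUT -p tcp --dport 2380 -s 10.0.0.2 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 4000 -s 10.0.0.2 -j ACCEPT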

Verify Configuration Files

Do the following on each node in turn:

  1. Run /usr/share/clearwater/infrastructure/migration-utils/configlint.py. This examines the existing /etc/clearwater/config file and checks that the migration scripts can handle all the settings defined in it.
  2. If configlint.py produces a warning about a config option, this can mean one of two things:
    • The config option is invalid (for example, because there is a typo, or this option has been retired). Check the configuration options reference for a list of valid options.
    • The config option is valid, but the migration script doesn’t recognise the option and won’t automatically migrate it. In this case, you will need to make a note of this config option now, and add it back in after the rest of the migration has run. (A later step in this process covers that.)

Once you have checked your configuration file and taken a note of any unrecognised settings, continue with the next step.

Prepare Local Configuration Files

Do the following on each node in turn:

  1. Run sudo /usr/share/clearwater/infrastructure/migration-utils/migrate_local_config /etc/clearwater/config. This examines the existing /etc/clearwater/config file and produces a new /etc/clearwater/local_config which contains the settings only relevant to this node. Check that this file looks sensible.
  2. Edit /etc/clearwater/local_config to add a line etcd_cluster="<NodeIPs>" where NodeIPs is a comma separated list of the private IP addresses of nodes in the deployment. For example if your deployment contained nodes with IP addresses of 10.0.0.1 to 10.0.0.6, NodeIPs would be 10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6. If your deployment was GR, this should include the IP addresses of nodes in both sites.
  3. If your deployment was geographically redundant, you should choose arbitrary names for each site (e.g. ‘site1’ and ‘site2’), and set the local_site_name and remote_site_name settings in /etc/clearwater/local_config accordingly. For example, if the node is in ‘site1’, you should have local_site_name=site1 and remote_site_name=site2.
  4. If the node is a Sprout or Ralf node, run sudo /usr/share/clearwater/bin/chronos_configuration_split.py. This examines the existing /etc/chronos/chronos.conf file and extracts the clustering settings into a new file called /etc/chronos/chronos_cluster.conf. Check each of these files by hand to make sure they look sensible. If the chronos_cluster.conf file already exists, then the script will exit with a warning. In this case, please check the configuration files by hand, and either delete the chronos_cluster.conf file and re-run the script, or manually split the configuration yourself. Details of the expected configuration are here.
  5. Run sudo touch /etc/clearwater/no_cluster_manager on all nodes. This temporarily disables the cluster manager (which is installed in the next step) so that you can program it with the current deployment topology.
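
As an example, after steps 2 and 3 above, the lines added to /etc/clearwater/local_config on a node in ‘site1’ of the six-node GR deployment described in step 2 would be:

etcd_cluster="10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6"
local_site_name=site1
remote_site_name=site2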

Prepare Shared Configuration Files

This step merges the config from all the nodes in your deployment into a single config file that will be managed by etcd.

  1. Log on to one of the nodes in the deployment and create a temporary directory (e.g. ~/config-migration). Copy the /etc/clearwater/config file from each node in the deployment to this directory. Give each of these a name of the form <node>_orig_config, e.g. sprout-1_orig_config.
  2. Run sudo /usr/share/clearwater/infrastructure/migration-utils/migrate_shared_config, passing it all the config files in the temporary directory
    • sudo /usr/share/clearwater/infrastructure/migration-utils/migrate_shared_config *_orig_config
  3. This will produce the file /etc/clearwater/shared_config. Check that the contents of this file look sensible. If running configlint.py indicated that any configuration would not be automatically migrated, add that configuration to /etc/clearwater/shared_config now.
  4. Copy this file to /etc/clearwater/shared_config on each node in the deployment.
  5. On each node in turn:
    • Run sudo /usr/share/clearwater/infrastructure/migration-utils/switch_to_migrated_config
    • Run sudo service clearwater-infrastructure restart to regenerate any dependent configuration files
    • Restart the Clearwater services on the node.
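
As an illustrative sketch of steps 1 and 4 (assuming passwordless SSH access from this node and invented hostnames - adjust the node list to match your deployment):

# Step 1: gather each node's existing config into the temporary directory
mkdir -p ~/config-migration && cd ~/config-migration
for node in sprout-1 sprout-2 bono-1 bono-2 homestead-1 homer-1 ellis-1; do
  scp $node:/etc/clearwater/config ${node}_orig_config
done

# Step 4: distribute the merged shared configuration to every node
for node in sprout-1 sprout-2 bono-1 bono-2 homestead-1 homer-1 ellis-1; do
  scp /etc/clearwater/shared_config $node:/tmp/shared_config
  ssh $node sudo mv /tmp/shared_config /etc/clearwater/shared_config
done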

Install Clustering and Configuration Management Services

On each node run sudo apt-get install clearwater-management.

Upload the Current Cluster Settings

Now you need to tell the cluster manager about the current topology of the various database clusters that exist in a Clearwater deployment. For each of the node types listed below, log onto one of the nodes of that type and run the specified commands.

Sprout
/usr/share/clearwater/clearwater-cluster-manager/scripts/load_from_memcached_cluster sprout
/usr/share/clearwater/clearwater-cluster-manager/scripts/load_from_chronos_cluster sprout
Ralf
/usr/share/clearwater/clearwater-cluster-manager/scripts/load_from_memcached_cluster ralf
/usr/share/clearwater/clearwater-cluster-manager/scripts/load_from_chronos_cluster ralf
Homestead
/usr/share/clearwater/clearwater-cluster-manager/scripts/load_from_cassandra_cluster homestead
Homer
/usr/share/clearwater/clearwater-cluster-manager/scripts/load_from_cassandra_cluster homer
Memento
/usr/share/clearwater/clearwater-cluster-manager/scripts/load_from_memcached_cluster memento
/usr/share/clearwater/clearwater-cluster-manager/scripts/load_from_cassandra_cluster memento

Upload the Shared Configuration

Run the following commands on one of your Sprout nodes. This will upload the configuration that is shared across the deployment to etcd. If you add any new nodes to the deployment they will automatically learn this configuration from etcd.

  • sudo /usr/share/clearwater/clearwater-config-manager/scripts/upload_shared_config
  • sudo /usr/share/clearwater/clearwater-config-manager/scripts/upload_scscf_json
  • sudo /usr/share/clearwater/clearwater-config-manager/scripts/upload_bgcf_json
  • sudo /usr/share/clearwater/clearwater-config-manager/scripts/upload_enum_json

Tidy Up

The final step is to re-enable the cluster manager by running the following command on each node:

sudo rm /etc/clearwater/no_cluster_manager

Configuring an Application Server

This document explains how to configure an application server for use in a Clearwater deployment.

Network configuration

Each application server must be able to communicate with Sprout and Bono in both directions.

  • The application server must be able to contact all sprout and bono nodes on their trusted SIP ports (5054 for sprout and 5058 for bono), over both TCP and UDP.
  • All sprout and bono nodes must be able to contact the application server on the port and protocol configured in your subscribers’ iFCs (typically TCP and UDP port 5060).
  • The application server must also be able to exchange SIP messages (in both directions) with any other application server that may appear in the signaling path of any call going through it.
  • Since the application server is in the trusted zone, it should not be accessible to untrusted SIP entities.

If Clearwater and your application server are both in the Amazon EC2 cloud, you can achieve all of these by placing the application server in the <deployment>-sprout security group.
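
As a quick, illustrative sanity check of this connectivity, you could run something like the following from the application server (the hostnames here are invented, and this assumes netcat is installed):

nc -vz sprout.example.com 5054
nc -vz bono.example.com 5058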

Application server configuration

No special application server configuration is required.

  • Your application server should be prepared to accept SIP messages from any of your deployment’s bono or sprout nodes, and from any SIP entity that may appear in the signaling path of any call going through it.
  • If your application server needs to spontaneously originate calls, it should do this via Sprout’s trusted interface: sprout.<deployment-domain>:5054 over either TCP or UDP.

The headers set by Clearwater are described in the Application Server Guide.

iFC configuration

To configure a subscriber to invoke your application server, you must configure the appropriate iFCs for that subscriber. You can do this via the Ellis UI, the Homestead API, or if you are using an HSS, directly in the HSS.

Web UI configuration via Ellis

Ellis allows you to specify a mapping between application server names and XML nodes. This is done by editing the /usr/share/clearwater/ellis/web-content/js/app-servers.json file, which is in JSON format. An example file would be:

{
"MMTEL" : "<InitialFilterCriteria><Priority>0</Priority><TriggerPoint><ConditionTypeCNF></ConditionTypeCNF><SPT><ConditionNegated>0</ConditionNegated><Group>0</Group><Method>INVITE</Method><Extension></Extension></SPT></TriggerPoint><ApplicationServer><ServerName>sip:mmtel.example.com</ServerName><DefaultHandling>0</DefaultHandling></ApplicationServer></InitialFilterCriteria>",
"Voicemail" : "<InitialFilterCriteria><Priority>1</Priority><TriggerPoint><ConditionTypeCNF></ConditionTypeCNF><SPT><ConditionNegated>0</ConditionNegated><Group>0</Group><Method>INVITE</Method><Extension></Extension></SPT></TriggerPoint><ApplicationServer><ServerName>sip:vm.example.com</ServerName><DefaultHandling>0</DefaultHandling></ApplicationServer></InitialFilterCriteria>"
}

Once this is saved, the list of application server names will appear in the Ellis UI (on the ‘Application Servers’ tab of the Configure dialog), and selecting or deselecting them will add or remove the relevant XML from Homestead. This change takes effect immediately. If a node in the iFC XML is not included in app-servers.json, Ellis will leave it untouched.

Direct configuration via cURL

You can also configure iFCs directly using Homestead.

Here is an example of how to use curl to configure iFCs directly. This configures a very basic iFC, which fires on INVITEs only, with no conditions on session case. See an iFC reference for more details, e.g., 3GPP TS 29.228 appendices B and F.

To simplify the following commands, define the following variables - set the user, application server(s), and Homestead name as appropriate for your deployment.

user=sip:6505550269@example.com
as_hostnames="as1.example.com mmtel.example.com" # the list of zero or more application servers to invoke - don't forget mmtel
hs_hostname=hs.example.com

Be careful - the user must be exactly right, including the sip: prefix, and also +1 if and only if it is a PSTN line. We have seen problems with PSTN lines; if the above syntax does not work, try URL-encoding the user, e.g., for +16505550269@example.com write user=sip%3A%2B16505550269%40example.com.

To retrieve the current configuration, invoke curl as follows. You must be able to access $hs_hostname; check your firewall configuration.

curl http://$hs_hostname:8889/public/$user/service_profile

This will return a 303 if the user exists, with the service profile URL in the Location header, e.g.

Location: /irs/<irs-uuid>/service_profiles/<service-profile-uuid>

Then invoke curl as follows, using the values from the Location header retrieved above:

curl http://$hs_hostname:8889/irs/<irs-uuid>/service_profiles/<service-profile-uuid>/filter_criteria
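
If you are scripting this, one possible way to capture the service profile URL from the Location header is sketched below (this re-uses the user and hs_hostname variables defined above and assumes a standard curl/grep/awk/tr toolchain):

location=$(curl -s -D - -o /dev/null http://$hs_hostname:8889/public/$user/service_profile | grep -i '^Location:' | awk '{print $2}' | tr -d '\r')
curl http://$hs_hostname:8889$location/filter_criteria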

To update the configuration, invoke curl as follows. The first stage builds the new XML document and the second pushes it to the server.

{ cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<ServiceProfile>
EOF
for as_hostname in $as_hostnames ; do
  cat <<EOF
  <InitialFilterCriteria>
    <Priority>1</Priority>
    <TriggerPoint>
      <ConditionTypeCNF>0</ConditionTypeCNF>
      <SPT>
        <ConditionNegated>0</ConditionNegated>
        <Group>0</Group>
        <Method>INVITE</Method>
        <Extension></Extension>
      </SPT>
    </TriggerPoint>
    <ApplicationServer>
      <ServerName>sip:$as_hostname</ServerName>
      <DefaultHandling>0</DefaultHandling>
    </ApplicationServer>
  </InitialFilterCriteria>
EOF
done
cat <<EOF
</ServiceProfile>
EOF
} | curl -X PUT http://$hs_hostname:8889/irs/<irs-uuid>/service_profiles/<service-profile-uuid>/filter_criteria --data-binary @-

The subscriber will now have the desired configuration. You can confirm this by running the retrieval command again.

External HSS Integration

All Clearwater deployments include a Homestead cluster. Homestead presents an HTTP RESTful interface to HSS data. This HSS data can be stored in either

  • a Cassandra database located on the Homestead cluster
  • an external HSS, accessible over a standard IMS Cx/Diameter interface.

This page describes

  • how it works
  • how to enable it
  • restrictions.

How It Works

When Clearwater is deployed without an external HSS, all HSS data is mastered in Homestead’s own Cassandra database.

When Clearwater is deployed with an external HSS, HSS data is queried from the external HSS via its Cx/Diameter interface and is then cached in the Cassandra database.

Clearwater uses the following Cx message types.

  • Multimedia-Auth - to retrieve authentication details
  • Server-Assignment - to retrieve Initial Filter Criteria documents and register for change notifications
  • Push-Profile - to be notified of changes to Initial Filter Criteria documents
  • User-Authorization - to retrieve S-CSCF details on initial registrations
  • Location-Information - to retrieve S-CSCF details on calls
  • Registration-Termination - to be notified of de-registrations

How to Enable It

This section discusses how to enable support for an external HSS.

Before you start

Before enabling support for an external HSS, you must have a working Clearwater deployment and an external HSS that Homestead can reach over the Cx/Diameter interface.

Do not configure any Clearwater subscribers via Ellis!

  • Any subscribers you create before enabling the external HSS will override subscribers retrieved from the external HSS, and you will not be able to use Ellis to manage them.
  • After enabling the external HSS, you will not be able to create subscribers through Ellis at all - they must be created through the HSS’s own management interface.

Enabling external HSS support on an existing deployment

To enable external HSS support, you will need to modify the contents of /etc/clearwater/shared_config so that the block that reads

# HSS configuration
hss_hostname=0.0.0.0
hss_port=3868

instead reads

# HSS configuration
hss_hostname=<address of your HSS>
hss_realm=<realm your HSS is located in>
hss_port=<port of your HSS's Cx interface>

Both hss_hostname and hss_realm are optional. If a realm is configured, Homestead will try NAPTR/SRV resolution on the realm to find and connect to Diameter peers in the realm (2 peers by default). If a hostname is also configured, it will be used in the Destination-Host field of the Diameter messages, so that the messages are routed to that host. If just a hostname is configured, Homestead will simply attempt to create and maintain a single connection to that host.
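
For example, a completed block might read as follows (the hostname, realm and port here are purely illustrative):

# HSS configuration
hss_hostname=hss-1.ims.example.com
hss_realm=ims.example.com
hss_port=3868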

This process explains how to modify these settings and ensure that all nodes pick up the changes.

Configuring your external HSS

Homestead will now query the external HSS for subscriber details when they are required by the rest of the Clearwater deployment, such as when servicing SIP registrations.

In order to register and make calls, you need to create subscriber records on your external HSS, and the detailed process for this will depend on which HSS you have chosen. Generally, however, you will need to

  1. create a public user identity with the desired SIP URI
  2. create an associated private user identity with its ID formed by removing the sip: scheme prefix from the public user ID
  3. configure the public user identity’s Initial Filter Criteria to include an application server named sip:mmtel.your.home.domain, where your.home.domain is replaced with your home domain - this enables MMTEL services for this subscriber.

Allowing one subscriber to have two private identities

If you try to use an Android SIP client that doesn’t contain an Authentication username field, the client will default to a username like “1234” (rather than “1234@example.com” - the IMS standard form). To register a subscriber you will have to configure your external HSS so that the subscriber you are trying to register has two private identities (“1234” and “1234@example.com”).

The detailed process for this will depend on which HSS you have chosen. Generally, however, you will need to

  1. create a subscriber as usual, with public user identity “sip:<username>@<server>”
  2. create two new private identities, one with identity “<username>” and the other with identity “<username>@<server>”
  3. associate the public user identity created in step 1 (“sip:<username>@<server>”) with both of the private identities created in step 2.

This should allow the SIP client to register with that subscriber.

Restrictions

  • Since Homestead uses the Cx/Diameter interface to the HSS, and this interface is read-only, the Homestead API is read-only when external HSS integration is enabled.
  • Because Homestead’s API is read-only, Ellis can’t be used alongside a deployment using an external HSS. Provisioning and subscriber management must be performed via the HSS’s own management interface.
  • Clearwater currently only supports SIP digest or AKA authentication; details for other authentication methods in Multimedia-Auth responses are silently ignored. Other authentication methods may be added in future.
  • Homestead currently assumes that private user IDs are formed by removing the sip: prefix from the public user ID. This restriction may be relaxed in future.
  • While Homestead caches positive results from the external HSS, it does not currently cache negative results (e.g. for non-existent users). Repeated requests for a non-existent user will increase the load on the external HSS. This restriction may be relaxed in future.

OpenIMS HSS Integration

As discussed on the External HSS Integration page, Clearwater can integrate with an external HSS.

This page describes how to install and configure the OpenIMSCore HSS as this external HSS. It assumes that you have already read the External HSS Integration page.

Installation with Chef

If you have a deployment environment created by following the automated install instructions, then you can create an HSS by running knife box create -E <env> openimscorehss. You should then follow the configuration instructions below.

Installing OpenIMSCore HSS manually

To install OpenIMSCore HSS,

  1. Follow the “Adding this PPA to your system” steps at https://launchpad.net/~rkd-u/+archive/ubuntu/fhoss
  2. Run sudo apt-get update; sudo apt-get install openimscore-fhoss
  3. Answer the installation questions appropriately, including providing the IMS home domain, your local IP, and the MySQL root password (which you will have to provide both when installing MySQL and when installing the HSS).

Configuration

Logging in

OpenIMSCore HSS provides the administration UI over port 8080. The admin username is hssAdmin.

  • If the HSS was installed using Chef, the hssAdmin password will be the signup_key setting from knife.rb.
  • If the HSS was installed manually, the hssAdmin password will be “hss”. This can be changed by editing /usr/share/java/fhoss-0.2/conf/tomcat-users.xml and running sudo service openimscore-fhoss restart.

Adding the MMTEL Application Server

To enable the MMTEL application server built into Clearwater for all subscribers,

  1. access the OpenIMSCore HSS web UI, running on port 8080
  2. log in as hssAdmin
  3. navigate to Services -> “Application Servers” -> “Search”
  4. perform a search with no search string to find all application servers
  5. select default_as
  6. change server_name to be sip:mmtel.your.home.domain, where your.home.domain is replaced with your home domain
  7. save your change.

Configuring Subscribers

To configure a subscriber,

  1. create an IMS subscription
  2. create an associated public user identity
  3. create an associated private user identity, specifying
    1. the private ID to be the public ID without the sip: scheme prefix
    2. SIP Digest authentication.
  4. add the home domain to the subscriber’s visited networks.

CDF Integration

Project Clearwater deployments include a cluster of Ralf nodes. These nodes provide an HTTP interface on which Sprout and Bono can report billable events. Ralf then acts as a CTF (Charging Trigger Function) and may pass these events on to a configured CDF (Charging Data Function) over the Rf interface.

By default, having spun up a Clearwater deployment, either manually or through the automated Chef install process, your deployment is not configured with a CDF and thus will not generate billing records. This document describes how the CDF is chosen and used and how to integrate your Project Clearwater deployment with a CDF and thus enable Rf billing.

How it works

When Sprout or Bono handles an item of work for a given subscriber, it generates a record of that work and transmits it to the Ralf cluster for billing to the CDF. This record will be used by Ralf to generate an Rf billing ACR (Accounting-Request) message.

To determine which CDF to send the ACR to, the node acting as the P-CSCF/IBCF is responsible for adding a P-Charging-Function-Addresses header to all SIP messages it proxies into the core. This header contains a prioritised list of CDFs to send ACRs to.

Bono supports adding this header when acting as either a P-CSCF or an IBCF, and configuring the contents of this header is the only requirement for enabling Rf billing in your deployment.

How to enable it

This section discusses how to enable Rf billing to a given CDF.

Before you begin

Before connecting your deployment to a CDF, you must

  • install Clearwater
  • install an external CDF - details for this will vary depending on which CDF you are using.
  • ensure your CDF’s firewall allows incoming connections from the nodes in the Ralf cluster on the DIAMETER port (default 3868).

Setting up DNS

Ralf implements the behavior specified in RFC 3588 to locate and connect to the billing realm. This requires either:

  • The DIAMETER realm resolves to a NAPTR record which returns an AAA+D2T entry, which in turn resolves to SRV/A records, which finally resolve to IPv4 addresses of reachable nodes in the realm.
  • The DIAMETER realm resolves to a collection of A records which directly resolve to reachable nodes in the realm.

Configuring the billing realm

To point Ralf at the billing DIAMETER realm, add the following line to /etc/clearwater/shared_config and follow this process to apply the change:

billing_realm=<DIAMETER billing realm>

Selecting a specific CDF in the realm

Note: Bono only has support for selecting CDF identities based on static configuration of a single identity. Other P-CSCFs may have support for load-balancing or enabling backup CDF identities.

If you have a CDF set up to receive Rf billing messages from your deployment, you will need to add the following line to /etc/clearwater/shared_config and follow this process to apply the change:

cdf_identity=<CDF DIAMETER Identity>
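
Putting the two settings together, the Rf-related portion of /etc/clearwater/shared_config might end up looking like this (the realm and identity values are purely illustrative):

billing_realm=billing.example.com
cdf_identity=cdf-1.billing.example.com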

Restrictions

The very first release of Ralf, from the Counter-Strike release of Project Clearwater, does not generate Rf billing messages since the related changes to Sprout and Bono (to report billable events) were not enabled. This version was released to allow systems integrators to get a head start on spinning up and configuring Ralf nodes rather than having to wait for the next release.

Clearwater IP Port Usage

The nodes in a Clearwater deployment need to talk to each other over IP. This document lists the ports that are used by a deployment of Clearwater.

All nodes

All nodes need to allow the following ICMP messages:

Echo Request
Echo Reply
Destination Unreachable

They also need the following ports open to the world (0.0.0.0/0):

TCP/22 for SSH access

If your deployment uses SNMP monitoring software (cacti for example), each node will also have to open appropriate ports for this service.

  • SNMP

    UDP/161-162
    

If your deployment uses our automatic clustering and configuration sharing feature, open the following ports between every node (an illustrative iptables sketch follows the list):

  • etcd

    TCP/2380
    TCP/4000
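
As an illustrative sketch only (real deployments will more commonly use cloud security groups), the etcd ports could be opened with iptables as follows, assuming a deployment subnet of 10.0.0.0/24:

sudo iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 2380 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 4000 -j ACCEPT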
    

All-in-one

All-in-one nodes need the following ports opened to the world

  • Web UI

    TCP/80
    TCP/443
    
  • STUN signaling:

    TCP/3478
    UDP/3478
    
  • SIP signaling:

    TCP/5060
    UDP/5060
    TCP/5062
    
  • RTP forwarding:

    UDP/32768-65535
    

Ellis

The Ellis node needs the following ports opened to the world:

  • Web UI

    TCP/80
    TCP/443
    

Bono

The Bono nodes need the following ports opened to the world:

  • STUN signaling:

    TCP/3478
    UDP/3478
    
  • SIP signaling:

    TCP/5060
    UDP/5060
    TCP/5062
    
  • RTP forwarding:

    UDP/32768-65535
    

They also need the following ports open to all other Bono nodes and to all the Sprout nodes:

  • Internal SIP signaling:

    TCP/5058
    

Sprout

The Sprout nodes need the following ports open to all Bono nodes:

  • Internal SIP signaling:

    TCP/5054
    TCP/5052 (if I-CSCF function is enabled)
    

They also need the following ports opened to all other Sprout nodes:

  • Shared registration store:

    For releases using a memcached store - that is all releases up to release 28 (Lock, Stock and Two Smoking Barrels) and from release 32 (Pulp Fiction) onwards

    TCP/11211
    
  • Chronos:

    TCP/7253
    
  • Cassandra (if including a Memento AS):

    TCP/7000
    TCP/9160
    

They also need the following ports opened to all homestead nodes:

  • Registration Termination Requests (if using an HSS):

    TCP/9888
    

They also need the following ports opened to the world:

  • HTTP interface (if including a Memento AS):

    TCP/443
    

Homestead

The Homestead nodes need the following ports open to all the Sprout nodes and the Ellis node:

  • RESTful interface:

    TCP/8888
    

They also need the following ports open to just the Ellis node:

  • RESTful interface:

    TCP/8889
    

They also need the following ports opened to all other Homestead nodes:

  • Cassandra:

    TCP/7000
    

They also need the following ports opened to the world:

Homer

The Homer nodes need the following ports open to all the Sprout nodes and the Ellis node:

  • RESTful interface:

    TCP/7888
    

They also need the following ports opened to all other Homer nodes:

  • Cassandra:

    TCP/7000
    

They also need the following ports opened to the world:

Ralf

The Ralf nodes need the following ports open to all the Sprout and Bono nodes:

  • RESTful interface:

    TCP/10888
    

They also need the following ports open to all other Ralf nodes:

  • Chronos:

    TCP/7253
    
  • Memcached:

    TCP/11211
    

Standalone Application Servers

Standalone Project Clearwater application servers (e.g. Memento and Gemini) need the following ports open to all Sprout nodes:

  • SIP signaling:

    TCP/5054
    

They also need the following ports open to all other standalone application servers (if they include a Memento AS):

  • Cassandra:

    TCP/7000
    TCP/9160
    

They also need the following ports open to all Homestead nodes (if they include a Memento AS):

  • RESTful interface:

    TCP/11888
    

They also need the following ports opened to the world (if they include a Memento AS):

  • HTTP interface:

    TCP/443
    

Clearwater DNS Usage

DNS is the Domain Name System. It maps service names to hostnames and then hostnames to IP addresses. Clearwater uses DNS.

This document describes

  • Clearwater’s DNS strategy and requirements
  • how to configure AWS Route 53 and BIND to meet these.

DNS is also used as part of the ENUM system for mapping E.164 numbers to SIP URIs. This isn’t discussed in this document - instead see the separate ENUM document.

If you are installing an All-in-One Clearwater node, you do not need any DNS records and can ignore the rest of this page.

Strategy

Clearwater makes heavy use of DNS to refer to its nodes. It uses it for

  • identifying individual nodes, e.g. sprout-1.example.com might resolve to the IP address of the first sprout node
  • identifying the nodes within a cluster, e.g. sprout.example.com resolves to all the IP addresses in the cluster
  • fault-tolerance
  • selecting the nearest site in a multi-site deployment, using latency-based routing.

Clearwater also supports using DNS for identifying non-Clearwater nodes. In particular, it supports DNS for identifying SIP peers using NAPTR and SRV records, as described in RFC 3263.

Resiliency

By default, Clearwater routes all DNS requests through an instance of dnsmasq running on localhost. This round-robins requests between the servers in /etc/resolv.conf, as described in its FAQ:

By default, dnsmasq treats all the nameservers it knows about as equal: it picks the one to use using an algorithm designed to avoid nameservers which aren’t responding.

If the signaling_dns_server option is set in /etc/clearwater/shared_config (which is mandatory when using traffic separation), Clearwater will not use dnsmasq. Instead, resiliency is achieved by being able to specify up to three servers in a comma-separated list (e.g. signaling_dns_server=1.2.3.4,10.0.0.1,192.168.1.1), and Clearwater will fail over between them as follows:

  • It will always query the first server in the list first
  • If this returns SERVFAIL or times out (which happens after a randomised 500ms-1000ms period), it will resend the query to the second server
  • If this returns SERVFAIL or times out, it will resend the query to the third server
  • If all servers return SERVFAIL or time out, the DNS query will fail

Clearwater caches DNS responses for several minutes (to reduce the load on DNS servers, and the latency introduced by querying them). If a cache entry is stale, but the DNS servers return SERVFAIL or time out when Clearwater attempts to refresh it, Clearwater will continue to use the cached value until the DNS servers become responsive again. This minimises the impact of a DNS server failure on calls.

Requirements

DNS Server

Clearwater requires the DNS server to support the standard A, AAAA and SRV record types.

Support for latency-based routing and health-checking is required for multi-site deployments.

Support for RFC 2915 (NAPTR records) is also suggested, but not required. NAPTR records specify the transport (UDP, TCP, etc.) to use for a particular service - without them, UEs will fall back to their default transport (probably UDP).

AWS Route 53 supports all these features except NAPTR. BIND supports all these features except latency-based routing (although there is a patch for this) and health-checking.

DNS Records

Clearwater requires the following DNS records to be configured.

  • bono
    • bono-1.<zone>, bono-2.<zone>... (A and/or AAAA) - per-node records for bono
    • <zone> (A and/or AAAA) - cluster record for bono, resolving to all bono nodes - used by UEs that don’t support RFC 3263 (NAPTR/SRV)
    • <zone> (NAPTR, optional) - specifies transport requirements for accessing bono - service SIP+D2T maps to _sip._tcp.<zone> and SIP+D2U maps to _sip._udp.<zone>
    • _sip._tcp.<zone> and _sip._udp.<zone> (SRV) - cluster SRV records for bono, resolving to port 5060 on each of the per-node records
  • sprout
    • sprout-1.<zone>, sprout-2.<zone>... (A and/or AAAA) - per-node records for sprout
    • sprout.<zone> (A and/or AAAA) - cluster record for sprout, resolving to all sprout nodes - used by P-CSCFs that don’t support RFC 3263 (NAPTR/SRV)
    • sprout.<zone> (NAPTR, optional) - specifies transport requirements for accessing sprout - service SIP+D2T maps to _sip._tcp.sprout.<zone>
    • _sip._tcp.sprout.<zone> (SRV) - cluster SRV record for sprout, resolving to port 5054 on each of the per-node records
    • icscf.sprout.<zone> (NAPTR, only required if using sprout as an I-CSCF) - specifies transport requirements for accessing sprout - service SIP+D2T maps to _sip._tcp.icscf.sprout.<zone>
    • _sip._tcp.icscf.sprout.<zone> (SRV, only required if using sprout as an I-CSCF) - cluster SRV record for sprout, resolving to port 5052 on each of the per-node records
  • homestead
    • homestead-1.<zone>, homestead-2.<zone>... (A and/or AAAA) - per-node records for homestead
    • hs.<zone> (A and/or AAAA) - cluster record for homestead, resolving to all homestead nodes
  • homer
    • homer-1.<zone>, homer-2.<zone>... (A and/or AAAA) - per-node records for homer
    • homer.<zone> (A and/or AAAA) - cluster record for homer, resolving to all homer nodes
  • ralf
    • ralf-1.<zone>, ralf-2.<zone>... (A and/or AAAA) - per-node records for ralf
    • ralf.<zone> (A and/or AAAA) - cluster record for ralf, resolving to all ralf nodes
  • ellis
    • ellis-1.<zone> (A and/or AAAA) - per-node record for ellis
    • ellis.<zone> (A and/or AAAA) - “cluster”/access record for ellis
  • standalone application server (e.g. gemini/memento)
    • <standalone name>-1.<zone> (A and/or AAAA) - per-node record for each standalone application server
    • <standalone name>.<zone> (A and/or AAAA) - “cluster”/access record for the standalone application servers

Of these, the following must be resolvable by UEs - the others need only be resolvable within the core of the network. If you have a NAT-ed network, the following must resolve to public IP addresses, while the others should resolve to private IP addresses.

  • bono
    • <zone> (A and/or AAAA)
    • <zone> (NAPTR, optional)
    • _sip._tcp.<zone> and _sip._udp.<zone> (SRV)
  • ellis
    • ellis.<zone> (A and/or AAAA)
  • memento
    • memento.<zone> (A and/or AAAA)

If you are not deploying with some of these components, you do not need the DNS records to be configured for them. For example, if you are using a different P-CSCF (and so don’t need bono), you don’t need the bono DNS records. Likewise, if you are deploying with an external HSS (and so don’t need ellis), you don’t need the ellis DNS records.

Configuration

Clearwater can work with any DNS server that meets the requirements above. However, most of our testing has been performed with AWS Route 53 and BIND, which are described in the following sections.

The Clearwater nodes also need to know the identity of their DNS server. Ideally, this is done via DHCP within your virtualization infrastructure. Alternatively, you can configure it manually.

The UEs need to know the identity of the DNS server too. In a testing environment, you may be able to use DHCP or manual configuration. In a public network, you will need to register the <zone> domain name you are using and arrange for an NS record for <zone> to point to your DNS server.

AWS Route 53

Clearwater’s automated install automatically configures AWS Route 53. There is no need to follow these instructions if you are using the automated install.

The official AWS Route 53 documentation is a good reference, and most of the following steps are links into it.

To use AWS Route 53 for Clearwater, you need to create a hosted zone for your domain and then create the record sets described above within it.

For the geographically-redundant records, you also need to create health checks and latency-based record sets.

Note that AWS Route 53 does not support NAPTR records.

BIND

To use BIND, you need to

  • install it
  • create an entry for your “zone” (DNS suffix your deployment uses)
  • configure the zone with a “zone file”
  • restart BIND.

Note that BIND does not support latency-based routing or health-checking.

Installation

To install BIND on Ubuntu, issue sudo apt-get install bind9.

Creating Zone Entry

To create an entry for your zone, edit the /etc/bind/named.conf.local file to add a line of the following form, replacing <zone> with your zone name.

zone "<zone>" IN { type master; file "/etc/bind/db.<zone>"; };

Configuring Zone

Zones are configured through “zone files” (defined in RFC 1034 and RFC 1035).

If you followed the instructions above, the zone file for your zone is at /etc/bind/db.<zone>.

For Clearwater, you should be able to adapt the following example zone file by correcting the IP addresses and duplicating (or removing) entries where you have more (or fewer) than 2 nodes in each tier.

$TTL 5m ; Default TTL

; SOA, NS and A record for DNS server itself
@                 3600 IN SOA  ns admin ( 2014010800 ; Serial
                                          3600       ; Refresh
                                          3600       ; Retry
                                          3600       ; Expire
                                          300 )      ; Minimum TTL
@                 3600 IN NS   ns
ns                3600 IN A    1.0.0.1 ; IPv4 address of BIND server
ns                3600 IN AAAA 1::1    ; IPv6 address of BIND server

; bono
; ====
;
; Per-node records - not required to have both IPv4 and IPv6 records
bono-1                 IN A     2.0.0.1
bono-2                 IN A     2.0.0.2
bono-1                 IN AAAA  2::1
bono-2                 IN AAAA  2::2
;
; Cluster A and AAAA records - UEs that don't support RFC 3263 will simply
; resolve the A or AAAA records and pick randomly from this set of addresses.
@                      IN A     2.0.0.1
@                      IN A     2.0.0.2
@                      IN AAAA  2::1
@                      IN AAAA  2::2
;
; NAPTR and SRV records - these indicate a preference for TCP and then resolve
; to port 5060 on the per-node records defined above.
@                      IN NAPTR 1 1 "S" "SIP+D2T" "" _sip._tcp
@                      IN NAPTR 2 1 "S" "SIP+D2U" "" _sip._udp
_sip._tcp              IN SRV   0 0 5060 bono-1
_sip._tcp              IN SRV   0 0 5060 bono-2
_sip._udp              IN SRV   0 0 5060 bono-1
_sip._udp              IN SRV   0 0 5060 bono-2

; sprout
; ======
;
; Per-node records - not required to have both IPv4 and IPv6 records
sprout-1               IN A     3.0.0.1
sprout-2               IN A     3.0.0.2
sprout-1               IN AAAA  3::1
sprout-2               IN AAAA  3::2
;
; Cluster A and AAAA records - P-CSCFs that don't support RFC 3263 will simply
; resolve the A or AAAA records and pick randomly from this set of addresses.
sprout                 IN A     3.0.0.1
sprout                 IN A     3.0.0.2
sprout                 IN AAAA  3::1
sprout                 IN AAAA  3::2
;
; NAPTR and SRV records - these indicate TCP support only and then resolve
; to port 5054 on the per-node records defined above.
sprout                 IN NAPTR 1 1 "S" "SIP+D2T" "" _sip._tcp.sprout
_sip._tcp.sprout       IN SRV   0 0 5054 sprout-1
_sip._tcp.sprout       IN SRV   0 0 5054 sprout-2
;
; Per-node records for I-CSCF (if enabled) - not required to have both
; IPv4 and IPv6 records
sprout-3               IN A     3.0.0.3
sprout-3               IN AAAA  3::3
;
; Cluster A and AAAA records - P-CSCFs that don't support RFC 3263 will simply
; resolve the A or AAAA records and pick randomly from this set of addresses.
icscf.sprout           IN A     3.0.0.3
icscf.sprout           IN AAAA  3::3
;
; NAPTR and SRV records for I-CSCF (if enabled) - these indicate TCP
; support only and then resolve to port 5052 on the per-node records
; defined above.
icscf.sprout           IN NAPTR 1 1 "S" "SIP+D2T" "" _sip._tcp.icscf.sprout
_sip._tcp.icscf.sprout IN SRV   0 0 5052 sprout-3

; homestead
; =========
;
; Per-node records - not required to have both IPv4 and IPv6 records
homestead-1            IN A     4.0.0.1
homestead-2            IN A     4.0.0.2
homestead-1            IN AAAA  4::1
homestead-2            IN AAAA  4::2
;
; Cluster A and AAAA records - sprout picks randomly from these.
hs                     IN A     4.0.0.1
hs                     IN A     4.0.0.2
hs                     IN AAAA  4::1
hs                     IN AAAA  4::2
;
; (No need for NAPTR or SRV records as homestead doesn't handle SIP traffic.)

; homer
; =====
;
; Per-node records - not required to have both IPv4 and IPv6 records
homer-1                IN A     5.0.0.1
homer-2                IN A     5.0.0.2
homer-1                IN AAAA  5::1
homer-2                IN AAAA  5::2
;
; Cluster A and AAAA records - sprout picks randomly from these.
homer                  IN A     5.0.0.1
homer                  IN A     5.0.0.2
homer                  IN AAAA  5::1
homer                  IN AAAA  5::2
;
; (No need for NAPTR or SRV records as homer doesn't handle SIP traffic.)

; ralf
; =====
;
; Per-node records - not required to have both IPv4 and IPv6 records
ralf-1                IN A     6.0.0.1
ralf-2                IN A     6.0.0.2
ralf-1                IN AAAA  6::1
ralf-2                IN AAAA  6::2
;
; Cluster A and AAAA records - sprout and bono pick randomly from these.
ralf                  IN A     6.0.0.1
ralf                  IN A     6.0.0.2
ralf                  IN AAAA  6::1
ralf                  IN AAAA  6::2
;
; (No need for NAPTR or SRV records as ralf doesn't handle SIP traffic.)

; ellis
; =====
;
; ellis is not clustered, so there's only ever one node.
;
; Per-node record - not required to have both IPv4 and IPv6 records
ellis-1                IN A     7.0.0.1
ellis-1                IN AAAA  7::1
;
; "Cluster"/access A and AAAA record
ellis                  IN A     7.0.0.1
ellis                  IN AAAA  7::1

Restarting

To restart BIND, issue sudo service bind9 restart. Check /var/log/syslog for any error messages.

Client Configuration

Clearwater nodes need to know the identity of their DNS server. Ideally, this is achieved through DHCP. There are two main situations in which it might need to be configured manually.

  • When DNS configuration is not provided via DHCP.
  • When incorrect DNS configuration is provided via DHCP.

Either way, you must

  • create an /etc/dnsmasq.resolv.conf file containing the desired DNS configuration (probably just the single line nameserver <IP address>)
  • add RESOLV_CONF=/etc/dnsmasq.resolv.conf to /etc/default/dnsmasq
  • run service dnsmasq restart (a concrete example follows this list).
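
Concretely, this might look like the following (the 10.0.0.53 address is purely illustrative):

echo "nameserver 10.0.0.53" | sudo tee /etc/dnsmasq.resolv.conf
echo "RESOLV_CONF=/etc/dnsmasq.resolv.conf" | sudo tee -a /etc/default/dnsmasq
sudo service dnsmasq restart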

(As background, dnsmasq is a DNS forwarder that runs on each Clearwater node to act as a cache. Local processes look in /etc/resolv.conf for DNS configuration, and this points them to localhost, where dnsmasq runs. In turn, dnsmasq takes its configuration from /etc/dnsmasq.resolv.conf. By default, dnsmasq would use /var/run/dnsmasq/resolv.conf, but this is controlled by DHCP.)

IPv6 AAAA DNS lookups

Clearwater can be installed on an IPv4-only system, an IPv6-only system, or a system with both IPv4 and IPv6 addresses (though the Clearwater software does not use both IPv4 and IPv6 at the same time).

Normally, systems with both IPv4 and IPv6 addresses will prefer IPv6, performing AAAA lookups first and only trying an A record lookup if that fails. This may cause problems (or be inefficient) if you know that all your Clearwater DNS records are A records.

In this case, you can configure a preference for A lookups by editing /etc/gai.conf and commenting out the line precedence ::ffff:0:0/96 100 (as described at http://askubuntu.com/questions/32298/prefer-a-ipv4-dns-lookups-before-aaaaipv6-lookups).
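
For example, the following one-liner comments out that precedence line (a sketch only - back up /etc/gai.conf before editing it):

sudo sed -i 's|^precedence ::ffff:0:0/96|#&|' /etc/gai.conf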

ENUM Guide

ENUM is a system, supported by sprout, for mapping PSTN numbers to SIP URIs using DNS NAPTR records. This article describes

  • briefly, what ENUM is
  • what we support
  • how to configure a server to respond to ENUM queries.

ENUM overview

The NAPTR article on Wikipedia gives a pretty good overview of how ENUM works. Essentially, you

  • convert your PSTN number into a domain name by stripping all punctuation, reversing the order of the digits, separating them with dots and suffixing with “e164.arpa” (e.g. 12012031234 becomes 4.3.2.1.3.0.2.1.0.2.1.e164.arpa - see the sketch after this list)
  • issue a DNS NAPTR query for this domain and receive the response (which is for the best prefix match to your number)
  • parse the response, which contains a series of regular expressions and replacements
  • find the best match for your number (after stripping all punctuation except any leading +) and apply the replacement
  • if the NAPTR response indicated this match was “terminal”, then use the resulting SIP URI - if not, use the output of this pass as input for another ENUM query.
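
The number-to-domain conversion in the first step can be sketched as a shell one-liner (illustrative only; it strips '+', spaces and hyphens, reverses the digits, dot-separates them and appends the suffix):

number=12012031234
echo "$(echo "$number" | tr -d '+ -' | rev | sed 's/./&./g')e164.arpa"
# prints 4.3.2.1.3.0.2.1.0.2.1.e164.arpa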

The RFCs to refer to for a definitive position are RFC 3401 (which covers Dynamic Delegation Discovery System (DDDS), on which ENUM is built) and RFC 3761 (which covers ENUM itself). Also relevant is RFC 4769 (which specifies ENUM service types).

Clearwater ENUM Support

Clearwater supports ENUM as set out in RFC 3761 and RFC 4769, with the following points/omissions.

  • ENUM processing should only be applied to TEL URIs and to SIP URIs that indicate they were dialed by the user. Since no SIP UAs that we’re aware of indicate that a SIP URI was dialed by the user, we assume that any all-numeric URI (optionally with a leading +) should have ENUM processing applied.
  • RFC 3761 section 2.4 says that the ENUM replacement string is UTF-8-encoded. We do not support this - all characters must be ASCII (i.e. a subset of UTF-8). It is not particularly meaningful to use UTF-8 replacements for SIP URIs as SIP URIs themselves must be ASCII.
  • RFC 3761 section 6.1 says that a deployed ENUM service SHOULD include mechanisms to ameliorate security threats and mentions using DNSSEC. We don’t support DNSSEC, so other security approaches (such as private ENUM servers) must be used.
  • RFC 4769 describes ENUM rules that output TEL URIs. Since we have no way (apart from ENUM itself) of routing to TEL URIs, we ignore such rules.
  • RFC 3761 section 2.1 describes the construction of the Application Unique String from the E.164 number. We have no way of converting a user-dialed number into an E.164 number, so we assume that the number provided by the user is E.164.

Deciding on ENUM rules

ENUM rules will vary with your deployment, but the (JSON-ized) rules from an example deployment might form a reasonable basis:

{
    "number_blocks" : [
        {   "name" : "Demo system number 2125550270",
            "prefix" : "2125550270",
            "regex" : "!(^.*$)!sip:\\1@10.147.226.2!"
        },
        {
            "name" : "Clearwater external numbers 2125550271-9",
            "prefix" : "212555027",
            "regex" : "!(^.*$)!sip:+1\\1@ngv.example.com!"
        },
        {
            "name" : "Clearwater external numbers +12125550271-9",
            "prefix" : "+1212555027",
            "regex" : "!(^.*$)!sip:\\1@ngv.example.com!"
        },
        {
            "name" : "Clearwater external number 2125550280",
            "prefix" : "2125550280",
            "regex" : "!(^.*$)!sip:+1\\1@ngv.example.com!"
        },
        {
            "name" : "Clearwater external number +12125550280",
            "prefix" : "+12125550280",
            "regex" : "!(^.*$)!sip:\\1@ngv.example.com!"
        },
        {   "name" : "Clearwater internal numbers",
            "prefix" : "650555",
            "regex" : "!(^.*$)!sip:\\1@ngv.example.com!"
        },
        {   "name" : "Clearwater internal numbers dialled with +1 prefix",
            "prefix" : "+1650555",
            "regex" : "!+1(^.*$)!sip:\\1@ngv.example.com!"
        },
        {   "name" : "NANP => SIP trunk",
            "prefix" : "",
            "regex" : "!(^.*$)!sip:\\1@10.147.226.2!"
        }
    ]
}

Configuring ENUM rules

Normally, the ENUM server would be an existing part of the customer’s network (not part of Clearwater itself) but, for testing and demonstration, it is useful to be able to configure our own.

Unfortunately, AWS Route 53 does not support the required NAPTR records, so we can’t use this. Fortunately, BIND (easy to install) does, and dnsmasq (our standard DNS forwarder on Clearwater nodes) does to a limited extent. This section describes how to configure BIND or dnsmasq to respond to ENUM queries.

You can set up ENUM rules on either BIND or dnsmasq. BIND is the better choice, but requires installing an additional package and a few additional file tweaks. dnsmasq is the weaker choice - it does not support wildcard domains properly, so rules for each directory number must be added separately.

BIND

Create a new node to run BIND, and open port 53 on it to the world (0.0.0.0/0).

If you are using chef, you can do this by adding a new ‘enum’ node.

On the new node,

  1. install bind by typing “sudo apt-get install bind9”

  2. modify /etc/bind/named.conf to add a line ‘include “/etc/bind/named.conf.e164.arpa”;’ - we’ll create that file next

  3. create /etc/bind/named.conf.e164.arpa, with the following contents - we’ll create the /etc/bind/e164.arpa.db file next

    zone "e164.arpa" {
            type master;
            file "/etc/bind/e164.arpa.db";
    };
    
  4. create /etc/bind/e164.arpa.db, with the following header

    $TTL 1h
    @ IN SOA e164.arpa. ngv-admin.example.com. (
                                                            2009010910 ;serial
                                                            3600 ;refresh
                                                            3600 ;retry
                                                            3600 ;expire
                                                            3600 ;minimum TTL
    )
    @ IN NS e164.arpa.
    @ IN A <this server's IP address>
    
  5. add additional rules of the form ‘<enum domain name> <order> <preference> “<flags>” “<service>” “<regexp>” .’ to this file (being careful to maintain double-quotes and full-stops)

    • enum domain name is the domain name for which you want to return this regular expression - for example, if you want to use a regular expression for number “1201”, this would be “1.0.2.1.e164.arpa” - if you want to specify a wildcard domain (i.e. a prefix match), use *.1.0.2.1.e164.arpa - note, however, that non-wildcard children of a wildcard domain are not themselves wildcarded, e.g. if you have a domain *.1.e164.arpa and a domain 2.1.e164.arpa, a query for 3.2.1.e164.arpa would not resolve (even though it matches *.1.e164.arpa)
    • order specifies the order in which this rule is applied compared to others - rules of order 1 are processed before those of order 2 (and rules of order 2 are not processed if one of the matching rules of order 1 is non-terminal)
    • preference specifies how preferable one rule is compared to another - if both rules match, the one with the higher preference is used
    • flags specifies how a match should be processed - it can either be blank (meaning apply the regular expression and then continue) or “U” (meaning the rule is terminal and, after applying the regular expression, no further rules should be processed)
    • service must be “E2U+SIP”, indicating ENUM rather than other services
    • regexp must be a regular expression to apply and is of the form !<pattern>!<replacement>! - note that the ”!” delimiter is only by convention and can be replaced with other symbols (such as “/”) if ”!” occurs within the pattern or replacement.
  6. restart bind by typing “sudo service bind9 restart”

  7. check /var/log/syslog for errors reported by bind on start-up

  8. test your configuration by typing “dig -t NAPTR <enum domain name>” and checking you get back the responses you just added.

As an example, the JSON-ized ENUM rules for an example system (above) translate to the following entries in /etc/bind/e164.arpa.db.

; Demo system number 2125550270
0.7.2.0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .

; Demo system number +12125550270
0.7.2.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .

; Clearwater external numbers 2125550271-9
*.7.2.0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:+1\\1@ngv.example.com!" .

; Clearwater external numbers +12125550271-9
; Note that we can't define a domain name containing + so we must define a
; domain without it and then match a telephone number starting with a + (if
; present) or (if not) use the default route via the SIP trunk.
*.7.2.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(\+^.*$)!sip:\\1@ngv.example.com!" .
*.7.2.0.5.5.5.2.1.2.1 IN NAPTR 2 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .

; Clearwater external number 2125550280
0.8.2.0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:+1\\1@ngv.example.com!" .

; Clearwater external number +12125550280
; Note that we can't define a domain name containing + so we must define a
; domain without it and then match a telephone number starting with a + (if
; present) or (if not) use the default route via the SIP trunk.
0.8.2.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(\+^.*$)!sip:\\1@ngv.example.com!" .
0.8.2.0.5.5.5.2.1.2.1 IN NAPTR 2 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .

; Clearwater internal numbers
*.5.5.5.0.5.6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@ngv.example.com!" .

; Clearwater internal numbers dialled with +1 prefix
; Note that we can't define a domain name containing + so we must define a
; domain without it and then match a telephone number starting with a + (if
; present) or (if not) use the default route via the SIP trunk.
*.5.5.5.0.5.6.1 IN NAPTR 1 1 "u" "E2U+sip" "!\+1(^.*$)!sip:\\1@ngv.example.com!" .
*.5.5.5.0.5.6.1 IN NAPTR 2 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .

; NANP => SIP trunk.
* IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
2.0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.2.0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
7.2.0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
2.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.2.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
7.2.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
8.2.0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.8.2.0.5.5.5.2.1.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
8.2.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.8.2.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.0.8.2.0.5.5.5.2.1.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.5.6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.0.5.6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.5.0.5.6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.5.0.5.6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
*.5.5.0.5.6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .
5.5.5.0.5.6 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@10.147.226.2!" .

dnsmasq

Before configuring dnsmasq, you need to find a suitable host.

  • For a deployment with only one sprout node you should just be able to use the sprout node itself.
  • For deployments with more than one sprout node, you’ll need to either create a new node or pick one of your sprout nodes (bearing in mind that it will become a single point of failure). You should then configure dnsmasq on all sprouts to query this server for ENUM lookups, by creating an “/etc/dnsmasq.d/enum-forwarding” file containing a single line of the form “server=/e164.arpa/<DNS IP address>” and then restarting dnsmasq with “sudo service dnsmasq restart”. Note that you’ll also need to open UDP port 53 (DNS) to this server in the security group.

On the host you’ve chosen,

  1. ensure dnsmasq is installed (it is standard on all Clearwater nodes) by typing “sudo apt-get install dnsmasq”
  2. create an “/etc/dnsmasq.d/enum” file containing lines of the form “naptr-record=<enum domain name>,<order>,<preference>,<flags>,<service>,<regexp>” where
    • enum domain name is the domain name for which you want to return this regular expression - for example, if you want to use a regular expression for number “1201”, this would be “1.0.2.1.e164.arpa”
    • order specifies the order in which this rule is applied compared to others - rules of order 1 are processed before those of order 2 (and rules of order 2 are not processed if one of the matching rules of order 1 is non-terminal)
    • preference specifies how preferable one rule is compared to another - if both rules match, the one with the higher preference is used
    • flags specifies how a match should be processed - it can either be blank (meaning apply the regular expression and then continue) or “U” (meaning the rule is terminal and, after applying the regular expression, no further rules should be processed)
    • service must be “E2U+SIP”, indicating ENUM rather than other services
    • regexp must be a regular expression to apply and is of the form !<pattern>!<replacement>! - note that the “!” delimiter is only by convention and can be replaced with other symbols (such as “/”) if “!” occurs within the pattern or replacement.
  3. restart dnsmasq by typing “sudo service dnsmasq restart”
  4. test your configuration by typing “dig -t NAPTR <enum domain name>” and checking you get back the responses you just added.

As an example, the JSON-ized ENUM rules for an example system (above) translate to the following entries in /etc/dnsmasq.d/enum.

# Demo system number 2125550270
naptr-record=0.7.2.0.5.5.5.2.1.2.e164.arpa,1,1,U,E2U+SIP,!(^.*$)!sip:\\1@10.147.226.2!

# Clearwater external numbers 2125550271-9
naptr-record=7.2.0.5.5.5.2.1.2.e164.arpa,1,1,U,E2U+SIP,!(^.*$)!sip:+1\\1@ngv.example.com!

# Clearwater external numbers +12125550271-9
# Note that we can't define a domain name containing + so we must define a
# domain without it and then match a telephone number starting with a + (if
# present) or (if not) use the default route via the SIP trunk.
naptr-record=7.2.0.5.5.5.2.1.2.1.e164.arpa,1,1,U,E2U+SIP,!(\+^.*$)!sip:\\1@ngv.example.com!
naptr-record=7.2.0.5.5.5.2.1.2.1.e164.arpa,2,1,U,E2U+SIP,!(^.*$)!sip:\\1@10.147.226.2!

# Clearwater external number 2125550280
naptr-record=0.8.2.0.5.5.5.2.1.2.e164.arpa,1,1,U,E2U+SIP,!(^.*$)!sip:+1\\1@ngv.example.com!

# Clearwater external number +12125550280
# Note that we can't define a domain name containing + so we must define a
# domain without it and then match a telephone number starting with a + (if
# present) or (if not) use the default route via the SIP trunk.
naptr-record=0.8.2.0.5.5.5.2.1.2.1.e164.arpa,1,1,U,E2U+SIP,!(\+^.*$)!sip:\\1@ngv.example.com!
naptr-record=0.8.2.0.5.5.5.2.1.2.1.e164.arpa,2,1,U,E2U+SIP,!(^.*$)!sip:\\1@10.147.226.2!

# Clearwater internal numbers
naptr-record=5.5.5.0.5.6.e164.arpa,1,1,U,E2U+SIP,!(^.*$)!sip:\\1@ngv.example.com!

# Clearwater internal numbers dialled with +1 prefix
# Note that we can't define a domain name containing + so we must define a
# domain without it and then match a telephone number starting with a + (if
# present) or (if not) use the default route via the SIP trunk.
naptr-record=5.5.5.0.5.6.1.e164.arpa,1,1,U,E2U+SIP,!\+1(^.*$)!sip:\\1@ngv.example.com!
naptr-record=5.5.5.0.5.6.1.e164.arpa,2,1,U,E2U+SIP,!(^.*$)!sip:\\1@10.147.226.2!

# NANP => SIP trunk
naptr-record=e164.arpa,1,1,U,E2U+SIP,!(^.*$)!sip:\\1@10.147.226.2!

ENUM Domain Suffix

RFC 3761 mandates that domain names used during ENUM processing are suffixed with .e164.arpa. Obviously, this means that there can only be one such public domain. If you need your domain to be public (rather than private as set up above), you can instead change the suffix, e.g. to .e164.arpa.ngv.example.com, by

  • configuring sprout to use this suffix via the --enum-suffix parameter
  • configuring your DNS server to respond to this domain rather than .e164.arpa (by search-replacing in the config files described above - see the example record after this list)
  • configuring Route 53 to forward DNS requests for e164.arpa.ngv.example.com to your DNS server by creating an NS (Name Server) record with name “e164.arpa.ngv.example.com” and value set to the name/IP address of your DNS server.
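
For example (a sketch assuming the .e164.arpa.ngv.example.com suffix above), the demo rule for number 2125550270 would become:

naptr-record=0.7.2.0.5.5.5.2.1.2.e164.arpa.ngv.example.com,1,1,U,E2U+SIP,!(^.*$)!sip:\\1@10.147.226.2!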

ENUM and Sprout

To enable ENUM lookups on Sprout, edit /etc/clearwater/shared_config and add the following configuration to use either an ENUM server (recommended) or an ENUM file:

enum_server=<IP addresses of enum servers>
enum_file=<location of enum file>

If you use the ENUM file, enter the ENUM rules in the JSON format (shown above). If you are using the enhanced node management framework provided by clearwater-etcd, and you use /etc/clearwater/enum.json as your ENUM filename, you can automatically synchronize changes across your deployment by running sudo /usr/share/clearwater/clearwater-config-manager/scripts/upload_enum_json after creating or updating the file. In this case, other Sprout nodes will automatically download and use the uploaded ENUM rules.

It’s possible to configure Sprout with secondary and tertiary ENUM servers, by providing a comma-separated list (e.g. enum_server=1.2.3.4,10.0.0.1,192.168.1.1). If this is done:

  • Sprout will always query the first server in the list first
  • If this returns SERVFAIL or times out (which happens after a randomised 500ms-1000ms period), Sprout will resend the query to the second server
  • If this returns SERVFAIL or times out, Sprout will resend the query to the third server
  • If all servers return SERVFAIL or time out, the ENUM query will fail

Voxbone

Voxbone provides local, geographical, mobile and toll free phone numbers and converts incoming calls and SMSs to SIP INVITE and MESSAGE flows.

Clearwater supports these flows, and this document describes how to configure Voxbone and Clearwater to work together, and then how to test and troubleshoot.

Configuration

There are 3 configuration steps.

  1. Configure Clearwater’s IBCF to trust Voxbone’s servers.
  2. Create a subscriber on Clearwater.
  3. Associate the subscriber with a number on Voxbone.
Configuring IBCF

An IBCF (Interconnection Border Control Function) interfaces between the core network and one or more trusted SIP trunks. We will configure Voxbone to send all its traffic to the IBCF, but we first need to configure the IBCF to accept traffic from Voxbone.

Clearwater’s bono component acts as an IBCF. To make things simpler later on, you should configure all of your bono nodes to act as IBCFs.

The IP addresses you need to specify as trusted peers are the addresses of Voxbone’s SIP servers.

The process for configuring trusted peers is described as part of the IBCF documentation.

If you don’t want to configure all your bono nodes as IBCF nodes, you can configure only a subset, but you will need to take extra steps when configuring Voxbone to ensure that calls are routed to the correct node.

Creating a subscriber

Creating a subscriber for Voxbone use is done in exactly the same way as creating any other subscriber.

Associating the subscriber on Voxbone

For each telephone number you own, Voxbone allows you to configure a SIP URI to which incoming calls will be sent.

  • If, as recommended earlier, all your bono nodes are configured as IBCFs, you can simply use the SIP URI of the subscriber you created above.
  • If only one (or a subset) of your bono nodes are configured as IBCFs, you must ensure that calls are routed to this node by replacing the domain name of the SIP URI with the IP address or domain name of the IBCF node. For example, if your SIP URI was 1234567890@example.com but your IBCF node was 1.2.3.4, you must use 1234567890@1.2.3.4.

If you are also using the VoxSMS service for receiving SMSes, you additionally need to configure a VoxSMS SIP URI using the same SIP URI as above.

Testing

The Voxbone web UI allows you to make a test call. Alternatively, you can make a normal call through the PSTN to your Voxbone number.

The only way to test SMS function is to actually send an SMS to your Voxbone number.

Troubleshooting

If your call or SMS doesn’t get through, there are three things to check.

  • Does the SIP message actually reach the IBCF? You can check this using network capture (e.g. tcpdump) on the IBCF - see the example capture command after this list. If not, the problem is likely to be either that the SIP URI’s domain doesn’t resolve to your IBCF, or that security groups or firewall rules are blocking traffic to it.
  • Is the SIP message trusted? You can again check this with network capture - if you see a 403 Forbidden response, this indicates that the IBCF does not trust the sender of the message. Check the trusted_peers entry in your /etc/clearwater/user_settings file and that you have restarted bono since updating it.
  • Would the call go through if it were sent on-net? Try making a call to the subscriber from another subscriber on Clearwater and check that it goes through (or follow the standard troubleshooting process).
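
As a rough sketch (assuming SIP traffic from Voxbone arrives on the standard SIP port 5060 - adjust the port to match your configuration), a capture on the IBCF might look like:

sudo tcpdump -i any -n port 5060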

SNMP Statistics

Clearwater provides a set of statistics about the performance of each Clearwater node over SNMP. Currently, these are available on Bono, Sprout, Ralf and Homestead nodes, and on the MMTel, Call Diversion, Memento and Gemini Application Server nodes.

Configuration

These SNMP statistics require:

  • the clearwater-snmpd package to be installed for all node types
  • the clearwater-snmp-handler-astaire package to be installed for Sprout and Ralf nodes

These packages will be automatically installed when installing through the Chef automation system; for a manual install, you will need to install the packages with sudo apt-get install.
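
For example, on a manual install (package names as listed above; the second package is only needed on Sprout and Ralf nodes):

sudo apt-get install clearwater-snmpd
sudo apt-get install clearwater-snmp-handler-astaire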

Usage

Clearwater nodes provide SNMP statistics over port 161 using SNMP v2c and community clearwater. The MIB definition file can be downloaded from here.

Our SNMP statistics are provided through plugins or subagents to the standard SNMPd packaged with Ubuntu, so querying port 161 (the standard SNMP port) on a Clearwater node will provide system-level stats like CPU% as well as any available Clearwater stats.

To load the MIB file, allowing you to refer to MIB objects by name, first place it in the ~/.snmp/mibs directory. To load the MIB file for just the current session, run export MIBS=+PROJECT-CLEARWATER-MIB. To load the MIB file every time, add the line mibs +PROJECT-CLEARWATER-MIB to a snmp.conf file in the ~/.snmp directory.
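
As a sketch (assuming the MIB file has been downloaded to the current directory as PROJECT-CLEARWATER-MIB), the steps above look like:

mkdir -p ~/.snmp/mibs
cp PROJECT-CLEARWATER-MIB ~/.snmp/mibs/
export MIBS=+PROJECT-CLEARWATER-MIB                        # current session only
echo "mibs +PROJECT-CLEARWATER-MIB" >> ~/.snmp/snmp.conf   # every session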

If a statistic is indexed by time period, then it displays the relevant statistics over:

  • the previous five-second period
  • the current five-minute period
  • the previous five-minute period

For example, a stat queried at 12:01:33 would display the stats covering:

  • 12:01:25 - 12:01:30 (the previous five-second period)
  • 12:00:00 - 12:01:33 (the current five-minute period)
  • 11:55:00 - 12:00:00 (the previous five-minute period)

All latency values are in microseconds.

Many of the statistics listed below are stored in SNMP tables (although the MIB file should be examined to determine exactly which ones). The full table can be retrieved by using the snmptable command. For example, the Initial Registrations table for Sprout can be retrieved by running:

snmptable -v2c -c clearwater <ip> PROJECT-CLEARWATER-MIB::sproutInitialRegistrationTable

The individual table elements can be accessed using:

snmpget -v2c -c clearwater <ip> <table OID>.1.<column>.<row>

For example, the Initial Registration Stats table has an OID of .1.2.826.0.1.1578918.9.3.9, so the number of initial registration attempts in the current five-minute period can be retrieved by:

snmpget -v2c -c clearwater <ip> .1.2.826.0.1.1578918.9.3.9.1.2.2

or by:

snmpget -v2c -c clearwater <ip> PROJECT-CLEARWATER-MIB::sproutInitialRegistrationAttempts.scopeCurrent5MinutePeriod

The snmpwalk command can be used to discover the list of queryable OIDs beneath a certain point in the MIB tree. For example, you can retrieve all of the entries in the Sprout Initial Registrations table using:

snmpwalk -v2c -c clearwater <ip> PROJECT-CLEARWATER-MIB::sproutInitialRegistrationTable

Running snmpwalk -v2c -c clearwater <ip> PROJECT-CLEARWATER-MIB::projectClearwater will output a very long list of all available Clearwater stats.

Note that running an ‘snmpget’ on a table OID will result in a “No Such Object available on this agent at this OID” message.

Bono statistics

Bono nodes provide the following statistics:

  • The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details)
  • The average latency, variance, highest latency and lowest latency for SIP requests, indexed by time period.
  • The number of parallel TCP connections to each Sprout node.
  • The number of incoming requests, indexed by time period.
  • The number of requests rejected due to overload, indexed by time period.
  • The average request queue size, variance, highest queue size and lowest queue size, indexed by time period.
Sprout statistics

Sprout nodes provide the following statistics:

  • The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details)
  • The average latency, variance, highest latency and lowest latency for SIP requests, indexed by time period.
  • The average latency, variance, highest latency and lowest latency for requests to Homestead, indexed by time period.
  • The average latency, variance, highest latency and lowest latency for requests to Homestead’s /impi/<private ID>/av endpoint, indexed by time period.
  • The average latency, variance, highest latency and lowest latency for requests to Homestead’s /impi/<private ID>/registration-status endpoint, indexed by time period.
  • The average latency, variance, highest latency and lowest latency for requests to Homestead’s /impu/<public ID>/reg-data endpoint, indexed by time period.
  • The average latency, variance, highest latency and lowest latency for requests to Homestead’s /impu/<public ID>/location endpoint, indexed by time period.
  • The average latency, variance, highest latency and lowest latency for requests to Homer, indexed by time period.
  • The number of attempts, successes and failures for AKA authentications on register requests, indexed by time period (AKA authentication attempts with a correct response but that fail due to the sequence number in the nonce being out of sync are counted as successes).
  • The number of attempts, successes and failures for SIP digest authentications on register requests, indexed by time period (authentication attempts with a correct response but that fail due to being stale are counted as failures).
  • The number of attempts, successes and failures for authentications on non-register requests, indexed by time period.
  • The number of attempts, successes and failures for initial registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
  • The number of attempts, successes and failures for re-registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
  • The number of attempts, successes and failures for de-registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
  • The number of attempts, successes and failures for third-party initial registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
  • The number of attempts, successes and failures for third-party re-registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
  • The number of attempts, successes and failures for third-party de-registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
  • The number of requests routed by the S-CSCF according to a route pre-loaded by an app server, indexed by time period.
  • The number of parallel TCP connections to each Homestead node.
  • The number of parallel TCP connections to each Homer node.
  • The number of incoming SIP requests, indexed by time period.
  • The number of requests rejected due to overload, indexed by time period.
  • The average request queue size, variance, highest queue size and lowest queue size, indexed by time period.
  • The number of Memcached buckets needing to be synchronized and buckets already resynchronized during the current Astaire resynchronization operation (overall, and for each peer).
  • The number of Memcached entries, and amount of data (in bytes) already resynchronized during the current Astaire resynchronization operation.
  • The transfer rate (in bytes/second) of data during this resynchronization, over the last 5 seconds (overall, and per bucket).
  • The number of remaining nodes to query during the current Chronos scaling operation.
  • The number of timers, and number of invalid timers, processed over the last 5 seconds.
  • The total number of timers being managed by a Chronos node at the current time.
  • The weighted average of total timer count, variance, highest timer count, lowest timer count, indexed by time period.
  • The number of attempts, successes and failures for incoming SIP transactions for the ICSCF, indexed by time period and request type.
  • The number of attempts, successes and failures for outgoing SIP transactions for the ICSCF, indexed by time period and request type.
  • The number of attempts, successes and failures for incoming SIP transactions for the SCSCF, indexed by time period and request type.
  • The number of attempts, successes and failures for outgoing SIP transactions for the SCSCF, indexed by time period and request type.
  • The number of attempts, successes and failures for incoming SIP transactions for the BGCF, indexed by time period and request type.
  • The number of attempts, successes and failures for outgoing SIP transactions for the BGCF, indexed by time period and request type.
  • The permitted request rate (PRR) is an estimate for the sustainable request rate without causing large latency. Sprout provides a weighted average permitted request rate, variance, highest PRR, and lowest PRR, indexed by time period.
  • The value of the smoothed latency at the last permitted request rate update.
  • The value of the target (maximum permissible) latency at the last permitted request rate update.
  • The number of penalties experienced at the last permitted request rate update.
  • The current permitted request rate.
  • The number of incoming INVITE transactions for the S-CSCF that were cancelled before a 1xx response was seen, indexed by time period.
  • The number of incoming INVITE transactions for the S-CSCF that were cancelled after a 1xx response was seen, indexed by time period (these INVITE cancellation statistics can be used to distinguish between the case where an INVITE was cancelled because the call rang but wasn’t answered and the case where it failed due to network issues and never got through in the first place).
Ralf statistics

Ralf nodes provide the following statistics:

  • The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details).
  • The number of Memcached buckets needing to be synchronized and buckets already resynchronized during the current Astaire resynchronization operation (overall, and for each peer).
  • The number of Memcached entries, and amount of data (in bytes) already resynchronized during the current Astaire resynchronization operation.
  • The transfer rate (in bytes/second) of data during this resynchronization, over the last 5 seconds (overall, and per bucket).
  • The number of remaining nodes to query during the current Chronos scaling operation.
  • The number of timers, and number of invalid timers, processed over the last 5 seconds.
Homestead Statistics

Homestead nodes provide the following statistics:

  • The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details)
  • The average latency, variance, highest latency and lowest latency on HTTP requests, indexed by time period.
  • The average latency, variance, highest latency and lowest latency on the Cx interface, indexed by time period.
  • The average latency, variance, highest latency and lowest latency on Multimedia-Auth Requests on the Cx interface, indexed by time period.
  • The average latency, variance, highest latency and lowest latency on Server-Assignment, User-Authorization and Location-Information Requests on the Cx interface, indexed by time period.
  • The number of incoming requests, indexed by time period.
  • The number of requests rejected due to overload, indexed by time period.
  • The total number of Diameter requests with an invalid Destination-Realm or invalid Destination-Host, indexed by time period.
  • The number of Multimedia-Authorization-Answers with a given result-code received over the Cx interface, indexed by time period.
  • The number of Server-Assignment-Answers with a given result-code received over the Cx interface, indexed by time period.
  • The number of User-Authorization-Answers with a given result-code received over the Cx interface, indexed by time period.
  • The number of Location-Information-Answers with a given result-code received over the Cx interface, indexed by time period.
  • The number of Push-Profile-Answers with a given result-code sent over the Cx interface, indexed by time period.
  • The number of Registration-Termination-Answers with a given result-code sent over the Cx interface, indexed by time period.
Call Diversion App Server Statistics

Call Diversion App Server nodes provide the following statistics:

  • The number of attempts, successes and failures for incoming SIP transactions, indexed by time period and request type.
  • The number of attempts, successes and failures for outgoing SIP transactions, indexed by time period and request type.
Memento App Server Statistics

Memento App Server nodes provide the following statistics:

  • The number of attempts, successes and failures for incoming SIP transactions, indexed by time period and request type.
  • The number of attempts, successes and failures for outgoing SIP transactions, indexed by time period and request type.
MMTel App Server Statistics

MMTel App Server nodes provide the following statistics:

  • The number of attempts, successes and failures for incoming SIP transactions, indexed by time period and request type.
  • The number of attempts, successes and failures for outgoing SIP transactions, indexed by time period and request type.
Gemini App Server Statistics

Gemini App Server nodes provide the following statistics:

  • The number of attempts, successes and failures for incoming SIP transactions, indexed by time period and request type.
  • The number of attempts, successes and failures for outgoing SIP transactions, indexed by time period and request type.

SNMP Alarms

Clearwater nodes report errors over the SNMP interface when they are in an abnormal state, and clear them when they return to normal. They only report errors relating to that node - not errors relating to the whole deployment, which is the role of an external monitoring service.

Error indications come in two forms:

  • For clearly-defined errors not based on thresholds, the Clearwater node sets an ITU Alarm MIB from RFC 3877 and sends an SNMP notification to a configured external SNMP manager.
  • For errors based on a threshold set on a statistic (such as latency targets or number of failed connections), the Clearwater node exposes that statistic over SNMP. A downstream statistics aggregator from the Management and Orchestration (MANO) layer monitors these statistics, compares them to its configured thresholds, and raises alarms on that basis.

EMS

To integrate with an EMS, the EMS must support the following capabilities of RFC 3877 to obtain the brief description, detailed description, and severity for the alarm:

  • The EMS must be configured to catch the SNMP INFORM messages used to report alarms from Clearwater. It is also recommended that the EMS display the alarm information provided by the following MIBs.
  • Upon receiving an SNMP INFORM message from Clearwater, the EMS can obtain the alarm data as follows:
    • The EMS must retrieve the AlarmModelEntry MIB table data associated with the current SNMP INFORM message from Clearwater. This MIB provides the active/clear state, alarm description, and a table index for the ituAlarmEntry MIB.
    • The EMS must then retrieve the ituAlarmEntry MIB table data associated with the current alarm from Clearwater. This MIB provides the alarm severity and the additional detailed alarm description.

Alarm Models

The alarm models used by Clearwater are defined in alarmdefinition.h.

Radius Authentication

It is possible to authenticate users for SSH/SFTP access by using the RADIUS authentication protocol to contact a third-party RADIUS authentication server. This document explains how to install and configure this capability on a Clearwater node, and how to use it.

The authentication process

On attempting to access a node via SSH or SFTP, a user is expected either to use a key or to provide a password to verify their identity and so pass authentication. This normally requires a locally provisioned set of authentication details for each user, which the sshd process compares against the credentials provided. With RADIUS enabled, password-authenticated user accounts can instead be configured centrally on a RADIUS server (or in a database that server can access), and each node passes the credentials provided at login to that server to complete the authentication process.

Because sshd requires the user attempting access to exist locally on the node, any unknown user is mapped to the default Ubuntu user to allow authentication to proceed correctly. Once authenticated, they act as the default user, but for auditing purposes it is the username provided at login that is recorded.

Prerequisites

The following conditions are assumed by this process:

  • Your nodes are running on Ubuntu 14.04.
  • Your nodes have access to both Clearwater and Ubuntu repositories.
  • Your SSH configuration allows password authentication and PAM (the correct configuration will be detailed below).
  • You have access to a third-party RADIUS server (such as freeRADIUS).
  • Your firewall allows UDP traffic to the above server on port 1812.

Installation

Package installation

Install the Clearwater RADIUS authentication package:

sudo apt-get install clearwater-radius-auth
Configuration

The details of your RADIUS server will need to be entered into /etc/pam_radius_auth.conf. This file provides an example of how entries should be structured:

  • Multiple entries are allowed, but each must be on a new line.
  • Each line consists of three fields:
    • server[:port] (the default port is 1812; all traffic will be UDP)
    • secret
    • [timeout] (the default timeout is 3 seconds)
  • The secret is shared between each client and the server to allow simple encryption of passwords. The secret must match the entry for the client in the RADIUS server configuration.
  • Both the port and timeout entries are optional.
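
For example, a minimal entry (using a hypothetical RADIUS server radius.example.com, a hypothetical shared secret, and the default timeout) might look like:

radius.example.com:1812    examplesecret    3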

Your sshd configuration must allow password authentication and use of PAM. If you are unsure, check that the PasswordAuthentication and UsePAM entries in /etc/ssh/sshd_config are set to yes. Any changes to ssh configuration require the ssh process to be restarted before they take effect.
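
For example, the relevant /etc/ssh/sshd_config entries are:

PasswordAuthentication yes
UsePAM yes

and the ssh process can be restarted (on Ubuntu 14.04) with:

sudo service ssh restart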

You must ensure that your firewall/security groups allow UDP traffic to the RADIUS server on port 1812.

Usage

Once the above is installed and configured, any user provisioned in the RADIUS server can attempt SSH or SFTP access to the configured node, and on providing their password they will be authenticated against the details held on the RADIUS server, and logged in, acting as the default Ubuntu user. Commands such as who or last will output the username supplied at login, and this will also be recorded in the auth log /var/log/auth.log.

Any users provisioned locally on the node will see no change to their authentication experience. By default, RADIUS authentication is set to be a sufficient, but not required condition. As such, failing to authenticate against the server credentials will cause the authentication attempt to fall back to checking locally provisioned details. See below for further details on configuration options.

Troubleshooting

  • If you are not seeing any traffic reaching your RADIUS server, and entries in /var/log/auth.log state that no RADIUS server was reachable, re-check the RADIUS server entry in /etc/pam_radius_auth.conf, and ensure that your firewall is configured to allow UDP traffic to the RADIUS server on port 1812.
  • If your RADIUS server is rejecting authentication requests, ensure that the server is configured correctly.

Removal

To properly remove clearwater-radius-auth, and the components it brings with it, run the following:

sudo apt-get purge clearwater-radius-auth
sudo apt-get purge libpam-radius-auth
sudo apt-get purge libnss-ato

This will remove all configuration put in place by the installation. Should your configuration become corrupt, purging and re-installing the associated module will re-instate the correct configuration.

Further configuration

This section details the configuration put in place by the installation. It is highly recommended that these be left as their defaults. The following is for information purposes only.

libnss-ato.conf

The libnss-ato configuration file is found at /etc/libnss-ato.conf, and will look like the following:

ubuntu:x:1000:1000:Ubuntu:/home/ubuntu:/bin/bash

It holds the information of the default user to which unknown users are mapped. By default this maps to the Ubuntu user.

Only the first line of this file is parsed. The user entry is the same format as is found in /etc/passwd. Replacing this file with a different user entry will map unknown users to the entry provided.

pam.d/sshd

The PAM configuration file for the sshd process is found at /etc/pam.d/sshd. As part of the installation, the 3 lines around auth sufficient pam_radius_auth.so are added at the top of the file, configuring PAM to attempt RADIUS authentication before other methods. It will look like the following:

# PAM configuration for the Secure Shell service
# +clearwater-radius-auth
auth sufficient pam_radius_auth.so
# -clearwater-radius-auth
# Standard Un*x authentication.

It is strongly recommended that users do not modify this entry. Further information on this configuration can be found at FreeRADIUS.

nsswitch.conf

The NSS configuration file is found at /etc/nsswitch.conf. After installation, the top three entries in this file will look as follows:

passwd:   compat ato
group:    compat
shadow:   compat ato

Modifications to the NSS configuration make it check the libnss-ato component for a user mapping if no local user is found. The addition of ato at the end of both the passwd and shadow entries provides this function. Removing these additions will disable the user mapping and break RADIUS authentication.

Application Server Guide

You can add new call features or functionality to calls by adding an application server. Clearwater supports application servers through the standard IMS interface ISC. This article explains the features and limitations of this support. See Configuring an Application Server for details of how to configure Clearwater to use this function.

What is an application server?

An application server (AS) is a server which is brought into or notified of a call by the network in order to provide call features or other behaviour.

In IMS, an application server is a SIP entity which is invoked by the S-CSCF for each dialog, both originating and terminating, as configured by the initial filter criteria (iFCs) of the relevant subscribers. The application server may reject or accept the call itself, redirect it, pass it back to the S-CSCF as a proxy, originate new calls, or perform more complex actions (such as forking) as a B2BUA. It may choose to remain in the signaling path, or drop out.

The application server communicates with the IMS core via the ISC interface. This is a SIP interface. The details are given in 3GPP TS 24.229, especially s5.4 (the S-CSCF side) and s5.7 (the AS side). Further useful information can be found in 3GPP TS 23.228. Section references below are to 3GPP TS 24.229 unless otherwise specified.

For a more detailed look at the ISC interface, and how it is implemented in Clearwater, see the ISC interface guide in the Sprout developer documentation.

Clearwater interfaces

In Clearwater most S-CSCF function, including the ISC interface, is implemented in Sprout. Sprout invokes application servers over the ISC interface, as specified in the iFCs.

  • Per the IMS specs, this invocation occurs on dialog-initiating requests such as INVITE, SUBSCRIBE, MESSAGE, etc, and according to the triggers within the iFCs (s5.4.3.2, s5.4.3.3):
    • When specified, Sprout will route the message to the AS; the AS can either route it onward, act as a B2BUA for e.g. call diversion, or give a final response.
    • Clearwater has full support for chained iFCs. The original dialog is tracked by Sprout using an ODI token in the Route header. This support includes support for forking ASs at any point in the chain.
    • Service trigger points (i.e., conditions) are implemented, including SIP method, SIP headers, SDP lines, registration parameters, and request URIs.
    • Service trigger points can have session case configuration, allowing the AS to only be invoked on originating calls or on terminating calls.
  • No per-AS configuration is required; ASs are invoked simply by their URI appearing in the iFCs.
  • AS invocation also occurs on REGISTER - this is called third-party registration (3GPP TS 24.229 s5.4.1.7 and 7A):
    • When a UE registers with Sprout, if the iFCs require it, it passes a third-party registration onto an AS.
    • Message bodies on third-party registration are handled per 3GPP TS 24.229 s5.4.1.7A, optionally including the service info, a copy of the registration, and a copy of the response.
    • Network-initiated deregistration: if the third-party registration fails and the iFC requests it, the UE must be deregistered.
Supported SIP headers

The following SIP headers are supported on the ISC interface:

  • All standard RFC3261 headers and relevant flows, including To, From, Route, Contact, etc. Also Privacy (per RFC3323) and Path (per RFC3327).
  • P-Asserted-Identity - bare minimal support only: we set it to the same value as From: on the outbound ISC interface, and never strip it. Full support will be added in future phases. This header is needed by originating ASs, particularly in cases where privacy is requested (see RFC3325). Full support would involve setting it in P-CSCF (bono), and ensuring it is set/stripped in the right places. The proposed minimal support has the limitation that ASs which don’t understand P-Served-User won’t work correctly when privacy is invoked. See 3GPP TS 24.229 s5.7.1.3A.
  • P-Served-User. This is needed for proper support of AS chains, to avoid service-interaction tangles. It is set by Sprout and set/stripped in the appropriate places.
Points of Note
  • Trust:
    • Some ISC signaling is trust-dependent. For Clearwater, all ASs are trusted - we think support for untrusted ASs is unlikely to be required.
  • IP Connectivity
    • We assume that there is no NAT (static or otherwise) between the AS and the Clearwater core.
Current Spec Deltas

As of June 2015, Clearwater has the following limitations on the ISC interface:

  • Change of request URI:
    • 3GPP TS 24.229 s5.4.3.3 step 3 allows a terminating AS to change the request URI to another URI that matches it (i.e., is canonically equal or an alias) without interrupting the interpretation of the iFCs.
    • Clearwater only supports this for URIs which are canonically equal; it does not support changing to an alias URI (i.e., a different public identity belonging to the same alias group of the same private identity, per 3GPP TS 24.229 s3.1, 3GPP TS 29.228 sB.2.1).
  • Request-Disposition: no-fork (3GPP TS 24.229 s5.4.3.3 step 10, s5.7.3, RFC 3841)
    • Clearwater ignores this directive - it always INVITEs all registered endpoints.
Future phases

The following features may be implemented by Clearwater in future phases:

  • Billing: both billing headers within ISC, and AS communicating with the billing system via Rf/Ro.
  • P-Access-Network-Info: should be passed (from the UE) and stripped in the appropriate places, and possibly set by bono.
  • History-Info draft-ietf-sipcore-rfc4244bis (not RFC4244, which is inaccurate wrt real implementations).
  • user=dialstring handling (RFC4967). The specs (3GPP TS 24.229 s5.4.3.2, esp step 10) are quite clear that this is handed off to an AS or handled in a deployment-specific way, as for various other URI formats, so there is nothing to do here.
  • P-Asserted-Service / P-Preferred-Service (RFC6050, TS 23.228 s4.13), i.e., IMS Communication Services and ICSIs.
  • IMS debug package, IMS logging.
  • Support for untrusted ASs.
  • Support for terminating ASs changing to an alias URI.
  • Support for Request-Disposition: no-fork, which should pick a single endpoint to INVITE.
Exclusions

Clearwater implements the ISC interface only. An AS may also expect to communicate over several other interfaces:

  • AS interrogates HSS over Sh. If the AS requires this it should access the HSS directly, bypassing Clearwater’s Homestead cache. It is not supported in deployments without an HSS.
  • AS interrogates SRF over Dh. Typical Clearwater deployments do not include an SRF.
  • UE retrieves and edits AS configuration via Ut. An AS is free to provide this or any other configuration interface it chooses. Homer does not provide a generic Ut interface for ASs to store configuration information.

The built-in MMTEL application server

Clearwater has a built-in application server, mmtel.<deployment-domain>, which implements a subset of the MMTEL services defined in GSMA PRD IR.92, ETSI TS 129.364 and 3GPP TS 24.623:

  • Originating Identification Presentation (OIP)
  • Originating Identification Restriction (OIR)
  • Communication Diversion (CDIV)
  • Communication Barring (CB)

Note that Clearwater is also able to support a number of other IR.92 services, either through inherent support (by transparently passing messages between UEs) or through integration with external application servers. See the IR.92 Supplementary Services document for more information.

The built-in MMTEL application server is invoked only for calls configured to use it. To use it, simply configure a subscriber’s iFCs to indicate the use of mmtel.<deployment-domain> as an application server. The MMTEL application server can be used on its own, or as one of a chain of application servers handling a call. The default iFCs configured by Ellis specify that the built-in MMTEL application server should be used for all originating and terminating calls.
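
As an illustrative sketch only (the iFC XML schema is defined in 3GPP TS 29.228, and example.com stands in for your deployment domain), an iFC that invokes the built-in MMTEL application server for INVITEs might look something like this:

<InitialFilterCriteria>
  <Priority>1</Priority>
  <TriggerPoint>
    <ConditionTypeCNF>0</ConditionTypeCNF>
    <SPT>
      <ConditionNegated>0</ConditionNegated>
      <Group>0</Group>
      <Method>INVITE</Method>
    </SPT>
  </TriggerPoint>
  <ApplicationServer>
    <ServerName>sip:mmtel.example.com</ServerName>
    <DefaultHandling>0</DefaultHandling>
  </ApplicationServer>
</InitialFilterCriteria>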

WebRTC support in Clearwater

Clearwater supports WebRTC clients. This article explains how to do this, using a common WebRTC client as an example.

This has been tested using Chrome 26 and some earlier versions. It has not been tested on other browsers. WebRTC is under active development, so browser updates may cause clients to stop working.

Be aware that the sipML5 client can’t be logged in to multiple numbers simultaneously in the same browser.

Instructions

In the instructions below, <domain> is your Clearwater domain.

  • Configure a line in ellis.
  • Check the camera is not in use: check the camera light is off. If it is on, close the app that is using it (e.g., Skype, or a phone client).
  • Visit http://sipml5.org/ and click on the button to start the live demo. If the website is down, you can also use http://doubango.org/sipml5/.
  • By default sipML5 uses a SIP<->WebRTC gateway run by sipml5.org. Clearwater supports WebRTC directly. To tell sipML5 to speak WebRTC directly to Clearwater:
    • Click on the “expert mode” button to open the “expert mode” tab, and fill in the following field:
      • WebSocket Server URL: ws://<domain>:5062
    • Click Save.
    • Go back to the calls tab.
  • Fill in the fields as follows:
    • Display name: <whatever you like, e.g., your name>
    • Private identity: <phone number>@<domain>
    • Public identity: sip:<phone number>@<domain>
    • Password: <password as displayed in ellis>
    • Realm: <domain>
  • Now click Log In. This sometimes takes a few seconds. If it fails (e.g., due to firewall behaviour), click Log Out then Log In again.

Some firewalls or proxies may not support WebSockets (the technology underlying WebRTC). If this is the case, logging in will not work (probably it will time out). Clearwater uses port 5062 for WebSockets.

Once you have this set up on two tabs, or two browsers, you can make calls between them as follows:

  • Enter the phone number.
  • Press the Call button.
  • Accept the local request to use the camera/mic.
  • Remote browser rings.
  • Accept the remote request to use the camera/mic.
  • Click Answer.
  • Talk.
  • Click Hang up on either end.

Video can be disabled on the ‘expert mode’ tab.

Note: Clearwater does not transcode media between WebRTC and non-WebRTC clients, so such calls are not possible. Transcoding will be supported in future.

Known sipML5 bugs

At the time of writing, the following sipML5 bugs are known:

  • Calls where one or both ends do not have a webcam do not always complete correctly. It is best to ensure both ends have a webcam, even if audio-only calls are being made.
  • Hang up doesn’t work - you have to hang up on both ends.

Using STUN and TURN

Note: The above uses a default (non-Clearwater) STUN server, and no TURN server. Clearwater has its own STUN and TURN servers which can be used to support clients behind NATs and firewalls, but at the time of writing (April 2013) there are some difficulties with the WebRTC spec that make the Clearwater TURN server inaccessible to compliant browsers. These are being tracked.

If one or both clients are behind a symmetric firewall, you must use TURN. Configure this as follows:

  • On the ‘expert mode’ tab, fill in the following field with your SIP username and the TURN workaround password set by your administrator (turn_workaround in Installing a Chef workstation). This is a workaround for Chrome WebRTC issue 1508 which will be fixed in Chrome 28 (not yet released).
    • ICE Servers: [{url:"turn:<phone number>%40<domain>@<domain>",credential:"<password>"}]

Testing TURN

To test TURN when one or both of the clients are running under Windows 7, find the IP of the remote client, and block direct communication between them as follows:

  • Control Panel > Windows Firewall > Advanced Settings > Inbound Rules > New Rule > Custom > All Programs > Any Protocol > specify remote IP address > Block the connection > (give it a name).
  • Do the same for outbound.

IR.92 Supplementary Services

GSMA IR.92 defines a set of Supplementary Services (section 2.3). This document describes Clearwater’s support for each.

The services fall into one of four categories.

  • Supported by Clearwater’s built-in MMTel application server
  • Supported by Clearwater inherently (i.e. whether or not the MMTel application server is enabled) - these services rely on messages or headers that Clearwater simply passes through
  • Requiring an external application server
  • Not supported

Supported by Clearwater’s Built-In MMTel Application Server

The following Supplementary Services are supported by Clearwater’s built-in MMTel Application Server (see The built-in MMTEL application server above).

  • Originating Identification Presentation (OIP)
  • Originating Identification Restriction (OIR)
  • Communication Diversion (CDIV)
  • Communication Barring (CB)

Supported by Clearwater Inherently

The following Supplementary Services are primarily implemented on the UE, and just require Clearwater to proxy messages and headers, which it does.

  • Communication Hold
  • Communication Waiting

Requiring an External Application Server

The following Supplementary Services require an external application server.

  • Terminating Identification Presentation
  • Terminating Identification Restriction
  • Barring of Outgoing International Calls - ex Home Country - Clearwater does not currently have sufficient configuration to know the subscriber’s home country.
  • Message Waiting Indication - Clearwater proxies SUBSCRIBE and NOTIFY messages for the MWI event package (RFC 3842), but requires an external application server to handle/generate them.
  • Ad-Hoc Multi Party Conference - Clearwater proxies messages between the UE and an external application server that conferences sessions together.

Not Supported

The following Supplementary Service is not currently supported by Clearwater.

  • Barring of Incoming Calls - When Roaming - Clearwater does not currently support roaming, and so can’t support barring on this basis.

Backups

Within a Clearwater deployment, ellis, homestead, homer and memento all store persistent data. (Bono and sprout do not.) To prevent data loss in disaster scenarios, ellis, homestead, homer and memento all have data backup and restore mechanisms. Specifically, all support

  • manual backup
  • periodic automated local backup
  • manual restore.

This document describes

  • how to list the backups that have been taken
  • how to take a manual backup
  • the periodic automated local backup behavior
  • how to restore from a backup.

Note that if your Clearwater deployment is integrated with an external HSS, the HSS is the master of ellis and homestead’s data, and those nodes do not need to be backed up. However, homer’s data still needs to be backed up.

Listing Backups

The process for listing backups varies between ellis and homestead, homer and memento.

Ellis

To list the backups that have been taken on ellis, run sudo /usr/share/clearwater/ellis/backup/list_backups.sh.

Backups for ellis:
1372294741  /usr/share/clearwater/ellis/backup/backups/1372294741
1372294681  /usr/share/clearwater/ellis/backup/backups/1372294681
1372294621  /usr/share/clearwater/ellis/backup/backups/1372294621
1372294561  /usr/share/clearwater/ellis/backup/backups/1372294561
Homestead, Homer and Memento

Homestead actually contains two databases (homestead_provisioning and homestead_cache) and these must be backed up together. This is why there are two commands for each homestead operation. Homer and memento only contain one database and so there is only one command for each operation.

To list the backups that have been taken on homestead, homer or memento, run

  • sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/list_backups.sh homestead_provisioning and sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/list_backups.sh homestead_cache for homestead
  • sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/list_backups.sh homer for homer
  • sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/list_backups.sh memento for memento.

This produces output of the following form, listing each of the available backups.

No backup directory specified, defaulting to /usr/share/clearwater/homestead/backup/backups
provisioning1372812963174
provisioning1372813022822
provisioning1372813082506
provisioning1372813143119

You can also specify a directory to search in for backups, e.g. for homestead:

sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/list_backups.sh homestead_provisioning <backup dir>

Taking a Manual Backup

The process for taking a manual backup varies between ellis and homestead, homer and memento. Note that in all cases,

  • the backup is stored locally and should be copied to a secure backup server to ensure resilience
  • this process only backs up a single local node, so the same process must be run on all nodes in a cluster to ensure a complete set of backups
  • these processes cause a small amount of extra load on the disk, so it is recommended not to perform this during periods of high load
  • only 4 backups are stored locally - when a fifth backup is taken, the oldest is deleted.
Ellis

To take a manual backup on ellis, run sudo /usr/share/clearwater/ellis/backup/do_backup.sh.

This produces output of the following form, reporting the successfully-created backup.

Creating backup in /usr/share/clearwater/ellis/backup/backups/1372336317/db_backup.sql

Make a note of the snapshot directory (1372336317 in the example above) - this will be referred to as <snapshot> below.

This file is only accessible by the root user. To copy it to the current user’s home directory, run

snapshot=<snapshot>
sudo bash -c 'cp /usr/share/clearwater/ellis/backup/backups/'$snapshot'/db_backup.sql ~'$USER' &&
              chown '$USER:$USER' ~'$USER'/db_backup.sql'

This file can, and should, be copied off the ellis node to a secure backup server.

Homestead, Homer and Memento

To take a manual backup on homestead, homer or memento, run

  • sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/do_backup.sh homestead_provisioning and sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/do_backup.sh homestead_cache on homestead
  • sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/do_backup.sh homer on homer
  • sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/do_backup.sh memento on memento.

This produces output of the following form, reporting the successfully-created backup.

...
Deleting old backup: /usr/share/clearwater/homestead/backup/backups/1372812963174
Creating backup for keyspace homestead_provisioning...
Requested snapshot for: homestead_provisioning
Snapshot directory: 1372850637124
Backups can be found at: /usr/share/clearwater/homestead/backup/backups/provisioning/

Make a note of the snapshot directory - this will be referred to as <snapshot> below.

The backups are only stored locally - the resulting backup is stored in /usr/share/clearwater/homestead/backup/backups/provisioning/<snapshot>

These should be copied off the node to a secure backup server. For example, from a remote location execute:

scp -r ubuntu@<homestead node>:/usr/share/clearwater/homestead/backup/backups/provisioning/<snapshot> .

Periodic Automated Local Backups

Ellis, homestead, homer and memento are all automatically configured to take daily backups at midnight local time if you installed them through chef.

If this is not already set up (for example, after a manual install), you can turn it on by editing your crontab: run sudo crontab -e and add the following lines if not already present (you can check the result as shown after the list):

  • 0 0 * * * /usr/share/clearwater/ellis/backup/do_backup.sh on ellis
  • 0 0 * * * /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/do_backup.sh homestead_provisioning and 5 0 * * * /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/do_backup.sh homestead_cache on homestead
  • 0 0 * * * /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/do_backup.sh homer on homer
  • 0 0 * * * /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/do_backup.sh memento on memento.
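
You can then confirm that the entries are present by listing the crontab:

sudo crontab -l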

These backups are stored locally, in the same locations as they would be generated for a manual backup.

Restoring from a Backup

There are three stages to restoring from a backup.

  1. Copying the backup files to the correct location.
  2. Running the restore backup script.
  3. Synchronizing ellis, homestead, homer and memento’s views of the system state.

This process will impact service and overwrite data in your database.

Copying Backup Files

The first step in restoring from a backup is getting the backup files/directories into the correct locations on the ellis, homer, homestead or memento node.

If you are restoring from a backup that was taken on the node on which you are restoring (and haven’t moved it), you can just move onto the next step.

If not, create a directory on your system that you want to put your backups into (we’ll use ~/backup in this example). Then copy the backups there. For example, from a remote location that contains your backup directory <snapshot> execute scp -r <snapshot> ubuntu@<homestead node>:backup/<snapshot>.

On ellis, run the following commands.

snapshot=<snapshot>
sudo chown root:root ~/backup/$snapshot/db_backup.sql
sudo mkdir -p /usr/share/clearwater/ellis/backup/backups/$snapshot
sudo mv ~/backup/$snapshot/db_backup.sql /usr/share/clearwater/ellis/backup/backups/$snapshot

On homestead/homer/memento there is no need to move the files further, as the backup script takes an optional backup directory parameter.

Running the Restore Backup Script

To actually restore from the backup file, run

  • sudo /usr/share/clearwater/ellis/backup/restore_backup.sh <snapshot> on ellis
  • sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/restore_backup.sh homestead_provisioning <snapshot> <backup directory> and sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/restore_backup.sh homestead_cache <snapshot> <backup directory> on homestead
  • sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/restore_backup.sh homer <snapshot> <backup directory> on homer
  • sudo /usr/share/clearwater/bin/run-in-signaling-namespace /usr/share/clearwater/bin/restore_backup.sh memento <snapshot> <backup directory> on memento.

Ellis will produce output of the following form.

Will attempt to backup from backup 1372336317
Found backup directory 1372336317
Restoring backup for ellis...
--------------
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */
--------------

...

--------------
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */
--------------

Homestead, homer or memento will produce output of the following form.

Will attempt to backup from backup 1372336442947
Will attempt to backup from directory /home/ubuntu/bkp_test/
Found backup directory /home/ubuntu/bkp_test//1372336442947
Restoring backup for keyspace homestead_provisioning...
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms826M -Xmx826M -Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
Clearing commitlog...
filter_criteria: Deleting old .db files...
filter_criteria: Restoring from backup: 1372336442947
private_ids: Deleting old .db files...
private_ids: Restoring from backup: 1372336442947
public_ids: Deleting old .db files...
public_ids: Restoring from backup: 1372336442947
sip_digests: Deleting old .db files...
sip_digests: Restoring from backup: 1372336442947

At this point, this node has been restored.

Synchronization

It is possible (and likely) that when backups are taken on different boxes the data will be out of sync, e.g. ellis will know about a subscriber, but there will be no digest in homestead. To restore the system to a consistent state we have a synchronization tool within ellis, which can be run over a deployment to get the databases in sync. To run it, log into an ellis box and execute:

cd /usr/share/clearwater/ellis
sudo env/bin/python src/metaswitch/ellis/tools/sync_databases.py

This will:

  • Run through all the lines on ellis that have an owner and verify that there is a private identity associated with the public identity stored in ellis. If successful, it will verify that a digest exists in homestead for that private identity. If either of these checks fail, the line is considered lost and is removed from ellis. If both checks pass, it will check that there is a valid IFC - if this is missing, it will be replaced with the default IFC.
  • Run through all the lines on ellis without an owner and make sure there is no orphaned data in homestead and homer, i.e. deleting the simservs, IFC and digest for those lines.

Shared Configuration

In addition to the data stored in ellis, homer, homestead and memento, a Clearwater deployment also has shared configuration that is automatically shared between nodes. This is stored in a distributed database, and mirrored to files on the disk of each node.

Backing Up

To backup the shared configuration:

  • If you are in the middle of modifying shared config, complete the process to apply the config change to all nodes.

  • Log onto one of the sprout nodes in the deployment.

  • Copy the following files to somewhere else for safe keeping (e.g. another directory on the node, or another node entirely).

    /etc/clearwater/shared_config
    /etc/clearwater/bgcf.json
    /etc/clearwater/enum.json
    /etc/clearwater/s-cscf.json

Restoring Configuration

To restore a previous backup, copy the four files listed above to /etc/clearwater on one of your sprout nodes. Then run the following commands on that node:

/usr/share/clearwater/clearwater-config-manager/scripts/upload_shared_config
/usr/share/clearwater/clearwater-config-manager/scripts/upload_bgcf_json
/usr/share/clearwater/clearwater-config-manager/scripts/upload_enum_json
/usr/share/clearwater/clearwater-config-manager/scripts/upload_scscf_json

Call Barring

Clearwater allows users to configure rules for blocking of incoming or outgoing calls, by implementing the IR.92 version of call barring (which is itself a cut-down version of the MMTEL call barring service).

The relevant specifications are:

Some of the entries in the IR.92 list are not currently applicable for Clearwater, since Clearwater does not have support for/the concept of roaming calls. Because of this, the customer may only specify that they wish to block one of:

  • No outgoing calls
  • All outgoing calls
  • Outgoing calls to international numbers (but see below)

And one of:

  • No incoming calls
  • All incoming calls

International Call detection

The 3GPP specification states that to detect an international number, a SIP router should discount SIP/SIPS URIs that do not have a user=phone parameter set on them (this parameter is intended to distinguish between dialed digits and call-by-name with numeric names) but X-Lite and other SIP clients never set this parameter. Because of this behavior, the Clearwater implementation considers all SIP URIs for international call categorization.

National dialing prefix

Another limitation is that the international call detection algorithm only checks against a global ‘national’ dialing prefix (which is fine unless subscribers are expected to be multi-national within one Clearwater deployment). There does not appear to be a mechanism in IMS networks to store a subscriber-specific locale (e.g. in the HSS) so this would require some smarts to solve. These issues may be addressed in subsequent development.

Call Diversion

Clearwater allows users to define rules for handling of incoming calls, by implementing a superset of the IR.92 version of call diversion (which is itself a cut-down version of the MMTEL call diversion service).

The relevant specifications are

We have implemented all of IR.92 section 2.3.8. We have implemented all of 3GPP TS 24.604 with the following exceptions.

  • CDIV Notification (notifying the diverting party when a call has been diverted) is not supported
  • We only support the default subscription options from table 4.3.1.1 (most notably, no privacy options).
  • We only support hard-coded values for network provider options from table 4.3.1.2, as follows.
    • We stop ringing the diverting party before/simultaneously with trying the diverted-to target (rather than waiting for them to start ringing).
      • Served user communication retention on invocation of diversion (forwarding or deflection) = Clear communication to the served user on invocation of call diversion
      • Served user communication retention when diverting is rejected at diverted-to user = No action at the diverting user
    • We don’t support CDIV Notification (as above).
      • Subscription option is provided for “served user received reminder indication on outgoing communication that CDIV is currently activated” = No
      • CDIV Indication Timer = irrelevant
      • CDIVN Buffer Timer = irrelevant
    • We support up to 5 diversions and then reject the call.
      • Total number of call diversions for each communication = 5
      • AS behavior when the maximum number of diversions for a communication is reached = Reject the communication
    • The default no-reply timer before Communication Forwarding on no Reply occurs is hard-coded to 20s. This is still overridable on a per-subscriber basis.
      • Communication forwarding on no reply timer = 36s
  • Obviously, the interactions with services that we haven’t implemented yet have not been implemented! In particular, we haven’t implemented any interactions with terminating privacy.
  • We’ve ignored the comment in section 4.2.1.1 which says “It should be possible that a user has the option to restrict receiving communications that are forwarded”, since the specification doesn’t define any function for this.

Note that we have implemented support for Communication Deflection (in which the callee sends a 302 response and this triggers forwarding) despite it not being required by IR.92.

Elastic Scaling

The core Clearwater nodes have the ability to elastically scale; in other words, you can grow and shrink your deployment on demand, without disrupting calls or losing data.

This page explains how to use this elastic scaling function when using a deployment created through the automated or manual install processes. Note that, although the instructions differ between the automated and manual processes, the underlying operations that will be performed on your deployment are the same - the automated process simply uses chef to drive this rather than issuing the commands manually.

Before scaling your deployment

Before scaling up or down, you should decide how many nodes of each type (Bono, Sprout, Homestead, Homer and Ralf) you need (i.e. your target size). This should be based on your call load profile and measurements of current systems, though based on experience we recommend scaling up a tier of a given type (sprout, bono, etc.) when the average CPU utilization within that tier reaches ~60%. The Deployment Sizing Spreadsheet may also provide useful input.

Performing the resize

If you did an Automated Install

To resize your automated deployment, run:

knife deployment resize -E <env> --sprout-count <n> --bono-count <n> --homer-count <n> --homestead-count <n> --ralf-count <n>

Where the <n> values are the number of nodes of each type you need. Once this command finishes, the resize operation is complete and any nodes that are no longer needed will have been terminated.
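
For example, to resize a deployment in an environment called my_env (the name is invented) to 3 sprout, 3 bono, 2 homer, 2 homestead and 2 ralf nodes, you would run:

knife deployment resize -E my_env --sprout-count 3 --bono-count 3 --homer-count 2 --homestead-count 2 --ralf-count 2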

More detailed documentation on the available Chef commands is available here.

If you did a Manual Install

Follow these instructions if you manually installed your deployment and are using Clearwater’s automatic clustering and configuration sharing functionality.

If you’re scaling up your deployment, follow this process:

  1. Spin up new nodes, following the standard install process, but with the following modifications:
    • Set the etcd_cluster so that it only includes the nodes that are already in the deployment (so it does not include the nodes being added).
    • Stop when you get to the “Provide Shared Configuration” step. The nodes will learn their configuration from the existing nodes.
  2. Wait until the new nodes have fully joined the existing deployment. To check if a node has joined the deployment:
    • Run /usr/share/clearwater/clearwater-cluster-manager/scripts/check_cluster_state. This should report that the local node is in all of its clusters and that the cluster is stable.
    • Run sudo /usr/share/clearwater/clearwater-config-manager/scripts/check_config_sync. This reports when the node has learned its configuration.
  3. Update DNS to contain the new nodes.
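
As an illustration of the etcd_cluster setting in step 1, if the existing deployment consists of nodes 10.0.0.1, 10.0.0.2 and 10.0.0.3 (addresses invented), each new node's /etc/clearwater/local_config would contain a line like:

etcd_cluster="10.0.0.1,10.0.0.2,10.0.0.3"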

If you’re scaling down your deployment, follow this process:

  1. Update DNS to contain the nodes that will remain after the scale-down.
  2. On each node that is about to be turned down:
    • Run monit unmonitor -g <node-type>. For example for a sprout node: monit unmonitor -g sprout. On a homestead node also run monit unmonitor -g homestead-prov.
    • Start the main process quiescing.
      • Sprout - sudo service sprout quiesce
      • Bono - sudo service bono quiesce
      • Homestead - sudo service homestead stop && sudo service homestead-prov stop
      • Homer - sudo service homer stop
      • Ralf - sudo service ralf stop
      • Ellis - sudo service ellis stop
      • Memento - sudo service memento stop
    • Unmonitor the clearwater management processes:
      • sudo monit unmonitor clearwater_cluster_manager
      • sudo monit unmonitor clearwater_config_manager
      • sudo monit unmonitor -g etcd
    • Run sudo service clearwater-etcd decommission. This will cause the nodes to leave their existing clusters.
  3. Once the above steps have completed, turn down the nodes.
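
Putting the per-node steps above together for a sprout node that is being turned down, the command sequence would look roughly like this (a sketch only; substitute the appropriate process group for other node types):

sudo monit unmonitor -g sprout
sudo service sprout quiesce
sudo monit unmonitor clearwater_cluster_manager
sudo monit unmonitor clearwater_config_manager
sudo monit unmonitor -g etcd
sudo service clearwater-etcd decommission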

If you did a Manual Install without Automatic Clustering

Follow these instructions if you manually installed your deployment but are not using Clearwater’s automatic clustering and configuration sharing functionality.

If you’re scaling up your deployment, follow this process.

  1. Spin up new nodes, following the standard install process.
  2. On Sprout and Ralf nodes, update /etc/clearwater/cluster_settings to contain both a list of the old nodes (servers=...) and a (longer) list of the new nodes (new_servers=...) and then run service <process> reload to re-read this file. Do the same on Memento nodes, but use /etc/clearwater/memento_cluster_settings as the file.
  3. On new Memento, Homestead and Homer nodes, follow the instructions on the Cassandra website to join the new nodes to the existing cluster.
  4. On Sprout and Ralf nodes, update /etc/chronos/chronos_cluster.conf to contain a list of all the nodes (see here for details of how to do this) and then run service chronos reload to re-read this file.
  5. On Sprout, Memento and Ralf nodes, run service astaire reload to start resynchronization.
  6. On Sprout and Ralf nodes, run service chronos resync to start resynchronization of Chronos timers.
  7. Update DNS to contain the new nodes.
  8. On Sprout, Memento and Ralf nodes, wait until Astaire has resynchronized, either by running service astaire wait-sync or by polling over SNMP.
  9. On Sprout and Ralf nodes, wait until Chronos has resynchronized, either by running service chronos wait-sync or by polling over SNMP.
  10. On all nodes, update /etc/clearwater/cluster_settings and /etc/clearwater/memento_cluster_settings to just contain the new list of nodes (servers=...) and then run service <process> reload to re-read this file.
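
As an illustration of the cluster_settings changes in step 2, when growing a two-node memcached cluster to three nodes the file might look like this while resynchronization is in progress (the addresses and the standard memcached port 11211 are assumptions):

servers=10.0.0.1:11211,10.0.0.2:11211
new_servers=10.0.0.1:11211,10.0.0.2:11211,10.0.0.3:11211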

If you’re scaling down your deployment, follow this process.

  1. Update DNS to contain the nodes that will remain after the scale-down.
  2. On Sprout and Ralf nodes, update /etc/clearwater/cluster_settings to contain both a list of the old nodes (servers=...) and a (shorter) list of the new nodes (new_servers=...) and then run service <process> reload to re-read this file. Do the same on Memento nodes, but use /etc/clearwater/memento_cluster_settings as the file.
  3. On leaving Memento, Homestead and Homer nodes, follow the instructions on the Cassandra website to remove the leaving nodes from the cluster.
  4. On Sprout and Ralf nodes, update /etc/chronos/chronos_cluster.conf to mark the nodes that are being scaled down as leaving (see here for details of how to do this) and then run service chronos reload to re-read this file.
  5. On Sprout, Memento and Ralf nodes, run service astaire reload to start resynchronization.
  6. On the Sprout and Ralf nodes that are staying in the Chronos cluster, run service chronos resync to start resynchronization of Chronos timers.
  7. On Sprout, Memento and Ralf nodes, wait until Astaire has resynchronized, either by running service astaire wait-sync or by polling over SNMP.
  8. On Sprout and Ralf nodes, wait until Chronos has resynchronized, either by running service chronos wait-sync or by polling over SNMP.
  9. On Sprout, Memento and Ralf nodes, update /etc/clearwater/cluster_settings and /etc/clearwater/memento_cluster_settings to just contain the new list of nodes (servers=...) and then run service <process> reload to re-read this file.
  10. On the Sprout and Ralf nodes that are staying in the cluster, update /etc/chronos/chronos_cluster.conf so that it only contains entries for the staying nodes in the cluster and then run service chronos reload to re-read this file.
  11. On each node that is about to be turned down:
    • Run monit unmonitor -g <node-type>. For example for a sprout node: monit unmonitor -g sprout. On a homestead node also run monit unmonitor -g homestead-prov.
    • Start the main process quiescing.
      • Sprout - sudo service sprout quiesce
      • Bono - sudo service bono quiesce
      • Homestead - sudo service homestead stop
      • Homer - sudo service homer stop
      • Ralf - sudo service ralf stop
      • Ellis - sudo service ellis stop
      • Memento - sudo service memento stop
  12. Turn down each of these nodes once the process has terminated.

Privacy

The Clearwater privacy feature splits into two sections, configuration and application. For configuration, we’ve been working from a combination of 3GPP TS 24.623 and ETSI TS 129.364, which define the XDM documents that store the MMTel services configuration (currently only OIR, originating-identity-presentation-restriction, is supported). For application, we’ve used 3GPP TS 24.607 (http://www.quintillion.co.jp/3GPP/Specs/24607-910.pdf), which defines the roles of the originating UE, originating AS, terminating AS and terminating UE. This document is supplemented by RFC 3323 and RFC 3325.

For configuration, there’s also a communication with Homestead to determine whether MMTel services should be applied for a call. This uses a proprietary (JSON) interface based on the initial filter criteria (iFC) definitions.

Configuration

User-specified configuration is held and validated by the XDM. When a subscriber is provisioned, Ellis injects a default document into the XDM for that user; this document turns off OIR by default. After this, if the user wishes to change their call services configuration, they use the XCAP interface (Ut) to change their document.

If the user has turned on Privacy by default, then all calls originating from the user will have identifying headers (e.g. User-Agent, Organization, Server...) stripped out from the originating request and will have the From: header re-written as "Anonymous" <sip:anonymous@anonymous.invalid>;tag=xxxxxxx, preventing the callee from working out who the caller is. With the current implementation, there are some headers that are left in the message that could be used to determine some information about the caller; see below.
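
For illustration, an originating request that has had OIR privacy applied in this way might carry headers along the following lines (the tag value is invented):

From: "Anonymous" <sip:anonymous@anonymous.invalid>;tag=4fa3
Privacy: id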

Application

Originating UE

The originating UE is responsible for adding any per-call Privacy headers (overriding the stored privacy configuration) if required and hiding some identifying information (e.g. From: header).

Application Server

When receiving a dialog-creating request from an endpoint, Sprout will ask Homestead which application servers should be used and will parse the returned iFC configuration. If the configuration indicates that MMTEL services should be applied, Sprout will continue to act as an originating and terminating application server on the call as it passes through.

Once Homestead has indicated that MMTel services should be applied for the call, Sprout will query the XDM to determine the user’s specific configuration.

Originating Application Server

When acting as an originating AS, the above 3GPP spec states that we should do the following:

For an originating user that subscribes to the OIR service in “temporary mode” with default “restricted”, if the request does not include a Privacy header field, or the request includes a Privacy header field that is not set to “none”, the AS shall insert a Privacy header field set to “id” or “header” based on the subscription option. Additionally based on operator policy, the AS shall either modify the From header field to remove the identification information, or add a Privacy header field set to “user”.

For an originating user that subscribes to the OIR service in “temporary mode” with default “not restricted”, if the request includes a Privacy header field set to “id” or “header”, based on operator policy, the AS shall either modify the From header field to remove the identification information, or add a Privacy header field set to “user”.

Our implementation of Sprout satisfies the above by injecting the applicable Privacy headers (i.e. it never actually applies privacy at this point, only indicates that privacy should be applied later).

Terminating Application Server

When acting as a terminating AS, the 3GPP spec states that we should do the following:

If the request includes the Privacy header field set to “header” the AS shall:

a) anonymize the contents of all headers containing private information in accordance with IETF RFC 3323 [6] and IETF RFC 3325 [7]; and

b) add a Privacy header field with the priv-value set to “id” if not already present in the request.

If the request includes the Privacy header field set to “user” the AS shall remove or anonymize the contents of all “user configurable” headers in accordance with IETF RFC 3323 [6] and IETF RFC 3325 [7]. In the latter case, the AS may need to act as transparent back-to-back user agent as described in IETF RFC 3323 [6].

Our implementation of Sprout satisfies these requirements, but does so without fully satisfying the SHOULD clauses of the linked RFCs. In particular:

  • ‘header’ privacy
    • We do not perform “Via-Stripping” since we use the Via headers for subsequent call routing and don’t want to have to hold a via-header mapping in state.
    • Similarly we don’t perform “Record-Route stripping” or “Contact-stripping”.
  • ‘session’ privacy
    • This is not supported so if the privacy is marked as critical, we will reject the call.
  • ‘user’ privacy
    • We modify the To: header in place (and do not cache the original value); we assume that the endpoints are robust enough to continue correlating the call based on the tags.
    • We do not anonymize the call-id (for the same reason as the Via-stripping above).
  • ‘id’ privacy
    • This is fully supported.
  • ‘none’ privacy
    • This is fully supported.

Terminating UE

The terminating UE may use the presence of Privacy headers to determine if privacy has been applied by OIR or by the originating endpoint (though this doesn’t seem to be very useful information...). It may also look at the P-Asserted-Identity header to determine the true identity of the caller (rather than some local ID).

Dialog-Wide Issues

There are a few issues with our stateless model and privacy that are worth drawing out here, pending a fuller discussion.

  • We only apply OIR processing and privacy obfuscation to originating requests (i.e. not on in-dialog requests or responses)
  • We cannot apply Contact/Via/RR-stripping since to do so would require us to restore the headers on not just the response but also on all callee in-dialog requests.
  • Similarly for Call-Id obfuscation.
  • Stripping Contact/Via/RR would prevent features such as callee call push from working (since they require the callee to send a REFER that identifies the caller).

Off-net Calling with BGCF and IBCF

The Clearwater IBCF (Interconnection Border Control Function) provides an interface point between a Clearwater deployment and one or more trusted SIP trunks.

The IBCF does not currently provide all of the functions of an IMS compliant IBCF - in particular it does not support topology hiding or an H.248 interface to control a Transition Gateway.

To use the IBCF function you must install and configure at least one IBCF node, set up the ENUM configuration appropriately, and set up BGCF (Breakout Gateway Control Function) routing configuration on the Sprout nodes.

Install and Configure an IBCF

Install and configure an IBCF node with the following steps.

  • Install the node as if installing a Bono node, either manually or using Chef. If using Chef, use the ibcf role, for example

    knife box create -E <name> ibcf
    
  • Edit the /etc/clearwater/user_settings file (creating it if it does not already exist) and add or update the line defining the IP addresses of SIP trunk peer nodes.

    trusted_peers="<trunk 1 IP address>,<trunk 2 IP address>,<trunk 3 IP address>, ..."
    
  • Restart the Bono daemon.

    sudo service bono stop    (allow monit to restart Bono)
    

ENUM Configuration

The number ranges that should be routed over the SIP trunk must be set up in the ENUM configuration to map to a SIP URI owned by the SIP trunk provider, and routable to that provider. For example, you would normally want to map a dialed number to a URI of the form <number>@<domain>.

This can either be achieved by defining rules for each number range you want to route over the trunk, for example

*.0.1.5 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@<trunk IP address>!" .
*.0.1.5.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@<trunk IP address>!" .

or by defining a default mapping to the trunk and exceptions for number ranges you want to keep on-net, for example

* IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@<trunk IP address>!" .
*.3.2.1.2.4.2 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@<local domain>!" .
*.3.2.1.2.4.2.1 IN NAPTR 1 1 "u" "E2U+sip" "!(^.*$)!sip:\\1@<local domain>!" .

You can also use ENUM to provide number portability information, for example

*.3.2.1.2.4.2 IN NAPTR 1 1 "u" "E2U+pstn:tel" "!(^.*$)!sip:\\1;npdi;rn=+242123@<local domain>!" .
*.3.2.1.2.4.2.1 IN NAPTR 1 1 "u" "E2U+pstn:tel" "!(^.*$)!sip:\\1;npdi@<local domain>!" .

Refer to the ENUM guide for more about how to configure ENUM.

BGCF Configuration

BGCF (Breakout Gateway Control Function) configuration is stored in the bgcf.json file in /etc/clearwater on each sprout node (both I- and S-CSCF). The file stores two types of mappings. The first maps from SIP trunk IP addresses and/or host names to IBCF host names, and the second maps from a routing number (using prefix matching) to IBCF host names. These mappings control which IBCF nodes are used to route to a particular destination. Each entry can only apply to one type of mapping.

Multiple nodes to route to can be specified. In this case, Route headers are added to the message so that it is sent to the first node, which then sends it to the second node, and so on; the message is not retried at the second node if it fails at the first.

The file is in JSON format, for example.

{
    "routes" : [
        {   "name" : "<route 1 descriptive name>",
            "domain" : "<SIP trunk IP address or host name>",
            "route" : ["<IBCF SIP URI>"]
        },
        {   "name" : "<route 2 descriptive name>",
            "domain" : "<SIP trunk IP address or host name>",
            "route" : ["<IBCF SIP URI>", "<IBCF SIP URI>"]
        },
        {   "name" : "<route 3 descriptive name>",
            "number" : "<Routing number>",
            "route" : ["<IBCF SIP URI>", "<IBCF SIP URI>"]
        }
    ]
}

There can be only one route set for any given SIP trunk IP address or host name. If there are multiple routes with the same destination IP address or host name, only the first will be used. Likewise, there can only be one route set for any given routing number; if there are multiple routes with the same routing number only the first will be used.

A default route set can be configured by having an entry where the domain is set to "*". This will be used by the BGCF if it is trying to route based on the domain and there’s no explicit match for the domain in the configuration.

There is no default route set if the BGCF is routing based on the routing number provided.
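
For example (hostname invented), a default route set is just an ordinary entry whose domain is set to "*":

{   "name" : "Default route",
    "domain" : "*",
    "route" : ["sip:ibcf-1.example.com"]
}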

After making a change to this file you should run sudo /usr/share/clearwater/clearwater-config-manager/scripts/upload_bgcf_json to ensure the change is synchronized to other Sprout nodes on your system (including nodes added in the future).

Multiple Domains

A single Clearwater deployment can support subscribers in multiple domains. This document

  • gives some background context
  • describes how to configure multiple domains
  • covers some restrictions on this support.

Background

Clearwater acts as an S-CSCF for one or more home domains. It can only provide service to users within a home domain for which it is the S-CSCF. Traffic to or from users in any other home domains must go via the S-CSCF for those other domains.

Home domains are a routing concept. There is a similar (but distinct) concept of authentication realms. When a user authenticates, as well as providing their username and digest, they must also provide a realm, which describes which user database to authenticate against. Many clients default their home domain to match their authentication realm.

A single Clearwater deployment can support multiple home domains and multiple realms.

Configuration

There are three steps to configuring multiple home domains and/or multiple realms.

  • Configure the Clearwater deployment to know about all the home domains it is responsible for.
  • Add DNS records to point all home domains at the deployment.
  • Create subscribers within these multiple home domains, and optionally using multiple different realms.

Configuring the Clearwater Deployment

The /etc/clearwater/shared_config file can contain two properties that are relevant to this function.

  • home_domain - defines the “default” home domain and must always be set
  • additional_home_domains - optionally defines additional home domains

The only difference between the default home domain and the additional home domains is that the default home domain is used whenever Clearwater must guess the home domain for a TEL URI, e.g. when filling in the orig-ioi or term-ioi parameters on the P-Charging-Vector header. This is fairly minor, and generally you can consider the default home domain and additional home domains as equivalent.
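
For example (domain names invented), the relevant lines of /etc/clearwater/shared_config for a deployment serving three home domains might look like this, with additional_home_domains assumed to take a comma-separated list:

home_domain=example.com
additional_home_domains=example.net,example.org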

Adding DNS Records

The Clearwater DNS Usage document describes how to configure DNS for a simple single-domain deployment. This includes creating NAPTR, SRV and A records that resolve the home domain (or <zone>) to the bono nodes. If you have additional home domains, you should repeat this for each additional home domain. This ensures that SIP clients can find the bono nodes without needing to be explicitly configured with a proxy.

Creating Subscribers

Once the deployment has multiple domain support enabled, you can create subscribers in any of the domains, and with any realm.

  • If you use ellis, the create-numbers.py script (https://github.com/Metaswitch/ellis/blob/dev/docs/create-numbers.md) accepts a --realm parameter to specify the realm in which the directory numbers are created. When a number is allocated in the ellis UI, ellis picks any valid number/realm combination.
  • If you bulk-provision numbers, the home domain and realm can be specified in the input CSV file.
  • If you use homestead’s provisioning API (i.e. without an external HSS), you can specify the home domain and realm as parameters.

Alternatively, you can use an external HSS, and configure the subscribers using its provisioning interface.

Restriction

Clearwater does not support connections to multiple Diameter realms for the Cx interface to the HSS or the Rf interface to the CDF - only a single Diameter realm is supported.

Multiple Network Support

Introduction

As of the Ultima release, Project Clearwater can be configured to provide signaling service (SIP/HTTP/Diameter) on a separate physical (or virtual) network to management services (SNMP/SSH/Provisioning). This may be used to protect management devices (on a firewalled network) from attack from the outside world (on the signaling network).

Technical Details

Network Independence

If traffic separation is configured, the management and signaling networks are accessed by the VM via completely separate (virtual) network interfaces.

The management and signaling networks may be completely independent.

  • There need not be any way to route between them.
  • The IP addresses, subnets and default gateways used on the networks may be the same, may overlap or may be completely distinct.

Traffic Distinction

The following traffic is carried over the management interface:

  • SSH
  • DNS (for management traffic)
  • SNMP statistics
  • SAS
  • NTP
  • Homestead provisioning
  • Homer provisioning
  • Ellis provisioning API
  • M/Monit

The following traffic is carried over the signaling interface:

  • DNS (for signaling traffic)
  • ENUM
  • SIP between the UE, Bono and Sprout
  • Cx between Homestead and the HSS
  • Rf between Ralf and the CDF
  • Ut between Homer and the UE
  • inter-tier HTTP
  • intra-tier memcached, Cassandra and Chronos flows

Both interfaces respond to ICMP pings which can be used by external systems to perform health-checking if desired.

Configuration

Pre-requisites

The (possibly virtual) machines that are running your Project Clearwater services will need to be set up with a second network interface (referred to as eth1 in the following).

You will need to determine the IP configuration for your second network, including IP address, subnet, default gateway and DNS server. DHCP is supported on this interface if required, but you should ensure that the DHCP lease time is long enough that the local IP address does not change regularly.

Preparing the network

Due to the varied complexities of IP networking, it would be impractical to attempt to automate configuring the Linux-level view of the various networks. Network namespaces are created and managed using the ip netns tool, which is a standard part of Ubuntu 14.04.

The following example commands (when run as root) create a network namespace, move eth1 into it, configure a static IP address and configure routing.

ip netns add signaling
ip link set eth1 netns signaling
ip netns exec signaling ifconfig lo up
ip netns exec signaling ifconfig eth1 1.2.3.4/16 up
ip netns exec signaling route add default gw 1.2.0.1 dev eth1

Obviously you should make any appropriate changes to the above to correctly configure your chosen signaling network. These changes are not persisted across reboots on Linux and you should ensure that these are run on boot before the clearwater-infrastructure service is run. A sensible place to configure this would be in /etc/rc.local.
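
For example, one way to persist this across reboots (a sketch only, reusing the example addresses above) is to add the commands to /etc/rc.local before the final exit 0:

# Create the signaling namespace and configure eth1 on every boot.
ip netns add signaling
ip link set eth1 netns signaling
ip netns exec signaling ifconfig lo up
ip netns exec signaling ifconfig eth1 1.2.3.4/16 up
ip netns exec signaling route add default gw 1.2.0.1 dev eth1
exit 0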

Finally, you should create /etc/netns/signaling/resolv.conf configuring the DNS server you’d like to use on the signaling network. The format of this file is documented at http://manpages.ubuntu.com/manpages/trusty/man5/resolv.conf.5.html but a simple example file might just contain the following.

nameserver <DNS IP address>

Project Clearwater Configuration

Now that the signaling namespace is configured and has networking set up, it’s time to apply it to Project Clearwater. To do this, change the local_ip, public_ip and public_hostname lines in /etc/clearwater/local_config to refer to the node’s identities in the signaling network and add the following lines:

signaling_namespace=<namespace name>
signaling_dns_server=<DNS IP address>
management_local_ip=<Node's local IP address in the management network>
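
A filled-in example of the relevant /etc/clearwater/local_config lines (all values invented) might therefore look like:

local_ip=1.2.3.4
public_ip=1.2.3.4
public_hostname=sprout-1.example.com
signaling_namespace=signaling
signaling_dns_server=1.2.0.53
management_local_ip=10.0.0.4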

If you’ve not yet installed the Project Clearwater services, do so now and the namespaces will be automatically used.

If you’ve already installed the Project Clearwater services, run sudo service clearwater-infrastructure restart and restart all the Clearwater-related services running on each machine (you can see what these are with sudo monit summary).

Diagnostic Changes

All the built-in Clearwater diagnostics will automatically take note of network namespaces, but if you are running diagnostics yourself (e.g. following instructions from the troubleshooting page) you may need to prefix your commands with ip netns exec <namespace> to run them in the signaling namespace. The following tools will need this prefix:

  • cqlsh - For viewing Cassandra databases
  • nodetool - For viewing Cassandra status
  • telnet - For inspecting Memcached stores
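
For example, assuming the namespace is called signaling, the diagnostics above would be run as follows (11211 is the standard Memcached port):

sudo ip netns exec signaling cqlsh
sudo ip netns exec signaling nodetool status
sudo ip netns exec signaling telnet <Memcached IP address> 11211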

Geographic redundancy

This article describes

  • the architecture of a geographically-redundant system
  • the (current) limitations
  • the probable impact on a subscriber of a failure
  • how to set it up.

Architecture

The architecture of a geographically-redundant system is as follows.

Diagram

Sprout has one memcached cluster per geographic region. Although memcached itself does not support the concept of local and remote peers, Sprout builds this on top - writing to both local and remote clusters and reading from the local but falling back to the remote. Communication between the nodes should be secure - for example, if it is going over the public internet rather than a private connection between datacenters, it should be encrypted and authenticated with IPsec. Each Sprout uses Homers and Homesteads in the same region only. Sprout also has one Chronos cluster per geographic region; these clusters do not communicate.

Separate instances of Bono in each geographic region front the Sprouts in that region. Clearwater uses a geo-routing DNS service such as Amazon’s Route 53 to achieve this. A geo-routing DNS service responds to DNS queries based on latency, so if you’re nearer to geographic region B’s instances, you’ll be served by them.

Homestead and Homer are each a single cluster split over the geographic regions. Since they are backed by Cassandra (which is aware of local and remote peers), they can be smarter about spatial locality. As with Sprout nodes, communication between the nodes should be secure.

Ellis is not redundant, whether deployed in a single geographic region or several. It is deployed in one of the geographic regions, and a failure of that region would deny all provisioning function.

Ralf does not support geographic redundancy. Each geographic region has its own Ralf cluster; Sprout and Bono should only communicate with their local Ralfs.

While it appears as a single node in our system, Route 53 DNS is actually a geographically-redundant service provided by Amazon. Route 53’s DNS interface has had 100% uptime since it was first turned up in 2010. (Its configuration interface has not, but that is less important.)

The architecture above is for 2 geographic regions - we do not currently support more regions.

Note that there are other servers involved in a deployment that are not described above. Specifically,

  • communication back to the repository server is via HTTP. There is a single repository server. The repository server is not required in normal operation, only for upgrades.

Limitations

  • The local IP addresses of all nodes in a deployment must be reachable from all other nodes - there must not be a NAT between the two GR sites. (This currently precludes having the GR sites in different EC2 regions.)

Impact

This section considers the probable impact on a subscriber of a total outage of a region in a 2-region geographically-redundant deployment. It assumes that the deployments in both regions have sufficient capacity to cope with all subscribers (or the deployments scale elastically) - if this is not true, we will hit overload scenarios, which are much more complicated.

The subscriber interacts with Clearwater through 3 interfaces, and these each have a different user experience.

  • SIP to Bono for calls
  • HTTP to Homer for direct call service configuration (not currently exposed)
  • HTTP to Ellis for web-UI-based provisioning

For the purposes of the following descriptions, we label the two regions A and B, and the deployment in region A has failed.

SIP to Bono

If the subscriber was connected to a Bono node in region A, their TCP connection fails. They then attempt to re-register. If it has been more than the DNS TTL (proposed to be 30s) since they last connected, DNS will point to region B, they will re-register and their service will recover (both for incoming and outgoing calls). If it has been less than the DNS TTL since they last connected, they will probably wait 5 minutes before they try to re-register (using the correct DNS entry this time).

If the subscriber was connected to a Bono node in region B, their TCP connection does not fail, they do not re-register and their service is unaffected.

Overall, the subscriber’s expected incoming or outgoing call service outage would be as follows.

  50% chance of being on a Bono node in region A *
  30s/300s chance of having a stale DNS entry *
  300s re-registration time
= 5% chance of a 300s outage
= 15s average outage

Realistically, if 50% of subscribers all re-registered almost simultaneously (due to their TCP connection dropping and their DNS being timed out), it’s unlikely that Bono would be able to keep up.

Also, depending on the failure mode of the nodes in region A, it’s possible that the TCP connection failure would be silent and the clients would not notice until they next re-REGISTERed. In this case, all clients connected to Bonos in region A would take an average of 150s to notice the failure. This equates to a 50% chance of a 150s outage, or an average outage of 75s.

HTTP to Homer

(This function is not currently exposed.)

If the subscriber was using a Homer node in region A, their requests would fail until their DNS timed out. If the subscriber was using a Homer node in region B, they would see no failures.

Given the proposed DNS TTL of 30s, 50% of subscribers (those in region A) would see an average of 15s of failures. On average, a subscriber would see 7.5s of failures.

HTTP to Ellis

Ellis is not geographically redundant. If Ellis was deployed in region A, all service would fail until region A was recovered. If Ellis was deployed in region B, there would be no outage.

Setup

The process for setting up a geographically-redundant deployment is as follows.

  1. Create independent deployments for each of the regions, with separate DNS entries. See the manual install instructions for the required GR settings.
  2. Set up DNS (probably using SRV records) so that:
    • Bono nodes prefer the Sprout node local to them, but will fail over to the one in the other site.
    • Sprout nodes only use the Homer, Homestead and Ralf nodes in their local site.
    • The Ellis node only uses the Homer and Homesteads in its local site.
  3. Configure Route 53 to forward requests for Bono according to latency. To do this, for each region, create one record set, as follows.
    • Name: <shared (non-geographically-redundant) DNS name>
    • Type: “A - IPv4 address” or “AAAA - IPv6 address”
    • Alias: No
    • TTL (Seconds): 30 (or as low as you can go - in a failure scenario, we need to go back to DNS ASAP)
    • Value: <list of IPv4 or IPv6 addresses for this region>
    • Routing Policy: Latency
    • Region: <AWS region matching geographic region>
    • Set ID: <make up a unique name, e.g. gr-bono-us-east-1>
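
As an illustration of the SRV-based preferences in step 2 (names, port and TTL invented), region A's Bono nodes could be pointed at a sprout name whose SRV records prefer the local Sprouts (lower priority value) and fall back to region B:

_sip._tcp.sprout-site-a.example.com. 30 IN SRV 1 1 5054 sprout-a-1.example.com.
_sip._tcp.sprout-site-a.example.com. 30 IN SRV 2 1 5054 sprout-b-1.example.com.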

You can also use Chef to try GR function, by setting "gr" => true in the environment, as described in the automated install docs.

IPv6

IPv6 is the new version of IP. The most obvious difference from IPv4 is the addresses.

  • IPv4 addresses are 4 bytes, represented in decimal, separated by dots, e.g. 166.78.3.224
  • IPv6 addresses are 16 bytes, represented in hexadecimal, with each pair of bytes separated by colons, e.g. 2001:0db8:85a3:0000:0000:8a2e:0370:7334.

Clearwater can operate over either IPv4 or IPv6, but not both simultaneously (known as “dual-stack”).

This document describes

  • how to configure Clearwater to use IPv6
  • function currently missing on IPv6.

Configuration

As discussed in the Clearwater installation instructions, there are several ways to install Clearwater, and which way you choose affects how you must configure IPv6.

Manual Install

The process to configure Clearwater for IPv6 is very similar to IPv4. The key difference is to use IPv6 addresses in the /etc/clearwater/local_config file rather than IPv4 addresses when following the manual install instructions.

Note also that you must configure your DNS server to return IPv6 addresses (AAAA records) rather than (or as well as) IPv4 addresses (A records). For more information on this, see the Clearwater DNS usage documentation.

Automated Install

Configuring Clearwater for IPv6 via the automated install process is not yet supported (and Amazon EC2 does not yet support IPv6).

All-in-one Images

Clearwater all-in-one images support IPv6.

Since all-in-one images get their IP configuration via DHCP, the DHCP server must be capable of returning IPv6 addresses. Not all virtualization platforms support this, but we have successfully tested on OpenStack.

Once the all-in-one image determines it has an IPv6 address, it automatically configures itself to use it. Note that since Clearwater does not support dual-stack, all-in-one images default to using their IPv4 address in preference to their IPv6 address if they have both. To force IPv6, create an empty /etc/clearwater/force_ipv6 file and reboot.
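
For example:

sudo touch /etc/clearwater/force_ipv6
sudo reboot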

Missing Function

The following function is not yet supported on IPv6.

  • Automated install
  • SIP/WebSockets (WebRTC)

SIP Interface Specifications

This document is a review of all the SIP related RFCs referenced by IMS (in particular TS 24.229 Rel 10), particularly how they relate to Clearwater’s SIP support. The bulk of the document goes through the RFCs grouped into the following five categories.

  • RFCs already supported by Clearwater (at least sufficient for IMS compliance for Clearwater as a combined P/I/S-CSCF/BGCF/IBCF).
  • RFCs partially supported by Clearwater.
  • RFCs that are relevant to Clearwater but not currently supported.
  • RFCs relevant to media processing, which Clearwater doesn’t currently need to handle as it is not involved in the media path.
  • RFCs that are not relevant to Clearwater.

Supported

The following RFCs are already supported by Clearwater. Note that a number of these are supported simply because they require no active function from SIP proxies other than forwarding headers unchanged.

Basic SIP (RFC 3261)
  • This covers the basic SIP protocol including
    • the 6 basic methods (ACK, BYE, CANCEL, INVITE, OPTIONS and REGISTER)
    • the 44 basic headers
    • the transaction and dialog state machines
    • the function of UACs, UASs, stateful and stateless proxies, and registrars
    • basic and digest security
    • transport over UDP and TCP.
  • Clearwater supports this RFC.
SUBSCRIBE/NOTIFY/PUBLISH messages (RFC 3265 and RFC 3903)
  • These two RFCs define a framework for generating and distributing event notifications in a SIP network. They cover the SUBSCRIBE and NOTIFY methods and Allow-Events and Event headers (RFC 3265) and the PUBLISH method and SIP-Etag and SIP-If-Match headers (RFC 3903).
  • Routing of these messages is largely transparent to proxies, so is largely already supported by proxy function in Clearwater.
UPDATE method (RFC 3311)
  • Defines UPDATE method used to update session parameters (so can be used as an alternative to reINVITE as a keepalive, for media renegotiation or to signal successful resource reservation). Often used by ICE enabled endpoints to change media path once ICE probing phase has completed.
  • Supported transparently by Clearwater.
Privacy header (RFC 3323)
  • Required for privacy service.
  • Clearwater supports this header.
P-Asserted-Identity and P-Preferred-Identity headers (RFC 3325)
  • Defines headers that allow a UE to select which identity it wants to use on a call-by-call basis. Ties in with authentication and the P-Associated-URIs header defined in RFC 3455 (see below).
  • Clearwater supports these headers.
Path header (RFC 3327)
  • Defines handling of registrations from devices which are not adjacent (in SIP routing terms) to the registrar.
  • Supported by Clearwater.
message/sipfrag MIME type (RFC 3420)
  • RFC 3420 defines an encoding for transporting MIME or S/MIME encoded SIP message fragments in the body of a SIP message. Use cases for this include status NOTIFYs sent during REFER processing which signal the status of the referral to the referee.
  • Support for this is mandatory according to TS 24.229, but it does not describe any particular use cases.
  • Transparent to proxy components, so supported by Clearwater.
P-Access-Network-Info, P-Associated-URI, P-Called-Party-ID, P-Charging-Function-Address, P-Charging-Vector and P-Visited-Network-ID headers (RFC 3455)
  • Defines various private headers specifically for IMS.
  • P-Access-Network-Info is added to incoming messages by the UE or the P-CSCF to provide information about the access network and possibly the UE’s location within the network (for example, the cell). IMS specifications talk about various uses for this information, including routing emergency calls, an alternative to phone-context for interpreting dialled digits, and determining the security scheme. This header is supported by Clearwater.
  • P-Associated-URI is returned by registrar to include the list of alternative user identities defined for the IMS subscription. This header is supported by Clearwater.
  • P-Called-Party-ID is added to a SIP request by an IMS network to ensure the called UE knows which of the user identities within the IMS subscription was actually called. This header is supported by Clearwater.
  • P-Charging-Function-Address and P-Charging-Vector headers are related to billing, and are supported by Clearwater.
  • P-Visited-Network-ID is used when roaming to identify the network the user has roamed to. The specs say it is a free-form string, and should be used to check that there is a roaming agreement in place. This header is supported by Clearwater.
Symmetric SIP response routing (RFC 3581)
  • Defines how SIP responses should be routed to ensure they traverse NAT pinholes.
  • Mandatory for IMS proxy components.
  • Supported by Clearwater nodes.
Service-Route header (RFC 3608)
  • In IMS an S-CSCF includes a Service-Route header on REGISTER responses, which the UE can use to construct Route headers to ensure subsequent requests get to the appropriate S-CSCF.
  • Optional according to TS 24.229 (http://www.3gpp.org/ftp/Specs/html-info/24229.htm).
  • This header is supported by Clearwater.
ENUM (RFC 3761 and RFC 4769)
  • Defines a mechanism for using DNS look-ups to translate telephone numbers into SIP URIs. (RFC 3761 defines ENUM, RFC 4769 contains IANA registrations for ENUM data.)
  • IMS specifies that ENUM is one mechanism an S-CSCF can use to translate non-routable URIs (either Tel URIs or SIP URIs encoding a phone number) into a globally routable URI (one where the domain name/IP address portion of the URI can safely be used to route the message towards its destination).
  • Clearwater supports this through function in Sprout.
Event package for MWI (RFC 3842)
  • Defines an event package for notifying UAs of message waiting.
  • Required for UEs and ASs implementing MWI services, transparent to proxy components.
  • Transparent to proxies, so already supported by Clearwater.
Presence event package (RFC 3856)
  • Defines an event package for monitoring user presence.
  • Required for UEs supporting presence and presence servers.
  • Transparent to proxies, so supported by Clearwater.
Watcher event template package (RFC 3857)
  • Defines an event package that can be used to subscribe to the list of watchers for another event type.
  • IMS mandates support for this package for subscribing to the list of watchers of a particular presentity, so support is only applicable for SIP UEs supporting presence and presence application servers.
  • Transparent to proxies, so supported by Clearwater.
Session-expiry (RFC 4028)
  • Covers periodic reINVITE or UPDATE messages used to allow UEs or call stateful proxies to detect when sessions have ended. Includes Min-SE and Session-Expires headers, which are used to negotiate the frequency of keepalives.
  • Clearwater supports session-expiry as a proxy per section 8 of the RFC. It does not have a minimum acceptable session interval, so does not make use of the Min-SE header, though does honor it.
  • Clearwater requests a session interval of 10 minutes (or as close to 10 minutes as possible given pre-existing headers) in order to properly perform session-based billing.
Early session disposition type (RFC 3959)
  • Defines a new value for the Content-Disposition header to indicate when SDP negotiation applies to early media.
  • Optional according to TS 24.229.
  • Transparent to proxies so supported by Clearwater.
Dialog events (RFC 4235)
  • Event package for dialog state.
  • In TS 24.229 only seems to be relevant to E-CSCF and LRF functions.
  • Transparent to proxies, so supported by Clearwater.
MRFC control (RFC 4240, RFC 5552, RFC 6230, RFC 6231, RFC 6505)
  • These RFCs define three different ways of controlling the function of an MRFC from an AS. RFC 4240 is a simple “play an announcement” service, RFC 5552 uses VoiceXML and RFC 6230/RFC 6231/RFC 6505 use SIP/SDP to establish a two-way control channel between the AS and MRFC.
  • IMS allows any of the three mechanisms to be used, or combinations depending on circumstances.
  • All three mechanisms are transparent to proxy components, so supported by Clearwater.
History-Info (draft-ietf-sipcore-rfc4244bis - not RFC 4244, which is inaccurate with respect to real implementations)
  • Primarily aimed at ASs - need to manipulate History-Info headers when diverting calls. The MMTEL AS built into Sprout already supports this for CDIV.
  • Also, need to proxy History-Info headers transparently. Clearwater supports this.
  • However, s9.1 says if the incoming H-I is missing or wrong, the intermediary must add an entry on behalf of the previous entity. Clearwater does not currently do this so if other parts of the network are not compliant, Clearwater will not fill in for them (and should).
OMS Push-to-Talk over Cellular service (RFC 4354, RFC 4964 and RFC 5318)
  • Covers poc-settings event package (RFC 4354), P-Answer-State header (RFC 4964) and P-Refused-URI-List header (RFC 5318)
  • Only required for OMA push-to-talk over cellular service.
  • Optional according to TS 24.229.
  • Transparent to proxies, so supported by Clearwater.
Conference event package (RFC 4575)
  • Defines an event package for various events associated with SIP conferences.
  • Mandatory for UEs and IMS application servers providing conference services.
  • Transparent to proxies, so supported by Clearwater.
Event notification resource lists (RFC 4662)
  • Defines an extension to the SUBSCRIBE/NOTIFY event mechanisms that allow an application to subscribe for events from multiple resources with a single request, by referencing a resource list.
  • In IMS, this is only required for presence events, and is only mandatory for UEs supporting presence or presence servers, otherwise it is optional.
  • Should be transparent to proxies, so supported by Clearwater.
Multiple-Recipient MESSAGE requests (RFC 5365)
  • Defines a mechanism for MESSAGE requests to be sent to multiple recipients by specifying the list of recipients in the body of the message. A message list server then fans the MESSAGE out to the multiple recipients.
  • The function is mandatory in an IMS message list server, but not required anywhere else in the network.
  • Transparent to proxies, so supported by Clearwater.
Creating multi-party conferences (RFC 5366)
  • Allows a SIP UA to establish a multi-party conference by specifying a resource list in the body of the message used to set up the conference.
  • Mandatory for IMS conference app servers.
  • Transparent to proxies, so supported by Clearwater.
Referring to multiple resources (RFC 5368)
  • Allows a SIP UA to send a REFER message containing a resource list URI. One use case for this could be to send a single REFER to a conference server to get it to invite a group of people to a conference.
  • Optional according to TS 24.229, and only applicable to UEs and any application servers that can receive REFER requests.
  • Transparent to proxies, so supported by Clearwater.
P-Served-User header (RFC 5502)
  • Only applicable on the ISC interface - used to clear up some ambiguities about exactly which user the AS should be providing service for.
  • Optional according to TS 24.229.
  • This header is supported by Clearwater and included on requests sent to application servers.
Message body handling (RFC 5621)
  • Defines how multiple message bodies can be encoded in a SIP message.
  • Relevant in handling of third-party REGISTERs on ISC where XML encoded data from iFC may be passed to AS.
SIP outbound support (RFC 5626)
  • Defines mechanisms for clients behind NATs to connect to a SIP network so SIP requests can be routed to the client through the NAT.
  • Mandatory according to TS 24.229.
  • Supported by Clearwater for NAT traversal, except for the Flow-Timer header (which tells the client how often to send keepalives).
Fixes to Record-Route processing (RFC 5658)
  • Fixes some interoperability issues in basic SIP handling of Record-Route headers, particularly in proxies which have multiple IP addresses.
  • Clearwater uses RFC 5658 double Record-Routing in Bono nodes when transitioning between the trusted and untrusted zones on different port numbers.
Fixes for IPv6 addresses in URIs (RFC 5954)
  • Fixes a minor problem in the ABNF definition in RFC 3261 relating to IPv6 addresses, and clarifies URI comparison rules when comparing IPv6 addresses (in what looks like an obvious way).
  • Mandatory according to TS 24.229.
  • Supported by Clearwater.
Q.950 codes in Reason header (RFC 6432)
  • Defines how Q.950 codes can be encoded in Reason header.
  • Mandatory for MGCFs, optional for proxy components in IMS.
  • Transparent to proxies anyway, so supported by Clearwater.
Geolocation (RFC 4483 and RFC 6442)
  • Framework for passing geo-location information within SIP messages.
  • According to TS 24.229 only required on an E-CSCF.
  • Supported by Clearwater as headers will be passed transparently.
Proxy Feature Capabilities (referenced as draft-ietf-sipcore-proxy-feature-12, but now RFC 6809)
  • Defines a mechanism to allow SIP intermediaries (such as registrars, proxies or B2BUAs) to signal feature capabilities when it would not be appropriate to use the feature tags mechanism in the Contact header as per RFC 3841.
  • Mandatory on P-CSCF according to TS 24.229, but not clear what this actually means as the IMS specs don’t seem to actually define any features to be signalled in these headers.
  • Clearwater will forward these headers if specified by other nodes in the signaling path, so this is supported.
Alert info URNs (draft-ietf-salud-alert-info-urns-06)
  • Defines family of URNs to be used in Alert-Info header to provide better control over how user is alerted on a UE.
  • Optional according to TS 24.229.
  • Transparent to proxies, so supported by Clearwater.
AKA Authentication (RFC 3310)
  • This RFC defines how AKA authentication parameters map into the authentication and authorization headers used for digest authentication.
  • IMS allows AKA authentication as an alternative to SIP digest, although it is not mandatory.
  • Supported in Clearwater since sprint 39 “WALL-E”.
SIP Instant Messaging (RFC 3428)
  • Defines the use of the SIP MESSAGE method to implement an instant messaging service.
  • Mandatory in proxy components according to TS 24.229.
  • Supported in Clearwater.
Registration Events (RFC 3680)
  • Defines an event package supported by SIP registrars to notify other devices of registrations.
  • Must be supported by IMS core for the ISC interface, and also needed by some UEs.
  • Supported in Clearwater since sprint 40 “Yojimbo”
PRACK support (RFC 3262)
  • Defines a mechanism for ensuring the reliability of provisional responses when using an unreliable transport. Covers the PRACK method and the Rseq and Rack headers.
  • Optional according to TS 24.229.
  • Supported in Clearwater.
Fixes to issues with SIP non-INVITE transactions (RFC 4320)
  • Defines changes to RFC 3261 procedures for handling non-INVITE transactions to avoid some issues - in particular the potential for an O(N^2) storm of 408 responses if a transaction times out.
  • Mandatory for all SIP nodes according to TS 24.229.
  • Supported in Clearwater.
User agent capabilities and caller preferences (RFC 3840 and RFC 3841)
  • User agent capabilities encoded as feature tags in Contact headers during registration (RFC 3840) and Accept-Contact, Reject-Contact and Request-Disposition headers encode filtering rules to decide which targets subsequent request should be routed/forked to (RFC 3841).
  • Used for routing of requests to targets with the appropriate features/capabilities in IMS. Mandatory for proxy components.
  • Clearwater’s Sprout registrar has full support for storing, matching, filtering and prioritizing bindings based on advertised capabilities and requirements as per these specifications.

Relevant to Clearwater and partially supported

GRUUs (RFC 5627, plus RFC 4122, draft-montemurro-gsma-imei-urn-11 and draft-atarius-device-id-meid-urn-01)
  • RFC 5627 defines mechanisms to assign and propagate a Globally-Routable User Agent URI for each client that registers with the SIP network. A GRUU identifies a specific user agent rather than a SIP user, and is routable from anywhere in the internet. GRUUs are intended to be used in scenarios like call transfer where URIs are required for individual user agents to ensure the service operates correctly. Standard Contact headers would seem to do the job in many cases but don’t satisfy the globally routable requirement in all cases, for example where the client is behind certain types of NAT.
  • Assignment and use of GRUUs is mandatory for S-CSCF and UEs in an IMS network. RFC 4122, draft-montemurro-gsma-imei-urn-11 and draft-atarius-device-id-meid-urn-01 are referenced from the sections of TS 24.229 that discuss exactly how GRUUs should be constructed.
  • Clearwater supports assigning and routing on public GRUUs as of the Ninja Gaiden release. It does not support temporary GRUUs.
  • The Blink softphone client can be used to test this, as it provides the +sip.instance parameter needed for GRUU support.
Registration event package for GRUUs (RFC 5628)
  • Defines an extension to the RFC 3680 registration event package to include GRUUs.
  • Mandatory on S-CSCF and UEs in an IMS network.
  • Clearwater includes public GRUUs in these event notifications as of the Ninja Gaiden release. It does not support temporary GRUUs.
Alternative URIs (RFC 2806, RFC 2368, RFC 3859, RFC 3860, RFC 3966)
  • Various RFCs defining alternatives to SIP URI - Tel URI (RFC 2806 and RFC 3966), mailto URI (RFC 2368), pres URI (RFC 3859), and im URI (RFC 3860).
  • IMS allows use of Tel URIs
    • as a public user identity associated with a subscription (although a subscription must have at least one public user identity which is a SIP URI)
    • as the target URI for a call.
  • Other URIs can be specified as Request URI for a SIP message.
  • Clearwater supports SIP and Tel URIs but not mailto, pres or im URIs.
Number portability parameters in Tel URI (RFC 4694)
  • Defines additional parameters in Tel URI for signaling related to number portability.
  • Used in IMS in both Tel and SIP URIs for carrier subscription scenarios, and in IMS core for other number portability related scenarios.
  • Optional according to TS 24.229.
  • The rn and npdi parameters are supported by Clearwater; Clearwater doesn’t currently support the other parameters defined in this RFC.

Relevant to Clearwater but not currently supported

These are the RFCs which are relevant to Clearwater and not yet supported.

Dialstring URI parameter (RFC 4967)
  • Defines a user=dialstring parameter used in SIP URIs to indicate that the user portion of the URI is a dial string (as opposed to a number that definitely identifies a phone as in the user=phone case).
  • IMS allows this encoding from UEs initiating calls, but doesn’t specify any particular processing within the core of the network. The intention is that this can be handled by an application server, or captured by filter criteria.
  • Clearwater doesn’t currently support this.
P-Early-Media header (RFC 5009)
  • Used to authorize and control early media.
  • If P-CSCF is not gating media then required function is as simple as
    • adding P-Early-Media header with supported value on requests from clients (or modifying header from clients if already in message)
    • passing the header through transparently on responses.
  • If P-CSCF is gating media then function is more complex as P-CSCF has to operate on values in P-Early-Media headers sent to/from UEs.
  • Mandatory in a P-CSCF according to TS 24.229.
  • Clearwater doesn’t currently support this.
P-Media-Authorization header (RFC 3313)
  • According to TS 24.229, only required if the P-CSCF supports IMS AKA authentication with IPsec ESP encryption, or SIP digest authentication with TLS encryption.
  • Not supported, as this is P-CSCF only (and Bono doesn’t support AKA).
Signalling Compression aka SigComp (RFC 3320, RFC 3485, RFC 3486, RFC 4077, RFC 4896, RFC 5049, RFC 5112)
  • RFC 3320 defines basic SigComp (which is not SIP-specific), RFC 3485 defines a static dictionary for use in SigComp compression of SIP and SDP, RFC 3486 defines how to use SigComp with SIP, RFC 4077 defines a mechanism for negative acknowledgements to signal errors in synchronization between the compressor and decompressor, RFC 4896 contains corrections and clarifications to RFC 3320, RFC 5049 contains details and clarifications for SigComp compression of SIP, and RFC 5112 defines a static dictionary for use in SigComp compression of SIP presence bodies.
  • TS 24.229 says that SigComp is mandatory between UEs and P-CSCF if access network is one of the 3GPP or 802.11 types (specifically it says this is mandatory if the UE sends messages with the P-Access-Network-Info set to one of these values - ie. the UE knows that is the type of access network it is being used on). Compression must use the dictionaries defined in both RFC 3485 and RFC 5112.
  • Clearwater does not currently support SigComp (although it would be relatively straightforward to implement it).
Reason header (RFC 3326)
  • Already supported in Clearwater for responses, would need to add support for passing on CANCEL requests (but pretty easy).
  • Optional according to TS 24.229.
Security-Client, Security-Server and Security-Verify headers (RFC 3329)
  • Defines headers that can be used to negotiate authentication mechanisms.
  • Only required if the P-CSCF supports IMS AKA authentication with IPsec ESP encryption, or SIP digest authentication with TLS encryption.
  • Not supported, as these headers are always caught by the P-CSCF (and Bono doesn’t support AKA).
SMS over IP (RFC 3862 and RFC 5438)
  • Covers CPIM message format used between UEs and AS implementing SMS-GW function (RFC 3862) and Message Disposition Notifications sent from the SMS-GW to the UE (RFC 5438).
  • Transported as body in MESSAGE method across IMS core, so transparent to proxy components. Therefore should be supported by Clearwater once we have tested support for MESSAGE methods.
SIP over SCTP (RFC 4168)
  • Defines how SIP can be transported using SCTP (instead of UDP or TCP).
  • IMS allows SCTP transport within the core of the network, but not between P-CSCF and UEs.
  • Clearwater does not support SCTP transport (nor does PJSIP). Strictly speaking this is relevant to Clearwater, but it’s not clear whether anyone would use it.
Signalling pre-emption events (RFC 4411)
  • Defines use of the SIP Reason header in BYE messages to signal when a session is being terminated because of a network pre-emption event, for example, if the resources originally acquired for the call were needed for a higher priority session.
  • Optional according to TS 24.229.
  • Not currently supported by Clearwater.
Resource Priority (RFC 4412)
  • Covers Resource-Priority and Accept-Resource-Priority headers. Intended to allow UEs to signal high priority calls that get preferential treatment by the network (for example, emergency service use).
  • Not currently supported by Clearwater.
  • Optional according to TS 24.229.
P-Profile-Key header (RFC 5002)
  • Used solely between I-CSCF and S-CSCF to signal the public service identity key to be used when a requested public service identity matches a wildcard entry in the HSS. Purely an optimization to avoid having to do wildcard matching twice for a single request.
  • Optional according to TS 24.229
  • Not currently supported by Clearwater.
Service URNs (RFC 5031)
  • Defines a URN namespace to identify services. IMS allows UEs to use such service URNs as target URIs when establishing a call. In particular, it is mandatory for UEs to signal emergency calls using a service URN of the form urn:service:sos possibly with a subtype, and the P-CSCF must be able to handle these requests appropriately, routing to an E-CSCF.
  • Clearwater currently has no support for service URNs.
Rejecting anonymous requests (RFC 5079)
  • Defines a status code used to reject anonymous requests if required by local policy/configuration.
  • Optional according to TS 24.229.
  • Not currently supported by Clearwater.
Subscribing to events on multiple resources (RFC 5367)
  • Allows a SIP node to subscribe to events on multiple resources with a single SUBSCRIBE message by specifying a resource list in the body of the message.
  • Optional for any IMS node that can be the target of a SUBSCRIBE request.
  • Not currently supported by Clearwater.
Max-Breadth header (RFC 5393)
  • Intended to plug an amplification vulnerability in SIP forking. Any forking proxy must limit the breadth of forking to the breadth specified in this header.
  • According to TS 24.229 is mandatory on any node that supports forking.
  • Not currently supported by Clearwater.
Media feature tag for MIME application subtypes (RFC 5688)
  • Defines a media feature tag to be used with the mechanisms in RFC 3840 and RFC 3841 to direct requests to UAs that support a specific MIME application subtype media stream.
  • According to TS 24.229 support for this feature tag is mandatory on S-CSCFs.
  • Not currently supported, but support may drop out of implementing RFC 3840/RFC 3841 (depending on what match criteria the tag requires).
XCAPdiff event package (RFC 5875)
  • Defines an event package allowing applications to get notifications of changes to XCAP documents.
  • Used by IMS at Ut reference point to allow UEs to control service settings. According to TS 24.229, mandatory for XDMS server, but optional for UEs.
  • Not supported by Homer.
Correct transaction handling of 2xx responses to INVITE (RFC 6026)
  • A fix to basic SIP transaction model to avoid INVITE retransmissions being incorrectly identified as a new transaction and to plug a security hole around the forwarding of uncorrelated responses through proxies. The change basically adds a new state to the transaction state machine when previously the transaction would have been considered terminated and therefore deleted.
  • This fix is mandatory according to TS 24.229.
  • Not supported by Clearwater, and will probably require PJSIP changes.
P-Asserted-Service and P-Preferred-Service headers (RFC 6050)
  • Defines a mechanism to allow a UE to signal the service it would like (although this is not binding on the network) and other components to signal between themselves the service being provided. The UE may optionally include a P-Preferred-Service header on any request specifying the service it wishes to use. The S-CSCF is responsible for checking that the service requested in P-Preferred-Service is (a) supported for the subscriber and (b) consistent with the actual request. If the request is allowed, the S-CSCF replaces the P-Preferred-Service with a P-Asserted-Service header. If either check fails, the S-CSCF may reject the request or it may allow it but ignore the P-Preferred-Service header. If the UE does not specify a P-Preferred-Service header (or the specified one was not valid) the S-CSCF should work out the requested service by analysing the request parameters, and add a P-Asserted-Service header encoding the result.
  • In IMS networks, header should contain an IMS Communication Service Identifier (ICSI) - defined values are documented at http://www.3gpp.com/Uniform-Resource-Name-URN-list.html - examples include MMTEL (3gpp-service.ims.icsi.mmtel), IPTV (3gpp-service.ims.icsi.iptv), Remote Access (3gpp-service.ims.icsi.ra), and Converged IP Messaging (3gpp-service.ims.icsi.oma.cpm.* depending on exact service being requested/provided).
  • Mandatory function according to TS 24.229.
  • Not currently supported by Clearwater.
SIP INFO messages (RFC 6086)
  • Framework for exchanging application-specific information within a SIP dialog context.
  • Not currently supported by Clearwater.
  • Optional according to TS 24.229.
Indication of support for keepalives (RFC 6223)
  • Adds a parameter to Via headers to allow nodes to agree the type of keepalives to be used to keep NAT pinholes open.
  • Mandatory for UEs and P-CSCFs, optional elsewhere.
  • Not currently supported by Clearwater and would require PJSIP changes.
Response code for indication of terminated dialog (RFC 6228)
  • Defines a new status code to indicate the termination of an early dialog (ie. a dialog created by a provisional response) prior to sending a final response.
  • According to TS 24.229 this status code is mandatory for all UA components that can send or receive INVITEs, and mandatory for the S-CSCF because it has implications on forking proxies.
  • This is not currently supported by Clearwater.
P-Private-Network-Indication (referenced as draft-vanelburg-sipping-private-network-indication-02)
  • Mechanisms for transport of private network information across an IMS core.
  • Optional according to TS 24.229.
  • Not currently supported by Clearwater.
Session-ID header (referenced as draft-kaplan-dispatch-session-id-00)
  • Defines an end-to-end globally unique session identifier that is preserved even for sessions that traverse B2BUAs.
  • Optional according to TS 24.229.
  • Not currently supported by Clearwater.
Interworking with ISDN (draft-ietf-cuss-sip-uui-06 and draft-ietf-cuss-sip-uui-isdn-04)
  • Defines mechanisms/encodings for interworking ISDN with SIP.
  • Optional according to TS 24.229.
  • Not currently supported by Clearwater.

SDP/Media RFCs

TS 24.229 references a number of specifications which relate to media function - either SDP negotiation, media transport, or both. Clearwater currently passes SDP transparently and does not get involved in media flows. Unless we change this position, each of these RFCs can either be considered supported by Clearwater (because Clearwater passes SDP with the relevant contents) or simply not applicable to it.

The following is a brief explanation of each RFC, and its relevance to IMS.

  • SDP (RFC 4566) - defines basic SDP protocol.
  • Offer/Answer model for SDP media negotiation (RFC 3264) - defines how SDP is used to negotiate media.
  • Default RTP/AV profile (RFC 3551) - defines default mappings from audio and video encodings to RTP payload types.
  • Media Resource Reservation (RFC 3312 and RFC 4032) - defines additional SDP parameters to signal media resource reservation between two SIP UAs. TS 24.229 requires UEs, MGCFs and ASs to use these mechanisms, and that P-CSCF should monitor the flows if it is performing IMS-ALG or media gating functions.
  • Mapping of media streams to resource reservation (RFC 3524 and RFC 3388) - define how multiple media streams can be grouped in SDP and mapped to a single resource reservation. IMS requires UEs to use these encodings when doing resource reservations.
  • Signaling bandwidth required for RTCP (RFC 3556) - by default RTCP is assumed to consume 5% of session bandwidth, but this is not accurate for some encodings, so this RFC defines explicit signaling of RTCP bandwidth in SDP. This function is optional according to TS 24.229.
  • TCP media transport (RFC 4145 and RFC 6544) - defines how support for TCP media transport is encoded in SDP for basic SDP exchanges and for ICE exchanges. According to TS 24.229, media over TCP is optional in most of an IMS network, but mandatory in an IBCF implementing IMS-ALG function.
  • ICE (RFC 5245) - defines ICE media negotiation used to establish efficient media paths through NATs. According to TS 24.229 support for ICE is optional in most of the IMS network, but mandatory on an IBCF implementing IMS-ALG function.
  • STUN (RFC 5389) - defines a protocol used to allow UAs to obtain their server reflexive address for use in ICE.
  • TURN (RFC 5766) - defines extensions to STUN used to relay media sessions via a TURN server to traverse NATs when STUN-only techniques don’t work.
  • Indicating support for ICE in SIP (RFC 5768) - defines a media feature tag used to signal support for ICE.
  • SDP for binary floor control protocol (RFC 4583) - defines SDP extensions for establishing conferencing binary floor control protocol streams. Optional according to TS 24.229.
  • Real-time RTCP feedback (RFC 4585 and RFC 5104) - defines extensions to RTCP to provide extended real-time feedback on network conditions. Mandatory for most IMS components handling media (MRFs, IBCFs, MGCFs), but optional for UEs.
  • SDP capability negotiation (RFC 5939) - defines extensions to SDP to allow for signaling and negotiation of media capabilities (such as alternate transports - SRTP). Mandatory for most IMS components handling media (MRFs, IBCFs, MGCFs), but optional for UEs.
  • SDP transport independent bandwidth signaling (RFC 3890) - defines extensions to SDP to signal codec-only (ie. independent of transport) bandwidth requirements for a stream. Optional for IMS components handling media.
  • Secure RTP (RFC 3711, RFC 4567, RFC 4568, RFC 6043) - defines transport of media over a secure variant of RTP, supporting encryption (to prevent eavesdropping) and integrity protection (to protect against tampering). Keys are either exchanged in SDP negotiation (specified in RFC 4567 and RFC 4568) or distributed via a separate key management service - termed MIKEY-TICKET (specified in RFC 6043).
  • Transcoder invocation using 3PCC (RFC 4117) - defines how a transcoding service can be inserted into the media path when required using third-party call control flows. According to TS 24.229 this is only applicable to app servers and MRFC, and even then is optional.
  • MSRP (RFC 4975) - defines a protocol for session-based transmission of a sequence of instant messages. Only applicable to UEs, ASs and MRFCs and optional.
  • SDP for file transfer (RFC 5547) - defines SDP extensions to support simple file transfers between two IMS nodes. Optional.
  • Explicit congestion notifications for RTP over UDP (RFC 6679) - defines RTCP extensions for reporting urgent congestion conditions and reporting congestion summaries. Optional.
  • Setting up audio streams over circuit switched bearers (referenced as draft-ietf-mmusic-sdp-cs-00) - defines SDP extensions for negotiating audio streams over a circuit switched network. Mandatory for ICS capable UEs and SCC-AS, otherwise optional.
  • Media capabilities negotiation in SDP (referenced as draft-ietf-mmusic-sdp-media-capabilities-08) - defines SDP extensions for signaling media capabilities such as encodings and formats. Mandatory for ICS capable UEs and SCC-AS, otherwise optional.
  • Miscellaneous capabilities negotiation in SDP (referenced as draft-ietf-mmusic-sdp-miscellaneous-caps-00) - defines SDP extensions for signaling some miscellaneous capabilities. Mandatory for ICS capable UEs and SCC-AS, otherwise optional.
Locating P-CSCF using DHCP (RFC 3319 and RFC 3361)
  • These RFCs describe a mechanism for locating a SIP proxy server using DHCP. (RFC 3319 defines the mechanism for IPv6/DHCPv6, and RFC 3361 is the IPv4 equivalent - not sure why they happened that way round!).
  • The IMS specifications allow this as one option for a UE to locate a P-CSCF, although there are other options such as manual configuration of a domain name or obtaining it from some access-network specific mechanism.
  • This is irrelevant to Clearwater - there would be no point in Clearwater providing a DHCP server with this function as there will be existing mechanisms used by clients to obtain their own IP addresses. A service provider might enhance their own DHCP servers to support this function if required.
Proxy-to-proxy SIP extensions for PacketCable DCS (RFC 3603)
  • Its only relevance to IMS is that it defines a billing correlation parameter (bcid) which is passed in the P-Charging-Vector header from DOCSIS access networks.
Geolocation (RFC 4119 and RFC 6442)
  • Frameworks for passing geo-location information within SIP messages - RFC 4119 encodes geo-location in a message body, RFC 6442 encodes a URI reference where the UE’s location can be found.
  • According to TS 24.229 only required on an E-CSCF.
P-User-Database header (RFC 4457)
  • Used when an IMS core has multiple HSSs and an SLF - allows I-CSCF to signal to S-CSCF which HSS to use to avoid multiple SLF look-ups.
  • Optional according to TS 24.229.
  • Not applicable to Clearwater given its stateless architecture.
URIs for Voicemail and IVR applications (RFC 4458)
  • Defines conventions for service URIs for redirection services such as voicemail and IVR.
  • Optional according to TS 24.229.
Emergency call requirements and terminology (RFC 5012)
  • Referenced to define some terminology used in IMS architecture for handling emergency calls.
Answering Modes (RFC 5373)
  • Defines a mechanism for a caller to control the answer mode at the target of the call. Use cases can include invoking some kind of auto-answer loopback. Covers the Answer-Mode and Priv-Answer-Mode headers.
  • In general is transparent to proxies (provided proxies pass headers through), but the RFC recommends the mechanism is not used in environments that support parallel forking.
  • Optional according to TS 24.229 - and arguably not a good idea because of bad interactions with forking.

Clearwater Automatic Clustering and Configuration Sharing

Clearwater has a feature that allows nodes in a deployment to automatically form the correct clusters and share configuration with each other. This makes deployments much easier to manage. For example:

  • It is easy to add new nodes to an existing deployment. The new nodes will automatically join the correct clusters according to their node type, without any loss of service. The nodes will also learn the majority of their config from the nodes already in the deployment.
  • Similarly, removing nodes from a deployment is straightforward. The leaving nodes will leave their clusters without impacting service.
  • It makes it much easier to modify configuration that is shared across all nodes in the deployment.

This feature uses etcd as a decentralized data store, a clearwater-cluster-manager service to handle automatic clustering, and a clearwater-config-manager service to handle configuration sharing.

Is my Deployment Using Automatic Clustering and Configuration Sharing?

To tell if your deployment is using this feature, log onto one of the nodes in your deployment and run dpkg --list | grep clearwater-etcd. If this does not give any output, the feature is not in use.
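
For example, on a node where the feature is in use, the check produces output along the following lines (the version, architecture and description fields will vary; the line shown is purely illustrative):

dpkg --list | grep clearwater-etcd
ii  clearwater-etcd    <version>    <architecture>    <description>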

Migrating to Automatic Clustering and Configuration Sharing

Deployments that are not using the feature may be migrated so they start using it. To perform this migration, follow these instructions.

Provisioning Subscribers

Clearwater provides the Ellis web UI for easy provisioning of subscribers. However, sometimes a more programmatic interface is desirable.

Homestead provides a provisioning API but, for convenience, Clearwater also provides some command-line provisioning tools.

By default, the tools are installed on the Homestead servers only (as part of the clearwater-prov-tools package), in the /usr/share/clearwater/bin directory.

There are 5 tools.

  • create_user - for creating users
  • update_user - for updating users’ passwords
  • delete_user - for deleting users
  • display_user - for displaying users’ details
  • list_users - for listing users

Creating users

New users can be created with the create_user tool. As well as creating single users, it’s also possible to create multiple users with a single command. Note that this is not recommended for provisioning large numbers of users - for that, bulk provisioning is much quicker.

usage: create_user.py [-h] [-k] [-q] [--hsprov IP:PORT] [--plaintext]
                      [--ifc iFC-FILE] [--prefix TWIN_PREFIX]
                      <directory-number>[..<directory-number>] <domain>
                      <password>

Create user

positional arguments:
  <directory-number>[..<directory-number>]
  <domain>
  <password>

optional arguments:
  -h, --help            show this help message and exit
  -k, --keep-going      keep going on errors
  -q, --quiet           don't display the user
  --hsprov IP:PORT      IP address and port of homestead-prov
  --plaintext           store password in plaintext
  --ifc iFC-FILE        XML file containing the iFC
  --prefix TWIN_PREFIX  twin-prefix (default: 123)
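
For example, the following sketch creates ten users, 6505550000 through 6505550009, in the example.com domain (the directory numbers, domain and password are purely illustrative; depending on your setup you may need to run the tool as root):

/usr/share/clearwater/bin/create_user 6505550000..6505550009 example.com examplepassword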

Update users

Existing users’ passwords can be updated with the update_user tool.

usage: update_user.py [-h] [-k] [-q] [--hsprov IP:PORT] [--plaintext]
                      <directory-number>[..<directory-number>] <domain>
                      <password>

Update user

positional arguments:
  <directory-number>[..<directory-number>]
  <domain>
  <password>

optional arguments:
  -h, --help            show this help message and exit
  -k, --keep-going      keep going on errors
  -q, --quiet           don't display the user
  --hsprov IP:PORT      IP address and port of homestead-prov
  --plaintext           store password in plaintext

Delete users

Users can be deleted with the delete_user tool.

usage: delete_user.py [-h] [-f] [-q] [--hsprov IP:PORT]
                      <directory-number>[..<directory-number>] <domain>

Delete user

positional arguments:
  <directory-number>[..<directory-number>]
  <domain>

optional arguments:
  -h, --help            show this help message and exit
  -f, --force           proceed with delete in the face of errors
  -q, --quiet           silence 'forced' error messages
  --hsprov IP:PORT      IP address and port of homestead-prov

Display users

Users’ details can be displayed with the display_user tool.

usage: display_user.py [-h] [-k] [-q] [-s] [--hsprov IP:PORT]
                       <directory-number>[..<directory-number>] <domain>

Display user

positional arguments:
  <directory-number>[..<directory-number>]
  <domain>

optional arguments:
  -h, --help            show this help message and exit
  -k, --keep-going      keep going on errors
  -q, --quiet           suppress errors when ignoring them
  -s, --short           less verbose display
  --hsprov IP:PORT      IP address and port of homestead-prov

List users

All the users provisioned on the system can be listed with the list_users tool.

Note that the --full parameter defaults to off because it greatly decreases the performance of the tool (by more than an order of magnitude).

The --pace parameter’s default values should ensure that this does not use more than 10% of the homestead cluster’s CPU - that is, 5 users per second if --full is set and 500 if not. If you set the --pace parameter to more than the default, you’ll be prompted to confirm (or specify the --force parameter).

usage: list_users.py [-h] [-k] [--hsprov IP:PORT] [--full] [--pace PACE] [-f]

List users

optional arguments:
  -h, --help        show this help message and exit
  -k, --keep-going  keep going on errors
  --hsprov IP:PORT  IP address and port of homestead-prov
  --full            displays full information for each user
  --pace PACE       sets the target number of users to list per second
  -f, --force       forces specified pace
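
For example, the following sketch lists every user with full details at a forced pace of 20 users per second (the pace value is illustrative):

/usr/share/clearwater/bin/list_users --full --pace 20 -f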

Dealing with Failed Nodes

Nodes can easily be removed from a Clearwater deployment by following the instructions for elastic scaling. When scaling down, the remaining nodes are informed that a node is leaving and take appropriate action. However, sometimes a node may fail unexpectedly. If this happens and the node cannot be recovered (for example, the virtual machine has been deleted), the remaining nodes must be manually informed of the failure. This article explains how to do this.

The processes described in the document do not affect call processing and can be run on a system handling call traffic.

Removing a Failed Node

If a node permanently fails, scaling the deployment up and down may stop working, or, if a scaling operation is in progress, it may get stuck (because other nodes in the tier will wait forever for the failed node to react). To recover from this situation, the failed node should be removed from the deployment using the following steps:

  • Remove the node from the underlying etcd cluster. To do this:
    • Run clearwater-etcdctl cluster-health and make a note of the ID of the failed node.
    • Run clearwater-etcdctl member list to check that the failed node reported is the one you were expecting (by looking at its IP address).
    • Run clearwater-etcdctl member remove <ID>, replacing <ID> with the ID learned above (an example of the full sequence is shown after this list).
  • Remove the failed node from any back-end data store clusters it was a part of (see Removing a Node From a Data Store).
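
For example, the etcd steps above might look like the following in practice (the member ID shown is illustrative):

clearwater-etcdctl cluster-health
clearwater-etcdctl member list
clearwater-etcdctl member remove 6e3bd23ae5f1eae0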

Multiple Failed Nodes

If your deployment loses half or more of its nodes permanently, it loses “quorum” which means that the underlying etcd cluster becomes read-only. This means that scaling up and down is not possible and changes to shared config cannot be made. It also means that the steps for removing a single failed node won’t work. This section describes how to recover from this state.

In this example, your initial cluster consists of servers A, B, C, D, E and F. D, E and F die permanently, and A, B and C enter a read-only state (because they lack quorum). Recent changes may have been permanently lost at this point (if they were not replicated to the surviving nodes).

You should follow this process completely - the behaviour is unspecified if this process is started but not completed. It is always safe to restart this process from the beginning (for example, if you encounter an error partway through).

To recover from this state (a condensed sketch of the commands is shown after these steps):

  • stop etcd on A, B and C by running sudo monit stop -g etcd
  • create a new cluster, only on A, by:
    • editing etcd_cluster in /etc/clearwater/local_config to just contain A’s IP (e.g. etcd_cluster=10.0.0.1)
    • running sudo service clearwater-etcd force-new-cluster. This will warn that this is dangerous and should only be run during this process; choose to proceed.
    • running clearwater-etcdctl member list to check that the cluster only has A in
    • running clearwater-etcdctl cluster-health to check that the cluster is healthy
    • running clearwater-etcdctl get configuration/shared_config to check that the data is safe.
    • running sudo monit monitor -g etcd to put etcd back under monit control
  • get B to join that cluster by:
    • editing etcd_cluster in /etc/clearwater/local_config to just contain A’s IP (e.g. etcd_cluster=10.0.0.1)
    • running sudo service clearwater-etcd force-decommission. This will warn that this is dangerous and offer the chance to cancel; do not cancel.
    • running sudo monit monitor -g etcd.
  • get C to join that cluster by following the same steps as for B:
    • editing etcd_cluster in /etc/clearwater/local_config to just contain A’s IP (e.g. etcd_cluster=10.0.0.1)
    • running sudo service clearwater-etcd force-decommission. This will warn that this is dangerous and should only be run during this process; choose to proceed.
    • running sudo monit monitor -g etcd.
  • check that the cluster is now healthy by running clearwater-etcdctl cluster-health and clearwater-etcdctl member list on A, and verifying that it contains exactly A, B and C.
  • log on to A and, for each of D, E and F, follow the instructions in Removing a Node From a Data Store.
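
As a condensed sketch, the recovery above looks like the following (using A’s illustrative IP address of 10.0.0.1; edit /etc/clearwater/local_config with whatever editor you prefer):

# On A, B and C: stop etcd
sudo monit stop -g etcd

# On A: set etcd_cluster=10.0.0.1 in /etc/clearwater/local_config, then:
sudo service clearwater-etcd force-new-cluster
clearwater-etcdctl member list
clearwater-etcdctl cluster-health
clearwater-etcdctl get configuration/shared_config
sudo monit monitor -g etcd

# On B, and then on C: set etcd_cluster=10.0.0.1 in /etc/clearwater/local_config, then:
sudo service clearwater-etcd force-decommission
sudo monit monitor -g etcd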

Removing a Node From a Data Store

The mark_node_failed script can be used to remove a failed node from a back-end data store. You will need to know the type of the failed node (e.g. “sprout”) and its IP address. To remove the failed node log onto a working node in the same site and run the following commands (depending on the failed node’s type). If you are removing multiple nodes simultaneously, ensure that you run the mark_node_failed scripts for each store type simultaneously (e.g. for multiple sprout removal, mark all failed nodes for memcached simultaneously first, and then mark all failed nodes for chronos).

If you cannot log into a working node in the same site (e.g. because an entire geographically redundant site has been lost), you can use a working node in the other site, but in this case you must run /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_remote_node_failed instead of /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed.

If you are using separate signaling and management networks, you must use the signaling IP address of the failed node as the failed node IP in the commands below.

Sprout
sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "sprout" "memcached" <failed node IP>
sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "sprout" "chronos" <failed node IP>
Homestead
sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "homestead" "cassandra" <failed node IP>
Homer
sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "homer" "cassandra" <failed node IP>
Ralf
sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "ralf" "chronos" <failed node IP>
sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "ralf" "memcached" <failed node IP>
Memento
sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "memento" "cassandra" <failed node IP>
sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "memento" "memcached" <failed node IP>
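
For example, to remove a failed sprout node whose (signaling) IP address is 10.0.0.21 - an illustrative address - you would run the two sprout commands listed above with that address substituted:

sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "sprout" "memcached" 10.0.0.21
sudo /usr/share/clearwater/clearwater-cluster-manager/scripts/mark_node_failed "sprout" "chronos" 10.0.0.21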

Complete Site Failure

In a geographically redundant deployment, you may encounter the situation where an entire site has permanently failed (e.g. because the location of that geographic site has been physically destroyed). To recover from this situation:

  • If the failed site contained half or more of your nodes, you have lost quorum in your etcd cluster. You should follow the “Multiple Failed Nodes” instructions above to rebuild the etcd cluster, containing only nodes from the surviving site.
  • If the failed site contained fewer than half of your nodes, you have not lost quorum in your etcd cluster. You should follow the “Removing a Failed Node” instructions above to remove each failed node from the cluster.

After following the above instructions, you will have removed the nodes in the failed site from etcd, but not from the Cassandra/Chronos/Memcached datastore clusters. To do this, follow the “Removing a Node From a Data Store” instructions above for each failed node, using the mark_remote_node_failed script instead of the mark_node_failed script.

You should now have a working single-site cluster, which can continue to run as a single site, or be safely paired with a new remote site.

C++ Coding Guidelines

Naming

  • Class names are CamelCase, beginning with a capital letter.
  • Constant names are ALL_CAPITALS with underscores separating words.
  • Method, method parameter and local variable names are lowercase with underscores separating words, e.g. method_name.
  • Member variable names are lowercase with underscores separating words and also begin with an underscore, e.g. _member_variable.

C++ Feature Use

  • We assume a C++11-compliant compiler, so C++11 features are allowed with some exceptions.
  • No use of auto type declarations.
  • No use of using namespace ...; if a namespace name is particularly lengthy, consider a namespace alias instead (e.g. namespace po = boost::program_options).
  • Avoid using Boost (or similar) libraries that return special library-specific pointers, to minimize “infection” of the code-base. Consider using the C++11 equivalents instead.

Formatting

C++ code contributed to Clearwater should be formatted according to the following conventions:

  • Braces on a separate line from function definitions, if statements, etc.
  • Two-space indentation
  • Pointer operators attached to the variable type (i.e. int* foo rather than int *foo)
  • if blocks must be surrounded by braces

For example:

if (x)
       int *foo = do_something();

will be replaced with

if (x)
{
  int* foo = do_something();
}

It’s possible to fix up some code automatically using astyle, with the options astyle --style=ansi -s2 -M80 -O -G -k1 -j -o. This fixes up a lot of the most common errors (brace style, indentation, overly long lines), but isn’t perfect - there are some cases where breaking the rules makes the code clearer, and some edge cases (e.g. around switch statements and casts on multiple lines) where our style doesn’t always match astyle’s.

Language Features

  • Use of the auto keyword is forbidden.

Commenting

Where it is necessary to document the interface of classes, this should be done with Doxygen-style comments - three slashes and appropriate @param and @returns tags.

/// Apply first AS (if any) to initial request.
//
// See 3GPP TS 23.218, especially s5.2 and s6, for an overview of how
// this works, and 3GPP TS 24.229 s5.4.3.2 and s5.4.3.3 for
// step-by-step details.
//
// @Returns whether processing should stop, continue, or skip to the end.
AsChainLink::Disposition
AsChainLink::on_initial_request(CallServices* call_services,

Ruby Coding Guidelines

Strongly based on https://github.com/chneukirchen/styleguide/ with some local changes.

Formatting:

  • Use UTF-8 encoding in your source files.
  • Use 2 space indent, no tabs.
  • Use Unix-style line endings, including on the last line of the file.
  • Use spaces around operators, after commas, colons and semicolons, around { and before }.
  • No spaces after (, [ and before ], ).
  • Prefer postfix modifiers (if, unless, rescue) when possible.
  • Indent when as deep as case then indent the contents one step more.
  • Use an empty line before the return value of a method (unless it only has one line), and an empty line between defs.
  • Use Yard and its conventions for API documentation. Don’t put an empty line between the comment block and the definition.
  • Use empty lines to break up a long method into logical paragraphs.
  • Keep lines shorter than 80 characters.
  • Avoid trailing whitespace.

Syntax:

  • Use def with parentheses when there are arguments.
  • Conversely, avoid parentheses when there are none.
  • Never use for, unless you exactly know why. Prefer each or loop.
  • Never use then, a newline is sufficient.
  • Prefer words to symbols.
  • and and or in place of && and ||
  • not in place of !
  • Avoid ?:, use if (remember: if returns a value, use it).
  • Avoid if not, use unless.
  • Suppress superfluous parentheses when calling methods, unless the method has side-effects.
  • Prefer do...end over {...} for multi-line blocks.
  • Prefer {...} over do...end for single-line blocks.
  • Avoid chaining function calls over multiple lines (implying: use {...} for chained functions).
  • Avoid return where not required.
  • Avoid line continuation (\) where not required.
  • Using the return value of = is okay.
  • if v = array.grep(/foo/)
  • Use ||= freely for memoization.
  • When using regexps, freely use =~, $0-9, $~, $` and $' when needed.
  • Prefer symbols (:name) to strings where applicable.

Naming:

  • Use snake_case for methods.
  • Use CamelCase for classes and modules. (Keep acronyms like HTTP, RFC and XML uppercase.)
  • Use SCREAMING_CASE for other constants.
  • Use one-letter variables for short block/method parameters, according to this scheme:
  • a,b,c: any object
  • d: directory names
  • e: elements of an Enumerable or a rescued Exception
  • f: files and file names
  • i,j: indexes or integers
  • k: the key part of a hash entry
  • m: methods
  • o: any object
  • r: return values of short methods
  • s: strings
  • v: any value
  • v: the value part of a hash entry
  • And in general, the first letter of the class name if all objects are of that type (e.g. nodes.each { |n| n.name })
  • Use _ for unused variables.
  • When defining binary operators, name the argument other.
  • Use def self.method to define singleton methods.

Comments:

  • Comments longer than a word are capitalized and use punctuation. Use two spaces after periods.
  • Avoid superfluous comments. It should be easy to write self-documenting code.

Code design

  • Avoid needless meta-programming.
  • Avoid long methods. It’s much better to go too far the other way and have multiple one-line methods.
  • Avoid long parameter lists, consider using a hash with documented defaults instead.
  • Prefer functional methods over procedural ones (common methods below):
  • each - Apply block to each element
  • map - Apply block to each element and remember the returned values.
  • select - Find all matching elements
  • detect - Find first matching element
  • inject - Equivalent to foldl from Haskell
  • Use the mutating version of functional methods (e.g. map!) where applicable, rather than using temporary variables.
  • Avoid non-obvious function overloading (e.g. don’t use [“0”] * 8 to initialize an array).
  • Prefer objects to vanilla arrays/hashes, this allows you to document the structure and interface.
  • Protect the internal data stores from external access. Write API functions explicitly.
  • Use attr_accessor to create getters/setters for simple access.
  • Prefer to add a to_s function to an object for ease of debugging.
  • Internally, use standard libraries where applicable (see the docs for the various APIs):
  • Hash, Array and Set
  • String
  • Fixnum and Integer
  • Thread and Mutex
  • Fiber
  • Complex
  • Float
  • Dir and File
  • Random
  • Time
  • Prefer string interpolation “blah#{expr}” rather than appending to strings.
  • Prefer using the %w{} family of array generators to typing out arrays of strings manually.

General:

  • Write ruby -w safe code.
  • Avoid alias, use alias_method if you absolutely must alias something (for Monkey Patching).
  • Use OptionParser for parsing command line options.
  • Target Ruby 2.0 (except where libraries are not compatible, such as Chef).
  • Do not mutate arguments unless that is the purpose of the method.
  • Do not mess around in core classes when writing libraries.
  • Do not program defensively.
  • Keep the code simple.
  • Be consistent.
  • Use common sense.

Debugging Clearwater with GDB and Valgrind

Valgrind

Valgrind is a very powerful profiling and debugging tool.

Before you run Bono or Sprout under valgrind, you may want to tweak pjsip’s code slightly as indicated below, then rebuild and patch your nodes.

  • Valgrind’s memory access tracking hooks into malloc and free. Unfortunately, pjsip uses its own memory management functions, and so mallocs/frees relatively rarely. To disable this, modify pjlib/src/pj/pool_caching.c’s pj_caching_pool_init function to always set cp->max_capacity to 0.
  • Valgrind’s thread safety tool (helgrind) tracks the order in which locks are taken, and reports on any lock cycles (which can in theory cause deadlocks). One of these locks generates a lot of benign results. To prevent these, edit pjsip/include/pjsip/sip_config.h and set the value of PJSIP_SAFE_MODULE to 0.
  • Valgrind’s thread safety tool also spots variables that are accessed on multiple threads without the locking necessary to prevent race conditions. Pjsip’s memory pools maintain some statistics that are not used by Clearwater, but that trip valgrind’s race condition detection. To suppress this, edit pjlib/src/pj/pool_caching.c and remove the bodies of the cpool_on_block_alloc and cpool_on_block_free functions (keeping the functions themselves).

To run Bono, Sprout or Homestead under valgrind (the example commands assume you are running sprout), the easiest way is to:

  • Find the command line you are using to run Sprout (ps -eaf | grep sprout)
  • Make sure valgrind is installed on your system and you have the appropriate debug packages installed (sudo apt-get install valgrind and sudo apt-get install sprout-dbg)
  • Disable monitoring of sprout (sudo monit unmonitor -g sprout)
  • Stop sprout (sudo service sprout stop)
  • Allow child processes to use more file descriptors, and become the sprout user (sudo -i; ulimit -Hn 1000000; ulimit -Sn 1000000; (sudo -u sprout bash);)
  • Change to the /etc/clearwater directory
  • Set up the library path (export LD_LIBRARY_PATH=/usr/share/clearwater/sprout/lib:$LD_LIBRARY_PATH)
  • Run the executable under valgrind, enabling the appropriate valgrind options - for example, to use massif to monitor the Sprout heap, run valgrind --tool=massif --massif-out-file=/var/log/sprout/massif.out.%p /usr/share/clearwater/bin/sprout <parameters> (the --massif-out-file option is required to ensure the output is written to a directory where the sprout user has write permission). If any of Sprout’s parameters include a semi-colon, you must prefix it with a backslash, otherwise the bash interpreter will treat it as the end of the command. A condensed example session is shown below.
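
Putting these steps together, a typical massif session on a sprout node might look like the following sketch (the sprout parameters themselves are omitted):

sudo monit unmonitor -g sprout
sudo service sprout stop
sudo -i
ulimit -Hn 1000000
ulimit -Sn 1000000
sudo -u sprout bash
cd /etc/clearwater
export LD_LIBRARY_PATH=/usr/share/clearwater/sprout/lib:$LD_LIBRARY_PATH
valgrind --tool=massif --massif-out-file=/var/log/sprout/massif.out.%p /usr/share/clearwater/bin/sprout <parameters>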

Valgrind will slow down the running of Bono, Sprout and Homestead by a factor of 5-10. It will produce output when it detects invalid/illegal memory access - often these turn out to be benign, but they’re rarely spurious.

GDB

Installing

To install gdb, simply type sudo apt-get install gdb. gdb is already installed on build machines, but not on live nodes.

If you’re debugging on a live node, it’s also worth installing the sprout or bono debug packages. When we build the standard (release) versions, we strip all the symbols out and these are saved off separately in the debug package. Note that you will still be running the release code - the debug symbols will just be loaded by gdb when you start it up. To install these packages, type sudo apt-get install sprout-dbg or sudo apt-get install bono-dbg.

Pull Request Process

Dev and master branches

Each component/repository (with a few exceptions) has two main branches, dev and master. Whenever a commit is pushed to the dev branch, Jenkins will automatically run the unit tests for the repository and, if they pass, merge it into master.

Development process

Features or any changes to the codebase should be done as follows:

  1. Pull the latest code in the dev branch and create a feature branch off this.
    • Alternatively, if you want to be sure of working from a good build and don’t mind the risk of a harder merge, branch off master.
    • If the codebase doesn’t have a dev branch, branch off master.
  2. Implement your feature (including any necessary UTs etc). Commits are cheap in git; try to split your work into many small commits - it makes reviewing easier and merging saner.
  3. Once complete, ensure the following checks pass (where relevant):
    • All UTs (including coverage and valgrind) pass.
    • Your code builds cleanly into a Debian package on the repo server.
    • The resulting package installs cleanly using the clearwater-infrastructure script.
    • The live tests pass.
  4. Push your feature branch to GitHub (example git commands for this workflow are sketched after this list).
  5. Create a pull request using GitHub, from your branch to dev (never master, unless the codebase has no dev branch).
  6. Await review.
    • Address code review issues on your feature branch.
    • Push your changes to the feature branch on GitHub. This automatically updates the pull request.
    • If necessary, make a top-level comment along the lines “Please re-review”, assign back to the reviewer, and repeat the above.
    • If no further review is necessary and you have the necessary privileges, merge the pull request and close the branch. Otherwise, make a top-level comment and assign back to the reviewer as above.
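
As a rough sketch, the git commands for this workflow might look like the following (the branch name and commit message are illustrative):

git checkout dev
git pull
git checkout -b my-feature
# ... implement the feature and its UTs ...
git add -A
git commit -m "Implement my feature"
git push -u origin my-feature
# then raise a pull request from my-feature to dev on GitHub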

Reviewer process:

  • Receive notice of review by GitHub email, GitHub notification, or by checking all your Metaswitch GitHub issues.
  • Make comments on the pull request (either line comments or top-level comments).
  • Make a top-level comment saying something along the lines of “Fine; some minor comments” or “Some issues to address before merging”.
  • If there are no issues, merge the pull request and close the branch. Otherwise, assign the pull request to the developer and leave this to them.

Stress Testing

One of Clearwater’s biggest strengths is scalability and, in order to demonstrate this, we have easy-to-use settings for running large amounts of SIP stress against a deployment. This document describes:

  • Clearwater’s SIP stress nodes, what they do, and (briefly) how they work
  • how to kick off your own stress test.

SIP Stress Nodes

A Clearwater SIP stress node is similar to any other Project Clearwater node, except that instead of having a Debian package like bono or sprout installed, it has our clearwater-sip-stress Debian package installed.

What they do

Clearwater SIP stress nodes run a standard SIPp script against your bono cluster. The bono nodes translate this into traffic for sprout and this generates traffic on homestead and homer. The nodes log their success/failure to /var/log/clearwater-sip-stress and also restart SIPp (after a 30s cool-down) if anything goes wrong.

The SIPp stress starts as soon as the node comes up - this is by design, to avoid you having to manually start 50+ nodes generating traffic. If you want to stop a node generating traffic, shut the whole node down.

Each SIP stress node picks a single bono to generate traffic against. This bono is chosen by matching the bono node’s index against the SIP stress node’s index. As a result, it’s important to always give SIP stress nodes an index.

How they work

The clearwater-sip-stress package includes two important scripts.

  • /usr/share/clearwater/infrastructure/scripts/sip-stress, which generates a /usr/share/clearwater/sip-stress/users.csv.1 file containing the list of all subscribers we should be targeting - these are calculated from properties in /etc/clearwater/shared_config.
  • /etc/init.d/clearwater-sip-stress, which runs /usr/share/clearwater/bin/sip-stress, which in turn runs SIPp specifying /usr/share/clearwater/sip-stress/call_load2.xml as its test script. This test script simulates a pair of subscribers registering every 5 minutes and then making a call every 30 minutes.

The stress test logs to /var/log/clearwater-sip-stress/sipp.<index>.out.

Running Stress

Using Chef

This section describes step-by-step how to run stress using Chef automation. It includes setting up a new deployment. It is possible to run stress against an existing deployment, but you’ll need to reconfigure some things, so it’s easier just to tear down your existing deployment and start again.

  1. If you haven’t done so already, set up chef/knife (as described in the install guide).

  2. cd to your chef directory.

  3. Edit your environment (e.g. environments/ENVIRONMENT.rb) to override attributes as follows.

    For example (replacing ENVIRONMENT with your environment name and DOMAIN with your Route 53-hosted root domain):

    name "ENVIRONMENT"
    description "Stress testing environment"
    cookbook_versions "clearwater" => "= 0.1.0"
    override_attributes "clearwater" => {
        "root_domain" => "DOMAIN",
        "availability_zones" => ["us-east-1a"],
        "repo_server" => "http://repo.cw-ngv.com/latest",
        "number_start" => "2010000000",
        "number_count" => 1000,
        "pstn_number_count" => 0}
    
  4. Upload your new environment to the chef server by typing knife environment from file environments/ENVIRONMENT.rb

  5. Create the deployment by typing knife deployment resize -E ENVIRONMENT. If you want more nodes, supply parameters such as “--bono-count 5” or “--sprout-count 3” to control this.

  6. Follow this process to bulk provision subscribers. Create 30000 subscribers per SIPp node.

  7. Create your stress test node by typing knife box create -E ENVIRONMENT sipp --index 1. If you have multiple bono nodes, you’ll need to create multiple stress test nodes by repeating this command with “--index 2”, “--index 3”, etc. - each stress test node only sends traffic to the bono with the same index.

  • To create multiple nodes, try for x in {1..20} ; do { knife box create -E ENVIRONMENT sipp --index $x && sleep 2 ; } ; done.
  • To modify the number of calls/hour to simulate, edit/add count=<number> to /etc/clearwater/shared_config, then run sudo /usr/share/clearwater/infrastructure/scripts/sip-stress and sudo service clearwater-sip-stress restart.
  8. Create a Cacti server for monitoring the deployment, as described in this document.
  9. When you’ve finished, destroy your deployment with knife deployment delete -E ENVIRONMENT.

Manual (i.e. non-Chef) stress runs

SIP stress can also be run against a deployment that has been installed manually (as per the Manual Install instructions).

Firstly follow this process to bulk provision subscribers. Work out how many stress nodes you want, and create 30000 subscribers per SIPp node.

To create a new SIPp node, create a new virtual machine and bootstrap it by configuring access to the Project Clearwater Debian repository.

Then set the following properties in /etc/clearwater/local_config:

  • (required) local_ip - the local IP address of this node
  • (optional) node_idx - the node index (defaults to 1)

Set the following properties in /etc/clearwater/shared_config (an example sketch of both configuration files follows this list):

  • (required) home_domain - the home domain of the deployment under test
  • (optional) bono_servers - a list of bono servers in this deployment
  • (optional) stress_target - the target host (defaults to the node_idx-th entry in bono_servers or, if there are no bono_servers, defaults to home_realm)
  • (optional) base - the base directory number (defaults to 2010000000)
  • (optional) count - the number of subscribers to run on this node (must be even, defaults to 30000)
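
For example, the configuration for a sipp node might look like the following sketch (all values are illustrative, as is the comma-separated format shown for bono_servers):

# /etc/clearwater/local_config
local_ip=10.0.0.42
node_idx=1

# /etc/clearwater/shared_config
home_domain=example.com
bono_servers=10.0.0.10,10.0.0.11
base=2010000000
count=30000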

Finally, run sudo apt-get install clearwater-sip-stress to install the Debian package. Stress will start automatically after the package is installed.

Configuring UDP Stress

SIPp supports UDP stress, and this works with Clearwater. To make a stress test node run UDP stress (rather than the default of TCP), follow these steps (a scripted alternative is sketched after the list):

  • open /usr/share/clearwater/bin/sip-stress
  • find the line that begins “nice -n-20 /usr/share/clearwater/bin/sipp”
  • replace “-t tn” with “-t un”
  • save and exit
  • restart stress by typing “sudo service clearwater-sip-stress restart”.
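
As a sketch, one way to make the same edit non-interactively (assuming the line has exactly the form described above) is:

sudo sed -i 's/-t tn/-t un/' /usr/share/clearwater/bin/sip-stress
sudo service clearwater-sip-stress restart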

Using Cacti with Clearwater

Cacti is an open-source statistics and graphing solution. It supports gathering statistics via native SNMP and also via external scripts. It then exposes graphs over a web interface.

We use Cacti for monitoring Clearwater deployments, in particular large ones running stress.

This document describes how to

  • create and configure a Cacti node
  • point it at your Clearwater nodes
  • view graphs of statistics from your Clearwater nodes.

Setting up a Cacti node (automated)

Assuming you’ve followed the Automated Chef install, here are the steps to create and configure a Cacti node:

  1. use knife box create to create a Cacti node - knife box create -E <name> cacti
  2. set up a DNS entry for it - knife dns record create -E <name> cacti -z <root> -T A --public cacti -p <name>
  3. create graphs for all existing nodes by running knife cacti update -E <name>
  4. point your web browser at cacti.<name>.<root>/cacti/ (you may need to wait for the DNS entry to propagate before this step works)

If you subsequently scale up, running knife cacti update -E <name> again will create graphs for the new nodes (without affecting the existing nodes).

Setting up a Cacti node (manual)

If you haven’t followed our automated install process, and instead just have an Ubuntu 14.04 machine you want to use to monitor Clearwater, then:

  1. install Cacti on a node by running sudo apt-get install cacti cacti-spine

  2. accept all the configuration defaults

  3. login (admin/admin) and set a new password

  4. modify configuration by

    1. going to Devices and deleting “localhost”

    2. going to Settings->Poller and set “Poller Type” to “spine” and “Poller Interval” to “Every Minute” - then click Save at the bottom of the page

    3. going to Data Templates and, for each of “ucd/net - CPU Usage - *” set “Associated RRA’s” to “Hourly (1 Minute Average)” and “Step” to 60

    4. going to Graph Templates and change “ucd/net - CPU Usage” to disable “Auto Scale”

    5. going to Import Templates and import the Sprout, Bono and SIPp host templates - these define host templates for retrieving statistics from our components via SNMP and 0MQ. For each template you import, select “Select your RRA settings below” and “Hourly (1 Minute Average)”

    6. set up the 0MQ-querying script by

      1. ssh-ing into the cacti node

      2. running the following

        sudo apt-get install -y --force-yes git ruby1.9.3 build-essential libzmq3-dev
        sudo gem install bundler --no-ri --no-rdoc
        git clone https://github.com/Metaswitch/cpp-common.git
        cd cpp-common/scripts/stats
        sudo bundle install
        
Pointing Cacti at a Node

Before you point Cacti at a node, make sure the node has the required packages installed. All nodes need clearwater-snmpd installed (sudo apt-get install clearwater-snmpd). Additionally, sipp nodes need clearwater-sip-stress-stats (sudo apt-get install clearwater-sip-stress-stats).

To manually point Cacti at a new node,

  1. go to Devices and Add a new node, giving a Description and Hostname, setting a Host Template of “Bono”, “Sprout” or “SIPp” (depending on the node type), Downed Device Detection to “SNMP Uptime” and SNMP Community to “clearwater”
  2. click “Create Graphs for this Host” and select the graphs that you want - “ucd/net - CPU Usage” is a safe bet, but you might also want “Client Counts” and “Bono Latency” (if a bono node), “Sprout Latency” (if a sprout node), or “SIP Stress Status” (if a sipp node).
  3. click “Edit this Host” to go back to the device, choose “List Graphs”, select the new graphs and choose “Add to Default Tree” - this exposes them on the “graphs” tab you can see at the top of the page (although it may take a couple of minutes for them to accumulate enough state to render properly).

Viewing Graphs

Graphs can be viewed on the top “graphs” tab. Useful features include

  • the “Half hour” preset, which shows only the last half-hour rather than the last day
  • the search box, which matches on graph name
  • the thumbnail view checkbox, which gets you many more graphs on screen
  • the auto-refresh interval, configured on the settings panel (the default is 5 minutes, so if you’re looking for a more responsive UI, you’ll need to set a smaller value)
  • CSV export, achievable via the icon on the right of the graph.

Running the Clearwater Live Tests

The Clearwater team have put together a suite of live tests that can be run over a deployment to confirm that the high level function is working. These instructions will take you through installing the test suite and running the tests.

Prerequisites

  • You’ve installed Clearwater
  • You have an Ubuntu Linux machine available to drive the tests
  • If you installed through the automated install process, the chef workstation is a perfectly good choice for this task

Installing dependencies

On your test machine, run

sudo apt-get install build-essential git --yes
curl -L https://get.rvm.io | bash -s stable
source ~/.rvm/scripts/rvm
rvm autolibs enable
rvm install 1.9.3
rvm use 1.9.3

This will install Ruby version 1.9.3 and its dependencies.

Fetching the code

Run the following to download and install the Clearwater test suite

git clone -b stable --recursive git@github.com:Metaswitch/clearwater-live-test.git
cd clearwater-live-test
bundle install

Make sure that you have an SSH key - if not, see the GitHub instructions for how to create one.

Work out your signup code

The tests need your signup code to create a test user. You set this as signup_key during install: manually in /etc/clearwater/shared_config or automatically in knife.rb. For the rest of these instructions, the signup code will be referred to as <code>.

Running the tests against an All-in-One node

Work out your All-in-One node’s identity

This step is only required if you installed an All-in-One node, either from an AMI or an OVF. If you installed manually or using the automated install process, just move on to the next step.

If you installed an All-in-One node from an Amazon AMI, you need the public DNS name that EC2 has assigned to your node. This will look something like ec2-12-34-56-78.compute-1.amazonaws.com and can be found on the EC2 Dashboard on the “instances” panel. If you installed an All-in-One node from an OVF image, you need the IP address that was assigned to the node via DHCP. You can find this out by logging into the node’s console and typing hostname -I.

For the rest of these instructions, the All-in-One node’s identity will be referred to as <aio-identity>.

Run the tests

To run the subset of the tests that don’t require PSTN interconnect to be configured, simply run

rake test[example.com] SIGNUP_CODE=<code> PROXY=<aio-identity> ELLIS=<aio-identity>

The suite of tests will be run and the results will be printed on-screen.

Running the tests against a full deployment

Work out your base domain
  • If you installed Clearwater manually, your base DNS name will simply be <zone>.
  • If you installed using the automated install process, your base DNS name will be <name>.<zone>.

For the rest of these instructions, the base DNS name will be referred to as <domain>.

Running the tests

To run the subset of the tests that don’t require PSTN interconnect to be configured, simply run

rake test[<domain>] SIGNUP_CODE=<code>

The suite of tests will be run and the results will be printed on-screen. If your deployment doesn’t have DNS entries for the Bono and Ellis domain, then these can be passed to rake by adding the PROXY and ELLIS parameters, e.g.

rake test[<domain>] SIGNUP_CODE=<code> PROXY=<Bono domain> ELLIS=<Ellis domain>

PSTN testing

The live test framework also has the ability to test various aspects of PSTN interconnect. If you’ve configured your Clearwater deployment with an IBCF node that connects it to a PSTN trunk, you can test that functionality by picking a real phone (e.g. your mobile) and running

rake test[<domain>] PSTN=true LIVENUMBER=<your phone #> SIGNUP_CODE=<code>

This will test call services related to the PSTN: it will ring your phone and play an audio clip to you (twice, to cover both TCP and UDP).

Where next?

Now that you’ve confirmed that your Clearwater system is operational, you might want to make a real call or explore Clearwater to see what else Clearwater offers.