Overview

In the unlikely event of a network outage, or any other outage prevents you at the site from connecting to Webex Calling Dedicated Instance, the Enhanced Survivability Node actively takes over the call control and routing functionalities. Webex Calling Dedicated Instance, Webex Calling Multi-tenant and on-premises deployment, all have Survivability options, but the solution document details the solution level aspects of Enhanced Survivability for Webex Calling Dedicated Instance.

In Dedicated Instance, the Unified CM cluster’s subscribers is deployed across the datacenter within a region, to provide high availability and Geo-redundancy. It allows the devices or client to failover to the subscriber in the other datacenter. But, if there is a network outage between your site and the Dedicated Instance cloud, the Enhanced Survivability Node that deploys within the site can handle the call control and routing functionalities until the connectivity restores. The Enhanced Survivability Node (ESN) provides the call control functionalities of a standard subscriber during an event of outage.

The Enhanced Survivability Node can only route calls within a site and for other calls, it must route through PSTN for which you must deploy a Local Gateway within the site for PSTN. It requires for you to set up a local DNS server for the ESN for resolutions, as the ESN not able to reach the Cisco’s DNS server during the outage. The Enhanced Survivability Node can coexist with Cisco SRST as well.


To know the level of responsibility to deploy the Enhanced Survivability Node. Refer Enhanced Survivability- Roles and Responsibilty Matrix.

Depolyment Models

Single Site

In the Single Site deployment model, where an Enhanced Survivability Node (ESN) is deployed within a site along with a Local Gateway for PSTN call routing. A maximum of 7500 devices can be registered to the ESN during an outage.

Multiple-Site

In the multiple-site deployment model, where there are multiple sites and ESN can be deployed in each site depends on the business requirement for the site’s survivability. The requirements of a Local Gateway and DNS is always a necessity and a total of 8 ESN nodes can be added to a Unified CM Cluster.

This deployment model is relevant for a customer across a region with multiple sites and survivability is a requirement for multiple of those sites. While it is possible to share the PSTN local gateway across sites, it is not recommended. if there is a network outage, the site can become isolated and in that case, ESN will not be able to reach the Local Gateway for routing the calls to PSTN.

Below are 2 deployment options for a multiple-site deployment:

  • Option 1: Enhanced Survivability Node deployed at each site.
  • Option 2 – Common Enhanced Survivability Node shared between multiple site.

Serviceability

Monitoring

We monitor and manage the Enhanced Survivability Node like other nodes that are deployed in Dedicated Instance datacenter. During a survivability event, when the ESN is disconnected from the Cisco Cloud is when we lose access to the Node and automatically connect back when the outage is resolved, and connectivity is restored.

Certificate Management

We manage the UC application certificates and during the activation of the Enhanced Survivability Node we updated the Dedicated Instance Unified CM cluster certificate is updated with the ESN.


During the activation of the ESN from Control Hub there will be a restart of all registered devices as the certificate for the Unified CM Cluster will be updated with the multi-SAN certificates. Hence, we plan the maintenance period during the activation of the ESN from Control Hub. Refer, How to Activate Enhanced Survivability Node.

CDR

During the survivability event, the Enhanced Survivability Node stores all the CDR/CMR data locally. When the connectivity is restored, the data will be synced back to the Dedicated Instance Unified CM Publisher. The amount of data that can be stored is based on disk size of then Enhanced Survivability Node. The maximum disk allocation space that can set for CDR is 3328 MB. This can be with small to large CDR file size based on the CDR interval that is configured. The purge happens based on:

  • When the disk usage exceeds the allocated or configured disk space, then it deletes the processed records. If the disk usage remains to be more that is when the unprocessed records are also purged.

  • High Water Mark % that is configured in “CDR Management” settings, the CDR files will be purged. For for example, if the “High Water Mark %” is configured as 80% and the disk usage is 80% then the CDR files will be purged.

  • CDR / CMR files preservation Duration (days) that is configured in “CDR Management” settings, the CDR files will be purged. By default, it is set to 30 days.

RTMT Alarms

Following are the alerts in RTMT related to Enhanced Survivability Node:

  • SurvivabilityEvent- the alarm is triggered when all the Dedicated Instance nodes are not reachable from the Enhanced Survivability Node.

  • RemoteSurvivableNodeNotReachable - the alarm is triggered when an Enhanced Survivability Node is not reachable from the Dedicated Instance Unified CM publisher.

Performance Counter

During the survivability event, you need to connect RTMT to the Enhanced Survivability Node to monitor the performance of the ESN. The same will not be available, if RTMT is connected to the Dedicated Instance nodes as the ESN will not be reachable from the cloud during the survivability event.

Unified CM features and settings

User Settings

During normal operation, the database replication is fully meshed between all the servers including the Enhanced Survivability Node within the Unified CM cluster. The static configuration data, because it is created through moves, adds, and changes, is always stored on the publisher and replicated one way from the publisher to each subscriber and enhanced survivability node in the cluster.

During a survivability event, only the user-facing features are modified on the devices that are registered to the Enhanced Survivability Node and the user-facing features are typically characterized using the fact that you can enable or disable a feature directly on their phone by pressing one or more buttons, as opposed to changing a feature through a web-based GUI. So, the Enhanced Survivability Node allows self-care and web admin GUI as read-only operations. User’s devices registered to ESN is able to make changes to only the user facing features listed below during the failover. However, these changes will not be synced back to the DI Unified CM publisher when the connectivity is reestablished.

User-facing features are any features that can be enabled or disabled by pressing buttons on the phone and include the following:

  • Call Forward All (CFA)

  • Privacy Enable or Disable

  • Do Not Disturb(DND) Enable or Disable

  • Cisco Extension Mobility Login

  • Hunt-group Login or Logout

  • Device Mobility

  • CTI CAPF status for end users and application users.

Authentication

The authentication of soft clients (Cisco Jabber and Webex Application) for login during the failover to Enhanced Survivability Node is as follows:

  1. Local Authentication: When authentication of users is done locally within the Unified CM, during the survivability event the Enhanced Survivability Node will be able to authenticate the clients registered to it.

  2. LDAP Authentication: In this case the authentication of users is done using the local LDAP server. Then during the survivability event the authentication of soft clients will work provided the LDAP server is reachable from the Enhanced Survivability Node.


    You should ensure the LDAP directory’s reachability to ESN throughout the survivability event.

  3. Single Sign On (SSO) authentication: The SSO login authentication of users is done using the IDP server. Then during the survivability event the authentication of soft clients work provided the IDP server is reachable from the Enhanced Survivability Node.

    For SSO enabled Unified CM web UI login, the IDP reachability is required or the recovery-based URL login needs to be used.


    Already authenticated clients continue to be logged in as the authentication is based on the token that is obtained before the survivability event. However, for new logins when the client does not have a valid token from previous authentication, the ESN will redirect to the IDP server for authentication. Hence it is always necessary to ensure the IDP server’s reachability to ESN throughout the survivability event.

Media Resources

Media resources are required for basic Unified CM features, such as Music on Hold, Announcement, Conference Bridge (software) services must be enabled on the ESN. If Hardware based media resources were deployed, then during the survivability event you must make sure that the media servers are reachable from the ESN.

Emergency Calls

During normal operations of the DI Unified CM cluster, the emergency calls (particularly in AMER region) are routed through the RedSky cloud where there is a SIP trunk that is configured between the Dedicated Instnace unified CM cluster and RedSky cloud.

If there is a survivability event, the RedSky cloud will not be reachable from the ESN and hence it is required for you to configure the emergency Calling dial plan such that, if the RedSky is not available then to route the emergency calls through the local PSTN GW configured at that site. The route group must consist of the local PSTN GW to handle the call routing during the survivability event.


For emergency calls in other Dedicated Instance regions as well the dial plan needs to be configured to route the calls through Local PSTN GW during the survivability event.

Call Routing

Configure the dial plan for routing intrasite, intersite, inter-cluster, and PSTN calls during the survivability event. In general, the ESN can route calls only for devices that are registered to it. All other calls need to be routed to the Local PSTN GW (configured in every site where ESN is deployed) and from there to the PSTN. Following are few scenarios explained:

  • Phone 1 and Phone 2 registered to same ESN – The call is routed within the ESN.

  • Phone 1 registered to ESN and Phone 2 registered to Dedicated Instance Unified CM cluster – The dial plan should route the calls from ESN to the local PSTN GW, from there to the DI Unified CM via PSTN. During the survivability event, dial plan should detect the call routing failure and re-route the calls through the local PSTN GW. The same should be applicable for incoming calls to ESN from DI Unified CM devices.

  • Phone 1 registered to ESN and Phone 2 is a PSTN device: During survivability event, PSTN calls need to be routed to the local PSTN gateway. You must make sure that the dial plan has the capability to detect call routing failures and reroute the call through the available local PSTN gateway.


We do not recommend ICT calls between 2 ESN's nodes, although it is feasible when the ESNs are reachable within your network.

Voicemail and Auto Attendant

  • During the survivability event, when the connectivity from your site to Dedicated Instance cloud is down (WAN or Connectivity outage), the voicemail and auto attendant features will not work for the devices that registers to the ESN, as the Cisco Unity Connection server is hosted in the Dedicated Instance cloud to which the connectivity from ESN is down. If your device is configured with “Call Forward Unregistered (CFU)” and the call is received in DI Unified CM, then the caller is able to deposit a voicemail in the Dedicated Instance Unity Connection. Which can be retrieved when the devices fall back to DI unified CM subscribers.

  • However, during a survivability event when the connectivity to Dedicated Instance cloud is available but the Unified CM cluster in DI is down, in that case the voicemail and auto attendant features work for devices that are registered to the ESN, as ESN will have connectivity to the Unity Connection server deployed in DI cloud.

Mobile and Remote Access (MRA)

During the survivability event, the ESN will not be able to reach the Cisco Expressway E & C in DI cloud and vice versa. So, in this case the MRA users cannot get the service from ESN and hence will not be able to register. However, if the MRA device has internet and can connect to the Cisco Expressways in DI cloud, then it can register with the DI Unified CM provided the cluster in DI is functional.

Third-Party Integrations

CTI

For CTI based integrations to work with Enhanced Survivability Node, you must add Enhanced Survivability Node as part of CTI’s Server list. CTI enhancements is made for applications that use JTAPI to allow Enhanced Survivability Node as a CTI server to which application can connect only in the event when the primary or secondary CTI servers in the configured list is not reachable. During a normal operation, CTI applications on site can connect with the primary and secondary CTI servers in the DI cloud and during survivability event, they can connect with Enhanced Survivability Node for a continued CTI experience. Applications must adapt to the new APIs as exposed over JTAPI interface to ensure fallback from the Enhanced Survivability Node takes place when the connectivity is restored.

For more information on the new APIs added, refer the redundancy section, https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cucm/jtapi_dev/14_0_1/cucm_b_cisco-unified-jtapi-developers-guide-14/cucm_b_cisco-unified-jtapi-developers-guide-1251_chapter_00.html

AXL

AXL Web service is enabled in the Enhanced Survivability Node with read-only admin privileges. We recommend that any third party applications such as provisioning server to interface only with DI Unified CM publisher for any Database related updates. However, it is possible for these applications to read-only when connected to the Enhanced Survivability Node.

Third-Party SIP

Third party applications that interface through the SIP trunks supports with Enhanced Survivability Node. In the SIP trunk configurations, ‘run on all nodes’ configuration must be enabled.

Third-Party Phones

3rd party devices are supported which has Tertiary TFTP capability.