Resilience Architecture

The high-availability (HA) solution in XiVO keeps basic telephony functions available whether or not your main XiVO server is running. When running a XiVO HA cluster, users are guaranteed never to experience more than 5 minutes of downtime of their basic telephony service.

The HA solution in XiVO is based on a two-node “master and slave” architecture. In the normal situation, both nodes run in parallel: the master provides all the telephony services while the slave acts as a “hot standby”. If the master fails or must be shut down for maintenance, the slave node automatically takes over the telephony services, and supported telephony devices automatically communicate with the slave node instead of the master. Once the master is up again, the slave node stops its telephony services and the telephony devices fail back to the master node.

Currently, resilience is supported with:

  • XiVO deployment

  • CC deployment

  • XDS deployment

Prerequisites

  • Phones must be able to reach the master and the slave

  • The master and slave nodes must be in the same subnet

  • If a firewall is in place, the master must be allowed to reach the slave on ports 22 and 5432

  • If a firewall is in place, the slave must be allowed to reach the master with an ICMP ping

  • Trunk registration timeout (expiry) should be less than 300 seconds (5 minutes)

  • The slave must have no provisioning plugins installed.
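Some of the prerequisites above can be checked mechanically before enabling HA. The following is a minimal sketch, not part of XiVO: the function names and the way addresses, the trunk expiry and the plugin list are passed in are all hypothetical, and it only covers the subnet, trunk expiry and provisioning plugin rules.

```python
import ipaddress

# From the prerequisite above: trunk expiry should be less than 300 seconds.
MAX_TRUNK_EXPIRY_SECONDS = 300


def same_subnet(master_cidr, slave_cidr):
    """Return True if both interface addresses (e.g. "10.0.0.1/24") share a network."""
    master = ipaddress.ip_interface(master_cidr)
    slave = ipaddress.ip_interface(slave_cidr)
    return master.network == slave.network


def check_prerequisites(master_cidr, slave_cidr, trunk_expiry_seconds, slave_plugins):
    """Return a list of problems found; an empty list means these checks passed.

    All parameters are hypothetical inputs gathered by the administrator;
    nothing here queries a real XiVO server.
    """
    problems = []
    if not same_subnet(master_cidr, slave_cidr):
        problems.append("master and slave are not in the same subnet")
    if trunk_expiry_seconds >= MAX_TRUNK_EXPIRY_SECONDS:
        problems.append("trunk registration expiry should be less than 300 seconds")
    if slave_plugins:
        problems.append("slave must have no provisioning plugins installed")
    return problems
```

For example, `check_prerequisites("192.168.1.10/24", "192.168.1.11/24", 120, [])` returns an empty list. Reachability of ports 22 and 5432 and of ICMP is deliberately left out, since it depends on the firewall between the two nodes.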

The HA solution is guaranteed to work correctly with the following devices.

Resilience in XiVO deployment

../../_images/resilience_archi_xivo.png

In a simple XiVO deployment, another XiVO is added in slave mode.

In slave mode, the XiVO pings the master XiVO every minute.

  • if the ping succeeds (master is up): the slave XiVO shuts down its telephony services

  • if more than 3 pings fail (master is down): the slave XiVO starts up its telephony services
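The decision rule above can be sketched as a small state machine. This is an illustration only, not the actual XiVO implementation: the class and method names are hypothetical, and it assumes that "more than 3" refers to consecutive failed pings and that a single successful ping resets the count.

```python
# "More than 3" failed pings trigger the takeover (assumed to be consecutive).
FAILURE_THRESHOLD = 3


class SlaveMonitor:
    """Hypothetical model of the slave's reaction to each ping result."""

    def __init__(self):
        self.consecutive_failures = 0
        self.telephony_running = False

    def on_ping_result(self, master_reachable):
        """Record one ping result and return whether telephony should run on the slave."""
        if master_reachable:
            # Master is up: reset the counter and stop the slave's telephony services.
            self.consecutive_failures = 0
            self.telephony_running = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures > FAILURE_THRESHOLD:
                # Master considered down: the slave takes over.
                self.telephony_running = True
        return self.telephony_running
```

Under this reading, with one ping per minute the slave would take over on the fourth failed ping, roughly four minutes after the master stops answering.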

Resilience in CC deployment

../../_images/resilience_archi_cc.png

In a CC deployment, another CC is configured with the slave XiVO.

The slave CC is always up and running. When the slave XiVO starts up its telephony services, the xuc must be manually restarted before it can serve the UC/CC applications.

Resilience in XDS deployment

../../_images/resilience_archi_xds.png

In an XDS deployment, no other node is added.

The MDS are shared between the master and the slave XiVO.

When the slave starts the telephony services, it registers the inter-mds trunk with the other MDS so that inter-MDS calls can work.

Resilience with Edge deployment

Note

Resilience can work with a single Edge deployment.

../../_images/resilience_archi_edge.png

The Edge must know where the CC and the XiVO are located. Therefore, the Edge must be reconfigured when the slave takes over.

The slave XiVO and the slave CC must be configured with the Edge information (mainly the TURN secret).

Replication

Once the master/slave configuration is completed, the XiVO configuration (database and files) is replicated from the master node to the slave node every hour, on the hour (:00).
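Because replication only runs at the top of each hour, a change made on the master shortly before a failure may not have reached the slave. The sketch below illustrates that window; the function names are hypothetical, and it assumes that replication runs exactly at :00 and completes instantly, which a real system would not guarantee.

```python
from datetime import datetime


def last_replication(at):
    """Most recent top-of-hour replication at or before the given time."""
    return at.replace(minute=0, second=0, microsecond=0)


def change_was_replicated(changed_at, master_down_at):
    """Return True if a change made at `changed_at` reached the slave
    before the master went down at `master_down_at`.

    Assumption: only changes made strictly before the last hourly run are copied.
    """
    return changed_at < last_replication(master_down_at)
```

For instance, if the master goes down at 10:45, a change made at 09:50 was copied by the 10:00 run, while a change made at 10:15 was not.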

DB Replication

The replication does not copy the full XiVO configuration of the master. Notably, these are excluded:

  • All the network configuration except DHCP configuration (i.e. everything under the Configuration ‣ Network ‣ {Interfaces, Resolver, Mail} sections)

  • All the support configuration (i.e. everything under the Configuration ‣ Support section)

  • HA settings

  • Access Web Services configuration

  • Provisioning configuration

  • Voicemail messages

These event data are also excluded:

  • Queue logs

  • CELs

File Replication

The following directories are synchronized with rsync from the master to the slave every hour:

  • /etc/asterisk/extensions_extra.d

  • /etc/xivo/asterisk

  • /var/lib/asterisk/agi-bin

  • /var/lib/asterisk/moh

  • /var/lib/xivo/certificates

  • /var/lib/xivo/sounds/acd

  • /var/lib/xivo/sounds/playback
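To make the list above concrete, the sketch below builds one rsync command per replicated directory. It is an illustration, not the script XiVO actually runs: the `root` user, the slave host name passed in, and the choice of the `-a` (archive) option are assumptions, and the commands are only built, never executed.

```python
# Directories taken from the list above.
REPLICATED_DIRS = [
    "/etc/asterisk/extensions_extra.d",
    "/etc/xivo/asterisk",
    "/var/lib/asterisk/agi-bin",
    "/var/lib/asterisk/moh",
    "/var/lib/xivo/certificates",
    "/var/lib/xivo/sounds/acd",
    "/var/lib/xivo/sounds/playback",
]


def build_rsync_commands(slave_host):
    """Return one rsync argument list per directory (master to slave over SSH).

    `slave_host` is a hypothetical host name or IP address of the slave node.
    """
    commands = []
    for directory in REPLICATED_DIRS:
        source = directory + "/"  # trailing slash: copy the directory contents
        target = "root@{}:{}/".format(slave_host, directory)  # assumed remote user
        commands.append(["rsync", "-a", source, target])
    return commands
```

Each returned list could be passed to `subprocess.run` on the master. Copying over SSH is consistent with the prerequisite that the master can reach the slave on port 22.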

Limitations

Architecture:

  • Since DHCP parameters are replicated, the master and slave nodes MUST be on the same VoIP network.

When the master node is down, some features are not available and some behave a bit differently. This includes:

  • Call history / call records are not recorded.

  • Voicemail messages saved on the master node are not available.

  • Custom voicemail greetings recorded on the master node are not available.

  • Phone provisioning is disabled, i.e. a phone will always keep the same configuration, even after restarting it.

  • Phone remote directory is not accessible, because provisioned IP address points to the master.

Note that, on failover and on failback:

  • The statuses of DND, call forwards, call filtering and similar features may be lost if they were changed recently.

  • If you are connected as an agent, then you might need to reconnect as an agent when the master goes down. Since it’s hard to know when the master goes down, if your CTI client disconnects and you can’t reconnect it, then it’s a sign the master might be down.

Additionally, only on failback:

  • Voicemail messages are not copied from the slave to the master, i.e. if someone left a message on your voicemail when the master was down, you won’t be able to consult it once the master is up again.

  • More generally, custom sounds are not copied back. This includes recordings.

Here’s the list of limitations that are more relevant from an administrator’s standpoint:

  • The master status is either up or down; there is no intermediate status. This means that if Asterisk has crashed but the XiVO server is still up, the failover will NOT happen.