Resilience Architecture
The HA solution in XiVO makes it possible to maintain basic telephony functions whether your main XiVO server is running or not. When running a XiVO HA cluster, users are guaranteed never to experience more than 5 minutes of downtime of their basic telephony service.
The HA solution in XiVO is based on a two-node “master and slave” architecture. In the normal situation, both the master and slave nodes run in parallel: the slave acts as a “hot standby”, and all the telephony services are provided by the master node. If the master fails or must be shut down for maintenance, the slave node automatically takes over the telephony services, and supported telephony devices automatically communicate with the slave node instead of the master. Once the master is up again, the slave node stops its telephony services and the telephony devices fail back to the master node.
Currently, resilience is supported with:
XiVO deployment
CC deployment
XDS deployment
Prerequisites
Phones must be able to reach the master and the slave
Master and Slave nodes must be in the same subnet
If a firewall is in place, the master must be allowed to reach the slave on ports 22 and 5432
If a firewall is in place, the slave must be allowed to reach the master with an ICMP ping
Trunk registration timeout (expiry) should be less than 300 seconds (5 minutes)
The slave must have no provisioning plugins installed
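Two of the prerequisites above (same subnet, trunk expiry below 300 seconds) can be checked mechanically before enabling HA. The following is a minimal sketch only, not a XiVO tool: the function names, the example addresses, and the netmask parameter are all hypothetical.

```python
import ipaddress

# From the prerequisite above: the trunk expiry should stay below 5 minutes.
MAX_TRUNK_EXPIRY_SECONDS = 300


def same_subnet(master_ip: str, slave_ip: str, netmask: str) -> bool:
    """Return True when both addresses belong to the same IPv4 network."""
    master_net = ipaddress.ip_interface(f"{master_ip}/{netmask}").network
    slave_net = ipaddress.ip_interface(f"{slave_ip}/{netmask}").network
    return master_net == slave_net


def trunk_expiry_ok(expiry_seconds: int) -> bool:
    """Return True when the expiry is positive and strictly below 300 seconds."""
    return 0 < expiry_seconds < MAX_TRUNK_EXPIRY_SECONDS


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(same_subnet("192.168.1.10", "192.168.1.20", "255.255.255.0"))
    print(trunk_expiry_ok(120))
```

The firewall and provisioning-plugin prerequisites are not covered by this sketch, since checking them depends on the actual network and on the slave's installed plugins.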
The HA solution is guaranteed to work correctly with the following devices.
Resilience in XiVO deployment
In a simple XiVO deployment, another XiVO is added in slave mode.
In slave mode, the XiVO pings the master XiVO every minute.
if the ping succeeds (master is up): the slave XiVO shuts down the telephony services
if more than 3 pings fail (master is down): the slave XiVO starts up the telephony services
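The decision rule described above can be sketched as a small state machine. This is an illustration under assumptions, not the actual XiVO implementation: the class and method names are hypothetical, and the sketch assumes that the failed pings must be consecutive, which the text does not state explicitly.

```python
from dataclasses import dataclass

# "More than 3" failed pings trigger the takeover.
FAILURE_THRESHOLD = 3


@dataclass
class SlaveMonitor:
    """Tracks ping results and decides whether the slave serves telephony."""

    failed_pings: int = 0
    telephony_running: bool = False

    def record_ping(self, succeeded: bool) -> str:
        """Process one ping result and return the action: start, stop or none."""
        if succeeded:
            # Master is up: reset the counter and stop services if they run.
            self.failed_pings = 0
            if self.telephony_running:
                self.telephony_running = False
                return "stop"
            return "none"
        # Master did not answer: count the failure (assumed consecutive).
        self.failed_pings += 1
        if self.failed_pings > FAILURE_THRESHOLD and not self.telephony_running:
            self.telephony_running = True
            return "start"
        return "none"


if __name__ == "__main__":
    monitor = SlaveMonitor()
    # Four failed pings in a row, then the master comes back.
    for result in (False, False, False, False, True):
        print(monitor.record_ping(result))
```

With one ping per minute, the fourth failed ping in a row starts the telephony services in this sketch, which is consistent with the stated maximum of 5 minutes of downtime.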
Resilience in CC deployment
In a CC deployment, another CC is configured with the slave XiVO.
The slave CC is up and running. When the slave XiVO starts up, the xuc must be manually restarted so that it can serve UC/CC applications.
Resilience in XDS deployment
In an XDS deployment, no other node is added.
The MDS is shared between the master and the slave XiVO.
When the slave starts the telephony services, it registers the inter-MDS trunk with the other MDS so that inter-MDS calls can work.
Resilience with Edge deployment
Note
Resilience can work with a single Edge deployment
The Edge must know where the CC and the XiVO are located. Therefore, the Edge must be reconfigured when the slave takes over.
The slave XiVO and CC must be configured with the Edge information (mainly the TURN secret).
Replication
Once the master/slave configuration is completed, the XiVO configuration (DB and files) is replicated from the master node to the slave every hour (:00).
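Because replication runs at the top of every hour, a change made on the master can take up to an hour to reach the slave. As a minimal sketch (the function name is hypothetical and this is not part of XiVO), the next replication time can be computed as follows:

```python
from datetime import datetime, timedelta


def next_replication(now: datetime) -> datetime:
    """Return the next top-of-hour instant strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


if __name__ == "__main__":
    # A change made at 10:15 would be replicated at 11:00 in this model.
    print(next_replication(datetime(2024, 1, 1, 10, 15)))
```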
DB Replication
The replication does not copy the full XiVO configuration of the master. Notably, these are excluded:
All the network configuration except DHCP configuration (i.e. everything under the sections)
All the support configuration (i.e. everything under the section)
HA settings
Access Web Services configuration
Provisioning configuration
Voicemail messages
These event data are also excluded:
Queue logs
CELs
File Replication
The following directories are rsync’ed from the master to the slave every hour:
/etc/asterisk/extensions_extra.d
/etc/xivo/asterisk
/var/lib/asterisk/agi-bin
/var/lib/asterisk/moh
/var/lib/xivo/certificates
/var/lib/xivo/sounds/acd
/var/lib/xivo/sounds/playback
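To make the file replication more concrete, the sketch below builds one rsync command per replicated directory. It is an illustration only: the exact rsync options used by XiVO are not given here, so the `-a` (archive) flag, the function name, and the host name `xivo-slave` are assumptions.

```python
from typing import List

# Directories listed above as replicated every hour.
REPLICATED_DIRS = [
    "/etc/asterisk/extensions_extra.d",
    "/etc/xivo/asterisk",
    "/var/lib/asterisk/agi-bin",
    "/var/lib/asterisk/moh",
    "/var/lib/xivo/certificates",
    "/var/lib/xivo/sounds/acd",
    "/var/lib/xivo/sounds/playback",
]


def build_rsync_commands(slave_host: str) -> List[List[str]]:
    """Build one rsync argument list per directory (master to slave).

    The "-a" flag is an assumption, not the documented XiVO invocation.
    """
    return [
        ["rsync", "-a", f"{path}/", f"{slave_host}:{path}/"]
        for path in REPLICATED_DIRS
    ]


if __name__ == "__main__":
    # "xivo-slave" is a hypothetical host name.
    for command in build_rsync_commands("xivo-slave"):
        print(" ".join(command))
```

The commands are only printed, not executed. Pushing files from the master to the slave in this way would rely on the master being able to reach the slave over SSH, which matches the port 22 prerequisite.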
Limitations
Architecture:
Since DHCP parameters are replicated, the master and slave nodes MUST be on the same VoIP network.
When the master node is down, some features are not available and some behave a bit differently. This includes:
Call history / call records are not recorded.
Voicemail messages saved on the master node are not available.
Custom voicemail greetings recorded on the master node are not available.
Phone provisioning is disabled, i.e. a phone will always keep the same configuration, even after restarting it.
The phone remote directory is not accessible, because the provisioned IP address points to the master.
Note that, on failover and on failback:
DND, call forward, call filtering and similar statuses may be lost if they were changed recently.
If you are connected as an agent, then you might need to reconnect as an agent when the master goes down. Since it’s hard to know when the master goes down, if your CTI client disconnects and you can’t reconnect it, then it’s a sign the master might be down.
Additionally, only on failback:
Voicemail messages are not copied from the slave to the master, i.e. if someone left a message on your voicemail when the master was down, you won’t be able to consult it once the master is up again.
More generally, custom sounds are not copied back. This includes recordings.
Here’s the list of limitations that are more relevant from an administrator’s standpoint:
The master status is either up or down; there is no intermediate status. This means that if Asterisk has crashed but the XiVO server still answers pings, the master is still considered up and the failover will NOT happen.