.. _high-availability:

**********************
High Availability (HA)
**********************

The :abbr:`HA (High Availability)` solution in XiVO makes it possible to maintain basic telephony
function whether your main XiVO server is running or not. When running a XiVO HA cluster, users are
guaranteed to never experience a downtime of more than 5 minutes of their basic telephony service.

The HA solution in XiVO is based on a two-node "master and slave" architecture. In the normal
situation, both the master and slave nodes run in parallel, with the slave acting as a "hot
standby", and all telephony services are provided by the master node. If the master fails or
must be shut down for maintenance, the telephony devices automatically communicate with the
slave node instead. Once the master is up again, the telephony devices fail back to
the master node. Both the failover and the failback operations happen automatically, i.e. without
any user intervention, although an administrator might want to run some manual operations after
failback, for example to make sure that any voicemail messages left on the slave are copied
back to the master.

Prerequisites
=============
The HA in XiVO only works with telephony devices (i.e. phones) that support
the notion of a primary and backup telephony server.
* Phones must be able to reach the master and the slave
* The master and slave nodes must be in the same subnet
* If firewalling, the master must be allowed to reach the slave on ports 22 and 5432
* If firewalling, the slave must be allowed to reach the master with an ICMP ping
* Trunk registration timeout (``expiry``) should be less than 300 seconds (5 minutes)
* The slave must have **no** provisioning plugins installed.
The HA solution is guaranteed to work correctly with `the following devices `_.
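
The firewall prerequisites above can be checked before enabling HA. The following is a minimal
sketch, not part of XiVO: the function names and the two addresses are hypothetical (the addresses
come from the documentation range and must be replaced with the VoIP addresses of your nodes).
It assumes ``bash``, ``timeout`` and ``ping`` are available on both nodes.

.. code-block:: shell

   # Hypothetical addresses (documentation range); replace with your own.
   MASTER_IP="192.0.2.10"
   SLAVE_IP="192.0.2.11"

   # Return 0 if a TCP connection to host:port succeeds within 3 seconds.
   # Uses bash's /dev/tcp redirection, so no extra package is needed.
   port_open() {
       local host="$1" port="$2"
       timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
   }

   # Run on the master: the slave must accept connections on ports 22 and 5432.
   check_from_master() {
       local port
       for port in 22 5432; do
           if port_open "$SLAVE_IP" "$port"; then
               echo "port ${port} on slave: reachable"
           else
               echo "port ${port} on slave: NOT reachable"
           fi
       done
   }

   # Run on the slave: the master must answer ICMP echo requests.
   check_from_slave() {
       if ping -c 3 -W 2 "$MASTER_IP" >/dev/null 2>&1; then
           echo "master answers ping"
       else
           echo "master does NOT answer ping"
       fi
   }

After sourcing these definitions, call ``check_from_master`` on the master and ``check_from_slave``
on the slave. A "NOT reachable" result can also mean that the service itself is stopped, so it is
only a hint that a firewall rule may be missing.
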
Quick Summary
=============
* You need two XiVO servers, both already configured (wizard completed)
* Configure one XiVO as a master -> set up the slave address (VoIP interface)
* Restart services (``xivo-service restart``) on the master
* Configure the other XiVO as a slave -> set up the master address (VoIP interface)
* Configure file synchronization by running the script ``xivo-sync -i`` on the master
* Start configuration synchronization by running the script ``xivo-master-slave-db-replication``
  on the master
* Resynchronize all your devices

That's it, you now have an HA configuration, and every hour all the configuration done on the master
will be replicated to the slave.

Configuration Details
=====================
First thing to do is to :ref:`install 2 XiVO `.

.. important:: When you upgrade a node of your cluster, you must also upgrade the other so that
   they both are running the same version of XiVO. Otherwise, the replication might not work
   properly.

You must configure the :abbr:`HA (High Availability)` in the Web interface
(:menuselection:`Configuration --> Management --> High Availability` page).
You can configure the master and slave in whatever order you want.

You must also run ``xivo-sync -i`` on the master to set up file synchronization. Running
``xivo-sync -i`` will create a passwordless SSH key on the master, stored under the
:file:`/root/.ssh` directory, and will add it to the :file:`/root/.ssh/authorized_keys` file on
the slave.


.. note:: If you want to test the SSH login as advised by the ``ssh-copy-id`` script, you must
   select the new key to be used by ssh: ``ssh -i /root/.ssh/xivo_id_rsa root@``

The following directories will then be rsync'ed every hour:
* /etc/asterisk/extensions_extra.d
* /etc/xivo/asterisk
* /var/lib/asterisk/agi-bin
* /var/lib/asterisk/moh
* /var/lib/xivo/certificates
* /var/lib/xivo/sounds/acd
* /var/lib/xivo/sounds/playback

.. warning:: When the HA is configured, some changes will be automatically
   made to the configuration of XiVO.

SIP expiry values configuration
-------------------------------
Reminder:
* Minimum expiry (``minexpiry``): minimum amount of time, in seconds, allowed for a registration or subscription
* Maximum expiry (``maxexpiry``): maximum amount of time, in seconds, until a peer’s registration expires
* Default expiry time (``defaultexpiry``): default SIP registration expiration time, in seconds, for incoming and outgoing registrations

More information on `Asterisk 20 `_
After automatic update:
* minexpiry: 3 minutes
* maxexpiry: 5 minutes
* defaultexpiry: 4 minutes

.. important::
   You have to **revert those variables** to their default values. The reasons they were
   automatically updated are no longer relevant.
   Set them back to their default values as shown below.

**SIP expiry values to (re)set**:
* minexpiry: 1 minute
* maxexpiry: 1 hour
* defaultexpiry: 2 minutes

.. figure:: images/general_settings_sip_expiry.png

   :menuselection:`Services --> IPBX --> General Settings --> SIP Protocol`

Provisioning configuration
--------------------------
The provisioning server configuration will be automatically updated so that
phones can switch to the backup server if the XiVO fails (for example, on a power failure).

.. figure:: images/provd_config_registrar.png

   :menuselection:`Configuration --> Provisioning --> Template Line --> Edit default`


.. warning:: Do not change these values when the HA is configured, as this may cause problems.
   These values will be reset to blank when the HA is disabled.

.. important:: For the telephony devices to take the new proxy/registrar settings
   into account, you must :ref:`resynchronize the devices `
   or restart them manually.

Disable node
------------
By default, :abbr:`HA (High Availability)` is disabled.

.. warning:: You should not disable an HA node in production, as doing so will break the
   configuration and restart some services.

.. figure:: images/ha_dashboard_disabled.png

   HA Dashboard Disabled (default state)

.. important:: You have to restart services (``xivo-service restart``) once the master node is disabled.

Master node
-----------
When choosing the ``Master`` method, you must enter the IP address **of the VoIP interface** of the slave node.

.. figure:: images/ha_dashboard_master.png

   HA Dashboard Master

.. important:: You have to restart all services (``xivo-service restart``) once the master node is configured.

Slave node
----------
When choosing the ``Slave`` method, you must enter the IP address **of the VoIP interface** of the master node.

.. figure:: images/ha_dashboard_slave.png

   HA Dashboard Slave

Replication Configuration
-------------------------
Once the master/slave configuration is complete, the XiVO configuration is replicated from the
master node to the slave every hour, on the hour (:00).

Replication can be started manually by running the replication scripts on the master::

   xivo-master-slave-db-replication
   xivo-sync

The replication does not copy the full XiVO configuration of the master. Notably, these
are **excluded**:
* All the network configuration **except DHCP configuration** (i.e. everything under the
:menuselection:`Configuration --> Network --> {Interfaces, Resolver, Mail}` sections)
* All the support configuration (i.e. everything under the
:menuselection:`Configuration --> Support` section)
* HA settings
* Provisioning configuration
* Voicemail messages
These event data are also excluded:
* Queue logs
* CELs
.. _ha_interconnection_with_cc:
Interconnection with XiVO CC
----------------------------
Queue logs and CELs are not replicated from the master node to the slave. Instead, each server keeps its own event data.
Because of this, you can install DB Replic on the slave and run it in a special HA mode that replicates only queue logs and CELs to XiVO CC:

* Edit the ``/etc/docker/xivo/custom.env`` file on the slave:

  * Set the ``REPORTING_DB_HOST`` address
  * Set the ``ELASTICSEARCH`` address
  * Set ``IS_HA_SLAVE=true``

* Create the log directory ``/var/log/xivo-db-replication`` with owner ``daemon:daemon``
* Install the configuration package: ``apt-get install xivocc-docker-components``
* Enable DB Replic: ``touch /var/lib/xivo/xc_enabled``
* Start DB Replic: ``xivo-service start``
* Refresh monitoring: ``/usr/sbin/xivo-monitoring-update``
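
The steps above can be sketched as follows. This is only an illustration under stated
assumptions: the two addresses are placeholders from the documentation range, and the exact value
format expected by each variable is not specified in this section. The resulting fragment of
``/etc/docker/xivo/custom.env`` on the slave might look like:

.. code-block:: shell

   # /etc/docker/xivo/custom.env (slave)
   # Placeholder addresses: replace with your reporting database and Elasticsearch hosts.
   REPORTING_DB_HOST=192.0.2.20
   ELASTICSEARCH=192.0.2.21
   IS_HA_SLAVE=true

The remaining steps, run as root on the slave, in the order listed above:

.. code-block:: shell

   # Create the log directory with the expected owner.
   mkdir -p /var/log/xivo-db-replication
   chown daemon:daemon /var/log/xivo-db-replication

   # Install the configuration package.
   apt-get install xivocc-docker-components

   # Enable and start DB Replic, then refresh monitoring.
   touch /var/lib/xivo/xc_enabled
   xivo-service start
   /usr/sbin/xivo-monitoring-update
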
Internals
=========
Four scripts are used to manage services and data replication.

* ``xivo-master-slave-db-replication`` runs on the master and replicates the master's
  data to the slave server.
* ``xivo-manage-slave-services {start,stop}`` is used on the slave to start or stop monit and asterisk.
  The services won't be restarted after an upgrade or restart.
* ``xivo-check-master-status`` is used to check the status of the master and enable or
  disable services accordingly.
* ``xivo-sync`` is used to sync directories from master to slave.

Limitations
===========
Architecture:
* Since DHCP parameters are replicated, the master and slave nodes MUST be on the same VoIP network.
When the master node is down, some features are not available and some behave a bit
differently. This includes:
* Call history / call records are not recorded.
* Voicemail messages saved on the master node are not available.
* Custom voicemail greetings recorded on the master node are not available.
* Phone provisioning is disabled, i.e. a phone will always keep the same configuration, even after
restarting it.
* Phone remote directory is not accessible, because provisioned IP address points to the master.
Note that, on failover and on failback:
* The statuses of DND, call forwards, call filtering and similar features may be lost if they were changed recently.
* If you are connected as an agent, then you might need to reconnect as an agent
when the master goes down. Since it's hard to know when the master goes down,
if your CTI client disconnects and you can't reconnect it, then it's a sign
the master might be down.
Additionally, only on failback:
* Voicemail messages are not copied from the slave to the master, i.e. if someone
left a message on your voicemail when the master was down, you won't be able to
consult it once the master is up again.
* More generally, custom sounds are not copied back. This includes recordings.

Here's the list of limitations that are more relevant from an administrator's standpoint:

* The master status is either up or down; there is no intermediate status. This means that if
  Asterisk has crashed, the XiVO is still considered up and the failover will NOT happen.

Berofos Integration
===================

.. toctree::
   :maxdepth: 2

   berofos

Troubleshooting
===============
When replicating the database between master and slave, if you encounter problems related to the
system locale, see :ref:`postgresql_localization_errors`.