Changes in the X-Road Central Server High Availability Support

Central Server is one of the key components of the X-Road ecosystem. It contains a registry of X-Road member organisations and their Security Servers. In addition, the Central Server contains the security policy of the X-Road instance, which includes a list of trusted certification authorities, a list of trusted time-stamping authorities and configuration parameters. Both the member registry and the security policy are made available to the Security Servers over HTTP. This distributed set of data forms the global configuration that the Security Servers use for mediating messages sent via X-Road. An X-Road operator is responsible for operating the Central Server.

Image 1. X-Road architecture and roles.

To mediate messages, a Security Server must have a valid copy of the global configuration available at all times. The Security Server downloads the global configuration from the Central Server regularly and uses a local copy while processing messages. It remains operational as long as a valid copy of the global configuration is available locally. This means that the Central Server may be unavailable for a limited period without causing any downtime to the ecosystem. However, registering new members or subsystems is not possible without the Central Server. Both the download interval and the global configuration validity period can be configured according to the requirements of the X-Road ecosystem.
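The relationship between the download time and the validity period can be illustrated with a minimal Python sketch. It is not X-Road code: the class name `LocalConfiguration`, the function `can_mediate` and the ten-minute validity period used below are hypothetical, chosen only to show why a Security Server keeps working while the Central Server is briefly unavailable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LocalConfiguration:
    """Hypothetical model of a locally stored copy of the global configuration."""
    downloaded_at: datetime
    validity_period: timedelta  # illustrative; the real value is set by the operator

    def is_valid(self, now: datetime) -> bool:
        # The copy is usable until the validity period has elapsed.
        return now < self.downloaded_at + self.validity_period


def can_mediate(config: Optional[LocalConfiguration], now: datetime) -> bool:
    """Messages can be mediated only while a valid local copy exists."""
    return config is not None and config.is_valid(now)
```

With a copy downloaded at 12:00 and an assumed validity period of ten minutes, `can_mediate` returns true at 12:05 even if every download attempt since 12:00 has failed, and false from 12:10 onwards.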

Design for Failure

An X-Road ecosystem is highly tolerant of Central Server failures, even with only a single Central Server node. However, critical information systems should always be designed for failure so that they remain operational despite the failure of individual components.

Central Server supports high availability through clustering, which provides additional fault tolerance and, from a performance point of view, scalability. A Central Server cluster consists of two or more Central Server nodes. The cluster is based on an active-active model, which means that all the nodes can be used for both read and write operations. If one of the nodes fails, Security Servers are able to fail over to the other available nodes.
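The failover behaviour described above can be sketched in Python as a loop that tries each node in turn. This is an illustration under stated assumptions, not the Security Server's actual download logic: the function name `download_with_failover`, the `fetch` callback and the example host names are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple


def download_with_failover(
    nodes: Sequence[str],
    fetch: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Try each Central Server node in order and return the first success.

    `fetch` is a caller-supplied (hypothetical) function that downloads the
    global configuration from one node and raises OSError on failure.
    """
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            return node, fetch(node)
        except OSError as exc:  # includes ConnectionError and timeouts
            last_error = exc    # remember the failure and try the next node
    raise ConnectionError("no Central Server node reachable") from last_error
```

Called with the nodes `cs1.example.org` and `cs2.example.org`, the function returns the data from the second node when the first one is down. If every node fails, the caller would keep using its local copy of the global configuration for as long as that copy remains valid.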

Why Are Changes Needed?

Until X-Road version 6.22, the clustering implementation was based on asynchronous, active-active database replication between the nodes. Unfortunately, the technology used in the implementation, the BDR plugin for PostgreSQL by 2ndQuadrant, reached its end-of-life in December 2019, and newer versions of it are not available under an open source license. Continuing with a newer version of the plugin would have required every X-Road operator using clustering to buy a commercial license for it. Therefore, there was no choice but to give up the BDR plugin and update the high availability support implementation for Central Server.

Image 2. Central Server high availability implementation until version 6.22.

What Will Change?

Starting from version 6.23, the Central Server high availability implementation is based on a shared, optionally highly available database. Before version 6.23, every Central Server node in a cluster had its own database, and changes were synchronized using multi-master database replication between the nodes. X-Road provided tools to set up the cluster and the replication between the nodes. Starting from version 6.23, all the Central Server nodes share the same database, which can be a standalone database, a database cluster, a fully managed database service in the cloud, etc. X-Road provides instructions on how to configure the Central Server nodes in the cluster, but implementing high availability of the database is out of X-Road’s scope. The documentation does include instructions for setting up a replicated PostgreSQL database, but it does not cover automatic failover.

Image 3. Central Server high availability implementation starting from version 6.23.

Compared to the previous implementation, the new implementation is more flexible because it gives the X-Road operator the freedom to choose how high availability is implemented at the database level. In contrast, the previous implementation was tied to the BDR plugin for PostgreSQL. At the same time, more flexibility also brings more responsibility, as implementing the high availability of the database is now the X-Road operator’s responsibility.

Available Resources

The official X-Road documentation provides an updated Central Server High Availability Installation Guide. In addition, the X-Road Knowledge Base provides an article about migrating Central Server clusters from version 6.22 to version 6.23. All X-Road operators are strongly advised to read these documents before updating clustered Central Servers to version 6.23.

Try It Out!

X-Road 6.23.0-beta is now available for testing, and the production version will be released by the end of February 2020. We welcome feedback on the new version and on any challenges in migrating to it.