Resisting Failure

X-Road is based on a distributed architecture, which makes it extremely resilient to failure. Data is always exchanged directly between a service provider and a service consumer without third parties or intermediaries having access to it. Each data exchange party may have one or more Security Servers, but data is always exchanged between two Security Servers in a one-to-one fashion. Therefore, the failure of a single Security Server only affects services available on the Security Server in question, and all other Security Servers and services of the ecosystem remain unaffected.

Image 1. X-Road is based on a distributed architecture.

Despite the distributed architecture, an X-Road ecosystem includes components that affect the availability of all the Security Servers. The good news is that the resiliency of the ecosystem against a failure of those components can be controlled and adjusted using various measures. What are the components, and how can the X-Road ecosystem be protected against their failures? Let’s find out!

Central Server

The Central Server is one of the critical components of the X-Road ecosystem. It contains a registry of X-Road member organisations and their Security Servers. The Central Server also contains the security policy of the X-Road instance, which includes a list of trusted certification authorities, a list of trusted time-stamping authorities, and configuration parameters. Both the member registry and the security policy are made available to the Security Servers over HTTP. This distributed set of data forms the global configuration that the Security Servers use for mediating messages sent via X-Road.

To be able to mediate messages, the Security Server must have a valid copy of the global configuration available at all times. The Security Server downloads the global configuration from the Central Server regularly and uses a local copy while processing messages. The Security Server remains operational as long as it has a valid copy of the global configuration available locally. This means that the Central Server may be unavailable for a limited time without causing any downtime to the ecosystem. However, registering new members or subsystems is not possible without the Central Server.

By default, the Security Server refreshes the global configuration every 60 seconds, and the configuration is valid for 10 minutes. This means that the Central Server may be unavailable for about 9 minutes (the 10-minute validity period minus the 60-second refresh interval) without affecting the ecosystem. Once the local copies of the global configuration on the Security Servers expire, message processing stops. When the Central Server starts to publish the global configuration again, message processing continues. However, the Security Server does not queue messages or provide support for resending failed messages. It’s the service consumer’s responsibility to resend any failed messages regardless of the reason for the failure.
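The arithmetic behind the 9-minute figure can be sketched as follows. This is a minimal illustration and not X-Road code: the function name is hypothetical, and it assumes that the local copy may already be one full refresh interval old when the outage starts.

```python
def central_server_outage_tolerance(validity_seconds: int = 600,
                                    refresh_interval_seconds: int = 60) -> int:
    """Worst-case number of seconds the Central Server can be unavailable
    before the local copy of the global configuration expires.

    Assumption: the local copy may already be one refresh interval old
    when the outage begins.
    """
    if refresh_interval_seconds >= validity_seconds:
        # The copy could expire between two downloads even without an outage.
        return 0
    return validity_seconds - refresh_interval_seconds


# Defaults from the text: 10-minute validity, 60-second refresh interval.
print(central_server_outage_tolerance())            # 540 seconds, i.e. 9 minutes
# A one-day validity period leaves far more room for recovery.
print(central_server_outage_tolerance(86_400, 60))  # 86340 seconds
```

The second call shows why a longer validity period makes the ecosystem more tolerant of Central Server outages.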

The global configuration download interval is configured using the “configuration-client.update-interval” parameter on the Security Server, and the default value can be overridden locally by the Security Server administrator. In contrast, the global configuration validity period is configured using the “confExpireIntervalSeconds” parameter on the Central Server by the X-Road operator, and it cannot be changed on the Security Server. Therefore, all the Security Servers that are registered to the same X-Road ecosystem respect the same global configuration validity period. The download interval and global configuration validity period should be configured according to the requirements of the X-Road ecosystem. However, it is highly recommended to increase the global configuration validity period from minutes to hours or days.

In addition, the Central Server supports high availability through clustering. A Central Server cluster consists of two or more Central Server nodes. If one of the nodes fails, the Security Server can fail over to the other available nodes. In a clustered environment, only a simultaneous problem with all the Central Server nodes would cause a situation where there isn’t a valid version of the global configuration available.

OCSP responder service

A certification authority (CA) issues certificates to Security Servers (authentication certificates) and X-Road member organizations (signing certificates). Authentication certificates are used for securing the connection between two Security Servers. Signing certificates are used for digitally signing the messages sent by X-Road members. Only certificates issued by trusted certification authorities that are defined on the Central Server by the X-Road operator can be used. The information about trusted certification authorities is distributed to the Security Servers in the global configuration.

The Security Server checks the validity of the signing and authentication certificates via the Online Certificate Status Protocol (OCSP, RFC 6960). An OCSP responder service providing the status information is maintained by the certification authority that issued the certificates. Each Security Server is responsible for querying the validity information of its certificates and then sharing the information with other Security Servers as a part of the message exchange process. Only Security Servers with valid authentication certificates and members with valid signing certificates can exchange messages. If the validity information is not available or a certificate is not valid, the message exchange fails.

To be able to mediate messages, the Security Server must have valid copies of the OCSP responses for its authentication and signing certificates at all times. The Security Server downloads the OCSP responses from the OCSP responder service regularly and uses the local copies while processing messages. The Security Server remains operational as long as it has valid copies of the OCSP responses available locally, and the certificates are valid. This means that the OCSP responder service may be unavailable for a limited time without causing any downtime to the ecosystem. The period that the OCSP responder may be unavailable without affecting the ecosystem depends on various factors.

The Security Server fetches new OCSP responses at a fixed interval that is 20 minutes by default. The fetch interval is configured on the Central Server using the “ocspFetchInterval” configuration parameter by the X-Road operator. The Security Server considers an OCSP response expired if it was issued too far in the past or if newer status information is already available. The validity period is defined on the Central Server using the “ocspFreshnessSeconds” configuration parameter by the X-Road operator. By default, the Security Server also considers an OCSP response expired if there’s new status information available, meaning that the “nextUpdate” attribute in the OCSP response is in the past. However, the “nextUpdate” attribute can be ignored so that “ocspFreshnessSeconds” alone defines the validity period of an OCSP response. Whether the “nextUpdate” attribute is ignored is controlled on the Central Server using the “verifyNextUpdate” configuration parameter by the X-Road operator.
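The two expiry rules can be sketched in a few lines. This is only an illustration of the logic described above and not the Security Server’s actual implementation: the function and argument names are hypothetical, and measuring the age of a response from its “thisUpdate” time is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_ocsp_response_expired(this_update: datetime,
                             next_update: Optional[datetime],
                             now: datetime,
                             ocsp_freshness_seconds: int,
                             verify_next_update: bool = True) -> bool:
    """Return True if the OCSP response should be treated as expired."""
    # Rule 1: the response was issued too far in the past
    # (assumption: age is measured from thisUpdate).
    if now - this_update > timedelta(seconds=ocsp_freshness_seconds):
        return True
    # Rule 2: newer status information should already be available.
    # Skipped when nextUpdate is ignored or missing from the response.
    if verify_next_update and next_update is not None and next_update < now:
        return True
    return False


now = datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc)
issued = now - timedelta(minutes=30)
next_update = now - timedelta(minutes=10)

print(is_ocsp_response_expired(issued, next_update, now, 3600))         # True: nextUpdate is in the past
print(is_ocsp_response_expired(issued, next_update, now, 3600, False))  # False: nextUpdate is ignored
print(is_ocsp_response_expired(issued, next_update, now, 1200, False))  # True: older than 20 minutes
```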

All in all, an X-Road ecosystem’s resiliency to failures of an OCSP responder service is controlled through three configuration parameters that are all set on the Central Server by the X-Road operator and distributed to the Security Servers in the global configuration. The “ocspFetchInterval” parameter defines how often the OCSP responses are refreshed, the “ocspFreshnessSeconds” parameter specifies the validity period of the responses, and the “verifyNextUpdate” parameter defines whether the “nextUpdate” attribute in the OCSP response is verified or ignored. The most resilient configuration can be achieved by keeping the fetch interval short, keeping the validity period long, and ignoring the “nextUpdate” attribute. In addition, when the “nextUpdate” attribute is ignored, it’s also possible to increase the validity period during a service break of the OCSP responder service, which buys more time to solve the problem without affecting the ecosystem.

In addition, after the first failed OCSP request, the Security Server switches from the regular OCSP fetching interval to a failure mode during which fetching OCSP responses is attempted once a minute, by default. After the first successful OCSP request, the Security Server switches back to the regular interval.
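The switch between the regular interval and the failure mode can be illustrated with a small sketch. The function name is hypothetical, and the default values are simply the 20-minute and one-minute figures mentioned in the text.

```python
def next_fetch_delay(last_fetch_succeeded: bool,
                     regular_interval_seconds: int = 1200,
                     retry_interval_seconds: int = 60) -> int:
    """Seconds to wait before the next OCSP fetch attempt."""
    # Failure mode: retry quickly until a fetch succeeds again.
    if not last_fetch_succeeded:
        return retry_interval_seconds
    return regular_interval_seconds


# A successful fetch, two failures, then a recovery.
outcomes = [True, False, False, True]
print([next_fetch_delay(ok) for ok in outcomes])  # [1200, 60, 60, 1200]
```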

It’s also good to be aware that not all OCSP responder services include the “nextUpdate” attribute in their OCSP responses. Usually, OCSP responder services that are based on a certificate revocation list (CRL) include the attribute, but real-time OCSP services don’t. A CRL-based OCSP service reads certificate statuses from a static CRL that’s refreshed regularly. In contrast, a real-time OCSP service checks certificate statuses in real time. If the “nextUpdate” attribute is missing from the OCSP response, the “ocspFreshnessSeconds” parameter alone defines the validity period for the response, just like when the “nextUpdate” attribute is ignored using the “verifyNextUpdate” parameter.

When choosing the values for the three parameters, it’s essential to consider how the values affect the evidential value of the logged messages. Since the OCSP response of the signing certificate is used to check the validity of the certificate that’s used to sign messages, the age of the OCSP response may affect the validity of the signature. Therefore, it is vital to understand the legal consequences that enabling the use of old OCSP responses may have. From a technical perspective, it is equally important that the values of the three configuration parameters are aligned with each other and with the policies of the certification authority. For example, the “ocspFetchInterval” parameter must be smaller than the “ocspFreshnessSeconds” parameter; otherwise, the Security Server considers the responses expired before new ones are fetched.
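A simple sanity check for the alignment rule in the example above could look like this. It is a hypothetical helper for illustration only; X-Road does not necessarily perform this check, and the values used in the calls are examples rather than documented defaults.

```python
from typing import List


def check_ocsp_settings(ocsp_fetch_interval: int,
                        ocsp_freshness_seconds: int) -> List[str]:
    """Return a list of problems found in the two OCSP timing parameters."""
    problems = []
    if ocsp_fetch_interval <= 0 or ocsp_freshness_seconds <= 0:
        problems.append("both values must be positive")
    elif ocsp_fetch_interval >= ocsp_freshness_seconds:
        # Responses would expire before the next scheduled fetch.
        problems.append(
            "ocspFetchInterval must be smaller than ocspFreshnessSeconds"
        )
    return problems


print(check_ocsp_settings(1200, 3600))  # [] (responses are refreshed before they expire)
print(check_ocsp_settings(3600, 1200))  # one problem reported
```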

Time-stamping service 

All the messages sent via X-Road are time-stamped and logged by the Security Server. The purpose of the time-stamping is to certify the existence of data items at a certain point in time. A time-stamping authority (TSA) provides a time-stamping service that the Security Server uses to time-stamp all the incoming/outgoing requests/responses. Only trusted TSAs that are defined on the Central Server by the X-Road operator can be used. The information about trusted TSAs is distributed to the Security Servers in the global configuration. The approved time-stamping authorities must implement the time-stamping protocol (RFC 3161) supported by X-Road.

By default, X-Road uses batch time-stamping, which means that once a minute, the Security Server time-stamps all the messages that have been processed since the previous batch and do not have a time-stamp yet. The time-stamping interval is defined on the Central Server using the “timeStampingIntervalSeconds” parameter by the X-Road operator, and it cannot be changed on the Security Server. If the time-stamping fails, the Security Server continues to process messages until the acceptable time-stamping failure limit is reached. By default, the limit is 4 hours, and it’s configured on the Security Server using the “message-log.acceptable-timestamp-failure-period” parameter. The default value can be overridden locally by the Security Server administrator. When the limit is reached, the Security Server stops processing messages. When the time-stamping service becomes available again, all the messages missing a time-stamp are time-stamped, and the Security Server continues normal operations.
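The effect of the acceptable failure period can be sketched as a simple check. This is an illustration and not the Security Server’s implementation: the names are hypothetical, and measuring the period from the moment time-stamping started to fail is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default limit mentioned in the text: 4 hours.
ACCEPTABLE_TIMESTAMP_FAILURE_PERIOD = timedelta(hours=4)


def may_process_messages(failing_since: Optional[datetime],
                         now: datetime,
                         limit: timedelta = ACCEPTABLE_TIMESTAMP_FAILURE_PERIOD) -> bool:
    """Return True while message processing may continue."""
    if failing_since is None:
        # Time-stamping is working normally.
        return True
    # Assumption: the limit is counted from the start of the failure.
    return now - failing_since <= limit


now = datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc)
print(may_process_messages(None, now))                      # True
print(may_process_messages(now - timedelta(hours=3), now))  # True: still within the limit
print(may_process_messages(now - timedelta(hours=5), now))  # False: the limit is exceeded
```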

In addition, after the first failed time-stamping attempt, the Security Server switches from the regular time-stamping interval to a failure mode during which time-stamping is attempted once a minute, by default. After the first successful time-stamp, the Security Server switches back to the regular time-stamping interval.

The Security Server also supports automatic failover between time-stamping services if it has more than one time-stamping service configured. This means that the Security Server tries time-stamping with each of the configured services until time-stamping succeeds or all the configured services have failed. The behavior is repeated for every batch.
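The failover behavior can be sketched as a loop over the configured services. The code below is a simplified illustration with hypothetical names: each service is modelled as a plain Python callable, whereas a real client would send an RFC 3161 request to each time-stamping service.

```python
from typing import Callable, Iterable, Sequence

TimestampService = Callable[[Sequence[bytes]], bytes]


class AllTimestampServicesFailed(Exception):
    """Raised when no configured service could time-stamp the batch."""


def timestamp_batch(batch: Sequence[bytes],
                    services: Iterable[TimestampService]) -> bytes:
    """Try each service in order and return the first successful result."""
    failures = 0
    for service in services:
        try:
            return service(batch)
        except Exception:
            # This service failed; fall through to the next one.
            failures += 1
    raise AllTimestampServicesFailed(f"{failures} service(s) failed")


def broken_tsa(batch: Sequence[bytes]) -> bytes:
    raise ConnectionError("service unavailable")


def working_tsa(batch: Sequence[bytes]) -> bytes:
    return b"time-stamp-token"


print(timestamp_batch([b"message"], [broken_tsa, working_tsa]))  # b'time-stamp-token'
```

Because the function is called again for every batch, a service that failed for one batch is tried again for the next one.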

Alternatively, the Security Server can be configured to time-stamp messages synchronously. This means that every message is time-stamped immediately, and if time-stamping the message fails, processing the message fails too. If a security policy requires that every processed message is time-stamped within a defined time window, this configuration option can be used to guarantee it. However, the downside of synchronous time-stamping is that it increases the load on the time-stamping service tremendously compared to batch time-stamping. When batch time-stamping is used, the load does not depend on the number of messages exchanged over X-Road. Instead, it depends on the number of Security Servers in the system. Another downside of synchronous time-stamping is that it increases the processing time of each message since the time-stamping is done synchronously as a part of the message processing flow. This means that four time-stamping operations are added to the end-to-end processing time of each message.
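A rough comparison of the load on the time-stamping service in the two modes can be sketched as follows. The function names and example figures are hypothetical. The sketch assumes at most one time-stamping request per Security Server per batch interval, and four time-stamping operations per message exchange in synchronous mode, as stated above.

```python
def batch_requests_per_hour(security_servers: int,
                            interval_seconds: int = 60) -> int:
    """Upper bound on requests per hour in batch mode.

    Assumption: each Security Server sends at most one request per interval.
    """
    return security_servers * (3600 // interval_seconds)


def synchronous_requests_per_hour(message_exchanges_per_hour: int) -> int:
    """Requests per hour in synchronous mode (four per message exchange)."""
    return 4 * message_exchanges_per_hour


# Example: 100 Security Servers and 1,000,000 message exchanges per hour.
print(batch_requests_per_hour(100))              # 6000
print(synchronous_requests_per_hour(1_000_000))  # 4000000
```

In batch mode, the figure stays the same no matter how many messages are exchanged, while in synchronous mode it grows linearly with the traffic.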

All in all, an X-Road ecosystem’s resiliency to failures of a time-stamping service is managed through several factors. The time-stamping interval and the number of available time-stamping services are defined on the Central Server by the X-Road operator. In contrast, the acceptable time-stamping failure period and the time-stamping mode (batch / synchronous) are defined on the Security Server, and the default values can be overridden locally by the Security Server administrator. The most resilient configuration can be achieved by using batch time-stamping, keeping the time-stamping interval short, keeping the acceptable time-stamping failure period long, and configuring multiple time-stamping services on the Security Server.

However, just like with the OCSP related configuration, it’s essential to consider how the selected values affect the evidential value of the logged messages. For example, the age of a time-stamp may affect its evidential value from a legal perspective. Also, whether it is acceptable to have messages without a valid time-stamp must be considered, and the time-stamping mode (batch / synchronous) should be selected accordingly.

Conclusions

An X-Road ecosystem is exceptionally resilient to failure. Different components may fail separately or at the same time, and the ecosystem is still capable of processing messages and transferring data. How long a single component may be unavailable without affecting the ecosystem depends on the configuration of the ecosystem and the configuration of individual Security Servers. The X-Road operator is responsible for defining and managing the ecosystem’s configuration. Still, the Security Server administrators may define some configuration items locally since the requirements may vary between organisations and Security Servers.

The values of different configuration items vary between X-Road ecosystems, and they depend on the requirements and constraints regarding availability, the evidential value of the logs, costs, etc. Also, financial factors play a role when defining the OCSP fetch interval and time-stamping interval since some commercial trust service providers request a transaction-based fee for the use of their services. In those cases, costs can be optimized by adjusting the intervals without forgetting the legal requirements regarding the age of the OCSP responses and time-stamps. All in all, the configuration should be in balance between different requirements and constraints. Sometimes it may require compromises between objectives.

Security Server Sidecar (part 3)

This is a series of blog posts about X-Road® and containers. The first part provides an introduction to containers and container technologies in general. The second part concentrates on the challenges in containerizing the Security Server. The Security Server Sidecar – a containerized version of the Security Server – is discussed in the third part.

Security Server Sidecar is a containerized version of the Security Server that supports production use. The Sidecar is a Docker container that runs in the same virtual context (virtual host, Kubernetes Pod, etc.) with an information system. The containerized approach makes running the Security Server more cost-effective since no separate host server needs to be allocated for each Security Server. Besides, more and more information systems exchanging data using X-Road are running in containers too, so it’s beneficial to be able to run the Security Server on the same platform with the information systems that are connected to it.

The Sidecar solves the challenges related to running the Security Server in a container, and it uses the standard release versions of the Security Server software. In other words, the Sidecar is built from pre-built packages of the official X-Road releases, and it is a separate project that builds on the X-Road core.

The Sidecar project

From an administrative perspective, Security Server Sidecar is a project of the Finnish Digital Agency (DVV) that is implemented in collaboration with NIIS. The DVV owns the project, and NIIS is responsible for coordinating the daily development activities. All the deliverables are released on NIIS’s GitHub and Docker Hub accounts. The project is currently ongoing, and it will be completed by the end of 2020.

The project will produce a Security Server Sidecar Docker image with a couple of alternative configurations. The Sidecar slim is a lightweight version of the Security Server, and it does not include message log, operational monitoring, and environmental monitoring modules. It means that the slim version does not log messages or provide any monitoring capabilities. However, technically it can be used for both consuming and providing services if the capabilities mentioned before are not required.

In contrast, the regular Sidecar also includes the message log, operational monitoring, and environmental monitoring modules, so it provides all the features of a full-blown Security Server installation. Like the slim version, the regular Sidecar can be used for both consuming and producing services. In addition, versions with country-specific meta-packages are available. Currently, the only country-specific configuration available is the Finnish meta-package.

In addition to the Security Server Sidecar Docker image, the project also produces documentation to support the use of the image. The documentation will cover best practices and examples of how to run the image on a Kubernetes cluster using Elastic Kubernetes Service (EKS) on the Amazon Web Services (AWS) cloud platform.

What is a sidecar? 

In general, the sidecar is a design pattern commonly used in a microservices architecture. A sidecar is an additional component that is attached to a parent application to extend its functionalities. The pattern aims to divide the functionalities of an application into separate processes. The approach allows adding new capabilities to an application without changing the application itself. In this way, a sidecar is loosely coupled with the application. For example, logging and monitoring are functionalities that are often implemented using a sidecar.

Applying the sidecar pattern to the Security Server

When using the regular Security Server version on a Linux host, it’s strongly recommended that the Security Server runs on its own host and not on the same host as an information system that is connected to it. This means that at least two separate hosts are required. In contrast, the idea of the sidecar architecture pattern is that an application and a sidecar run on the same host or context, close to each other. With the containerized Security Server, the goal can be achieved since the Security Server is packaged in a container that runs in its own isolated process.

Image 1. The Security Server Sidecar and an application in the same virtual context.

The original idea of the sidecar pattern is that multiple copies of the same sidecar are attached to the application so that each instance of the application has its own sidecar. In case different applications use the same sidecar, the same approach applies to all the applications and their instances.

Image 2. A single Security Server Sidecar instance is shared between multiple instances of an application, and between different applications.

Despite its name, the original sidecar pattern does not work very well with the Security Server Sidecar since the Sidecar requires the same configuration and registration process as the regular Security Server. Also, even though the Security Server is containerized, the footprint of the Sidecar container is still relatively large compared to the footprint of an average container. Therefore, it’s recommended that a single Sidecar container is shared between multiple instances of the application, and it may also be shared between different applications. For high availability and scalability, a Sidecar cluster consisting of a primary node and multiple secondary nodes can be considered. Let’s take a closer look at the different deployment alternatives next.

Running the Sidecar on Kubernetes

The Sidecar can be deployed to different container management systems, thanks to standardization. One of the most popular container management systems is Kubernetes, which is available as a service on multiple cloud platforms, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. Kubernetes is open-source, which means that it can be used in on-premise and private cloud environments too. In this case, we’re going to concentrate on running the Sidecar on AWS Elastic Kubernetes Service (EKS). During the development project, the Sidecar has been tested using the Docker Engine and EKS. 

Since the Security Server is a stateful application, it is required that an external database and persistent file storage are used in all the deployment alternatives. In this case, Amazon Relational Database Service (RDS) is used for the Security Server databases, and a Kubernetes persistent volume is used to store the configuration files.

Before going into the deployment models, a few words about Kubernetes and Pods since Pods play an essential role in the deployment models. In Kubernetes, a Pod is a group of one or more containers that run in a shared context, share the same storage and network resources, and share a specification of how the containers are run. Each Pod runs a single instance of an application, and scaling the application horizontally means using multiple Pods, one for each instance of the application.

Deployment is another Kubernetes concept that’s essential to the Sidecar deployment models. A Kubernetes deployment represents an application that consists of a set of identical Pods. The deployment specification defines the configuration of the Pods and the number of replicas to run. The deployment maintains the Pods and monitors that there’s a correct number of Pods running. Also, it’s possible to create a horizontal autoscaler for a deployment that automatically scales the number of running Pods based on the selected metrics.

Security Server as a sidecar

Image 3. The Security Server Sidecar as a real sidecar inside the same Pod with an information system.

The first alternative is to deploy the Sidecar as a real sidecar, which means deploying it in the same Pod with an information system. It is a feasible approach if there’s always only one Pod running, and other information systems do not need to access the Security Server. The information system can be a service consumer, service producer, or both.

If the information system must be scaled horizontally, this approach does not work very well. The reason is that adding a new Pod means that the new Security Server running in the Pod must always be configured and registered before it can be used. Since the onboarding process of the Security Server may take from days to weeks, the approach is not feasible. Also, deploying a Security Server for each Pod would generate considerable overhead from the resource consumption perspective.

Single Security Server

Image 4. The Security Server Sidecar in its own Pod and shared by multiple information systems.

When multiple information systems or several instances of the same information system need to access the Security Server, it’s better to deploy the Security Server in a separate deployment using a single Pod. In this way, the information systems can be scaled independently from the Security Server, and some of them might even be running outside of the AWS EKS cluster. However, the number of Security Server instances is limited to one. Since the external database and persistent volume are used, Kubernetes can automatically recover the Security Server Pod in case of failures. Also, in this case, the information systems can be service consumers, service producers, or both.
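As a rough sketch of this model, the snippet below builds a Kubernetes Deployment manifest with exactly one replica and a persistent volume for the configuration files, and prints it as JSON, which kubectl accepts as an alternative to YAML. All names, the image reference, the claim name, the “/etc/xroad” mount path, and the “Recreate” update strategy are assumptions made for illustration and are not taken from the Sidecar project’s documentation. The database connection settings are omitted.

```python
import json


def sidecar_deployment(image: str, claim_name: str) -> dict:
    """Build a Deployment manifest for a single Security Server Sidecar Pod."""
    labels = {"app": "security-server-sidecar"}  # hypothetical label
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "security-server-sidecar"},
        "spec": {
            # The number of Security Server instances is limited to one.
            "replicas": 1,
            # Assumption: stop the old Pod before starting a new one so that
            # two instances never run at the same time.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "security-server-sidecar",
                        "image": image,
                        # Assumed location of the configuration files.
                        "volumeMounts": [{"name": "xroad-config",
                                          "mountPath": "/etc/xroad"}],
                    }],
                    "volumes": [{
                        "name": "xroad-config",
                        "persistentVolumeClaim": {"claimName": claim_name},
                    }],
                },
            },
        },
    }


manifest = sidecar_deployment("example.org/security-server-sidecar:TAG",
                              "sidecar-config-claim")
print(json.dumps(manifest, indent=2))
```

Since the configuration lives on the persistent volume and the data in the external database, Kubernetes can replace a failed Pod without losing the Security Server’s state.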

If multiple Security Servers are required for high availability and/or scalability, there are two alternatives: multiple independent Security Servers or a Security Server cluster. Multiple independent Security Servers mean deploying several Security Servers, with each of them having a unique identity. This approach provides high availability. In contrast, a Security Server cluster means deploying a group of Security Servers that share the same identity and that are accessed through an external load balancer. The cluster provides both high availability and scalability. More information about X-Road’s load balancing alternatives can be found here.

Multiple Security Servers

Image 5. Multiple instances of the Security Server Sidecar shared by several information systems.

Deploying multiple independent Security Servers provides high availability but not scalability from a performance point of view. In this setup, multiple Security Servers with unique identities are deployed as separate, independent applications. In practice, the Security Servers are deployed using separate deployments, which means that they have their own run specifications. In addition, the number of Security Server Pods within a deployment is limited to one for each Security Server. Adding a new Security Server to the setup means creating a new deployment plus configuring and registering the newly created Security Server.

The information systems can be service consumers, service producers, or both. Service consumers may connect to the Security Servers directly, or there may be another load balancer between the consumer information systems and the Security Servers (omitted in the diagram). In the case of service producers, Security Server’s internal load balancing enables publishing services on multiple Security Servers and routing service requests to all of them. However, the configuration (e.g., available services, access rights) must be manually synchronized between Security Servers providing the service. Only the Security Server cluster provides automatic synchronization between Security Servers.

Security Server cluster

Image 6. Security Server Sidecar cluster with an external load balancer shared by multiple information systems.

A Security Server cluster provides both high availability and scalability. It consists of a primary node and one or more secondary nodes that all share the same configuration and identity. In this setup, the primary node is used to manage the cluster, and it does not process messages. In practice, configuration changes are done on the primary node, and they’re automatically replicated to the secondary nodes. Replication covers the configuration database and configuration files. Changing the configuration on the secondary nodes is blocked. The secondary nodes are connected to a load balancer that distributes incoming traffic between them. Further implementation details of the Security Server cluster on Kubernetes are studied in more detail in the Sidecar project.

The information systems can be service consumers, service producers, or both. Service consumers may connect to the secondary nodes directly, or there may be another load balancer between the consumer information systems and the secondary nodes (omitted in the diagram).

Multiple Security Servers or a Security Server cluster?

The key difference between multiple independent Security Servers and a Security Server cluster is that the cluster provides both high availability and scalability. In contrast, independent Security Servers provide only high availability. In the cluster, secondary nodes can be scaled with less effort, while setting up a new independent Security Server is a manual operation. Also, in the cluster setup, all Security Servers share the same identity and configuration, which is synchronized automatically. Multiple independent Security Servers, on the other hand, each have their own unique identity and configuration, and there’s no synchronization between them. Which one is the best alternative depends on the use case and its requirements.

Containerized future?

The four deployment models described above give an overview of what kind of models can be considered for the Sidecar. The same models can be applied regardless of the underlying platform or environment where the Security Server is deployed. Of course, the implementation details vary between different platforms and environments, but the high-level architecture patterns remain the same. However, the models do not provide an exhaustive list of available alternatives since different models can be combined and new elements, such as load balancers, can be added to the described ones.

Adding support for containers does not mean dropping support for Linux – running the Security Server on Ubuntu and Red Hat will remain supported in the future too. Containers are an alternative way to run the Security Server, and X-Road members are free to choose between the available alternatives. Containers are a convenient way to run the Security Server when an organization already has the required capabilities to operate and manage containers in production environments. However, in case an organization is not quite there yet, using virtual machines might be a better alternative since mastering Security Server containers on a production level requires time, effort, and experience.

It must also be noted that the Security Server configuration process – registering and onboarding a fresh Security Server to an X-Road ecosystem – is always the same regardless of the Security Server packaging. From a process perspective, the containerized version of the Security Server is not different from the Linux packaged version. In this way, X-Road members can be sure that the same level of trust is always guaranteed in data exchange between X-Road members.

X-Road and Containers (part 2)

This is a series of blog posts about X-Road® and containers. The first part provides an introduction to containers and container technologies in general. The second part concentrates on the challenges in containerizing the Security Server. The Security Server Sidecar – a containerized version of the Security Server – is discussed in the third part.

Container support for X-Road – and for the Security Server especially – has been requested for some years already, but at the moment, production-level support is not available yet. However, both Central Server (xroad-central-server) and Security Server (xroad-security-server, xroad-security-server-standalone) Docker images are already available for testing purposes on NIIS’s Docker Hub account. This means that different X-Road components can be run inside containers, so why is production use not supported yet? Let’s consider the question from the Security Server’s point of view. What needs to be taken into account when running the Security Server in a container?

One process per container

According to the best practices, each container should have only one concern and run only a single process. The Security Server consists of multiple processes, including a PostgreSQL database, and the currently available Docker image runs them all in a single container. Decoupling all the Security Server processes into multiple containers would require a significant effort providing minimal benefits in exchange since the current architecture has not been designed to run and scale different application processes separately. Supporting that kind of approach would require significant changes to the Security Server architecture.

However, rules and best practices are made to be broken. After all, it is quite common to run multiple processes inside a container. A good approach for the Security Server is to deploy the Security Server application and the PostgreSQL database separately. In that way, the Security Server is split into two parts, while the Security Server application processes remain in the same container. In this case, no software-level changes are required since the Security Server already supports using a remote database, which can be a separate container, a managed database service in the cloud, etc.

Running multiple processes in a container requires that process management is appropriately implemented. When the Security Server is run on a Linux platform, the Security Server processes are managed using the systemd service and system manager. The use of systemd is built into the Security Server packaging since it’s used by the Linux distributions supported by the Security Server. However, it is not recommended to run systemd inside a container since systemd does things that are typically controlled by the container runtime. Besides, some things systemd does are prevented inside containers by default, e.g., changing host-level parameters. Therefore, the Security Server processes need to be managed using some other, more lightweight process manager, such as supervisord.

Persistent storage

The Security Server is a stateful application. Therefore, the configuration in the database and on the filesystem must be persisted beyond the lifecycle of a single container. The data includes local overrides to the default configuration, keys and certificates, registered clients and their configuration, logs, backups, etc. Without persisting the configuration, the Security Server would have to be initialized, configured, registered, etc., whenever an existing container is recreated.

When an external database is used, the data in the database is already stored outside the container. However, the configuration data, backups, and message log archives stored on the filesystem must be persisted too. It can be done using persistent storage that is mounted to the Security Server container. Persistent storage stores the data on the host system and not in the container. Besides, X-Road application logs must be persisted as well. It can be done using the persisted storage or redirecting logging to console to enable the container management system to collect and store the logs.

Version upgrades

Security Server version upgrades sometimes require running database migrations and updating the contents of the configuration files. Since the way version upgrades are handled with containers differs from traditional version upgrades done using Linux package management systems, special attention must be paid to the Security Server version upgrades. In practice, it means that the upgrade mechanism has to be built into the container image. The mechanism must detect that the application version used by the container differs from the version of the persistent configuration, and perform the steps required by the upgrade. In this way, it is possible to change from an older image to a newer one and keep the existing configuration and data.
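The detection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism used by the Security Server: the VERSION marker file, the upgrade_if_needed function, and the shape of the migrations list are all hypothetical names introduced for this example.

```python
from pathlib import Path


def parse_version(text):
    """Turn '6.24.0' into (6, 24, 0) so that versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def upgrade_if_needed(config_dir, image_version, migrations):
    """Run the migration steps that lie between the persisted and image versions.

    config_dir    -- persistent storage mounted into the container
    image_version -- application version shipped in the image
    migrations    -- list of (version, callable) pairs; hypothetical structure
    Returns the list of versions whose migration step was applied.
    """
    marker = Path(config_dir) / "VERSION"  # hypothetical marker file
    persisted = parse_version(marker.read_text())
    target = parse_version(image_version)
    if persisted > target:
        raise RuntimeError("persisted configuration is newer than the image")

    applied = []
    # Apply steps in ascending version order, skipping those already applied
    # and those that belong to versions newer than the image.
    for version, step in sorted(migrations, key=lambda m: parse_version(m[0])):
        if persisted < parse_version(version) <= target:
            step(Path(config_dir))
            applied.append(version)

    # Record the new version so that the next start is a no-op.
    marker.write_text(image_version)
    return applied
```

A container entrypoint could call such a function before starting the application processes, so that switching to a newer image upgrades the persisted configuration exactly once.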

First run

Similarly to version upgrades, there must be a mechanism that detects when a container is started for the first time and no persisted configuration is available yet. For security reasons, each container must have unique internal and admin UI TLS keys, certificates, and a database password. The secrets are typically generated during the installation process, which in the container context means when the image is created. In practice, it means that all the containers created from the same source image share the same secrets. In the case of a public Security Server container image, anyone could access the secrets, which would expose all containers created from the image to different kinds of attacks. Therefore, the secrets must be recreated on the first run so that each container has its own unique set of secrets that are not shared with any other container.
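A first-run check of this kind can be sketched as follows. The example is a simplified assumption, not the actual Security Server implementation: the .initialized marker, the db_password file name, and the ensure_first_run_secrets function are hypothetical, and only a database password is generated here. TLS keys and certificates would be created at the same point, but that is left out to keep the sketch self-contained.

```python
import secrets
from pathlib import Path


def ensure_first_run_secrets(state_dir):
    """Generate per-container secrets if this is the first start.

    Returns True when secrets were created, False when persisted state
    from an earlier run was found and left untouched.
    """
    state_dir = Path(state_dir)
    marker = state_dir / ".initialized"  # hypothetical first-run marker
    if marker.exists():
        return False

    state_dir.mkdir(parents=True, exist_ok=True)

    # A fresh random password per container, so that nothing baked into
    # the image is shared between containers created from it.
    password_file = state_dir / "db_password"  # hypothetical file name
    password_file.write_text(secrets.token_urlsafe(32))
    password_file.chmod(0o600)

    # Write the marker last: an interrupted first run is retried in full.
    marker.write_text("done")
    return True
```

Because the marker lives on persistent storage, recreating the container from the same or a newer image keeps the existing secrets instead of generating new ones.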

Hardware security modules (HSMs)

One additional challenge that has not been discussed yet is related to hardware security modules (HSMs). For extra security, sign keys and certificates of the Security Server clients may be stored on an HSM instead of the software token that’s used by default. Different cloud platforms provide cloud HSM services that can be accessed over a network, but if using a physical HSM device is required, how can it be connected to containers? Finding an answer to the question is out of the scope of this blog post.

Towards containerization

X-Road version 6 was initially designed to be deployed on Linux hosts (physical or virtual), and therefore, some additional effort is required to enable its production use in containers. However, the challenges related to containerizing the Security Server can be overcome without changing the application itself.

In the long run, the Security Server architecture should be refactored to fully utilize the benefits that containers can offer. At the same time, it’s important to remember that the currently supported Linux platforms must be supported in the future too. Fortunately, the two alternatives are not mutually exclusive. Containers are not going to replace virtual machines, but they will provide an alternative way to run the Security Server.

From Virtual Machines to Containers (part 1)

Nowadays, it’s hard to avoid hearing about Docker and containers if you work in the field of IT. It applies to X-Road, too, since questions regarding X-Road and support for containers have been arising regularly during recent years. But what are containers, and how do they differ from virtual machines?

What are containers?

Containers package an application and all its dependencies, libraries, configuration files, etc., into a single package that contains the entire runtime environment needed to run the application. The package can then be deployed to different computing environments without having to worry about the differences between operating system distributions, versions of available libraries, etc. The differences are abstracted away by the containerization.

The difference between virtual machines and containers is that a virtual machine includes an entire operating system and the application. In contrast, a container only contains the application and its runtime environment. Therefore, containers are more lightweight and use fewer resources than virtual machines. The size of a container may be only tens of megabytes, and it can be started in seconds. By comparison, a virtual machine with an entire operating system may be several gigabytes in size, and booting up may take several minutes.

Image 1. A physical server that runs multiple containers compared to a physical server that runs multiple virtual machines.

A physical server that runs multiple virtual machines has a separate guest operating system running for each virtual machine on top of it. In contrast, a server running multiple containers runs only a single operating system whose resources are shared between the containers. However, each container runs in a separate, isolated process that has its own namespace and filesystem. The number of containers that can be hosted by a single server is far higher than the number of virtual machines that the server can host.

Container technologies

Docker is commonly considered a synonym for containers, even though it’s not the only container technology out there. Docker is not the first container technology either, since several other technologies already existed before its launch in 2013. However, Docker was the first container technology to become hugely popular, which is why the name Docker is often mistakenly used when referring to container technologies in general.

Nowadays, there are multiple container technologies available, and the fundamental building blocks of the technology have been standardized. The Open Container Initiative (OCI) is a project facilitated by the Linux Foundation, which creates open industry standards around container formats and runtime for all platforms. The standardization enables portability between infrastructures, cloud providers, etc., and prevents locking into a specific technology vendor. All the leading players in the container industry follow the specifications.

Images and containers

Images and containers are the two main concepts of container technologies. Therefore, understanding the difference between them, at least on a high level, is essential.

A container image can be compared to a virtual machine image – except that it’s smaller and does not contain the whole operating system. A container image is an immutable, read-only file that contains the executable code, libraries, dependencies, tools, etc., that are needed for an application to run. An image represents an application and its virtual environment at a specific point in time, and it can be considered a template of an application. An image is composed of layers built on top of a parent or base image, which enables image reuse.

Containers are running images. When a new container is started, the container is created from a source image. In other words, the container is an instance of the source image, just like a process is an instance of an executable. Unlike images, containers are not immutable, and therefore, they can be modified. However, the image based on which the container was created remains unchanged. Consequently, it’s possible to create multiple containers from the same source image, and all the created containers have the same initial setup that can be altered during their lifecycle.
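The relationship between an image and its containers can be illustrated with a small Python model. It is a conceptual sketch only and does not reflect how any container runtime is implemented: the Image and Container classes and the make_image helper are hypothetical names, and the per-container writable dictionary is a simplified stand-in for the writable layer that runtimes such as Docker add on top of the read-only image.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class Image:
    """An immutable template: a name and a read-only set of files."""
    name: str
    files: MappingProxyType


def make_image(name, files):
    # Copy the input so later changes to the caller's dict cannot leak in.
    return Image(name, MappingProxyType(dict(files)))


@dataclass
class Container:
    """A mutable instance of an image with its own private changes."""
    image: Image
    writable: dict = field(default_factory=dict)

    def read(self, path):
        # Container-local changes take precedence over the image contents.
        if path in self.writable:
            return self.writable[path]
        return self.image.files[path]

    def write(self, path, content):
        # Changes never touch the image, only this container.
        self.writable[path] = content
```

Two containers created from the same image start with identical contents. A write in one of them is visible neither in the other container nor in the image itself.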

Images can exist independently without containers, but a container always requires an image to exist. Images are published and shared in image registries that may be public or private. The best-known image registry is probably Docker Hub. Images are published and maintained by software vendors as well as individual developers.

Stateful and stateless containers

Containers can be stateful or stateless. The main difference is that stateless containers don’t store data across operations while stateful containers store data from one time they’re run to the next. In general, a new container always starts from the state defined by the source image. It means that the data generated by one container is not available to other containers by default. If the data processed by a container must be persisted beyond the lifecycle of the container, it needs to be stored on persistent storage, e.g., an external volume stored on the host where the container is running. The persistent storage can then be attached to another container regardless of the source image of the other container. In other words, persistent storage can be used to share data between containers.
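The difference between container-local data and data on persistent storage can be demonstrated with a short Python sketch. It is an analogy built on temporary directories, not a real container runtime: the SketchContainer class is hypothetical, its private scratch directory stands in for the container filesystem, and a directory passed in from outside plays the role of a volume on the host.

```python
import tempfile
from pathlib import Path
from typing import Optional


class SketchContainer:
    """A stand-in for a container with an optional mounted volume."""

    def __init__(self, volume: Optional[Path] = None):
        # Private filesystem that disappears together with the container.
        self._scratch = tempfile.TemporaryDirectory()
        self.root = Path(self._scratch.name)
        # Directory that lives outside the container, if one is attached.
        self.volume = volume

    def remove(self):
        """Delete the container and everything stored inside it."""
        self._scratch.cleanup()
```

Anything written under root is lost when remove is called, whereas files written under volume remain available and can be read by a new SketchContainer that is given the same directory.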

Handling upgrades

Upgrading an application running in a container also differs from the way applications running on a virtual machine are traditionally upgraded. Applications running on a virtual machine are usually upgraded by installing a new version of the application on the existing virtual machine. In contrast, applications running in a container are upgraded by creating a new image containing the latest version of the application and then recreating all the containers using the new image. In other words, instead of upgrading the application running in the existing containers, the existing containers are replaced with new containers that run the latest version of the application. However, the approach is not container-specific since handling upgrades on virtual machines in cloud environments often follows the same process nowadays.

Container management systems

Running a single container or an application consisting of a couple of containers on a local machine for testing or development purposes is a simple task. In contrast, running a complex application consisting of tens of containers in a production environment is far from simple. Container management systems are tools that provide capabilities to manage complex setups composed of multiple containers across many servers. In general, container management systems automate the creation, deployment, destruction, and scaling of containers. Available features vary between different solutions and may include, for example, monitoring, orchestration, load balancing, security, and storage. However, running a container management system is not a simple task, and it brings additional complexity to management and operations.

Kubernetes is the best-known open-source container management system. It was originally developed by Google, but nowadays, it is widely used across the industry and by different service providers. For example, all the major cloud service providers offer Kubernetes services. When it comes to commercial alternatives, Docker Enterprise Edition is probably the best-known solution, but there are many other solutions available too.

Pros and cons

The benefits of containerization vary between different applications. And sometimes containerization may not provide any benefits. Therefore, instead of containerizing everything by default, only applications that benefit from containers should be containerized.

Containers provide a streamlined way to distribute and deploy applications. Containers are highly portable, and they can be easily deployed to different operating systems and platforms. They also have less overhead compared to virtual machines, which enables more efficient utilization of computing resources. Besides, containers support agile development and DevOps, enabling faster application development cycles and more consistent operations. All in all, containers provide many benefits, but they’re not perfect: they have disadvantages too.

In general, managing containers in a production setup requires a container management system. The system automates many aspects of container management, but implementing and managing the system itself is often complicated and requires special skills. Managing persistent data storage brings additional complexity as well, and incorrect configuration may lead to data loss. Besides, persistent storage configurations may not be fully compatible between different environments and platforms, which means that they may need to be changed when containers are moved between environments. For example, both Docker and Kubernetes have the concept of volume, but they’re not identical and, therefore, behave differently.

All in all, containers offer many benefits, and they provide an excellent alternative to other virtualisation options. However, containers cannot fully replace the other options, and therefore, different solutions will be used side-by-side in the future too.

New Security Server UI and management REST API are here

X-Road version 6 was released in 2015, and it has been continuously developed further throughout the years. So far, the most significant change has been adding support for REST services in 2019. However, the system hasn’t changed much visually since its release in 2015. That’s about to change soon since X-Road version 6.24.0 will introduce the biggest changes X-Road 6 has experienced yet.

The beta version of X-Road 6.24.0 is already out, and the official release version will be published on the 31st of August 2020.

It’s got the look

The most significant change in X-Road version 6.24.0 is the fully renewed Security Server user interface (UI). The new UI aims to improve the usability and user experience of the Security Server. The new intuitive UI makes regular administrative tasks easier and supports streamlining the on-boarding process of new X-Road members.

Image 1. Add client wizard.

For example, the new UI uses wizards to implement tasks that require completing multiple steps in a specific order, such as adding a new client with a new signature key and certificate. Before, the user needed to know what steps are required and their correct order, but from now on the UI provides the information to the user and guides the user through the process.

Image 2. The new UI provides additional information on different configuration options.

Another essential improvement is providing more information regarding different Security Server features in the UI. For example, the Security Server has multiple keys and certificates, and it may not always be clear what different keys and certificates are used for. Therefore, the new UI provides information about different keys, such as authentication and signature keys.

Management REST API

Another significant change in X-Road version 6.24.0 is the brand-new management REST API. The API provides the same functionality as the UI, and it can be used to automate common maintenance and management tasks. It means that maintaining and operating multiple Security Servers can be done more efficiently as configuration and maintenance tasks require less manual work. By the way, the new UI uses the same API under the hood too.

The Security Server User Guide provides more information about the API, and there’s also the API’s OpenAPI 3 description available on GitHub. Access to the API is controlled using API keys that can be managed through the Security Server UI or through the API itself. In addition, access to the API can be restricted using IP filtering.
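As a rough illustration of API key based access, the following Python sketch builds (but does not send) a request for listing the clients of a Security Server. The port 4000, the /api/v1/clients path, and the "X-Road-ApiKey token=..." authorization header format are assumptions in this example and should be verified against the Security Server User Guide and the OpenAPI 3 description. The build_clients_request function is a hypothetical helper, not part of X-Road.

```python
import urllib.request


def build_clients_request(host, api_key, port=4000):
    """Build a GET request for the client list of a Security Server.

    The URL layout and the header format are assumptions; check them
    against the official API description before use.
    """
    url = f"https://{host}:{port}/api/v1/clients"
    return urllib.request.Request(
        url,
        headers={
            # API key created through the Security Server UI or the API.
            "Authorization": f"X-Road-ApiKey token={api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

The returned object could be passed to urllib.request.urlopen together with an SSL context that trusts the Security Server’s TLS certificate. Because the request is built separately from sending it, the same helper can be reused in scripts that manage several Security Servers.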

Changes in the architecture

The new UI and management REST API have also caused changes in the Security Server architecture and packaging. The previously existing Nginx (xroad-nginx) and Jetty (xroad-jetty) components have been replaced with the new UI and API (xroad-proxy-ui-api) components. These changes have affected the Security Server’s log files, directories, software packages, and services. It’s strongly recommended that Security Server administrators study the details of these changes from the release notes before upgrading to version 6.24.0.

Image 3. Changes in the Security Server architecture - before version 6.24.0 (left) and starting from version 6.24.0 (right).

Wait, there’s more!

Even though the new UI and management REST API are the most significant and most visible changes in version 6.24.0, the new version contains many other new features, improvements, and fixes. Here’s a short overview of other changes included in the latest version.

  • Support for running Security Server on Red Hat Enterprise Linux 8 (RHEL8).

  • Updates to the operational monitoring protocols that enable monitoring of SOAP and REST services in a more consistent manner. N.B.! The updates cause breaking changes in the operational monitoring protocols.

  • Better support for using external database services on different platforms (e.g. Amazon Web Services, Microsoft Azure, Google Cloud Platform) for both Central Server and Security Server.

  • Changes in allowed characters in X-Road system identifiers and improved validation of the identifiers.

  • Technology updates and decreased technical debt. 

The full list of changes with more detailed descriptions is available in the release notes.

It’s all about users

Another significant change in X-Road over the years is how X-Road is being developed. Nowadays, X-Road users play an essential role in the design and development as a source of input and as validators of the development results. It applies to the new UI, too, since X-Road users have participated in its design and development by providing input, feedback, and comments in different phases of the process. The involvement of the users in the design and development is here to stay, and also the new UI will be further developed and improved based on the feedback received from the field.

Towards the Unicorn

One major change has just been completed, but the next ones are already waiting around the corner. The very first flight of the Unicorn – the release of the beta version of X-Road 7 – is expected to happen by the end of this year, and the first release version should see the daylight in 2021. More information about X-Road 7 and the changes it will introduce will be provided at a later date. Meanwhile, please try out the new X-Road 6.24.0 and tell us your opinion about it!

X-Road Implementation Models

X-Road® has become known as the open-source data exchange layer that is the backbone of the Estonian X-tee and the Finnish Suomi.fi Data Exchange Layer ecosystems. Both ecosystems are nationwide, and they’re open for all kinds of organizations – both public and private sectors. Also, Iceland is currently setting up its national X-Road ecosystem called Straumurinn. Besides, X-Road has been implemented all around the world in many different shapes and sizes.

In general, an X-Road ecosystem is a community of organizations using the same instance of the X-Road software for producing and consuming services. The owner of the ecosystem, the X-Road operator, controls who is allowed to join the community, and the owner defines regulations and practices that the ecosystem must follow.

Image 1. Roles and responsibilities of an X-Road ecosystem.

Technically, the X-Road software does not set any limitations to the size of the ecosystem or the member organizations. The ecosystem may be nationwide, or it may be limited to organizations meeting specific criteria, e.g., clients of a commercial service provider. Thanks to its scalable architecture and organizational model, X-Road is exceptionally flexible, and it supports various kinds of setups. Even if a nationwide implementation of X-Road is probably the best known implementation model, X-Road can be used in many other ways too. Let’s find out more about the different alternatives.

National data exchange layer

National implementation is probably the most typical way to implement X-Road. In a national implementation, X-Road is implemented nationwide within a country, and the aim is to use it in data exchange between organizations across administration sectors and business domains. Typically, the ecosystem is open for all kinds of organizations – both public and private sector organizations. However, it is also possible to restrict the implementation to cover only the public sector, specific administration sector, business domain, or a combination of these.

Besides, X-Road can be used to implement cross-border data exchange with other countries that have a national X-Road implementation. In practice, the ecosystems of different countries are connected using federation – an X-Road feature that enables connecting two X-Road environments. Federation enables member organizations of different ecosystems to exchange data as if they were members of the same ecosystem.

Image 2. X-Road federation - connecting two X-Road ecosystems.

In a national implementation, a government agency is usually the owner of the ecosystem. The owner takes the role of the X-Road operator, who is responsible for all the aspects of the operations. The responsibilities include defining regulations and practices, accepting new members, providing support for members, and operating the central components of the X-Road software. Technical activities can be outsourced to a third party, but administrative and supervising responsibilities are carried out by the operator.

There are multiple implementations around the world where X-Road is used as a national data exchange layer. The best known national X-Road ecosystems are in Iceland, Finland, and Estonia.

Data exchange solution for regions

Regional implementation means implementing X-Road within a subnational area, such as a region, a province, a state, or an autonomous community. In a regional implementation, X-Road is used within that area, and the scope is usually very similar to the national implementation – data exchange between organisations across administration sectors and business domains. However, the scope may be more restricted as well. Besides, X-Road may be used to exchange data with the central government and/or other regions.

In a regional implementation, a regional agency or authority is usually the owner of the ecosystem. The owner takes the role of the X-Road operator, who is responsible for all the aspects of the operations. Some of the technical activities may be outsourced, just like in the national implementation.

As an alternative approach, the national implementation described earlier may consist of multiple regional implementations too. Every region or some of the regions within a country can have their own X-Road ecosystems that are connected using federation. However, compared to a single national implementation, this approach generates more overhead since every region must manage and operate its own X-Road ecosystem. Therefore, when targeting a national implementation, a single national ecosystem is recommended over multiple regional ecosystems that are connected using federation.

One example of a regional implementation can be found in Argentina. The province of Neuquén in Argentina is using X-Road as a regional data exchange platform. Also, some regions in other countries are currently considering the use of X-Road on a local level.

Data exchange within a business domain or sector

In national and regional implementations, X-Road is implemented within a geographic area, such as a country or a region. However, nothing prevents an X-Road ecosystem from spanning multiple states and/or regions as long as there’s an organisation that takes the role and responsibilities of the X-Road operator. A practical example of this kind of approach is implementing X-Road within a business domain or sector in which members are located in different countries around the world. However, X-Road could be implemented within a business domain or sector on the national level too.

The critical factor is that all members commit to following the rules and policies of the ecosystem set by the X-Road operator. In this case, the use of X-Road is based on a mutual agreement between the members of the ecosystem. In contrast, in national and regional implementations, the use of X-Road is often based on a law or a regulation issued by a governmental or regional authority.

If different business domains have their own X-Road ecosystems, they can be connected using federation, which enables data exchange between member organisations of different business domains. Technically, a business domain-specific implementation can be connected to a national or regional X-Road ecosystem too.

X-Road based business domain-specific solutions have been implemented in several countries. For example, in Germany X-Road is being used to exchange healthcare data, and in Estonia, the X-Road based Estfeed platform is utilised in energy sector data exchange. Besides, Estfeed is also applied by the Data Bridge Alliance to exchange energy data on a cross-border level.

A platform for data exchange within an organisation

The primary use case for X-Road is data exchange between organisations, but nothing prevents X-Road from being used to exchange data within an organisation too. For example, a large international organisation that has branches and departments in different countries and continents may have information systems that communicate over the public Internet. X-Road provides a solution to connect those systems in a standardised and secure manner, guaranteeing confidentiality, integrity, and interoperability of the data exchange.

When it comes to the organisational model of X-Road, one of the departments takes the role of the X-Road operator, and other branches and departments are members of the ecosystem. In addition to connecting information systems communicating over the Internet, X-Road could be used inside a private network of an organisation too.

One example of corporate use of X-Road can be found in Japan. A major Japanese gas company uses an X-Road based solution to exchange data between its different organisation units. Another interesting approach to corporate use is building a commercial product on top of X-Road. Since X-Road is open source and licensed under the permissive MIT license, it can be utilised in commercial closed source products too. For example, Planetway, a Japanese-Estonian company, has built its PlanetCross platform using X-Road.

For clarity, X-Road is not a service mesh platform for microservices, such as Istio. X-Road is meant for data exchange between information systems over the public Internet, whereas service mesh platforms are used as a communication layer between different microservices in a microservices architecture. The high-level capabilities that X-Road and many service mesh solutions provide may seem very similar. Still, the way they have been implemented is optimised for very different use cases. Therefore, X-Road is not to be confused with service mesh solutions.

How would you use X-Road?

As we have learned, X-Road can be implemented in many different ways. The right way always depends on the use case, requirements, and operating environment. Thanks to its distributed architecture, X-Road is highly scalable and is, therefore, a good fit for all sizes of implementations. It also enables different approaches when it comes to the speed and scale of the implementation – starting small with a few member organisations and services, or going live with a big bang with a bunch of members and connected systems.

If you’re interested in the upcoming changes in the X-Road core, please visit the X-Road backlog. Anyone can access the backlog, and leave comments and submit enhancement requests through the X-Road Service Desk portal. Accessing the backlog and service desk requires creating an account that can be done in a few seconds using the signup form.

X-Road development going full steam in 2020

The year 2020 has started the way the previous one ended, with X-Road development going at full steam. The first X-Road release of the new decade saw the light of day in February, which means that X-Road releases have now been published in three decades. The first production-level X-Road version was released in 2001 – almost 20 years ago. It does not mean that X-Road is cooling down – on the contrary, the near future brings a bunch of changes to X-Road that take it to a whole new level. However, getting there does not happen overnight.

The changes are implemented using an iterative approach, which means that every new X-Road release brings something new to the table. The changes start from version 6.24.0, but the most significant milestone will be the release of X-Road 7 in 2021. We have published a high-level X-Road development roadmap for 2020 so that everyone can see what kind of new features are coming out and when. The roadmap is available on the X-Road website.

The first release of the year, version 6.23.0, was published in February. The release was all about the Central Server, and it introduced changes in the Central Server high-availability support. More information about the changes can be found in my previous blog post and the official release notes.

New Security Server admin UI and API

As you probably know, we have been working on the new Security Server UI and administrative REST API for some time already. The work is not fully completed yet, but at this point, it can be said that the new UI and API will be included in version 6.24.0. The release of the new UI and API is probably the most significant change in X-Road core since the first release of X-Road version 6 in 2015 – even more significant than the long-awaited REST support in 2019. Technically, the new UI and API are built on top of the existing X-Road core. However, the implementation technologies have been updated in the process.

The new UI provides improved user experience (UX) for Security Server administrators. The new UI has a new look and feel, and it makes taking care of administrative tasks easier and supports streamlining the onboarding process of new X-Road members. The administrative REST API will enable automation of Security Server maintenance tasks since all the features that are available through the UI are available through the API too. Maintaining and operating multiple Security Servers can be done more efficiently as configuration and maintenance tasks require less manual work.
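As a rough illustration of the kind of automation an administrative REST API makes possible, the sketch below prepares a request that would list the clients configured on a Security Server. Everything specific in it is an assumption made for illustration: the host name, port 4000, the `/api/v1/clients` path, the `X-Road-ApiKey` authorization scheme and the key value. The API description shipped with the release is the authoritative source for the actual endpoints and authentication details.

```python
import urllib.request


def build_list_clients_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a Security Server's client list."""
    url = base_url.rstrip("/") + "/api/v1/clients"  # assumed endpoint path
    headers = {
        "Authorization": f"X-Road-ApiKey token={api_key}",  # assumed auth scheme
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical host and API key, used only to show the shape of the request.
request = build_list_clients_request(
    "https://securityserver.example.org:4000", "hypothetical-api-key"
)
print(request.get_method(), request.full_url)
```

The request is only constructed, not sent, so the sketch can be run without a Security Server. In a real script the same function could be called in a loop over several Security Servers, which is where the reduction in manual work would come from.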

Supported platforms

Currently, the Security Server officially supports Ubuntu 18.04 LTS and Red Hat Enterprise Linux 7 (RHEL7). The Central Server and Configuration Proxy, in contrast, officially support only Ubuntu 18.04 LTS.

In 2020 official support for Ubuntu 20.04 LTS will be added to the Central Server, Configuration Proxy, and Security Server. Also, official support for RHEL8 will be added to the Security Server.

Version 6.21 is the last X-Road version that supports Ubuntu 14.04 LTS. It is good to keep in mind that once version 6.24.0 is released, version 6.21 drops off the list of supported X-Road versions. X-Road components still running on an Ubuntu 14.04 LTS host cannot be upgraded to a newer X-Road version without first upgrading the underlying host operating system.

X-Road 7

The development of the core components of X-Road version 6 continues actively throughout 2020. It has been decided that X-Road 7 will be built on top of version 6, which means all the enhancements implemented for version 6 will benefit the development of version 7 too. Making the current codebase more modular and reducing technical debt are also important goals for this year. Enabling the smooth implementation of new features planned for version 7 requires making certain changes to the current codebase upfront. However, the aim is to implement all the changes in a backwards-compatible manner, so that upgrading from version 6 to version 7 is no different from upgrading between minor versions of version 6.

X-Road 7 will be implemented iteratively using agile software development methods. It means that changes and new features will be implemented in small pieces, every new version building on top of the previous one. In practice, this means that the first release of X-Road 7 will not include all the new features planned for version 7, but only a minimal subset of them. In the following versions, new features will be added piece by piece, and existing features will be developed further based on user feedback.

In parallel with the technical track, we’re also actively working on the design of X-Road 7. Multiple activities will be carried out throughout the year, and X-Road users and stakeholders will have an active role in the process. Feature-wise, the target areas for this year are messaging patterns, message logging, and the onboarding process.

X-Road extensions

In addition to the X-Road core, the maintenance and further development of two X-Road extensions will be handed over to NIIS by the Estonian Information System Authority (RIA). The extensions are X-Road 6 Monitor Project and Mini Information System Portal 2 (MISP2). The handover will take place during the first half of 2020.

X-Road and eDelivery

X-Road and eDelivery are both data exchange solutions that have been successfully used in multiple implementations in several countries and projects. They both provide a standardised and secure way to exchange data over the Internet. eDelivery is a building block of the Connecting Europe Facility (CEF).

NIIS is currently implementing a gateway between eDelivery and X-Road that will enable data exchange between eDelivery and X-Road ecosystems. A technical proof-of-concept level implementation has already been completed, and more detailed design is being drafted in collaboration with the European Commission’s Directorate-General for Informatics (DIGIT). The actual implementation of the gateway will begin later this year.

NIIS is looking for organisations that are interested in piloting the gateway. In case your organisation is an X-Road or eDelivery user and would like to exchange data with an organisation that is using the other platform, please contact NIIS for more detailed information.

Want to know more?

If you’re interested in more detailed information about the upcoming changes, please visit the X-Road backlog. Anyone can access the backlog, leave comments and submit enhancement requests through the X-Road Service Desk portal. Accessing the backlog and service desk requires an account, which can be created in a few seconds using the signup form.

When X-Road is developed, and new features are added, the X-Road technology stack changes too. X-Road Tech Radar provides up-to-date information on different technologies used in X-Road.

Changes in the X-Road Central Server High Availability Support

The Central Server is one of the key components of the X-Road ecosystem. It contains a registry of X-Road member organisations and their Security Servers. In addition, the Central Server contains the security policy of the X-Road instance, which includes a list of trusted certification authorities, a list of trusted time-stamping authorities and configuration parameters. Both the member registry and the security policy are made available to the Security Servers via the HTTP protocol. This distributed set of data forms the global configuration that the Security Servers use for mediating messages sent via X-Road. An X-Road operator is responsible for operating the Central Server.

Image 1. X-Road architecture and roles.

To be able to mediate messages, the Security Server must have a valid copy of the global configuration available at all times. The Security Server downloads the global configuration from the Central Server regularly and uses a local copy while processing messages. The Security Server remains operational as long as it has a valid copy of the global configuration available locally. This means that the Central Server may be unavailable for a limited time without causing any downtime to the ecosystem. However, registering new members or subsystems is not possible without the Central Server. Both the download interval and the global configuration validity period can be configured according to the requirements of the X-Road ecosystem.
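The relationship between the download interval and the validity period can be sketched with a small model. The class and function names below are hypothetical, and the one-minute interval and ten-minute validity period are illustrative values only. The model assumes that a local copy is valid for a fixed period counted from the moment it was downloaded.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LocalGlobalConf:
    """Local copy of the global configuration held by a Security Server."""
    downloaded_at: datetime
    validity_period: timedelta

    def is_valid(self, now: datetime) -> bool:
        # The copy is usable until its validity period has elapsed.
        return now < self.downloaded_at + self.validity_period


def guaranteed_outage_tolerance(validity_period: timedelta,
                                download_interval: timedelta) -> timedelta:
    """Worst case: the Central Server fails just before the next download,
    when the local copy is already one full download interval old."""
    return max(validity_period - download_interval, timedelta(0))


# Illustrative values only; the real values are chosen by the X-Road operator.
interval = timedelta(minutes=1)
validity = timedelta(minutes=10)

conf = LocalGlobalConf(datetime(2020, 2, 1, 12, 0, 0), validity)
print(conf.is_valid(datetime(2020, 2, 1, 12, 5, 0)))   # copy is 5 minutes old
print(conf.is_valid(datetime(2020, 2, 1, 12, 10, 0)))  # copy is 10 minutes old
print(guaranteed_outage_tolerance(validity, interval))
```

Under these assumptions, a longer validity period buys more tolerance for Central Server outages, at the price of Security Servers possibly working with older data for longer.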

Design for Failure

An X-Road ecosystem is very fault tolerant against Central Server failures even with only one Central Server node. However, critical information systems should always be designed for failure so that they remain operational despite the failure of individual components.

The Central Server supports high availability through clustering, which provides additional fault tolerance and, from a performance point of view, scalability. A Central Server cluster consists of two or more Central Server nodes. The cluster is based on an active-active model, which means all the nodes can be used for both read and write operations. If one of the nodes fails, Security Servers are able to fail over to the other available nodes.
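The failover idea can be sketched as a generic "try the next node" loop. This is not a description of how the Security Server actually selects nodes; the function name, the node URLs and the injected `fetch` callable are all hypothetical and exist only to keep the example self-contained.

```python
from typing import Callable, Iterable, Optional


def download_with_failover(node_urls: Iterable[str],
                           fetch: Callable[[str], bytes]) -> bytes:
    """Return the first successful download, trying the nodes in order."""
    last_error: Optional[Exception] = None
    for url in node_urls:
        try:
            return fetch(url)
        except OSError as error:  # covers connection errors and timeouts
            last_error = error    # remember the failure and try the next node
    raise ConnectionError("no Central Server node reachable") from last_error


def fake_fetch(url: str) -> bytes:
    """Stand-in for an HTTP download; pretends that node 1 is down."""
    if "node1" in url:
        raise ConnectionError("node1 is down")
    return b"global configuration from " + url.encode()


nodes = ["http://cs-node1.example.org", "http://cs-node2.example.org"]
result = download_with_failover(nodes, fake_fetch)
print(result)
```

Because every node in an active-active cluster can serve the same data, the caller does not need to care which node answered.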

Why Are Changes Needed?

Up to and including X-Road version 6.22, the clustering implementation was based on asynchronous, active-active database replication between the nodes. Unfortunately, the technology used in the implementation, the BDR plugin for PostgreSQL by 2ndQuadrant, reached its end-of-life in December 2019, and newer versions of it are not available under an open source license. Continuing with a newer version of the plugin would have meant that every X-Road operator using clustering would have been required to buy a commercial license for it. Therefore, there was no other choice than to give up the BDR plugin and update the high availability support implementation for the Central Server.

Image 2. Central Server high availability implementation until version 6.22.

What Will Change?

Starting from version 6.23, the Central Server high availability implementation is based on a shared, optionally highly available database. Before version 6.23, every Central Server node in a cluster had its own database, and changes were synchronized between the nodes using multi-master database replication. X-Road provided tools to set up the cluster and the replication between the nodes. Starting from version 6.23, all the Central Server nodes share the same database, which can be a standalone database, a database cluster, a fully managed database service in the cloud, etc. X-Road provides instructions on how to configure the Central Server nodes in the cluster, but implementing high availability of the database is out of X-Road’s scope. The documentation does provide instructions for setting up a replicated PostgreSQL database, but it does not cover automatic failover.

Image 3. Central Server high availability implementation starting from version 6.23.

Compared to the previous implementation, the new implementation is more flexible because it gives the X-Road operator the freedom to choose how high availability is implemented on the database level. The previous implementation, in contrast, was tied to the BDR plugin for PostgreSQL. At the same time, more flexibility also brings more responsibility, as implementing the high availability of the database is now the X-Road operator’s responsibility.

Available Resources

The official X-Road documentation provides an updated Central Server High Availability Installation Guide. In addition, the X-Road Knowledge Base provides an article about migrating Central Server clusters from version 6.22 to version 6.23. All X-Road operators are strongly advised to read these documents before updating clustered Central Servers to version 6.23.

Try It Out!

X-Road 6.23.0-beta is now available for testing, and the production version will be released by the end of February 2020. We welcome feedback on the new version and on any challenges in migrating to it.

Interoperability Puzzle

In today’s digital world, information is stored across multiple information systems owned and maintained by different organisations. In addition, every organisation has numerous internal information systems that store information. Most digital services and processes require accessing multiple information systems and combining data from different sources – both inside an organisation and across multiple organisations. Without connections between different information systems, building digital services would be extremely challenging, if not impossible.

The ability of information systems to exchange and utilize information is known as interoperability. Contrary to how it may first sound, interoperability is not only about technology and technical connectivity. It consists of several layers, of which technology is only one. The European Interoperability Framework (EIF) defines four layers of interoperability:

  • legal – aligned legislation

  • organisational – coordinated processes

  • semantic – precise meaning of exchanged information

  • technical – connecting information systems and services.

Image 1. EIF conceptual model. (source)

All four layers are equally important when building digital services and processes. In addition, challenges on one layer are often reflected in the other layers too. Therefore, it is important to be aware of all the layers and not to neglect any of them. That being said, in this blog post I’m going to concentrate on the technical layer and its dimensions, because covering all the layers at once would be biting off more than I can chew.

Data Exchange Scenarios

When it comes to a public sector organisation exchanging information, three top-level data exchange scenarios can be recognized:

  • Internal – data exchange within an organisation

  • National – data exchange on national level

  • Cross-border – international data exchange.

The same rules, laws and regulations don’t apply to national and cross-border data exchange, which is why they are treated as two separate scenarios instead of a single “external” scenario. Cross-border data exchange between authorities usually requires both state-level agreements and data exchange agreements between the data exchange parties. Otherwise, the two scenarios could probably be combined into one, leaving only two scenarios: internal and external.

The common factor between the scenarios is that all three require certain basic technical elements including, but not limited to, connectivity, secure communication protocols, interfaces and integration services. The more standardized these elements are, the less work is required to build new connections between information systems and services. For example, if there’s no commonly agreed solution for securely connecting information systems to each other and for managing the connections, the result is probably a jungle of point-to-point connections. That means agreeing on the connection details and then building the connection every time a new one is needed, again and again.
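A quick calculation shows how fast such a jungle grows. The sketch below assumes the worst case, in which every system needs to exchange data with every other system, and compares it with a standardized layer where each system is connected once. The function names are made up for this example.

```python
def point_to_point_connections(systems: int) -> int:
    """Separately agreed connections when every pair of systems is linked directly."""
    return systems * (systems - 1) // 2


def shared_layer_connections(systems: int) -> int:
    """Connections when each system is attached once to a common layer."""
    return systems


for n in (5, 20, 100):
    print(n, point_to_point_connections(n), shared_layer_connections(n))
```

With 20 systems the worst case is already 190 separately agreed connections, compared with 20 attachments to a shared layer. Real ecosystems are rarely fully connected, so the numbers are an upper bound rather than a prediction.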

However, even if the technical basic elements in all the scenarios are the same, they are usually implemented using different technical solutions and technologies. Implementing a standardized connectivity layer within an organisation is usually based on different technology than a standardized connectivity layer with external parties. Let’s take a look at an example of an organisation that has a microservice-based information system with REST APIs published to external consumers.

Image 2. A microservice-based information system with REST APIs published to external consumers.

Internal Communications

Internally the information system uses a service mesh to facilitate service-to-service communications between microservices. A service mesh is a dedicated infrastructure layer that provides features such as standardized and secure connections, service discovery, and centralized logging and monitoring capabilities. Microservices communicate with each other through a service mesh proxy that is usually responsible for microservice level authentication, message routing, service discovery, automatic retries, timeouts, logging etc. As these features are provided by the proxy, they do not need to be implemented in the application code of each microservice separately. In addition, a service mesh usually has a centralized control plane that can be used to configure the proxies, and access logging and monitoring information etc.
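To make the division of labour concrete, here is a toy stand-in for a mesh proxy. It does not model any particular service mesh product; the class names, the retry limit and the in-memory access log are assumptions chosen to keep the example small. It shows retries and logging living in the proxy rather than in the service code.

```python
from typing import Callable, List


class MeshProxy:
    """Toy stand-in for a service mesh proxy placed in front of a microservice."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.access_log: List[str] = []  # stands in for centralized logging

    def call(self, service: Callable[[str], str], request: str) -> str:
        for attempt in range(1, self.max_attempts + 1):
            try:
                response = service(request)
            except ConnectionError as error:
                self.access_log.append(f"attempt {attempt} failed: {error}")
                continue  # automatic retry, invisible to the caller
            self.access_log.append(f"attempt {attempt} succeeded")
            return response
        raise ConnectionError(f"gave up after {self.max_attempts} attempts")


class FlakyInventoryService:
    """Hypothetical microservice that fails on its first two calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, request: str) -> str:
        self.calls += 1
        if self.calls <= 2:
            raise ConnectionError("temporarily unavailable")
        return f"stock level for {request}: 7"


proxy = MeshProxy(max_attempts=3)
inventory = FlakyInventoryService()
reply = proxy.call(inventory, "item-42")
print(reply)
print(proxy.access_log)
```

The calling code sees a single successful reply, while the two failed attempts are visible only in the proxy's log.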

Requests originating outside of the mesh typically enter the mesh through a service mesh gateway component. Available capabilities vary between different solutions, but in general, a service mesh is designed to manage traffic internal to the service mesh. In this case the example was very simple, but in real life a service mesh could serve multiple information systems and span multiple networks and data centers.

Exposing Services Externally

When it comes to accepting traffic from outside of an organisation, an API gateway comes into the picture. An API gateway exposes backend services as managed APIs and distributes traffic internally – in and out of the service mesh. An API gateway provides a single entry point to all clients, and hides the details of individual microservices. An API gateway also typically provides capabilities such as logging, monitoring, metrics, access control, request limiting, message transformations, orchestration etc. In addition, an API gateway is usually well connected to other components of the API management ecosystem, e.g. API marketplace and API publishing portal.
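The single entry point idea can be sketched in a few lines. The example below is a toy model, not a real gateway product. The class, the path prefixes, the backend functions and the per-client request limit are all hypothetical.

```python
from typing import Callable, Dict, Tuple


class ApiGateway:
    """Toy API gateway: one entry point, prefix routing and a request limit."""

    def __init__(self, requests_per_client: int) -> None:
        self.requests_per_client = requests_per_client
        self.routes: Dict[str, Callable[[str], str]] = {}
        self.usage: Dict[str, int] = {}

    def register(self, prefix: str, backend: Callable[[str], str]) -> None:
        self.routes[prefix] = backend

    def handle(self, client_id: str, path: str) -> Tuple[int, str]:
        used = self.usage.get(client_id, 0)
        if used >= self.requests_per_client:
            return 429, "request limit exceeded"
        self.usage[client_id] = used + 1  # every accepted request is counted
        for prefix, backend in self.routes.items():
            if path.startswith(prefix):
                # The client never learns which microservice answered.
                return 200, backend(path[len(prefix):])
        return 404, "unknown API"


gateway = ApiGateway(requests_per_client=2)
gateway.register("/orders/", lambda rest: f"order service handled '{rest}'")
gateway.register("/customers/", lambda rest: f"customer service handled '{rest}'")

first = gateway.handle("client-a", "/orders/17")
second = gateway.handle("client-a", "/invoices/3")
third = gateway.handle("client-a", "/customers/5")
print(first, second, third)
```

Here the third call is rejected because the client has used up its two requests, even though a matching backend exists. Real gateways typically apply limits per time window rather than as a lifetime count.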

Even though API gateways and service meshes are complementary solutions, they have many overlapping functionalities and features. They are often deployed together, but they can be deployed separately as well. In addition, an API gateway can be used for internal purposes too – not only for publishing services to external clients. Similarly, a service mesh could be used to publish services to external clients.

What Does X-Road Bring to the Puzzle?

So far I have been writing about internal and external data exchange, but I haven’t written a word about X-Road yet. At this point you may be wondering what X-Road is needed for if internal and external data exchange can be implemented using other technologies.

First of all, X-Road is best suited for external data exchange over the public Internet. The most common use case is data exchange between two organisations, but a single organisation may have information systems that are hosted in different locations and communicate with each other over the Internet too. In this case X-Road is a good fit for internal data exchange as well.

At first sight, X-Road may seem like a service mesh, as the architectures and feature sets have many similarities – both provide secure and standardized connections, service-to-service authentication, logging, reporting etc. In addition, both are based on an architecture model that implements service-level communication through a proxy component. However, X-Road is not a service mesh, because a service mesh is the connection layer between different services in a microservices architecture. In other words, a service mesh is used as an internal connection layer within an application or between multiple applications of a single organisation, whereas X-Road is used as a connection layer between different organisations and information systems.

How about X-Road and an API gateway then – are they mutually exclusive, or can they be used side by side? X-Road and an API gateway are both used to publish services to external clients. Their architectures and feature sets are different, even though they have features in common too, e.g. publishing APIs to external clients, service-to-service authentication, authorization, logging and metrics. The major difference between X-Road and an API gateway is that X-Road requires a Security Server on both the service consumer and the service provider side, whereas an API gateway enables client connections directly without any additional components on the client side.

Image 3. Point-to-point connections, an API gateway and X-Road in comparison.

Overall, an API gateway provides more flexibility and more API management related features than X-Road, but when the same client communicates with multiple API gateways, the client must adapt to the different requirements and configurations of multiple service providers. In contrast, X-Road provides a single communication channel to multiple service providers and services that all share the same configuration, which is automatically distributed and applied by X-Road. In addition, X-Road guarantees that both the service consumer and the service provider meet the same security requirements, and it provides non-repudiation by signing, time-stamping and logging every processed message on both the consumer and the provider side. The logs can be used as evidence in court proceedings. These features make X-Road an ideal solution for secure, reliable and auditable data exchange.

One Happy Family

X-Road, an API gateway and a service mesh all have their place in the interoperability puzzle, and they can be used together side-by-side. They all have their own strengths and they can be used to complement each other.

X-Road is an ideal solution for secure data exchange that requires strong authentication of data exchange parties and non-repudiation with recorded eIDAS-compliant evidence. X-Road can connect to backend services directly or through an API gateway. X-Road does not support message transformations, orchestration, rate limiting, quotas etc., but these can be implemented in the API gateway layer if they are required.

Some APIs may not require strict security controls or they should be accessible without an additional access point on the client side, e.g. APIs providing open data. There’s no reason why an API could not be published through multiple channels, for example an API providing open data can be published through both X-Road and an API gateway. The benefit of this approach is that organisations that are not using X-Road can access it directly through an API gateway and organisations using X-Road can access it using the same channel they use to access other services and APIs too.

Image 4. The example application with X-Road.

Let’s go back to the different data exchange scenarios mentioned earlier – internal, national and cross-border. X-Road is a good fit for national and cross-border data exchange, and it can be used for certain internal data exchange use cases too. An API gateway can basically be used for all the scenarios, but depending on the use cases and their requirements X-Road might be a better choice for external data exchange and a service mesh for internal data exchange. Last but not least, a service mesh is best suited for the internal scenario for microservice-based applications.

Disclaimer

Finally, it must be said that there’s one major difference between X-Road, an API gateway and a service mesh that has not been brought up yet. API gateway and service mesh are architecture patterns with multiple implementations, each with its own set of features and functionalities. In this blog post I have compared API gateways and service meshes to X-Road on a general level without referring to any specific implementation, solution or product. X-Road, in contrast, is a product with a specific set of features and functionalities. This means that conceptually X-Road, an API gateway and a service mesh are not the same kind of thing.