High Availability and Distributed Systems

This page explains the concepts and methods for distributing an Appian installation.

When and Why to Have a Distributed Installation

There are three principal reasons to run a distributed installation of Appian:

  1. High Availability
  2. Scaling and Load Balancing
  3. Segregation

Windows

High availability and load balancing are not available for Windows environments, because running more than one instance of the Appian engines, Kafka, or Zookeeper is not a supported configuration on Windows.

High Availability

A highly available installation of Appian must be resilient to hardware failures. This is only possible if every service that makes up Appian has more than one instance, with the instances running on different servers, so that the unexpected loss of one server does not take out all instances of any service. Servers in a high-availability installation may be spread across separate data centers as long as the network latency between the data centers is low (less than 10 ms).
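One rough way to sanity-check the 10 ms figure between data centers is to time TCP connection setup, which takes about one network round trip. The Python sketch below is an approximation, not an Appian tool; it assumes some TCP port is already listening on the remote server, and the host and port are whatever you choose to test against.

```python
import socket
import statistics
import time

# Threshold from the guidance above; TCP connect time is only a proxy for latency.
LATENCY_BUDGET_MS = 10.0


def tcp_connect_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> float:
    """Median time in milliseconds to complete a TCP handshake with host:port.

    Pass an IP address rather than a hostname so that DNS lookups are not
    included in the measurement.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the handshake completes (about one round trip).
        with socket.create_connection((host, port), timeout=timeout):
            timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

For example, run `tcp_connect_ms("10.0.2.15", 22) < LATENCY_BUDGET_MS` from a server in one data center against a server in the other; the address and port there are placeholders.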

Related documentation for high availability with Appian:

  • High Availability for Appian Cloud: information about the high-availability offering for Cloud customers.
  • How to Setup a Highly Available system: instructions that illustrate a recommended high-availability configuration for on-premises Appian systems.
  • Setting Up for Disaster Recovery: information that may help when planning a high-availability topology.

Scaling and Load Balancing

High-load sites may have more demand for a service than a single instance of it can handle. In this situation, adding instances of one or more services can increase the capacity of the installation.

Segregation

Some customers need to run only one instance of each service but want the services on separate servers, either for capacity reasons (a single server is not large enough to host all services) or for security reasons (for example, to host data-persistence services in a different network zone).

How to Configure a Distributed Installation

This section explains how to configure a distributed installation of Appian.

Planning the Topology

The first step in setting up a multiple server configuration is mapping out which servers will run the various architectural components of the Appian software. The distribution of the architectural components across one or more servers on a network is referred to by the documentation and the product as the "topology."

The following components of Appian can be configured to run on the same physical machine or on separate machines:

  • Appian Java application
  • Search Server
  • Appian Engines
  • Kafka
  • Zookeeper
  • RDBMS

Similarly, each component can be clustered independently. For example, an environment may choose to have two instances of application servers and three instances of the search server deployed.

Handling the Data Server

Currently, the Data Server cannot be configured for high availability; the only effect of this is that during a server failure application patches will be temporarily unavailable.

The Data Server can be set up to run on a single node by specifying the data-server-cluster configuration in the appian-topology.xml file. Because the data server, like the Appian engines, uses Kafka, the data server should be collocated with the engines.

Important High-Availability Topology Checklist

When planning topology for a high-availability installation, ensure it meets the following criteria:

  • All Servers are Linux Environments
  • Exactly Three (3) Instances of Search Server, Kafka, and Zookeeper
  • At Least Two (2) Instances of the Application Server and Appian Engines
  • No More Than One (1) Instance of Any Appian Engine on a Single Server
  • Multiple Instances of Kafka and Zookeeper Only If There Are Multiple Instances of All Appian Engines

All Servers are Linux Environments

Clustering the Appian engines, Kafka, or Zookeeper is only supported on Linux environments, not on Windows environments.

Due to the way file names and file paths are calculated for documents stored in Appian, the application server and engine servers must be on servers using the same type of operating system. Do not mix Windows and Linux.

Exactly Three (3) Instances of Search Server, Kafka, and Zookeeper

Deploy exactly three (3) instances each of the search server, Kafka, and Zookeeper, no more and no fewer. These components establish consensus among their instances, and three instances are required for the system to be robust to the failure of one instance. Appian does not support more than three instances of any of these components, and no more than three can be configured.
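The arithmetic behind "three instances to survive one failure" can be sketched assuming a majority quorum, which is how Zookeeper reaches consensus: a group of n members stays available only while a majority, n // 2 + 1 of them, is still running. This Python snippet is illustrative and does not model any Appian-specific behavior.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of members that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can fail while a majority is still running."""
    return n - majority_quorum(n)


for n in range(1, 6):
    print(f"{n} instance(s): quorum {majority_quorum(n)}, survives {tolerated_failures(n)} failure(s)")
# Failures survived: 1 -> 0, 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 2
```

Two instances tolerate no more failures than one, which is why three is the smallest count that is robust to the loss of one instance. The cap at three is an Appian support limit, not a consequence of this arithmetic.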

At Least Two (2) Instances of the Application Server and Appian Engines

Components that do not establish consensus (the application server and the Appian engines) require at least two instances in order to be robust to the failure of one instance. While only two instances are required, Appian recommends that high-availability installations have at least three instances of each of these services.

No More Than One (1) Instance of Any Appian Engine on a Single Server

Only one instance of any given Appian engine may run on a given server.

Multiple Instances of Kafka and Zookeeper Only If There Are Multiple Instances of All Appian Engines

Configuring multiple Kafka and Zookeeper instances can provide additional resiliency for those services in the event of a hardware or network failure, but the additional resiliency for the system as a whole will only be achieved if there are also multiple copies of all of the Appian engines. If there is only one instance of a certain type of engine, the risk from adding additional components to the system (and therefore adding additional opportunities for failure) outweighs the benefit from adding resiliency to the Kafka and Zookeeper layers. Appian recommends only running multiple instances of Kafka and Zookeeper if you also have multiple instances of all Appian engines.
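The checklist lends itself to an automated review of a planned topology before any configuration is written. The Python sketch below uses a hypothetical in-memory model (the `Topology` class, its field names, and the engine labels are illustrative, not part of Appian or of the appian-topology.xml schema) and reports every checklist item that the plan violates.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Hypothetical model of a planned topology; names are illustrative only."""
    host_os: dict[str, str]        # hostname -> operating system
    app_servers: list[str]         # hosts running an application server
    search_servers: list[str]
    kafka: list[str]
    zookeeper: list[str]
    engines: dict[str, list[str]] = field(default_factory=dict)  # engine -> hosts


def check_ha(t: Topology) -> list[str]:
    """Return one message per violated checklist item (empty list = passes)."""
    problems = []
    # All servers are Linux environments.
    for host, os_name in sorted(t.host_os.items()):
        if os_name.lower() != "linux":
            problems.append(f"{host} runs {os_name}, not Linux")
    # Exactly three instances of the consensus components.
    for name, hosts in (("search server", t.search_servers),
                        ("Kafka", t.kafka), ("Zookeeper", t.zookeeper)):
        if len(hosts) != 3:
            problems.append(f"{name}: {len(hosts)} instances, expected exactly 3")
    # At least two application servers and two instances of every engine.
    if len(t.app_servers) < 2:
        problems.append(f"application server: {len(t.app_servers)} instance(s), expected at least 2")
    for engine, hosts in sorted(t.engines.items()):
        if len(hosts) < 2:
            problems.append(f"{engine} engine: {len(hosts)} instance(s), expected at least 2")
        # No more than one instance of any engine on a single server.
        if len(set(hosts)) != len(hosts):
            problems.append(f"{engine} engine: more than one instance on the same server")
    # Multiple Kafka/Zookeeper instances only if every engine has multiple instances.
    if (len(t.kafka) > 1 or len(t.zookeeper) > 1) and any(
            len(hosts) < 2 for hosts in t.engines.values()):
        problems.append("multiple Kafka/Zookeeper instances, but not every engine is replicated")
    return problems


if __name__ == "__main__":
    hosts = ["app1", "app2", "app3"]
    plan = Topology(
        host_os={h: "Linux" for h in hosts},
        app_servers=hosts, search_servers=hosts, kafka=hosts, zookeeper=hosts,
        engines={name: hosts for name in ("channels", "forums", "portal")},  # partial list
    )
    print(check_ha(plan) or "checklist passed")
```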

Network Configuration

Distributed installations require a static IP address for each server. If you have not already done so, assign a static IP address to each machine that will host Appian before you configure the distributed installation.

You must also verify that each machine can communicate with the others over the network on the ports that Appian uses.
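A basic reachability test can be run from each machine against every other machine. The Python sketch below only attempts TCP connections; the ports to test are not listed here and must come from your own installation's port assignments, and a connection can succeed only if something is already listening on the target port.

```python
import socket
import sys


def parse_target(text: str) -> tuple[str, int]:
    """Split 'host:port' (or '[ipv6]:port') into a (host, port) pair."""
    host, _, port = text.rpartition(":")
    return host.strip("[]"), int(port)


def unreachable(targets: list[tuple[str, int]], timeout: float = 3.0) -> list[tuple[str, int, str]]:
    """Return (host, port, error) for every target that refuses or times out."""
    failures = []
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # the connection opened, so the port is reachable
        except OSError as exc:  # covers refused connections, timeouts, and DNS errors
            failures.append((host, port, str(exc)))
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    failed = unreachable([parse_target(arg) for arg in sys.argv[1:]])
    for host, port, reason in failed:
        print(f"cannot reach {host}:{port}: {reason}")
    sys.exit(1 if failed else 0)
```

A hypothetical invocation would be `python check_ports.py 10.0.0.12:<port> 10.0.0.13:<port>`, substituting the ports Appian uses; the script name is illustrative.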

Install Appian on Each Machine

Install a full version of Appian on every machine that will host any Appian component. Whether a machine is intended to run only the Appian engines, only the main Java application, only a search server node, or some combination of these, a full installation should exist on every server in the environment; this eliminates the possibility of misconfiguration caused by missing components. An Appian installation is not required on the machine running the RDBMS.

Each installation of Appian must be of the same version with the same hotfixes.

Configuration

Configuration File Consistency

When running across multiple servers, it is especially important that every server is configured identically. All configuration files, such as appian-topology.xml, custom.properties, and others, must be the same on all servers.
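One way to confirm that configuration files match is to compare checksums. The Python sketch below is a hypothetical helper, not an Appian script: it records a SHA-256 digest for every file under a directory such as <APPIAN_HOME>/conf/ and lists the files that differ between two servers' manifests.

```python
import hashlib
import json
import sys
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[path.relative_to(root).as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differences(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Relative paths present in only one manifest or whose digests differ."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g.: python conf_manifest.py /opt/appian/conf > node1.json  (path is an example)
    json.dump(manifest(Path(sys.argv[1])), sys.stdout, indent=2, sort_keys=True)
```

Run it on each server, gather the JSON outputs in one place, load them with `json.load`, and pass pairs to `differences()`; an empty list means the two servers' files are identical.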

Topology XML File

The way to specify which components of Appian run on which hosts is with the appian-topology.xml file, located in <APPIAN_HOME>/conf/. Example configurations can be found in appian-topology.xml.example, which is located in the same directory.

When specifying hostnames in the appian-topology.xml file for a distributed installation, you must not use "localhost" as that will resolve differently on the different machines in the cluster. Hostnames specified in appian-topology.xml must exactly match the host value that is marked with _h in the output from _admin/_scripts/licinfo.sh (.bat).

An appian-topology.xml file that is empty, contains only XML comments, or contains invalid XML will result in the engines using the default topology.
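Both rules above can be checked mechanically before the engines start. The Python sketch below is an illustrative helper, not an Appian tool: it flags a topology file that will not parse (which covers empty and comment-only files, since neither has a root element) and any attribute or element text containing "localhost". It makes no assumptions about the appian-topology.xml schema, and it cannot verify the _h host values from licinfo.sh; compare those by hand.

```python
import sys
import xml.etree.ElementTree as ET


def topology_problems(path: str) -> list[str]:
    """Report reasons the topology file would be ignored or would break a cluster."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as exc:
        # Empty and comment-only files fail here too: there is no root element.
        return [f"{path}: not parsable ({exc}); engines would use the default topology"]
    problems = []
    for elem in root.iter():  # the default parser drops comments, so they are not checked
        values = list(elem.attrib.values())
        if elem.text and elem.text.strip():
            values.append(elem.text.strip())
        for value in values:
            if "localhost" in value.lower():
                problems.append(f"<{elem.tag}> uses '{value}': localhost resolves differently on each machine")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g.: python check_topology.py /opt/appian/conf/appian-topology.xml  (path is an example)
    for problem in topology_problems(sys.argv[1]) or ["no problems found"]:
        print(problem)
```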

Engine Security Token

In a distributed installation, you must copy the appian.sec file, located in <APPIAN_HOME>/conf/, to every machine in the environment. This file is required to enable authorized connections between the engines and the specified application servers.

Refer to Appian Engine Connection Restrictions for more information.

Service Manager Password

In a distributed installation, you must also copy the service_manager.conf file, located in /services/conf/, to every machine in the environment. This file is required to enable authorized connections to the service manager and the engines across machines.

The service_manager.conf file is created when running the password script.

Scheduling Checkpoints

When moving to a high-availability configuration, also remove any custom configurations for checkpoint scheduling. High-availability installations should use the default values for these configurations because, when there is more than one set of engines, checkpointing does not make the engines unavailable.

Shared Files

The following directories must be shared across all servers that run the corresponding component. All servers that run a given component need both read and write access to these directories.

Component Name                                 Folder Name
Application Server                             APPIAN_HOME/_admin/accdocs1/
Application Server                             APPIAN_HOME/_admin/accdocs2/
Application Server                             APPIAN_HOME/_admin/accdocs3/
Application Server                             APPIAN_HOME/server/archived-process/
Application Server                             APPIAN_HOME/_admin/search/
Application Server                             APPIAN_HOME/server/msg/
Application Server                             APPIAN_HOME/_admin/mini/
Application Server                             APPIAN_HOME/_admin/models/
Application Server                             APPIAN_HOME/_admin/plugins/
Application Server                             APPIAN_HOME/_admin/process_notes/
Application Server                             APPIAN_HOME/_admin/shared/
Channels Engine                                APPIAN_HOME/server/channels/gw1/
Content and Collaboration Statistics Engines   APPIAN_HOME/server/collaboration/gw1/
Forums Engine                                  APPIAN_HOME/server/forums/gw1/
Notifications and Notifications Email Engines  APPIAN_HOME/server/notifications/gw1/
Personalization Engine                         APPIAN_HOME/server/personalization/gw1/
Portal Engine                                  APPIAN_HOME/server/portal/gw1/
Process Analytics 00 Engine                    APPIAN_HOME/server/process/analytics/0000/gw1/
Process Analytics 01 Engine                    APPIAN_HOME/server/process/analytics/0001/gw1/
Process Analytics 02 Engine                    APPIAN_HOME/server/process/analytics/0002/gw1/
Process Design Engine                          APPIAN_HOME/server/process/design/gw1/
Process Execution 00 Engine                    APPIAN_HOME/server/process/exec/00/gw1/
Process Execution 01 Engine                    APPIAN_HOME/server/process/exec/01/gw1/
Process Execution 02 Engine                    APPIAN_HOME/server/process/exec/02/gw1/

If you have more than the default three shards of Process Execution and Process Analytics, the gw1 directories for those shards must be shared across servers as well.

The recommended approach for sharing directories between servers is:

  1. Set up a central network attached storage server
  2. Create a directory structure on the storage server that mirrors the directories listed in the table above
  3. Replace the above directories on each server with links to the corresponding directory on the network attached storage server
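The steps above can be scripted. The Python sketch below assumes the network attached storage is already mounted on every server at a path you choose (the /mnt/appian-shared example is hypothetical) and takes directory names from the table above. It prints a plan by default; with `apply=True` it creates the mirrored directory on the storage server, confirms read and write access, moves any existing local directory aside to a `.local-backup` name, and replaces it with a symbolic link. Copying existing contents to the shared location is deliberately left out and must be done separately.

```python
import sys
import tempfile
from pathlib import Path

# Paths relative to APPIAN_HOME, copied from the table above. This list is
# partial: add the remaining rows (and any extra shard directories) you need.
SHARED_DIRS = [
    "_admin/accdocs1",
    "_admin/accdocs2",
    "_admin/accdocs3",
    "server/archived-process",
    "server/msg",
    "server/channels/gw1",
    "server/process/exec/00/gw1",
]


def check_read_write(directory: Path) -> None:
    """Raise OSError unless a file can be created, read back, and deleted in directory."""
    with tempfile.NamedTemporaryFile(dir=directory) as probe:  # deleted on close
        probe.write(b"probe")
        probe.flush()
        probe.seek(0)
        if probe.read() != b"probe":
            raise OSError(f"read-back mismatch in {directory}")


def link_shared_dirs(appian_home: Path, nas_root: Path, rel_dirs, apply: bool = False) -> list[str]:
    """Plan (and optionally perform) replacing local directories with links to nas_root.

    Assumption: run with Appian stopped on this server.
    """
    planned = []
    for rel in rel_dirs:
        local = appian_home / rel
        target = nas_root / rel
        if local.is_symlink() and local.resolve() == target.resolve():
            continue  # already linked to the shared copy
        planned.append(f"{local} -> {target}")
        if not apply:
            continue
        target.mkdir(parents=True, exist_ok=True)  # mirror the structure on the NAS
        check_read_write(target)
        if local.exists() or local.is_symlink():
            # Keep the original; copying its contents to the NAS is a separate step.
            local.rename(local.with_name(local.name + ".local-backup"))
        local.parent.mkdir(parents=True, exist_ok=True)
        local.symlink_to(target, target_is_directory=True)
    return planned


if __name__ == "__main__" and len(sys.argv) == 3:
    # Dry run only, e.g.: python link_shared.py /opt/appian /mnt/appian-shared
    for line in link_shared_dirs(Path(sys.argv[1]), Path(sys.argv[2]), SHARED_DIRS):
        print(line)
```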

Both Kafka and Zookeeper are sensitive to latency with regard to CPU, memory, and disk contention. For high-load sites and any site that has multiple Kafka or Zookeeper instances, Appian recommends having enough CPUs on the machines that host these services such that they each have at least one CPU reserved for their use. For example, if you have the default 15 engines, Kafka, Zookeeper, and service manager all on a single server on a heavily-loaded system, that server should have at least 18 CPUs. Appian also recommends keeping the data directories for these two components (services/data/kafka-logs and services/data/zookeeper) on local disks rather than mounting them onto network drives. This recommendation is consistent with industry best practices for these services.
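The sizing rule in the paragraph above amounts to counting one CPU per service instance on each server. The Python sketch below reproduces the example's arithmetic; the engine labels are illustrative names for the 15 default engines in the shared-directory table, and the result is a floor, not a full capacity plan.

```python
from collections import Counter

# Illustrative labels for the 15 default engines listed in the shared-directory table.
DEFAULT_ENGINES = [
    "channels", "content", "collaboration-statistics", "forums",
    "notifications", "notifications-email", "personalization", "portal",
    "analytics-00", "analytics-01", "analytics-02", "process-design",
    "exec-00", "exec-01", "exec-02",
]


def min_cpus_per_server(placements: list[tuple[str, str]]) -> dict[str, int]:
    """Count one reserved CPU per (server, service instance) placement."""
    return dict(Counter(server for server, _service in placements))


# The heavily loaded single-server example from the text above.
single_server = [("node1", svc) for svc in DEFAULT_ENGINES + ["kafka", "zookeeper", "service-manager"]]
print(min_cpus_per_server(single_server))  # {'node1': 18}
```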

Shared Logs

In addition to the directories above, which must be shared across servers for the system to function, many administrators choose to share application logs between servers for ease of access. To do this, link the /logs directory on each local machine to the /shared-logs/<local machine name> directory on a network attached storage server, and add a link from APPIAN_HOME/shared-logs to the shared-logs directory on the network storage device.

Appian Health Check's data collection step will look for a directory named "shared-logs" directly inside the APPIAN_HOME directory and will collect logs inside any subdirectories found there.

APPIAN_HOME/shared-logs/<machine A>
APPIAN_HOME/shared-logs/<machine B>
APPIAN_HOME/shared-logs/<machine C>

With this shared logging configured, the data collection step of Health Check only needs to be run on a single server rather than run once on each server.
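Because Health Check collects only what it finds under APPIAN_HOME/shared-logs, it can be worth confirming that every machine's subdirectory is visible before relying on a single collection run. The Python sketch below is a hypothetical helper; the machine names you pass in are whatever names you used for the /shared-logs/<local machine name> directories.

```python
import sys
from pathlib import Path


def health_check_log_dirs(appian_home: Path) -> list[str]:
    """Subdirectories of APPIAN_HOME/shared-logs, i.e. what Health Check would collect."""
    shared = appian_home / "shared-logs"
    if not shared.is_dir():  # is_dir() follows the link to the network storage
        return []
    return sorted(p.name for p in shared.iterdir() if p.is_dir())


def missing_log_dirs(appian_home: Path, machines: list[str]) -> list[str]:
    """Machines whose log directory is not visible under APPIAN_HOME/shared-logs."""
    present = set(health_check_log_dirs(appian_home))
    return [m for m in machines if m not in present]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python check_shared_logs.py <APPIAN_HOME> machineA machineB machineC
    missing = missing_log_dirs(Path(sys.argv[1]), sys.argv[2:])
    print("missing:", ", ".join(missing) if missing else "none")
```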

Load Balancing Application Servers

Follow the steps for Load Balancing Multiple Application Servers to route traffic from your web servers across multiple application servers.

When deploying Appian via the configure script, ensure that the node names you enter in the "Configure Tomcat clustering by specifying a node name" step match the node names specified in the web server's configuration file.

How to Run a Distributed Installation

Starting

The procedure for starting a distributed installation of Appian is the same as for a non-distributed installation, except that you must start all instances of a given component, across all servers, before moving on to the next component. First make sure the RDBMS is running, then start all of the engines, then start all instances of the search server, and then start the application servers.

If the Appian engines run on different servers from Kafka and Zookeeper, either can be started first. The engines wait for Kafka and Zookeeper before they become available.

Stopping

The procedure for stopping a distributed installation of Appian is the same as for a non-distributed installation, except that you must stop all instances of a given component, across all servers, before moving on to the next component. First shut down all application servers, then shut down all instances of the search server, and then shut down all of the engines (using the --cluster option of the stop script).
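The ordering rule for starting and stopping (finish every instance of one component before moving to the next) can be expressed as a sequence of phases. In the Python sketch below, `run_on_host` is a hypothetical callback you would supply, for example a wrapper that connects to a server and invokes the appropriate Appian start or stop script (including the --cluster option when stopping the engines); the component labels are illustrative. Kafka and Zookeeper are left out because, when they run on separate servers, they may start before or after the engines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat
from typing import Callable

# Component order taken from the text above; labels are illustrative.
START_ORDER = ("rdbms", "engines", "search-server", "app-server")
STOP_ORDER = ("app-server", "search-server", "engines")

# Hypothetical callback: (host, component, action) -> None; raise on failure.
# For "rdbms" it might only verify that the database is already running.
RunOnHost = Callable[[str, str, str], None]


def run_phases(order, hosts_by_component: dict[str, list[str]], action: str,
               run_on_host: RunOnHost) -> None:
    """Act on every instance of one component, wait for all of them, then move on."""
    for component in order:
        hosts = hosts_by_component.get(component, [])
        if not hosts:
            continue
        with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
            # list() waits for each call and re-raises the first error it meets,
            # so a failure stops the sequence before the next component.
            list(pool.map(run_on_host, hosts, repeat(component), repeat(action)))


if __name__ == "__main__":
    topology = {
        "rdbms": ["db1"],
        "engines": ["node1", "node2", "node3"],
        "search-server": ["node1", "node2", "node3"],
        "app-server": ["node1", "node2", "node3"],
    }

    def show(host: str, component: str, action: str) -> None:
        print(f"{action} {component} on {host}")

    run_phases(START_ORDER, topology, "start", show)
    run_phases(STOP_ORDER, topology, "stop", show)
```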
