Status Script

Purpose

The status script displays a summary of the state of the Appian engines and associated services, along with a list of any active alerts.

Run this script periodically to monitor the state of the Appian engines.

Location

<APPIAN_HOME>/services/bin/status.sh (.bat)

Options

Short Name   Long Name            Required   Meaning
-h           --help               No         Show usage information
-p           --password           Yes        Password for Admin REST API
-c           --cluster            No         Show status of all nodes in cluster
-nc          --no-color           No         Turn off colored output
-wfr         --wait-for-running   No         Wait for all engines in topology to be in RUNNING state with leaders elected

Usage

./status.sh -p <password> -c
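
Because the script is meant to be run periodically, its output can be appended to a log file by a scheduler such as cron. The following is a minimal sketch under stated assumptions, not an Appian-provided tool: the log_status function, the log file path, and the ADMIN_PASSWORD variable are hypothetical names chosen for illustration. Only the -p and -nc options come from the Options section above.

```shell
#!/usr/bin/env bash
# Sketch: append the timestamped output of a command to a log file.
# log_status is a hypothetical helper, not part of Appian.
log_status() {
  local log_file="$1"
  shift
  {
    # Header line with a UTC timestamp, followed by the command's output.
    printf '=== %s ===\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    "$@" 2>&1
  } >> "$log_file"
}

# Example call (assumed path and variable name); -nc keeps color codes out of the log:
# log_status /var/log/appian-status.log ./status.sh -p "$ADMIN_PASSWORD" -nc
```

Passing the password on the command line can expose it to other users on the host, so restrict who can read the wrapper and the scheduler entry.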

Alerts

The alerts that the status script can report are listed below, with their possible causes and likely steps to resolve them.

Service Manager on host [host] is unreachable

The service manager instance on the specified host is not running, is not responsive, or cannot be reached over the network from the host where the status script ran. Restart the service manager instance if it is down, or unblock the network connection.

Kafka broker on [host] is unreachable

The Kafka broker instance on the specified host is not running, is not responsive, or cannot be reached over the network from the host where the status script ran. Restart the Kafka broker instance if it is down, or unblock the network connection.

Zookeeper on [host] is unreachable

The Zookeeper instance on the specified host is not running, is not responsive, or cannot be reached over the network from the host where the status script ran. Restart the Zookeeper instance if it is down, or unblock the network connection.
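
For the three "unreachable" alerts above, a quick way to separate a network block from a stopped service is to test the TCP port from the host where the status script ran. The sketch below is only an illustration: port_open is a hypothetical helper, it assumes bash (for /dev/tcp) and the coreutils timeout command, and the host and port in the example are placeholders rather than Appian defaults. A successful connection shows only that the port accepts connections, not that the service is healthy.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port accepts connections from this host.
# port_open is a hypothetical helper; it relies on bash's /dev/tcp and coreutils timeout.
port_open() {
  local host="$1" port="$2" wait_secs="${3:-3}"
  # The child shell exits non-zero if the connection is refused or times out.
  timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (host and port are placeholders; use the values configured for your site):
# if port_open kafka1.example.com "$KAFKA_PORT"; then echo reachable; else echo unreachable; fi
```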

Engine [engine name] does not have a primary

Every Appian engine needs an instance in a PRIMARY state in order to service requests from the rest of the application. If no instance is in a PRIMARY state, start an instance if none is running; otherwise, make sure that the other components of the system, such as Kafka and Zookeeper, are up and running.

The [engine name] engine on [host] has been checkpointing for [time period]

When checkpoints take a long time to complete, the likely causes are either very slow disk I/O speeds or other resource constraints like CPU utilization.

The [engine name] engine on [host] has not checkpointed in over [time period]

Running for a long time without checkpointing leaves the system at increased risk if disaster recovery is needed. Run the checkpoint script and make sure the checkpoint completes. If the checkpoint fails, confirm that the /services/data/temporary/, /services/data/archived/, and /server/**/gw1/ directories are writable by the user that Appian runs as.
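
The directory check described above can be scripted. This is a minimal sketch, assuming it is run as the same user that Appian runs as: check_writable is a hypothetical helper, and the example call assumes the listed directories sit under an APPIAN_HOME environment variable, which the text above does not state.

```shell
#!/usr/bin/env bash
# Sketch: report whether each given directory exists and is writable by the current user.
# check_writable is a hypothetical helper, not part of Appian.
check_writable() {
  local rc=0 dir
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      printf 'OK %s\n' "$dir"
    else
      printf 'NOT WRITABLE %s\n' "$dir"
      rc=1
    fi
  done
  # Non-zero return if any directory failed the check.
  return "$rc"
}

# Example (assumed location under APPIAN_HOME; run as the user Appian runs as):
# check_writable "$APPIAN_HOME/services/data/temporary" "$APPIAN_HOME/services/data/archived"
```

The gw1 directories can be passed to the same function once their full paths are known.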

The [engine name] engine on [host] has not checkpointed in over [transaction count] transactions

Running for a long time without checkpointing leaves the system at increased risk if disaster recovery is needed. Run the checkpoint script and make sure the checkpoint completes. If the checkpoint fails, confirm that the /services/data/temporary/, /services/data/archived/, and /server/**/gw1/ directories are writable by the user that Appian runs as.

The [engine name] engine on [host] has not checkpointed even though the estimated replay time is over [time period]

Running for a long time without checkpointing leaves the system at increased risk if disaster recovery is needed. Run the checkpoint script and make sure the checkpoint completes. If the checkpoint fails, confirm that the /services/data/temporary/, /services/data/archived/, and /server/**/gw1/ directories are writable by the user that Appian runs as.

Engine [engine name] on [host] has a load metric of [load metric] and the configured limit is [MAX_EXEC_ENGINE_LOAD_METRIC]

The load metric is a measure of how much process data each execution engine contains. When the amount of process data reaches the configured limit, new processes no longer start on that execution engine. When MAX_EXEC_ENGINE_LOAD_METRIC is reached, the options are either to delete processes that are running on that execution engine or to increase the configured load metric limit.

Kafka topic [topic name] in sync replica count [replica count] is below minimum threshold of [replica count], refer to documentation for further guidance. Brokers not in sync: [hosts]

Transaction data is not fully replicating throughout the Kafka cluster, leading to an increased risk of data loss in the event of a server failure. Restart the Kafka brokers listed at the end of the alert, one at a time, to force a re-sync.

This alert is only applicable to high-availability configurations.
