Troubleshooting with Diagnostic Scripts

Check Engine Script

To check the responsiveness, availability, and status of the Appian engine servers, run the check engine script by performing the following steps.

  1. Open <APPIAN_HOME>/server/_scripts/diagnostic/.
  2. Run checkengine.bat (.sh), either with no arguments or with an argument from the table below.
Argument      Description
gw-xxx.xml    The name of the application's gateway XML file, if you only want to query a specific gateway. By default, all gateways are queried.
-d            Requests detailed information for each gateway.
-s            Displays status summary information for each gateway.
-i            Suppresses the following warnings. The MAX_TIME_TO_CHK warning continues to trigger if needed. This argument is useful for sites that perform manual checkpoints rather than automated checkpointing.
                • TRANSACTION_SIZE_LIMIT_MB
                • TRANSACTION_COUNT_LIMIT
                • TRANSACTION_REPLAY_TIME_SECONDS
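The steps above can be scripted, for example from a monitoring job. The following Python sketch is hypothetical: build_checkengine_command and run_checkengine are illustrative names, and /opt/appian stands in for <APPIAN_HOME>. The sketch also makes a few assumptions:

  • checkengine.sh is executable.
  • Gateway files can be recognized by their .xml suffix.
  • The script is run from the diagnostic directory. This is needed because the batch version calls ..\exports.bat by a relative path.

```python
import subprocess
from pathlib import PurePosixPath, PureWindowsPath

# Flags documented for the check engine script. Gateway files are
# recognized here (an assumption) by their .xml suffix.
KNOWN_FLAGS = {"-d", "-s", "-i"}


def build_checkengine_command(appian_home, windows, *args):
    """Return (command, working_dir) for running the check engine script."""
    path_cls = PureWindowsPath if windows else PurePosixPath
    diag_dir = path_cls(appian_home, "server", "_scripts", "diagnostic")
    for arg in args:
        if arg not in KNOWN_FLAGS and not arg.endswith(".xml"):
            raise ValueError(f"unexpected argument: {arg}")
    if windows:
        # cmd /c runs the batch file through the Windows command interpreter.
        command = ["cmd", "/c", str(diag_dir / "checkengine.bat"), *args]
    else:
        # Assumes checkengine.sh has its executable bit set.
        command = [str(diag_dir / "checkengine.sh"), *args]
    # The batch script calls ..\exports.bat by a relative path, so the
    # script is run from the diagnostic directory.
    return command, str(diag_dir)


def run_checkengine(appian_home, windows, *args):
    """Run the script and return its standard output as text."""
    command, cwd = build_checkengine_command(appian_home, windows, *args)
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return result.stdout
```

For example, run_checkengine("/opt/appian", False, "-s") would request the status summary on a Linux host.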

Script Output - Summary Mode

Running the check engine script in the default mode (without any arguments and without listing any gateway configuration files) provides a current status summary for all application gateways.

The script output displays a status (OKAY, ERROR, WARNING, or FATAL error) for each application gateway. If the status is not OKAY, then an error description is also listed.

When an engine is down, the script reports the status FATAL. The script can also report FATAL when an engine is checkpointing or busy. If you receive a negative result, we recommend running the script again after a short delay to confirm that the problem is real rather than transient.
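Because a negative result can be transient, a monitoring job might re-run the check before raising an alert. This Python sketch assumes a hypothetical check callable that runs the script and returns a status string such as OKAY or FATAL. The 30-second delay and two retries are arbitrary choices, not Appian defaults.

```python
import time
from typing import Callable


def confirm_status(check: Callable[[], str], retries: int = 2,
                   delay_seconds: float = 30.0, sleep=time.sleep) -> str:
    """Run check(); if it is not OKAY, re-run it after a delay.

    A non-OKAY status is returned only if it persists on every attempt.
    The sleep parameter exists so the delay can be replaced in tests.
    """
    status = check()
    for _ in range(retries):
        if status == "OKAY":
            break
        sleep(delay_seconds)
        status = check()
    return status
```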

The check engine script might display any of the errors and messages listed below.

Error type: WARN

  • WRONG_DBID
    Both the gateway process and the server reference the ID of the current real-time database file (KDB). This warning indicates that the ID values no longer match. The gateway process sets this ID when the server has finished booting, so a mismatch likely means that the database did not boot correctly (for example, because of a transaction replay error or an engine migration failure).
  • TEMP_CHECK_FAIL
    Indicates intermittent checkpointing failures. It occurs when some previous checkpoint attempts failed but the most recent attempt succeeded.
  • NEAR_LOAD_METRIC_LIMIT
    Indicates that an execution engine is approaching the load metric limit.
    See also: MAX_EXEC_ENGINE_LOAD_METRIC
  • MAX_TIME_TO_CHK
    Indicates that an engine has not checkpointed within the monitoring period (24 hours by default). When this warning is triggered, we recommend running the checkpoint script.
  • TRANSACTION_SIZE_LIMIT_MB
    Indicates that the transactions in an engine have reached the monitored amount of memory (512 MB by default). When this warning is triggered, we recommend running the checkpoint script. Running the script with -i suppresses this warning.
  • TRANSACTION_COUNT_LIMIT
    Indicates that the number of transactions stored in memory for an engine has reached the monitored threshold (100000 by default). When this warning is triggered, we recommend running the checkpoint script. Running the script with -i suppresses this warning.
  • TRANSACTION_REPLAY_TIME_SECONDS
    Indicates that the estimated time (in seconds) that an engine would spend replaying transactions after a restart has reached the monitored threshold (1800 by default). An engine at the default threshold is expected to take about 30 minutes to start up. When this warning is triggered, we recommend running the checkpoint script. Running the script with -i suppresses this warning.
  • Missing users: ,`UserC ,`UserB ,`UserA
    Indicates that the listed users are registered with other engines but not with the engine reporting them as missing. This warning can occur briefly and transiently when new users are added to the system. If you encounter it, wait at least 60 seconds and run the diagnostic again before opening a case with Appian Technical Support.
  • Missing content: ,`RuleC ,`RuleB ,`RuleA
    Indicates that the listed rules and constants are registered with other engines but not with the engine reporting them as missing. If you encounter it, wait at least 60 seconds and run the diagnostic again before opening a case with Appian Technical Support.
  • Missing types: ,`TypeC ,`TypeB ,`TypeA
    Indicates that the listed types are registered with other engines but not with the engine reporting them as missing. If you encounter it, wait at least 60 seconds and run the diagnostic again before opening a case with Appian Technical Support.

Error type: ERROR

  • SERVER_TIMEOUT
    The server did not respond within the specified timeout period.
  • CHECKPOINT_FAIL
    All attempts to perform a checkpoint have failed.
  • DETECT_SHUTDOWN
    The engine server restarted due to a transaction rollback.
  • PAST_LOAD_METRIC_MAX
    The execution engine is past the calculated load metric limit. No new processes can be started.
    See also: MAX_EXEC_ENGINE_LOAD_METRIC

Error type: FATAL

  • BROKEN_CHAIN
    Can occur only if multiple gateways are configured. It indicates that transaction updates are not replicating, which might happen during a network outage if the gateways reside on separate machines.
  • CONT_SHUTDOWN
    Transactions are repeatedly being rolled back, causing the engine to enter a restart loop.

See also: Running the Checkpoint Script
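The transaction and checkpoint warnings above are threshold comparisons against the configured defaults. The following Python sketch illustrates that logic, including the effect of -i. EngineStats and checkpoint_warnings are hypothetical names. Two things are assumptions: how the script actually collects these values, and whether it compares with > or >=.

```python
from dataclasses import dataclass
from typing import List

# Documented defaults from the checkengine script configuration.
TRANSACTION_SIZE_LIMIT_MB = 512
TRANSACTION_COUNT_LIMIT = 100000
TRANSACTION_REPLAY_TIME_SECONDS = 1800
MAX_TIME_TO_CHK = 24  # hours


@dataclass
class EngineStats:
    """Hypothetical snapshot of one engine's checkpoint-related values."""
    transaction_size_mb: float
    transaction_count: int
    estimated_replay_seconds: float
    hours_since_checkpoint: float


def checkpoint_warnings(stats: EngineStats,
                        ignore_transaction_limits: bool = False) -> List[str]:
    """Return the names of the warnings these values would trigger."""
    warnings = []
    # MAX_TIME_TO_CHK still triggers when the script is run with -i.
    if stats.hours_since_checkpoint > MAX_TIME_TO_CHK:
        warnings.append("MAX_TIME_TO_CHK")
    # Running the script with -i suppresses the three transaction warnings.
    if not ignore_transaction_limits:
        if stats.transaction_size_mb >= TRANSACTION_SIZE_LIMIT_MB:
            warnings.append("TRANSACTION_SIZE_LIMIT_MB")
        if stats.transaction_count >= TRANSACTION_COUNT_LIMIT:
            warnings.append("TRANSACTION_COUNT_LIMIT")
        if stats.estimated_replay_seconds >= TRANSACTION_REPLAY_TIME_SECONDS:
            warnings.append("TRANSACTION_REPLAY_TIME_SECONDS")
    return warnings


# 1800 seconds of estimated replay corresponds to 30 minutes.
print(TRANSACTION_REPLAY_TIME_SECONDS / 60)  # 30.0
```

The 1800-second default replay threshold works out to 1800 / 60 = 30 minutes, consistent with the expected 30-minute start-up described for TRANSACTION_REPLAY_TIME_SECONDS.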

Configuring the Check Engine Script

You can configure the check engine script by editing the parameters in the checkengine.bat (.sh) script located at <APPIAN_HOME>/server/_scripts/diagnostic.

The Windows batch file is shown below with its configurable parameters. (Similar properties can be set in the exports.sh script.)

@echo off
call ..\exports.bat
set OLDCD=%CD%
cd %AE_SVR%\gateway

REM ###########################
REM
REM   Customizable arguments
REM
REM ###########################

REM Port number used by the Main application of Check Engine 
 set PORT=33333      
 
REM Time (seconds) to wait in contacting a gateway before timing out
 set TIMEOUT=30      

REM The number of checkpoint failures to display per *APPLICATION*, not per gateway. 0 means show all.
 set MAX_LOGS_CHECK=10   

REM The number of tailpoint failures to display per *APPLICATION*, not per gateway. 0 means show all.
 set MAX_LOGS_TAIL=10    

REM The number of engine shutdowns (possible rollbacks, write_to_disk_failures) to display per *APPLICATION*, not per gateway. 0 means show all.
 set MAX_LOGS_CLOSE=10   

REM The maximum amount of time (in hours) that an engine can remain without checkpointing before a warning is generated. 
 set MAX_TIME_TO_CHK=24  

REM The percentage of the maximum load metric that Exec and Analytics engines can reach before a warning is generated.
 set WARN_LOAD_METRIC=75

REM Specify the amount of RAM that transactions in a dual or triple gateway engine can consume before the checkengine script generates a warning. The default setting is 512MB.
 set TRANSACTION_SIZE_LIMIT_MB=512

REM Specify how many transactions stored in RAM (for a dual or triple gateway engine) can accumulate before a warning is generated. The default setting is 100000.
 set TRANSACTION_COUNT_LIMIT=100000

REM Specify the amount of time in seconds that a (dual or triple gateway) engine is anticipated to spend replaying transactions in the event of a restart. A warning is generated when this calculated value reaches the specified threshold.
 set TRANSACTION_REPLAY_TIME_SECONDS=1800

set CUSTOM_ARGS=%TIMEOUT% %MAX_LOGS_CHECK% %MAX_LOGS_TAIL% %MAX_LOGS_CLOSE% %MAX_TIME_TO_CHK% %WARN_LOAD_METRIC%  %TRANSACTION_SIZE_LIMIT_MB% %TRANSACTION_COUNT_LIMIT% %TRANSACTION_REPLAY_TIME_SECONDS%

REM ###########################
REM
REM   Severity Levels
REM
REM ###########################

 set WARN=1
 set ERROR=2
 set FATAL=3



REM ###########################
REM
REM   Issue Severity
REM
REM ###########################
 
 set BROKEN_CHAIN=%FATAL%

 set SERVER_TIMEOUT=%ERROR%
 
 set NO_INSTANCE_UP=%FATAL%
 
 set WRONG_DBID=%WARN%
 
 set CHECKPOINT_FAIL=%ERROR%

REM Intermittent checkpoint failures
 set TEMP_CHECK_FAIL=%WARN% 

 set SHOULD_CHECKPT=%WARN%

REM Continuous shutdowns of an engine
 set CONT_SHUTDOWN=%FATAL%  

REM Non-continuous shutdowns
 set DETECT_SHUTDOWN=%ERROR%     

 set NEAR_LOAD_METRIC_MAX=%WARN%

 set PAST_LOAD_METRIC_MAX=%ERROR%

REM An Exec server has stopped updating Analytics
 set NO_EXEC_UPDATE=%ERROR%
 
set SEVERITY= %BROKEN_CHAIN% %SERVER_TIMEOUT% %NO_INSTANCE_UP% %WRONG_DBID% %CHECKPOINT_FAIL% %TEMP_CHECK_FAIL% %SHOULD_CHECKPT% %CONT_SHUTDOWN% %DETECT_SHUTDOWN% %NEAR_LOAD_METRIC_MAX% %PAST_LOAD_METRIC_MAX% %NO_EXEC_UPDATE% 

 %APPIAN_EXEC% checkEngineMain -i %PORT% %CUSTOM_ARGS% %SEVERITY% %1 %2 %3 %4 %5 %6 %7 %8 %9

@cd %OLDCD%
@echo on
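The script's "Severity Levels" and "Issue Severity" sections map each issue to WARN (1), ERROR (2), or FATAL (3). WARN_LOAD_METRIC sets the near-limit threshold as a percentage of the maximum load metric. The following Python sketch mirrors those settings and makes two assumptions:

  • A gateway's reported status is its most severe detected issue.
  • The near-limit check is load >= WARN_LOAD_METRIC percent of the maximum.

load_metric_issue and overall_status are hypothetical helpers, not part of the Appian scripts.

```python
from typing import Iterable, Optional

# Severity levels from the script's "Severity Levels" section.
WARN, ERROR, FATAL = 1, 2, 3

# Issue-to-severity mapping copied from the script's "Issue Severity" section.
ISSUE_SEVERITY = {
    "BROKEN_CHAIN": FATAL,
    "SERVER_TIMEOUT": ERROR,
    "NO_INSTANCE_UP": FATAL,
    "WRONG_DBID": WARN,
    "CHECKPOINT_FAIL": ERROR,
    "TEMP_CHECK_FAIL": WARN,
    "SHOULD_CHECKPT": WARN,
    "CONT_SHUTDOWN": FATAL,
    "DETECT_SHUTDOWN": ERROR,
    "NEAR_LOAD_METRIC_MAX": WARN,
    "PAST_LOAD_METRIC_MAX": ERROR,
    "NO_EXEC_UPDATE": ERROR,
}

# Status labels as listed in the summary-mode output description.
LABELS = {0: "OKAY", WARN: "WARNING", ERROR: "ERROR", FATAL: "FATAL"}


def load_metric_issue(load: float, max_load: float,
                      warn_percent: float = 75) -> Optional[str]:
    """Classify a load metric reading; warn_percent mirrors WARN_LOAD_METRIC."""
    if load > max_load:
        return "PAST_LOAD_METRIC_MAX"
    if load >= max_load * warn_percent / 100:
        return "NEAR_LOAD_METRIC_MAX"
    return None


def overall_status(issues: Iterable[str]) -> str:
    """Assume a gateway reports the most severe of its detected issues."""
    level = max((ISSUE_SEVERITY[name] for name in issues), default=0)
    return LABELS[level]
```

Under these assumptions, changing an issue's level in the script (for example, set WRONG_DBID=%ERROR%) would change the status reported when that issue is detected.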

Other Troubleshooting Scripts

Rollbacks

If a transaction brings the data repositories to an inconsistent state, Appian services undo that transaction to restore the previous consistent state. These transactions are written to a rollback log file in the <APPIAN_HOME>/logs/ directory. Use the <APPIAN_HOME>/server/_scripts/diagnostic/convert_l_to_text.bat (.sh) script to convert these logs to text for review.

A Service Cannot Restart

Working with Appian Technical Support, use the <APPIAN_HOME>/server/_scripts/diagnostic/validate.bat (.sh) script to verify the integrity of an Appian engine.

See also: Database Integrity

Disk Usage

You can determine the disk usage of specific process models, process instances, and nodes in a process using sizing.bat (.sh).

See also: Identifying Process Memory Usage
