Configuring Checkpointing

Appian engines are real-time in-memory (RAM) databases that also persist all of their data (the database and its transaction log) to disk. The database files have the extension .kdb. We refer to them as KDB files, and they must be treated as a set. Each write transaction is applied immediately to the engine in memory and recorded in the transaction log.

When a checkpoint completes, the transaction log is applied to the database and then cleared. Each checkpoint writes a new KDB file with a new number appended to the file name. For example, a personalization engine KDB file that has never been checkpointed is named ag0.kdb; after a checkpoint, a new KDB file named ag1.kdb is written. KDB file numbers may increase by more than one per checkpoint, but a file with a higher number is always newer than a file with a lower number. When a checkpoint occurs, the old KDB file is left unmodified and moved to the services/data/archived directory. Schedule the cleanupArchives script to run daily to remove older KDB files from this archive directory.
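Because numbers can skip values but always increase, the newest KDB file is the one with the highest number, compared numerically rather than as text. The sketch below assumes file names of the form prefix + number + ".kdb", as in the ag0.kdb / ag1.kdb example; the helper names are hypothetical.

```python
import re

# Assumption: KDB files are named <prefix><number>.kdb, e.g. ag0.kdb, ag1.kdb.
KDB_NAME = re.compile(r"^(?P<prefix>.*?)(?P<number>\d+)\.kdb$")


def kdb_number(name: str) -> int:
    """Return the checkpoint number embedded in a KDB file name."""
    match = KDB_NAME.match(name)
    if match is None:
        raise ValueError(f"not a KDB file name: {name!r}")
    return int(match.group("number"))


def newest_kdb(names: list[str]) -> str:
    """Pick the most recent KDB file by comparing numbers, not strings.

    Numbers may skip values between checkpoints, so only their order matters.
    A plain string comparison would wrongly rank "ag9.kdb" above "ag10.kdb".
    """
    return max(names, key=kdb_number)
```

For example, `newest_kdb(["ag0.kdb", "ag1.kdb", "ag4.kdb"])` returns `"ag4.kdb"` even though ag2 and ag3 never existed.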

During a KDB checkpoint, a temporary *.writing file is created in the services/data/temporary directory and is deleted when the checkpoint completes. If the checkpoint fails for any reason, the *.writing file may remain on the file system. Pending transactions either wait in the queue or time out, depending on how checkpointing is configured.
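To spot a possibly failed checkpoint, you can look for *.writing files that remain when no checkpoint is running (a file seen while a checkpoint is in progress is expected). This is a minimal sketch that assumes services/data/temporary sits under the installation directory, written here as a hypothetical `appian_home` argument.

```python
from pathlib import Path


def leftover_writing_files(appian_home: str) -> list[Path]:
    """List *.writing files in <appian_home>/services/data/temporary.

    A completed checkpoint deletes its *.writing file, so any file returned
    while no checkpoint is running may be left over from a failed checkpoint.
    """
    # Assumption: the temporary directory is relative to the Appian home directory.
    temp_dir = Path(appian_home) / "services" / "data" / "temporary"
    if not temp_dir.is_dir():
        return []
    return sorted(temp_dir.glob("*.writing"))
```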

While an engine is checkpointing, it cannot service transactions from the application server; those transactions wait in the queue until the checkpoint completes or they time out. If there are multiple instances of an engine, only the instance that is checkpointing becomes unavailable, and the remaining instances continue to service requests. If transactions time out during checkpointing, Appian recommends adding more instances of the relevant engines.

For sites that only run one instance of each engine, Appian recommends checkpointing outside of regular business hours to avoid disruption of service. For sites that have multiple instances of each engine, Appian recommends keeping the default checkpoint scheduling configurations.

A proper shutdown of the engines automatically completes a checkpoint. An improper shutdown (such as an abrupt stop using the kill command, a power outage, or an OS shutdown) of the engines does not complete a checkpoint.

When an Appian engine starts up, it loads the KDB file into memory and applies any transactions from the transaction log that had not already been applied to that KDB file.

If a long time has passed since the engines were last checkpointed, many transactions may need to be replayed from the transaction log at startup, which increases startup time.

An engine uses its memory both to store data and to process it.

Running the Checkpoint Script

The checkpoint script (checkpoint.sh on Linux, checkpoint.bat on Windows) can be run manually from the following location:

<APPIAN_HOME>/services/bin/
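If you call the script from your own tooling, for example a scheduled task, a small helper can choose the right file for the platform. This is a sketch only: it builds the path and does not run the script, and the function name and example directories are hypothetical.

```python
import sys
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from typing import Optional


def checkpoint_script(appian_home: str, windows: Optional[bool] = None) -> PurePath:
    """Return the checkpoint script path under <APPIAN_HOME>/services/bin/.

    checkpoint.bat is used on Windows and checkpoint.sh everywhere else.
    When `windows` is None, the current platform decides.
    """
    if windows is None:
        windows = sys.platform.startswith("win")
    if windows:
        return PureWindowsPath(appian_home, "services", "bin", "checkpoint.bat")
    return PurePosixPath(appian_home, "services", "bin", "checkpoint.sh")
```

For example, `checkpoint_script("/opt/appian", windows=False)` yields `/opt/appian/services/bin/checkpoint.sh`.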

Configuring Checkpointing Frequency

Appian schedules checkpoints automatically based on the amount of time since the last checkpoint and the estimated amount of time it would take to replay the transactions since the last checkpoint.

To adjust the thresholds, set the following values in custom.properties:

Threshold | Property | Default value | Max value
Time since the last successful checkpoint | serviceManager.checkpoint.automatic.boundary.time | 22 hours | 30 hours
Estimated time to replay the transaction log since the last successful checkpoint | serviceManager.checkpoint.automatic.boundary.replay | 20 minutes | 30 hours
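The two thresholds can be read as a simple trigger rule. The sketch below assumes that a checkpoint becomes due as soon as either threshold is reached (and uses "reached" to mean greater than or equal); Appian's actual scheduler logic and its replay-time estimate are not described here, so treat this as an illustration only.

```python
from datetime import timedelta

# Defaults from the table above; both properties accept at most 30 hours.
DEFAULT_TIME_BOUNDARY = timedelta(hours=22)
DEFAULT_REPLAY_BOUNDARY = timedelta(minutes=20)


def checkpoint_due(
    since_last_checkpoint: timedelta,
    estimated_replay: timedelta,
    time_boundary: timedelta = DEFAULT_TIME_BOUNDARY,
    replay_boundary: timedelta = DEFAULT_REPLAY_BOUNDARY,
) -> bool:
    """Assumed rule: a checkpoint is due once EITHER threshold is reached."""
    return since_last_checkpoint >= time_boundary or estimated_replay >= replay_boundary
```

Under this reading, a busy engine can be checkpointed well before 22 hours have passed if its transaction log grows large enough to take 20 minutes to replay.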

Specify the time-based values in the table above in the format "number units", for example "5 minutes" or "12 hours". If no unit is specified, the value is interpreted as milliseconds.
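To check what a value means before restarting, you can convert it to milliseconds. This is a hypothetical helper, not part of Appian: the documentation only shows "minutes" and "hours", so the other unit names and the singular/plural handling are assumptions.

```python
import re

# Assumed unit names; only "minutes" and "hours" appear in the documentation.
_UNIT_MS = {
    "millisecond": 1,
    "second": 1_000,
    "minute": 60_000,
    "hour": 3_600_000,
}
MAX_BOUNDARY_MS = 30 * _UNIT_MS["hour"]  # documented maximum for both properties
_VALUE = re.compile(r"^\s*(\d+)\s*([a-z]*)\s*$", re.IGNORECASE)


def to_milliseconds(value: str) -> int:
    """Convert a threshold such as "12 hours" to milliseconds.

    A bare number has no unit and is read as milliseconds.
    """
    match = _VALUE.match(value)
    if match is None:
        raise ValueError(f"expected 'number units', got {value!r}")
    number, unit = int(match.group(1)), match.group(2).lower()
    if not unit:
        return number
    singular = unit[:-1] if unit.endswith("s") else unit  # "hours" -> "hour"
    if singular not in _UNIT_MS:
        raise ValueError(f"unknown unit {unit!r}")
    return number * _UNIT_MS[singular]
```

For instance, "12 hours" converts to 43,200,000 ms, while a bare "1500" stays 1,500 ms, so leaving off the unit by mistake produces a threshold of only 1.5 seconds.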

A restart is required for changes to these properties to take effect.

Automatic checkpointing cannot be disabled, but you can configure these thresholds so that checkpoints are not triggered during business hours. Raise both values in the table above to the maximum (30 hours), then run the checkpoint script at an off-peak time of day using a cron job or a scheduled task. Each manual checkpoint resets the criteria that the automatic checkpoint scheduler uses to trigger checkpoints.
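The timing works because a daily manual run always comes before the 30-hour time threshold. The sketch below checks only the time-based threshold; the replay-based threshold is independent and can still trigger a checkpoint on a heavily loaded engine. The function names and dates are illustrative.

```python
from datetime import datetime, timedelta

MAX_BOUNDARY = timedelta(hours=30)  # documented maximum for both thresholds


def next_time_trigger(last_checkpoint: datetime, boundary: timedelta = MAX_BOUNDARY) -> datetime:
    """When the time-based threshold would fire if no other checkpoint ran."""
    return last_checkpoint + boundary


def manual_run_preempts(last_checkpoint: datetime, next_manual: datetime,
                        boundary: timedelta = MAX_BOUNDARY) -> bool:
    """True if the next scheduled manual checkpoint runs before the time-based trigger."""
    return next_manual < next_time_trigger(last_checkpoint, boundary)


# A manual checkpoint at 02:00 every day: the next run is 24 hours later,
# which is before the 30-hour trigger (08:00 the following day).
last = datetime(2024, 1, 1, 2, 0)
print(manual_run_preempts(last, last + timedelta(days=1)))
```

With the default 22-hour threshold instead, the time-based trigger would fire two hours before the next daily manual run, which is why the values are raised to the maximum.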
