Configuring the Data Service

Beginning with Appian 25.4, all new versions of Appian will require containers managed by Kubernetes to run in a self-managed environment. Review the 24.4 release notes and frequently asked questions to learn more.

Overview

This page provides information for self-managed customers configuring the data service. The data service topology and setup are required for running an Appian site. If the data service is unreachable when the application starts up, or if the application is started before the data service is running, the application server will not start. Additional configurations can be used to scale the data service to support heavier use of synced record types.

Topology and Setup

The data service configuration is specified in the data-server-cluster element of the appian-topology.xml file, located in both the <APPIAN_HOME>/ae/data-server/conf/ and <APPIAN_HOME>/ae/conf/ directories:

<topology>
    ...
    <data-server-cluster>
        <data-server host="machine1.example.com" port="5400" rts-count="2"/>
    </data-server-cluster>
</topology>

For a high availability configuration, specify three instances of the data service on different machines:

<topology>
    ...
    <data-server-cluster>
        <data-server host="machine1.example.com" port="5400" rts-count="2"/>
        <data-server host="machine2.example.com" port="5400" rts-count="2"/>
        <data-server host="machine3.example.com" port="5400" rts-count="2"/>
    </data-server-cluster>
</topology>

You must specify the data-server-cluster configuration in appian-topology.xml. Copy the appian-topology.xml file from <APPIAN_HOME>/conf/ into <APPIAN_HOME>/data-server/conf/ before starting the data service. Copies of the topology file in each location must always be in sync, irrespective of the configurations specified.
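
The requirement that both copies stay in sync can be checked mechanically. The following is a minimal sketch, not an Appian-provided tool: the function names are hypothetical, and the caller supplies the paths to each copy of appian-topology.xml. It compares the copies byte for byte by hashing them.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def topology_in_sync(paths: Iterable[Path]) -> bool:
    """Return True when every listed copy of the topology file is byte-identical."""
    digests = {file_digest(Path(p)) for p in paths}
    return len(digests) == 1


# Hypothetical usage: pass the two copies described above, for example
# topology_in_sync([conf_copy, data_server_conf_copy]), before starting
# the data service.
```

A byte-level comparison is deliberately strict: it also flags whitespace-only differences, which is a reasonable default when the copies are meant to be identical.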

Host

Set the host attribute in the data-server element to the name of the machine hosting the data service. If the host is not specified, the following error is printed in watchdog.log:

"ERROR com.appian.data.server.Watchdog - data-server host must be specified"

Port

You can define the port attribute on both the data-server-cluster and the data-server elements. If the port attribute is defined on both, the data-server element takes precedence. If not supplied, the default port number is 5400.
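
The precedence rule can be illustrated with a short sketch. It is an illustration of the rule as described here, not the data service's own parsing code; the function name and the sample host names are hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_PORT = 5400  # default stated in this section


def effective_ports(topology_xml: str) -> dict:
    """Map each data-server host to the port it would use.

    Precedence: data-server port, then data-server-cluster port,
    then the default.
    """
    root = ET.fromstring(topology_xml)
    cluster = root.find("data-server-cluster")
    if cluster is None:
        return {}
    cluster_port = cluster.get("port")
    ports = {}
    for node in cluster.findall("data-server"):
        host = node.get("host")
        if not host:
            # The Host section says a missing host is an error.
            raise ValueError("data-server host must be specified")
        ports[host] = int(node.get("port") or cluster_port or DEFAULT_PORT)
    return ports


sample = """
<topology>
    <data-server-cluster port="5500">
        <data-server host="machine1.example.com" port="5401"/>
        <data-server host="machine2.example.com"/>
    </data-server-cluster>
</topology>
"""
print(effective_ports(sample))
```

In the sample, machine1 keeps its own port 5401, while machine2 inherits 5500 from the cluster element.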

Caution: In order for the data service to function properly, make sure that you open all of the required ports. See Port Usage for more information.

Licensing

A valid license (k4.lic) is required to run the data service. See Requesting and Installing a License for information on obtaining and installing a k4.lic license.

Security

Requests to the data service are secured with a security token that's unique to every customer environment:

For Appian Cloud customers, this token is generated during the site deployment.
For self-managed customers, this token is generated by the configure script.

If the token has not been set properly, the data service will not start, which will result in the application server not starting. See Data Service Connection Restrictions for more information.

Changing the topology

Topology changes can include:

Add or remove data service nodes.
Change the host name.
Change the port.
Change the real-time store count.

To make these changes, update the data-server-cluster element and restart both the data service and the application server by following these steps:

1. Stop the data service on all of the servers:
   - <APPIAN_HOME>/data-server/bin/stop.sh (.bat on Windows)
2. Delete the <APPIAN_HOME>/ae/data-server/node/election directory from all the servers. This directory contains runtime data that needs to be deleted when certain topology changes are made.
3. If adding nodes (for example, when migrating from a single node to High Availability), copy the <APPIAN_HOME>/ae/data-server/data directory to the new servers.
4. Make the required topology change on all of the servers.
5. Start the data service on all of the servers in any order:
   - <APPIAN_HOME>/data-server/bin/start.sh (.bat on Windows)
6. Restart the application server on all servers:
   - To stop the server: <APPIAN_HOME>/tomcat/apache-tomcat/bin/stop-appserver.sh (.bat on Windows)
   - To start the server: <APPIAN_HOME>/tomcat/apache-tomcat/bin/start-appserver.sh (.bat on Windows)
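
The ordering of the steps above matters, so it can help to write the plan down before touching any server. The sketch below only builds a checklist of strings and executes nothing. The function name and the /opt/appian location are hypothetical; the relative paths are copied from the steps above and assume the .sh scripts.

```python
from pathlib import PurePosixPath


def topology_change_plan(appian_home: str, adding_nodes: bool = False) -> list:
    """Return the topology-change steps, in order, as human-readable strings."""
    home = PurePosixPath(appian_home)
    steps = [
        f"run {home / 'data-server/bin/stop.sh'} on every server",
        f"delete {home / 'ae/data-server/node/election'} on every server",
    ]
    if adding_nodes:
        # Only needed when new data service nodes join the cluster.
        steps.append(f"copy {home / 'ae/data-server/data'} to each new server")
    steps += [
        "edit data-server-cluster in appian-topology.xml on every server",
        f"run {home / 'data-server/bin/start.sh'} on every server (any order)",
        f"run {home / 'tomcat/apache-tomcat/bin/stop-appserver.sh'} on every server",
        f"run {home / 'tomcat/apache-tomcat/bin/start-appserver.sh'} on every server",
    ]
    return steps


for number, step in enumerate(topology_change_plan("/opt/appian", adding_nodes=True), 1):
    print(f"{number}. {step}")
```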

A data-server node element must be present and must be structured similarly to the following:

<data-server host="machine3.example.com" port="5400" rts-count="2"/>

If there is no data-server node element specified, the following error is printed in watchdog.log:

"ERROR com.appian.data.server.Watchdog - At least one data-server node must be specified"

Monitoring and Troubleshooting

Starting and stopping

To start or stop the data service, refer to Starting and Stopping Appian.

Tip: When a user logs out of Windows, any data service process that the user started with the script stops.

Consider installing the data service as a Windows service and using the Windows Service management console to start and stop the service. For instructions, see Installing the data service as a Windows Service.

Logging

Each component of the data service writes its logs to the <APPIAN_HOME>/logs/data-server/ directory:

Historical Store: hs-gateway.log
Real-time Store: rts-gateway-*.log
Appender: appender-gateway.log
Bulk Ingestion: binge-gateway.log
Data Client: client.log
Watchdog: watchdog.log

The log files contain important information about startup and shutdown, process execution, configuration, and errors. In the event of a system issue, share these files with Appian Support. Note that the real-time store logs are numbered per component: rts-gateway-0.log, rts-gateway-1.log, and so on.

Note: The <APPIAN_HOME>/logs/data-server/ directory will always be free of any customer business data, and can be safely exported without any risk of exposing sensitive data.

The data service also logs other data, including performance metrics and traces. See Logging for a more comprehensive overview of Appian logs.

Recovery and monitoring

Watchdog continuously monitors each component of the data service and restores functionality of each component in the event of an isolated failure.

To validate that the data service is running correctly, execute the <APPIAN_HOME>/data-server/bin/health.sh script (health.bat on Windows). The following information is displayed after executing the health script:

For the data service cluster:

node_count: Number of nodes in the cluster.
healthy: true if the data service is functioning normally, otherwise false.

For each node in the data service cluster:

hostname: Host name of the node.
ip: IP address of the node.
healthy: true if the data service is functioning normally on this node, otherwise false.

Transaction Log

The data service uses the Kafka topic ads2_tx_effects to persist and distribute transactions. The default retention time is one hour, but you can change it by adding kfk.trunc.buffer.seconds to <APPIAN_HOME>/data-server/conf/custom.properties and setting it to the desired retention time, in seconds.
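
Because the property is expressed in seconds, it is easy to be off by a factor of 60. The following sketch converts a duration into the line to add to custom.properties. The helper name is hypothetical, the key=value line format is assumed from the usual Java properties convention, and the six-hour value is only an example, not a recommendation.

```python
from datetime import timedelta


def retention_property(retention: timedelta) -> str:
    """Format a retention period as a kfk.trunc.buffer.seconds property line."""
    seconds = int(retention.total_seconds())
    if seconds <= 0:
        raise ValueError("retention must be a positive duration")
    return f"kfk.trunc.buffer.seconds={seconds}"


print(retention_property(timedelta(hours=6)))  # kfk.trunc.buffer.seconds=21600
```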

File system

The <APPIAN_HOME>/ae/data-server/ directory stores the data service binaries, scripts, configuration details, and data.

The data files are located in the <APPIAN_HOME>/ae/data-server/data/ directory. The ss folder contains all snapshot files. Because access to the data service is latency-sensitive, host the data locally on the machine rather than on a shared or external drive, such as shared network-attached storage (NAS). This also applies to High Availability (HA) topologies, since each data service node stores its own version of the data.

For disaster recovery purposes, the <APPIAN_HOME>/data-server/data/ directory and the Kafka logs should be backed up regularly.

See Internal Data for a comprehensive overview of where Appian persists data on the file system.

Troubleshooting

If the data service cannot start and the watchdog.log indicates an issue with the security token, see Data Service Connection Restrictions for troubleshooting.

If the data service stops running while the application server is running, any record types with data sync enabled will be temporarily inaccessible. See Troubleshooting Data Sync for more information.

Sizing guidance

After configuring the data service, the amount of disk space and memory consumed by the data service will vary based on data volumes and usage patterns.

Disk space

After it is started for the first time in your environment, the data service is expected to take approximately 50 MB of disk space by default. If a site is not syncing record data, the data service will not occupy additional disk space.

Any additional disk space usage by the data service is expected to be proportional to the total amount of data synced into Appian. Data synced into the data service is compressed to optimize for storage efficiency. The exact disk usage will vary depending on the compression ratio and the current state of data sync activities and data service background processes running on the site.

Note: Background processing in the data service requires sufficient disk space to be available. When over 90% of disk space is used on a site, the data service will halt background processing until sufficient space is cleared or additional disk space is added.
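
A monitoring job can watch for the 90% threshold before background processing halts. This is a minimal sketch using Python's standard shutil.disk_usage. The function names are hypothetical, and it assumes that used bytes divided by total bytes on the volume approximates the measure the data service applies, which this page does not specify.

```python
import shutil

DISK_THRESHOLD = 0.90  # fraction of disk space named in the note above


def disk_used_fraction(path: str) -> float:
    """Return the used fraction (0.0 to 1.0) of the volume that holds path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def background_processing_at_risk(used_fraction: float,
                                  threshold: float = DISK_THRESHOLD) -> bool:
    """Return True when usage is over the threshold."""
    return used_fraction > threshold


# Example: check the volume holding the current directory.
fraction = disk_used_fraction(".")
print(f"{fraction:.1%} used, at risk: {background_processing_at_risk(fraction)}")
```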

Memory

In total, all data service components require a minimum of approximately 1GB of memory to run. Additional memory usage will vary based on the current activity on the site. For cloud sites and sites running in Appian on Kubernetes, memory is bounded by a configurable container memory limit.

During periods of high write load, memory for each real-time store and appender component will increase proportionally to the amount of data written to the data service. Background processing is triggered automatically based on write usage to ensure memory stays within the configured container limit.

Memory spikes are expected for real-time store components during query execution. The magnitude of the spikes will vary based on the complexity of query operations and the volume of data being queried. The maximum memory usage for individual real-time store components varies based on the amount of available memory for a site. See configuring the real-time store for more detailed information about memory and compute configurations.

Configuring the real-time store

The rts-count attribute specifies the number of real-time stores in the data service. The real-time store component is responsible for processing queries to the data service. The rts-count is set to 2 by default, but can be increased as needed to support higher query throughput. The maximum recommended rts-count varies based on the amount of memory available, as shown in the table below.

In addition to configuring the number of real-time stores, you can configure limits on the amount of memory (rts.queryMemoryLimits.circuitBreaker.threshold.bytes) and number of threads (rts.secondaryThreads.num) that each real-time store uses when processing queries through the custom properties file. Increasing the amount of memory and number of threads will improve the performance of queries against synced record types, but may result in higher resource usage during query execution.

For self-managed customers, rts.queryMemoryLimits.circuitBreaker.threshold.bytes is unenforced by default and rts.secondaryThreads.num is set to 1 by default. In addition to the guidelines based on available memory listed below, it is recommended that the product of rts.secondaryThreads.num and rts-count be less than or equal to the number of available CPU cores.

Available memory	Real-time store count	Thread count	Memory limit
Less than 32GB	2	1	1GB
32GB to 63GB	4	1	2GB
64GB to 127GB	4	4	4GB
128GB to 383GB	8	4	8GB
384GB or more	12	8	8GB
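
The table and the CPU guideline can be combined into a small lookup. The sketch below encodes the rows above and checks the thread-count product against the core count. The names are hypothetical, and it assumes a value that falls between two listed ranges (for example, 63.5GB) belongs to the lower tier.

```python
from typing import NamedTuple


class RtsSizing(NamedTuple):
    rts_count: int        # value for the rts-count attribute
    threads: int          # value for rts.secondaryThreads.num
    memory_limit_gb: int  # per-store query memory limit, in GB


# (exclusive upper bound on available memory in GB, recommendation)
_TIERS = [
    (32, RtsSizing(2, 1, 1)),
    (64, RtsSizing(4, 1, 2)),
    (128, RtsSizing(4, 4, 4)),
    (384, RtsSizing(8, 4, 8)),
]
_TOP_TIER = RtsSizing(12, 8, 8)  # 384GB or more


def recommended_sizing(available_gb: float) -> RtsSizing:
    """Look up the table row that matches the available memory."""
    for upper_bound, sizing in _TIERS:
        if available_gb < upper_bound:
            return sizing
    return _TOP_TIER


def fits_cpu(sizing: RtsSizing, cpu_cores: int) -> bool:
    """Check that threads * rts-count does not exceed the available cores."""
    return sizing.threads * sizing.rts_count <= cpu_cores


sizing = recommended_sizing(96)
print(sizing, fits_cpu(sizing, cpu_cores=16))
```

With 96GB available, the lookup returns 4 real-time stores with 4 threads each, which needs 16 cores to satisfy the CPU guideline.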
