Beginning with Appian 25.4, all new versions of Appian will require containers managed by Kubernetes to run in a self-managed environment. Review the 24.4 release notes and frequently asked questions to learn more.
Overview
This page provides information for self-managed customers configuring the data service. The data service topology and setup are required for running an Appian site. If the data service is unreachable when the application starts up, or if the application is started before the data service is running, the application server will not start. Additional configurations can be used to scale the data service to support higher-scale usage of synced record types.
Topology and Setup
The data service configuration is specified in the data-server-cluster element in the appian-topology.xml file, located in both the <APPIAN_HOME>/ae/data-server/conf/ and <APPIAN_HOME>/ae/conf/ directories:
<topology>
  ...
  <data-server-cluster>
    <data-server host="machine1.example.com" port="5400" rts-count="2"/>
  </data-server-cluster>
</topology>
For a high availability configuration, specify three instances of the data service on different machines:
<topology>
  ...
  <data-server-cluster>
    <data-server host="machine1.example.com" port="5400" rts-count="2"/>
    <data-server host="machine2.example.com" port="5400" rts-count="2"/>
    <data-server host="machine3.example.com" port="5400" rts-count="2"/>
  </data-server-cluster>
</topology>
You must specify the data-server-cluster configuration in appian-topology.xml. Copy the appian-topology.xml file from <APPIAN_HOME>/conf/ into <APPIAN_HOME>/data-server/conf/ before starting the data service. Copies of the topology file in each location must always be in sync, irrespective of the configurations specified.
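One way to confirm that the copies of the topology file have not drifted apart is to compare their digests before starting the data service. The following is a minimal sketch, not an Appian utility: the helper names file_digest and topology_in_sync are hypothetical, and the paths you pass in are whatever locations your installation actually uses.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def topology_in_sync(paths) -> bool:
    """Return True when every listed copy of the topology file is byte-identical.

    Hypothetical pre-start check; it only compares file contents and does not
    validate the XML itself.
    """
    digests = {file_digest(Path(p)) for p in paths}
    return len(digests) <= 1
```

A byte-level comparison is deliberately strict: even a whitespace difference is reported as out of sync, which matches the requirement that the copies always be kept identical.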
Host
Set the host attribute in the data-server element to the name of the machine hosting the data service. If the host is not specified, the following error is printed in watchdog.log:
"ERROR com.appian.data.server.Watchdog - data-server host must be specified"
Port
You can define the port attribute on both the data-server-cluster and the data-server elements. If the port attribute is defined on both, the data-server element takes precedence. If not supplied, the default port number is 5400.
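The host and port rules above can be sketched as a small pre-flight check. This is an illustration only, assuming the element and attribute names shown in the examples on this page; the function effective_endpoints is hypothetical and is not how the data service itself reads its configuration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

DEFAULT_PORT = 5400  # default port stated in the text above


def effective_endpoints(topology_xml: str) -> List[Tuple[str, int]]:
    """Return (host, port) for each data-server element.

    Precedence follows the text: a port on data-server wins over a port on
    data-server-cluster, and 5400 is used when neither is supplied.
    """
    root = ET.fromstring(topology_xml)
    cluster = root.find("data-server-cluster")
    if cluster is None:
        return []
    cluster_port = cluster.get("port")
    endpoints = []
    for node in cluster.findall("data-server"):
        host = node.get("host")
        if not host:
            # Mirrors the requirement that every node names its host.
            raise ValueError("data-server host must be specified")
        port = node.get("port") or cluster_port or DEFAULT_PORT
        endpoints.append((host, int(port)))
    return endpoints
```

Note that the literal `...` placeholder in the topology examples is not valid XML, so a real file must be parsed rather than the abbreviated snippet.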
Caution: In order for the data service to function properly, make sure that you open all of the required ports. See Port Usage for more information.
Licensing
A valid license (k4.lic) is required to run the data service. See Requesting and Installing a License for information on obtaining and installing a k4.lic license.
Security
Requests to the data service are secured with a security token that's unique to every customer environment:
- For Appian Cloud customers, this token is generated during the site deployment.
- For self-managed customers, this token is generated by the configure script.
If the token has not been set properly, the data service will not start, which will result in the application server not starting. See Data Service Connection Restrictions for more information.
Changing the topology
Topology changes can include:
- Adding or removing data service nodes.
- Changing the host name.
- Changing the port.
- Changing the real-time store count.
To make these changes, update the data-server-cluster element and restart both the data service and the application server by following these steps:
- Stop the data service on all of the servers: <APPIAN_HOME>/data-server/bin/stop.sh (.bat on Windows)
- Delete the <APPIAN_HOME>/ae/data-server/node/election directory from all the servers. This directory contains runtime data that needs to be deleted when certain topology changes are made.
- If adding nodes (for example, when migrating from a single node to High Availability), copy the <APPIAN_HOME>/ae/data-server/data directory to the new servers.
- Make the required topology change on all of the servers.
- Start the data service on all of the servers in any order: <APPIAN_HOME>/data-server/bin/start.sh (.bat on Windows)
- Restart the application server on all servers:
  - To stop the server: <APPIAN_HOME>/tomcat/apache-tomcat/bin/stop-appserver.sh (.bat on Windows)
  - To start the server: <APPIAN_HOME>/tomcat/apache-tomcat/bin/start-appserver.sh (.bat on Windows)
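Because the order of these steps matters, it can help to generate them as a checklist before touching any server. The sketch below only builds a list of human-readable steps and executes nothing; topology_change_plan is a hypothetical helper, the paths are copied from the steps above, and forward slashes are used for both platforms for simplicity.

```python
from typing import List


def topology_change_plan(appian_home: str,
                         adding_nodes: bool = False,
                         windows: bool = False) -> List[str]:
    """Return the ordered topology-change steps as plain text (dry run only)."""
    ext = "bat" if windows else "sh"
    ds_bin = f"{appian_home}/data-server/bin"
    tomcat_bin = f"{appian_home}/tomcat/apache-tomcat/bin"
    steps = [
        f"Run {ds_bin}/stop.{ext} on every server",
        f"Delete {appian_home}/ae/data-server/node/election on every server",
    ]
    if adding_nodes:
        # Only needed when new nodes join, such as moving to High Availability.
        steps.append(f"Copy {appian_home}/ae/data-server/data to each new server")
    steps += [
        "Apply the topology change on every server",
        f"Run {ds_bin}/start.{ext} on every server (any order)",
        f"Run {tomcat_bin}/stop-appserver.{ext} on every server",
        f"Run {tomcat_bin}/start-appserver.{ext} on every server",
    ]
    return steps
```

For example, printing `topology_change_plan("/opt/appian", adding_nodes=True)` yields seven steps, with the data directory copy placed before the topology edit; `/opt/appian` is a placeholder for your own <APPIAN_HOME>.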
A data-server node element must be present and must be structured similarly to the following:
<data-server host="machine3.example.com" port="5400" rts-count="2"/>
If there is no data-server node element specified, the following error is printed in watchdog.log:
"ERROR com.appian.data.server.Watchdog - At least one data-server node must be specified"
Monitoring and Troubleshooting
Starting and stopping
To start or stop the data service, refer to Starting and Stopping Appian.
Tip: When a user logs out of Windows, any data service process that the user started with the script stops.
Consider installing the data service as a Windows service and using the Windows Service management console to start and stop the service. For instructions, see Installing the data service as a Windows Service.
Logging
Each component of the data service writes its logs to the <APPIAN_HOME>/logs/data-server/ directory:
- Historical Store: hs-gateway.log
- Real-time Store: rts-gateway-*.log
- Appender: appender-gateway.log
- Bulk Ingestion: binge-gateway.log
- Data Client: client.log
- Watchdog: watchdog.log
The log files contain important information about startup and shutdown, process execution, configuration, and errors. In the event of a system issue, these files should be shared with Appian Support. Note that the real-time store logs are enumerated per component, for example rts-gateway-0.log, rts-gateway-1.log, and so on.
Note: The <APPIAN_HOME>/logs/data-server/ directory will always be free of any customer business data, and can be safely exported without any risk of exposing sensitive data.
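When gathering files for Appian Support, the file names listed above can be matched with simple glob patterns. This is a minimal sketch, assuming only the names and the wildcard pattern shown in the list; data_service_logs is a hypothetical helper and the directory argument is your own logs/data-server/ location.

```python
from pathlib import Path
from typing import List

# File name patterns taken from the component list above.
LOG_PATTERNS = [
    "hs-gateway.log",
    "rts-gateway-*.log",
    "appender-gateway.log",
    "binge-gateway.log",
    "client.log",
    "watchdog.log",
]


def data_service_logs(log_dir: str) -> List[str]:
    """Return the sorted names of data service log files found in log_dir."""
    found = set()
    for pattern in LOG_PATTERNS:
        found.update(p.name for p in Path(log_dir).glob(pattern))
    return sorted(found)
```

Files that do not match any pattern are ignored, so unrelated files in the same directory are not picked up.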
The data service also logs other data, including performance metrics and traces. See Logging for a more comprehensive overview of Appian logs.
Recovery and monitoring
Watchdog continuously monitors each component of the data service and restores functionality of each component in the event of an isolated failure.
To validate that the data service is running correctly, execute the <APPIAN_HOME>/data-server/bin/health.sh script (health.bat on Windows). The following information is displayed after executing the health script:
For the data service cluster:
- node_count: Number of nodes in the cluster.
- healthy: true if the data service is functioning normally, otherwise false.
For each node in the data service cluster:
- hostname: Host name of the node.
- ip: IP address of the node.
- healthy: true if the data service is functioning normally on this node, otherwise false.
Transaction Log
The data service uses the Kafka topic ads2_tx_effects to persist and distribute transactions. The default retention time is one hour, but you can change it by adding kfk.trunc.buffer.seconds to <APPIAN_HOME>/data-server/conf/custom.properties and setting it to the desired retention time in seconds.
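Since the property is expressed in seconds, converting from a more readable duration avoids arithmetic slips. The helper below is a hypothetical sketch that renders the line to add to custom.properties; it assumes the usual key=value properties syntax and does not write the file.

```python
from datetime import timedelta


def retention_property(retention: timedelta) -> str:
    """Render the kfk.trunc.buffer.seconds line for a given retention time."""
    seconds = int(retention.total_seconds())
    if seconds <= 0:
        raise ValueError("retention must be a positive duration")
    return f"kfk.trunc.buffer.seconds={seconds}"
```

For example, `retention_property(timedelta(hours=6))` produces `kfk.trunc.buffer.seconds=21600`.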
File system
The <APPIAN_HOME>/ae/data-server/ directory stores the data service binaries, scripts, configuration details, and data.
The data files are located in the <APPIAN_HOME>/ae/data-server/data/ directory. The ss folder contains all snapshot files. Because access to the data service is latency-sensitive, it is recommended that the data is hosted locally on the machine rather than on a shared or external drive, such as network-attached storage (NAS). This also applies to High Availability (HA) topologies, since each data service node stores its own version of the data.
For disaster recovery purposes, the <APPIAN_HOME>/data-server/data/ directory and the Kafka logs should be backed up regularly.
See Internal Data for a comprehensive overview of where Appian persists data on the file system.
Troubleshooting
If the data service cannot start and the watchdog.log indicates an issue with the security token, see Data Service Connection Restrictions for troubleshooting.
If the data service stops running while the application server is running, any record types with data sync enabled will be temporarily inaccessible. See Troubleshooting Data Sync for more information.
Sizing guidance
After configuring the data service, the amount of disk space and memory consumed by the data service will vary based on data volumes and usage patterns.
Disk space
After it is started for the first time in your environment, the data service is expected to take approximately 50 MB of disk space by default. If a site is not syncing record data, the data service will not occupy additional disk space.
Any additional disk space usage by the data service is expected to be proportional to the total amount of data synced into Appian. Data synced into the data service is compressed to optimize for storage efficiency. The exact disk usage will vary depending on the compression ratio and the current state of data sync activities and data service background processes running on the site.
Note: Background processing in the data service requires sufficient disk space to be available. When over 90% of disk space is used on a site, the data service will halt background processing until sufficient space is cleared or additional disk space is added.
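To get early warning before background processing halts, disk usage on the volume that hosts the data service can be compared against the 90% figure from the note above. This is a rough sketch using the operating system's view of the volume; exactly how the data service itself measures usage is not described here, so treat the result as an approximation. The function names are hypothetical.

```python
import shutil

HALT_THRESHOLD = 0.90  # "over 90% of disk space" from the note above


def used_fraction(path: str) -> float:
    """Return the fraction of the volume containing path that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def exceeds_halt_threshold(fraction_used: float,
                           threshold: float = HALT_THRESHOLD) -> bool:
    """Return True when usage is strictly above the threshold."""
    return fraction_used > threshold
```

In practice you would alert at a lower value than 0.90 so that space can be cleared or added before processing stops.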
Memory
In total, all data service components require a minimum of approximately 1 GB of memory to run. Additional memory usage will vary based on the current activity on the site. For cloud sites and sites running in Appian on Kubernetes, memory is bounded by a configurable container memory limit.
During periods of high write load, memory for each real-time store and appender component will increase proportionally to the amount of data written to the data service. Background processing is triggered automatically based on write usage to ensure memory stays within the configured container limit.
Memory spikes are expected for real-time store components during query execution. The magnitude of the spikes will vary based on the complexity of query operations and the volume of data being queried. The maximum memory usage for individual real-time store components varies based on the amount of available memory for a site. See Configuring the real-time store for more detailed information about memory and compute configurations.
Configuring the real-time store
The rts-count attribute specifies the number of real-time stores in the data service. The real-time store component is responsible for processing queries to the data service. The rts-count is set to 2 by default, but can be increased as needed to support higher query throughput. The maximum recommended rts-count varies based on the amount of memory available, as shown in the table below.
In addition to configuring the number of real-time stores, you can configure limits on the amount of memory (rts.queryMemoryLimits.circuitBreaker.threshold.bytes) and number of threads (rts.secondaryThreads.num) that each real-time store uses when processing queries through the custom properties file. Increasing the amount of memory and number of threads will improve the performance of queries against synced record types, but may result in higher resource usage during query execution.
For self-managed customers, the rts.queryMemoryLimits.circuitBreaker.threshold.bytes limit is unenforced and the rts.secondaryThreads.num property is set to 1 by default. In addition to the guidelines based on available memory resources listed below, it is recommended that the product of rts.secondaryThreads.num and rts-count be less than or equal to the number of available CPU cores.
Available memory | Real-time store count | Thread count | Memory limit |
---|---|---|---|
Less than 32 GB | 2 | 1 | 1 GB |
32 GB to 63 GB | 4 | 1 | 2 GB |
64 GB to 127 GB | 4 | 4 | 4 GB |
128 GB to 383 GB | 8 | 4 | 8 GB |
384 GB or more | 12 | 8 | 8 GB |
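The table and the CPU core guideline can be combined into a quick lookup. The sketch below is illustrative only: the names RtsGuideline, guideline_for, and fits_cpu are hypothetical, and it assumes that fractional values between rows (for example 63.5 GB) fall into the lower row.

```python
from typing import NamedTuple


class RtsGuideline(NamedTuple):
    rts_count: int          # value for the rts-count attribute
    secondary_threads: int  # value for rts.secondaryThreads.num
    memory_limit_gb: int    # per real-time store query memory limit, in GB


# (exclusive upper bound in GB, guideline) pairs taken from the table above.
_GUIDELINES = [
    (32, RtsGuideline(2, 1, 1)),
    (64, RtsGuideline(4, 1, 2)),
    (128, RtsGuideline(4, 4, 4)),
    (384, RtsGuideline(8, 4, 8)),
]
_LARGEST = RtsGuideline(12, 8, 8)


def guideline_for(available_memory_gb: float) -> RtsGuideline:
    """Return the table row that applies to the given available memory."""
    for upper_bound, guideline in _GUIDELINES:
        if available_memory_gb < upper_bound:
            return guideline
    return _LARGEST


def fits_cpu(guideline: RtsGuideline, cpu_cores: int) -> bool:
    """Check that thread count times store count does not exceed CPU cores."""
    return guideline.rts_count * guideline.secondary_threads <= cpu_cores
```

For a machine with 512 GB of memory and 64 cores, `guideline_for(512)` suggests 12 stores with 8 threads each, but `fits_cpu` returns False because 96 exceeds 64, so the thread count or store count would need to be reduced.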