This page applies to Appian Cloud only. It may not reflect the differences with Appian Government Cloud.
Note: Enhanced Data Pipeline is available to customers that are on Advanced or Enterprise Support with High Availability. Appian customers must purchase Advanced or Enterprise Support to use the functionality described below. The functionality described below is not included in the base Appian platform.
Appian Cloud customers enrolled in Advanced or Enterprise Support who use the Appian-provided cloud database as a business data source can connect to it through Enhanced Data Pipeline (EDP). EDP supports two modes of operation: Business Data Direct Query (formerly known simply as Enhanced Data Pipeline) and Business Data Replication. Customers can use either mode to transfer business data to downstream systems for business intelligence, data warehousing, and reporting.
Some benefits of Enhanced Data Pipeline include:
Once the secure connection to your Appian Cloud database has been established and credentials created (see Setup below), you can begin to configure and use compatible tools to retrieve data using one or both of the supported methods.
Business Data Direct Query, as the name implies, provides query access to the Appian cloud business data source through EDP. You can use the Direct Query mode for SQL-based business intelligence (BI) or ETL tools.
Direct Query normally requires writing SQL queries to extract tables or specific data sets to transfer to target systems. Once written, the queries must be executed periodically to extract an updated snapshot of your business data and refresh the target.
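The periodic extraction described above can be sketched as an incremental query that pulls only rows changed since the last run. This is a minimal illustration, not an Appian-provided utility: the `orders` table, the `updated_at` watermark column, and the function names are hypothetical, and an in-memory SQLite database stands in for the MariaDB connection so the sketch runs on its own. Parameter placeholder syntax (`?` here) varies by driver.

```python
import sqlite3
from typing import Any


def extract_changed_rows(conn, table: str, watermark_column: str, since: Any) -> list[tuple]:
    """Return rows whose watermark column is newer than `since`, oldest first."""
    # Identifiers cannot be bound as query parameters, so validate them strictly.
    for identifier in (table, watermark_column):
        if not identifier.isidentifier():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > ? "
        f"ORDER BY {watermark_column}"
    )
    cursor = conn.cursor()
    cursor.execute(sql, (since,))
    return cursor.fetchall()


def demo() -> list[tuple]:
    """Run the extraction against a stand-in database with hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "open", "2024-01-01 09:00:00"),
            (2, "closed", "2024-01-02 09:00:00"),
            (3, "open", "2024-01-03 09:00:00"),
        ],
    )
    # Only rows modified after the previous run's watermark are returned.
    return extract_changed_rows(conn, "orders", "updated_at", "2024-01-01 12:00:00")
```

Storing the highest watermark value seen in each run and passing it as `since` on the next run keeps each snapshot refresh small, which also limits the load placed on the replica.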
The Direct Query mode routes queries to a replica instance in your highly available Appian Cloud database cluster instead of to the primary node. This improves query performance and reduces the load on the primary database node.
An Appian Cloud database cluster in a high availability configuration has three nodes: a primary node and two replica nodes. The primary node services the requests generated by your Appian applications, while the replica nodes provide redundancy and stand ready for failover. Under normal circumstances, the replica nodes are in sync with the primary node. However, in certain edge cases, the replica nodes can lag behind the primary node.
Customers using Direct Query are advised to test and plan for additional database load and replica lag. Appian recommends testing your database queries thoroughly in lower environments first and assessing any performance impact before using your queries against your production instances. Queries from your tools could degrade the overall performance in your Cloud instance if the queries overload the Cloud business data source instance. Consequently, Appian recommends using Direct Query during periods of low usage. In most cases, connecting data analysis tools directly to production transactional databases is not considered a good practice as queries generated by heavily used reporting tools could have performance implications in production instances.
To eliminate bulk load updating and batch windows, use the Business Data Replication mode. EDP Business Data Replication can be used for regular and incremental movement of data into a data warehouse or data lake, reducing the overhead associated with Direct Query. It provides a mechanism for streaming business data changes to an external environment.
Business Data Replication exposes the binary logs of the Appian Cloud database. Binary logs are used to synchronize the three database nodes in a high availability configuration. Using Business Data Replication, the logs from a replica node are made available to external systems, exposing both data changes (DML statements) and structural changes (DDL statements).
Using Business Data Replication requires tooling that can read and act on the database binary logs. This can be an open source or commercial change data capture (CDC) tool like Debezium or Qlik, or a separate instance of MariaDB in your own environment. Tool selection, configuration, and management are the customer's responsibility. Appian does not provide support for specific tools.
To enable Business Data Replication, Appian Support will create one or multiple database users with replication access.
Caution: These user accounts have access to all the schemas in the Cloud database. Access to specific schemas or tables must be controlled in your replicated instance or tool in your environment.
Appian recommends replicating only the schema(s) and/or table(s) you need. In addition, you should avoid operations like mass delete and mass update. Such statements can cause replica lag and adversely affect cluster health.
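Because the replication users can see every schema, the restriction to the schemas and tables you need has to happen on your side. The sketch below shows the idea with a simple allow-list applied to change events. The event shape (a dictionary with `schema`, `table`, and `op` keys), the table names, and the function name are hypothetical and do not correspond to any specific CDC tool's format. Most CDC tools offer their own include or exclude settings, which are usually the better place for this filter.

```python
from typing import Iterable, Iterator

# Hypothetical allow-list: schema name -> table names to replicate.
# An empty set means "every table in that schema".
ALLOWED_TABLES: dict[str, set[str]] = {
    "Appian": {"orders", "customers"},
    "reporting": set(),
}


def filter_events(events: Iterable[dict], allowed: dict[str, set[str]]) -> Iterator[dict]:
    """Yield only the change events for schemas and tables on the allow-list."""
    for event in events:
        tables = allowed.get(event["schema"])
        if tables is None:
            continue  # Schema is not replicated at all.
        if tables and event["table"] not in tables:
            continue  # Schema is listed, but this table is not.
        yield event


sample_events = [
    {"schema": "Appian", "table": "orders", "op": "insert"},
    {"schema": "Appian", "table": "audit_log", "op": "insert"},
    {"schema": "reporting", "table": "daily_totals", "op": "update"},
    {"schema": "scratch", "table": "tmp", "op": "delete"},
]
kept = list(filter_events(sample_events, ALLOWED_TABLES))
```

With the sample data, only the `Appian.orders` and `reporting.daily_totals` events pass the filter.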
Setting up Enhanced Data Pipeline requires assistance from Appian Support. Review the prerequisites to ensure compatibility and then follow the steps below.
Prerequisite | Description | Organizational Role |
---|---|---|
Advanced or Enterprise Support (including High Availability Order Form) | This offering is available via Advanced or Enterprise Support (including High Availability) | Business relationship owner |
Ensure database compatibility | Your database client tools must be compatible with MariaDB. You may need to install additional connectors or drivers to be able to query the business data source in your Appian Cloud instance. Consult your tool's documentation for details. | Server administrator |
Set up IPSec VPN Tunnel or AWS PrivateLink connection | Configure VPN tunnel(s) from your corporate network to your Appian Cloud instance or an AWS PrivateLink connection. | Network Administrator / Authorized support contact |
Set up name resolution | Your database tools and any other systems will connect to your business data source using the FQDN <your-instance>.db.appiancloud.com. This typically requires creating a record in your DNS infrastructure pointing to the private IP address(es) of your Appian Cloud instance(s). | DNS/Server administrator |
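Once the DNS record from the name resolution prerequisite exists, a quick check from the machine running your tools can confirm that the hostname resolves to private addresses. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and port 3306 is an assumption based on the MariaDB default, so confirm the actual port with Appian Support.

```python
import ipaddress
import socket


def resolve_addresses(hostname: str, port: int = 3306) -> list[str]:
    """Return the distinct IP addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True only if at least one address is given and every one is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


# Example usage, replacing the placeholder with your own instance name:
# addresses = resolve_addresses("<your-instance>.db.appiancloud.com")
# print(addresses, all_private(addresses))
```

A result that includes a public address, or no address at all, suggests the DNS record is missing or that the lookup is not using your internal DNS infrastructure.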
Note: Creating additional schemas in phpMyAdmin is only available for Appian Cloud databases using MariaDB. If you don't have additional schemas, just provide the default schema name: "Appian" or "AppianAnywhere".
Database traffic between your tools and the business data source will be forwarded over an IPSec VPN tunnel or an AWS PrivateLink, established to your Appian Cloud instance. As an additional security mechanism, connections to your business data source will be encrypted using SSL/TLS to a custom hostname <your-instance>.db.appiancloud.com.
Optionally, your SSL/TLS connection can perform server identity verification by validating the server certificate installed on your database. The server certificate for the Appian Cloud database is signed by the Appian Cloud Private Certificate Authority (CA).
To enable server identity verification, you must install the Appian Cloud database CA certificate in the certificate trust store of your tools and systems.
To download the CA certificate:
Installing third-party certificates varies by tool and platform. See your tool's documentation on how to connect to an external database using SSL/TLS and how to import a third-party certificate.
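For tools or scripts written in Python, server identity verification can be sketched with the standard library `ssl` module. This is an illustration under stated assumptions rather than a required configuration: the function name and the CA file path in the usage comment are hypothetical, and how the resulting context is handed to a database driver depends on the driver, so consult its documentation.

```python
import ssl
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client TLS context that verifies the server certificate.

    `ca_file` is the path to the downloaded Appian Cloud database CA
    certificate in PEM format. When it is None, the system trust store
    is used instead.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # create_default_context already enables certificate and hostname checks.
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


# Example usage with a hypothetical file location:
# context = build_tls_context("/etc/ssl/certs/appian-cloud-db-ca.pem")
```

With hostname checking enabled, the connection succeeds only if the certificate presented by the database matches the hostname your tool connects to and chains to the trusted CA.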
Note: Only TLS 1.2 and 1.3 are supported for Enhanced Data Pipeline. You should update your tool configuration to remove any older versions of TLS.
If you encounter connection issues after TLS configuration updates, add the properties enabledSslProtocolSuites=TLSv1.2 and useSsl=true to your database connection string.
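As an illustration of where these two properties go, the sketch below assembles a JDBC-style connection string of the kind used by MariaDB Connector/J. It assumes your tool accepts such a URL. The function name, the instance name `mysite`, and port 3306 (the MariaDB default) are assumptions for the example.

```python
from urllib.parse import urlencode


def build_jdbc_url(instance: str, schema: str, port: int = 3306) -> str:
    """Build a JDBC-style URL carrying the TLS properties named above."""
    host = f"{instance}.db.appiancloud.com"
    params = urlencode(
        {
            "useSsl": "true",
            "enabledSslProtocolSuites": "TLSv1.2",
        }
    )
    return f"jdbc:mariadb://{host}:{port}/{schema}?{params}"


# Hypothetical instance name and the default schema name:
url = build_jdbc_url("mysite", "Appian")
```

Here `url` is `jdbc:mariadb://mysite.db.appiancloud.com:3306/Appian?useSsl=true&enabledSslProtocolSuites=TLSv1.2`. Tools that do not use JDBC typically expose equivalent settings under different names.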
A maximum statement timeout of 12 hours is applied to all queries executed through the EDP. If a query exceeds this time limit, it will be automatically aborted. This default value can be updated by opening a Support Case.
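One way to keep individual statements well under the timeout is to split a large extraction into bounded key ranges and run one query per range. The sketch below only computes the ranges. It assumes the table has a numeric, roughly contiguous key, and the function name and the `orders` table in the usage comment are hypothetical.

```python
def key_ranges(min_id: int, max_id: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split the inclusive range [min_id, max_id] into chunks of at most chunk_size keys."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


# Each pair can drive one bounded statement, for example:
#   SELECT * FROM orders WHERE id BETWEEN ? AND ?
chunks = key_ranges(1, 10, 4)
```

For the values shown, `chunks` is `[(1, 4), (5, 8), (9, 10)]`. Smaller chunks also shorten the time each query occupies the replica.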
Enhanced Data Pipeline for Appian Cloud