
Enhanced Data Pipeline for Appian Cloud

This page applies to Appian Cloud only. It may not reflect the differences with Appian Government Cloud.

Note:  Enhanced Data Pipeline is available only to customers on Advanced or Enterprise Support with High Availability. Appian customers must purchase Advanced or Enterprise Support to use the functionality described below; it is not included in the base Appian platform.

Overview

Appian Cloud customers enrolled in Advanced or Enterprise Support who use the Appian-provided cloud database as a business data source can connect to it through Enhanced Data Pipeline (EDP). EDP supports two modes of operation: Business Data Direct Query (formerly known simply as Enhanced Data Pipeline) and Business Data Replication. Customers can use either mode to transfer business data to downstream systems for business intelligence, data warehousing, and reporting.

Some benefits of Enhanced Data Pipeline include:

  • Increased flexibility to access your data: You can query the database for specific result sets or monitor and consume data changes to replicate the entire database or portions of it to a target.
  • Seamless integration with data analysis tools: You can integrate your Appian Cloud business data source with your enterprise reporting processes for unified enterprise analytics, using tools like Tableau and Power BI.
  • Simplified ETL processes: You can extract, transform, and load your data into your organization's existing data marts and data warehouses located on your corporate infrastructure.

Using Enhanced Data Pipeline

Once the secure connection to your Appian Cloud database has been established and credentials created (see Setup below), you can begin to configure and use compatible tools to retrieve data using one or both of the supported methods.

Business Data Direct Query

Business Data Direct Query, as the name implies, provides query access to the Appian Cloud business data source through EDP. You can use the Direct Query mode with SQL-based business intelligence (BI) or ETL tools.

Direct Query normally requires writing SQL queries to extract tables or specific data sets to transfer to target systems. Once written, the queries must be executed periodically to extract an updated snapshot of your business data and refresh the target.
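The periodic extraction described above can be sketched in Python as follows. This is a minimal illustration, not an Appian-provided utility: the function name export_snapshot, the orders table, and the use of the standard-library sqlite3 module as a stand-in connection are all assumptions made so the sketch runs without a database server. In practice you would pass a connection from a MariaDB-compatible driver and schedule the call from your own tooling.

```python
import csv
import io
import sqlite3  # stand-in driver so the sketch runs locally; not the Appian Cloud database


def export_snapshot(conn, query, out, batch_size=1000):
    """Run one read-only query and write the full result set to `out` as CSV.

    `conn` is any DB-API 2.0 connection; `out` is a writable text stream.
    Returns the number of data rows written.
    """
    cur = conn.cursor()
    cur.execute(query)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # header row
    row_count = 0
    while True:
        rows = cur.fetchmany(batch_size)  # bounded memory use per fetch
        if not rows:
            break
        writer.writerows(rows)
        row_count += len(rows)
    cur.close()
    return row_count


# Demo with a hypothetical table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(1, "open"), (2, "closed")],
)
buffer = io.StringIO()
exported = export_snapshot(conn, "SELECT id, status FROM orders ORDER BY id", buffer)
```

Each run produces a complete snapshot, so the target is refreshed by replacing the previous extract rather than merging changes into it.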

[Figure: Enhanced Data Pipeline, Business Data Direct Query]

The Direct Query mode routes queries to a replica instance in your highly available Appian Cloud database cluster instead of to the primary node. This improves query performance and reduces the load on the primary database node.

An Appian Cloud database cluster in a high availability configuration has three nodes: a primary node and two replica nodes. The primary node services the requests generated by your Appian applications, while the replica nodes stand by for failover, providing redundancy. Under normal circumstances, the replica nodes are in sync with the primary node. However, in certain edge cases, the replica nodes can lag behind the primary node.

Customers using Direct Query are advised to test and plan for additional database load and replica lag. Appian recommends testing your database queries thoroughly in lower environments first and assessing any performance impact before using your queries against your production instances. Queries from your tools could degrade the overall performance in your Cloud instance if the queries overload the Cloud business data source instance. Consequently, Appian recommends using Direct Query during periods of low usage. In most cases, connecting data analysis tools directly to production transactional databases is not considered a good practice as queries generated by heavily used reporting tools could have performance implications in production instances.
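One way to follow the low-usage recommendation is to have scheduled extraction jobs check the time before running heavy queries. The sketch below is an illustration only: the function name and the 01:00 to 05:00 window are hypothetical, and the actual low-usage period depends on your own instance's traffic.

```python
from datetime import time


def in_off_peak_window(now, start=time(1, 0), end=time(5, 0)):
    """Return True when `now` falls inside [start, end).

    Also handles windows that wrap past midnight, such as 22:00 to 04:00.
    The default window is a hypothetical example, not an Appian recommendation.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end
```

A job could call in_off_peak_window(datetime.now().time()) and skip or defer the extraction when it returns False.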

Business Data Replication

To eliminate bulk load updating and batch windows, use the Business Data Replication mode. EDP Business Data Replication can be used for regular and incremental movement of data into a data warehouse or data lake, reducing the overhead associated with Direct Query. It provides a mechanism for streaming business data changes to an external environment.

[Figure: Enhanced Data Pipeline, Business Data Replication]

Business Data Replication exposes the binary logs of the Appian Cloud database. Binary logs are used to synchronize the three database nodes in a high availability configuration. Using Business Data Replication, the logs from a replica node are made available to external systems, exposing both data changes (DML statements) and structural changes (DDL statements).

Using Business Data Replication requires tooling that can read and act on the database binary logs. This can be an open source or commercial change data capture (CDC) tool like Debezium or Qlik, or a separate instance of MariaDB in your own environment. Tool selection, configuration, and management are the customer's responsibility. Appian does not provide support for specific tools.

To enable Business Data Replication, Appian Support will create one or more database users with replication access.

Caution:  These user accounts have access to all the schemas in the Cloud database. Access to specific schemas or tables must be controlled in your replicated instance or tool in your environment.

Appian recommends replicating only the schema(s) and/or table(s) you need. In addition, you should avoid operations like mass delete and mass update. Such statements can cause replica lag and adversely affect cluster health.
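As an illustration of limiting replication to the schemas and tables you need, the sketch below builds a filter fragment for a Debezium connector configuration. The property names database.include.list and table.include.list are taken from Debezium's MySQL and MariaDB connector documentation and should be confirmed against the Debezium version you run. The function name and the table names are hypothetical, and the fragment omits the connection and schema history settings a complete connector configuration needs.

```python
def replication_filters(tables_by_schema):
    """Build include-list settings from a {schema: [table, ...]} mapping.

    Debezium treats each comma-separated entry as a regular expression,
    so names containing special characters may need escaping.
    """
    schemas = sorted(tables_by_schema)
    tables = [
        f"{schema}.{table}"
        for schema in schemas
        for table in sorted(tables_by_schema[schema])
    ]
    return {
        "database.include.list": ",".join(schemas),
        "table.include.list": ",".join(tables),
    }


# Hypothetical example: replicate two tables from the default "Appian" schema.
filters = replication_filters({"Appian": ["orders", "customers"]})
```

Other CDC tools expose equivalent filters under different names; consult your tool's documentation.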

Setup

Setting up Enhanced Data Pipeline requires assistance from Appian Support. Review the prerequisites to ensure compatibility and then follow the steps below.

Prerequisites checklist

Prerequisite | Description | Organizational Role
Advanced or Enterprise Support (including High Availability Order Form) | This offering is available via Advanced or Enterprise Support (including High Availability). | Business relationship owner
Ensure database compatibility | Your database client tools must be compatible with MariaDB. You may need to install additional connectors or drivers to be able to query the business data source in your Appian Cloud instance. Consult your tool's documentation for details. | Server administrator
Set up IPSec VPN Tunnel or AWS PrivateLink connection | Configure VPN tunnel(s) from your corporate network to your Appian Cloud instance or an AWS PrivateLink connection. | Network administrator / Authorized support contact
Set up name resolution | Your database tools and any other systems will connect to your business data source using the FQDN <your-instance>.db.appiancloud.com. This typically requires creating a record in your DNS infrastructure pointing to the private IP address(es) of your Appian Cloud instance(s). | DNS/Server administrator
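The name resolution prerequisite can be checked from a client machine before configuring any tools. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the check assumes that a correctly configured DNS record resolves the FQDN to private addresses only, as described above.

```python
import ipaddress
import socket


def resolve_addresses(hostname, port=3306):
    """Return the sorted, de-duplicated IP addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """Return True only if the list is non-empty and every address is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )
```

For example, all_private(resolve_addresses("<your-instance>.db.appiancloud.com")), with the placeholder replaced by your instance name, should return True once the DNS record is in place.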

Steps

  1. Open a Support Case to request setting up access to the Appian Cloud business data source.
    • Provide the instance name(s) you wish to configure with this functionality.
    • Provide the name of the schema(s) that the database user must have access to.

      Note:  Creating additional schemas in phpMyAdmin is only available for Appian Cloud databases using MariaDB. If you don't have additional schemas, just provide the default schema name: "Appian" or "AppianAnywhere".

    • Specify the mode(s) of operation to enable: Direct Query, Data Replication, or both.
  2. Appian Support will configure your instance(s) to enable external access over the VPN tunnel.
  3. Appian Support will create database user(s) in your business data source with the appropriate permissions:
    • read-only permissions for Direct Query,
    • binary log replication permissions for Business Data Replication.
  4. Appian Support will schedule a maintenance window and deploy the necessary configurations.
  5. Prior to the agreed maintenance window, Appian Support will communicate the connection parameters to your business data source including port (3306), database name, username(s) and temporary password(s).
  6. External access to the business data source of your instance will be enabled after the maintenance window.
  7. Have a database administrator generate a new password for the provided database user(s) in phpMyAdmin by using the AppianProcess.changeEDPUserPassword stored procedure. You can also update the user password programmatically using the Update Enhanced Data Pipeline user credential endpoint.
  8. Configure your database client tools with these credentials, along with the required security connection parameters (see Enable server identity verification below).

Enable server identity verification

Database traffic between your tools and the business data source will be forwarded over an IPSec VPN tunnel or an AWS PrivateLink, established to your Appian Cloud instance. As an additional security mechanism, connections to your business data source will be encrypted using SSL/TLS to a custom hostname <your-instance>.db.appiancloud.com.

Optionally, your SSL/TLS connection can perform server identity verification by validating the server certificate installed on your database. The server certificate for the Appian Cloud database is signed by the Appian Cloud Private Certificate Authority (CA).

To enable server identity verification, you must install the Appian Cloud database CA certificate in the certificate trust store of your tools and systems.

To download the CA certificate:

  1. Navigate to the Support page of MyAppian.
  2. Click on the Downloads tab.
  3. Choose the Appian Cloud option.
  4. Select EDP CA Certificates.
  5. Depending on your security compliance requirements, download the following:
    • If you are a FedRAMP/GovCloud customer, download the FEDRAMP Customer CA certificate.
    • If you are a customer with elevated data security compliance requirements, such as PCI and Canada Protected B, download the Elevated Compliance Customer CA certificate.
    • Otherwise, download the Customer CA certificate.

The procedure for installing third-party certificates varies by tool and platform. See your tool's documentation on how to connect to an external database using SSL/TLS and how to import a third-party certificate.

Note:  Only TLS 1.2 and 1.3 are supported for Enhanced Data Pipeline. You should update your tool configuration to remove any older versions of TLS.
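For clients written in Python, the TLS requirements above might be expressed as an ssl.SSLContext, as in the sketch below. The function name and the ca_file parameter are hypothetical; ca_file stands for the path where you saved the CA certificate downloaded from MyAppian. Whether a given database driver accepts an SSLContext, and through which option, depends on the driver, so check its documentation.

```python
import ssl


def build_tls_context(ca_file=None):
    """Create a client-side TLS context that verifies the server certificate.

    `ca_file` is the path to a CA certificate in PEM format. When omitted,
    the system default trust store is used instead.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse anything older than TLS 1.2; TLS 1.3 remains allowed.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The default context already requires a valid certificate and checks that the certificate matches the hostname, which corresponds to the server identity verification described in this section.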

If you encounter connection issues after TLS configuration updates, add the properties enabledSslProtocolSuites=TLSv1.2 and useSsl=true to your database connection string.
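As a sketch of what such a connection string could look like, the helper below assembles a JDBC URL carrying the two properties named above. It assumes your tool connects through MariaDB Connector/J (the jdbc:mariadb:// scheme) and that the driver version in use recognizes these property names. The function name and the host "myinstance.db.appiancloud.com" are hypothetical.

```python
from urllib.parse import urlencode


def build_jdbc_url(host, database, port=3306):
    """Assemble a MariaDB JDBC URL with TLS enabled and pinned to TLS 1.2."""
    params = urlencode({
        "useSsl": "true",
        "enabledSslProtocolSuites": "TLSv1.2",
    })
    return f"jdbc:mariadb://{host}:{port}/{database}?{params}"


# Hypothetical instance name and the default schema name.
url = build_jdbc_url("myinstance.db.appiancloud.com", "Appian")
```

Tools that are not JDBC-based use different option names for the same settings.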

Usage considerations

Limitations

A maximum statement timeout of 12 hours applies to all queries executed through EDP. A query that exceeds this limit is aborted automatically. This default value can be changed by opening a Support Case.
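One way to keep individual statements well under the timeout is to read a large table in bounded batches ordered by its key, rather than in a single long-running query. The sketch below shows this keyset approach. It is an illustration under assumptions: the function name, the events table, and the sqlite3 stand-in connection are hypothetical, and the "?" parameter placeholder is the sqlite3 style, so adjust it to your driver.

```python
import sqlite3  # stand-in driver so the sketch runs locally


def fetch_in_batches(conn, table, key_column, batch_size=1000):
    """Yield lists of rows in key order, issuing one bounded query per batch.

    `table` and `key_column` are interpolated into the SQL text, so they
    must come from trusted configuration, never from user input.
    """
    limit = int(batch_size)
    last_key = None
    while True:
        cur = conn.cursor()
        if last_key is None:
            cur.execute(f"SELECT * FROM {table} ORDER BY {key_column} LIMIT {limit}")
        else:
            cur.execute(
                f"SELECT * FROM {table} WHERE {key_column} > ? "
                f"ORDER BY {key_column} LIMIT {limit}",
                (last_key,),
            )
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
        cur.close()
        if not rows:
            return
        yield rows
        # Resume after the highest key seen so far.
        last_key = rows[-1][columns.index(key_column)]


# Demo with a hypothetical five-row table read two rows at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(i, f"e{i}") for i in range(1, 6)])
batch_sizes = [len(batch) for batch in fetch_in_batches(conn, "events", "id", batch_size=2)]
```

Because each query filters on the key and stops at the batch limit, no single statement has to scan the whole table.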
