Document Extraction Suite

Appian's suite of document extraction features provides intelligent, low-code smart services and functions that you can use to create your own document extraction process. By leveraging AI, you no longer have to rely on manual data extraction or on expensive, high-maintenance optical character recognition (OCR) software.

We think you are going to get so much benefit out of Appian's document extraction suite that we're giving you access to a pre-built application called Intelligent Document Processing (IDP). IDP takes advantage of all of the document extraction features and adds the ability to automatically classify documents, monitor performance, and securely process documents across multiple teams. Check out the Intelligent Document Processing documentation for more information.

How does it work?

Document extraction identifies data relationships within a PDF document as key-value pairs. For example, an invoice contains several form fields, so the process identifies the field names and values that are paired together, such as Invoice Number and INV-12.
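As a rough illustration only, the result of extraction for the invoice example can be pictured as a list of key-value pairs. This is a minimal sketch, not Appian's internal representation: the ExtractedPair class, the to_lookup helper, and the Total sample value are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedPair:
    """One key-value pair found in a document (hypothetical structure)."""
    key: str    # the field name as printed on the document
    value: str  # the text paired with that field name


def to_lookup(pairs):
    """Index the pairs by key so a value can be looked up by field name."""
    return {pair.key: pair.value for pair in pairs}


pairs = [
    ExtractedPair("Invoice Number", "INV-12"),  # example from the text
    ExtractedPair("Total", "100.00"),           # hypothetical sample value
]
lookup = to_lookup(pairs)
print(lookup["Invoice Number"])
```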


Using either Google Cloud Document AI or Appian's native document extraction functionality, each key will be mapped to a data type field. This mapping gets smarter over time as you reconcile and correct the extracted data. To reconcile extracted data, Appian auto-generates a form for human-in-the-loop validation of automated extraction results.

After this manual reconciliation, Appian stores the mapping of each extracted key to an Appian field and recalls it for later documents. For example, as you provide mappings, Appian Document Extraction will eventually recognize that Invoice Number, Invoice #, and Invoice No. all map to the invoiceNumber Appian data type field.
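The idea of storing and recalling key mappings can be sketched as a simple lookup table that grows with each reconciliation. This is an assumption-laden illustration, not how Appian implements the feature: KeyMappingStore, learn, resolve, and normalize are hypothetical names, and the matching here is a plain normalized string lookup.

```python
def normalize(key):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(key.lower().split())


class KeyMappingStore:
    """Hypothetical store of extracted-key to field-name mappings."""

    def __init__(self):
        self._mappings = {}

    def learn(self, extracted_key, field_name):
        """Record a mapping confirmed by a person during reconciliation."""
        self._mappings[normalize(extracted_key)] = field_name

    def resolve(self, extracted_key):
        """Return the known field name for a key, or None if not yet learned."""
        return self._mappings.get(normalize(extracted_key))


store = KeyMappingStore()
# Each reconciliation teaches the store one more label for the same field.
for label in ("Invoice Number", "Invoice #", "Invoice No."):
    store.learn(label, "invoiceNumber")

known = store.resolve("invoice  no.")  # matches after normalization
unknown = store.resolve("PO Number")   # never reconciled, so still unmapped
```

An unmapped key returns None in this sketch, which corresponds to a value that a person would still need to map during reconciliation.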


How does my data move in the process?

Appian's document extraction process can be powered by either Google Cloud Document AI or Appian's native document extraction functionality. The vendor you choose depends on your document extraction needs and determines how your data flows.

Google Cloud Document AI

Google Cloud Document AI is part of the Appian AI offering. Through the offering, you can use this Google Cloud AI service out-of-the-box while Appian ensures your data privacy and protection with both Appian and Google Cloud. For Appian AI customers, Appian configures a Google Cloud Platform project to segregate data flows, security, and storage.

When you use Google Cloud Document AI, your document is sent to Google Cloud Storage within your configured Google Cloud Platform project so that Document AI can process it.

The document is then analyzed using the Google Cloud Document AI API. The analysis data is stored as a JSON document in a Google Cloud Storage bucket and sent back to Appian.

[Figure: data extraction diagram]

Learn more about your data's security using Google Cloud Document AI.

Appian's native capabilities

Although the document extraction features using Google Cloud Document AI include strict data privacy and protection, we understand that your business may require you to keep your data within Appian.

To account for these situations, Appian allows you to use native document extraction features to extract text from PDFs with an existing digital data layer. Because this content can be accessed programmatically, Appian can extract it without sending it to Google's services if required.

Learn which types of documents can be processed by Google or Appian.

How do I get started?

Document extraction is a powerful tool to use in your business, but before you put in the work to create your own process, think about what you want to do. For example:

  • What kind of documents will you process?
  • Who is responsible for reconciling and correcting results?
  • Where do you want to display the data after extraction?

If you want the ability to customize these aspects of the document extraction process, like how the data moves post-extraction or who corrects results, you may want to create your own document extraction process. Get started by selecting a document on which to base your process.

If your goal is to extract data and collect insights quickly with minimal to no setup, you may want to use the pre-built Intelligent Document Processing (IDP) application. IDP uses a standardized document extraction process in conjunction with automatic document classification. All you have to do is upload your document.

To take advantage of Appian's full-stack automation features, consider pairing your document extraction process with other Appian AI features and robotic process automation (RPA).

Built: Fri, Nov 04, 2022 (07:10:52 PM)
