When creating a document extraction process, build it around a specific type of document. Whether it is an invoice, an application, or a simple form, the type and format of your document determine how much data the process can successfully extract.
Certain types of forms work best with document extraction. This page provides guidance on which types of forms to use to get the best results from your document extraction process.
Appian Document Extraction supports PDF documents up to 15 pages or 7 MB.
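As a rough pre-check before sending a file for extraction, the limits above could be encoded as follows. This is a minimal sketch and not part of Appian: the function name `within_extraction_limits` is hypothetical, it assumes both limits apply at the same time, and it assumes 1 MB means 1,000,000 bytes (this page does not say which definition applies). The page count is passed in as a number, since reading it from a PDF would require a separate PDF library.

```python
# Limits taken from this page; the byte interpretation of "MB" is an assumption.
MAX_PAGES = 15
MAX_BYTES = 7 * 1_000_000  # assumption: 1 MB = 1,000,000 bytes


def within_extraction_limits(page_count: int, size_bytes: int) -> bool:
    """Return True if a PDF fits both the page limit and the size limit.

    Hypothetical helper: assumes the two limits must both be satisfied.
    """
    if page_count < 1 or size_bytes < 0:
        raise ValueError("page_count must be >= 1 and size_bytes must be >= 0")
    return page_count <= MAX_PAGES and size_bytes <= MAX_BYTES
```

For example, `within_extraction_limits(12, 3_500_000)` evaluates to `True`, while a 16-page file fails the check regardless of its size.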
In the context of Appian document extraction, there are three types of PDFs that can be processed:
Depending on your security needs and the type of PDF you want to process, you will select your preferred vendor.
Data security is top of mind in many industries. Appian's document extraction features provide strict data privacy and protection, but if your business requires you to keep data within Appian, you have the option to choose which vendor processes the documents. Select
IDP secures your data and protects privacy no matter which vendor you select. See Data Security in IDP to learn more about how your data moves in the application.
In addition to your security needs, you'll also want to consider the type of content you want to extract, and how you want to extract it, before choosing a vendor.
Refer to the table below to see which vendor supports certain data extraction processes and tools based on the type of PDF being processed:
Note that Appian's native document extraction capabilities automatically handle fillable PDFs, regardless of the vendor you select. To use Document AI for all of your forms, including fillable forms, flatten your PDFs before beginning extraction. For example, you can add the community-supported PDF Tools plug-in to your process model to flatten PDFs before the extraction nodes.
Document extraction works best with forms that tend to contain similar information in each document. For example, almost all invoices have an invoice number and a total, and almost all purchase orders have a PO number and a purchaser.
Appian Document Extraction can use field position to learn more about your data and improve extraction results. To help train the feature, you can use consistently structured documents that place the same fields in the same places. As you complete reconciliation tasks, the feature learns to recognize data based on its position.
Document extraction also works best with forms that have clearly labeled values, because the labels help the feature locate and extract the data.
For example, the following form contains numerous labels and their associated values, such as the label INVOICE with the value 101, and the label DATE with the value MAR 20, 2020.
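To make the idea of labels and values concrete, the two pairs named in this example could be represented as simple key-value data. This is an illustrative sketch only: the `LabeledValue` type is hypothetical and says nothing about how Appian stores or returns extracted data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledValue:
    """One label on the form and the value printed next to it (hypothetical type)."""
    label: str
    value: str


# The two pairs mentioned in the example form.
extracted = [
    LabeledValue(label="INVOICE", value="101"),
    LabeledValue(label="DATE", value="MAR 20, 2020"),
]

# Index the pairs by label for quick lookup.
by_label = {pair.label: pair.value for pair in extracted}
```

With this structure, `by_label["INVOICE"]` gives `"101"`. A form with clear labels maps cleanly onto pairs like these, which is why such forms suit extraction well.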
Appian document extraction allows you to extract information from tables in your documents. The data from each table is presented neatly in the reconciliation task for quick and simple verification.
Since Appian Document Extraction is meant to extract labels and values, extracting paragraphs of text is not a good use case for it. If you need to extract paragraphs of text, try using the Google Cloud Vision Connected System.
Likewise, if you need to find specific information in text, such as footnotes in a document, you will be better served by the Google Cloud Vision Connected System along with expressions to analyze the output.
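In Appian you would write this kind of analysis with expressions. The Python sketch below only illustrates the idea of scanning recognized text for footnotes. It assumes the OCR output is available as a plain text string and that a footnote is a line beginning with a marker such as `*`, `[1]`, `1.`, or `1)`. The function `find_footnotes` and that marker convention are assumptions for illustration, not part of any Appian or Google API, and the pattern will also match ordinary numbered list items.

```python
import re
from typing import List, Tuple

# Assumption: a footnote line starts with "[n]", "n." / "n)", or one or more "*",
# followed by whitespace and then the footnote text.
FOOTNOTE_LINE = re.compile(r"^\s*(\[\d+\]|\d+[.)]|\*+)\s+(\S.*)$")


def find_footnotes(ocr_text: str) -> List[Tuple[str, str]]:
    """Return (marker, text) pairs for lines that look like footnotes."""
    notes = []
    for line in ocr_text.splitlines():
        match = FOOTNOTE_LINE.match(line)
        if match:
            notes.append((match.group(1), match.group(2).strip()))
    return notes
```

A heuristic like this needs tuning against real documents, since OCR output may split or merge lines differently from the printed layout.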