Intelligent Document Processing

Introduction

Up to this point, companies that needed to extract data from documents and forms had two options: slow, labor-intensive manual entry or outdated, hard-to-customize optical character recognition software.

But if you have Appian, you have another option: intelligent document processing (IDP). With IDP, you can automate the most common tasks involved in handling the documents your business receives. IDP is no longer a single application for document classification and extraction. Instead, it is a suite of capabilities within Appian that lets you automate the most labor-intensive parts of document management.

What's possible? We're glad you asked. The following table shows what you can do with Appian and which feature to use:

If you want to...	Use	Additional details
Apply optical character recognition (OCR)	Document extraction AI skill; Document classification AI skill
Recognize handwriting	Document extraction AI skill; Document classification AI skill
Recognize text in multiple languages	Document extraction AI skill; Document classification AI skill
Extract text in multiple languages	Document extraction AI skill	See supported languages.
Integrate with bring-your-own OCR services	Integration object	Set up integration with extraction OCR engine.
Extract data from documents	Document extraction AI skill
Extract data from tables	Document extraction AI skill
Extract data from tables across multiple pages	Document extraction AI skill; PDF Tools plug-in	See below.
Extract data from tables with multi-line rows	Document extraction AI skill
Extract data from tables without grid lines	Document extraction AI skill
Extract data from tables with merged cells	Document extraction AI skill
Merge documents	Document classification AI skill; PDF Tools plug-in	See below.
Split documents	Document classification AI skill; PDF Tools plug-in	See below.
Convert documents	Image files: PDF Tools plug-in; HTML files: HTML to PDF plug-in; MS Word files: Dynamic Document Generator plug-in	Not recommended for use with Excel. See Using Excel with Appian.
Deskew documents	Document extraction AI skill	Deskewing occurs during OCR. The image shown to the end user is not deskewed, but the text is identified accurately.
Capture documents	Image files: PDF Tools plug-in; HTML files: HTML to PDF plug-in; MS Word files: Dynamic Document Generator plug-in; uploading on an interface with a!fileUploadField(); using a Document Generation smart service; receiving a binary or Base64 document through an integration; from a robotic task using document actions	Not recommended for use with Excel. See Using Excel with Appian.
Implement capture rules, such as thresholds to accept a document (including quality)	Document extraction AI skill; Expression rules	See below.
Validate documents	Document extraction AI skill; Expression rules	See below.
Adjust images	Document extraction AI skill	Occurs during OCR. The image isn't updated for end users, but the text is accurately identified.
Classify documents	Document classification AI skill
Manage metadata	Edit Document Properties smart service; Records	Can be used in conjunction with AI skills.
Store documents	Document folders
Apply document retention rules	Delete Document	You can control how long documents are stored in the Appian platform.
Apply rules for archiving	Records; Folder properties
Apply legal holds	Records; Folder properties	Configure security on any records or folders containing legal data.
Search documents using metadata	Integration object

Implementation patterns

Extract data from tables across multiple pages

To extract data from tables that span multiple pages in a document:

1. Use the PDF Tools plug-in to split the document into individual pages.
2. Use the document extraction AI skill to extract table data from each page.
3. Combine the results using post-processing logic.
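
The post-processing step can be sketched as follows. This is a minimal Python illustration of the logic only, not Appian code: it assumes each page's table arrives as a list of rows keyed by column name (a hypothetical shape, not the AI skill's actual output format) and that a table header repeated on a later page shows up as a row whose values match the column names.

```python
from typing import Dict, List

Row = Dict[str, str]


def combine_page_tables(pages: List[List[Row]]) -> List[Row]:
    """Concatenate per-page table rows in page order, skipping repeated header rows."""
    combined: List[Row] = []
    for page_rows in pages:
        for row in page_rows:
            # Assumption: a repeated header row has every value equal to its column name.
            is_header = all(
                value.strip().lower() == key.strip().lower()
                for key, value in row.items()
            )
            # Empty rows also count as headers here (all() of nothing is True), so they are skipped.
            if row and not is_header:
                combined.append(row)
    return combined


# Hypothetical extraction results for a two-page table.
pages = [
    [{"item": "Widget", "qty": "2"}, {"item": "Gadget", "qty": "1"}],
    [{"item": "Item", "qty": "Qty"}, {"item": "Gizmo", "qty": "5"}],
]
rows = combine_page_tables(pages)
```

In an Appian application, you would likely implement the same idea in an expression rule that runs after the extraction results for every page are available.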

Merge documents

When documents are received in packets containing multiple document types:

1. Split the file into individual pages.
2. Classify each page into its appropriate document type using the document classification AI skill.
3. Combine like pages into new files.
4. Send the newly composed files to the document extraction AI skill created for the corresponding document type.
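
To make the "combine like pages" step concrete, here is a minimal Python sketch of the grouping logic. It is an illustration under assumptions, not Appian code: the classification result is represented as hypothetical (page number, document type) pairs, and the output is a plan listing which pages belong in each new file. The actual merge would still be done with the PDF Tools plug-in.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_pages_by_type(classified: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Map each document type to the sorted page numbers classified as that type."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for page_number, doc_type in classified:
        groups[doc_type].append(page_number)
    # Sort so each new file keeps the pages in their original order.
    return {doc_type: sorted(page_numbers) for doc_type, page_numbers in groups.items()}


# Hypothetical classification results for a four-page packet.
classified_pages = [(1, "invoice"), (2, "invoice"), (3, "purchase_order"), (4, "invoice")]
merge_plan = group_pages_by_type(classified_pages)
```

Each entry in the resulting plan corresponds to one new file to compose and send for extraction.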

Split documents

When documents are received in packets containing multiple document types:

1. Split the file into individual pages.
2. Classify each page into its appropriate document type using the document classification AI skill.
3. Send the split files to the document extraction AI skill created for the corresponding document type.
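
The routing in the last step can be sketched as a simple lookup. This Python example is illustrative only: the skill names and the (page number, document type) pairs are hypothetical, and pages whose type has no matching skill are returned with None so that you can handle them separately, for example in a user input task.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from document type to the extraction AI skill built for it.
SKILL_BY_TYPE: Dict[str, str] = {
    "invoice": "Invoice Extraction",
    "purchase_order": "Purchase Order Extraction",
}


def route_pages(
    classified: List[Tuple[int, str]],
    skill_by_type: Dict[str, str],
) -> List[Tuple[int, Optional[str]]]:
    """Pair each page with the skill for its type, or None if no skill is configured."""
    return [
        (page_number, skill_by_type.get(doc_type))
        for page_number, doc_type in classified
    ]


# Hypothetical classification results, including a type with no matching skill.
routed = route_pages([(1, "invoice"), (2, "purchase_order"), (3, "receipt")], SKILL_BY_TYPE)
```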

Implement capture rules

1. Add the Extract from Document smart service to a process.
2. Configure the Confidence Threshold input and Confidence Score output according to your requirements.
3. Configure additional verification using expression rules or user input tasks to confirm documents near the confidence threshold meet your requirements.
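
One way to think about the rule described above is as a three-way decision: accept, send for review, or reject. The Python sketch below illustrates that logic only. The 0-to-1 score scale, the review margin, and the outcome names are assumptions made for the example, not Appian settings.

```python
def triage_document(confidence: float, threshold: float, review_margin: float = 0.1) -> str:
    """Return 'accept', 'review', or 'reject' for one document's confidence score."""
    # Scores comfortably above the threshold are accepted without extra checks.
    if confidence >= threshold + review_margin:
        return "accept"
    # Scores near the threshold go to additional verification.
    if confidence >= threshold - review_margin:
        return "review"
    # Everything else falls below the acceptable range.
    return "reject"


# Hypothetical scores evaluated against a threshold of 0.8.
decisions = [triage_document(score, threshold=0.8) for score in (0.95, 0.75, 0.5)]
```

In a process model, the "review" outcome would typically lead to an expression rule check or a user input task.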

Validate documents

1. Extract data from the documents using the document extraction AI skill and the Extract from Document smart service.
2. Use expression rules or user input tasks to verify the completeness or accuracy of required fields.
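
A completeness check on required fields can be sketched as follows. This Python example only illustrates the logic you might put in an expression rule. The field names and the dictionary shape of the extracted data are hypothetical.

```python
from typing import Dict, List, Optional


def missing_required_fields(
    extracted: Dict[str, Optional[str]],
    required: List[str],
) -> List[str]:
    """Return the required field names that are absent, null, or blank."""
    return [
        name
        for name in required
        if not (extracted.get(name) or "").strip()
    ]


# Hypothetical extraction result with one blank, one null, and one absent field.
extracted_fields = {"invoiceNumber": "INV-1", "total": " ", "vendor": None}
missing = missing_required_fields(
    extracted_fields,
    ["invoiceNumber", "total", "vendor", "dueDate"],
)
```

An empty result means every required field has a value. A non-empty result can be used to route the document to a user input task for correction.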
