Introduction
Up to this point, companies that needed to extract data from documents and forms had two options: slow, labor-intensive manual entry or outdated, hard-to-customize optical character recognition software.
But if you have Appian, you have another option: intelligent document processing (IDP) capabilities. With IDP, you can automate many of the most common tasks involved in handling the documents your business receives. IDP is no longer a single application for document classification and extraction. Instead, it is a suite of capabilities within Appian that lets you automate the most labor-intensive parts of document management.
What's possible? We're glad you asked. With Appian, you can:
If you want to... | Use | Additional details |
---|---|---|
Apply optical character recognition (OCR) | Document extraction AI skill, Document classification AI skill | |
Recognize handwriting | Document extraction AI skill, Document classification AI skill | |
Recognize text in multiple languages | Document extraction AI skill, Document classification AI skill | |
Extract text in multiple languages | Document extraction AI skill | See supported languages. |
Integrate with bring-your-own OCR services | Integration object | Set up integration with extraction OCR engine. |
Extract data from documents | Document extraction AI skill | |
Extract data from tables | Document extraction AI skill | |
Extract data from tables across multiple pages | Document extraction AI skill, PDF Tools plug-in | See below. |
Extract data from tables with multi-line rows | Document extraction AI skill | |
Extract data from tables without grid lines | Document extraction AI skill | |
Extract data from tables with merged cells | Document extraction AI skill | |
Merge documents | Document classification AI skill, PDF Tools plug-in | See below. |
Split documents | Document classification AI skill, PDF Tools plug-in | See below. |
Convert documents | Image files: PDF Tools plug-in; HTML files: HTML to PDF plug-in; MS Word files: Dynamic Document Generator plug-in | Not recommended for use with Excel. See Using Excel with Appian. |
Deskew documents | Document extraction AI skill | Deskewing occurs during OCR. The image isn't deskewed for end users, but the text is identified accurately. |
Capture documents | Image files: PDF Tools plug-in; HTML files: HTML to PDF plug-in; MS Word files: Dynamic Document Generator plug-in; uploading on an interface with a!fileUploadField(); using a Document Generation smart service; receiving a binary or Base64 document through an integration; from a robotic task using document actions | Not recommended for use with Excel. See Using Excel with Appian. |
Implement capture rules, such as thresholds to accept a document (including quality) | Document extraction AI skill, Expression rules | See below. |
Validate documents | Document extraction AI skill, Expression rules | See below. |
Adjust images | Document extraction AI skill | Occurs during OCR. The image isn't updated for end users, but the text is accurately identified. |
Classify documents | Document classification AI skill | |
Manage metadata | Edit Document Properties smart service, Records | Can be used in conjunction with AI skills. |
Store documents | Document folders | |
Apply document retention rules | Delete Document | You can control how long documents are stored in the Appian platform. |
Apply rules for archiving | Records, Folder properties | |
Apply legal holds | Records, Folder properties | Configure security on any records or folders containing legal data. |
Search documents using metadata | Integration object | |
Implementation patterns
Extract data from tables across multiple pages
To extract data from tables that span multiple pages in a document:
- Use the PDF Tools plug-in to split the document into individual pages.
- Use the document extraction AI skill to extract table data from each page.
- Combine the per-page results into a single table using post-processing logic.
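What the post-processing step looks like depends on how your extraction results are structured. The following Python sketch is illustrative only; in Appian, the equivalent logic could be written as an expression rule. It assumes a hypothetical result shape in which each page yields a list of rows, each row a dictionary of column name to cell text. It drops blank rows and header rows repeated on continuation pages, then concatenates the remaining rows in page order.

```python
from typing import Dict, List

# Hypothetical shape: one row = {column name: cell text}; one page = list of rows.
Row = Dict[str, str]


def combine_page_tables(pages: List[List[Row]]) -> List[Row]:
    """Concatenate per-page table rows in page order, skipping noise rows."""
    combined: List[Row] = []
    for page_rows in pages:
        for row in page_rows:
            # Skip rows where every cell is blank, such as page-break artifacts.
            if all(not value.strip() for value in row.values()):
                continue
            # Skip a header row repeated on a continuation page (each cell equals its column name).
            if all(value.strip() == column for column, value in row.items()):
                continue
            combined.append(row)
    return combined


page_1 = [{"Item": "Widget", "Qty": "2"}]
page_2 = [
    {"Item": "Item", "Qty": "Qty"},  # repeated header
    {"Item": "Gadget", "Qty": "5"},
    {"Item": "", "Qty": " "},  # blank row
]
print(combine_page_tables([page_1, page_2]))
# [{'Item': 'Widget', 'Qty': '2'}, {'Item': 'Gadget', 'Qty': '5'}]
```

Matching repeated headers on exact text is a simplification; real documents may need a looser comparison, for example ignoring case or extra whitespace.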
Merge documents
When documents are received in packets containing multiple document types:
- Split the file into individual pages.
- Classify each page into its appropriate document type using the document classification AI skill.
- Combine pages of the same document type into new files.
- Send the newly composed files to the document extraction AI skill created for the corresponding document type.
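The grouping logic behind the merge step can be sketched in Python, again as an illustration rather than Appian code. It assumes a hypothetical classification output of (page number, document type) pairs and returns, for each type, the page numbers to combine into one new file, in their original order. Composing the actual files would still be done with the PDF Tools plug-in.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_pages_by_type(classified: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Map each document type to its page numbers, in page order.

    `classified` is a hypothetical list of (page_number, document_type) pairs
    produced by classifying each page of the packet.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    # Sort by page number so pages keep their original order within each new file.
    for page_number, doc_type in sorted(classified):
        groups[doc_type].append(page_number)
    return dict(groups)


packet = [(3, "invoice"), (1, "invoice"), (2, "receipt"), (4, "receipt")]
print(group_pages_by_type(packet))
# {'invoice': [1, 3], 'receipt': [2, 4]}
```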
Split documents
When documents are received in packets containing multiple document types:
- Split the file into individual pages.
- Classify each page into its appropriate document type using the document classification AI skill.
- Send the split files to the document extraction AI skill created for the corresponding document type.
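Routing each split page to the right extraction skill amounts to a lookup from document type to skill. In this Python sketch, the type names and skill names in `SKILL_BY_TYPE` are hypothetical placeholders. Pages whose type has no matching skill are set aside, for example for a user input task; that fallback is an assumption, not part of the pattern above.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholders: document type -> extraction AI skill built for that type.
SKILL_BY_TYPE: Dict[str, str] = {
    "invoice": "Invoice Extraction Skill",
    "receipt": "Receipt Extraction Skill",
}


def route_pages(
    classified: List[Tuple[int, str]], skill_by_type: Dict[str, str]
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Return (page numbers per extraction skill, page numbers with no matching skill)."""
    routed: Dict[str, List[int]] = {}
    unmatched: List[int] = []
    for page_number, doc_type in classified:
        skill = skill_by_type.get(doc_type)
        if skill is None:
            # Assumed fallback: hold the page for manual handling.
            unmatched.append(page_number)
        else:
            routed.setdefault(skill, []).append(page_number)
    return routed, unmatched


routed, unmatched = route_pages([(1, "invoice"), (2, "contract"), (3, "receipt")], SKILL_BY_TYPE)
print(routed)     # {'Invoice Extraction Skill': [1], 'Receipt Extraction Skill': [3]}
print(unmatched)  # [2]
```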
Implement capture rules
- Add the Extract from Document smart service to a process.
- Configure the Confidence Threshold input and Confidence Score output according to your requirements.
- Configure additional verification, using expression rules or user input tasks, to confirm that documents near the confidence threshold meet your requirements.
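How you act on a confidence score is up to your requirements. The Python sketch below illustrates one possible three-way rule: accept automatically at or above the threshold, send scores within a review margin below it to a user input task, and reject the rest. The 0 to 1 score scale, the 0.85 threshold, and the 0.10 margin are assumptions chosen for the example, not Appian defaults.

```python
from typing import Literal

Decision = Literal["accept", "review", "reject"]


def capture_decision(confidence: float, threshold: float = 0.85, review_margin: float = 0.10) -> Decision:
    """Decide what to do with an extraction based on its confidence score.

    Assumed scale: 0.0 to 1.0. Threshold and margin values are illustrative.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        return "accept"  # confident enough to continue without review
    if confidence >= threshold - review_margin:
        return "review"  # near the threshold: route to a user input task
    return "reject"  # too uncertain: reject or re-capture the document


print(capture_decision(0.92))  # accept
print(capture_decision(0.80))  # review
print(capture_decision(0.40))  # reject
```

Keeping the margin as a parameter lets you widen or narrow the band of documents sent for human review without changing the rule itself.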
Validate documents
- Extract data from the documents using the document extraction AI skill and Extract from Document smart service.
- To verify the completeness or accuracy of required fields, use expression rules or user input tasks.
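A completeness and format check of the kind an expression rule might perform can be sketched in Python. The field names and regular-expression formats in `REQUIRED_FIELDS` are hypothetical examples for an invoice; substitute the fields your document type actually requires. The function returns a list of problems, and an empty list means the extracted data passed.

```python
import re
from typing import Dict, List, Optional

# Hypothetical required fields for an invoice, each with an optional format pattern.
REQUIRED_FIELDS: Dict[str, Optional[str]] = {
    "invoice_number": r"[A-Z]{2}-\d{4,}",
    "invoice_date": r"\d{4}-\d{2}-\d{2}",
    "total": r"\d+(\.\d{2})?",
    "vendor_name": None,  # required, but any non-blank value is accepted
}


def validate_extraction(data: Dict[str, str], rules: Dict[str, Optional[str]]) -> List[str]:
    """List missing or badly formatted required fields in extracted data."""
    problems: List[str] = []
    for field, pattern in rules.items():
        value = (data.get(field) or "").strip()
        if not value:
            problems.append(f"{field}: missing")
        elif pattern is not None and re.fullmatch(pattern, value) is None:
            problems.append(f"{field}: unexpected format {value!r}")
    return problems


extracted = {
    "invoice_number": "AB-12345",
    "invoice_date": "2024-03-01",
    "total": "199.9",
    "vendor_name": " ",
}
print(validate_extraction(extracted, REQUIRED_FIELDS))
# ["total: unexpected format '199.9'", 'vendor_name: missing']
```

A non-empty result is a natural trigger for a user input task, where a reviewer corrects the flagged fields.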