The Intelligent Document Processing (IDP) application is primarily a mechanism to extract information from documents in order to digitize that data.
This process requires users to:
- Periodically complete tasks to help classify documents.
- Confirm or correct the extracted information using an easy point-and-click technique.
This document teaches users how to use IDP to upload, classify, and reconcile documents, as well as how to view information about processed documents, such as document status and the extracted information. It also provides an overview of the document processing metrics.
Not all users will have access to all actions and views. See the groups reference page for more information on what type of access each security group provides.
IDP transforms unstructured data from PDF documents into structured data. The application only accepts documents in PDF format, so you may need to convert your documents before you get started. Appian Community offers multiple plug-ins to convert documents from other file formats to PDF.
We don't recommend converting Excel files to PDF for use in IDP. Instead, Appian can parse information from Excel files using rules. Use the Excel Tools plug-in to extract information from this file format.
To upload documents manually, use the IDP upload form. You can also run IDP as a subprocess in a larger workflow, or upload documents from external systems automatically through a web API.
To upload documents manually:
- From the DOCUMENTS tab, click UPLOAD DOCUMENTS.
- If your site is configured to use document channels, select the Document Channel.
- (Optional) If all of the documents you are uploading are the same document type, select the Classification.
- Note: This field is only editable for Classification and Extraction document channels.
- To upload individual documents, select Upload PDFs, then select the files to upload. You can upload one or more documents at a time.
- Note: Each file must be a PDF that is fewer than 15 pages in length and less than 7 MB in size.
- To upload multiple files that are bundled into a ZIP file, select Upload ZIP, then select the ZIP file to upload.
- Note: The ZIP file can only contain PDF files and each file must be fewer than 15 pages in length and less than 7 MB in size.
- Click START PROCESSING.
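The file limits in the notes above can be checked before uploading. The following is a minimal sketch, not part of IDP: the function names are hypothetical, "7 MB" is assumed to mean 7 × 1024 × 1024 bytes, and the page count is supplied by the caller because the Python standard library cannot read PDF page counts.

```python
import zipfile

MAX_PAGES = 15               # each file must have fewer than 15 pages
MAX_BYTES = 7 * 1024 * 1024  # assumption: "7 MB" is read as 7 MiB


def pdf_problems(name, size_bytes, page_count=None):
    """Return the reasons a file would break the stated upload limits."""
    problems = []
    if not name.lower().endswith(".pdf"):
        problems.append("not a PDF")
    if size_bytes >= MAX_BYTES:
        problems.append("7 MB or larger")
    # The page count is only checked when the caller provides it.
    if page_count is not None and page_count >= MAX_PAGES:
        problems.append("15 pages or more")
    return problems


def zip_problems(zip_path):
    """Check each file in a ZIP by name and uncompressed size.

    Page counts are not checked here.
    """
    found = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            issues = pdf_problems(info.filename, info.file_size)
            if issues:
                found[info.filename] = issues
    return found
```

An empty result from either function means no problem was detected under these assumptions. IDP itself remains the authority on whether a file is accepted.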
Viewing uploaded document information
After you upload the documents, the classification and extraction process will start.
The status of each document displays on the DOCUMENTS tab, along with other information and metrics about each document.
In the top-right corner of the page, click refresh to view the updated status. You can also use the filters at the top of the page to find specific documents.
The possible statuses are:
- Auto-Classifying: If the document type is not selected when uploading the document, this status displays to indicate that the auto-classification is running to determine what type of document it is. Classification usually takes up to 3 minutes.
- Pending Classification: The status that displays for documents that did not meet the classification confidence threshold during auto-classification. These documents will have to be manually classified.
- Auto-Extracting: The status that indicates that the auto-extraction is running to extract the data from the document.
- Pending Reconciliation: The status that displays after documents are classified and auto-extraction is complete. These documents are ready for reconciliation.
- Completed: The status that indicates a document has been reconciled and the data for the document has been written to the database.
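One way to read the statuses above is as a simple forward-moving lifecycle. The sketch below is illustrative only: the transition table is inferred from the status descriptions, it is not taken from the IDP implementation, and it leaves out the invalidation and reclassification paths.

```python
from enum import Enum


class Status(Enum):
    AUTO_CLASSIFYING = "Auto-Classifying"
    PENDING_CLASSIFICATION = "Pending Classification"
    AUTO_EXTRACTING = "Auto-Extracting"
    PENDING_RECONCILIATION = "Pending Reconciliation"
    COMPLETED = "Completed"


# Assumed forward transitions, inferred from the status descriptions.
NEXT = {
    Status.AUTO_CLASSIFYING: {Status.PENDING_CLASSIFICATION, Status.AUTO_EXTRACTING},
    Status.PENDING_CLASSIFICATION: {Status.AUTO_EXTRACTING},
    Status.AUTO_EXTRACTING: {Status.PENDING_RECONCILIATION},
    Status.PENDING_RECONCILIATION: {Status.COMPLETED},
    Status.COMPLETED: set(),
}


def needs_user_action(status):
    """True for the two statuses that are waiting on a user task."""
    return status in {Status.PENDING_CLASSIFICATION, Status.PENDING_RECONCILIATION}


def can_move(current, target):
    """True if the assumed lifecycle allows moving from current to target."""
    return target in NEXT[current]
```

In this reading, only Pending Classification and Pending Reconciliation wait on a person. The other statuses either advance automatically or are final.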
After documents are uploaded, tasks are assigned to users so that they can classify documents and confirm or correct the extracted information. See Tasks for more information on using tasks in Appian.
If there are any documents that are in the Pending Classification or Pending Reconciliation status, you can classify and reconcile them in the TASKS tab.
In this tab, you can search for tasks by Task Name and filter by Task Type (Classification or Reconciliation), Document Channel (if configured), or who the task is Assigned To.
Tasks are represented by icons:
- Classification task: signpost icon.
- Reconciliation task: arrow icon.
Completing the manual classification task
For documents that didn't meet the minimum confidence threshold during auto-classification, a task will automatically be created for a user to manually classify the document.
These documents will be in the Pending Classification status.
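The threshold check described above can be pictured as a single comparison. This is a hedged sketch: the function name is hypothetical, confidence and threshold are assumed to be values between 0 and 1, and a score exactly equal to the threshold is assumed to meet it.

```python
def status_after_auto_classification(confidence, threshold):
    """Return the status a document would get once auto-classification ends.

    Both arguments are assumed to be on a 0-to-1 scale.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # Assumption: a score equal to the threshold counts as meeting it.
    if confidence >= threshold:
        return "Auto-Extracting"
    return "Pending Classification"
```

For example, with a hypothetical threshold of 0.8, a document scored at 0.65 would wait in Pending Classification for a manual task.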
To complete a manual classification task:
- From the TASKS tab, click a classification task.
- At the top of the page, click Accept to accept the task.
- The Classification dropdown list displays the document type that the classification machine learning model predicted. If it is not correct, select another document type.
- Click CLASSIFY.
If the document is invalid, click INVALIDATE. For example, if an unsigned process order is uploaded instead of a signed one, you can classify it as invalid. Invalid documents won't go through data extraction and reconciliation.
Completing the reconciliation task
A user must review every document for accuracy and fill in any missing fields. This is called reconciliation. After a document is uploaded, Appian Document Extraction runs and extracts data from the document. Extraction usually takes about 2 to 5 minutes. When it is finished, a reconciliation task is automatically generated.
While the data is extracting, these documents will be in the Auto-Extracting status. After extraction is complete, they will be in the Pending Reconciliation status.
To complete the reconciliation task:
- If you have just completed a classification task and the auto-extraction process is complete, the reconciliation task displays immediately. Skip the next two steps.
- From the TASKS tab, click a reconciliation task.
- At the top of the page, click Accept to accept the task.
- For Classification and Extraction document channels, if the document was classified incorrectly, at the bottom of the page, click RECLASSIFY. Then follow the instructions in Completing the manual classification task.
- For Extraction-only document channels, if the document should be classified as an Invalid document, at the bottom of the page, click INVALIDATE.
- Reconcile the data using the reconciliation task. For more information on this task, see Appian Document Extraction.
The status for reconciled documents will change to Completed and the data extracted will be written to the database.
Viewing and editing the extracted data for a document
To view the information that was input for a document, go to the DOCUMENTS tab and click the document name.
The Summary tab lists Overview information about the document at the top of the page. It also displays the data that was extracted from the document, along with a document viewer.
To edit the information that was extracted, click the edit button. Then update the information in the fields.
Viewing IDP metrics
The METRICS tab is used for reporting and governance so that users can see how well the application is performing.
You can filter the information by Document Channel (if configured) and Document Type, and you can limit it to documents processed in the Past 3 Months or the Past 6 Months.
Key performance indicators
The first section on this page shows some key performance indicators for documents that have completed processing, including:
- DOCUMENTS PROCESSED: The number of documents that have completed processing.
- AUTO-CLASSIFIED DOCS: The percentage of documents that were confirmed to be auto-classified correctly.
- AVG CLASSIFICATION TIME: The average time between when the user accepts and then completes the classification task.
- AVG AUTO-EXTRACTION: The average percentage of fields that were correctly extracted at the time that the reconciliation task was assigned.
- AVG RECONCILIATION TIME: The average time between when the user accepts and then completes the reconciliation task.
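The indicators above are averages and percentages over completed documents. The sketch below shows one plausible way to compute them. The record layout and field names are hypothetical, every document is assumed to have at least one field, and the denominators (all processed documents) are an assumption rather than IDP's documented formula.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class ProcessedDoc:
    auto_classified_correctly: bool
    fields_correct: int   # fields already correct when the task was assigned
    fields_total: int     # assumed to be greater than zero
    reconcile_accepted: datetime
    reconcile_completed: datetime
    classify_accepted: Optional[datetime] = None   # None if no manual task
    classify_completed: Optional[datetime] = None


def kpis(docs):
    """Compute the five indicators for a non-empty list of ProcessedDoc."""
    if not docs:
        raise ValueError("no processed documents")
    classify_secs = [
        (d.classify_completed - d.classify_accepted).total_seconds()
        for d in docs
        if d.classify_accepted and d.classify_completed
    ]
    reconcile_secs = [
        (d.reconcile_completed - d.reconcile_accepted).total_seconds()
        for d in docs
    ]
    return {
        "documents_processed": len(docs),
        "auto_classified_pct": 100 * sum(d.auto_classified_correctly for d in docs) / len(docs),
        "avg_classification_secs": mean(classify_secs) if classify_secs else None,
        "avg_auto_extraction_pct": mean(100 * d.fields_correct / d.fields_total for d in docs),
        "avg_reconciliation_secs": mean(reconcile_secs),
    }
```

Both time indicators measure from when a user accepts a task to when they complete it, so time that a task spends unassigned does not count.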
Classification and extraction charts
The next section displays charts that show:
- Document Types: The current breakdown of the documents that are processed by document type.
- Auto-Classification Accuracy: The weekly average auto-classification confidence for documents that were confirmed to be auto-classified correctly.
- Note: Because the AI classification model is periodically retrained on new data, the accuracy improves over time.
- Auto-Extraction Accuracy: The weekly average percentage of fields that were correctly extracted at the time the reconciliation task was assigned.
- Note: Because the mappings that are selected from the document reconciliation viewer are learned, the auto-extraction accuracy improves over time.
- Reconciliation Time: The weekly average time between when the user accepts and then completes the reconciliation task.
- Note: Because the mappings that are input by the user are learned, the time spent reconciling the documents decreases over time.
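The weekly averages behind these charts can be reproduced from raw per-document values. This is a minimal sketch: the function name is hypothetical, and grouping by ISO week is an assumption, since the text does not say how IDP defines a week.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_average(samples):
    """Average values per ISO week.

    samples is an iterable of (date, value) pairs. The result maps
    (iso_year, iso_week) to the mean of that week's values.
    """
    buckets = defaultdict(list)
    for day, value in samples:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(value)
    return {week: mean(values) for week, values in sorted(buckets.items())}
```

The same helper works for any of the weekly charts: pass accuracy percentages for the accuracy charts, or task durations in seconds for Reconciliation Time.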
At the bottom of the page is a grid that shows the metadata for processed documents.