Google has deprecated legacy versions of AutoML services, which directly impacts IDP's core functionality. Additionally, the IDP application was deprecated with Appian 23.2. Customers who wish to use the application will need to refactor plug-ins using AutoML.
Before you start using the Intelligent Document Processing (IDP) application, you will need to set up the initial configuration. Additionally, after you have started using IDP, you may want to update this initial configuration.
IDP allows members of the manager security groups to easily configure and update the configuration settings using the Configure tab.
This configuration includes:
After all of these steps are complete, users can start to classify, extract, and reconcile documents. See the User Guide for instructions on how to use the IDP site.
Because IDP classifies documents, it is important to use document types that a user can easily categorize. If a user can't classify a document by reading it, the AI classification model likely won't be able to either. For example, a human reader who is familiar with invoices and purchase orders can easily tell them apart based on the information they contain.
Also, make sure you understand the documents that work best for document extraction by referring to Appian Document Extraction.
If you have previously configured your document channel with dummy documents or document types, we recommend having the application administrator reset the configuration.
To reset the configuration:
1. In the dudocchannel database table, for the document channel you are configuring, update the following values:
   - modelid: NULL
   - modeltrainedon: NULL
   - numdocsfortraining: 0
   - invalidtypeincludedinmodel: 0
2. In the dudocunderstanding database table, delete all of the rows where the channelid matches the channelid of your document channel.

Before you get started, we recommend making sure the latest version of the IDP application is installed.
The latest version of IDP is version 1.8.
Compare the latest version with the application version displayed in the ABOUT tab. If the installed version is behind the latest version, contact your application administrator about installing the latest version.
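The database reset described above can be sketched as two SQL statements. The Python example below runs them against a throwaway in-memory SQLite database so the effect is easy to inspect. It is an illustration only: the simplified schema, the sample rows, and the assumption that dudocchannel is keyed by a channelid column are not taken from the IDP schema, and your application database will use a different engine and connection method.

```python
import sqlite3


def reset_channel(conn: sqlite3.Connection, channel_id: int) -> None:
    """Clear the stored training state for one document channel."""
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            """
            UPDATE dudocchannel
               SET modelid = NULL,
                   modeltrainedon = NULL,
                   numdocsfortraining = 0,
                   invalidtypeincludedinmodel = 0
             WHERE channelid = ?
            """,
            (channel_id,),
        )
        conn.execute(
            "DELETE FROM dudocunderstanding WHERE channelid = ?",
            (channel_id,),
        )


# Demo data: a simplified, assumed schema with two channels.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dudocchannel (
        channelid INTEGER PRIMARY KEY,
        modelid TEXT,
        modeltrainedon TEXT,
        numdocsfortraining INTEGER,
        invalidtypeincludedinmodel INTEGER
    );
    CREATE TABLE dudocunderstanding (
        id INTEGER PRIMARY KEY,
        channelid INTEGER
    );
    INSERT INTO dudocchannel VALUES (1, 'model-a', '2021-01-10', 40, 1);
    INSERT INTO dudocchannel VALUES (2, 'model-b', '2021-02-01', 25, 0);
    INSERT INTO dudocunderstanding (channelid) VALUES (1), (1), (2);
    """
)

reset_channel(conn, 1)

channel_row = conn.execute(
    "SELECT modelid, modeltrainedon, numdocsfortraining, "
    "invalidtypeincludedinmodel FROM dudocchannel WHERE channelid = 1"
).fetchone()
remaining = conn.execute(
    "SELECT COUNT(*) FROM dudocunderstanding WHERE channelid = 1"
).fetchone()[0]
```

Filtering both statements on the same channel identifier keeps other channels untouched, which is worth confirming with a SELECT before running anything similar against a real environment.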
By default, the application includes Invoice, Purchase Order, Claim, and Receipt document types. If you need to extract more or fewer fields than what is available out of the box for these document types, see Modifying Fields for Document Types. If you need to extract data from different document types than what is provided out of the box, see Adding a Document Type. If you need to add a new document channel, see Adding a Document Channel.
Whether you are configuring the application for the first time, or updating the configuration settings, you will access the configuration settings on the CONFIGURE tab.
Keep in mind that only system administrators and members of certain groups have access to the CONFIGURE tab. If you don't see the tab, contact the application administrators. See the Groups Reference Page for more information about what actions the different user groups can take in IDP.
Whether you are running the configuration for the first time, or editing an existing configuration, you will see the current configuration settings in a grid. To edit the configuration, click the edit icon in the right-most column of the grid.
The next step in configuring IDP is to select whether you need to process multiple document types and need IDP to classify them, or if you only need to extract information from one document type.
If you only need to process one document type, you can skip training the classification model by selecting the extraction-only workflow. You will be able to start processing documents as soon as the configuration is complete, rather than waiting for the classification model to be trained.
To select the automation services:
Note: Classification and Extraction is disabled if you don't use Google's AutoML service. If you don't use Google's AutoML service, all IDP document channels will use Extraction as the automation service.
Now you will select the document types that you want to process.
By default, the application includes Invoice, Purchase Order, Claim, and Receipt document types. If you need to extract more or fewer fields than what is available out of the box for these document types, see Modifying Fields for Document Types. If you need to extract data from different document types than what is provided out of the box, see Adding a Document Type.
To select the document types:
If you chose Classification and Extraction in Step 2, you will need to upload example documents so the AI classification model can learn the characteristics that are common to each document type. Keep in mind that this process only trains the classification model. It does not affect document extraction.
The documents you upload must be ZIP files containing PDFs.
When you are collecting documents to upload for training, choose a representative set of the actual documents that you will want to process. The better this sample set matches your actual documents, the better the classification model will perform.
Note: If you retrain the classification model, you will need a new set of documents for the training. If documents are the same or too similar to documents that were previously uploaded, the process model might encounter an error during the training and time out.
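Before uploading, a quick local check can confirm that an archive meets the ZIP-of-PDFs requirement. The Python sketch below lists any entries that do not look like PDFs. The function name is hypothetical, and the check (a .pdf extension plus the standard %PDF- file header) is an assumption about what counts as a usable PDF, not a description of how IDP validates uploads.

```python
import io
import zipfile
from typing import List


def non_pdf_entries(source) -> List[str]:
    """Return the names of archive entries that do not look like PDFs.

    `source` can be a file path or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folders are skipped, only files are checked
            with archive.open(info) as member:
                header = member.read(5)
            has_pdf_name = info.filename.lower().endswith(".pdf")
            has_pdf_header = header == b"%PDF-"
            if not (has_pdf_name and has_pdf_header):
                problems.append(info.filename)
    return problems


# Demo: build a small archive in memory with one PDF-like file and one stray file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("invoice_001.pdf", b"%PDF-1.7\nsample body")
    archive.writestr("notes.txt", b"not a pdf")

problems = non_pdf_entries(io.BytesIO(buffer.getvalue()))
```

An empty result means every file in the archive passed both checks.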
To upload your example documents:
Tip: If you upgrade to IDP 1.5 from an older version, document types without an extraction vendor set will be presumed to be Google and the processor id field will be available.
This step applies only if you're not using Google's AutoML service with IDP. If you choose Extraction in Step 2, you won't need to upload example documents to train the AI classification model. However, you'll need to configure a few settings:
The final step is to set the preferences for your configuration. Note that if you chose Extraction in Step 2, some fields in the Set Preferences page will not display since they don't affect extraction-only workflows.
After you set these preferences, you can start the training for the AI classification model.
Tip: To determine an appropriate confidence threshold, it may be helpful to understand a couple of concepts.
A confidence interval is a range of plausible values for an unknown parameter.
The confidence interval has a confidence level that the true parameter is in the proposed range.
If a prediction has a 95% confidence level, then it has a 95% probability of containing the true parameter. If the confidence level meets the confidence threshold, then the document will be auto-classified.
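The threshold rule above amounts to a single comparison. The Python sketch below models it. The function name and the use of a 0 to 100 percentage scale are assumptions made for illustration, and IDP's internal implementation may differ.

```python
def should_auto_classify(confidence_pct: float, threshold_pct: float) -> bool:
    """Return True when the confidence level meets the configured threshold."""
    for name, value in (("confidence", confidence_pct), ("threshold", threshold_pct)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    # "Meets the threshold" is read here as greater than or equal to.
    return confidence_pct >= threshold_pct


# With a threshold of 90, only sufficiently confident predictions pass.
decisions = {score: should_auto_classify(score, 90) for score in (95, 90, 72)}
```

A higher threshold sends more documents to manual classification, while a lower one auto-classifies more documents at the cost of more mistakes.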
You can configure preferences based on the services you chose for the channel:
Classification training begins and can take several hours. You will receive an email when the training is complete. Once the training is complete, users can start to classify, extract, and reconcile documents. See the User Guide for instructions on how to get started using the application. If there is an issue with the classification model training, contact your application administrator who can follow the troubleshooting steps.
Users can start uploading documents right away.
If you plan to process a high volume of documents and want to save time, you can configure IDP to automatically validate extractions and bypass the reconciliation step for certain fields. Users can still review document extractions if the information wasn't successfully extracted.
Tip: You'll need to complete the configuration workflow for a channel before configuring automated validations for certain document fields.
To configure automated validations:
On the Manage Field Validations page, you can enable IDP to automatically validate certain fields in your documents. This speeds up document processing time by eliminating the need for reconciliation. If you're confident in IDP's ability to detect and extract information in certain fields, you can automate this validation.
To add automated validations:
To edit automated validations:
If you want to remove validations for all active document types in the channel, click Set all to Manual Review.
The Documents tab shows documents that skip the reconciliation task when automated validations are properly set up and configured. Straight Through Processed appears in the Reconciled By column for these documents.
You can configure custom validations in IDP using expression rules. Custom validations can be useful if you want documents with certain data to be automatically validated. For example, you can use custom validation types to specify that invoices with balances over a certain amount shouldn't be validated automatically. If invoices are below this amount, IDP can still automatically validate the extracted information.
You'll create custom validation types using expression rules in Appian Designer. You'll then need to update a constant to include these expression rules so that the options appear as custom validation types on the Manage Field Validations page.
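It can help to model the decision logic outside Appian before writing the expression rule. The Python sketch below mirrors the invoice example: it returns true when automatic validation is acceptable and false when manual review is needed. The field name invoiceTotal, the limit value, and the choice to treat an amount equal to the limit as acceptable are all hypothetical.

```python
from decimal import Decimal, InvalidOperation

MANUAL_REVIEW_LIMIT = Decimal("10000")  # hypothetical cutoff


def can_auto_validate(data: dict) -> bool:
    """Return True only when the invoice total is present and within the limit."""
    raw = data.get("invoiceTotal")  # hypothetical field name
    if raw is None or str(raw).strip() == "":
        return False  # a missing value always goes to manual review
    try:
        total = Decimal(str(raw).replace(",", ""))
    except InvalidOperation:
        return False  # unparseable values also go to manual review
    if not total.is_finite():
        return False
    return total <= MANUAL_REVIEW_LIMIT


small_invoice = can_auto_validate({"invoiceTotal": "2,500.00"})
large_invoice = can_auto_validate({"invoiceTotal": "25000"})
```

Defaulting to false for missing or malformed data keeps questionable documents in front of a reviewer.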
To create a custom validation type:
1. In Appian Designer, create an expression rule with a rule input that has the Name data and Type Any Type.
2. Write the expression against ri!data. The evaluated expression must return a single true or false result. Documents that evaluate to true will be processed automatically when using this custom validation type. Documents that evaluate to false will require manual review.

Here are some example expression validations:
Example 1
/*Example custom validation on invoice document type*/
/*This expression returns true as long as both fields have values,*/
/*invoice id is shorter than 8 characters,*/
/*and the invoice date is later than January 10th, 2021*/
and(
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"invoiceId",null))),
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"invoiceDate",null))),
  len(index(ri!data,"invoiceId",""))<8,
  index(ri!data,"invoiceDate",null)> date(2021,01,10)
)
Example 2
/*Example custom validation that checks that 3 required fields are all populated*/
and(
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"field1",null))),
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"field2",null))),
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"field3",null)))
)
For the validation type to appear in the configuration menu:
1. Add a label for the new validation type to the DU_CUSTOM_VALIDATION_TYPES constant.
2. In DU_returnCustomValidationResultForCustomValidationType, use the expression designer to update the choose() function to include the expression rule you created above. Pay attention to the order of the expressions in the choose() function, because it must match the order of the labels in the DU_CUSTOM_VALIDATION_TYPES constant.
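The ordering requirement exists because choose() selects a branch by position rather than by name. The Python sketch below models that idea with two parallel lists. The labels and validator functions are hypothetical stand-ins for the constant values and expression rules, not the actual IDP objects.

```python
# Hypothetical labels, standing in for the values of the constant.
CUSTOM_VALIDATION_TYPES = ["Invoice ID and date check", "Three required fields"]


def invoice_id_and_date_check(data: dict) -> bool:
    return bool(data.get("invoiceId")) and bool(data.get("invoiceDate"))


def three_required_fields(data: dict) -> bool:
    return all(data.get(key) for key in ("field1", "field2", "field3"))


# Must list the validators in the same order as the labels above.
VALIDATORS = [invoice_id_and_date_check, three_required_fields]


def run_custom_validation(label: str, data: dict) -> bool:
    """Look up the label's position and run the validator at that position."""
    if len(VALIDATORS) != len(CUSTOM_VALIDATION_TYPES):
        raise ValueError("labels and validators must have the same length")
    position = CUSTOM_VALIDATION_TYPES.index(label)
    return VALIDATORS[position](data)


complete = run_custom_validation(
    "Three required fields", {"field1": "a", "field2": "b", "field3": "c"}
)
incomplete = run_custom_validation(
    "Invoice ID and date check", {"invoiceId": "INV-7"}
)
```

If the two lists fall out of order, a label silently runs the wrong check, which is why a newly added label and its expression should go in the same position.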
To test that your custom validation type appears properly:
If you have the Manage Field Validations page open in another tab or browser while you edit the expression or constant, you'll need to refresh the page for your changes to appear.
You can deactivate document channels you no longer need. Deactivating a channel can help you control costs related to models using Google's AutoML service, which remain active even if you're not currently using them. When you deactivate a channel, you won't be able to change its configuration settings until you reactivate it.
To deactivate a document channel:
When a document channel is deactivated, that channel's document types no longer appear in the metrics report.
You can reactivate a document channel if you need to use it at a later time. When you reactivate a previously deactivated channel, you'll have the option to configure it.
To reactivate a document channel:
You'll be asked to choose a channel workflow type if you choose to configure the reactivated channel. If you choose the same channel workflow that the channel used prior to deactivation, the Google AutoML model is not impacted. However, if you make a different selection, the Google AutoML model is impacted in these ways:
Previous selection | New selection | Model impact |
---|---|---|
Classification and Extraction | Extraction only | The existing AutoML model is deleted, and a warning lets you know. |
Extraction only | Classification and Extraction | A new Google AutoML model is created for the channel. |
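The table above can be read as a small lookup on the previous and new selections. The Python sketch below encodes it, including the case where the selection is unchanged. The function name and the constant names are hypothetical, and the returned strings paraphrase the table.

```python
CLASSIFY_AND_EXTRACT = "Classification and Extraction"
EXTRACT_ONLY = "Extraction only"

MODEL_IMPACT = {
    (CLASSIFY_AND_EXTRACT, EXTRACT_ONLY): "Existing AutoML model is deleted, with a warning.",
    (EXTRACT_ONLY, CLASSIFY_AND_EXTRACT): "A new Google AutoML model is created for the channel.",
}


def model_impact(previous: str, new: str) -> str:
    """Describe what happens to the AutoML model when a channel is reconfigured."""
    valid = {CLASSIFY_AND_EXTRACT, EXTRACT_ONLY}
    if previous not in valid or new not in valid:
        raise ValueError("unknown workflow selection")
    if previous == new:
        return "The Google AutoML model is not impacted."
    return MODEL_IMPACT[(previous, new)]


unchanged = model_impact(EXTRACT_ONLY, EXTRACT_ONLY)
switched = model_impact(CLASSIFY_AND_EXTRACT, EXTRACT_ONLY)
```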
If you receive an email notification about an error with the classification model training, refer to the table below to troubleshoot the cause of the error.
After you have determined the cause of the error:
DU Configure Document Understanding process instance.
Issue | Resolution |
---|---|
There was an error when uploading the dataset CSV file. | |
There was an error when creating the training dataset. | |
There was an error when importing the training dataset. | |
There was an error when deploying the model. | |
Configuring IDP