Configuring IDP [Intelligent Document Processing v1.2]

Google has deprecated legacy versions of AutoML services, which directly impacts IDP's core functionality.

Additionally, the IDP application was deprecated with Appian 23.2. Customers who wish to use the application will need to refactor plug-ins using AutoML.

Introduction

Before you start using the Intelligent Document Processing (IDP) application, you will need to set up the initial configuration. Additionally, after you have started using IDP, you may want to update this initial configuration.

Members of the manager security groups can configure and update these settings on the CONFIGURE tab.

This configuration includes:

Selecting whether you are processing multiple document types or only one document type.
Selecting the types of documents you want to process.
Uploading example documents to train the AI classification machine learning model.
Setting preferences that determine:
- How often the classification model will be retrained.
- What the confidence thresholds are.
- Which users will be in the manager, reconciliation desk member, editor, and viewer security groups.
Starting the classification model training.

After all of these steps are complete, users can start to classify, extract, and reconcile documents. See the User Guide for instructions on how to use the IDP site.

Before you begin

Evaluate whether documents are suitable

Because IDP classifies documents, it is important to use document types that a user can easily categorize. If a user can't classify a document by reading it, the AI classification model likely won't be able to either. For example, a human reader who is familiar with invoices and purchase orders would be able to easily classify them based on the information they contain.

Also, make sure you understand the documents that work best for document extraction by referring to the Appian Document Extraction page.

Reset the configuration

If you have previously configured your document channel with dummy documents or document types, we recommend having the application administrator reset the configuration.

To reset the configuration:

In the dudocchannel database table, for the document channel you are configuring, update the following values:
- modelid: NULL
- modeltrainedon: NULL
- numdocsfortraining: 0
- invalidtypeincludedinmodel: 0
In the dudocunderstanding database table, delete all of the rows where the channelid matches the channelid of your document channel.
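
The reset above can be sketched as two SQL statements run in one transaction. This is a minimal sketch, not the application's own tooling: it uses Python's built-in `sqlite3` module with a stand-in schema so that it runs anywhere, the `reset_channel` helper is hypothetical, and it assumes that `dudocchannel` identifies a channel by a `channelid` column. Check the real column names and your database's SQL dialect before adapting it.

```python
import sqlite3

def reset_channel(conn, channel_id):
    """Clear the trained-model fields for one channel and delete its understanding rows."""
    with conn:  # commit both statements together, or roll back if either fails
        conn.execute(
            """
            UPDATE dudocchannel
               SET modelid = NULL,
                   modeltrainedon = NULL,
                   numdocsfortraining = 0,
                   invalidtypeincludedinmodel = 0
             WHERE channelid = ?  -- assumed key column
            """,
            (channel_id,),
        )
        conn.execute(
            "DELETE FROM dudocunderstanding WHERE channelid = ?",
            (channel_id,),
        )

# Stand-in schema for demonstration only; the real tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dudocchannel (
        channelid INTEGER PRIMARY KEY, modelid TEXT, modeltrainedon TEXT,
        numdocsfortraining INTEGER, invalidtypeincludedinmodel INTEGER);
    CREATE TABLE dudocunderstanding (id INTEGER PRIMARY KEY, channelid INTEGER);
    INSERT INTO dudocchannel VALUES (1, 'model-a', '2024-01-01', 40, 1);
    INSERT INTO dudocchannel VALUES (2, 'model-b', '2024-02-01', 25, 0);
    INSERT INTO dudocunderstanding (channelid) VALUES (1), (1), (2);
""")

reset_channel(conn, 1)  # channel 2 is left untouched
```

Scoping both statements to a single channel ID keeps other document channels and their trained models intact.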

Ensure the latest version of IDP is installed

Before you get started, we recommend making sure the latest version of the IDP application is installed.

The latest version of IDP is version 1.8.

Compare the latest version with the application version displayed in the ABOUT tab. If the installed version is behind the latest version, contact your application administrator about installing the latest version.
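
If you track installed versions in a script, the comparison can be sketched as follows. The helper names are hypothetical, and the sketch assumes version numbers are plain dotted integers such as 1.2 or 1.8. Comparing them as text or as decimals would rank 1.10 below 1.8, so the parts are compared as integers.

```python
LATEST_VERSION = "1.8"  # the latest version stated above

def parse_version(text):
    """Turn a dotted version such as '1.8' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))

def is_behind(installed, latest=LATEST_VERSION):
    """Return True when the installed version is older than the latest version."""
    return parse_version(installed) < parse_version(latest)

needs_upgrade = is_behind("1.2")  # value read from the ABOUT tab
```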

IDP application version number

Set up document types

By default, the application includes Invoice, Purchase Order, Claim, and Receipt document types. If you need to extract more or fewer fields than what is available out of the box for these document types, see Modifying Fields for Document Types. If you need to extract data from different document types than what is provided out of the box, see Adding a Document Type. If you need to add a new document channel, see Adding a Document Channel.

Step 1: Start the Configuration workflow

Whether you are configuring the application for the first time, or updating the configuration settings, you will access the configuration settings on the CONFIGURE tab.

Keep in mind that only system administrators and members of certain groups have access to the CONFIGURE tab. If you don't see the tab, contact the application administrators. See the Groups Reference Page for more information about what actions the different user groups can take in IDP.

Whether you are running the configuration for the first time, or editing an existing configuration, you will see the current configuration settings in a grid. To edit the configuration, click the edit icon in the right-most column of the grid.

Step 2: Select channel workflow type

The next step in configuring IDP is to select whether you need to process multiple document types and need IDP to classify them, or if you only need to extract information from one document type.

If you only need to process one document type, you can skip training the classification model by selecting the extraction-only workflow. You will be able to start processing documents as soon as the configuration is complete, rather than waiting for the classification model to be trained.

To select the channel workflow type:

If you are processing multiple document types in the document channel, select Classification and Extraction.
If you are only processing one document type in the document channel, select Extraction.

Select the channel workflow type

Step 3: Select document types

Now you will select the document types that you want to process.

To select the document types:

Click the document type or types that you want to process. If you chose Extraction in the last step, you will only be able to select one document type.
- Note: Hover over the information symbol to view the fields that each document type extracts.
Click NEXT.
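
The rule that ties the workflow type to the number of document types can be sketched as a small check. This is illustrative only: `validate_selection` is a hypothetical helper, not part of IDP, and the only rules it encodes are the ones stated above (at least one document type, and exactly one for an Extraction channel).

```python
CLASSIFICATION_AND_EXTRACTION = "Classification and Extraction"
EXTRACTION = "Extraction"

def validate_selection(workflow, document_types):
    """Raise ValueError if the selected document types do not fit the workflow type."""
    if workflow not in (CLASSIFICATION_AND_EXTRACTION, EXTRACTION):
        raise ValueError(f"unknown workflow type: {workflow!r}")
    if not document_types:
        raise ValueError("select at least one document type")
    if workflow == EXTRACTION and len(document_types) != 1:
        raise ValueError("an Extraction channel processes exactly one document type")
    return list(document_types)

# Both of these selections pass the check.
validate_selection(EXTRACTION, ["Invoice"])
validate_selection(CLASSIFICATION_AND_EXTRACTION, ["Invoice", "Purchase Order", "Receipt"])
```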

Step 4: (Classification workflows only) Upload example documents

If you chose Classification and Extraction in Step 2, you will need to upload example documents to train the AI classification model to learn the characteristics that are common for each document type. Keep in mind that this process only trains the classification model. It does not affect the document extraction.

The documents you upload must be ZIP files containing PDFs.

When you are collecting documents to upload for training, choose a representative set of the actual documents that you will want to process. The better this sample set matches your actual documents, the better the classification model will perform.

Note: If you retrain the classification model, you will need a new set of documents for the training. If documents are the same or too similar to documents that were previously uploaded, the process model might encounter an error during the training and time out.

To upload your example documents:

For each document type, prepare a ZIP file that contains example PDFs of the document type. The model predictions will be more accurate if you follow these guidelines:
- Upload documents that are representative of the actual documents you will be using.
- Upload as many documents as possible. It is important to upload at least 10 documents to avoid an exception during the classification training. The more documents you include, the more accurate the model will be.
- Make sure the documents are sufficiently different from each other. Otherwise, they may be treated as duplicates, which could cause an exception in the model training.
- Make sure the documents have not been uploaded for training that document type in the past.
Click UPLOAD, then choose the ZIP file for that document type.
Check Has Structured Data? if the example documents present the same fields in the same positions. Structured data helps train the classification model and improves recognition.
Click NEXT.
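
Before uploading, a ZIP file can be checked against some of the guidelines above with a short script. This is a rough sketch using only the Python standard library, and `check_training_zip` is a hypothetical helper. It flags non-PDF entries, byte-identical duplicates, and archives with fewer than 10 PDFs. It cannot tell whether documents are merely too similar or were uploaded in a previous training run, so those guidelines still need a manual check.

```python
import hashlib
import io
import zipfile

MIN_DOCUMENTS = 10  # minimum stated in the guidelines above

def check_training_zip(zip_file):
    """Return a list of problems found in a training ZIP (an empty list means none found)."""
    problems = []
    seen = {}  # content hash -> first file name with that content
    pdf_count = 0
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if not info.filename.lower().endswith(".pdf"):
                problems.append(f"not a PDF: {info.filename}")
                continue
            pdf_count += 1
            digest = hashlib.sha256(archive.read(info)).hexdigest()
            if digest in seen:
                problems.append(f"duplicate of {seen[digest]}: {info.filename}")
            else:
                seen[digest] = info.filename
    if pdf_count < MIN_DOCUMENTS:
        problems.append(f"only {pdf_count} PDFs; at least {MIN_DOCUMENTS} are needed")
    return problems

# Build a small in-memory archive with placeholder content to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for i in range(10):
        archive.writestr(f"invoice_{i}.pdf", f"%PDF-1.4 sample {i}".encode())
    archive.writestr("invoice_copy.pdf", b"%PDF-1.4 sample 0")  # same bytes as invoice_0.pdf
    archive.writestr("notes.txt", b"not a pdf")
buffer.seek(0)

problems = check_training_zip(buffer)
```

For the sample archive, `problems` reports the duplicated file and the text file. Pass a file path instead of the in-memory buffer to check a real ZIP file.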

Step 5: Set preferences and start training the AI classification model

The final step is to set the preferences for your configuration. Note that if you chose Extraction in Step 2, some fields in the Set Preferences page will not display since they don't affect extraction-only workflows.

After you set these preferences, you can start the training for the AI classification model.

Tip: To determine an appropriate confidence threshold, it may be helpful to understand a couple of concepts.

A confidence interval is a range of plausible values for an unknown parameter.

The confidence interval has a confidence level that the true parameter is in the proposed range.

If a prediction has a 95% confidence level, then its confidence interval has a 95% probability of containing the true parameter. If the confidence level meets the confidence threshold, then the document will be auto-classified.
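
The threshold comparison described in this tip can be sketched as follows. This is an illustration, not IDP's internal logic: the function name is hypothetical, the sample confidence values are made up, and the sketch assumes that "meets the threshold" means greater than or equal to it.

```python
def is_auto_classified(confidence_pct, threshold_pct):
    """Return True when a prediction's confidence meets the threshold (assumed to mean >=)."""
    for value in (confidence_pct, threshold_pct):
        if not 0 <= value <= 100:
            raise ValueError("percentages must be between 0 and 100")
    return confidence_pct >= threshold_pct

# Made-up confidence values for five predictions.
confidences = [99.2, 96.0, 91.5, 83.0, 60.4]

for threshold in (80, 95):
    auto = sum(is_auto_classified(c, threshold) for c in confidences)
    print(f"threshold {threshold}%: {auto} auto-classified, {len(confidences) - auto} not")
```

With these sample values, raising the threshold from 80% to 95% reduces the number of auto-classified documents from four to two.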

To set the IDP preferences and start the training:

For Classification Confidence Threshold, enter the percentage of confidence that the AI classification model must meet when it makes a prediction. If it doesn't meet this threshold, a task will be created for a user to confirm the classification.
- Tip: A higher threshold percentage, such as 95%, means fewer documents will be auto-classified incorrectly, but it will also increase the number of documents that must be manually classified.
For Documents for Retraining, enter the number of documents that will need to be classified before triggering the retraining of the AI classification machine learning model. This enables the AI classification model to improve over time. A lower number allows the model to learn more quickly, but also increases the Google Cloud Platform costs. This could cause you to hit your Google account limits more quickly.
For Extraction Confidence Threshold, enter the percentage of confidence that the data must meet when it is extracted from a document. If it doesn't meet this threshold, the value will not be automatically populated. During the reconciliation task, a user will need to provide the value.
- Tip: As with the classification threshold, a higher confidence percentage will increase auto-extraction accuracy, but it will also increase the number of fields that need to be populated by the user.
Select users to add to the Managers, Reconciliation Desk Members, Editors, and Viewers groups. See the Groups Reference Page for more information on the access level that each group provides.
Click TRAIN MODEL.
- Note: For extraction-only workflows, this button will say Submit.
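
The effect of the Extraction Confidence Threshold and Documents for Retraining preferences can be sketched in a few lines. The function names, field names, and values below are hypothetical and only illustrate the behavior described above. The sketch assumes that "meets the threshold" means greater than or equal to it, and that retraining triggers once the count of newly classified documents reaches the configured number.

```python
def prefill_fields(extracted, threshold_pct):
    """Keep values whose confidence meets the threshold; leave the rest blank (None)."""
    return {
        name: (value if confidence >= threshold_pct else None)
        for name, (value, confidence) in extracted.items()
    }

def retraining_due(classified_since_training, documents_for_retraining):
    """Return True once enough documents have been classified to trigger retraining."""
    return classified_since_training >= documents_for_retraining

# Made-up extraction results: field name -> (value, confidence percentage).
extracted = {
    "invoice_number": ("INV-1001", 98.0),
    "total": ("1,250.00", 72.5),
}

prefilled = prefill_fields(extracted, threshold_pct=90)
due = retraining_due(classified_since_training=48, documents_for_retraining=50)
```

Here `invoice_number` is populated automatically, `total` is left for the user to provide during the reconciliation task, and retraining is not yet due.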

If you chose Extraction in Step 2, users can start uploading documents right away.

If you chose Classification and Extraction in Step 2, the classification training will begin. The training can take several hours. You will receive an email when the training is complete. Once the training is complete, users can start to classify, extract, and reconcile documents. See the User Guide for instructions on how to get started using the application.

If there is an issue with the classification model training, contact your application administrator, who can follow the troubleshooting steps below.

Troubleshooting the classification model training

If you receive an email notification about an error with the classification model training, refer to the table below to troubleshoot the cause of the error.

After you have determined the cause of the error:

Cancel the DU Configure Document Understanding process instance.
- Note: It is not possible to manage the configuration on the CONFIGURE tab until the process instance is completed or canceled.
Have the user who initiated the configuration repeat the document channel configuration.

| Issue | Resolution |
| --- | --- |
| There was an error when uploading the dataset CSV file. | Make sure the service account has the correct permissions. Make sure the `DU_AUTOML_CLOUD_BUCKET` constant reflects the bucket you're using. |
| There was an error when creating the training dataset. | Make sure the service account has the correct permissions. Make sure the region in the `Google Cloud AutoML` connected system is supported by Google Cloud. |
| There was an error when importing the training dataset. | Make sure the service account has the correct permissions. Make sure there are no duplicates in your training dataset. This includes the ZIP file uploaded for training and documents uploaded for processing. Make sure the bucket you're using matches the `DU_STORAGE_CLOUD_BUCKET` constant and is a persistent storage bucket. |
| There was an error when deploying the model. | Make sure the service account has the correct permissions. Make sure the region in the `DU_CLOUD_REGION` constant is correct. For storage buckets in the US region, it should be `us-central1`. For storage buckets in the EU region, it should be either `europe-north1`, `europe-west1`, `europe-west2`, `europe-west3`, or `europe-west4`, depending on your storage bucket setup. Make sure the region is supported by Google Cloud. |
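
The region guidance for the `DU_CLOUD_REGION` constant can be sketched as a lookup. This is a hedged illustration: the function name and the "US" and "EU" keys are assumptions made for the example, and the region lists are copied from the resolution text above. The sketch does not confirm that Google Cloud currently supports a region.

```python
# Regions listed in the troubleshooting guidance, keyed by storage bucket location.
EXPECTED_REGIONS = {
    "US": {"us-central1"},
    "EU": {
        "europe-north1",
        "europe-west1",
        "europe-west2",
        "europe-west3",
        "europe-west4",
    },
}

def region_matches_bucket(bucket_location, cloud_region):
    """Return True when the configured region is one listed for the bucket's location."""
    expected = EXPECTED_REGIONS.get(bucket_location.upper())
    if expected is None:
        raise ValueError(f"no guidance for bucket location: {bucket_location!r}")
    return cloud_region in expected

us_ok = region_matches_bucket("US", "us-central1")
eu_mismatch = region_matches_bucket("EU", "us-central1")
```

In this example, `us_ok` is `True`, while `eu_mismatch` is `False` because `us-central1` is not one of the regions listed for EU buckets.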

Built: Mon, Apr 22, 2024 (07:48:53 PM)