Configuring IDP

Introduction

Before you start using the Intelligent Document Processing (IDP) application, you need to set up its initial configuration. You can also update this configuration later, after you have started using IDP.

Members of the manager security groups can set up and update the configuration settings on the CONFIGURE tab.

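This configuration includes:

  • Selecting the document types that you want to process.
  • Uploading example documents to train the AI classification model.
  • Setting preferences and starting the training of the AI classification model.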

After all of these steps are complete, users can start to classify, extract, and reconcile documents. See the User Guide for instructions on how to use the IDP site.

Skipping document classification

If you will only be processing one type of document, you don't need classification. In Step 2: Select document types, select only that one document type.

For Step 3: Upload example documents, upload example documents for just that one document type. You do not need to upload example documents for the Invalid document type.

When uploading documents for processing, you can pre-classify the document to skip the classification step. Eventually, you can skip pre-classification because the classification model will have been trained to differentiate between the documents of that type and invalid documents.

Before you begin

Evaluate whether documents are suitable

Because IDP classifies documents, it is important to use document types that a user can easily categorize. If a user can't classify a document by reading it, the AI classification model likely won't be able to either. For example, a human reader who is familiar with invoices and purchase orders would be able to easily classify them based on the information they contain.

Also, make sure you understand the documents that work best for document extraction by referring to the Appian Document Extraction page.

Reset the configuration

If you have previously configured your document channel with dummy documents or document types, we recommend having the application administrator reset the configuration.

To reset the configuration:

  1. In the dudocchannel database table, for the document channel you are configuring, update the following values:
    • modelid: NULL
    • modeltrainedon: NULL
    • numdocsfortraining: 0
    • invalidtypeincludedinmodel: 0
  2. In the dudocunderstanding database table, delete all of the rows where the channelid matches the channelid of your document channel.
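
The two reset steps above amount to one UPDATE and one DELETE. The following is a minimal sketch only: it uses Python's built-in sqlite3 module as a stand-in for your actual database, and it assumes that the row in dudocchannel is identified by a channelid column (this guide only names that column for dudocunderstanding). The function name reset_channel is hypothetical. Check the real schema with your application administrator before running anything similar.

```python
import sqlite3


def reset_channel(conn: sqlite3.Connection, channel_id: int) -> None:
    """Clear the trained-model fields and the understanding rows for one channel."""
    with conn:  # commits both statements together, or rolls back on error
        conn.execute(
            "UPDATE dudocchannel "
            "SET modelid = NULL, modeltrainedon = NULL, "
            "numdocsfortraining = 0, invalidtypeincludedinmodel = 0 "
            "WHERE channelid = ?",  # ASSUMPTION: key column is named channelid
            (channel_id,),
        )
        conn.execute(
            "DELETE FROM dudocunderstanding WHERE channelid = ?",
            (channel_id,),
        )
```

The `?` placeholder is specific to sqlite3. Other database drivers use a different parameter style, so adapt the statements to the driver you actually use.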

Ensure the latest version of IDP is installed

Before you get started, we recommend making sure the latest version of the IDP application is installed.

The latest version of IDP is version 1.1.

Compare the latest version with the application version displayed in the ABOUT tab. If the installed version is behind the latest version, contact your application administrator about installing the latest version.
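
If you track installed versions in a script, the comparison above can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes that version numbers are purely numeric and dot-separated, like 1.1.

```python
LATEST_IDP_VERSION = "1.1"  # the latest version stated in this guide


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '1.1' into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def is_behind(installed: str, latest: str = LATEST_IDP_VERSION) -> bool:
    """Return True when the installed version is older than the latest one."""
    return parse_version(installed) < parse_version(latest)
```

Comparing tuples of integers rather than raw strings keeps a version such as 1.10 from being treated as older than 1.9.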

Set up document types

By default, the application includes Invoice, Purchase Order, Claim, and Receipt document types. If you need to extract more or fewer fields than what is available out of the box for these document types, see Modifying Fields for Document Types. If you need to extract data from different document types than what is provided out of the box, see Adding a Document Type.

Step 1: Start the Configuration workflow

Whether you are configuring the application for the first time, or updating the configuration settings, you will access the configuration settings on the CONFIGURE tab.

Keep in mind that only system administrators and members of certain groups have access to the CONFIGURE tab. If you don't see the tab, contact the application administrators. See the Groups Reference Page for more information about what actions the different user groups can take in IDP.

Whether you are running the configuration for the first time, or editing an existing configuration, you will see the current configuration settings in a grid. To edit the configuration, click the edit icon in the right-most column of the grid.

Step 2: Select document types

The next step in configuring IDP is to select the document types that you want to process.

By default, the application includes Invoice, Purchase Order, Claim, and Receipt document types. If you need to extract more or fewer fields than what is available out of the box for these document types, see Modifying Fields for Document Types. If you need to extract data from different document types than what is provided out of the box, see Adding a Document Type.

If you do not need document classification, see Skipping document classification for more information on how to perform this step.

To select the document types:

  1. Click the document types that you want to process. Selected document types are blue.
    • Note: Hover over the information symbol to view the fields that each document type extracts.
  2. Click NEXT.

Step 3: Upload example documents

Next, you will upload example documents so that the AI classification model can learn the characteristics that are common to each document type. Keep in mind that this process only trains the classification model. It does not affect document extraction.

The documents you upload must be ZIP files containing PDFs.

When you are collecting documents to upload for training, choose a representative set of the actual documents that you will want to process. The better this sample set matches your actual documents, the better the classification model will perform.

If you retrain the classification model, you will need a new set of documents for the training. If the documents are the same as, or too similar to, documents that were previously uploaded, the process model might encounter an error during the training and time out.

If you do not need document classification, see Skipping document classification for more information on how to perform this step.

To upload your example documents:

  1. For each document type, prepare a ZIP file that contains example PDFs of the document type. The model predictions will be more accurate if you follow these guidelines:
    • Upload documents that are representative of the actual documents you will be using.
    • Upload as many documents as possible. It is important to upload at least 10 documents to avoid an exception during the classification training. The more documents you include, the more accurate the model will be.
    • Make sure the documents are sufficiently different from each other. Otherwise, they may be treated as duplicates, which could cause an exception in the model training.
  2. For the Invalid document type, you may want to prepare a ZIP file of documents that you want to automatically classify as invalid.
    • Example: If you want to process order forms and users tend to mistakenly send you unsigned forms when they are supposed to send you signed forms, you could upload a batch of unsigned order forms.
  3. Click UPLOAD, then choose the ZIP file for that document type. Uploading example documents for the Invalid document type is optional.
  4. Click NEXT.
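
The preparation described in step 1 can be partly automated. The sketch below is one possible approach, not part of IDP: it collects the PDFs in a folder, skips byte-identical copies, enforces the minimum of 10 documents mentioned above, and writes a ZIP file. The function name and folder layout are hypothetical. It assumes that PDFs placed at the top level of the ZIP are acceptable, and it cannot detect documents that are merely very similar.

```python
import hashlib
import zipfile
from pathlib import Path

MIN_EXAMPLES = 10  # this guide asks for at least 10 documents per document type


def build_training_zip(pdf_dir: Path, zip_path: Path) -> int:
    """Zip the distinct PDFs found in pdf_dir and return how many were written."""
    seen_hashes = set()
    unique_pdfs = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        digest = hashlib.sha256(pdf.read_bytes()).hexdigest()
        if digest in seen_hashes:
            continue  # skip byte-identical copies of an earlier file
        seen_hashes.add(digest)
        unique_pdfs.append(pdf)

    if len(unique_pdfs) < MIN_EXAMPLES:
        raise ValueError(
            f"Need at least {MIN_EXAMPLES} distinct PDFs, found {len(unique_pdfs)}"
        )

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for pdf in unique_pdfs:
            archive.write(pdf, arcname=pdf.name)  # store files at the ZIP root
    return len(unique_pdfs)
```

Note that the `*.pdf` pattern is case-sensitive on some operating systems, so files named with a `.PDF` extension may need to be renamed first.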

Step 4: Set preferences and start training the AI classification model

The final step is to set the preferences for your configuration.

After you set these preferences, you can start the training for the AI classification model.

To determine an appropriate confidence threshold, it may be helpful to understand a couple of concepts.

A confidence interval is a range of plausible values for an unknown parameter.

The confidence level of the interval expresses how likely it is that the range contains the true parameter.

For example, an interval with a 95% confidence level has a 95% probability of containing the true parameter. In IDP, each prediction comes with a confidence percentage. If that percentage meets the confidence threshold, the document will be auto-classified.
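
To see how the threshold affects the outcome, consider the following sketch. It is not IDP's implementation: the function name and sample predictions are invented for illustration, and it assumes that "meets the threshold" means greater than or equal to.

```python
def route_classification(confidence: float, threshold: float) -> str:
    """Return 'auto' when a prediction meets the threshold, otherwise 'manual'."""
    if not (0 <= confidence <= 100 and 0 <= threshold <= 100):
        raise ValueError("confidence and threshold must be percentages from 0 to 100")
    return "auto" if confidence >= threshold else "manual"


# Hypothetical predictions: (predicted document type, confidence percentage)
predictions = [("Invoice", 98.2), ("Purchase Order", 91.0), ("Receipt", 95.0)]

for threshold in (90, 95):
    manual = [
        doc_type
        for doc_type, confidence in predictions
        if route_classification(confidence, threshold) == "manual"
    ]
    print(threshold, manual)
# prints:
# 90 []
# 95 ['Purchase Order']
```

Raising the threshold from 90 to 95 sends one more document to a user for confirmation, which is the trade-off between accuracy and manual work that the threshold controls.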

To set the IDP preferences and start the training:

  1. For Classification Confidence Threshold, enter the percentage of confidence that the AI classification model must meet when it makes a prediction. If it doesn't meet this threshold, a task will be created for a user to confirm the classification.
    • Tip: A higher threshold percentage, such as 95%, means fewer documents will be auto-classified incorrectly, but it will also increase the number of documents that must be manually classified.
  2. For Documents for Retraining, enter the number of documents that will need to be classified before triggering the retraining of the AI classification machine learning model. This enables the AI classification model to improve over time. A lower number allows the model to learn more quickly, but it also increases Google Cloud Platform costs and could cause you to hit your Appian AI or Google account limits more quickly.
  3. For Extraction Confidence Threshold, enter the percentage of confidence that the data must meet when it is extracted from a document. If it doesn't meet this threshold, the value will not be automatically populated. During the reconciliation task, a user will need to provide the value.
    • Tip: As with the classification threshold, a higher confidence percentage will increase auto-extraction accuracy, but it will also increase the number of fields that need to be populated by the user.
  4. Select users to add to the Managers, Reconciliation Desk Members, Editors, and Viewers groups. See the Groups Reference Page for more information on the access level that each group provides.
  5. Click TRAIN MODEL.
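
The effect of the extraction threshold and the retraining count can be sketched in the same way. Everything in this example is hypothetical, including the Preferences class, the field names, and the sample values. It only illustrates the behavior described in the steps above and does not reflect how IDP stores or applies these settings.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Preferences:
    classification_threshold: float  # percent
    documents_for_retraining: int    # documents classified before retraining
    extraction_threshold: float      # percent


def prefill_fields(
    extracted: Dict[str, Tuple[str, float]], prefs: Preferences
) -> Dict[str, Optional[str]]:
    """Keep values that meet the extraction threshold; leave the rest empty
    (None) so that a user provides them during the reconciliation task."""
    return {
        name: (value if confidence >= prefs.extraction_threshold else None)
        for name, (value, confidence) in extracted.items()
    }


def should_retrain(classified_since_training: int, prefs: Preferences) -> bool:
    """Return True once enough documents have been classified to retrain."""
    return classified_since_training >= prefs.documents_for_retraining


prefs = Preferences(
    classification_threshold=95, documents_for_retraining=50, extraction_threshold=90
)
# Hypothetical extraction result: field name -> (value, confidence percentage)
extracted = {"invoiceNumber": ("INV-001", 97.5), "total": ("120.00", 72.0)}
fields = prefill_fields(extracted, prefs)
```

In this example, `invoiceNumber` is populated automatically and `total` is left empty for the user, because its confidence of 72.0 is below the threshold of 90.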

The training can take several hours. You will receive an email when the training is complete. Once the training is complete, users can start to classify, extract, and reconcile documents. See the User Guide for instructions on how to get started using the application.

If there is an issue with the classification model training, contact your application administrator, who can follow the troubleshooting steps below.

Troubleshooting the classification model training

If you receive an email notification about an error with the classification model training, refer to the issues and resolutions listed below to troubleshoot the cause of the error.

After you have determined the cause of the error:

  1. Cancel the DU Configure Document Understanding process instance.
    • Note: It is not possible to manage the configuration on the CONFIGURE tab until the process instance is completed or canceled.
  2. Have the user who initiated the configuration repeat the document channel configuration.

Issue: I don't need to classify documents, but the classification model training was triggered.
Resolution:
  • Make sure that only one document type was selected.
  • Make sure example documents for the Invalid document type were not uploaded.
  • Make sure the configuration was reset before the configuration was started.

Issue: There was an error when uploading the training files.
Resolution:
  • Make sure the service account has the correct permissions.
  • Make sure the DU_STORAGE_CLOUD_BUCKET constant reflects the bucket you're using.

Issue: There was an error when uploading the dataset CSV file.
Resolution:
  • Make sure the service account has the correct permissions.
  • Make sure the DU_AUTOML_CLOUD_BUCKET constant reflects the bucket you're using.

Issue: There was an error when creating the training dataset.
Issue: There was an error when importing the training dataset.
Resolution:
  • Make sure the service account has the correct permissions.
  • Make sure there are no duplicates in your training dataset. This includes the ZIP file uploaded for training and documents uploaded for processing.
  • Make sure the bucket you're using matches the DU_STORAGE_CLOUD_BUCKET constant and is a persistent storage bucket.

Issue: There was an error when deploying the model.
Resolution:
  • Make sure the service account has the correct permissions.
  • Make sure the region in the DU_CLOUD_REGION constant is correct. For storage buckets in the US region, it should be us-central1. For storage buckets in the EU region, it should be either europe-north1, europe-west1, europe-west2, europe-west3, or europe-west4, depending on your storage bucket setup.
  • Make sure the region is supported by Google Cloud.
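
The region guidance for the DU_CLOUD_REGION constant can be checked with a small helper such as the one below. This is a sketch with hypothetical names. It only encodes the region values listed above, so it can confirm that a value is plausible for a US or EU bucket, but it cannot tell which of the EU regions matches your particular bucket setup.

```python
# Region values taken from the troubleshooting guidance above.
ALLOWED_REGIONS = {
    "US": {"us-central1"},
    "EU": {
        "europe-north1",
        "europe-west1",
        "europe-west2",
        "europe-west3",
        "europe-west4",
    },
}


def region_matches_bucket(bucket_location: str, du_cloud_region: str) -> bool:
    """Check a DU_CLOUD_REGION value against the bucket's location (US or EU)."""
    allowed = ALLOWED_REGIONS.get(bucket_location.strip().upper())
    if allowed is None:
        raise ValueError(f"Unknown bucket location: {bucket_location!r}")
    return du_cloud_region.strip().lower() in allowed
```

For example, `region_matches_bucket("EU", "us-central1")` returns False, which points to a mismatch between the constant and the bucket.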